s///gsi; with a wildcard

Discussion in 'Perl Misc' started by Jason Carlton, Mar 9, 2010.

  1. Every once in a while, someone will copy and paste from Word into my
    message board. After it goes through my Perl script, I'll have
    something like this plugged in:

    Normal 0 false false false EN-US X-NONE X-NONE
    MicrosoftInternetExplorer4 /* Style Definitions */
    table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-
    rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-
    style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-
    padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-
    margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:
    0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt;
    font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-
    ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New
    Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-
    family:Calibri; mso-hansi-theme-font:minor-latin;}

    The fonts and all that are different for each post; the only
    consistency seems to be that it starts with "Normal 0 false false
    false", and it ends with a "}".

    Would something as simple as this be enough to consistently remove it?

    $comment =~ s/Normal 0 false false false.*?}//gsi;

    Or is there more to it than I'm thinking?
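    A quick way to answer that is to run the substitution on a shortened copy of the pasted text. The sample below is abridged from the post above, not real board data. The non-greedy .*? stops at the first '}', /s lets '.' cross line breaks, and /g removes every occurrence:

```perl
use strict;
use warnings;

# Abridged copy of the pasted Word header (assumed sample input)
my $comment = "Before. Normal 0 false false false EN-US X-NONE X-NONE\n"
            . "MicrosoftInternetExplorer4 /* Style Definitions */\n"
            . "table.MsoNormalTable {mso-style-name:\"Table Normal\";\n"
            . "font-family:\"Calibri\",\"sans-serif\";} After.";

# .*? is non-greedy, so each match ends at the first '}' after the marker;
# /s lets '.' match newlines, /g removes every occurrence, /i ignores case.
$comment =~ s/Normal 0 false false false.*?\}//gsi;

print "$comment\n";   # prints "Before.  After."
```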
     
    Jason Carlton, Mar 9, 2010
    #1

  2. Sorry if that was too much to read.

    Basically, I want to remove "Normal 0 false false false" followed by
    random stuff, but always ending with }.

    Will this do it correctly, or will it remove other things that I'm not
    recognizing?

    $comment =~ s/Normal 0 false false false.*?}//gsi;

    TIA,

    Jason
     
    Jason Carlton, Mar 10, 2010
    #2

  3. So, you're saying that you don't know the answer? If so, then why
    bother replying? Or spending time in a Perl NG, for that matter.
     
    Jason Carlton, Mar 10, 2010
    #3
  4. Jason Carlton

    sln

    $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
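    Here is a small check of this pattern on assumed sample input. [^{]+ consumes the settings up to the opening brace, and [^}]+\} consumes the style block through its closing brace. Negated classes match newlines on their own, so no /s is needed, but without /g only the first header is removed:

```perl
use strict;
use warnings;

# Two pasted headers in one comment (assumed sample input)
my $comment = "A Normal 0 false false false EN-US\ntable.X {mso-a:1;\nmso-b:2;} B "
            . "Normal 0 false false false EN-US table.Y {mso-c:3;} C";

# No /g: only the first header and its style block are removed.
(my $once = $comment) =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

# With /g: every header and style block is removed.
(my $all = $comment) =~ s/Normal 0 false false false[^{]+\{[^}]+\}//g;

print "$once\n";
print "$all\n";   # prints "A  B  C"
```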
     
    sln, Mar 10, 2010
    #4
  5. Thanks, s.
     
    Jason Carlton, Mar 10, 2010
    #5
  6. Unfortunately, neither of these is working the way I expected:

    $comment =~ s/Normal 0 false false false.*?}//gsi;
    $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

    It's removing the "Normal 0 false false false", but not everything
    after it up to the "}".

    How do I make it remove everything from "Normal 0 false false false"
    until it finds the first "}"?
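    The same rule ("from the marker through the first '}'") can also be written without a regex, using Perl's built-in index and substr. This is a swapped-in technique, not something proposed in the thread, and the helper name strip_word_header is hypothetical. Unlike the /i patterns, index() is case-sensitive:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): delete every span that starts
# with $marker and ends at the first '}' after it. A marker with no later
# '}' is left in place. index() is case-sensitive, unlike /i.
sub strip_word_header {
    my ($text, $marker) = @_;
    my $pos = 0;
    while ((my $start = index($text, $marker, $pos)) >= 0) {
        my $end = index($text, '}', $start + length($marker));
        last if $end < 0;                                # no closing '}'
        substr($text, $start, $end - $start + 1, '');    # cut marker..'}'
        $pos = $start;                                   # resume at the cut
    }
    return $text;
}

my $cleaned = strip_word_header(
    'Hi Normal 0 false false false x {y} there Normal 0 false false false {z} !',
    'Normal 0 false false false',
);
print "$cleaned\n";   # prints "Hi  there  !"
```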

    TIA,

    Jason
     
    Jason Carlton, Mar 25, 2010
    #6
  7. Jason Carlton

    sln

    You can generalize it more:

    $comment =~ s/Normal \s* \d+ \s* false \s* false \s* false [^}]* \} //xig;

    But if it's still not matching, the format is probably different; maybe
    there is no terminating '}' in the real text. You don't need /s if there
    is no '.' in the pattern, which is why I used [^}]* \}: a negated
    character class matches newlines anyway.
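    Both points can be checked on small inputs (assumed samples, not real posts): without /s, '.' refuses to cross a newline while [^}]* crosses it freely, and the \s* version tolerates a line break or extra spaces between the words:

```perl
use strict;
use warnings;

# Assumed input: the header is intact, but the style block spans lines.
my $header = "Normal 0 false false false EN-US\n{mso-a:1;\nmso-b:2;} kept";

# Without /s, '.' stops at "\n", so the non-greedy .*? never reaches the '}'.
my $dot_hits   = $header =~ /Normal 0 false false false.*?\}/ ? 1 : 0;

# A negated class matches "\n" on its own, so no /s is needed.
my $class_hits = $header =~ /Normal 0 false false false[^}]*\}/ ? 1 : 0;

# Assumed input: a newline and extra spaces between the tokens.
my $split = "Normal 0\nfalse  false false EN-US {mso-a:1;} kept";

# sln's generalized pattern: \s* absorbs any run of whitespace, newlines included.
(my $cleaned = $split) =~ s/Normal \s* \d+ \s* false \s* false \s* false [^}]* \} //xig;

print "$dot_hits $class_hits '$cleaned'\n";   # prints "0 1 ' kept'"
```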

    It's not a good idea to match everything from "Normal" to "}", as
    that's not really enough information to build a reliable pattern.

    From the sample, it looks like
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4
    is a space-delimited set of variable settings, followed by
    a set of style definitions delimited by '{' and '}'.
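    Taking that description at face value, a small sketch can split a header into its settings and its style blocks. It assumes the settings end at the first '/' or '{' and that each block has the form selector { declarations }; both assumptions come from the single sample in this thread, not from any Word specification:

```perl
use strict;
use warnings;

# Abridged header from the first post (assumed sample input)
my $header = 'Normal 0 false false false EN-US X-NONE X-NONE '
           . 'MicrosoftInternetExplorer4 /* Style Definitions */ '
           . 'table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0;}';

# Settings: everything before the first '/' or '{', split on whitespace.
my ($settings_text) = $header =~ /^([^\/{]*)/;
my @settings = split ' ', $settings_text;

# Style definitions: "selector { declarations }", collected into a hash.
my %styles;
while ($header =~ /([\w.-]+)\s*\{([^}]*)\}/g) {
    $styles{$1} = $2;
}

print scalar(@settings), " settings: @settings[0..4]\n";
print "$_ => $styles{$_}\n" for sort keys %styles;
```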

    You could use alternation to flag the start of the definition if you
    know the possible values (the slots look constant), like so:

    $comment =~ s/ (?:Normal|<something else>) \s* \d+ \s* (?:false|true) \s* (?:false|true) \s* (?:false|true) [^}]* \} //xig;

    But I don't know this format, and it may not be something you can rely on.
    Also, the regex requires a style block (or at least something
    with a '}' as the terminator).
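    A concrete version of that pattern, compiled once with qr// so it can be reused. Only "Normal" is known from the thread, so the keyword has no alternatives here, and the three boolean slots are written as a {3} quantifier instead of being spelled out (equivalent in effect, not sln's exact text):

```perl
use strict;
use warnings;

# Start of a Word header: a style keyword, a number, then three true/false slots.
my $word_header = qr/
    Normal                       # style keyword; add alternatives if others turn up
    \s* \d+                      # numeric slot, e.g. "0"
    (?: \s* (?:true|false) ){3}  # three boolean slots
    [^}]* \}                     # settings and style block, through the first '}'
/xi;

my $comment = 'Hello Normal 0 false TRUE false EN-US x {a:1;} world';
(my $cleaned = $comment) =~ s/$word_header//g;
print "$cleaned\n";   # prints "Hello  world"
```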

    -sln
     
    sln, Mar 25, 2010
    #7
  8. Jason Carlton

    J. Gleixner

    $comment =~ s/Normal 0 false false false[^}]*}//gsi;

    my $str = 'Start Normal 0 false false false blah blah { more blah }
    Starting second match Normal 0 false false false blah blah { more blah }
    The End';
    $str =~ s/Normal 0 false false false[^}]*}//gsi;
    print $str;

    Start
    Starting second match
    The End
     
    J. Gleixner, Mar 25, 2010
    #8
  9. J, should that first "}" be a "{"? Like:

    $str =~ s/Normal 0 false false false[^{]*}//gsi;
     
    Jason Carlton, Mar 25, 2010
    #9
  10. Jason Carlton

    J. Gleixner

    Jason Carlton wrote:
    [...]remove it?
    [...]
    Before asking if it's not correct, why not try it?

    [^}]* - matches every character up to (but not including) the next '}'
    }     - then matches that '}' itself, so it is removed as well; without
            it, the '}' would remain in your results.
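    The difference between [^}]*} and the [^{]*} variant Jason asked about shows up on the earlier example string (put on one line here): [^{]* cannot step past the '{', so the '}' after it is never reached and nothing is removed:

```perl
use strict;
use warnings;

my $sample = 'Start Normal 0 false false false blah blah { more blah } End';

# [^}]* runs up to the first '}', and the literal '}' removes it too.
(my $with_close = $sample) =~ s/Normal 0 false false false[^}]*\}//g;

# [^{]* stops before the '{', and the next character is not '}': no match.
(my $with_open = $sample) =~ s/Normal 0 false false false[^{]*\}//g;

print "$with_close\n";   # prints "Start  End"
print "$with_open\n";    # prints the sample unchanged
```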

    I gave example text and the output it generates. If that
    doesn't match what you want, please be a little more
    verbose: provide a -short- example of the text before,
    and what you want the text to be afterwards.
     
    J. Gleixner, Mar 26, 2010
    #10
