s///gsi; with a wildcard

Discussion in 'Perl Misc' started by Jason Carlton, Mar 9, 2010.

  1. Every once in a while, someone will copy and paste into my message
    board from Word. After it submits through my Perl script, I'll have
    something like this plugged in:

    Normal 0 false false false EN-US X-NONE X-NONE
    MicrosoftInternetExplorer4 /* Style Definitions */
    table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-
    rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-
    style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-
    padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-
    margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:
    0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt;
    font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-
    ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New
    Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-
    family:Calibri; mso-hansi-theme-font:minor-latin;}

    The fonts and all that are different for each post; the only
    consistency seems to be that it starts with "Normal 0 false false
    false", and it ends with a "}".

    Would something as simple as this be enough to consistently remove it?

    $comment =~ s/Normal 0 false false false.*?}//gsi;

    Or is there more to it than I'm thinking?
    Jason Carlton, Mar 9, 2010
    #1
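
To make the question concrete, here is a minimal sketch of what the proposed substitution does. The sample text is a shortened, hypothetical stand-in for the Word residue, not a real post. It suggests two things: the non-greedy `.*?}` stops at the first "}", so later braces in the real message survive, and the /s flag matters because the residue spans several lines.

```perl
use strict;
use warnings;

# Hypothetical sample: Word residue spanning several lines, then real text.
my $comment = join "\n",
    'Normal 0 false false false EN-US X-NONE',
    '/* Style Definitions */',
    'table.MsoNormalTable {mso-style-name:"Table Normal";',
    'font-size:11.0pt;}',
    'Hello, this is my actual post {with braces}.';

# With /s, "." also matches newlines, so .*? can reach the first "}".
(my $cleaned = $comment) =~ s/Normal 0 false false false.*?}//gsi;

# Without /s, "." stops at the end of the first line and nothing matches.
(my $no_s = $comment) =~ s/Normal 0 false false false.*?}//gi;

print "with /s:    [$cleaned]\n";
print "without /s: ", ($no_s eq $comment ? "unchanged" : "changed"), "\n";
```

Note that the newline after the removed block is left behind, so a follow-up trim may be wanted.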

  2. On Mar 8, 10:03 pm, Jason Carlton <> wrote:
    > [...]


    Sorry if I made that too much to read.

    Basically, I want to remove "Normal 0 false false false" followed by
    random stuff, but always ending with }.

    Will this do it correctly, or will it remove other things that I'm not
    recognizing?

    $comment =~ s/Normal 0 false false false.*?}//gsi;

    TIA,

    Jason
    Jason Carlton, Mar 10, 2010
    #2

  3. On Mar 9, 8:30 pm, Tad McClellan <> wrote:
    > Jason Carlton <> wrote:
    > > Sorry if I made that too much to read.

    >
    > You've shown in the past that anything you write is too much to read.
    >
    > :-(
    >
    > --
    > Tad McClellan
    > email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
    > The above message is a Usenet post.
    > I don't recall having given anyone permission to use it on a Web site.


    So, you're saying that you don't know the answer? If so, then why
    bother replying? Or spending time in a Perl NG, for that matter.
    Jason Carlton, Mar 10, 2010
    #3
  4. Jason Carlton

    Guest

    On Mon, 8 Mar 2010 19:03:03 -0800 (PST), Jason Carlton <> wrote:

    >[...]


    $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
    , Mar 10, 2010
    #4
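
A small sketch of how the negated character classes in the pattern above behave. The sample block is hypothetical. `[^{]+` and `[^}]+` match newlines as well, so no /s is needed, but without /g only the first occurrence is removed.

```perl
use strict;
use warnings;

# Hypothetical two-line residue block, pasted twice into one comment.
my $block   = "Normal 0 false false false EN-US\ntable.MsoNormalTable {mso-style-noshow:yes;\nfont-size:11.0pt;}";
my $comment = "First: $block Second: $block Done.";

# No /g: only the first block is removed.
(my $once = $comment) =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

# With /g: every block is removed.
(my $all = $comment) =~ s/Normal 0 false false false[^{]+\{[^}]+\}//g;

print "once: $once\n";
print "all:  $all\n";
```

If a post can contain more than one pasted block, adding /g is probably the safer choice.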
  5. On Mar 9, 9:21 pm, wrote:
    > [...]
    > $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;


    Thanks, s.
    Jason Carlton, Mar 10, 2010
    #5
  6. On Mar 9, 11:49 pm, Jason Carlton <> wrote:
    > On Mar 9, 9:21 pm, wrote:
    > > [...]
    > > $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

    >
    > Thanks, s.


    Unfortunately, neither of these are working the way I expected:

    $comment =~ s/Normal 0 false false false.*?}//gsi;
    $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

    It's catching the "Normal 0 false false false", but not everything
    else that comes after, and before the "}".

    How do I make it remove everything from "Normal 0 false false false"
    until it finds the first "}"?

    TIA,

    Jason
    Jason Carlton, Mar 25, 2010
    #6
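
One possible cause, offered only as an assumption since the real input is not shown in the thread: the stored text may not have exactly one space between the words (a newline, a double space, or some other whitespace from the paste). The sketch below uses a made-up input to show a literal-space pattern missing such text while a `\s+` version matches it.

```perl
use strict;
use warnings;

# Hypothetical input: same words, but split by a newline and a double space.
my $comment = "Intro Normal 0 false\nfalse  false EN-US {font-size:11.0pt;} Outro";

# Count matches for each variant (list-assignment counting idiom).
my $strict_hits = () = $comment =~ /Normal 0 false false false[^}]*\}/gi;
my $loose_hits  = () = $comment =~ /Normal\s+0\s+false\s+false\s+false[^}]*\}/gi;

# Remove the block using the whitespace-tolerant pattern.
(my $cleaned = $comment) =~ s/Normal\s+0\s+false\s+false\s+false[^}]*\}//gi;

print "literal spaces: $strict_hits match(es)\n";
print "\\s+ between words: $loose_hits match(es)\n";
print "cleaned: [$cleaned]\n";
```

Dumping the actual stored text (for example with Data::Dumper and `$Data::Dumper::Useqq = 1`) would show whether this is what is happening.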
  7. Jason Carlton

    Guest

    On Thu, 25 Mar 2010 10:41:09 -0700 (PDT), Jason Carlton <> wrote:

    >On Mar 9, 11:49 pm, Jason Carlton <> wrote:
    >> [...]
    >
    >Unfortunately, neither of these are working the way I expected:
    >
    >$comment =~ s/Normal 0 false false false.*?}//gsi;
    >$comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
    >
    >It's catching the "Normal 0 false false false", but not everything
    >else that comes after, and before the "}".
    >
    >How do I make it remove everything from "Normal 0 false false false"
    >until it finds the first "}"?
    >
    >TIA,
    >
    >Jason


    You can generalize it more:

    $comment =~ s/Normal \s* \d+ \s* false \s* false \s* false [^}]* \} //xig;

    But it's probably not matching because the format is different; maybe there
    is no terminating '}' in the real text. You don't need /s if you don't have
    a '.' in the pattern, which is why I used [^}]* \}

    It's not a good idea to grab everything between "Normal" and "}",
    as that's not really enough info to make a pattern.

    It looks like this:
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4
    is a space-delimited set of variable settings, followed by
    a '{' block '}' delimited set of style definitions.

    You could use alternation to flag the start of the definition if you
    know the possible values (the slots look constant), so:

    $comment =~ s/ (?:Normal|<something else>) \s* \d+ \s* (?:false|true) \s* (?:false|true) \s* (?:false|true) [^}]* \} //xig;

    But I don't know this format, and it possibly can't be relied upon.
    Also, the regex requires that there be a style block (or at least something
    with a '}' as the terminator).

    -sln
    , Mar 25, 2010
    #7
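
The /x idea from the post above can be sketched as a commented, precompiled pattern. This is an illustration under assumptions, not the poster's exact regex: it uses `\s+` rather than `\s*` so that the words must be separated by some whitespace, it leaves out the unspecified alternative to "Normal", and the name `$word_junk` and the sample text are made up.

```perl
use strict;
use warnings;

# Under /x, whitespace and "#" comments inside the pattern are ignored.
my $word_junk = qr/
    Normal \s+ \d+                # the word Normal and a number
    (?: \s+ (?:false|true) ){3}   # three boolean slots
    [^}]*                         # anything up to the first closing brace
    \}                            # the closing brace itself
/xi;

# Hypothetical comment: one residue block, plus ordinary text using "Normal".
my $comment = "Hi Normal 0 false true false EN-US {font-size:11.0pt;} there, Normal people {like braces}.";

(my $cleaned = $comment) =~ s/$word_junk//g;

print "$cleaned\n";
```

Requiring the number and the three boolean slots keeps an ordinary sentence containing "Normal" followed by braces from being removed.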
  8. J. Gleixner Guest

    Jason Carlton wrote:
    > On Mar 9, 11:49 pm, Jason Carlton <> wrote:
    >> [...]

    >
    > Unfortunately, neither of these are working the way I expected:
    >
    > $comment =~ s/Normal 0 false false false.*?}//gsi;
    > $comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
    >
    > It's catching the "Normal 0 false false false", but not everything
    > else that comes after, and before the "}".
    >
    > How do I make it remove everything from "Normal 0 false false false"
    > until it finds the first "}"?


    $comment =~ s/Normal 0 false false false[^}]*}//gsi;

    my $str = 'Start Normal 0 false false false blah blah { more blah }
    Starting second match Normal 0 false false false blah blah { more blah }
    The End';
    $str =~ s/Normal 0 false false false[^}]*}//gsi;
    print $str;

    Start Starting second match The End
    J. Gleixner, Mar 25, 2010
    #8
  9. On Mar 25, 5:45 pm, "J. Gleixner" <>
    wrote:
    > [...]
    >
    > $comment =~ s/Normal 0 false false false[^}]*}//gsi;
    >
    > my $str = 'Start Normal 0 false false false blah blah { more blah }
    > Starting second match Normal 0 false false false blah blah { more blah }
    > The End';
    > $str =~ s/Normal 0 false false false[^}]*}//gsi;
    > print $str;
    >
    > Start  Starting second match  The End


    J, should that first "}" be a "{"? Like:

    $str =~ s/Normal 0 false false false[^{]*}//gsi;
    Jason Carlton, Mar 25, 2010
    #9
  10. J. Gleixner Guest

    Jason Carlton wrote:
    [...]
    >>>>>> The fonts and all that are different for each post; the only
    >>>>>> consistency seems to be that it starts with "Normal 0 false false
    >>>>>> false", and it ends with a "}".
    >>>>>> Would something as simple as this be enough to consistently remove it?
    [...]
    > J, should that first "}" be a "{"? Like:
    > $str =~ s/Normal 0 false false false[^{]*}//gsi;


    Before asking if it's not correct, why not try it?

    [^}]* - match everything up to (but not including) the first '}'
    }     - match the '}' itself so it is removed too -- without that
            you'll still have '}' in your results.

    I gave example text, and the output it generates, if that
    doesn't match what you want, then please be a little
    more verbose. Provide a -short- example of the text before,
    and what you want the text to be after doing something to it.
    J. Gleixner, Mar 26, 2010
    #10
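
The difference between the two variants discussed above can be checked directly. In this sketch (the sample string is adapted from the example earlier in the thread), `[^}]*}` removes the block, while `[^{]*}` cannot match: `[^{]*` stops in front of the "{", and none of the characters it consumed is a "}", so the pattern fails and the string is left unchanged.

```perl
use strict;
use warnings;

my $str = 'Start Normal 0 false false false blah { more blah } End';

# [^}]* runs up to the first "}", then the literal "}" matches it.
(my $close = $str) =~ s/Normal 0 false false false[^}]*}//gsi;

# [^{]* stops before "{"; the next character is not "}", so there is no match.
(my $open = $str) =~ s/Normal 0 false false false[^{]*}//gsi;

print "with [^}]*}: $close\n";
print "with [^{]*}: $open\n";
```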
