Regular Expressions: "Negated Strings" instead of "Negated Character Classes"

Discussion in 'Perl Misc' started by lmeurs@gmail.com, Jun 7, 2007.

  1. Guest

    Dear all,

    I'm sure the subject sounds more complicated than the actual matter.
    Let me explain my problem. With Perl regular expressions one can
    define character classes and negated character classes.

    s/[abc]//g      The letters a, b and c will be removed from a string.
    s/[^abc]//g     Now every character *but* a, b and c will be removed
                    from a string.
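A minimal check of the two substitutions above, run on a made-up sample string ("abracadabra"). Each result is built on a copy so the original string stays intact.

```perl
use strict;
use warnings;

my $text = "abracadabra";    # invented sample input

# Character class: delete every a, b and c.
(my $without = $text) =~ s/[abc]//g;

# Negated character class: delete everything that is NOT a, b or c.
(my $only = $text) =~ s/[^abc]//g;

print "$without\n";    # rdr
print "$only\n";       # abacaaba
```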

    One can also do the first with strings, like this:

    s/(one|two|three)//g    The substrings 'one', 'two' and 'three' will
                            be removed from a string.
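The same kind of check for the alternation, again on an invented string. Because it removes *substrings*, the 'one' inside 'someone' disappears as well; that is the whole-word issue addressed with \b later in the thread.

```perl
use strict;
use warnings;

my $text = "one, two, three, four, someone";    # invented sample input

# Alternation: delete each occurrence of 'one', 'two' or 'three'.
# (?:...) groups without capturing, since $1 is not needed here.
(my $result = $text) =~ s/(?:one|two|three)//g;

print "$result\n";    # , , , four, some
```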

    But how can I turn this around, just like I did with character
    classes? What I'm looking for would look something like this:

    s/(^one|two|three)//g or s/!(one|two|three)//g

    Why? I am trying to get rid of all HTML tags *but* the break,
    paragraph and div tags.

    s/<\/?(br|p|div)( .+?)?>//ig    This would remove the break, paragraph
                                    and div tags from a string.
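A quick test of that tag-stripping substitution on a hypothetical snippet. The <b> tag survives: right after the tag name the pattern needs either a space or the closing >, so b is not mistaken for br.

```perl
use strict;
use warnings;

my $html = "a<br />b<p class='x'>c</p><div>d</div><b>e</b>";    # invented input

# Delete opening and closing br, p and div tags, case-insensitively.
(my $stripped = $html) =~ s/<\/?(br|p|div)( .+?)?>//ig;

print "$stripped\n";    # abcd<b>e</b>
```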

    How can I invert this regular expression? Any help would be really
    appreciated!

    Thanks a lot in advance,

    Laurens Meurs
    Rotterdam, the Netherlands
    Jun 7, 2007
    #1

  2. Re: Regular Expressions: "Negated Strings" instead of "Negated Character Classes"

    wrote:
    > With Perl regular expressions one can
    > define character classes and negated character classes.
    >
    > s/[abc]//g      The letters a, b and c will be removed from a string.
    > s/[^abc]//g     Now every character *but* a, b and c will be removed
    >                 from a string.
    >
    > One can also do the first with strings, like this:
    >
    > s/(one|two|three)//g    The substrings 'one', 'two' and 'three' will
    >                         be removed from a string.
    >
    > But how can I turn this around, just like I did with character
    > classes? What I'm looking for would look something like this:
    >
    > s/(^one|two|three)//g or s/!(one|two|three)//g


    This is one approach:

    s{(\b\w+\b)}{
        my $match = $1;
        # keep the word if it is one, two or three; otherwise delete it
        $match =~ /^(?:one|two|three)$/ ? $match : '';
    }eg;
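The same callback idea wrapped in a small helper so the list of words to keep can be passed in. keep_words is a hypothetical name, not something from the thread; quotemeta is applied so that list entries are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every word of $text that is not in @keep.
sub keep_words {
    my ($text, @keep) = @_;
    # Build the alternation one|two|three; quotemeta escapes metacharacters.
    my $alt = join '|', map { quotemeta($_) } @keep;
    $text =~ s{(\b\w+\b)}{
        my $match = $1;    # save $1 before the inner match clobbers it
        $match =~ /^(?:$alt)$/ ? $match : '';
    }eg;
    return $text;
}

my $out = keep_words("one two three four five", qw(one two three));
print "[$out]\n";    # [one two three  ]
```

The spaces that separated the deleted words are left behind; only the words themselves go.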

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Jun 7, 2007
    #2

  3. Guest

    Dear Gunnar,

    Thanks for the quick reply, and it worked!

    I was so focused on finding a regex feature to do the trick that I
    couldn't think of a brilliant workaround like this myself...

    But I'm still curious: is this the easiest way? Doesn't Perl's
    regular expression engine provide something like what I suggested?

    Thanks again, a lot!

    Laurens
    Jun 7, 2007
    #3
  4. Guest

    And to be complete, here is the eventual solution to get rid of all
    HTML tags except, for example, BR and P:

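    my $t = " a <br /> b <p style='border: 1px red solid; '> c </p> d <hr> e <br> f <hr /> g";

    # $1 is the whole tag and $2 its name: keep the tag if the name is
    # br or p (in any case), otherwise replace it with the empty string.
    $t =~ s{(</?(\w+)(?: .+?)?>)}{
        my $t1 = $1;
        my $t2 = $2;
        $t2 =~ /^(?:br|p)$/i ? $t1 : "";
    }eg;

    This removes both the <hr> and the <hr /> tags from the original
    string; the new value is (the spaces around the removed tags remain):

    " a <br /> b <p style='border: 1px red solid; '> c </p> d  e <br> f  g"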

    Regards!
    Jun 7, 2007
    #4
  5. Uri Guttman Guest

    Re: Regular Expressions: "Negated Strings" instead of "Negated Character Classes"

    >>>>> "GH" == Gunnar Hjalmarsson <> writes:


    GH> This is one approach:

    GH> s{(\b\w+\b)}{
    GH> my $match = $1;
    GH> $match =~ /^(?:one|two|three)$/ ? $match : '';
    GH> }eg;

    or use a hash inside for better speed (untested):

    my %ignore_tags = map { $_ => 1 } qw( one two three ) ;

    s{(\b\w+\b)}{ $ignore_tags{$1} ? $1 : '' }eg;
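A runnable version of that hash lookup on an invented string. The hash is renamed %keep here because it holds the words to leave alone; that name is an illustrative choice, not part of the original post. Each word costs one hash lookup instead of a second regex match.

```perl
use strict;
use warnings;

# Words to leave alone; every other word is deleted.
my %keep = map { $_ => 1 } qw( one two three );

my $text = "one two three four five";    # invented sample input

# $1 is the current word; keep it only if it is a key of %keep.
(my $out = $text) =~ s{(\b\w+\b)}{ $keep{$1} ? $1 : '' }eg;

print "[$out]\n";    # [one two three  ]
```

The lookup is case-sensitive; lowercasing both the keys and $1 would make it case-insensitive.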

    adding in the <> stuff is left as an exercise to the reader. for that
    reason alone, a parser should be used. most html parser modules are easy
    to hack so they will filter out tags and rebuild the html text later.
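One way to do that exercise for tags, combining the hash lookup with the tag pattern Laurens used. The sample string is adapted from his, with one tag written as <BR> to show why the name goes through lc. %keep_tag is a hypothetical name, and this sketch inherits all the weaknesses of matching HTML with a regex.

```perl
use strict;
use warnings;

# Tag names that survive (lowercase); all other tags are deleted.
my %keep_tag = map { $_ => 1 } qw( br p );

my $t = " a <br /> b <p style='border: 1px red solid; '> c </p> d <hr> e <BR> f <hr /> g";

# $1 is the whole tag, $2 its name; lowercase the name before the lookup.
$t =~ s{(</?(\w+)(?: .+?)?>)}{ $keep_tag{ lc($2) } ? $1 : '' }eg;

print "$t\n";
```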

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Jun 7, 2007
    #5
  6. Uri Guttman Guest

    Re: Regular Expressions: "Negated Strings" instead of "Negated Character Classes"

    >>>>> "l" == lmeurs <> writes:

    l> And to be complete, here is the eventual solution to get rid of all
    l> HTML tags except, for example, BR and P:

    and to be really complete that will fail in many ways. html can only be
    fully parsed by a module and not by regexes. in some cases where you
    know or control the html you can mung it with regexes.
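To make "fail in many ways" concrete, here is one failure of the tag pattern used in this thread: a > inside a quoted attribute value, which HTML allows, ends the match early. The input and %keep_tag are invented for the example.

```perl
use strict;
use warnings;

my %keep_tag = map { $_ => 1 } qw( br p );

# Legal HTML: '>' appears inside quoted attribute values.
my $t = q{<p title="a > b">text</p><img alt="x > y" src="z.png">};

# The lazy .+? stops at the first '>', i.e. inside the attribute value.
$t =~ s{(</?(\w+)(?: .+?)?>)}{ $keep_tag{ lc($2) } ? $1 : '' }eg;

# Only '<img alt="x >' is removed; the rest of the img tag is left as text.
print "$t\n";
```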

    uri

    Uri Guttman, Jun 8, 2007
    #6
  7. On Jun 7, 10:58 pm, wrote:

    > But I'm still curious: is this the easiest way? Doesn't Perl's
    > regular expression engine provide something like what I suggested?


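    Yes, it does: negative lookahead.

    To remove any word but 'one', 'two' or 'three':

    s/\b(?!(?:one|two|three)\b)\w+//g;

    Note you have to say \b to constrain it to finding whole words;
    otherwise it would be perfectly within its rights to remove the
    'hree' from 'three'. The \b after the alternation does the same job
    at the other end: without it, a word that merely starts with one of
    them, such as 'onerous', would be kept.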
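A sketch contrasting the unanchored lookahead (?!one|two|three) with one followed by \b, plus the same construction applied to tags, which answers the original question without a callback. The sample strings are made up, and the tag version carries the same caveats about regexes and HTML.

```perl
use strict;
use warnings;

my $words = "one two three four onerous";    # invented sample input

# Unanchored lookahead: only the start of the word is checked,
# so 'onerous' survives because it begins with 'one'.
(my $loose = $words) =~ s/\b(?!one|two|three)\w+//g;

# With \b inside the lookahead the whole word must be in the list.
(my $strict = $words) =~ s/\b(?!(?:one|two|three)\b)\w+//g;

# Same idea for tags: delete any tag whose name is not exactly br or p.
my $html = "a <br /> b <p>c</p> d <hr> e <pre>f</pre>";
(my $tags = $html) =~ s{</?(?!(?:br|p)\b)\w+(?: .+?)?>}{}gi;

print "[$loose]\n[$strict]\n[$tags]\n";
```

Note that <pre> is deleted: the \b keeps the lookahead from treating it as a p tag.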
    Brian McCauley, Jun 8, 2007
    #7
