Regular Expressions: "Negated Strings" instead of "Negated Character Classes"

lmeurs

Dear all,

I'm sure the subject sounds more complicated than the actual matter.
Let me explain my problem. With Perl regular expressions one can
define character classes and negated character classes:

s/[abc]//g      The letters a, b and c will be removed from a string.
s/[^abc]//g     Now every character *but* a, b and c will be removed from a string.
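A quick runnable check of the two substitutions above, on an invented sample string. It also shows that `[^abc]` matches any character other than a, b and c, so spaces and punctuation disappear too, not just letters.

```perl
use strict;
use warnings;

my $sample = "abc def cab!";   # hypothetical test string

# [abc] matches any one of a, b, c; /g removes every occurrence.
(my $without_abc = $sample) =~ s/[abc]//g;

# [^abc] matches any single character that is NOT a, b or c,
# including spaces and punctuation.
(my $only_abc = $sample) =~ s/[^abc]//g;

print "'$without_abc'\n";   # prints ' def !'
print "'$only_abc'\n";      # prints 'abccab'
```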

One can also do the first with whole substrings, like this:

s/(one|two|three)//g    The substrings 'one', 'two' and 'three' will be removed from a string.

But how can I turn this around, just like I did with character
classes? What I'm looking for would look something like this:

s/(^one|two|three)//g or s/!(one|two|three)//g

Why? I am trying to get rid of all HTML-tags *but* break-, paragraph-
and divider-tags.

s/<\/?(br|p|div)( .+?)?>//ig    This would remove the break-, paragraph- and divider-tags from a string.
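A small runnable check of the substitution above, on a made-up sample string. It removes exactly the tags the poster wants to keep and leaves `<hr>` alone, which is the behaviour that needs inverting.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $html = "a <br> b <p class='x'> c </p> d <div> e </div> f <hr> g";

# Remove opening and closing br, p and div tags (attributes allowed).
(my $stripped = $html) =~ s/<\/?(br|p|div)( .+?)?>//ig;

print "$stripped\n";   # prints "a  b  c  d  e  f <hr> g"
```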

How can I invert this regular expression? Any help would be really
appreciated!

Thanks a lot in advance,

Laurens Meurs
Rotterdam, the Netherlands
 
Gunnar Hjalmarsson

> But how can I turn this around, just like I did with character
> classes? What I'm looking for would look something like this:
>
> s/(^one|two|three)//g or s/!(one|two|three)//g

This is one approach:

s{(\b\w+\b)}{
    my $match = $1;
    $match =~ /^(?:one|two|three)$/ ? $match : '';
}eg;
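A self-contained version of this approach, run on an invented sample string. The copy into `$match` matters: a successful match inside the replacement code sets new capture variables, so `$1` cannot safely be reused after it.

```perl
use strict;
use warnings;

my $text = "one four two five three";   # hypothetical input

# Match each whole word, then let Perl code (/e) decide whether to keep it.
# $1 is copied first because a successful inner match replaces the
# capture variables.
(my $kept = $text) =~ s{(\b\w+\b)}{
    my $match = $1;
    $match =~ /^(?:one|two|three)$/ ? $match : '';
}eg;

print "$kept\n";   # prints "one  two  three"
```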
 
lmeurs

Dear Gunnar,

Thanks for the quick reply, and it worked!

I was looking so hard for a pure-regex way to do the trick that I
couldn't think of a clever workaround like this myself...

But I'm still curious: is this the easiest way? Doesn't Perl's
regular expression engine provide something like what I suggested?

Thanks again, a lot!

Laurens
 
lmeurs

And to be complete, here is the eventual solution to get rid of all
HTML tags except, for example, BR and P:

my $t = " a <br /> b <p style='border: 1px red solid; '> c </p> d <hr> e <br> f <hr /> g";
$t =~ s#(</?(\w+)(?: .+?)?>)#
    my $t1 = $1;
    my $t2 = $2;
    $t2 =~ /^(?:br|p)$/i ? $t1 : "";
#eg;

This removes both the <hr> and the <hr /> tags from the original
string; the new value is:

" a <br /> b <p style='border: 1px red solid; '> c </p> d  e <br> f  g"

Regards!
 
Uri Guttman

GH> This is one approach:

GH> s{(\b\w+\b)}{
GH> my $match = $1;
GH> $match =~ /^(?:one|two|three)$/ ? $match : '';
GH> }eg;

or use a hash lookup inside the replacement for better speed (untested):

my %ignore_tags = map { $_ => 1 } qw( one two three ) ;

s{(\b\w+\b)}{ $ignore_tags{$1} ? $1 : '' }eg;

adding in the <> stuff is left as an exercise to the reader. for that
reason alone, a parser should be used. most html parser modules are easy
to hack so that they filter out tags and then rebuild the html text.
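One possible answer to the exercise, sketched under assumptions: it reuses the tag regex from lmeurs's post, looks the lower-cased tag name up in a hash (here called `%keep_tags`, a hypothetical name), and runs on an invented sample. It inherits every weakness of regex-based HTML handling.

```perl
use strict;
use warnings;

# Tags to keep (hypothetical list); every other tag is removed.
# Names are compared in lower case.
my %keep_tags = map { $_ => 1 } qw( br p );

my $html = "a <BR> b <p class='x'> c </p> d <hr /> e <img src='x.png'> f";

# $1 is the whole tag, $2 the tag name. No regex runs inside the
# replacement code, so $1 and $2 can be used directly.
(my $out = $html) =~ s{(</?(\w+)(?: .+?)?>)}{ $keep_tags{ lc $2 } ? $1 : '' }eg;

print "$out\n";   # prints "a <BR> b <p class='x'> c </p> d  e  f"
```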

uri
 
Uri Guttman

l> And to be complete, the eventual solution to get rid of all HTML-tags,
l> except for for example BR and P:

and to be really complete that will fail in many ways. html can only be
fully parsed by a module and not by regexes. in some cases where you
know or control the html you can mung it with regexes.
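One concrete way the regex solution breaks, shown on an invented input: a `>` inside a quoted attribute value is legal HTML, but the lazy `.+?` stops at the first `>`, so only part of the tag is removed and a fragment is left in the text.

```perl
use strict;
use warnings;

# Legal HTML: the attribute value contains a '>'.
my $html = 'x <hr title="a>b"> y';

# Same substitution as in lmeurs's post (keep br and p, drop the rest).
(my $out = $html) =~ s#(</?(\w+)(?: .+?)?>)#
    my $tag  = $1;
    my $name = $2;
    $name =~ /^(?:br|p)$/i ? $tag : "";
#eg;

print "$out\n";   # prints 'x b"> y', a leftover piece of the hr tag
```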

uri
 
Brian McCauley

> But still being curious: is this the easiest way? Doesn't Perls
> Regular Expression engine provide something like I suggested?

Yes, it does: negative lookahead.

To remove any word but 'one' 'two' or 'three'...

s/\b(?!one|two|three)\w+//g;

Note you have to say \b to constrain it to finding whole words;
otherwise it would be perfectly within its rights to remove the
'hree' from 'three'.
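A sketch comparing the one-liner above with a variant that also puts `\b` inside the lookahead, so a word that merely starts with 'one' (such as 'onerous') is no longer kept. It then applies the same negative-lookahead idea to the tag-stripping problem from the original question. The sample strings are invented, and the tag version has the same limits as any regex-based HTML handling.

```perl
use strict;
use warnings;

my $words = "one four two onerous three";   # hypothetical input

# The lookahead only checks how the word starts, so 'onerous' survives.
(my $loose = $words) =~ s/\b(?!one|two|three)\w+//g;

# \b inside the lookahead makes it compare whole words only.
(my $strict = $words) =~ s/\b(?!(?:one|two|three)\b)\w+//g;

# Inverted tag regex: remove every tag whose name is not br or p.
# The \b stops e.g. <pre> from being kept just because it starts with p.
my $html = " a <br /> b <p style='border: 1px red solid; '> c </p> d <hr> e <br> f <hr /> g";
(my $tags = $html) =~ s{<(?!/?(?:br|p)\b)[^>]*>}{}gi;

print "$loose\n";    # prints "one  two onerous three"
print "$strict\n";   # prints "one  two  three"
print "$tags\n";     # both hr tags removed, br and p tags kept
```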
 
