robic0
I don't see a solution to this problem: regular
expressions can't exclude a string when matching.
They can exclude individual characters fine.
I started doing Perl 2 years ago and have
run into this nagging problem several times.
After extensive reading of the Perl docs on regexes
(especially in the last 2 days) I have come to the
conclusion that regular expressions have a serious
deficiency. It is serious because "not string"
is a fundamental piece of logic that a touted
master search engine should support.
To a degree it works with a known subset of characters,
but it won't work to the degree shown below. This is a
serious flaw in regular expressions!
I hope you masters can prove me wrong! I really do.
If not, I would hope that the Perl authors can provide
some insight on when this construct can be fixed,
aka implemented.
Beat this code if you can (you can't). Don't look
at the code in this example; look instead at the
output.
Don't comment on any code syntax because that's not
welcome or the point.
Instead, refer your comments to the output IDs.
If you know of a way Perl regex can do this,
please reply. I'm almost 99% sure Perl regex
can't do this. In fact the 1% is thrown out here
to either verify that or prove otherwise.
Thanks for your help...
print <<EOM;
\n# Serious Regular Expression deficiency,
# "not string", shown by XML comments..
# ----------------------------------------
EOM
use strict;
use warnings;
my $gabage1 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';
my $gabage2 = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';
my @sarrys = ($gabage1, $gabage2);
my $cnt = 1;
foreach my $xml (@sarrys) {
print "\n\n","/"x40,"\nXML $cnt:\n$xml\n";
# -------------
$_ = $xml;
print "="x40,
"\n** regex: s/<!--(.*)-->//s\n",
"-"x40,"\n";
print "id: $cnt","1\n";
while (s/<!--(.*)-->//s) { print "$1\n"; }
# -------------
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--([^<>]*)-->//s\n",
"-"x40,"\n";
print "id: $cnt","2\n";
while (s/<!--([^<>]*)-->//s) { print "$1\n"; }
# -------------
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--([\\w\\s]*)(?!<!--)-->//s\n",
"-"x40,"\n";
print "id: $cnt","3\n";
while (s/<!--([\w\s]*)(?!<!--)-->//s) { print "$1\n"; }
# -------------
$_ = $xml;
print "\n","="x40,
"\n** regex: s/<!--(.*)(?!<!--)-->//s\n",
"-"x40,"\n";
print "id: $cnt","4\n";
while (s/<!--(.*)(?!<!--)-->//s) { print "$1\n"; }
$cnt++;
}
__END__
C:\Drvs14\PerlMiscTest\Eraser\ESP\XMLP>perl test.pl
# Serious Regular Expression deficiency,
# "not string", shown by XML comments..
# ----------------------------------------
////////////////////////////////////////
XML 1:
<big name="asdf" date="33" >
asdf
<!-- howdy folks -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
========================================
** regex: s/<!--(.*)-->//s
----------------------------------------
id: 11
howdy folks -->
<in2>jjjj</in2>
<!-- and still more
========================================
** regex: s/<!--([^<>]*)-->//s
----------------------------------------
id: 12
howdy folks
and still more
========================================
** regex: s/<!--([\w\s]*)(?!<!--)-->//s
----------------------------------------
id: 13
howdy folks
and still more
========================================
** regex: s/<!--(.*)(?!<!--)-->//s
----------------------------------------
id: 14
howdy folks -->
<in2>jjjj</in2>
<!-- and still more
////////////////////////////////////////
XML 2:
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
========================================
** regex: s/<!--(.*)-->//s
----------------------------------------
id: 21
howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more
========================================
** regex: s/<!--([^<>]*)-->//s
----------------------------------------
id: 22
and still more
========================================
** regex: s/<!--([\w\s]*)(?!<!--)-->//s
----------------------------------------
id: 23
and still more
========================================
** regex: s/<!--(.*)(?!<!--)-->//s
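
For comparison with the four attempts above, here is a minimal sketch of two standard Perl regex constructs that are commonly used for a "not string" match. It assumes the goal is to capture each comment body separately, even when the body contains `<` or `>`. The variable names (`@lazy`, `@tempered`, `$stripped`) are made up for this illustration and are not from the original script. The first pattern uses a non-greedy quantifier. The second puts the negative lookahead inside the repeated group, so it is checked at every character rather than once before the closing `-->`, which appears to be why IDs 3 and 4 behave like IDs 2 and 1.

```perl
use strict;
use warnings;

# Second sample from the post: the first comment body contains '<' and '>'.
my $xml = '
<big name="asdf" date="33" >
asdf
<!-- howdy folks %SYSTEM is down <who cares?> -->
<in2>jjjj</in2>
<!-- and still more -->
asdfb
</big>
';

# Option 1: non-greedy quantifier. (.*?) expands one character at a
# time and stops at the first "-->" it can reach.
my @lazy = $xml =~ /<!--(.*?)-->/sg;

# Option 2: "not string" spelled out. The lookahead (?!-->) is tested
# before every single character, so the group can never run past "-->".
my @tempered = $xml =~ /<!--((?:(?!-->).)*)-->/sg;

print "lazy:     [$_]\n" for @lazy;
print "tempered: [$_]\n" for @tempered;

# The same pattern works for stripping every comment in one pass.
(my $stripped = $xml) =~ s/<!--(?:(?!-->).)*-->//sg;
print $stripped;
```

Both lists should hold two entries, one per comment. The second form generalizes to any excluded string: replace `-->` inside the lookahead with the string that must not appear.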