I am working on a SQL parser. I have a routine that recursively removes
enclosing parentheses and it works fine. Below is the regex that I use.
However, I want to use the same routine, but instead of looking for
enclosing parens, I want to look for a string enclosed by CASE and END. Can
someone help me translate the regex below so that it will match a CASE/END
construct?
Thanks very much.
Parens
----------
(?:\s+)?\([^\(\)]*\)
This is what I've managed so far with the CASE/END
(?:\s+)?case(?!case|end)\s+end
I've revisited this and became intrigued with the zero-width assertion
extended regexp constructs. These constructs don't get enough air-time
here. Since you appear to be leaning in that direction, I thought I would
flesh out a look-ahead regexp for your example, perhaps to try to glean insight into
the regexp engine, not really sure. It's very fascinating for me. I'm not a big book
reader since I am dyslexic, so I try to discover things on my own.
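For anyone unfamiliar with zero-width look-ahead, here is a minimal sketch of why the attempt in the question cannot match anything with text between the keywords. The variable names ($sample, $attempt, $tempered) are made up for the illustration; the two patterns are the ones quoted in this thread.

```perl
use strict;
use warnings;

my $sample = "x case 1 end y";

# The look-ahead checks the characters that follow but consumes none of them,
# so this pattern only allows whitespace between "case" and "end".
my $attempt = qr/(?:\s+)?case(?!case|end)\s+end/i;

# Pairing the look-ahead with "." consumes one character per check, so the
# body can grow until a whitespace-delimited "case" is about to follow.
my $tempered = qr/(?:\s+|^)case((?:.(?!\scase\s))*?)\s+end(\s+|$)/is;

print "attempt:  ", ($sample =~ $attempt  ? "match" : "no match"), "\n";
print "tempered: ", ($sample =~ $tempered ? "match, body='$1'" : "no match"), "\n";
```

The first pattern reports no match on this sample, while the second one captures the body between the keywords.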
The code below tackles your problem from the perspective of a file slurped
into a variable, which is then processed. All relevant delimiters are taken into account;
my other penchant is for parsing. It is possible to buffer the file line by line
until we have just enough to parse. I didn't do it, of course, but it is fairly easy.
This would avoid sucking up huge amounts of memory, and is fairly trivial once the
master regexp is known.
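A rough sketch of what that line-by-line buffering could look like, not something I have tested on real SQL. The helper names strip_case_end and strip_stream are invented here, and counting whitespace-delimited keywords to decide when the buffer is "enough" is my assumption. Whitespace at chunk boundaries can come out slightly different from the slurped version, because the regexp no longer sees the newline before a chunk.

```perl
use strict;
use warnings;

# Hypothetical helper: strip CASE/END pairs from one complete chunk,
# using the same substitution as the script at the bottom of this post.
sub strip_case_end {
    my ($chunk) = @_;
    1 while $chunk =~ s/(?:\s+|^)case((?:.(?!\scase\s))*?)\s+end(\s+|$)/$1$2/is;
    return $chunk;
}

# Hypothetical helper: read $in line by line and flush the buffer to $out
# whenever the CASE and END keywords seen so far balance out.
sub strip_stream {
    my ($in, $out) = @_;
    my ($buffer, $depth) = ('', 0);
    while (my $line = <$in>) {
        $buffer .= $line;
        # Count whitespace-delimited keywords only, so words such as
        # "fricases" and "friends" do not disturb the depth.
        my $opens  = () = $line =~ /(?<!\S)case(?!\S)/gi;
        my $closes = () = $line =~ /(?<!\S)end(?!\S)/gi;
        $depth += $opens - $closes;
        if ($depth <= 0) {
            print {$out} strip_case_end($buffer);
            ($buffer, $depth) = ('', 0);
        }
    }
    # Flush whatever is left, even if it never balanced.
    print {$out} strip_case_end($buffer) if length $buffer;
    return;
}

# Usage with an in-memory file handle standing in for a real file.
my $sample = "a\ncase\n1 case end\nend\nb\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
strip_stream($in, \*STDOUT);
close $in;
```

Only one balanced chunk is held in memory at a time, so the cost depends on the largest CASE/END construct rather than on the file size.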
I've learned some stuff about the regexp engine's extended operations, but I won't go into it here.
I decided to include the progression of guesses that went into settling on the final form.
This form takes into account several delimiting factors as well as look-ahead.
It's not fully tested, of course, but it has reached an initial alpha form that could be
presented to testers.
As it is now, CASE/END are the targets; however, any pair of keywords can be substituted.
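To show what substituting other targets might look like, here is a small sketch that builds the same regexp from two keyword parameters. The helper name strip_pairs and the begin/finish keywords are just examples I made up, and I am assuming the keywords are plain words delimited by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: remove nested OPEN ... CLOSE keyword pairs, with the
# two keywords passed in instead of being hard-coded as CASE and END.
sub strip_pairs {
    my ($txt, $open, $close) = @_;
    # quotemeta keeps any regexp metacharacters in the keywords literal.
    my ($o, $c) = (quotemeta $open, quotemeta $close);
    my $re = qr/(?:\s+|^)${o}((?:.(?!\s${o}\s))*?)\s+${c}(\s+|$)/is;
    # Each pass strips one pair; stop when nothing matches any more.
    1 while $txt =~ s/$re/$1$2/;
    return $txt;
}

print strip_pairs("x begin 1 begin 2 finish finish y", 'begin', 'finish'), "\n";
```

Calling strip_pairs($txt, 'case', 'end') should behave like the loop in the script at the bottom of this post.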
Should you like to employ me for extended projects, set up a contact arrangement.
Note the code is at the bottom, the output is at the top, in true dyslexic fashion.
Particularly note in the output how the matching goes from inner to outer. This is key.
sln
__OUTPUT__
c:\temp>perl misc9.pl
<<<<<<<<<<< Phase1 >>>>>>>>>>>
$1= --------
' case'
$txt= --------
'
case
1 case end
2 case case end end
fricases can erupt even among friends
end'
<<<<<<<<<<< Phase2 >>>>>>>>>>>
$1= --------
''
$txt= --------
'
case
1
2 case case end end
fricases can erupt even among friends
end'
<<<<<<<<<<< Phase3 >>>>>>>>>>>
$1= --------
''
$txt= --------
'
case
1
2 case end
fricases can erupt even among friends
end'
<<<<<<<<<<< Phase4 >>>>>>>>>>>
$1= --------
''
$txt= --------
'
case
1
2
fricases can erupt even among friends
end'
<<<<<<<<<<< Phase5 >>>>>>>>>>>
$1= --------
'
1
2
fricases can erupt even among friends'
$txt= --------
'
1
2
fricases can erupt even among friends'
************************
FINAL:
'
1
2
fricases can erupt even among friends'
c:\temp>
__CODE__
use strict;
use warnings;
my $txt = join '', <DATA>;
{
# while ($txt =~ s/(?:\s+|^)case(?=\s)(.*)(?!case)(?<=\s)end(?:\s+|$)/$1/is) {} <- sick
# while ($txt =~ s/(?:\s+)case(?=\s)(.*)(?!case)(?<=\s)end(?:\s+)/$1/is) { print "--------\n'$1'\n"} <- disgusting
# while ($txt =~ s/(?:\s+)case(?=\s)(.(?!case)*?)(?<=\s)end(?:\s+)/$1/is) { print "--------\n'$1'\n"} <- putrid
# while ($txt =~ s/(?:\s+)case(?=\s)((?<!case).*?)(?<=\s)end(?:\s+)/$1/is) { print "--------\n'$1'\n"} <- DOA
# while ($txt =~ s/\s+case\s+(.*(?!case))\s+end\s+/ $1 /is) <- what's this?
# while ($txt =~ s/\s+case\s+((.(?!case))*?)end\s+/ $1 /is) <- almost
# while ($txt =~ s/\s+case\s+((.(?!\scase\s))*?)\s+end\s+/ $1 /is) <- better
# while ($txt =~ s/\s+case((.(?!\scase\s))*?)\s+end\s+/ $1 /is) <- more better
# while ($txt =~ s/\s+case((.(?!\scase\s))*?)\s+end\s+/ $1/is) <- hmmm
# while ($txt =~ s/\s+case((.(?!\scase\s))*?)\s+end(\s+)/ $1 /is) <- confused
# while ($txt =~ s/\s+case((?:.(?!\scase\s))*?)\s+end(\s+)/$1$2/is) <- approaching excellence
# while ($txt =~ s/\s+case((?:.(?!\scase\s))*?)\s+end(\s+)/$1$2/is) <- excellence
# while ($txt =~ s/(?:\s+|^)case((?:.(?!\scase\s))*?)\s+end(\s+|$)/$1$2/is) <- PRIMO !!!!
  my $cntr = 1;
  # Each pass strips one CASE/END pair; the loop ends when no pair is left.
  while ($txt =~ s/(?:\s+|^)case((?:.(?!\scase\s))*?)\s+end(\s+|$)/$1$2/is) # <- Production Regex, Ship to QA
  {
    print "\n<<<<<<<<<<< Phase".$cntr++." >>>>>>>>>>>\n";
    print "\$1= --------\n'$1'\n";
    print "\$txt= --------\n'$txt'\n";
  }
  print "\n\n************************\n FINAL:\n'$txt'\n";
}
__DATA__
case
1 case case end end
2 case case end end
fricases can erupt even among friends
end