weston
Are there any known issues with Perl under Cygwin? I've just
encountered a circumstance where a script (in particular, some regular
expressions within it) is behaving oddly -- but only under Cygwin.
The script essentially reads an entire HTML file into a string and
then attempts to bring any <span> tags that are broken across multiple
lines onto a single line, using this substitution:
s/(<span.*)\n+([^>]*>)/$1\ $2/mig;
This appears to work under ActiveState Perl 5.8.4 and under Perl 5.8.6 on
OpenBSD. When I operate on this text:
<span lang=JA style='font-family:
"MS Mincho"'>(</span>
I get this result:
<span lang=JA style='font-family: "MS Mincho"'>(</span>
Under Cygwin (Perl 5.8.5), however, I get this result:
"MS Mincho"'>(</span>
i.e., it appears to be throwing away the first backreference.
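For anyone who wants to try this without the full script, a minimal sketch of the substitution on the sample text might look like the following. It assumes the input uses plain "\n" line endings, and the variable names ($html, $joined) are made up for illustration; the real script reads the text from a file.

```perl
use strict;
use warnings;

# Sample text from above, assumed here to use plain "\n" line endings.
my $html = "<span lang=JA style='font-family:\n\"MS Mincho\"'>(</span>\n";

# Apply the same substitution to a copy of the string.
(my $joined = $html) =~ s/(<span.*)\n+([^>]*>)/$1\ $2/mig;

print $joined;
# prints: <span lang=JA style='font-family: "MS Mincho"'>(</span>
```

With this input the broken tag is joined onto one line, which matches what I see on the non-Cygwin platforms.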
So, I decided to see what the backreferences looked like, with the
following code:
print "\n 0: [$0]";
print "\n 1: [$1]";
print "\n 2: [$2]";
print "\n 3: [$3]";
print "\n ";
print "\n ".$1.$2;
print "\n";
print "\n".$_;
print "\nEnd";
Under Cygwin, this yields some interesting output:
0: [./stripWord.pl]
]1: [<span lang=JA style='font-family:
2: ["MS Mincho"'>]
3: []
"MS Mincho"'>ont-family:
"MS Mincho"'>(</span>
End
Note that I have not made a mistake in placing the ] for backreference
1 at the beginning of the line -- that's how the output was given.
Note also that the concatenation of $1 and $2 is not correct.
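Output like this is what a terminal shows when a string contains a carriage return, so one thing worth checking is whether $1 carries an invisible character. The sketch below is only an illustration of that check: the CRLF line endings in the input are an assumption (nothing above confirms that the file has them), and visible() is a hypothetical helper, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: render CR and LF as visible escape sequences.
sub visible {
    my ($s) = @_;
    return '' unless defined $s;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

# ASSUMPTION: the sample text with CRLF ("\r\n") line endings.
my $html = "<span lang=JA style='font-family:\r\n\"MS Mincho\"'>(</span>\r\n";

# Same pattern as the substitution; copy the captures into lexicals.
my ($first, $second) = $html =~ /(<span.*)\n+([^>]*>)/mi;

print " 1: [", visible($first),  "]\n";
print " 2: [", visible($second), "]\n";
```

If the first capture ends in \r when printed this way, the terminal would move the cursor back to the start of the line before printing the closing ], which would look like the output above.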
Under the other platforms, the output is as expected:
0: [D:\HTMLCleaners\stripWord.pl]
1: [<span lang=JA style='font-family:]
2: ["MS Mincho"'>]
3: []
<span lang=JA style='font-family:"MS Mincho"'>
<span lang=JA style='font-family: "MS Mincho"'>(</span>
End
Has anyone ever seen anything like this?
I'm not sure how to get Cygwin version information, but from the
prompt, 'uname -a' yields:
CYGWIN_NT-5.1 Hermes 1.5.12(0.116/4/2) 2004-11-10 08:34 i686 unknown
unknown Cygwin
And the entire script I've got is available at:
http://weston.canncentral.org/misc/procword/regexProblem.txt
And the sample HTML is also available at:
http://weston.canncentral.org/misc/procword/example.txt