Sergei Olonichev
Hello,
I have found a problem with the definition of character classes in Unicode
regular expressions. It seems \s isn't defined properly.
The following simple program ought to change whitespace characters
into line feeds:
cat test.utf8 | ruby-1.8.0 -Ku -ne '$_.gsub!(/[\s]+/u,"\n"); puts $_;'
test.utf8 contains the following bytes (in hex):
C2 A0 32 33 20 31 0A C2 A0 32 34 20 31 0A
which is the UTF-8 encoding of:
00A0 NS no-break space
0032 2 digit two
0033 3 digit three
0020 SP space
0031 1 digit one
000A LF line feed (lf)
00A0 NS no-break space
0032 2 digit two
0034 4 digit four
0020 SP space
0031 1 digit one
000A LF line feed (lf)
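The mapping from bytes to code points above can be double-checked with a short script. This is only a sketch and assumes Ruby 1.9 or later (for force_encoding and codepoints), not the 1.8.0 interpreter used in the command above; the variable names are illustrative.

```ruby
# Bytes from the hex dump of test.utf8.
hex = %w[C2 A0 32 33 20 31 0A C2 A0 32 34 20 31 0A]

# Pack the byte values into a string and mark it as UTF-8.
text = hex.map { |h| h.to_i(16) }.pack("C*").force_encoding("UTF-8")

# List each code point as a four-digit hex number.
code_points = text.codepoints.map { |cp| format("%04X", cp) }
puts code_points.join(" ")
```

The two-byte sequence C2 A0 decodes to the single code point 00A0, so the 14 bytes give 12 characters.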
But Ruby leaves the no-break space unchanged (it is not turned into a
line feed)!
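For comparison, here is a minimal sketch of the same substitution as a standalone script. It assumes Ruby 1.9 or later, where \s is documented as matching only ASCII whitespace and the POSIX bracket class [[:space:]] also covers Unicode whitespace in UTF-8 strings; whether 1.8.0 is meant to behave the same way is exactly what I am asking.

```ruby
# Same content as test.utf8, written with escapes (U+00A0 is no-break space).
text = "\u00A023 1\n\u00A024 1\n"

# \s covers only ASCII whitespace, so U+00A0 is left alone.
ascii_only = text.gsub(/\s+/, "\n")

# [[:space:]] also matches U+00A0 in a UTF-8 string.
unicode_aware = text.gsub(/[[:space:]]+/, "\n")

p ascii_only    # the no-break spaces are still present
p unicode_aware # every whitespace run has become a single line feed
```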
Is that a bug?
Best wishes,
Sergei