problem with \s in unicoded regular expressions

Sergei Olonichev · Oct 27, 2003

Hello,

I have found a problem with the character class definitions in Unicode
regular expressions. It seems \s isn't defined properly.

See the following simple program, which ought to replace each run of
whitespace characters with a line feed:
cat test.utf8 | ruby-1.8.0 -Ku -ne '$_.gsub(/[\s]+/u,"\n"); puts $_;'

test.utf8 contains the following in hex:
C2 A0 32 33 20 31 0A C2 A0 32 34 20 31 0A

which is the UTF-8 encoding of:
00A0 NS no-break space
0032 2 digit two
0033 3 digit three
0020 SP space
0031 1 digit one
000A LF line feed (lf)
00A0 NS no-break space
0032 2 digit two
0034 4 digit four
0020 SP space
0031 1 digit one
000A LF line feed (lf)

But Ruby does not make any changes (does not change "no-break space"
into "line feed")!
Is that a bug?

Best wishes,
Sergei

Simon Strandgaard · Oct 27, 2003

Sergei Olonichev said:
But Ruby does not make any changes (does not change "no-break space"
into "line feed")!
Is that a bug?

No.

server> ruby u.rb
"+)!! !\036+)!! !\036"
"+)!!\n!\036+)!!\n!\036"
server> cat u.rb
input = %w(C2 A0 32 33 20 31 0A C2 A0 32 34 20 31 0A)
str = input.map{|i| i.unpack('H2')[0].to_i.chr}.join
p str
p str.gsub(/[\s]+/u,"\n")
server>

I see no problem with \s in the regexp.

Simon Strandgaard · Oct 27, 2003

Hmm, there is something wrong with my code. Sorry, I was too quick.
My hex-to-UTF-8 conversion is buggy. Does anyone know a smarter way to
do this?
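
One possible answer, offered as a sketch rather than as what anyone in the
thread actually used: in u.rb, i.unpack('H2') returns the hex code of the
first character of the token (for "C2" that is "43", the code of "C"), and
to_i then reads it as decimal, which is why the output starts with "+" (43).
String#hex and Array#pack avoid that detour. The variable names below are
made up for illustration.

```ruby
# Hex byte values from the original post (contents of test.utf8).
hex = %w(C2 A0 32 33 20 31 0A C2 A0 32 34 20 31 0A)

# String#hex parses each two-digit token as a base-16 number;
# Array#pack('C*') turns the resulting integers into raw bytes.
str = hex.map { |h| h.hex }.pack('C*')

# Equivalent one-step form: pack the concatenated hex digits directly.
same = [hex.join].pack('H*')

# Decoding the bytes as UTF-8 yields the code points listed in the post.
codepoints = str.unpack('U*')
p codepoints
```

With the bytes decoded correctly, the first code point is 0xA0 (no-break
space), so the test string really does contain the character in question.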

Sergei Olonichev · Oct 28, 2003

Sergei said:
Hello,

I have found a problem with the character class definitions in Unicode
regular expressions. It seems \s isn't defined properly.

See the following simple program, which ought to replace each run of
whitespace characters with a line feed:
cat test.utf8 | ruby-1.8.0 -Ku -ne '$_.gsub(/[\s]+/u,"\n"); puts $_;'

I left out the "!" after "gsub" in this message, but it was there when I
tested the problem, so adding "!" does not help.
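
For readers on current Ruby versions, the behaviour described here can be
reproduced and worked around. The sketch below assumes Ruby 1.9 or later,
which goes beyond the 1.8.0 interpreter discussed in this thread. There, \s is
documented as matching only the ASCII whitespace characters, so U+00A0 is left
alone, while the POSIX bracket expression [[:space:]] also covers Unicode
whitespace. The variable names are illustrative.

```ruby
# Assumes Ruby 1.9 or later, where strings carry an encoding.
line = "\u00A023 1\n"   # NO-BREAK SPACE, "23", SPACE, "1", LINE FEED

# \s covers only ASCII whitespace, so U+00A0 is left untouched.
ascii_only = line.gsub(/\s+/, "\n")

# [[:space:]] also matches Unicode whitespace such as U+00A0.
unicode_aware = line.gsub(/[[:space:]]+/, "\n")

p ascii_only
p unicode_aware
```

Whether the no-break space should count as whitespace at all depends on the
application, so choosing [[:space:]] over \s is a design decision rather than
a bug fix.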
