Regexp Unicode property names strange behavior?

Ammar Ali

[Note: parts of this message were removed to make it a legal post.]

On 1.9.2, I'm seeing an "invalid character property name" error from Regexp
for the named properties Any, Ascii, and Xdigit, but none of the others. If
I add the u option to the expression, it works.


# All good, with or without u option
[ 'Alnum', 'Alpha', 'Blank', 'Cntrl', 'Digit', 'Graph', 'Lower',
'Print', 'Punct', 'Space', 'Upper', 'Word'
].each {|name| puts /\p{#{name}}/ }


# Errors raised without u option
['Any', 'Ascii', 'Xdigit'].each {|name| puts /\p{#{name}}/ }


# Now it's good
['Any', 'Ascii', 'Xdigit'].each {|name| puts /\p{#{name}}/u }
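A note on the second loop: an uncaught RegexpError ends the each at the first bad name, so that run only ever reports Any. Below is a minimal sketch that checks every name on its own and collects the outcome instead. It uses only core Ruby. The helper name property_status is hypothetical, not something Ruby provides.

```ruby
# Hypothetical helper (not part of Ruby): compile \p{name} for each name and
# record either the compiled Regexp or the RegexpError message, so one bad
# name does not end the loop. Interpolated regexp literals are compiled at
# run time, which is why the error can be rescued here.
def property_status(names, unicode = false)
  names.each_with_object({}) do |name, results|
    begin
      results[name] = unicode ? /\p{#{name}}/u : /\p{#{name}}/
    rescue RegexpError => e
      results[name] = e.message
    end
  end
end

# Compare the same names without and with the u option.
names = %w[Any Ascii Xdigit]
p property_status(names)
p property_status(names, true)
```

What this prints depends on the Ruby version, so no output is shown here. The point is that you see every name's result in one run.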


I expected that all the names would either require the u option, or they
wouldn't. If it was just Any and Ascii, I would accept it and move on, but
Xdigit doesn't seem to belong with the other two.

I'm trying to understand why Any, Ascii, and Xdigit are "special". Any clues
greatly appreciated.


Thanks,
Ammar

Ammar Ali

It was bad documentation! Xdigit should be XDigit, and Ascii should be
ASCII. Any requires an encoding to be specified (hence the u option), which
makes sense.
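Putting the corrected spellings together gives the sketch below. It relies only on what this thread reports: XDigit and ASCII work as written, and Any needed the u option on 1.9.2. The constant names and sample strings are illustrative.

```ruby
# Corrected property names, per the thread: XDigit (not Xdigit), ASCII (not Ascii).
HEX_ONLY   = /\A\p{XDigit}+\z/
ASCII_ONLY = /\A\p{ASCII}*\z/
# On 1.9.2, \p{Any} only compiled once an encoding was given via the u option.
ANY_CHAR   = /\p{Any}/u

p(HEX_ONLY =~ "deadBEEF")          # => 0   (whole string is hex digits)
p(HEX_ONLY =~ "0xFF")              # => nil ('x' is not a hex digit)
p(ASCII_ONLY =~ "plain text")      # => 0
p(ASCII_ONLY =~ "caf\u00e9")       # => nil (e-acute is outside ASCII)
p(ANY_CHAR =~ "\u65e5\u672c")      # => 0   (Any matches any character)
```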

Sorry about the noise.
Ammar

Ammar Ali

> dude. help out. where?


If I understood your question correctly, then the erroneous docs are at:
http://ruby.runpaint.org/regexps#properties

I submitted an issue against it, with reference to source code, at:
http://github.com/runpaint/read-ruby/issues/issue/68

It has been very difficult finding detailed information about many of the
1.9 regular expression features. Read Ruby has the most coverage I have
found so far.

Regards,
Ammar