ruby-dev summary 20489 - 20519

TAKAHASHI Masayoshi

Hello all,

I'm sorry to post so late. This is a summary of last week's
ruby-dev ML.


[ruby-dev:20491] [Oniguruma] explicit capture
[ruby-dev:20514] [Oniguruma] Version 1.9.1

Recently, the Japanese translation of ''Mastering Regular Expressions''
2nd ed. was published. Kosako, the author of Oniguruma, read it and
learned of the ExplicitCapture option in .NET, which disables capturing
for every group except named groups. So Kosako added the option
REG_OPTION_CAPTURE_ONLY_NAMED_GROUP and the notation (?n:....)
in Oniguruma 1.9.1.

But Tanaka Akira pointed out that Ruby already uses the /n option,
and proposed using /c instead of /n. Kosako agreed with
Tanaka's idea.
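
As a rough illustration of what "capture only named groups" means, the
sketch below uses the named-group syntax of present-day Ruby, where
unnamed groups stop capturing once a pattern contains a named group.
This is an assumption about current Ruby behaviour, not the (?n:....)
notation or the option discussed in this thread, and the pattern and
variable names are made up for the example.

```ruby
# Plain groups: every pair of parentheses captures.
plain = /(\d+)-(\d+)/.match("2003-06")
plain.captures   #=> ["2003", "06"]

# Once a named group is present, the unnamed (\d+) no longer captures.
named = /(?<year>\d+)-(\d+)/.match("2003-06")
named.captures   #=> ["2003"]
named[:year]     #=> "2003"
```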


[ruby-dev:20495] matching with invalid byte sequence

Kazuhiro NISHIYAMA pointed out that /./u matches an invalid
byte sequence in UTF-8.

  require 'uconv'
  if /./u =~ "\xa3"
    # $& holds the matched text, here the single byte 0xA3
    Uconv.u8toeuc($&) #=> illegal UTF-8 sequence (a3) (Uconv::Error)
  end

But '/./s =~ "\xF1"' and '/./e =~ "\xF6"' don't match.
So he suggested that /./ should match exactly one *character*,
even when $KCODE is UTF-8.

Nobu answered that Ruby's regexp doesn't check whether a multi-byte
character sequence is valid or not, at least in current Ruby.
The reason /./s and /./e don't match "\xF1" and "\xF6"
respectively is that each byte is treated as the first byte
of a multi-byte character, but no trailing bytes follow it.
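
Nobu's point about first bytes can be sketched with a small lead-byte
lookup. The helpers below (expected_char_length and complete_char?) are
hypothetical and are not Ruby's or Oniguruma's implementation; the byte
ranges are simplified from the usual Shift_JIS, EUC-JP and UTF-8
layouts, and the :sjis, :euc and :utf8 symbols are only labels chosen
for this example.

```ruby
# Hypothetical helper: how many bytes a character starting with `byte`
# should occupy, or nil if `byte` cannot start a character at all.
def expected_char_length(byte, kcode)
  case kcode
  when :sjis
    # Shift_JIS: 0x81-0x9F and 0xE0-0xFC start a 2-byte character.
    return 2 if (0x81..0x9F).cover?(byte) || (0xE0..0xFC).cover?(byte)
    1
  when :euc
    # EUC-JP: 0xA1-0xFE and 0x8E start 2 bytes, 0x8F starts 3 bytes.
    return 2 if (0xA1..0xFE).cover?(byte) || byte == 0x8E
    return 3 if byte == 0x8F
    1
  when :utf8
    return 1 if byte < 0x80
    return nil if byte < 0xC2   # 0x80-0xBF are trailing bytes only
    return 2 if byte < 0xE0
    return 3 if byte < 0xF0
    return 4 if byte < 0xF5
    nil
  end
end

# True when the string holds at least one whole character at its start.
def complete_char?(str, kcode)
  len = expected_char_length(str.getbyte(0), kcode)
  !len.nil? && str.bytesize >= len
end

complete_char?("\xF1".b, :sjis)   #=> false (first byte, nothing follows)
complete_char?("\xF6".b, :euc)    #=> false (first byte, nothing follows)
complete_char?("\xA3".b, :utf8)   #=> false (a trailing byte on its own)
```

Under these assumptions "\xF1" and "\xF6" are incomplete characters,
while "\xa3" in UTF-8 is not a first byte at all, so a matcher that
does not validate its input has nothing telling it to wait for more
bytes.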


Regards,

TAKAHASHI 'Maki' Masayoshi
 
