ruby-dev summary 20489 - 20519

TAKAHASHI Masayoshi · Jul 9, 2003

Hello all,

I'm sorry for posting so late. This is a summary of last week's
traffic on the ruby-dev ML.

[ruby-dev:20491] [Oniguruma] explicit capture
[ruby-dev:20514] [Oniguruma] Version 1.9.1

Recently, the translation of ''Mastering Regular Expressions''
2nd ed. was published in Japan. Kosako, the author of Oniguruma,
read it and found the ExplicitCapture option in .NET, which
disables capturing for all groups except named groups. So Kosako
added an option REG_OPTION_CAPTURE_ONLY_NAMED_GROUP and a notation
(?n:....) in Oniguruma 1.9.1.

But Tanaka Akira pointed out that Ruby already uses the /n option,
and proposed using /c instead of /n. Kosako agreed with Tanaka's
idea.
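
To illustrate what "capture only named groups" means in practice, here
is a minimal sketch. It does not use the (?n:....) notation or the
proposed /c option from this thread. It relies on an assumption about
later Ruby releases (1.9 and newer), where plain parentheses stop
capturing as soon as the pattern contains a named group. The helper
name capture_summary is hypothetical.

```ruby
# Hypothetical helper: summarize which groups a pattern actually captured.
# Assumes Ruby 1.9 or later (named groups are not available in Ruby 1.8).
def capture_summary(pattern, text)
  m = pattern.match(text)
  return nil unless m
  { whole: m[0], names: m.names, captures: m.captures }
end

# Plain group plus a named group: only the named group is captured.
p capture_summary(/(\d{4})-(?<month>\d{2})/, "2003-07")

# Plain groups only: both groups are captured by number.
p capture_summary(/(\d{4})-(\d{2})/, "2003-07")
```

In the first call the plain (\d{4}) group should be absent from the
captures, which is the effect ExplicitCapture is meant to give.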

[ruby-dev:20495] matching with invalid byte sequence

Kazuhiro NISHIYAMA pointed out that /./ matches an invalid
byte sequence in UTF-8.

require 'uconv'
if /./u =~ "\xa3"
  Uconv.u8toeuc($&) #=> illegal UTF-8 sequence (a3) (Uconv::Error)
end

But '/./s =~ "\xF1"' and '/./e =~ "\xF6"' don't match.
So he suggested that /./ should match one *character*, even if
$KCODE is UTF-8.

Nobu answered that Ruby's regexp doesn't check whether a multi-byte
character sequence is valid or not, at least in current Ruby.
And the reason why /./s and /./e don't match "\xF1" and "\xF6"
respectively is that each byte is treated as the first byte of a
multi-byte character, but no trailing bytes follow it.
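
Since the regexp engine does not validate the bytes itself, the check
has to happen before matching. The following is only a sketch under
assumptions: it uses String#force_encoding and String#valid_encoding?,
which exist in Ruby 1.9 and later rather than in the $KCODE-based Ruby
discussed here, and the helper name first_valid_char is hypothetical.

```ruby
# Hypothetical helper: return the first character of +bytes+ read as
# +encoding+, or nil when the byte sequence is malformed in that encoding.
# Assumes Ruby 1.9 or later.
def first_valid_char(bytes, encoding)
  str = bytes.dup.force_encoding(encoding)
  return nil unless str.valid_encoding?  # reject before the regexp runs
  str[/./m]
end

# The three byte strings from the discussion above.
p first_valid_char("\xa3", "UTF-8")      # lone trailing byte
p first_valid_char("\xF1", "Shift_JIS")  # first byte with nothing after it
p first_valid_char("\xF6", "EUC-JP")     # first byte with nothing after it

# A well-formed three-byte UTF-8 character for comparison.
p first_valid_char("\xE3\x81\x82", "UTF-8")
```

With this check in front, all three malformed inputs are rejected the
same way regardless of the encoding, and only the last call yields a
character.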

Regards,

TAKAHASHI 'Maki' Masayoshi
