A bug in Ruby regexp lib?

Artūras Šlajus · Jan 27, 2009

ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]

x11@www:~$ irb
irb(main):001:0> s = "www.myspace.com/djmamania www.myspace.com/djmantini"
=> "www.myspace.com/djmamania www.myspace.com/djmantini"
irb(main):002:0> s1 = s.gsub(%r{(\s|^)(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
=> "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> www.myspace.com/djmantini"
irb(main):003:0> s1.gsub(%r{(\s|^)(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
=> "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> <a href=\"http://www.myspace.com/djmantini\">www.myspace.com/djmantini</a>"

Why do I have to call gsub twice for this to work? The same regexp works
fine in Firefox's JavaScript.

Tim Greer · Jan 27, 2009

Did you mean:

s1 = s.gsub(%r{(^|\s)?(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')

irb(main):035:0> s1 = s.gsub(%r{(^|\s)?(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
=> "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> <a href=\"http://www.myspace.com/djmantini\">www.myspace.com/djmantini</a>"

Note that \1 is (^|\s), which is either the start of the string or of a
line (^) or the white space between the two URLs (\s), but you also have
\3, which is either the end ($) or the white space between (or following)
the URLs (\s). Since there's only one white space between the two URLs,
the first match consumes it as \3 and nothing is left for \1 of the second
match, which throws it off.

To account for both \1 and \3, above I've made the first group optional,
(^|\s)?, because this will allow you to keep \3 without it breaking the
match. There are other ways to do this, but just working with what you
were using, that's a change you could make to get the desired results on
the first call... unless I misunderstood what you were trying to do?
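
One of those other ways could be a lookahead. The following is only a minimal sketch, assuming the goal is to link every www. URL in a single gsub call: the trailing group becomes (?=\s|$), which checks for the delimiter without consuming it. The variable names and the shortened replacement template (no \3, since that group no longer captures anything) are assumptions made for this illustration.

```ruby
s = "www.myspace.com/djmamania www.myspace.com/djmantini"

# Original pattern: the trailing (\s|$) consumes the space between the URLs.
consuming = %r{(\s|^)(www\..*?)(\s|$)}m
once = s.gsub(consuming, '\1<a href="http://\2">\2</a>\3')

# Assumed alternative: (?=\s|$) only checks for the delimiter, so the same
# space is still available as the leading (\s|^) of the next match.
lookahead = %r{(\s|^)(www\..*?)(?=\s|$)}m
linked = s.gsub(lookahead, '\1<a href="http://\2">\2</a>')

puts once.scan("<a ").size    # number of links from the original pattern
puts linked.scan("<a ").size  # number of links from the lookahead variant
puts linked
```

Unlike the optional leading group, this keeps the requirement that a URL is preceded by white space or a line start.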

Tim Greer · Jan 27, 2009

Geez, pardon the typos I've made above. Apparently I'm having trouble
working my keyboard (some of those "is" should be "it")

Artūras Šlajus · Jan 27, 2009

Ah, thank you. It seems that Ruby resumes scanning the string after the
last \s it consumed. But shouldn't \3 insert it right back?
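
On the \3 question, a hedged illustration: \3 does put the space back into the output, but gsub does not rescan its own output. It continues scanning the original string from the end of the previous match, and by then the space is already behind it. The sketch below records the offsets of every match the original pattern finds; the variable names are illustrative only.

```ruby
s = "www.myspace.com/djmamania www.myspace.com/djmantini"
pattern = %r{(\s|^)(www\..*?)(\s|$)}m

# Collect the [start, end] offsets of each match found in the input string.
spans = []
s.scan(pattern) { spans << Regexp.last_match.offset(0) }

p spans          # => [[0, 26]]
p s[26..-1]      # what is left to scan: no leading white space, no line start
```

The single match ends at offset 26, one past the space, so the second URL has nothing in front of it that (\s|^) could match.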

Anyways, I have another problem then ;]
it "should link http links" do
  "http://www.myspace.com/djmamania".htmlize.should == \
    '<a href="http://www.myspace.com/djmamania">www.myspace.com/djmamania</a>'
end

2)
'String#htmlize should link http links' FAILED
expected: "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a>",
     got: "http://<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a>" (using ==)

What do you suggest?
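
One possible direction, offered only as a sketch: the failure output suggests the match begins right after http://, which is what an optional leading group would allow. Consuming an optional http:// prefix outside the captured group, while keeping the leading (\s|^) mandatory, would produce the string this spec expects. link_www is a hypothetical helper standing in for htmlize, whose real implementation is not shown in the thread, and https or other schemes are not handled here.

```ruby
# Hypothetical stand-in for String#htmlize from the spec above.
def link_www(text)
  # (?:http://)? swallows an optional scheme without capturing it, so \2 is
  # always the bare www. address; (?=\s|$) leaves the delimiter unconsumed.
  pattern = %r{(\s|^)(?:http://)?(www\..*?)(?=\s|$)}m
  text.gsub(pattern, '\1<a href="http://\2">\2</a>')
end

puts link_www("http://www.myspace.com/djmamania")
puts link_www("www.myspace.com/djmamania www.myspace.com/djmantini")
```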
