A bug in Ruby regexp lib?

Discussion in 'Ruby' started by Artūras Šlajus, Jan 27, 2009.

  1. ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]

    x11@www:~$ irb
    irb(main):001:0> s = "www.myspace.com/djmamania www.myspace.com/djmantini"
    => "www.myspace.com/djmamania www.myspace.com/djmantini"
    irb(main):002:0> s1 = s.gsub(%r{(\s|^)(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
    => "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> www.myspace.com/djmantini"
    irb(main):003:0> s1.gsub(%r{(\s|^)(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
    => "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> <a href=\"http://www.myspace.com/djmantini\">www.myspace.com/djmantini</a>"

    Why do I have to call gsub twice for this to work? The same regexp works
    fine in Firefox JS :)
    --
    Posted via http://www.ruby-forum.com/.
     
    Artūras Šlajus, Jan 27, 2009
    #1

  2. Tim Greer Guest

    Artūras Šlajus wrote:

    > Why I have to call gsub two times for this to work? Same regexp works
    > fine in Firefox JS :)


    Did you mean:

    s1 = s.gsub(%r{(^|\s)?(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')

    irb(main):035:0> s1 = s.gsub(%r{(^|\s)?(www\..*?)(\s|$)}m, '\1<a href="http://\2">\2</a>\3')
    => "<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a> <a href=\"http://www.myspace.com/djmantini\">www.myspace.com/djmantini</a>"

    Note that \1 uses (^|\s): it's either the start of the string (^) or the
    whitespace between the two URLs (\s). But you also have \3, which is
    either the end of the string ($) or the whitespace between (or after)
    the URLs (\s). Since there's only one whitespace character between the
    two URLs, \3 consumes it in the first match, and \1 has nothing left to
    match for the second one. That's what throws it off.

    To account for both \1 and \3, I've made the first group optional,
    (^|\s)?, so that \3 can take the space without breaking the next match.
    There are other ways to do this, but working with what you were using,
    that's one change that gets the desired result on the first call...
    unless I misunderstood what you were trying to do?
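A minimal Ruby sketch of the explanation above, using the two-URL string from this thread. String#scan shows that the original pattern finds only the first URL, because its third group has already consumed the single space. For comparison it also tries one of the "other ways": a zero-width lookahead (?=\s|$), which checks for the trailing whitespace without consuming it. The lookahead variant is an illustration, not the fix given in the thread, and the variable names are made up for the example.

```ruby
# The two-URL string from the thread.
s = "www.myspace.com/djmamania www.myspace.com/djmantini"

# The original pattern: the trailing (\s|$) consumes the separating space.
original = %r{(\s|^)(www\..*?)(\s|$)}m

# String#scan lists every non-overlapping match: only the first URL is found,
# and its third capture is the space the second URL would need for (\s|^).
p s.scan(original)
# => [["", "www.myspace.com/djmamania", " "]]

# Illustrative alternative (not the fix from the thread): test for the
# trailing whitespace with a zero-width lookahead, so it stays in the input.
lookahead = %r{(\s|^)(www\..*?)(?=\s|$)}m
p s.scan(lookahead)
# => [["", "www.myspace.com/djmamania"], [" ", "www.myspace.com/djmantini"]]

# \3 is no longer needed: the space is never consumed by the first match,
# so it becomes \1 of the second match.
puts s.gsub(lookahead, '\1<a href="http://\2">\2</a>')
```

With the lookahead, the first match ends just before the space, so the next match can start on that space and satisfy (\s|^).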
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
     
    Tim Greer, Jan 27, 2009
    #2

  3. Tim Greer Guest

    Tim Greer wrote:

    > Note the \1 is using (^|\s), as it's either the start of the string
    > (^) or a white space between the two URLs (\s), but you also have \3,
    > which is either the end of the string ($) or white space between the
    > URLs (or following) (\s), and since there's only one white space
    > between the two URLs, it throws is off.
    >
    > To account for both \1 and \3, above I've set it to be optional
    > (^|\s)?
    > because this will allow you to use \3 without is breaking it. There
    > are other ways to do this, but just working with what you were using,
    > that's a change you could make to get the desired results on the first
    > one... unless I misunderstood what you were trying to do?


    Geez, pardon the typos I made above. Apparently I'm having trouble
    working my keyboard (some of those "is" should be "it").
     
    Tim Greer, Jan 27, 2009
    #3
  4. Tim Greer wrote:
    > Note the \1 is using (^|\s), as it's either the start of the string (^)
    > or a white space between the two URLs (\s), but you also have \3, which
    > is either the end of the string ($) or white space between the URLs (or
    > following) (\s), and since there's only one white space between the two
    > URLs, it throws is off.
    >
    > To account for both \1 and \3, above I've set it to be optional (^|\s)?
    > because this will allow you to use \3 without is breaking it. There
    > are other ways to do this, but just working with what you were using,
    > that's a change you could make to get the desired results on the first
    > one... unless I misunderstood what you were trying to do?


    Ah, thank you. So Ruby carries on scanning the string after the last \s
    it consumed. But shouldn't \3 insert it right back? :)
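It does, but only in the output. gsub scans the original string, and after the first match it resumes right after the space that \3 captured. The copy of that space written into the result is never scanned again, so the second URL still has no (\s|^) in front of it. A small sketch (variable names are illustrative) that records the input offsets gsub actually matched, using the block form of String#gsub and Regexp.last_match:

```ruby
s  = "www.myspace.com/djmamania www.myspace.com/djmantini"
re = %r{(\s|^)(www\..*?)(\s|$)}m

spans = [] # [start, end) offsets of each match in the *input* string

result = s.gsub(re) do
  m = Regexp.last_match
  spans << [m.begin(0), m.end(0)]
  # Builds the same output as the replacement string '\1<a href="http://\2">\2</a>\3'.
  "#{m[1]}<a href=\"http://#{m[2]}\">#{m[2]}</a>#{m[3]}"
end

p spans # => [[0, 26]]
puts result
```

The only match covers offsets 0 to 26, space included. No later position in the input has whitespace or a line start in front of "www.", so the second URL is left as plain text.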

    Anyway, I have another problem, then ;]

    it "should link http links" do
      "http://www.myspace.com/djmamania".htmlize.should == \
        '<p><a href="http://www.myspace.com/djmamania">www.myspace.com/djmamania</a></p>'
    end

    2)
    'String#htmlize should link http links' FAILED
    expected: "<p><a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a></p>",
         got: "<p>http://<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a></p>" (using ==)

    What do you suggest?
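One likely cause of the stray "http://": making the first group optional, (^|\s)?, means the pattern no longer needs anything in front of "www.", so it can start matching in the middle of "http://www...", and the "http://" before it is copied through untouched. A sketch under stated assumptions: htmlize itself isn't shown in the thread, so linkify and LINK_RE below are hypothetical names, the <p> wrapping is left out, and only an "http://" prefix is handled. It restores the mandatory (^|\s) boundary and lets the match consume an optional "http://":

```ruby
# Tim's optional-prefix pattern: because (^|\s)? may match nothing, the
# regex is free to start at the "www." inside "http://www...", so the
# "http://" in front of it is copied through unchanged.
tim = %r{(^|\s)?(www\..*?)(\s|$)}m
p "http://www.myspace.com/djmamania".gsub(tim, '\1<a href="http://\2">\2</a>\3')
# => "http://<a href=\"http://www.myspace.com/djmamania\">www.myspace.com/djmamania</a>"

# Hypothetical helper (not the poster's String#htmlize, which isn't shown):
# require a line-start or whitespace boundary again, and let the match
# swallow an optional "http://" so it is replaced along with the URL.
LINK_RE = %r{(^|\s)(?:http://)?(www\.\S+)}

def linkify(text)
  text.gsub(LINK_RE) { "#{$1}<a href=\"http://#{$2}\">#{$2}</a>" }
end

p linkify("http://www.myspace.com/djmamania")
p linkify("www.myspace.com/djmamania www.myspace.com/djmantini")
```

Because \S+ stops at the first whitespace character, no trailing group is needed and the separating space is never consumed. The trade-off is that \S+ also takes any punctuation stuck to the URL, such as a trailing full stop or comma.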
     
    Artūras Šlajus, Jan 27, 2009
    #4
