Ruby multiline regex problem

Discussion in 'Ruby' started by Gregg Yows, Apr 8, 2008.

  1. Gregg Yows

    Gregg Yows Guest

    Code:

    "<td align="left" ><div style="width: 165px; height: 175px;"><a
    href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
    something here Best</td>"


    Pattern:

    <td.*?>.*?<\/td\s*>


    I'm trying to match this whole block and use it for further parsing.
    This started from an example in Brian Marick's book "Everyday
    Scripting..." that had to be modified because Amazon has switched its
    presentation from lists to tables.

    Anyway, the regex works fine on a single line. As soon as I introduce
    this:

    "<td align="left" ><div style="width: 165px; height: 175px;"><a
    href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
    something here

    Best</td>"

    it fails.

    When I try this same expression in Perl using the //s modifier, it works.
    I understand Ruby uses //m (multiline mode) in nearly the same fashion,
    so that "." also matches newlines, so it should work,
    right? Can anyone tell me what I am doing wrong here? Why isn't
    "multiline" mode working?
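    For reference, a minimal Ruby sketch of the behaviour in question, using the
    poster's pattern on the blank-line sample (`html`, `without_m` and `with_m`
    are names made up for the example). In Ruby, /m plays the role of Perl's /s
    and lets "." match a newline; ^ and $ always match at line boundaries, and
    \A and \z anchor the whole string.

```ruby
# The second sample from the question: the cell content spans a blank line.
html = <<~HTML
  <td align="left" ><div style="width: 165px; height: 175px;"><a
  href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
  something here

  Best</td>
HTML

without_m = /<td.*?>.*?<\/td\s*>/   # "." stops at "\n"
with_m    = /<td.*?>.*?<\/td\s*>/m  # "." also matches "\n" (like Perl's /s)

p html[without_m]   # nil: no match can cross the line breaks
puts html[with_m]   # the whole <td>...</td> block

# ^ and $ are always line anchors in Ruby; \A anchors the start of the string.
p("a\nb" =~ /^b/)   # 2
p("a\nb" =~ /\Ab/)  # nil
```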

    Thanks!
    --
    Posted via http://www.ruby-forum.com/.
    Gregg Yows, Apr 8, 2008
    #1

  2. Todd Benson

    Todd Benson Guest

    On Tue, Apr 8, 2008 at 11:21 AM, Gregg Yows wrote:
    > Why isn't "multiline" mode working?


    <CODE>

    s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
    href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
    something here

    Best</td>'

    puts "######\ns:"
    puts s

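    # /m lets "." match newlines as well (like Perl's /s).
    r1 = /<td.*?>.*?<\/td.*?>/m     # no capture group
    r2 = /<td.*?>(.*?)<\/td.*?>/m   # group 1 = the cell contents

    puts "######\nscan with r1:"
    puts s.scan(r1)                 # no groups: scan returns whole matches
    puts
    puts "######\nmatch with r1:"
    puts s.match(r1)[0]             # [0] is the whole match
    puts

    s =~ r1
    puts "######\n=~ and $1 with r1:"
    p $1                            # nil, because r1 has no group 1

    puts
    puts
    puts

    puts "######\nscan with r2:"
    puts s.scan(r2)                 # with a group: scan returns the captures
    puts
    puts "######\nmatch with r2:"
    puts s.match(r2)[0]             # still the whole match, tags included
    puts

    s =~ r2
    puts "######\n=~ and $1 with r2:"
    puts $1                         # only the contents between the td tags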

    </CODE>

    Hmm, I'm not sure if the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
    more appropriate or not.
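    Todd's alternative is easy to check. A minimal sketch (the two sample
    strings are invented for illustration): a negated class such as [^>]*
    matches a newline even without /m, so a line break inside a tag is no
    problem, but the .*? between the tags still needs /m once the cell
    content spans lines.

```ruby
# [^>]* stops at the first ">" and, unlike ".", it can cross "\n" without /m.
pattern_no_m = /<td[^>]*>.*?<\/td[^>]*>/
pattern_m    = /<td[^>]*>.*?<\/td[^>]*>/m

tag_split     = "<td\nalign=\"left\">cell</td>"    # newline inside the opening tag
content_split = "<td align=\"left\">cell\ntext</td>" # newline inside the cell content

p tag_split[pattern_no_m]      # matches: [^>]* consumed the "\n"
p content_split[pattern_no_m]  # nil: ".*?" cannot cross the "\n"
p content_split[pattern_m]     # matches once /m is added
```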

    Todd
    Todd Benson, Apr 8, 2008
    #2

  3. Robert Klemme

    2008/4/8, Gregg Yows:
    > Why isn't "multiline" mode working?


    Works for me: no match without /m, match with /m:

    irb(main):004:0> s=%q{<td align="left" ><div style="width: 165px; height: 175px;"><a
    irb(main):005:0' href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
    irb(main):006:0' something here Best</td>}
    => "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
    irb(main):007:0> s[%r{<td.*?</td\s*>}]
    => nil
    irb(main):008:0> s[%r{<td.*?</td\s*>}m]
    => "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
    irb(main):009:0>

    Cheers

    robert

    --
    use.inject do |as, often| as.you_can - without end
    Robert Klemme, Apr 9, 2008
    #3
  4. Ransom Tullis

    Thanks, folks, for all your help... It turns out that I was using the regex
    test view in Eclipse (RDT), which obviously was not behaving properly in
    multiline mode. I guess I need to get the Aptana/RadRails
    plugin that has the latest RDT and ruby-debug built in. I identified the
    issue using Mike Lovitt's Rubular regex tester. Thanks, Mike, for
    restarting that server!

    http://www.rubular.com/

    Ransom Tullis, Apr 10, 2008
    #4
  5. Robert Klemme

    2008/4/10, Ransom Tullis:
    > I identified the issue using Mike Lovitt's Rubular regex tester.


    Why look so far? IRB serves the same purpose.

    Cheers

    robert

    Robert Klemme, Apr 10, 2008
    #5
  6. Ransom Tullis

    Robert Klemme wrote:
    > 2008/4/10, Ransom Tullis:
    >> I identified the issue using Mike Lovitt's Rubular regex tester.
    >
    > Why look so far? IRB serves the same purpose.

    I'm a newb with Ruby and IRB. I did test the regex in IRB, but I did not
    know that I could set up a literal string containing \n characters through
    the interface like you did above. So, of course, it was passing
    every time. That is very cool! I am growing fonder of IRB every day...
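    For anyone else testing in IRB, a few standard ways to get a string that
    really contains newlines, sketched with made-up sample text. Double quotes
    turn \n into a newline. %q{} follows single-quote rules, so real line breaks
    typed inside it are kept (as in the irb session earlier in the thread), but a
    typed \n stays a backslash followed by "n". A squiggly heredoc (<<~, Ruby 2.3
    and later) is handy for pasting a block of HTML.

```ruby
# 1. Double quotes interpret "\n" as a newline character.
a = "<td>line one\nline two</td>"

# 2. %q{} behaves like single quotes: real line breaks are kept,
#    but a typed \n would stay as a backslash followed by "n".
b = %q{<td>line one
line two</td>}

# 3. A squiggly heredoc (Ruby 2.3+) strips the common indentation;
#    chomp drops the trailing newline the heredoc adds.
c = <<~HTML.chomp
  <td>line one
  line two</td>
HTML

p(a == b && b == c)       # true
p a[/<td>.*?<\/td>/]      # nil without /m
p a[/<td>.*?<\/td>/m]     # "<td>line one\nline two</td>"
p(%q{a\nb}.length)        # 4: the backslash and "n" are two characters
```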
    Ransom Tullis, Apr 10, 2008
    #6

Similar Threads
  1. Yatima

    Multiline regex help

    Yatima, Mar 3, 2005, in forum: Python
    Replies:
    13
    Views:
    548
    Kent Johnson
    Mar 4, 2005
  2. eggie5

    multiline regex expression

    eggie5, Jul 21, 2007, in forum: Java
    Replies:
    4
    Views:
    503
    Roedy Green
    Jul 22, 2007
  3. Gilles Ganault

    [2.5] Regex doesn't support MULTILINE?

    Gilles Ganault, Jul 22, 2007, in forum: Python
    Replies:
    9
    Views:
    301
    Gilles Ganault
    Jul 24, 2007
  4. Replies:
    3
    Views:
    725
    Reedick, Andrew
    Jul 1, 2008
  5. dale zhang
    Replies:
    8
    Views:
    403
    Tintin
    Nov 30, 2004
