Parsing HTML using regexes and arrays.

Discussion in 'Ruby' started by soldier.coder, Nov 7, 2008.

  1. I have a nice little regex to pull the information-rich guts out of a
    table:

    %r{</thead.*?>(.*?)</table>}m =~ html
    # $1 now contains all the rows of the table as one long string.

    I'd like to turn that into an array of rows, but I am not exactly sure
    how.

    Additionally, I'd like to process the rows so that I can get the data
    between the nth <td></td> pair.

    Any help?
    soldier.coder, Nov 7, 2008
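A minimal sketch of the "array of rows" step in the same string-based style. It assumes `html` holds the page source, that the rows follow a `</thead>` tag, and that the table contains no nested tables (the lazy `.*?` stops at the first `</tr>` or `</table>` it meets). The sample markup is made up for illustration.

```ruby
# Made-up sample page standing in for the real HTML.
html = <<~HTML
  <table>
    <thead><tr><th>Name</th><th>Qty</th></tr></thead>
    <tr><td>foo</td><td>1</td></tr>
    <tr><td>bar</td><td>2</td></tr>
  </table>
HTML

# Same pattern as the original post. String#[] with a capture index
# returns group 1 (or nil when nothing matched) instead of relying on $1.
body = html[%r{</thead.*?>(.*?)</table>}m, 1]
raise "no </thead> ... </table> section found" if body.nil?

# Each <tr>...</tr> pair becomes one array element. [^>]* allows attributes
# on the opening tag; /m lets . match across newlines inside a row.
rows = body.scan(%r{<tr[^>]*>(.*?)</tr>}m).flatten

p rows  # => ["<td>foo</td><td>1</td>", "<td>bar</td><td>2</td>"]
```

Using `html[regex, 1]` keeps the result in an ordinary local variable, so a later regex match cannot silently overwrite it the way it would overwrite `$1`.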

  2. On Fri, Nov 7, 2008 at 3:08 PM, soldier.coder wrote:

    If you have a string with a repeating pattern that you want an array
    of, String#scan is your man.

    irb(main):001:0> html = "<td>foo</td><td>bar</td>"
    => "<td>foo</td><td>bar</td>"
    irb(main):002:0> a = html.scan(/<td>(.+?)<\/td>/)
    => [["foo"], ["bar"]]

    Hmmm, that's sort of ugly.

    irb(main):003:0> a = html.scan(/<td>(.+?)<\/td>/).flatten
    => ["foo", "bar"]

    Much better.
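Applying the same scan-then-flatten idea once per row answers the "nth <td></td> pair" part of the question. A minimal sketch, assuming the rows have already been split out as strings; the sample rows and the variable `n` are made up for illustration. It uses `.*?` rather than `.+?`: with `.+?`, an empty `<td></td>` would swallow the following cell and shift every later column.

```ruby
# Hypothetical row strings, as produced by scanning a table body for <tr>...</tr>.
rows = [
  "<td>foo</td><td>1</td><td>red</td>",
  "<td>bar</td><td>2</td><td>blue</td>"
]

# Scan each row for its cells and flatten the one-element capture arrays,
# giving one array of cell contents per row.
table = rows.map { |row| row.scan(/<td>(.*?)<\/td>/m).flatten }
# table is now [["foo", "1", "red"], ["bar", "2", "blue"]]

# Data between the nth <td></td> pair. n is zero-based, so n = 1 is the
# second cell; Array#[] returns nil for rows with fewer than n + 1 cells.
n = 1
column = table.map { |cells| cells[n] }
p column  # => ["1", "2"]
```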

    Ad hoc regexes are fine for quick-n-dirty scripting. But if you're
    serious about parsing HTML you might want to look into Hpricot or
    Nokogiri.
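To make the quick-n-dirty caveat concrete: the pattern from the irb session only matches a bare, lowercase `<td>` whose contents sit on one line, so ordinary markup can make it silently drop cells. A small sketch with made-up markup, plus a looser pattern that copes with these particular cases. It still matches tags inside HTML comments and similar corner cases, which is where a parser such as Hpricot or Nokogiri is the better tool.

```ruby
# Made-up row: an attribute, an uppercase tag, a plain cell, and a multi-line cell.
row = %(<td class="name">foo</td><TD>bar</TD><td>baz</td><td>\n  qux\n</td>)

# The pattern from the irb session only matches a literal lowercase <td>,
# and without /m its . cannot cross the newline in the last cell.
strict = row.scan(/<td>(.+?)<\/td>/).flatten

# A looser variant: [^>]* skips attributes, /i ignores tag case,
# /m lets a cell span line breaks, and strip trims surrounding whitespace.
loose = row.scan(/<td[^>]*>(.*?)<\/td>/im).flatten.map(&:strip)

p strict  # => ["baz"]
p loose   # => ["foo", "bar", "baz", "qux"]
```

Note that the strict result is not an error: it quietly returns one cell out of four, so "the nth cell" would point at the wrong data.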

    -Michael Libby
    Michael Libby, Nov 7, 2008
