Parsing HTML using regexes and arrays.

soldier.coder · Nov 7, 2008

I have a nice little regex to pull the information-rich guts from a
table:

%r{</thead.*?>(.*?)</table>}m =~ html
# $1 now contains all the rows of the table as one long string.

I'd like to turn that into an array of rows, but I am not exactly sure
how.

Additionally, I'd like to process the rows so that I can get the data
from between the nth <td></td> pair.

Any help?

Michael Libby · Nov 7, 2008

If you have a string with a repeating pattern that you want an array
of, String#scan is your man.

irb(main):001:0> html = "<td>foo</td><td>bar</td>"
=> "<td>foo</td><td>bar</td>"
irb(main):002:0> a = html.scan(/<td>(.+?)<\/td>/)
=> [["foo"], ["bar"]]

Hmmm, that's sort of ugly. When the pattern contains a group, scan
returns one sub-array per match.

irb(main):003:0> a = html.scan(/<td>(.+?)<\/td>/).flatten
=> ["foo", "bar"]

Much better.
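
The same trick can be applied twice to get at the original question: once to split the table body into rows, and once more to split each row into cells. What follows is only a rough sketch. The sample table, the variable names (body, rows, cells, nth_column) and the zero-based n are assumptions made for illustration, and the patterns will break on nested tables or on unusual markup.

```ruby
# Hypothetical sample input; the real page's markup may differ.
html = <<~HTML
  <table>
    <thead><tr><th>Name</th><th>Qty</th></tr></thead>
    <tr><td>foo</td><td>1</td></tr>
    <tr><td>bar</td><td>2</td></tr>
  </table>
HTML

# Everything between </thead> and </table>, using the regex from the
# original post. Falls back to an empty string if nothing matches.
body = html[%r{</thead.*?>(.*?)</table>}m, 1] || ""

# One array element per <tr>...</tr>; the m flag lets . match newlines.
rows = body.scan(%r{<tr.*?>(.*?)</tr>}m).flatten

# Each row becomes an array of its cell contents.
# (.*?) rather than (.+?) so that empty cells are kept.
cells = rows.map { |row| row.scan(%r{<td.*?>(.*?)</td>}m).flatten }

# Data from the nth <td> of every row (n is zero-based in this sketch).
n = 1
nth_column = cells.map { |row| row[n] }
```

With the sample input above, cells is an array of arrays, so cells[0][1] is the second cell of the first row, and nth_column collects that same position from every row.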

Ad hoc regexes are fine for quick-n-dirty scripting. But if you're
serious about parsing HTML you might want to look into Hpricot or
Nokogiri.

-Michael Libby
