Regexp question

Lyndon Samson · May 9, 2005

I have an HTML document containing many table cells whose contents I
wish to extract.

r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

Robert Klemme · May 9, 2005

Lyndon said:
I have an HTML document containing many table cells whose contents I
wish to extract.

r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

Use String#scan.

robert

Shajith · May 9, 2005

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell.

Have you tried scan? Try the same regexp with String#scan to get an
array of matched groups.

- Shajith.
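
As a rough illustration of the String#scan suggestion, here is a minimal sketch. The sample `table` string is made up for the example (the original post does not show the document), and the closing tag is assumed to be `</TD>`. With one capture group, scan returns an array of one-element arrays, so `flatten` is used to get a plain list of cell contents.

```ruby
# Hypothetical sample input, not taken from the original post.
table = "<TR><TD>one</TD>\n<TD align=\"left\">two\nlines</TD></TR>"

# Same pattern as in the question, assuming the closing tag is </TD>.
# Regexp::MULTILINE lets "." match newlines, so cells may span lines.
cell = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)

# scan finds every non-overlapping match; each element is the array of
# capture groups for one match, e.g. ["one"].
cells = table.scan(cell).flatten

cells.each { |c| puts c.inspect }
```

This should be fine for simple, regular markup; nested tables or mixed-case tags would probably need a real HTML parser instead.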


Nikolai Weibull · May 9, 2005

Lyndon Samson, May 9:

r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)
m = r.match(table)

The above only matches the first cell; I'd like to continue the match,
finding each subsequent cell. The not-very-nice way to do this is to
take the char offset of the match, create a new string from that point
and feed it back into match.

What's the better way?

Using String#scan, as previously suggested, is the easiest method.
Another way is to loop while m is non-nil, matching against
m.post_match on each iteration. See the documentation of the
MatchData class for more information.
nikolai
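
The post_match loop described above might look roughly like this. It is only a sketch: the sample `table` string and the `contents` array are invented for the example, and the closing tag is assumed to be `</TD>`.

```ruby
# Hypothetical sample input, not taken from the original post.
table = "<TR><TD>one</TD><TD>two</TD><TD>three</TD></TR>"

r = Regexp.new("<TD.*?>(.*?)</TD>", Regexp::MULTILINE)

contents = []
m = r.match(table)
while m
  contents << m[1]           # text captured inside the current cell
  m = r.match(m.post_match)  # continue in the text after this match
end

puts contents.inspect
```

Each iteration matches against the remainder of the string, so no manual offset arithmetic is needed; the loop ends when match returns nil.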
