How to grep the shortest match in a string

Arowana Lin

I used a regular expression to grep content from a web page, but Ruby
always seems to match the longest possible string, and I need the
shortest match. How can I do that?
Thanks for the help!
 
Arowana Lin

Robert said:
Would be nice to see the regexp. Maybe using non-greedy quantifiers will
do.
You could also use character classes like [^<>] to match chars between <
and >.
But I don't know what you want to match, so please show me :>

Thanks, Robert!
Here is the example:
<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>
I use the code below to fetch "Format" and "Format2" from the table:
feature=content.scan(/[<tr><td>([\w\s])*<\/td><\/tr>/)
I want to get each row into the array feature, like
feature[0]=Format;feature[1]=Format2...
but it matches all the rows into
feature[0]=Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3
 
Eero Saynatkari

Arowana said:
here is the example.
<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>
I use the code below to fetch "Format" and "Format2" from the table:
feature=content.scan(/[<tr><td>([\w\s])*<\/td><\/tr>/)
I want to get each row into the array feature, like
feature[0]=Format;feature[1]=Format2...
but it matches all the rows into
feature[0]=Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3

Yes, you want the non-greedy version .*? instead of .*
there. You can append ? to the *, + and {n,m} quantifiers to make
them non-greedy.
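
To make the difference concrete, here is a small sketch using the sample row data from the question. The pattern is an assumption: the regexp posted above may not be exactly what was run, so this uses a plain .* pattern that reproduces the reported result, then swaps in .*? for comparison.

```ruby
# Sample data copied from the question.
content = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Greedy: .* grabs as much as it can and only backs off to the LAST
# </td></tr>, so the whole table ends up in a single match.
greedy = content.scan(/<tr><td>(.*)<\/td><\/tr>/).flatten

# Non-greedy: .*? grabs as little as it can and stops at the FIRST
# </td></tr>, so each row becomes its own match.
lazy = content.scan(/<tr><td>(.*?)<\/td><\/tr>/).flatten

puts greedy.length   # 1
puts lazy.inspect    # ["Format", "Format2", "Format3"]
```

Here lazy[0] is "Format" and lazy[1] is "Format2", which is the layout asked for in the question.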


E
 
dblack

Hi --

Non-greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited, you may as well just scan ;)

s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
s.scan(/<td>([^<]*)<\/td>/) { |it| puts it }

(array.each { |it| puts it } does the same thing as puts array :)
outputs:

Format
Format2
Format3
=> "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

Obviously it doesn't do quite what you want (you need an array) but that part
should be easy to add...

scan returns an array, so just grab it:

results = s.scan(/.../).flatten # flatten because of the ()'s
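
Spelled out with the pattern from the scan example earlier in this post, that might look like the sketch below. The variable names nested and feature are only illustrative (feature is borrowed from the original question).

```ruby
s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Without a block, scan returns its matches. Because the pattern has a
# capture group, each match is itself a one-element array.
nested = s.scan(/<td>([^<]*)<\/td>/)
puts nested.inspect    # [["Format"], ["Format2"], ["Format3"]]

# flatten collapses the nested arrays into a plain array of strings.
feature = nested.flatten
puts feature[0]        # Format
puts feature[1]        # Format2
```

The [^<]* class cannot run past the next <, so the match stays inside one cell without needing a non-greedy quantifier.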

[And yes, everyone who's about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]


David

--
David A. Black
(e-mail address removed)

"Ruby for Rails", from Manning Publications, coming April 2006!
http://www.manning.com/books/black
 
