regular expression for parsing an html element


abcd

I have some HTML such as:

<a href="...."><img src="image/blah/a.jpg" id="ddddd">blah blah blah</a>....

I want to pull out the text that lies inside the quotes of the src
attribute. So in this example I would get image/blah/a.jpg

My regex so far is: src=\"(.*)\"
However, the group in this case would end up being:
image/blah/a.jpg" id="ddddd">blah blah blah</a>.....

How can I tell the regex group (.*) to end when it gets to the first " ?

Thanks
 

Wojciech Muła

abcd said:
My regex so far is: src=\"(.*)\" ....however the group in this case
would end up being, image/blah/a.jpg" id="ddddd">blah blah blah</a>.....

how can I tell the regex group (.*) to end when it gets to the first
" ?

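Use non-greedy matching, i.e. src=\"(.*?)\" (note the question mark after the *).
A plain * is greedy and grabs as much as it can, so the group runs on to the
last " on the line. With *? the group takes as little as possible and stops at
the first closing quote.
See: http://docs.python.org/lib/re-syntax.html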
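
A quick way to see the difference is to run both patterns on the sample line. This is only a sketch: the variable names are made up for illustration, and the negated character class [^"]* is an alternative added here for comparison, not something from the original question.

```python
import re

# Sample line from the question (the "...." parts are left as posted).
html = '<a href="...."><img src="image/blah/a.jpg" id="ddddd">blah blah blah</a>....'

# Greedy: .* runs to the end of the line, then backs up to the LAST quote.
greedy = re.search(r'src="(.*)"', html).group(1)

# Non-greedy: .*? expands one character at a time and stops at the FIRST quote.
lazy = re.search(r'src="(.*?)"', html).group(1)

# Alternative (an assumption, not from the thread): match anything except a quote.
negated = re.search(r'src="([^"]*)"', html).group(1)

print(greedy)
print(lazy)
print(negated)
```

On this sample the greedy group runs past the src value into the id attribute, while the other two both give image/blah/a.jpg. Raw strings (r'...') are used so the quotes in the pattern need no backslashes.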

w.
 
