Newbie question: matching

josh R

Hi all,

I am trying to write some Python to parse HTML code. I find it easier
to do in Perl, but would like to learn some basic Python. My code
looks like this:

line = "<tr>eat at joe's</tr><tr>or else</tr><tr>you'll starve</tr>"
so = re.compile("(\<tr\>.*?\<\\tr\>)")
foo=so.match(line)
print foo.groups()

I'd like to get an array of elements that looks like this:

array(0)= <tr>eat at joe's</tr>
array(1)= <tr>or else</tr>
array(2)= <tr>you'll starve</tr>

Could you please tell me the correct way to do the matching?

Also, is there something similar to Perl's s/foo/bar/g?

Thanks!!!
Josh
 
Tobiah

This should really be done with the XML parsing
libraries. I don't remember the libs now, but
I watched a co-worker translate HTML into XML,
and then use minidom, or sax or some other lib
to parse the XML. It is very convenient once
you see how to do it. You either trigger an
event for each tag/text, or get handed an entire
object tree representing your HTML, which you can
traverse and examine at a much higher level than
you can by trying to match tags with regular expressions.

Toby
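
A minimal sketch of the "object tree" approach described above, using xml.dom.minidom from the standard library. It assumes the fragment is already well-formed XML; the wrapping <table> root element and the variable names are illustrative additions, not something from the original post. Real-world HTML is often not well-formed, in which case it would need cleaning up first, as Tobiah mentions.

```python
from xml.dom import minidom

line = "<tr>eat at joe's</tr><tr>or else</tr><tr>you'll starve</tr>"

# minidom expects a single root element, so wrap the fragment.
# Assumption: the fragment itself is well-formed XML.
document = minidom.parseString("<table>" + line + "</table>")

# Collect every <tr> element from the parsed tree.
rows = document.getElementsByTagName("tr")

# Serialized markup of each row, and the text inside each row.
row_markup = [row.toxml() for row in rows]
row_text = [row.firstChild.data for row in rows]

for markup in row_markup:
    print(markup)
```

Here row_markup holds the three <tr>...</tr> strings the original question asked for, and row_text holds just the text between the tags.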
 
Gabriel Cooper

josh said:
Hi all,

I am trying to write some Python to parse HTML code. I find it easier
to do in Perl, but would like to learn some basic Python. My code
looks like this:

You will be doing yourself a disservice if you do this by hand. Check
out http://www.diveintopython.org/. It's a free online book (also
available in print). Jump to the chapter on XML parsing. You'll save
yourself loads of time and effort, and you'll take advantage of the
Python libraries. After all, Python *is* supposed to be "batteries
included."
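
For completeness, the regular-expression question as originally asked can be sketched as follows, with the caveat from the replies that a parser is usually the safer tool for HTML. Two things in the posted code prevent it from working: match() only tries the pattern at the start of the string and returns a single match, and the closing tag in the pattern is written as \\tr rather than /tr. The sketch below uses re.findall() to collect every match and re.sub(), which replaces all occurrences by default and is probably the closest equivalent to s/foo/bar/g. The sample sentence passed to re.sub() is made up for illustration.

```python
import re

line = "<tr>eat at joe's</tr><tr>or else</tr><tr>you'll starve</tr>"

# Non-greedy match from each <tr> to the nearest </tr>.
row_pattern = re.compile(r"<tr>.*?</tr>")

# findall() scans the whole string and returns every non-overlapping match;
# match() would only try the pattern at position 0.
rows = row_pattern.findall(line)
for index, row in enumerate(rows):
    print(index, row)

# re.sub() replaces every occurrence unless a count is given,
# much like s/foo/bar/g in Perl.
replaced = re.sub(r"foo", "bar", "foo fighters eat food")
print(replaced)
```

Note that the non-greedy pattern does not cope with nested rows, attributes on the tag, or tags split across lines, which is part of why the replies recommend a parsing library instead.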
 
