Noob, html trees & parsing

Michael Lesser

Hi all.

Noob, first project, read the Poignant Guide, et al.

I have a big Perl script that parses badly formed HTML files with
HTML::Element/Tree. I think it's time for an update.

I think the equivalent in Ruby is Hpricot? I haven't found a lot of docs
on this, so I am assuming that this type of problem is something that
becomes 'obvious' once you start working in Ruby. Or should I be
looking at another/better solution (as in, duh, it's got XXX built in,
noob...)?

TIA
 
S

Sanjay Sharma

You might want to take a look at html5lib <http://code.google.com/p/html5lib/>
for parsing bad markup.
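Parsers like html5lib and Hpricot are meant to recover a usable tree from tag soup instead of rejecting it. To show roughly what that recovery involves, without assuming either library is installed, here is a minimal, hypothetical sketch using only Ruby's standard `StringScanner`. The names `Node`, `tolerant_parse`, `search`, and the simplified implicit-closing rules are invented for illustration; this is not how Hpricot, html5lib, or Perl's HTML::TreeBuilder actually work, and it ignores entities, `<script>`/`<style>` contents, and quoted attribute values containing `>`.

```ruby
require "strscan"

# Hypothetical node type for illustration only; not the API of Hpricot,
# html5lib, or Perl's HTML::Element.
class Node
  attr_reader :tag, :attrs, :children, :parent

  def initialize(tag, attrs = {}, parent = nil)
    @tag = tag
    @attrs = attrs
    @children = []
    @parent = parent
  end

  # Concatenated text of this node and all of its descendants.
  def text
    children.map { |c| c.is_a?(String) ? c : c.text }.join
  end

  # All descendant elements with the given tag name, in document order.
  def search(name)
    children.grep(Node).flat_map { |c| (c.tag == name ? [c] : []) + c.search(name) }
  end
end

# Elements that never have content, so they are never left "open".
VOID_TAGS = %w[area base br col embed hr img input link meta param source wbr].freeze
# Simplified assumption: a new one of these closes an open one of the same
# name when it is the innermost open element (e.g. <li>one<li>two).
CLOSES_SAME_TAG = %w[p li option tr td th].freeze

ATTR_RE = /([a-zA-Z_:][-\w:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/

# Parses double-quoted, single-quoted, unquoted and bare (boolean) attributes.
def parse_attrs(str)
  str.scan(ATTR_RE).each_with_object({}) do |(name, dq, sq, bare), h|
    h[name.downcase] = dq || sq || bare || name
  end
end

def tolerant_parse(html)
  root = Node.new("#root")
  current = root
  s = StringScanner.new(html)

  until s.eos?
    if s.scan(/<!--.*?-->/m) || s.scan(/<[!?][^>]*>/)
      next # skip comments, doctypes and processing instructions
    elsif s.scan(%r{</\s*([a-zA-Z][\w-]*)[^>]*>})
      name = s[1].downcase
      # Pop back to the nearest open element with this name; an end tag
      # with no matching open element is silently ignored.
      node = current
      node = node.parent while node && node.tag != name
      current = node.parent if node
    elsif s.scan(/<([a-zA-Z][\w-]*)([^>]*)>/)
      name = s[1].downcase
      attr_str = s[2]
      current = current.parent if CLOSES_SAME_TAG.include?(name) && current.tag == name
      el = Node.new(name, parse_attrs(attr_str), current)
      current.children << el
      # A trailing "/" is ignored, as in HTML; only void elements stay closed.
      current = el unless VOID_TAGS.include?(name)
    else
      # Plain text, or a lone "<" that does not start a recognizable tag.
      current.children << (s.scan(/[^<]+/) || s.getch)
    end
  end
  root # anything still open at end of input is implicitly closed
end

doc = tolerant_parse("<ul><li>one<li>two</ul><p>Hello <b>world</p><a href=foo.html>link")
p doc.search("li").map(&:text)        # prints ["one", "two"]
p doc.search("a").first.attrs["href"] # prints "foo.html"
```

The core idea is the stack of open elements: end tags pop back to the nearest matching element, stray end tags are dropped, and whatever is still open at end of input is closed implicitly. Real parsers (for example the HTML5 parsing algorithm that html5lib implements) have far more detailed recovery rules, which is a good reason to use a library rather than hand-rolling this.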