Parsing HTML with XML::LibXML

John7481

Hello everybody,

For a database project, I plan to use a Perl script that parses an HTML
file, picks out a few items from it, and inserts those items into a
database.

Could somebody share their ideas or real-world experience with doing
this kind of parsing using XML::LibXML?

Thanks in advance
AR
 
Abhinav

Assuming that your HTML is XML-compliant (i.e. XHTML), you could try using
XPath. It does a great job of finding specific information, and /should/
be installed with your Perl 5.8 system.
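
A minimal sketch of the XPath approach using XML::LibXML itself, assuming the page really is XHTML. The sample page, the "items" table id and the "name"/"price" classes are hypothetical. One catch worth knowing: XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so an unprefixed //td matches nothing; the sketch registers a prefix for it through XML::LibXML::XPathContext.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical XHTML page; in practice this would be read from your file,
# e.g. XML::LibXML->load_xml(location => $path).
my $xhtml = <<'XHTML';
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Products</title></head>
  <body>
    <table id="items">
      <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
      <tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
    </table>
  </body>
</html>
XHTML

# XHTML is well-formed XML, so the strict XML parser can read it.
my $doc = XML::LibXML->load_xml(string => $xhtml);

# XHTML elements are in a default namespace; XPath needs a prefix for it,
# otherwise an unprefixed //td would match nothing.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# Pick the name and price cells out of each table row.
my @rows;
for my $tr ($xpc->findnodes('//x:table[@id="items"]/x:tr')) {
    my $name  = $xpc->findvalue('x:td[@class="name"]', $tr);
    my $price = $xpc->findvalue('x:td[@class="price"]', $tr);
    push @rows, [ $name, $price ];
}

print join("\t", @$_), "\n" for @rows;
```

Each [name, price] pair in @rows is then ready to be handed to DBI, for example through a prepared INSERT statement with placeholders.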

There is an introductory tutorial on http://w3schools.org

HTH
 
ko

These articles should help get you started:

http://www.stonehenge.com/merlyn/PerlJournal/col02.html
http://www.stonehenge.com/merlyn/PerlJournal/col03.html

The articles are titled 'Cleaning up your HTML', but it's the same
concept: identifying tags and attributes.
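
The same idea expressed with XML::LibXML: a small sketch, assuming ordinary (non-XHTML) HTML read from a string, that walks every element and records its tag name and attributes. The sample markup is made up for illustration; in practice you would narrow the XPath (e.g. //a[@href]) to just the items you need.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up, ordinary HTML (not XHTML) used only for illustration.
my $html = <<'HTML';
<html><body>
<p>See <a href="/docs">the docs</a> and <a href="/faq" title="FAQ">the FAQ</a>.</p>
<img src="logo.png" alt="Logo">
</body></html>
HTML

# The HTML parser builds a DOM without namespaces, so plain XPath works.
my $doc = XML::LibXML->load_html(string => $html);

# Visit every element in document order; record its tag and attributes.
my @found;
for my $el ($doc->findnodes('//*')) {
    my %attrs;
    $attrs{ $_->nodeName } = $_->value for $el->attributes;
    push @found, { tag => $el->nodeName, attrs => \%attrs };
}

for my $item (@found) {
    my $pairs = join ' ',
        map { "$_=$item->{attrs}{$_}" } sort keys %{ $item->{attrs} };
    print "$item->{tag} $pairs\n";
}
```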

After you initialize the parser object, call its recover() method if
you're not sure whether you're dealing with well-formed HTML.
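
A minimal sketch of that advice, assuming sloppy input like the made-up fragment below (unclosed li and b tags, an unquoted attribute value). With recover(1) the parser reports problems as warnings on STDERR and still returns a document; recover_silently(1) turns those messages off as well.

```perl
use strict;
use warnings;
use XML::LibXML;

# Deliberately sloppy, made-up markup: unclosed <li> and <b>, unquoted id.
my $messy = <<'HTML';
<html><body>
<ul id=items>
<li>Widget
<li>Gadget <b>new
</ul>
</body></html>
HTML

my $parser = XML::LibXML->new;
$parser->recover(1);    # keep going past markup errors instead of giving up
my $doc = $parser->parse_html_string($messy);

# Collect the text of each list item and trim trailing whitespace.
my @items = map { $_->textContent } $doc->findnodes('//ul[@id="items"]/li');
s/\s+$// for @items;

print "$_\n" for @items;
```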

HTH - keith