Parsing HTML with XML::LibXML

Discussion in 'Perl Misc' started by John7481, Aug 12, 2004.

  1. John7481

    John7481 Guest

    Hello everybody,

    A database project needs a Perl script that parses an HTML file, picks a
    few items from it, and inserts those items into a database.

    Could somebody share their ideas or real experience doing this kind of
    parsing with XML::LibXML?

    Thanks in advance
    AR
     
    John7481, Aug 12, 2004
    #1

  2. John7481

    Abhinav Guest

    John7481 wrote:
    > Hello everybody,
    >
    > A database project needs a Perl script that parses an HTML file, picks a
    > few items from it, and inserts those items into a database.


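    Assuming that your HTML is well-formed XML (i.e. XHTML), you could try
    XPath, which XML::LibXML supports through methods such as findnodes(). It
    does a great job of finding specific information. Note that XML::LibXML is
    not part of the core Perl 5.8 distribution, so you may need to install it
    from CPAN.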
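
    As a rough sketch of the XPath approach, assuming XML::LibXML is
    installed: the sample markup, the "items" id, and the "name" and "price"
    class names are invented for illustration and do not come from the
    original poster's files. Real XHTML usually declares the
    http://www.w3.org/1999/xhtml namespace, in which case the XPath
    expressions would need a registered prefix via XML::LibXML::XPathContext.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, namespace-free sample document (an assumption for this sketch).
my $xhtml = <<'XHTML';
<html>
  <body>
    <table id="items">
      <tr><td class="name">Widget</td><td class="price">9.99</td></tr>
      <tr><td class="name">Gadget</td><td class="price">19.50</td></tr>
    </table>
  </body>
</html>
XHTML

# Strict XML parsing: this dies if the input is not well-formed.
my $doc = XML::LibXML->new->parse_string($xhtml);

# Collect one hash per table row; the inner XPath is relative to each <tr>.
my @rows;
for my $tr ($doc->findnodes('//table[@id="items"]/tr')) {
    push @rows, {
        name  => $tr->findvalue('td[@class="name"]'),
        price => $tr->findvalue('td[@class="price"]'),
    };
}

print "$_->{name}: $_->{price}\n" for @rows;
```

    Each hash in @rows could then be passed to a database insert, for
    example through DBI placeholders.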

    There is an introductory tutorial on http://w3schools.org

    [SNIP]

    HTH

    --

    Abhinav
     
    Abhinav, Aug 12, 2004
    #2

  3. John7481

    ko Guest

    John7481 wrote:
    > Hello everybody,
    >
    > A database project needs a Perl script that parses an HTML file, picks a
    > few items from it, and inserts those items into a database.
    >
    > Could somebody share their ideas or real experience doing this kind of
    > parsing with XML::LibXML?
    >
    > Thanks in advance
    > AR


    These articles should help get you started:

    http://www.stonehenge.com/merlyn/PerlJournal/col02.html
    http://www.stonehenge.com/merlyn/PerlJournal/col03.html

    The articles are titled 'Cleaning up your HTML', but it's the same
    concept: identifying tags and attributes.

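    After you initialize the parser object, call its recover() method with a
    true value, e.g. $parser->recover(1), if you're not sure whether you're
    dealing with well-formed HTML. The parser will then try to build a
    document from broken markup instead of dying on the first error.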
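
    A minimal sketch of that recovery approach, assuming XML::LibXML is
    installed and using its HTML parser (parse_html_string) on a made-up
    fragment with unclosed tags. The markup and the "item" class name are
    hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sloppy HTML: unclosed <p> tags and a bare <br>.
my $html = '<html><body><p class="item">First<p class="item">Second<br></body></html>';

my $parser = XML::LibXML->new;

# 1 = recover from errors but still report them,
# 2 = recover and suppress the warnings.
$parser->recover(2);

# The HTML parser tolerates tag soup that the strict XML parser rejects.
my $doc = $parser->parse_html_string($html);

# Once parsed, the same XPath methods work on the resulting tree.
my @items = map { $_->textContent } $doc->findnodes('//p[@class="item"]');

print "$_\n" for @items;
```

    For files on disk, parse_html_file($path) takes the place of
    parse_html_string().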

    HTH - keith
     
    ko, Aug 13, 2004
    #3