Convert HTML to XML

Discussion in 'Perl Misc' started by Ninja Li, Nov 16, 2009.

  1. Ninja Li

    Ninja Li Guest

    Hi,

    I tried to parse an HTML page using HTML::TreeBuilder but it is a
    little cumbersome. Is there an easier way to parse HTML, say from HTML
    to XML? Which Perl package and methods should I use?

    Thanks in advance.

    Nick
    Ninja Li, Nov 16, 2009
    #1

  2. Ninja Li

    Ninja Li Guest

    On Nov 16, 11:43 am, Ben Morrow <> wrote:
    >
    > It's not clear what you're trying to do once you've parsed it, but if
    > you want an XML DOMish interface then XML::LibXML will quite happily
    > parse HTML.
    >
    > Ben


    I tried to filter the HTML to get the earnings data, e.g. symbol,
    company, event, time (link: http://www.earnings.com/conferencecall.asp?client=cb)
    and put them in a text file.
    Ninja Li, Nov 16, 2009
    #2
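Ben's XML::LibXML suggestion could look roughly like the following. This is a minimal sketch, assuming the earnings data sits in an ordinary HTML table; the sample markup, the column order, and the tab-separated output are made up for illustration, since the thread does not show the real page source.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Symbol</th><th>Company</th><th>Event</th><th>Time</th></tr>
<tr><td>ABC</td><td>Example Corp</td><td>Q3 Earnings Call</td><td>10:00 AM</td></tr>
<tr><td>XYZ</td><td>Sample Inc</td><td>Q3 Earnings Call</td><td>2:00 PM</td></tr>
</table>
</body></html>
HTML

# recover lets the parser continue past malformed markup;
# the suppress_* options keep its diagnostics off STDERR.
my $doc = XML::LibXML->load_html(
    string            => $html,
    recover           => 1,
    suppress_errors   => 1,
    suppress_warnings => 1,
);

# Collect the text of every data row (rows that contain <td> cells).
my @rows;
for my $tr ($doc->findnodes('//tr[td]')) {
    my @cells = map {
        my $t = $_->textContent;
        $t =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $t;
    } $tr->findnodes('./td');
    push @rows, \@cells;
}

# One tab-separated line per row; redirect to a file as needed.
print join("\t", @$_), "\n" for @rows;

# The same tree can also be serialized as XML.
my $xml = $doc->toString;
```

Once the page is a DOM, XPath does the filtering, which may feel less cumbersome than walking the tree by hand.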

  3. Ninja Li

    Ninja Li Guest

    On Nov 16, 2:04 pm, Lawrence Statton <> wrote:
    > Ninja Li <> writes:
    >
    > HTML::TreeBuilder really is the "right" tool for parsing HTML you get
    > from the web. One of its major strengths is it can generate reasonable
    > parse-trees from even unreasonable HTML.
    >
    > Keep in mind that scraping earnings.com's website may be in violation of
    > their terms of use, and you should make sure you have appropriate
    > permission before doing that in an automated way.
    >
    > --L


    Thanks for your help and concern. We are a client of the website and
    are trying to move from an Excel-based program to Perl.
    Ninja Li, Nov 16, 2009
    #3
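For comparison, the HTML::TreeBuilder route Lawrence recommends might be sketched like this. It is only an illustration under assumptions: the sample markup (including the row with omitted closing tags), the four-column layout, and the field names are hypothetical, not taken from the real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample; the second data row omits its closing tags
# to mimic the sloppy markup often found on the web.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Symbol</th><th>Company</th><th>Event</th><th>Time</th></tr>
<tr><td>ABC</td><td>Example Corp</td><td>Q3 Earnings Call</td><td>10:00 AM</td></tr>
<tr><td>XYZ<td>Sample Inc<td>Q3 Earnings Call<td>2:00 PM
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Assumed field names, in the column order of the sample table.
my @fields = qw(symbol company event time);

my @records;
for my $tr ($tree->look_down(_tag => 'tr')) {
    my @cells = map { $_->as_trimmed_text } $tr->look_down(_tag => 'td');
    next unless @cells == @fields;    # skips the header row (th cells only)
    my %rec;
    @rec{@fields} = @cells;
    push @records, \%rec;
}

$tree->delete;    # release the tree once the data has been copied out

print join("\t", @{$_}{@fields}), "\n" for @records;
```

look_down returns every matching descendant in list context, so the same pattern extends to other tags or attribute filters.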
  4. Ninja Li

    Guest

    On Mon, 16 Nov 2009 13:40:49 -0800 (PST), Ninja Li <> wrote:

    >On Nov 16, 2:04 pm, Lawrence Statton <> wrote:
    >> Ninja Li <> writes:
    >>
    >> HTML::TreeBuilder really is the "right" tool for parsing HTML you get
    >> from the web. One of its major strengths is it can generate reasonable
    >> parse-trees from even unreasonable HTML.
    >>
    >> Keep in mind that scraping earnings.com's website may be in violation of
    >> their terms of use, and you should make sure you have appropriate
    >> permission before doing that in an automated way.
    >>
    >> --L

    >
    >Thanks for your help and concern. We are a client of the website and
    >are trying to move from an Excel-based program to Perl.


    I looked at the source of the page you linked.
    I hope that's not a violation and the Feds aren't gonna come get me.

    I wouldn't call it scraping, would you? I'd guess Yaaahooei/Googleballs
    own the web 'cause they do it all the time.

    I've heard there is some kind of Perl module that will turn table data
    into some kind of hash for you. I have personal software (written by me)
    that sucks table data out of HTML/XML like buttaa. Unfortunately you can't
    get it.

    Look for that module on CPAN or somewhere.

    -sln
    , Nov 18, 2009
    #4
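The post does not name the module. One CPAN module that fits the description is HTML::TableExtract; whether it is the one sln had in mind is an assumption. A minimal sketch, with hypothetical sample markup and column headers:

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>Symbol</th><th>Company</th><th>Event</th><th>Time</th></tr>
<tr><td>ABC</td><td>Example Corp</td><td>Q3 Earnings Call</td><td>10:00 AM</td></tr>
<tr><td>XYZ</td><td>Sample Inc</td><td>Q3 Earnings Call</td><td>2:00 PM</td></tr>
</table>
</body></html>
HTML

# Assumed column headers; tables are selected by matching these.
my @headers = qw(Symbol Company Event Time);

my $te = HTML::TableExtract->new(headers => \@headers);
$te->parse($html);

# Turn each data row into a hash keyed by column header.
my @records;
for my $table ($te->tables) {
    for my $row ($table->rows) {
        my %rec;
        @rec{@headers} = map {
            my $c = defined $_ ? $_ : '';    # empty cells come back undefined
            $c =~ s/^\s+|\s+$//g;
            $c;
        } @$row;
        push @records, \%rec;
    }
}

print "$_->{Symbol}: $_->{Event} at $_->{Time}\n" for @records;
```

Selecting by header text means the code does not depend on where the table sits in the page, only on its column names.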
