[TABLE NOT SHOWN] problem with HTML::Parse

Discussion in 'Perl' started by Mitchua, Jul 10, 2003.

  1. Mitchua

    Mitchua Guest

    When I run the well-quoted line:
    my $ascii = HTML::FormatText->new->format(HTML::Parse::parse_html($html));
    to remove HTML tags from an HTML document, it replaces all tables with
    "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table content
    parsed too?

    Thanks a lot,
    Mitchua
    Mitchua, Jul 10, 2003
    #1

  2. "Mitchua" <> wrote in message
    news:EJiPa.115702$...
    > When I run the well-quoted line:
    > my $ascii = HTML::FormatText->new->format(HTML::Parse::parse_html($html));
    > to remove HTML tags from an HTML document, it replaces all tables with
    > "[TABLE NOT SHOWN]". Is there a quick and easy way to get the table
    > content parsed too?
    >

    The documentation for HTML::FormatText states: "Formatting of HTML tables
    and forms is not implemented." So not with that module. The documentation
    makes a reference to HTML::Formatter
    (http://search.cpan.org/author/SBURKE/HTML-Format-2.03/lib/HTML/Formatter.pm),
    which in turn contains references to other modules that may be of some
    help.
    James E Keenan, Jul 11, 2003
    #2
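
For readers who only need the table text, one possible route is to skip the formatter and walk the parse tree directly. The sketch below assumes HTML::TreeBuilder (from the HTML-Tree distribution, not named in the reply above) is installed; the sample markup and the tab-separated output format are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input; any HTML string containing a table would do.
my $html = <<'HTML';
<html><body>
<p>Prices</p>
<table>
  <tr><th>Item</th><th>Cost</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
</table>
</body></html>
HTML

# Parse the whole document into a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_content($html);

my @lines;
for my $row ($tree->look_down(_tag => 'tr')) {
    # Keep only the row's direct children that are header or data cells.
    my @cells = grep { ref $_ && $_->tag =~ /^t[dh]$/ } $row->content_list;
    # as_text returns the text content of a cell with the markup removed.
    push @lines, join("\t", map { $_->as_text } @cells);
}
print "$_\n" for @lines;

$tree->delete;    # release the tree explicitly
```

This prints one line per table row. Rows of nested tables would also be picked up by look_down, so real pages may need a narrower search.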

  3. "Mitchua" <> wrote in message
    news:YRHPa.6477$...
    >
    > Are there any other (easy) ways to remove all HTML tags (including tricky
    > tags like comments, etc.) from a web page without using those modules? I'm
    > looking for a solution beyond a regular expression.
    >

    "Easy": no. That's why we have all those modules in the HTML section of
    CPAN -- the solution is always difficult, messy and "beyond a regular
    expression."

    I note that in your OP you used HTML::Parse. The one-line description of
    that module indicates that it is deprecated. Have you looked into
    HTML::Parser? People speak highly of that module.
    James E Keenan, Jul 12, 2003
    #3
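
To illustrate why the answer is "beyond a regular expression", here is a minimal sketch of the usual one-line substitution going wrong. The sample markup is invented; it only needs a ">" inside an attribute value and inside a comment.

```perl
use strict;
use warnings;

# Hypothetical input with '>' inside an attribute and inside a comment.
my $html = '<p title="a > b">Keep<!-- if x > y --> this</p>';

# Naive approach: delete everything from '<' up to the next '>'.
(my $naive = $html) =~ s/<[^>]*>//g;

print "[$naive]\n";    # prints "[ b">Keep y --> this]", not "[Keep this]"
```

The pattern stops at the first ">" it sees, so the tail of the attribute and the tail of the comment leak into the output.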
  4. Mitchua

    Mitchua Guest

    "James E Keenan" <> wrote in message
    news:beovoq$...
    >
    > "Mitchua" <> wrote in message
    > news:YRHPa.6477$...
    > >
    > > Are there any other (easy) ways to remove all HTML tags (including
    > > tricky tags like comments, etc.) from a web page without using those
    > > modules? I'm looking for a solution beyond a regular expression.
    > >
    >
    > "Easy": no. That's why we have all those modules in the HTML section of
    > CPAN -- the solution is always difficult, messy and "beyond a regular
    > expression."
    >
    > I note that in your OP you used HTML::Parse. The 1-line description of
    > this indicates that it is deprecated. Have you looked into HTML::Parser?
    > People speak highly of that module.
    >

    I found this code on the web that uses it:

    use HTML::Parser;
    $p = HTML::Parser->new;
    $p->parse($notes); # parse the HTML in notes
    $p->eof; # signal end of parse file
    print $p->as_string; # print out the parsed text

    but I get the error "Can't locate ../HTML/Parser/as_string.al". I'm looking
    for that file now.

    Jonathan
    Mitchua, Jul 14, 2003
    #4
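
The error above suggests that as_string is simply not a method of HTML::Parser, so there is probably no missing file to find. HTML::Parser is event-driven: it reports tags, text and comments to callbacks rather than building a document that can be printed. A minimal sketch of collecting only the text, assuming the version 3 API of HTML::Parser, might look like this (the sample content of $notes is made up):

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input standing in for the $notes variable.
my $notes = '<p>Hello <b>world</b><!-- hidden > comment --></p>'
          . '<script>var x = 1;</script>';

my $text = '';
my $p = HTML::Parser->new(
    api_version => 3,
    # Only text events get a handler; 'dtext' passes entity-decoded text.
    # Tags and comments are dropped because nothing is registered for them.
    text_h => [ sub { $text .= shift }, 'dtext' ],
);
$p->ignore_elements(qw(script style));    # skip script and style bodies
$p->parse($notes);                        # parse the HTML in $notes
$p->eof;                                  # signal end of input

print "$text\n";    # prints "Hello world"
```

Because the parser, not a pattern, decides where comments and tags end, the ">" inside the comment does not leak into the result.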