Return HTML between tags with HTML::TokeParser?

Discussion in 'Perl Misc' started by Maqo, Feb 23, 2005.

  1. Maqo

    Maqo Guest

    Is it possible to use HTML::TokeParser to return the raw HTML between
    two <A> tags, as opposed to just the text? My source file contains
    several blocks of code--containing anchor links for each--that I'm
    trying to extract by section while maintaining formatting.

    My code:

    my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");
    while (my $t = $p->get_tag("a")) {
    my $name = $t->[1]{name};
    next unless $name && ($name eq "anchor");
    print "$name : " . $p->get_text("a");

    Example HTML source:

    <A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
    <A NAME='anchor2'></a><p>Some text and HTML formatting</p><BR>
    ....
    <A NAME='anchor10'></a><p>Some text and HTML formatting</p><BR>

    The above code returns the "text and formatting" portions nicely,
    albeit only as text. Is there an easy way to do this using
    HTML::Parser to return the desired portion, with HTML markup included?
    Many thanks.
     
    Maqo, Feb 23, 2005
    #1

  2. "Maqo" <> wrote in news:1109119459.537290.141800
    @c13g2000cwb.googlegroups.com:

    > Is it possible to use HTML::TokeParser to return the raw HTML between
    > two <A> tags, as opposed to just the text? My source file contains
    > several blocks of code--containing anchor links for each--that I'm
    > trying to extract by section while maintaining formatting.
    >
    > My code:
    >
    > my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");


    Cute but counter-productive. Please post real code.

    > while (my $t = $p->get_tag("a")) {
    > my $name = $t->[1]{name};
    > next unless $name && ($name eq "anchor");
    > print "$name : " . $p->get_text("a");
    >
    > Example HTML source:
    >
    > <A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>


    Am I missing something here? There is no text between <a> and </a>
    above.

    > The above code returns the "text and formatting" portions nicely,
    > albeit only as text.


    Once the bugs are fixed, the code above runs successfully and produces
    no output at all. That is exactly what I expected to see based on the
    sample data you provided. Problem solved.

    Have you read the posting guidelines?

    Sinan
     
    A. Sinan Unur, Feb 23, 2005
    #2

  3. Michael Wagg

    Michael Wagg Guest

    A. Sinan Unur wrote:

    >>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

    >
    > Cute but counter-productive. Please post real code.


    With the exception of the input filename (which was changed from
    "digest.html"), this is the exact code being used.

    >>while (my $t = $p->get_tag("a")) {
    >>my $name = $t->[1]{name};
    >>next unless $name && ($name eq "anchor");
    >>print "$name : " . $p->get_text("a");
    >>
    >>Example HTML source:
    >>
    >><A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>

    >
    >
    > Am I missing something here? There is no text between <a> and </a>
    > above.


    The above code returns the text between one open tag and the next open
    tag (<A> -> <A>), not between one open tag and the subsequent closing
    tag (<A> -> </A>).
     
    Michael Wagg, Feb 23, 2005
    #3
  4. Sam Holden

    Sam Holden Guest

    On Wed, 23 Feb 2005 01:50:02 GMT, Michael Wagg <> wrote:
    > A. Sinan Unur wrote:
    >
    >>>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

    >>
    >> Cute but counter-productive. Please post real code.

    >
    > With the exception of the input filename (which was changed from
    > "digest.html"), this is the exact code being used.


    That's a really silly || with a constant true value on the left.

    Why would you bother with code that can not be executed? Especially
    when all it could possibly serve to do is to trick other people,
    and perhaps yourself, into thinking there's error checking when
    there isn't.
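
A minimal sketch of the point above, using only core Perl. The `open_parser` sub and the file name `no-such-file-for-demo.html` are hypothetical stand-ins for illustration; `HTML::TokeParser->new` is documented to return undef when the file cannot be opened, which is the behavior the stand-in imitates.

```perl
use strict;
use warnings;

# The '||' is evaluated inside the argument list. Its left operand is a
# non-empty string (always true), so the die on the right is never reached.
my $arg = ("file.txt" || die "Can't open file.");
print "argument actually passed: $arg\n";

# Hypothetical stand-in for a constructor that returns undef on failure.
sub open_parser {
    my ($file) = @_;
    return -r $file ? { file => $file } : undef;
}

# Low-precedence 'or' applies to the result of the whole call, so the
# check really runs. eval is used here only to keep the demo alive.
my $missing = "no-such-file-for-demo.html";
my $p = eval { open_parser($missing) or die "Can't open $missing\n" };
print "error caught: $@" unless $p;
```

With the real module the same shape would be `HTML::TokeParser->new($file) or die "Can't open $file: $!";`.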

    --
    Sam Holden
     
    Sam Holden, Feb 23, 2005
    #4
  5. Michael Wagg <> wrote in news:ejRSd.9825$rB3.2454645
    @twister.nyc.rr.com:

    > A. Sinan Unur wrote:
    >
    >>>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

    >>
    >> Cute but counter-productive. Please post real code.

    >
    > With the exception of the input filename (which was changed from
    > "digest.html"), this is the exact code being used.


    my $p = HTML::TokeParser->new("file.txt")
        or die "Can't open file.";

    >>>while (my $t = $p->get_tag("a")) {
    >>>my $name = $t->[1]{name};
    >>>next unless $name && ($name eq "anchor");


    Now I realize why it doesn't return anything: There are no anchors named
    'anchor' in the data you provided.

    Sorry, I don't have time to look at the rest of the stuff right now.
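
One way to keep the markup, sketched under some assumptions: every token returned by `get_token` carries the original source text of that token, so the tokens between one named anchor and the next can be concatenated back into raw HTML. The helper names `raw_text` and `sections_by_anchor`, and the `/^anchor\d+$/` pattern for matching the anchor names in the sample data, are illustrative choices and not part of the module.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return the original source text carried by a get_token() token.
# Layouts: ["S",$tag,$attr,$attrseq,$text], ["E",$tag,$text],
#          ["T",$text,$is_data], ["C",$text], ["D",$text], ["PI",$token0,$text]
sub raw_text {
    my ($tok) = @_;
    my $type = $tok->[0];
    return $tok->[4] if $type eq 'S';
    return $tok->[2] if $type eq 'E' || $type eq 'PI';
    return $tok->[1];    # T, C and D tokens
}

# Collect the raw HTML that follows each <a name="anchorN"> up to the next one.
sub sections_by_anchor {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't create parser";
    my (%section, @order, $current);
    while (my $tok = $p->get_token) {
        # Tag and attribute names are lower-cased by the parser.
        if ($tok->[0] eq 'S' && $tok->[1] eq 'a'
            && defined $tok->[2]{name} && $tok->[2]{name} =~ /^anchor\d+$/) {
            $current = $tok->[2]{name};
            push @order, $current;
            $section{$current} = '';
            next;
        }
        next unless defined $current;
        # Skip the </a> that immediately closes the empty named anchor.
        next if $tok->[0] eq 'E' && $tok->[1] eq 'a' && $section{$current} eq '';
        $section{$current} .= raw_text($tok);
    }
    return (\@order, \%section);
}

# Sample data in the shape posted at the top of the thread.
my $html = join '',
    map { "<A NAME='anchor$_'></a><p>Some text and HTML formatting</p><BR>\n" } 1, 2;
my ($order, $section) = sections_by_anchor($html);
print "$_ : $section->{$_}" for @$order;
```

To read from a file instead of a string, pass the file name to `new` and check the result with `or die`.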
     
    A. Sinan Unur, Feb 23, 2005
    #5