HTML::TokeParser; __DATA__ as a filehandle

Discussion in 'Perl Misc' started by Jonathan@D4HRQQB1.i-did-not-set--mail-host-address, Oct 24, 2006.

  1. This is an embarrassingly simple question, but I'm trying to get
    HTML::TokeParser to execute an example even simpler than the one given as an
    example in the docs.

    I expect to get output of "3" from the program. Instead, I get "0."
    What is the trivial reason for this? Is there a problem with my use
    of __DATA__, or my reference to main::DATA? Is the central loop not
    working?

    I hope that I have done enough of my homework on this to warrant a meaningful
    response. If I have to go back to perlopen or perlreftut, please let me know.

    Here is my sample program:

    #!/usr/bin/perl -w

    use warnings;
    use strict;
    use diagnostics;

    use HTML::TokeParser;
    my $fh = \<main::DATA>;
    my $p = HTML::TokeParser->new($fh) || die "Bad open: $! \n";
    my $heading3s = 0;

    while (my $token=$p->get_tag("<h3>")){
    $heading3s++;
    }


    print "Number of Level 3 Headings: $heading3s\.\n";

    __DATA__
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
    <html>
    <head>
    <title>Test Page</title>
    </head>
    <body>
    <h3>Alpha</h3>
    <h3>Beta</h3>
    <h3>Gamma</h3>
    </body>
    </html>
     
    -did-not-set--mail-host-address, Oct 24, 2006
    #1

  2. -did-not-set--mail-host-address

    Paul Lalli Guest

    Re: HTML::TokeParser; __DATA__ as a filehandle

    -did-not-set--mail-host-address--so-tickle-me wrote:
    > This is an embarrassingly simple question, but I'm trying to get
    > HTML::TokeParser to execute an example even simpler than the one given as an
    > example in the docs.
    >
    > I expect to get output of "3" from the program Instead, I get "0."
    > What is the trivial reason for this? Is there a problem with my use
    > of __DATA__, or my reference to main::DATA? Is the central loop not
    > working?


    no, yes, and yes

    > I hope that I have done enough of my homework on this to warrant a meaningful
    > response. If I have to go back to perlopen or perlreftut, please let me know.
    >
    > Here is my sample program:
    >
    > #!/usr/bin/perl -w
    >
    > use warnings;
    > use strict;
    > use diagnostics;
    >
    > use HTML::TokeParser;
    > my $fh = \<main::DATA>;


    This reads one line from the DATA filehandle and takes a reference to
    the resulting string, so the parser treats that single line as the
    whole document. Change to:
    my $fh = \*main::DATA;

    > my $p = HTML::TokeParser->new($fh) || die "Bad open: $! \n";
    > my $heading3s = 0;
    >
    > while (my $token=$p->get_tag("<h3>")){


    Read the docs for HTML::TokeParser. get_tag() takes the name of the
    tag only - not including the < and > delimiters. Change to:
    while (my $token = $p->get_tag('h3')){

    > $heading3s++;
    > }
    >
    >
    > print "Number of Level 3 Headings: $heading3s\.\n";
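
    Putting the two corrections together, a minimal sketch of the whole
    program might look like the following. The helper name count_tags is
    made up for illustration and is not part of HTML::TokeParser; the
    sketch assumes the module is installed and that the HTML sits after
    __DATA__ in the same file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# count_tags is a hypothetical helper, not part of HTML::TokeParser.
# $source is either a filehandle (glob reference) or a reference to a
# string holding the document; $name is the bare tag name, without < and >.
sub count_tags {
    my ($source, $name) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Cannot create parser: $!\n";
    my $count = 0;
    # get_tag($name) skips ahead to the next start tag with that name
    # and returns undef once the document is exhausted.
    $count++ while $p->get_tag($name);
    return $count;
}

# \*DATA is a reference to the DATA glob itself; <DATA> would read a line.
my $heading3s = count_tags(\*DATA, 'h3');
print "Number of Level 3 Headings: $heading3s.\n";

__DATA__
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Test Page</title>
</head>
<body>
<h3>Alpha</h3>
<h3>Beta</h3>
<h3>Gamma</h3>
</body>
</html>
```

    With the sample document above, this should report 3 headings, since
    only the start tags match 'h3' (end tags are reported as '/h3').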


    Paul Lalli
     
    Paul Lalli, Oct 24, 2006
    #2

  3. Re: HTML::TokeParser; __DATA__ as a filehandle

    Follow what Paul said. This is slightly off what you originally asked,
    but if you also want the text between the tags, use the code below.
    Note that $result has to be declared first, and that $fh is already a
    reference to the DATA glob, so it is passed to new() as-is rather than
    as \$fh:

    my $fh = \*main::DATA;
    my $result = '';

    my $tp = HTML::TokeParser->new($fh) or die "Can't open: $!";

    # Walk every tag; for each <h3> start tag, collect the text up to </h3>.
    while (my $tag = $tp->get_tag) {
        if ($tag->[0] eq 'h3') {
            $result .= $tp->get_text("/h3") . "\n";
        }
    }

    This code gets the text between each <h3> and </h3> pair and appends
    a newline to each piece. It has worked for me in the past.
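
    If the headings are wanted as separate items rather than one long
    string, a variation along these lines may help. It is only a sketch:
    heading_texts is a made-up helper name, and the document is passed as
    a reference to a string (which new() treats as the literal document)
    instead of through DATA. get_trimmed_text() behaves like get_text()
    but also strips leading and trailing whitespace.

```perl
use strict;
use warnings;

use HTML::TokeParser;

# heading_texts is a hypothetical helper, not part of HTML::TokeParser.
# It returns the text of every <$name>...</$name> element as a list.
sub heading_texts {
    my ($source, $name) = @_;
    my $tp = HTML::TokeParser->new($source)
        or die "Cannot create parser: $!\n";
    my @texts;
    while ($tp->get_tag($name)) {
        # Read text up to the matching end tag, with whitespace trimmed.
        push @texts, $tp->get_trimmed_text("/$name");
    }
    return @texts;
}

# A reference to a plain scalar is parsed as the document itself.
my $html = "<h3>Alpha</h3>\n<h3> Beta </h3>\n<h3>Gamma</h3>\n";
print "$_\n" for heading_texts(\$html, 'h3');
```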
     
    Brian Wilkins, Oct 24, 2006
    #3