HTML::TokeParser for a web page?

Discussion in 'Perl Misc' started by P.R.Brady, Jun 28, 2004.

  1. P.R.Brady

    P.R.Brady Guest

    TokeParser looks a really useful tool for parsing HTML but will it only
    take input from a file? Is it possible to get it to munge a web page
    directly or even a scalar holding the page content (eg previously
    grabbed with get)?

    This works:

    use warnings;
    use HTML::TokeParser;
    $file='c:/Perl/html/index.html';
    $p = HTML::TokeParser->new($file) ||
        die "Can't open: $!";
    while (my $token = $p->get_token) {
        print $token->[0],"\n";
        # etc
    }

    but not:
    $file='file:///c:/Perl/html/index.html';
    or
    $file='http://www.bangor.ac.uk/';

    I'm running version v5.6.1 under Windoze.

    Regards
    Phil
    P.R.Brady, Jun 28, 2004
    #1

  2. Paul Lalli

    Paul Lalli Guest

    On Mon, 28 Jun 2004, P.R.Brady wrote:

    > TokeParser looks a really useful tool for parsing HTML but will it only
    > take input from a file? Is it possible to get it to munge a web page
    > directly or even a scalar holding the page content (eg previously
    > grabbed with get)?


    From the documentation (perldoc HTML::TokeParser):


    $p = HTML::TokeParser->new( \$document );
    If the argument is a reference to a plain scalar, then this scalar is
    taken to be the literal document to parse. The value of this scalar
    should not be changed before all tokens have been extracted.


    So in a word, yes.

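    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TokeParser;

    # get() returns undef on failure, so check before parsing.
    my $doc = get("http://www.yahoo.com");
    die "Couldn't fetch the page\n" unless defined $doc;
    my $parser = HTML::TokeParser->new(\$doc);

    # Skip ahead to the <title> tag and print the text that follows it.
    if ($parser->get_tag("title")) {
        my $title = $parser->get_trimmed_text;
        print "Title: $title\n";
    }
    __END__
    Title: Yahoo!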

    Paul Lalli
    Paul Lalli, Jun 28, 2004
    #2

  3. Brian Gough

    Brian Gough Guest

    "P.R.Brady" writes:

    > TokeParser looks a really useful tool for parsing HTML but will it only
    > take input from a file? Is it possible to get it to munge a web page
    > directly or even a scalar holding the page content (eg previously
    > grabbed with get)?


    According to the documentation (perldoc HTML::TokeParser), it
    accepts a filename, a filehandle, or a reference to a string
    containing the document.
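
    The three input forms can be compared side by side. What follows is
    only a minimal sketch: title_of() is a hypothetical helper, the sample
    HTML is made up, and a temporary file stands in for a real page on
    disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TokeParser;

# Hypothetical helper: return the trimmed <title> text from any source
# that HTML::TokeParser->new accepts (filename, filehandle, scalar ref).
sub title_of {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Can't create parser: $!";
    return $p->get_tag('title') ? $p->get_trimmed_text : undef;
}

# Made-up sample document.
my $html = "<html><head><title> Demo page </title></head><body></body></html>";

# Write the sample to a temporary file so the filename and filehandle
# forms have something to read.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} $html;
close $tmp_fh or die "Can't close $tmp_name: $!";

open my $fh, '<', $tmp_name or die "Can't open $tmp_name: $!";

print "filename:   ", title_of($tmp_name), "\n";
print "filehandle: ", title_of($fh), "\n";
print "scalar ref: ", title_of(\$html), "\n";
close $fh;
```

    Note that a plain string is always treated as a filename, which is
    why the scalar form needs the backslash.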

    --
    Brian Gough

    Network Theory Ltd,
    Publishing Free Software Manuals --- http://www.network-theory.co.uk/
    Brian Gough, Jun 28, 2004
    #3
  4. P.R.Brady

    P.R.Brady Guest

    Paul Lalli wrote:
    > So in a word, yes.


    Great! Thanks Paul.
    Phil
    P.R.Brady, Jun 28, 2004
    #4
  5. Michele Dondi

    On Mon, 28 Jun 2004 13:30:29 +0100, "P.R.Brady" wrote:

    >TokeParser looks a really useful tool for parsing HTML but will it only
    >take input from a file? Is it possible to get it to munge a web page
    >directly or even a scalar holding the page content (eg previously


    You've already been told that this is in fact possible, so what I'm
    about to say is completely OT and possibly misleading, in that you may
    be tempted to use this technique where it isn't necessary. So you
    stand warned! Anyway, here it comes: if it *were* not possible, you
    could always open() an in-memory file, as in:


    #!/usr/bin/perl

    use strict;
    use warnings;

    open my $fh, '<', \<<"EOT" or die "Can't open in-memory file: $!";
    foo
    bar
    baz
    EOT

    print while <$fh>;

    __END__
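
    For completeness, the same trick can feed HTML::TokeParser, since it
    also accepts a filehandle. This is only a sketch: the sample HTML and
    the @links list are made up for illustration, and in-memory
    filehandles need perl 5.8 or later, so it would not run on the 5.6.1
    mentioned in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample document held in memory; the content is made up for illustration.
my $html = '<p>See <a href="/docs">the docs</a> and <a href="/faq">the FAQ</a>.</p>';

# Open a read-only filehandle on the scalar (needs perl 5.8 or later).
open my $fh, '<', \$html or die "Can't open in-memory file: $!";

my $p = HTML::TokeParser->new($fh) or die "Can't create parser: $!";

# Collect [href, link text] pairs for every <a> start tag.
my @links;
while (my $tag = $p->get_tag('a')) {
    my $href = $tag->[1]{href};
    next unless defined $href;
    push @links, [ $href, $p->get_trimmed_text('/a') ];
}
close $fh;

print "$_->[0] => $_->[1]\n" for @links;
```

    In practice, passing \$html straight to new() is simpler; the
    filehandle route only matters for code that insists on a handle.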


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
    Michele Dondi, Jun 29, 2004
    #5
