HTML::TokeParser for a web page?

P.R.Brady · Jun 28, 2004

TokeParser looks a really useful tool for parsing HTML but will it only
take input from a file? Is it possible to get it to munge a web page
directly or even a scalar holding the page content (eg previously
grabbed with get)?

This works:

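use strict;
use warnings;
use HTML::TokeParser;

my $file = 'c:/Perl/html/index.html';
# new() treats a plain string as the name of a local file to open
my $p = HTML::TokeParser->new($file)
    or die "Can't open $file: $!";
while (my $token = $p->get_token) {
    print $token->[0], "\n";    # token type: S, E, T, C, D or PI
    # etc
}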

but not:
$file='file:///c:/Perl/html/index.html';
or
$file='http://www.bangor.ac.uk/';

I'm running Perl v5.6.1 under Windows.

Regards
Phil

Paul Lalli · Jun 28, 2004

P.R.Brady said:
TokeParser looks a really useful tool for parsing HTML but will it only
take input from a file? Is it possible to get it to munge a web page
directly or even a scalar holding the page content (eg previously
grabbed with get)?

From the documentation (perldoc HTML::TokeParser):

$p = HTML::TokeParser->new( \$document );
If the argument is a reference to a plain scalar, then this scalar is
taken to be the literal document to parse. The value of this scalar
should not be changed before all tokens have been extracted.

So in a word, yes.

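#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::TokeParser;

# get() returns undef if the fetch fails
my $doc = get("http://www.yahoo.com");
die "Couldn't fetch page" unless defined $doc;

# A scalar reference is parsed as the document itself
my $parser = HTML::TokeParser->new(\$doc);

if ($parser->get_tag("title")) {
    my $title = $parser->get_trimmed_text;
    print "Title: $title\n";
}
__END__
Title: Yahoo!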
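
The same scalar-reference form works for walking the whole document, not just the title. Here is a minimal sketch that collects links; the HTML string and the example.com URLs are made up for illustration and stand in for whatever get() would return:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Stand-in for page content (an assumption for this sketch); in practice
# this might come from LWP::Simple's get().
my $doc = <<'HTML';
<html><head><title>Example page</title></head>
<body>
<p><a href="http://www.example.com/one">First link</a></p>
<p><a href="http://www.example.com/two">Second link</a></p>
</body></html>
HTML

# A scalar reference tells the parser to treat $doc as the document itself.
my $parser = HTML::TokeParser->new(\$doc)
    or die "Can't create parser";

my @links;
# get_tag("a") skips ahead to the next <a> start tag and returns undef
# once the document is exhausted.
while (my $tag = $parser->get_tag("a")) {
    my $href = $tag->[1]{href};    # $tag is [tagname, \%attr, \@attrseq, text]
    next unless defined $href;
    my $text = $parser->get_trimmed_text("/a");    # text up to the closing </a>
    push @links, [$href, $text];
}

print "$_->[1]: $_->[0]\n" for @links;
```

Note that $doc is left alone until the loop has finished, in line with the warning in the documentation quoted above.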

Paul Lalli

Brian Gough · Jun 28, 2004

P.R.Brady said:
TokeParser looks a really useful tool for parsing HTML but will it only
take input from a file? Is it possible to get it to munge a web page
directly or even a scalar holding the page content (eg previously
grabbed with get)?

According to the documentation (perldoc HTML::TokeParser), new()
accepts a filename, a filehandle, or a reference to a string containing
the document.
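
To see the three forms side by side, here is a rough sketch that feeds the same document to new() each way. The title_of() helper and the sample HTML are invented for this example, and a temporary file stands in for a real page on disk:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use File::Temp qw(tempfile);

# Sample document (an assumption for this sketch).
my $html = "<html><head><title>Three ways in</title></head><body></body></html>";

# Write the sample to a temporary file so the filename and filehandle
# forms have something to read.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} $html;
close $tmp_fh or die "Can't close $tmp_name: $!";

# Hypothetical helper: build a parser from whatever it is given and
# return the text of the first <title>, or undef if there is none.
sub title_of {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Can't open source: $!";
    return $p->get_tag("title") ? $p->get_trimmed_text : undef;
}

open my $fh, '<', $tmp_name or die "Can't open $tmp_name: $!";

my %title = (
    filename   => title_of($tmp_name),    # plain string: name of a file
    filehandle => title_of($fh),          # already-open filehandle
    scalar_ref => title_of(\$html),       # reference to the document text
);
close $fh;

print "$_: $title{$_}\n" for sort keys %title;
```

A plain string is always taken as a filename, which is why passing a URL or the page content itself (without the backslash) does not work.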

P.R.Brady · Jun 28, 2004

Great! Thanks Paul.
Phil

Michele Dondi · Jun 29, 2004

P.R.Brady said:
TokeParser looks a really useful tool for parsing HTML but will it only
take input from a file? Is it possible to get it to munge a web page
directly or even a scalar holding the page content (eg previously
grabbed with get)?

You've already been told that this is in fact possible, so what I'm
about to say is completely OT and possibly misleading, in that you may
be tempted to use this technique where it isn't necessary. So you
stand warned! Anyway, here it comes: if it *were* not possible, you
could always open() an in-memory file, as in:

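#!/usr/bin/perl

use strict;
use warnings;

# A reference to a string (here, a heredoc) opens as an in-memory file
open my $fh, '<', \<<"EOT" or die "Can't open in-memory file: $!";
foo
bar
baz
EOT

print while <$fh>;

__END__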
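
The same trick applies to content already sitting in a variable, which is closer to the original question. A minimal sketch, assuming the page is held in a scalar called $content (a made-up name and made-up content); note that in-memory filehandles need Perl 5.8 or later, so this would not run on 5.6.1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content already held in a scalar.
my $content = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n";

# Opening a reference to a scalar gives a read-only filehandle onto its
# contents (Perl 5.8 or later).
open my $fh, '<', \$content or die "Can't open in-memory file: $!";

# Read it back line by line, exactly as with a file on disk.
my @lines = <$fh>;
close $fh;

printf "%d lines read\n", scalar @lines;
print "line 3: $lines[2]";
```

The resulting $fh can be handed to any code that expects a filehandle to read from.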

Michele
