How do I follow links stored in an array?

BirgitteRand

I don't know how to follow links in an array (@links) at the bottom of
this script. Can anyone help me?

/Birgitte



#!/usr/bin/perl

use strict;
use WWW::Mechanize;
use LWP::Simple;
use HTML::TokeParser;
use XML::RSS;


# Create the RSS object.
my $rss = XML::RSS->new( version => '2.0' );

# Prep the RSS.
$rss->channel(
    title       => "JP",
    link        => "http://jp.dk/seneste",
    description => "JP",
);


my $starting_url = 'http://jp.dk/seneste/';
my $output_dir = "c:/temp/jp";

# Create a new instance of WWW::Mechanize
my $mechanize = WWW::Mechanize->new();

# Retrieve the page
$mechanize->get($starting_url);


my $html = $mechanize->content;


my $p = HTML::TokeParser->new( \$html );


# jump through tags until you get 'h1'
while ( my $title = $p->get_tag('h1') ) {
    last if $title->[1]->{class} eq 'h1';
}

# look through the tokens until you hit the end of 'h1'
my @links;
while ( my $token = $p->get_token ) {
    last if $token->[0] eq 'E' && $token->[1] eq 'h1';    # i.e., a div end tag
    if ( $token->[0] eq 'S' && $token->[1] eq 'a' ) {
        push @links, $token->[2]->{href}
            if $token->[2]->{href} =~ /\/udland\/.*?article.*/;
    }
}


# now follow the links
for my $link ( @links ) {

$mechanize->follow( $link );


my $html = $mechanize->content;
my $p = HTML::TokeParser->new( \$html );


while( my $article = $p->get_token( 'h1' )) {
if ( $article->[0] eq 'S' and $article->[1] eq 'h1' ) {
my $title = $p->get_trimmed_text( '/h1' );
$article = $p->get_tag('p');
$article = $p->get_tag('p');
my $date = $p->get_trimmed_text('/p');

print "$date\n$title\n\n";
}
}



}
 
Gunnar Hjalmarsson

> I don't know how to follow links in an array (@links) at the bottom of
> this script.

First you'd better make sure that there are some links in @links to follow.
> #jump through tags until you get 'h1'
> while( my $title = $p->get_tag( 'h1' )) {
> last if $title->[1]->{class} eq 'h1';
> }

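Since there are no <h1> elements in the document, that loop skips every
remaining tag and leaves the parser at the end of the string, so there is
nothing left for the link-collecting loop to read.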
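To see the effect, here is a small sketch that runs the same `h1` loop over a hypothetical snippet standing in for the page (the real markup is an assumption; only the absence of `<h1>` matters). `get_tag('h1')` keeps skipping tokens until it runs out and returns undef, after which the parser has nothing left:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: it has <h2> and <div> elements but no <h1>.
my $html = '<h2>Seneste</h2><div><a href="/udland/article1.ece">A</a></div>';
my $p    = HTML::TokeParser->new( \$html );

# The loop from the question. With no <h1> present, get_tag('h1') skips
# every token in the string and finally returns undef.
my $h1_found = 0;
while ( my $tag = $p->get_tag('h1') ) {
    $h1_found = 1;
    last if ( $tag->[1]{class} // '' ) eq 'h1';
}

# The parser is now exhausted, so a following get_token loop reads nothing.
my $remaining = $p->get_token;
print $h1_found ? "found an h1\n" : "no h1 found\n";
print defined $remaining ? "tokens remain\n" : "parser is at end of string\n";
```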

You can simply do:

$p->get_tag('/h2');

to get to the section of the document you are interested in. No loop is needed.
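A sketch of that approach, run against hypothetical markup (the actual jp.dk layout is not shown in the thread, so the `<h2>`/`<div>` arrangement below is an assumption). It jumps past the first `</h2>` with one `get_tag` call and stops collecting at the `</div>` end tag rather than `</h1>`. If the list contained nested `<div>`s, the first `</div>` would end the loop early.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical markup: the links of interest follow the closing </h2>
# and sit inside a single <div>.
my $html = <<'HTML';
<h2>Seneste nyt</h2>
<div class="list">
  <a href="/udland/europa/article123.ece">Europe story</a>
  <a href="/indland/article456.ece">Domestic story</a>
  <a href="/udland/usa/article789.ece">US story</a>
</div>
<a href="/udland/article999.ece">Outside the list</a>
HTML

my $p = HTML::TokeParser->new( \$html );

# Jump straight past the first </h2>; no loop needed.
$p->get_tag('/h2');

# Collect matching hrefs until the end tag of the enclosing <div>.
my @links;
while ( my $token = $p->get_token ) {
    last if $token->[0] eq 'E' && $token->[1] eq 'div';
    next unless $token->[0] eq 'S' && $token->[1] eq 'a';
    my $href = $token->[2]{href};
    next unless defined $href;
    push @links, $href if $href =~ m{/udland/.*?article};
}

# Check what was collected before trying to follow anything.
print "$_\n" for @links;
```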
> # look through the tokens until you hit the end of 'h1'
> my @links;
> while ( my $token = $p->get_token ) {
> last if $token->[0] eq 'E' && $token->[1] eq 'h1';
-----------------------------------------------^^^^
I suppose you mean 'div' here, as your comment says ...
> # now follow the links

Yes, but first make sure that @links contains what you expect.

print "$_\n" for @links;

If it does, you can start working with the last section of your script.
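Once `@links` looks right, one way to follow the links is to resolve each `href` against the page it came from and fetch it with `get()`, which accepts a URL. The sketch below assumes the hrefs are relative (such as `/udland/...`) and that each article page has its title in an `<h1>` with the date in the second `<p>` after it, as the question's loop expects; the subroutine names are hypothetical. Resolving the URLs up front matters because every `get()` changes the current page, and with it the base that relative links would be resolved against.

```perl
use strict;
use warnings;
use URI;
use HTML::TokeParser;

# Turn relative hrefs into absolute URLs, resolved against the list page.
sub absolute_links {
    my ( $base, @hrefs ) = @_;
    return map { URI->new_abs( $_, $base )->as_string } @hrefs;
}

# Pull the title (<h1>) and the text of the second <p> after it.
# Returns an empty list if the page has no <h1>. The page layout is assumed.
sub parse_article {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );
    $p->get_tag('h1') or return;
    my $title = $p->get_trimmed_text('/h1');
    $p->get_tag('p') for 1 .. 2;    # skip to the second <p>
    my $date = $p->get_trimmed_text('/p');
    return ( $title, $date );
}

# Fetch each URL with an existing WWW::Mechanize object and print
# date and title, skipping pages without an <h1>.
sub follow_links {
    my ( $mech, @urls ) = @_;
    for my $url (@urls) {
        $mech->get($url);
        my ( $title, $date ) = parse_article( $mech->content );
        next unless defined $title;
        print "$date\n$title\n\n";
    }
}
```

In the question's script this would replace the final `for` loop with `follow_links( $mechanize, absolute_links( $starting_url, @links ) );`. WWW::Mechanize's `follow_link( url => $link )` can also be used, but it only finds links on the current page, so you would need `$mechanize->back` after each article to return to the list.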
 
