How do I follow links stored in an array?

Discussion in 'Perl Misc' started by BirgitteRand@gmail.com, Apr 29, 2008.

  1. Guest

    I don't know how to follow links in an array (@links) at the bottom of
    this script. Can anyone help me?

    /Birgitte



    #!/usr/bin/perl

    use strict;
    use WWW::Mechanize;
    use LWP::Simple;
    use HTML::TokeParser;
    use XML::RSS;


    # Create the RSS object.
    my $rss = XML::RSS->new( version => '2.0' );

    # Prep the RSS.
    $rss->channel(
        title       => "JP",
        link        => "http://jp.dk/seneste",
        description => "JP",
    );


    my $starting_url = 'http://jp.dk/seneste/';
    my $output_dir = "c:/temp/jp";

    # Create a new instance of WWW::Mechanize
    my $mechanize = WWW::Mechanize->new();

    # Retrieve the page
    $mechanize->get($starting_url);


    my $html = $mechanize->content;


    my $p = HTML::TokeParser->new( \$html );


    #jump through tags until you get 'h1'
    while( my $title = $p->get_tag( 'h1' )) {
        last if $title->[1]->{class} eq 'h1';
    }


    # look through the tokens until you hit the end of 'h1'
    my @links;
    while ( my $token = $p->get_token ) {
        last if $token->[0] eq 'E' && $token->[1] eq 'h1'; # i.e., a div end tag
        if ( $token->[0] eq 'S' && $token->[1] eq 'a' ) {
            push @links, $token->[2]->{href}
                if $token->[2]->{href} =~ /\/udland\/.*?article.*/;
        }
    }


    # now follow the links
    for my $link ( @links ) {
        $mechanize->follow( $link );

        my $html = $mechanize->content;
        my $p = HTML::TokeParser->new( \$html );

        while( my $article = $p->get_token( 'h1' )) {
            if ( $article->[0] eq 'S' and $article->[1] eq 'h1' ) {
                my $title = $p->get_trimmed_text( '/h1' );
                $article = $p->get_tag('p');
                $article = $p->get_tag('p');
                my $date = $p->get_trimmed_text('/p');

                print "$date\n$title\n\n";
            }
        }
    }
     
    , Apr 29, 2008
    #1

  2. wrote:
    > I don't know how to follow links in an array (@links) at the bottom of
    > this script.


    First you'd better make sure that there are some links in @links to follow.

    > #jump through tags until you get 'h1'
    > while( my $title = $p->get_tag( 'h1' )) {
    > last if $title->[1]->{class} eq 'h1';
    > }


    Since there are no <h1> elements in the document, that loop never
    finds a match and the parser runs to the end of the string.

    You can simply do:

    $p->get_tag('/h2');

    to get to the section of the document you are interested in. No loop needed.

    > # look through the tokens until you hit the end of 'h1'
    > my @links;
    > while ( my $token = $p->get_token ) {
    > last if $token->[0] eq 'E' && $token->[1] eq 'h1';

    -----------------------------------------------^^^^
    I suppose you mean 'div' ...
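
The two suggestions above (skip to the closing h2 with a single get_tag call, then stop collecting at the end of the div) can be sketched as follows. This is only an illustration: the sample markup and the collect_links helper are assumptions made up for the example, not the real jp.dk page or anything from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical stand-in for the fetched page; the real markup may differ.
my $html = join "\n",
    '<div>',
    '<h2>Seneste</h2>',
    '<a href="/udland/article1234.ece">One</a>',
    '<a href="/sport/article5678.ece">Two</a>',
    '<a href="/udland/europa/article9012.ece">Three</a>',
    '<a name="no-href">Four</a>',
    '</div>',
    '<a href="/udland/article3456.ece">Outside the div</a>';

# Hypothetical helper: return the matching hrefs found between </h2> and </div>.
sub collect_links {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );

    # Skip ahead to the closing </h2>; no loop needed.
    $p->get_tag('/h2') or return;

    my @links;
    while ( my $token = $p->get_token ) {
        # Stop at the end tag of the enclosing div.
        last if $token->[0] eq 'E' && $token->[1] eq 'div';
        next unless $token->[0] eq 'S' && $token->[1] eq 'a';

        # Anchors without an href would otherwise trigger warnings.
        my $href = $token->[2]{href};
        push @links, $href if defined $href && $href =~ m{/udland/.*article};
    }
    return @links;
}

my @links = collect_links($html);
print "$_\n" for @links;
```

With the sample markup this keeps only the two /udland/ links inside the div.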

    > # now follow the links


    Yes, but first make sure that @links contains what you expect.

    print "$_\n" for @links;

    If it does, you can start working with the last section of your script.
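
Once @links holds what you expect, one way to follow the links is to treat each entry as a URL and fetch it with get(), which takes a URL. A minimal sketch, assuming the hrefs are relative to the starting page; absolute_urls, first_heading and print_headings are hypothetical helper names, and the article path is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;
use HTML::TokeParser;

# Turn hrefs collected from a page into absolute URLs, resolved
# against the URL of the page they were found on.
sub absolute_urls {
    my ( $base, @hrefs ) = @_;
    return map { URI->new_abs( $_, $base )->as_string } @hrefs;
}

# Return the trimmed text of the first <h1> in an HTML string, or undef.
sub first_heading {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );
    return $p->get_tag('h1') ? $p->get_trimmed_text('/h1') : undef;
}

# Fetch each URL with get() and print its first heading.
sub print_headings {
    my (@urls) = @_;
    require WWW::Mechanize;    # loaded only when pages are actually fetched
    my $mechanize = WWW::Mechanize->new( autocheck => 0 );
    for my $url (@urls) {
        $mechanize->get($url);
        unless ( $mechanize->success ) {
            warn "Could not fetch $url: ", $mechanize->status, "\n";
            next;
        }
        my $title = first_heading( $mechanize->content );
        print defined $title ? "$title\n" : "(no <h1> in $url)\n";
    }
    return;
}

# Hypothetical href, in the shape the script's regex looks for.
my @urls = absolute_urls( 'http://jp.dk/seneste/', '/udland/article1234.ece' );
print "$_\n" for @urls;
# print_headings(@urls);    # uncomment to fetch the pages over the network
```

Resolving every href against the starting URL before the loop means the result does not depend on which page the Mechanize object happens to be on after the previous get().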

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Apr 29, 2008
    #2

  3. John Bokma Guest

    wrote:

    > I don't know how to follow links in an array (@links) at the bottom of
    > this script. Can anyone help me?


    You might want to have a peek at:
    <http://johnbokma.com/perl/rss-web-feed-builder.html>

    --
    John

    http://johnbokma.com/perl/
     
    John Bokma, Apr 29, 2008
    #3
  4. On Apr 29, 2:02 pm, John Bokma wrote:
    > wrote:
    > > I don't know how to follow links in an array (@links) at the bottom of
    > > this script. Can anyone help me?

    >
    > You might want to have a peek at:
    > <http://johnbokma.com/perl/rss-web-feed-builder.html>


    That's a nifty little script you've got there. I especially like the
    coding style; very clean and clear :).
     
    nolo contendere, Apr 29, 2008
    #4
