How can I follow links in my website

Discussion in 'Perl Misc' started by Danny, Apr 12, 2004.

  1. Danny

    Danny Guest

    I would like to browse a page in one of my websites and get info to populate
    a database. But each page will have a NEXT and PREVIOUS link that takes you
    to another page.

    I need something to look at one page and save it to a file on the HD, then
    follow the NEXT link and go to the next page, and do the same thing, and so
    on.

    Can this be done?
    Danny, Apr 12, 2004
    #1

  2. Danny

    Eric Bohlman Guest

    "Danny" <> wrote in
    news:Dfyec.6859$:

    > I would like to browse a page in one of my websites and get info to
    > populate a database. But each page will have a NEXT and PREVIOUS link
    > that takes you to another page.
    >
    > I need something to look at one page and save it to a file on the HD,
    > then follow the NEXT link and go to the next page, and do the same
    > thing, and so on.
    >
    > Can this be done?


    Yep: LWP::Simple and HTML::LinkExtor together ought to do the trick.
    Eric Bohlman, Apr 12, 2004
    #2
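A minimal sketch of the HTML::LinkExtor half of that suggestion, assuming the page content is already in a string (for example from LWP::Simple's `get`). The helper name `extract_links`, the sample markup, and the `www.example.com` base URL are made up for illustration. Passing a base URL as the second argument to `new` makes the extractor return absolute URLs.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return every <a href> target found in $html,
# made absolute against $base.
sub extract_links {
    my ($html, $base) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # Only anchors are of interest; skip <img>, <link>, etc.
            push @found, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# Illustrative input, not taken from any real site.
my $sample = '<a href="/page2.html">NEXT</a> <img src="logo.png">';
print "$_\n" for extract_links($sample, 'http://www.example.com/');
```

The callback receives the tag name followed by attribute/URL pairs, so filtering on `$tag eq 'a'` keeps image and stylesheet URLs out of the result.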

  3. Danny

    John Bokma Guest

    Danny wrote:

    > I would like to browse a page in one of my websites and get info to populate
    > a database. But each page will have a NEXT and PREVIOUS link that takes you
    > to another page.
    >
    > I need something to look at one page and save it to a file on the HD, then
    > follow the NEXT link and go to the next page, and do the same thing, and so
    > on.
    >
    > Can this be done?


    Yes.

    Check lwpcook (the LWP cookbook) and HTML::Parser, for example. It's
    possible to skip the parser and use just a regexp, if you know what you
    are doing :-D.

    --
    John personal page: http://johnbokma.com/

    Experienced Perl / Java developer available - http://castleamber.com/
    John Bokma, Apr 12, 2004
    #3
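For comparison, a rough sketch of the regexp-only route, assuming the link text is literally NEXT and the href value is quoted. The function name `next_href` and the sample markup are hypothetical. It breaks on unquoted attributes, markup nested inside the anchor, or commented-out HTML, which is why a real parser is usually the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the href of the first anchor whose text is
# NEXT, or nothing if there is no such anchor.
sub next_href {
    my ($html) = @_;
    # The character classes exclude '>' and quotes so the match cannot
    # spill from one tag into the following one.
    if ($html =~ m{<a\b[^>]*?\bhref\s*=\s*(["'])([^"'>]*)\1[^>]*>\s*NEXT\s*</a>}i) {
        return $2;
    }
    return;
}

# Illustrative input only.
my $sample = '<a href="page1.html">PREVIOUS</a> <a href="page3.html">NEXT</a>';
my $href   = next_href($sample);
print defined $href ? "$href\n" : "no NEXT link\n";
```

A looser pattern such as `href="(.*?)"` can backtrack across the PREVIOUS link and capture the wrong span, so the tighter character classes matter here.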
  4. Danny

    Danny Guest

    "John Bokma" <> wrote in message
    news:407abcfb$0$24349$...
    > Danny wrote:
    >
    > > I would like to browse a page in one of my websites and get info to
    > > populate a database. But each page will have a NEXT and PREVIOUS link
    > > that takes you to another page.
    > >
    > > I need something to look at one page and save it to a file on the HD,
    > > then follow the NEXT link and go to the next page, and do the same
    > > thing, and so on.
    > >
    > > Can this be done?
    >
    > Yes.
    >
    > Check lwpcook (the LWP cookbook) and HTML::Parser, for example. It's
    > possible to skip the parser and use just a regexp, if you know what you
    > are doing :-D.
    >
    > --
    > John personal page: http://johnbokma.com/
    >
    > Experienced Perl / Java developer available - http://castleamber.com/



    Thanks for your responses.
    I have a sample that works: it gets a web page, writes the contents of
    the page to a text file, and then prints all the links on the page.
    Now I want to follow the links on that page that have "nextpage" in
    the URL (this means it goes to the next category page), and so on. I
    want to save each page to a text file: page1.txt, page2.txt, etc.

    This script works, but I am not sure where to put the loops. I am still
    learning.

    How can I do this?
    I would appreciate your help.
    Thanks again,
    Danny

    -------
    use strict;
    use warnings;
    use CGI;
    use LWP::UserAgent;
    use HTML::LinkExtor;

    my $url = 'http://www.website.com';
    my $co  = CGI->new;
    print $co->header;

    # Fetch the page once and reuse the response for saving and parsing.
    my $user_agent = LWP::UserAgent->new;
    my $response   = $user_agent->get($url);
    die "Cannot fetch $url: ", $response->status_line, "\n"
        unless $response->is_success;

    # Save the raw page to a file (use the accessor, not $response->{_content}).
    open my $fh, '>', 'file.txt' or die "Cannot write file.txt: $!";
    binmode $fh;
    print {$fh} $response->content;
    close $fh;

    # Print every <a href="..."> link found on the page.
    my $link_extor = HTML::LinkExtor->new(\&handle_links);
    $link_extor->parse($response->decoded_content);
    $link_extor->eof;

    sub handle_links {
        my ($tag, %links) = @_;
        return unless $tag eq 'a' && defined $links{href};
        # I assume I put a test here for the NEXT link, and then it gets
        # loaded as above in the request statement?
        print "This is a link: $links{href}.\n";
    }
    Danny, Apr 12, 2004
    #4
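One way the loop asked about above could be arranged, as a sketch rather than a definitive answer: fetch a page, save it as pageN.txt, look for the first link whose URL contains "nextpage", and repeat until there is no such link. The names `find_next_link` and `crawl`, the fetcher callback, the output directory argument, and the page limit are all assumptions made for illustration. The fetcher is passed in as a code reference so the loop can be tried without network access.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the first <a href> in $html whose absolute
# URL contains "nextpage", or nothing if the page has no such link.
sub find_next_link {
    my ($html, $base) = @_;
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;
    for my $link ($parser->links) {
        my ($tag, %attr) = @$link;
        next unless $tag eq 'a' && defined $attr{href};
        return "$attr{href}" if $attr{href} =~ /nextpage/;
    }
    return;
}

# Hypothetical driver: starting at $url, save each page as page1.txt,
# page2.txt, ... in $dir and follow "nextpage" links. $fetch is a code
# reference that takes a URL and returns the page text (or undef).
sub crawl {
    my ($url, $fetch, $dir, $max_pages) = @_;
    my %seen;
    my $count = 0;
    # Stop on a missing link, a repeated URL, or the page limit.
    while (defined $url && !$seen{$url}++ && $count < $max_pages) {
        my $html = $fetch->($url);
        last unless defined $html;
        $count++;
        my $file = "$dir/page$count.txt";
        open my $out, '>:encoding(UTF-8)', $file or die "Cannot write $file: $!";
        print {$out} $html;
        close $out;
        $url = find_next_link($html, $url);
    }
    return $count;
}

# Possible usage with LWP::Simple (not run here):
#   use LWP::Simple qw(get);
#   my $saved = crawl('http://www.website.com', \&get, '.', 100);
#   print "Saved $saved pages\n";
```

The `%seen` hash and the page limit are there so that a site whose last page links back to the first does not loop forever.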
