Looking for modules to help download web pages...

Discussion in 'Perl Misc' started by Koppe, Jul 20, 2007.

  1. Koppe

    Koppe Guest

    I'm afraid I'm a bit of a newbie when it comes to Perl,
    though I have some experience with other languages
    (mostly C++).

    I would like to write a script to automate downloading
    some pages from the Web, and thought Perl should be
    suitable for this. However, I'll undoubtedly need some
    modules, and I have no idea which ones... So I would
    appreciate suggestions as to which modules I may need and
    should take a closer look at.

    I'm planning on making something similar to 'wget', but
    specialized for the type of pages I want; so it will mostly
    be a matter of downloading web pages, saving them,
    and parsing them for links to other web pages to download.
    I may also need to save other page content (e.g. images),
    and maybe even content referred to by CSS (e.g. background
    images). Many of the pages I'm after are PHP pages (but
    AFAIK that is handled on the server side, isn't it?).

    Some of the pages require log-in, so an ability for the script
    to recognize a password form, fill in user name and
    password and post it -- as well as accepting cookies -- is
    needed too. Pages containing just a confirmation button
    for proceeding may also need to be "pushed" by the script.
    There may also be a need to fill in and send forms with things
    like date of birth -- maybe also in the form of drop-down lists.
    Many of these are redirects; e.g. I want a page with text, but
    unless I've previously logged in, specified my date of birth or
    confirmed, I'm redirected to forms. After I've filled in the
    form, I proceed to the page I wanted. However -- at least in my
    browser -- these pages (the one I want and the one I need to
    fill stuff in on) seem to have the same URL and be "identical"
    from the browser's point of view.

    Some limited emulation of JavaScript would also be great, e.g.
    the ability to "fake" a pop-up dialog box and "press" "OK" or
    "Yes", to post some forms, and to follow redirects.

    So any ideas for modules I ought to look at for accomplishing
    some or all of the above would be very much appreciated.

    -Koppe
    Koppe, Jul 20, 2007
    #1

  2. Peter Wyzl

    Peter Wyzl Guest

    Big job... Start with the LWP modules, which ship with many Perl
    distributions (they live on CPAN rather than in the core). They will
    in turn lead you to many others that may be helpful (cookies etc.).
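
    As a rough illustration of that starting point, here is a minimal
    sketch that fetches one page with LWP::UserAgent, keeps cookies in an
    in-memory HTTP::Cookies jar, and saves the body to disk. The sub name
    fetch_to_file, the agent string and the command-line handling are
    made up for the example; treat it as a sketch, not a finished tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Fetch $url and write the raw response body to $file.
# fetch_to_file is a hypothetical helper name, not part of LWP.
sub fetch_to_file {
    my ($url, $file) = @_;

    my $ua = LWP::UserAgent->new(
        agent      => 'page-fetcher/0.1',    # hypothetical agent string
        timeout    => 30,
        cookie_jar => HTTP::Cookies->new,    # accept and resend cookies
    );

    # get() follows ordinary GET redirects on its own.
    my $response = $ua->get($url);
    die "Fetch of $url failed: " . $response->status_line . "\n"
        unless $response->is_success;

    open my $out, '>', $file or die "Cannot write $file: $!\n";
    binmode $out;                            # images etc. are binary
    print {$out} $response->content;
    close $out or die "Cannot close $file: $!\n";

    return $response;
}

# Run only when a URL and an output file are given on the command line.
if (@ARGV == 2) {
    fetch_to_file(@ARGV);
}
```

    Called as "perl fetch.pl http://www.example.com/ page.html" (both
    arguments are placeholders), it writes the page to page.html or dies
    with the HTTP status line.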

    Also search CPAN (http://www.cpan.org/) for anything else you need.

    P
    Peter Wyzl, Jul 21, 2007
    #2

  3. Sisyphus

    Sisyphus Guest

    "Koppe" <> wrote in message
    news:...
    ..
    ..
    > I would like to write a script to automate downloading
    > some pages from the Web

    ..
    ..

    Sounds like you might be interested in WWW::Mechanize.
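
    For a feel of what that looks like, here is a minimal sketch of a
    log-in followed by a page fetch and link listing with WWW::Mechanize.
    The form field names 'username' and 'password', the helper names
    log_in and page_links, and the output file name are assumptions for
    illustration; the real form on a given site will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Submit the form that contains the given fields. The agent keeps its
# cookies, so later requests stay logged in.
# 'username' and 'password' are hypothetical field names.
sub log_in {
    my ($mech, $login_url, $user, $pass) = @_;
    $mech->get($login_url);
    $mech->submit_form(
        with_fields => { username => $user, password => $pass },
    );
    return $mech;
}

# Return the absolute URLs of all links on the current page.
sub page_links {
    my ($mech) = @_;
    return map { $_->url_abs->as_string } $mech->links;
}

# Run only when login URL, user, password and target URL are supplied.
if (@ARGV == 4) {
    my ($login_url, $user, $pass, $target_url) = @ARGV;
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on HTTP errors
    log_in($mech, $login_url, $user, $pass);
    $mech->get($target_url);
    $mech->save_content('page.html');                   # hypothetical file name
    print "$_\n" for page_links($mech);
}
```

    Note that WWW::Mechanize does not run JavaScript, so pop-up dialogs
    and script-driven redirects would still have to be handled by
    requesting the resulting URL or form yourself.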

    Cheers,
    Rob
    Sisyphus, Jul 21, 2007
    #3
  4. Tim Southerwood

    Sisyphus coughed up some electrons that declared:

    >
    > "Koppe" <> wrote in message
    > news:...
    > .
    > .
    >> I would like to write a script to automate downloading
    >> some pages from the Web

    > .
    > .
    >
    > Sounds like you might be interested in WWW::Mechanize.
    >
    > Cheers,
    > Rob


    Also LWP or Net::HTTP for more traditional approaches.

    Don't overlook driving wget as another way.
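
    One way that could look, as a minimal sketch: build the wget argument
    list in Perl and hand it to the list form of system(). The helper name
    wget_args and the particular option choices are assumptions for the
    example, and it presumes a wget binary is on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an argument list for a small recursive wget run.
# wget_args is a hypothetical helper; adjust the options to taste.
sub wget_args {
    my ($url, $dir) = @_;
    return (
        '--recursive',
        '--level=2',                  # follow links at most two levels deep
        '--page-requisites',          # also fetch images and stylesheets
        '--convert-links',            # rewrite links for local browsing
        '--wait=1',                   # pause one second between requests
        "--directory-prefix=$dir",    # save everything under $dir
        $url,
    );
}

# Run only when a URL and a target directory are given.
if (@ARGV == 2) {
    my @cmd = ('wget', wget_args(@ARGV));
    # The list form of system() bypasses the shell, so the URL needs
    # no quoting or escaping.
    system(@cmd) == 0 or die "wget failed (exit status $?)\n";
}
```

    This leaves the crawling to wget and keeps Perl for whatever
    pre- or post-processing the pages need.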

    Cheers
    Tim
    Tim Southerwood, Jul 21, 2007
    #4