Looking for modules to help download web-pages...

Koppe

I'm afraid I'm a bit of a newbie when it comes to Perl,
though I have some experience with other languages
(mostly C++).

I would like to make a script to automate the downloading
of some pages on the Web, and thought Perl should be
suitable for this. However, I'll undoubtedly need some
modules, and I have no idea which ones... So I would
appreciate suggestions as to which modules I may need and
should take a closer look at.

I'm planning on making something similar to 'wget', but
specialized to the type of pages I want; so it will mostly
be a matter of downloading web-pages, saving them,
and parsing them for links to other web-pages to download.
I may also need to save other page contents (e.g. images),
and maybe even content referred to by CSS (e.g. background
images). Many of the pages I'm after are PHP pages (but
AFAIK that is handled on the server side, isn't it?).

Some of the pages require log-in, so the script needs to be
able to recognize a password form, fill in user name and
password and post it -- as well as accept cookies. Pages
containing just a confirmation button for proceeding may
also need to be "pushed" by the script. There may also be a
need to fill in and send forms with things like date of birth
-- maybe also in the form of drop-down lists. Many of these
are redirects; e.g. I want a page with text, but unless I've
previously logged in, specified my date of birth or confirmed,
I'm redirected to forms. After I've filled in the form, I proceed
to the page I wanted. However -- at least in my browser -- these
pages (the one I want and the one I need to fill stuff in on) seem
to have the same URL and be "identical" from the browser's
point of view.

Some limited emulation of JavaScript would also be great. E.g.
the ability to "fake" a pop-up dialog-box and "press" "OK" or
"Yes"; for posting some forms; and for redirecting.

So any ideas for modules I ought to look at for accomplishing
some or all of the above would be very much appreciated.

-Koppe
 
Peter Wyzl

Big job... Start with the LWP modules (libwww-perl), which ship with many
Perl distributions and are otherwise available from CPAN. They will in turn
lead you to many others that may be helpful, for cookies etc.

Also search CPAN http://www.cpan.org/ for various other things you need.
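
As a rough starting point, here is a minimal sketch of the fetch-and-find-links step using LWP::UserAgent, HTTP::Cookies and HTML::LinkExtor. The helper name extract_links, the agent string and the cookies.txt file name are made up for illustration, and only <a href> and <img src> are collected; treat it as a sketch rather than a finished crawler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTML::LinkExtor;

# Return absolute URLs for every <a href> and <img src> in $html.
# extract_links is a hypothetical helper name, not part of LWP.
sub extract_links {
    my ($html, $base) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            push @found, "$attr{href}" if $tag eq 'a'   && defined $attr{href};
            push @found, "$attr{src}"  if $tag eq 'img' && defined $attr{src};
        },
        $base,    # relative links are made absolute against this URL
    );
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# When a URL is given on the command line, fetch it and list its links.
if (my $start = shift @ARGV) {
    my $ua = LWP::UserAgent->new(
        agent      => 'page-fetcher-sketch/0.1',    # assumed name
        cookie_jar => HTTP::Cookies->new(file => 'cookies.txt', autosave => 1),
    );
    my $res = $ua->get($start);
    die "GET $start failed: ", $res->status_line, "\n" unless $res->is_success;
    print "$_\n" for extract_links($res->decoded_content, $res->base);
}
```

The cookie jar keeps session cookies between requests, which is what the logged-in pages will need. Images and CSS backgrounds would need further handling on top of this.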

P
 
Sisyphus

..
..
I would like to make a script to automate the downloading
of some pages on the Web
..
..

Sounds like you might be interested in WWW::Mechanize.
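
For the "same URL, but I get a form first" case, something along these lines might do. It is only a sketch: the field names username and password and the helper names is_login_form and fetch_with_login are assumptions, so check the real form's field names in the page source. Note that WWW::Mechanize keeps cookies by default but does not run JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Rough heuristic: a form with a password-type input is a login form.
sub is_login_form {
    my ($form) = @_;
    my $input = $form->find_input(undef, 'password');
    return defined $input ? 1 : 0;
}

# Fetch $url; if a login form comes back instead of the page,
# fill it in and submit it, then return whatever page we end up on.
sub fetch_with_login {
    my ($mech, $url, $user, $pass) = @_;
    $mech->get($url);
    my ($login) = grep { is_login_form($_) } $mech->forms;
    if ($login) {
        # 'username' and 'password' are hypothetical field names
        $mech->submit_form(
            with_fields => { username => $user, password => $pass },
        );
    }
    return $mech->content;
}

# Usage: script.pl URL USER PASSWORD
if (@ARGV == 3) {
    my ($url, $user, $pass) = @ARGV;
    my $mech = WWW::Mechanize->new(autocheck => 1);    # die on HTTP errors
    print fetch_with_login($mech, $url, $user, $pass);
}
```

submit_form with with_fields also sets drop-down (select) values, so a date-of-birth form could be handled the same way with its own field names.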

Cheers,
Rob
 
Tim Southerwood

Sisyphus coughed up some electrons that declared:
.
.
.
.

Sounds like you might be interested in WWW::Mechanize.

Cheers,
Rob

Also LWP or Net::HTTP for more traditional approaches.

Don't overlook driving wget as another way.
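
Driving wget from Perl could look roughly like this. The options are standard wget ones, but the depth of 2, the default directory name mirror and the helper name wget_command are arbitrary choices for the sketch, and wget is assumed to be on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the wget argument list for a shallow mirror of $url into $dir.
# wget_command is a hypothetical helper name.
sub wget_command {
    my ($url, $dir) = @_;
    return (
        'wget',
        '--recursive', '--level=2',    # follow links, two levels deep
        '--page-requisites',           # also fetch images, CSS and the like
        '--no-parent',                 # stay below the starting directory
        "--directory-prefix=$dir",     # where to save the files
        $url,
    );
}

# Usage: script.pl URL [DIR]
if (@ARGV) {
    my @cmd = wget_command($ARGV[0], $ARGV[1] // 'mirror');
    # List form of system() bypasses the shell, so the URL needs no quoting.
    system(@cmd) == 0
        or die "wget failed: exit status ", $? >> 8, "\n";
}
```

This keeps the link-following and saving logic inside wget, at the cost of less control over forms and log-ins.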

Cheers
Tim
 
