Request for a PERL script to download files

kentweb

I would like to use a PERL script to log on to a web site, navigate web
pages, select links for files to be downloaded, then receive the
files. The names of the files will be dynamic, changing daily. The
reason I want to use PERL to do this is to automate the process so a
person doesn't have to do it. Does anyone have an example script that
does something like this?

Thank you.
 
Henry Law

> I would like to use a PERL script to log on to a web site, navigate web
> pages, select links for files to be downloaded, then receive the
> files. The names of the files will be dynamic, changing daily. The
> reason I want to use PERL to do this is to automate the process so a
> person doesn't have to do it. Does anyone have an example script that
> does something like this?

First, a piece of advice: read the posting guidelines for this group
before you post again. It will help you get better answers (and have a
more pleasant time, for reasons that the guidelines explain).

To do what you want will require use of various Perl (not PERL) modules,
which will probably include WWW::Mechanize; CPAN will tell you what it
can do and how to do it; you can also download it from there if you need to.

Use this URL to find examples of how to use it:
http://www.google.com/search?hl=en&q=perl+mechanize+sample
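
Here's an untested sketch of the sort of thing Mechanize makes possible.
The login URL, form field names, link text and file path below are all
placeholders; look at your actual site to find the real ones.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on any HTTP error

# log in -- placeholder URL and form fields
$mech->get('http://www.example.com/login');
$mech->submit_form(
    form_number => 1,
    fields      => { username => 'me', password => 'secret' },
);

# follow a link whose text changes daily, matched with a regex
$mech->follow_link( text_regex => qr/daily.*report/i );

# save whatever we landed on to a local file
$mech->save_content('/some/local/file');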

Also possible is LWP::Simple, which does simpler things (surprise!)
but is very easy to use and powerful enough for many purposes. Here's a
fragment to give you an idea (untested, not syntax checked):

use strict;
use warnings;
use LWP::Simple;

my $url = "http://www.example.com";
my $html_source = get($url) or die "Aaaaagh: couldn't fetch $url";
# process the HTML text (there are Perl modules to help there too)
my $target_url = "http://www.example.com/some/address/on/target";
my $downloaded_file = get($target_url) or die "Aiieee: couldn't fetch $target_url";

open my $down, '>', "/some/local/file" or die "Couldn't open: $!";
binmode $down;                  # the file may well be binary
print $down $downloaded_file;
close $down;

Go try something, write some code; start simple - leave out the
logging-in stuff at first and just see if you can download a known file.
Post here if you can't make it work. Then add other functions as you get
more confident.
 
Uri Guttman

f> I've noticed that weekend volume is now down to less than 60 posts per
f> day. I think the Perl Usenet community has made its point by being so
f> inhospitable to newcomers over the years. And if you check out
f> Craigslist job listings for Perl, you'll see those are on the downslide
f> too.

ever look at the volume on the perl jobs list? it has risen steadily
over several years now. anyone who knows will post there instead of
craigslist, as it is subscribed to by perl hackers from all over the
world. and craigslist charges for job postings in some cities now
(boston for sure). so your source of data is not that useful.

the issue is more that usenet overall is slowing down, or that most
newbies find it from google and think it is a google service. also there
are dozens of topic-specific perl mailing lists (see lists.perl.org), the
perl beginner's list, perlmonks, local monger lists, and other bulletin
boards competing for perl discussions. so next time you feel like
indirectly flaming this group, please have some more accurate facts
behind you. the perl community isn't dying by any measure, just its
usenet slice is a bit smaller.

uri
 
John Bokma

Uri Guttman said:
> the issue is more that usenet overall is slowing down.

Are there any solid numbers on that? I've been hearing about the death of
Usenet for many years. What I have seen in those years is that more
groups got added. And yes, that thins things out.

> or that most newbies find it from google and think it is a google
> service. also there are dozens of topic-specific perl mailing lists
> (see lists.perl.org), the perl beginner's list, perlmonks, local monger
> lists, and other bulletin boards competing for perl discussions. so
> next time you feel like indirectly flaming this group, please have some
> more accurate facts behind you. the perl community isn't dying by any
> measure, just its usenet slice is a bit smaller.

No idea if that's really the case. I mean, even Purl Gurl is back :-D.
 
Uri Guttman

JB> Are there any solid numbers on that? I've been hearing about the death
JB> of Usenet for many years. What I have seen in those years is that
JB> more groups got added. And yes, that thins things out.

i don't think it is dying but it definitely has a smaller mindshare. in
its heyday it WAS the net (other than email).


JB> No idea if that's really the case. I mean, even Purl Gurl is back :-D.

and just as delusional as before. we can only wait out moronzilla and it
will go away for another visit to the loony farm. we had like over a
year or two of peace this last time.

uri
 
zentara

> I would like to use a PERL script to log on to a web site, navigate web
> pages, select links for files to be downloaded, then receive the
> files. The names of the files will be dynamic, changing daily. The
> reason I want to use PERL to do this is to automate the process so a
> person doesn't have to do it. Does anyone have an example script that
> does something like this?
>
> Thank you.

Here I'll give you a break. You have four separate operations: the
logon, the page retrieval, the link extraction, and the individual
file retrieval.

Googling and groups.googling will give examples of all those
steps, but here are some basics.

The longer script at the end will download a single url, but you can
get multiple urls by first fetching the page and listing its links with
something like

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# example.com is just a placeholder url
$mech->get( 'http://www.example.com/' );

print $_->url, "\n" for @{ $mech->links };
__END__

You can also just use LWP::UserAgent instead of Mechanize to get the
page, then extract the links manually with HTML::LinkExtor.
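
For instance, something along these lines (untested; example.com stands
in for the real page, and only <a href> links are collected):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $url = 'http://www.example.com/';    # placeholder page

my $ua       = LWP::UserAgent->new();
my $response = $ua->get($url);
die "Couldn't fetch $url: ", $response->status_line
    unless $response->is_success;

# collect every href from every <a> tag, made absolute against $url
my @links;
my $parser = HTML::LinkExtor->new( sub {
    my ( $tag, %attr ) = @_;
    return unless $tag eq 'a' and $attr{href};
    push @links, URI->new_abs( $attr{href}, $url )->as_string;
});
$parser->parse( $response->decoded_content );

print "$_\n" for @links;
__END__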

You can groups.google search for examples of LinkExtor.
As a matter of fact, if you google enough, you will find scripts
that already do what you want, probably using curl or wget.


#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# don't buffer the prints, so the status line updates as we go
$| = 1;

my $ua            = LWP::UserAgent->new();
my $received_size = 0;
my $url           = 'http://zentara.net/zentara1.avi';
print "Fetching $url\n";

my $filename = substr( $url, rindex( $url, "/" ) + 1 );
#print "$filename\n";
open( my $out, '>', $filename ) or die "Couldn't open $filename: $!";
binmode $out;    # the .avi is binary data

my $request_time = time;
my $last_update  = 0;
my $response = $ua->get( $url,
    ':content_cb'     => \&callback,
    ':read_size_hint' => 8192,
);
print "\n";

close $out;


sub callback {
    my ( $data, $response, $protocol ) = @_;

    my $total_size = $response->header('Content-Length') || 0;
    $received_size += length $data;

    # write the $data to the filehandle (or do whatever should
    # happen with it here)
    print $out $data;

    my $time_now = time;

    # only update the status once per second
    return unless $time_now > $last_update
               or $received_size == $total_size;
    $last_update = $time_now;

    print "\rReceived $received_size bytes";
    printf " (%i%%)", ( 100 / $total_size ) * $received_size if $total_size;
    printf " %6.1f bytes/sec", $received_size / ( ( $time_now - $request_time ) || 1 )
        if $received_size;
}
__END__

zentara
 
Tad McClellan

fishfry said:
> I've noticed that weekend volume is now down to less than 60 posts per
> day.


60 questions without readily available answers is better than 60 questions
without readily available answers along with 140 Frequently Asked Questions.

Reduced quantity is a _good_ thing if the quality is increased.

> I think the Perl Usenet community has made its point


That point being what? Checking the docs?

Perhaps _that_ is why there are fewer posts, because more people are
finding better answers in less time by checking the docs, and so
do not need to post here.

> by being so
> inhospitable to newcomers over the years.


We are not inhospitable to newcomers. We are inhospitable to people
who want us to read the docs to them.

How "there's more than one way to do it" morphed into "don't post here
till you've spent the day reading perldocs," is a sad story, in my
personal opinion.


Yes it is.

How "never locking your front door" morphed into "always locking your
front door" is a sad story IMO, but that's what happens when the
Interstate comes to your small town.

Things change when a tight-knit community becomes open to
the General Public.
 
Henry Law

Mike said:
> I'd like your opinion on the post below. I posted this on 1/20 and
> was surprised when I got no response. Used to be you would
> get at least 2 responses to any reasonable question.
> What is the best way to do simple link extraction
> from an FTP site using native Perl. I have native perl
> HTML extraction scripts, and I can find the
> packaged Extract Link programs on the web, but
> I'd like a simple script that will do the job for me.
> I'm surprised I can't find this on my own.
>
> Is Net::FTP the way to go?

I didn't answer it because I'm not good enough at this part of the Perl
universe. I do use Net::FTP and it works very well for what I use it
for (simple shipping around of files) but I think you probably know that
so I'd not be helping any by saying so.
 
Peter J. Holzer

> I'd like your opinion on the post below. I posted this on 1/20 and
> was surprised when I got no response. Used to be you would
> get at least 2 responses to any reasonable question. I suspect
> that the lack of a response is primarily due to the lower
> attendance in this group. If you have another explanation, I'd like
> to hear it.
> _______________________________________
>
> What is the best way to do simple link extraction
> from an FTP site using native Perl. [...]
> Is Net::FTP the way to go?

I didn't answer it because I didn't know what you meant by "simple
link extraction from an FTP site", because I'm not really familiar with
Net::FTP and because there were other postings to which I could
contribute more.

"Fishfry" may complain that there are now only 60 postings per day on
weekends but that is still way more than I can read, much less answer.

So I mostly just scan the group, often skipping entire threads and
answer only postings which for some reason catch my eye.

hp
 
Peter J. Holzer

John Bokma said:
> Are there any solid numbers on that? I've been hearing about the death
> of Usenet for many years. What I have seen in those years is that more
> groups got added. And yes, that thins things out.

I don't have numbers for the Big 8, but here they are for the de.*
hierarchy:

http://usenet.dex.de/de.ALL.html

As you can see, the number of postings reached a maximum in 2001 and has
been declining ever since, while the number of newsgroups has stayed
almost constant.

hp
 
anno4000

Uri Guttman said:
> JB> No idea if that's really the case. I mean, even Purl Gurl is back :-D.
>
> and just as delusional as before.

Uri, be fair. She's much less disruptive than she used to be. The
little I've read of the advice she's giving wasn't entirely off the
mark either. I've seen worse.

Anno
 
Uri Guttman

a> Uri, be fair. She's much less disruptive than she used to be. The
a> little I've read of the advice she's giving wasn't entirely off the
a> mark either. I've seen worse.

but it isn't coherent nor contributing well yet. still the same isolated
view of how things should be. and it infected the beginner's list
too. sure it may be less offensive but there is still plenty of just
wrong answers and lack of listening to others who know more. or worse,
not acknowledging that anyone else knows more. i am just making sure it
doesn't sucker newbies into its den of bad (perl) thinking.

uri
 
anno4000

Mike Flannigan said:
Henry Law wrote:
[...]

> I'd like your opinion on the post below. I posted this on 1/20 and
> was surprised when I got no response. Used to be you would
> get at least 2 responses to any reasonable question. I suspect
> that the lack of a response is primarily due to the lower
> attendance in this group. If you have another explanation, I'd like
> to hear it. Maybe the commercial product belongs to somebody
> on this group? :)

I don't remember the post specifically, but I've noted my
reactions now:
> _______________________________________
>
> What is the best way to do simple link extraction
> from an FTP site using native Perl.

In an HTML context, "link extraction" would mean scanning
the pages for links. What is "link extraction" in an FTP
context? I seriously don't know. That reduces my inclination
to try a reply.
> I have native perl
> HTML extraction scripts, and I can find the
> packaged Extract Link programs on the web, but
> I'd like a simple script that will do the job for me.
> I'm surprised I can't find this on my own.

That part does nothing to make the question any clearer.

> Is Net::FTP the way to go?

Probably. Why didn't you look at the documentation and
see if it's promising? I'm not going to do it for you,
especially not knowing what exactly you're looking for.

So, on to the next posting.

Anno
 
Tad McClellan

I skipped over the OP for the very same reason: I couldn't understand
the question.

But _now_ I think I've spotted where the disconnect is originating...

For instance, on this site:
ftp://ftp.FreeBSD.org/pub/FreeBSD/
^^^

That part of the URL is the "protocol" that is being used
(See http://en.wikipedia.org/wiki/Protocol_(computing)).

Note that we are NOT using http: there, so we are not expecting
a response in HTML.

Go to that URL in a browser, and do a "View Source".

See any "href" or "src" thingamabobs that might be interpreted
as a "link"?

Nope.

Mike has fallen prey to the jack-of-all-trades nature of
modern browsers. Hiding the differences between protocols
and "normalizing" them into a consistent user interface
is handy in most situations, but it can lead to an
incomplete mental model of what is really happening.

> I'd like to get the following links off of it:
> ftp://ftp.FreeBSD.org/pub/FreeBSD/README.TXT
> ftp://ftp.FreeBSD.org/pub/FreeBSD/TIMESTAMP.


There is no such thing as a "link" in the File Transfer Protocol.

> So I'm simply trying to get the links to the files,


Since we are talking ftp here, that question makes no sense.

I figure you are asking either:

I want to find out where the files are (their path).

or

I want to _transfer_ the files.

> I guess I can save the site as a text file


I guess you didn't actually try that, since it would have
revealed that there is no HTML here...

> I did check that out, but didn't see an easy way to get the
> job done. I think it's in there, but I couldn't figure it out.


That's probably because you are trying to follow steps from the
HyperText Transfer Protocol when you should be following the
File Transfer Protocol instead.

In ftp-speak, you need to connect to the ftp site, log in, change
to the directory of interest, and fetch the file. Those are the four
steps shown at the top of the Net::FTP docs.
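
Something like this untested sketch, say (anonymous login assumed; the
host and directory are the ones from the URLs quoted above):

#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

# 1. connect
my $ftp = Net::FTP->new('ftp.FreeBSD.org')
    or die "Cannot connect: $@";

# 2. log in (anonymous)
$ftp->login( 'anonymous', 'me@example.com' )
    or die "Cannot login: ", $ftp->message;

# 3. change to the directory of interest
$ftp->cwd('/pub/FreeBSD')
    or die "Cannot cwd: ", $ftp->message;

# "link extraction", ftp-style: just list the directory
print "$_\n" for $ftp->ls;

# 4. fetch a file
$ftp->ascii;
$ftp->get('README.TXT')
    or die "get failed: ", $ftp->message;

$ftp->quit;
__END__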
 
