UserAgent's Max_Size and HTTP's header field Range

Great Deals

Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(agent => 'Mozilla/4.0');
$ua->max_size(2000);    # abort the download once the content exceeds this size
my $url = 'http://news.google.com'; # for instance, no trailing /

# ask the server for everything from byte 1000 onwards
my $htmlcode = $ua->get($url, Range => 'bytes=1000-')->content;
print $htmlcode;

#################
First of all, if I put 2000 or 1000 or 750 in max_size, the result is
the same, and I don't know why. But if I put 400 there, the downloaded
content is much smaller.

Secondly, Range => 'bytes=1000-' does not seem to work. I only want to
fetch the middle part of the page, not from the beginning. How could I
do that?

Here are the request headers I sent via nettransport:
2003-10-01 11:35:38.713 Connecting to news.google.com:80
2003-10-01 11:35:38.713 Connecting to 216.239.33.104:80
2003-10-01 11:35:38.963 Connected
2003-10-01 11:35:38.963 GET / HTTP/1.1
2003-10-01 11:35:38.963 Host: news.google.com
2003-10-01 11:35:38.963 Referer: http://news.google.com
2003-10-01 11:35:38.963 Accept: */*
2003-10-01 11:35:38.963 User-Agent: Mozilla/4.0
2003-10-01 11:35:38.963 Range: bytes=12485-
2003-10-01 11:35:38.963 Connection: close

Here are the response headers Google gave me when I used nettransport:

2003-10-01 11:35:40.085 HTTP/1.1 200 OK
2003-10-01 11:35:40.085 Date: Wed, 01 Oct 2003 15:35:38 GMT
2003-10-01 11:35:40.085 Server: GWS/2.1
2003-10-01 11:35:40.085 Content-length: 67700
2003-10-01 11:35:40.085 Cache-control: no-cache, must-revalidate
2003-10-01 11:35:40.085 Expires: Fri, 01 Jan 1990 00:00:00 GMT
2003-10-01 11:35:40.085 Pragma: no-cache
2003-10-01 11:35:40.085 Last-Modified: Wed, 01 Oct 2003 15:30:28 GMT
2003-10-01 11:35:40.085 Content-Type: text/html
 
laura fairhead

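If Google's server had honoured your Range request, the response would
be "206 Partial Content" with a "Content-Range:" header, and a server
that supports ranges normally advertises it with "Accept-Ranges: bytes".
Yours came back as "200 OK" with the full Content-length and no
"Accept-Ranges:" header, so it just doesn't support ranges.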
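
To check from Perl whether a server went along with a Range header, one option is to look at the status code and the Content-Range header of the reply. This is only a minimal sketch: range_status is a made-up helper name, not part of LWP, and the network part runs only when a URL is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify a reply to a Range request from its status code and its
# Content-Range header (undef if the header is absent).
# Hypothetical helper; takes plain values so it can be tried offline.
sub range_status {
    my ($code, $content_range) = @_;
    return 'partial'       if $code == 206 && defined $content_range;
    return 'ignored'       if $code == 200;   # full body sent, Range ignored
    return 'unsatisfiable' if $code == 416;   # range starts past the end
    return 'other';
}

# Only touch the network when a URL is passed as the first argument.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(agent => 'Mozilla/4.0');
    my $res = $ua->get($ARGV[0], Range => 'bytes=1000-');
    # scalar() matters: in list context a missing header yields an empty list
    print range_status($res->code, scalar $res->header('Content-Range')), "\n";
}
```

With the response headers quoted in the question (200 OK, no Content-Range), this would report 'ignored'.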

There is nothing you can do about this: if the server doesn't support
ranges it won't serve parts of files to you. In that case all you can
do is download the entire file each time and then splice out the part
that you actually wanted. That should be easy, but there's no way around
having to waste the time downloading the first bit if the server
doesn't support ranges. It is theoretically possible to cut the
download short if you don't want the data at the end of a file, simply
by aborting the connection, but you can't skip data at the start.
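
The splice-it-out-yourself fallback could look roughly like this. It is a sketch under assumptions: slice_bytes and wanted_part are made-up helper names, the 1000-1999 range is just an example, and a 206 reply is assumed to contain exactly the range that was asked for. The network part runs only when a URL is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the bytes that "bytes=FIRST-LAST" would select from $body.
# LAST may be undef for an open-ended range such as "bytes=1000-".
sub slice_bytes {
    my ($body, $first, $last) = @_;
    my $len = length $body;
    return '' if $first >= $len;
    $last = $len - 1 if !defined($last) || $last >= $len;
    return '' if $last < $first;
    return substr($body, $first, $last - $first + 1);
}

# A 206 body is taken to be the requested part already; a 200 body is
# the whole page, so the range is cut out locally.
sub wanted_part {
    my ($code, $body, $first, $last) = @_;
    return $body if $code == 206;
    return slice_bytes($body, $first, $last);
}

# Only touch the network when a URL is passed as the first argument.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(agent => 'Mozilla/4.0');
    my $res = $ua->get($ARGV[0], Range => 'bytes=1000-1999');
    die $res->status_line, "\n" unless $res->is_success;
    print wanted_part($res->code, $res->content, 1000, 1999);
}
```

For cutting off the tail, max_size already does the aborting. According to the LWP::UserAgent documentation, the content can end up longer than max_size because the abort happens only after a whole chunk has been appended, which would fit the observation that 2000, 1000 and 750 all give the same result.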

seeyafrom
l
 
