urllib, urlretrieve method, how to get headers?

  • Thread starter Даниил Рыжков

Даниил Рыжков

Hello, everyone!

How can I get the headers with urlretrieve? I want to send a request and
get the headers with the necessary information before I execute
urlretrieve(). Or are there any alternatives to urlretrieve()?
 
Peter Otten

Даниил Рыжков said:
How can I get the headers with urlretrieve? I want to send a request and
get the headers with the necessary information before I execute
urlretrieve(). Or are there any alternatives to urlretrieve()?

It's easy to do it manually:

Connect to the website and inspect the headers:
>>> import urllib2
>>> f = urllib2.urlopen("http://www.python.org")
>>> f.headers["Content-Type"]
'text/html'

Write the page content to a file:
>>> with open("tmp.html", "w") as dest:
...     dest.writelines(f)
...

Did we get what we expected?
>>> with open("tmp.html") as f: print f.read().split("title")[1]
...
>Python Programming Language &ndash; Official Website</
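
For readers on Python 3, where urllib2 was merged into urllib.request, the same "look at the headers first, then save" idea might be sketched as follows. This is only an illustration and not code from the thread: the function name fetch_if_type and its parameters are hypothetical, and it assumes Content-Type is the header you want to check before saving.

```python
import shutil
import urllib.request


def fetch_if_type(url, dest_path, wanted_type="text/html"):
    """Save the body of `url` to `dest_path` only if its media type matches.

    Hypothetical helper for illustration. Returns True if the file was
    written, False if the media type did not match.
    """
    with urllib.request.urlopen(url) as response:
        # The headers are available as soon as urlopen() returns,
        # before our code has read any of the body.
        # get_content_type() falls back to "text/plain" if the header is missing.
        content_type = response.headers.get_content_type()
        if content_type != wanted_type:
            return False
        with open(dest_path, "wb") as dest:
            # Copy the body to disk in blocks rather than all at once.
            shutil.copyfileobj(response, dest)
    return True
```

A call such as fetch_if_type("http://www.python.org", "tmp.html") would then write tmp.html only when the server reports text/html. Note also that urlretrieve() itself returns a (filename, headers) tuple, but only after the download has finished.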
 
Даниил Рыжков

Hello again!
Another question: urlopen() reads the file's full content, but how can I
get the page in small parts?

Regards,
Daniil
 
Kushal Kumaran

Даниил Рыжков said:
Hello again!
Another question: urlopen() reads the file's full content, but how can I
get the page in small parts?

Set the Range header for HTTP requests. The format is specified here:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35. Note
that web servers are not *required* to support this header.

In [10]: req = urllib2.Request('http://cdimage.debian.org/debian-cd/6.0.2.1/amd64/iso-cd/debian-6.0.2.1-amd64-CD-1.iso',
   ....:                       headers={'Range': 'bytes=0-499'})

In [11]: f = urllib2.urlopen(req)

In [12]: data = f.read()

In [13]: len(data)
Out[13]: 500

In [14]: print f.headers
Date: Fri, 01 Jul 2011 16:59:39 GMT
Server: Apache/2.2.14 (Unix)
Last-Modified: Sun, 26 Jun 2011 16:54:45 GMT
ETag: "ebff2f-28700000-4a6a04ab27f10"
Accept-Ranges: bytes
Content-Length: 500
Age: 225
Content-Range: bytes 0-499/678428672
Connection: close
Content-Type: application/octet-stream
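
A Python 3 version of the session above might look like the sketch below. It is an illustration under stated assumptions, not code from the thread: the helper names make_range_request, parse_content_range and fetch_range are hypothetical, and it assumes a single byte range is requested. A server that honors the Range header answers with status 206 and a Content-Range header; one that ignores it answers 200 and sends the whole body.

```python
import re
import urllib.request


def make_range_request(url, first_byte, last_byte):
    """Build a request for bytes first_byte..last_byte (both inclusive)."""
    byte_range = "bytes=%d-%d" % (first_byte, last_byte)
    return urllib.request.Request(url, headers={"Range": byte_range})


def parse_content_range(value):
    """Parse 'bytes 0-499/678428672' into (first, last, total).

    total is None when the server reports '*' (length unknown).
    """
    match = re.match(r"bytes (\d+)-(\d+)/(\d+|\*)$", value.strip())
    if match is None:
        raise ValueError("unexpected Content-Range: %r" % value)
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def fetch_range(url, first_byte, last_byte):
    """Return (data, (first, last, total)), or (None, None) if Range was ignored."""
    request = make_range_request(url, first_byte, last_byte)
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # The server ignored the Range header; do not read the full body.
            return None, None
        data = response.read()
        return data, parse_content_range(response.headers["Content-Range"])
```

The total returned by parse_content_range tells you how many further ranges you would need to request to cover the whole file.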
 
Chris Rebert

Даниил Рыжков said:
Hello again!
Another question: urlopen() reads the file's full content, but how can I
get the page in small parts?

I don't think that's true. Just pass .read() the number of bytes you
want to read, just as you would with an actual file object.

Cheers,
Chris
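
The read(n) approach can be sketched as a small generator. This is a minimal illustration, not code from the thread: the name iter_chunks and the default chunk size are assumptions, and it works with any object that has a read(size) method, including the response returned by urlopen().

```python
import io


def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive blocks of up to chunk_size bytes from fileobj."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty result means the stream is exhausted.
            break
        yield chunk


# Demonstration with an in-memory stream standing in for a response:
for block in iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4):
    print(len(block))
```

With a real response, each block can be written to a file or processed as it arrives, so the whole page never has to be held in memory at once.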
 
