urllib (and urllib2) read all data from page on open()?

Alex Stapleton · Mar 14, 2005

When you make an HTTP request using urllib, the entire page is downloaded
immediately, whether you want it to be or not. This seems slightly broken to me.

Is there any way to turn this behaviour off and have the object's read method
actually read data from the socket when you ask it to?

Fuzzyman · Mar 14, 2005

Certainly under urllib2, handle.read(100) will read (up to) the next 100
bytes from the handle, which is the same behaviour as the read method
for files.

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml
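
For anyone who wants to see the incremental reads in action, here is a minimal sketch. It assumes Python 3, where urllib2 became urllib.request, and it starts a throwaway local server so nothing depends on a real site. The names BODY, DemoHandler and fetch_in_pieces are made up for the demo. As far as I can tell, the open call returns once the status line and headers have been parsed, and the body is only pulled off the socket as read() asks for it, although the socket layer does its own buffering underneath.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"x" * 1000  # hypothetical payload served by the demo server


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with BODY."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_in_pieces(url, size=100):
    """Open url and return the body as a list of chunks of up to size bytes."""
    # An empty ProxyHandler bypasses any configured proxy, so the demo
    # talks to the local server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    chunks = []
    with opener.open(url) as handle:
        while True:
            chunk = handle.read(size)  # returns at most size bytes
            if not chunk:              # empty bytes means end of body
                break
            chunks.append(chunk)
    return chunks


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

chunks = fetch_in_pieces(url)

server.shutdown()
server.server_close()
```

Joining the chunks gives back the full body, and no single chunk is longer than the size that was asked for.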

Fuzzyman · Mar 14, 2005

Alex said:
Except wouldn't it have already read the entire file when it opened, or does
it occur on the first read()?

Don't know, sorry. Try looking at the source code - it should be
reasonably obvious.

Alex said:
Also, will the data returned from handle.read(100) be raw HTTP? In which
case, what if the encoding is chunked or gzipped?

No - you get HTML, with the HTTP stuff already handled (at least to
the best of my knowledge).

Regards,

Fuzzy
http://www.voidspace.org.uk/python/index.shtml
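
On the chunked versus gzipped question, the two cases appear to behave differently. As far as I know, chunked transfer encoding is undone by the HTTP layer underneath urllib, so read() hands back the reassembled body, whereas a gzipped body (Content-Encoding: gzip) is not decompressed for you. A minimal sketch of handling that yourself, assuming Python 3 and a made-up helper name decode_body:

```python
import gzip


def decode_body(raw, content_encoding):
    """Undo gzip Content-Encoding on a body already read from the handle.

    raw is the bytes returned by read(); content_encoding is the value of
    the Content-Encoding response header, or None if the header is absent.
    Any other encoding is passed through unchanged.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    return raw


# Typical use with a response object (not run here, needs a real URL):
#   handle = urllib.request.urlopen(url)
#   page = decode_body(handle.read(), handle.headers.get("Content-Encoding"))
```

Note that this decompresses the whole body in one go, so it fits the read-everything case rather than the read(100) case.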

