urllib2, https and gzipped files

Barry · Sep 20, 2009

I'm trying to use urllib2 to download some gzipped files from an https
server, but I can't open the file correctly. It happens to be an mbox
file (a mailing list archive, to be exact).
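For reference, here is a minimal sketch of the download step. It is written for Python 3, where urllib2's functionality lives in urllib.request. It assumes the goal is to save the archive exactly as served and decompress it locally afterwards. The `Accept-Encoding: identity` header asks the server not to apply any extra content coding in transit. `fetch_raw` and `gunzip_mbox` are hypothetical names.

```python
import gzip
import urllib.request


def fetch_raw(url: str) -> bytes:
    """Download a response body as-is, asking the server not to re-encode it."""
    # Assumption: "identity" requests no transfer-time content coding.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def gunzip_mbox(raw: bytes) -> str:
    """Decompress a gzipped mbox archive and decode it as text."""
    # gzip.decompress also accepts several gzip members concatenated together.
    return gzip.decompress(raw).decode("utf-8", errors="replace")
```

Used together, this would look like `text = gunzip_mbox(fetch_raw(url))`. Keeping the download and the decompression separate makes it clear which of the two steps goes wrong.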

As soon as I call open, the file starts being decompressed.
Content-Length is reported as the length of the first post in the
archive, exactly that much text is downloaded, and then it stops.
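One way to pin down the short read is to read the body in chunks until the stream reports end of data, then compare the byte count with the Content-Length header. Below is a minimal Python 3 sketch. `read_to_eof` is a hypothetical helper. With a real response it would be called as `read_to_eof(resp, resp.headers.get("Content-Length"))`.

```python
import io
from typing import BinaryIO, Optional, Tuple


def read_to_eof(stream: BinaryIO, content_length: Optional[str],
                chunk_size: int = 8192) -> Tuple[bytes, bool]:
    """Read a body in chunks until EOF.

    Returns the bytes read and whether their count matches the
    Content-Length header (True when no header was sent).
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        chunks.append(chunk)
    body = b"".join(chunks)
    if content_length is None:
        return body, True
    return body, len(body) == int(content_length)


if __name__ == "__main__":
    # Simulated short response: the header promises 100 bytes, only 40 arrive.
    body, complete = read_to_eof(io.BytesIO(b"x" * 40), "100")
    print(len(body), complete)  # 40 False
```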

Downloading it manually in a browser works, but nothing else does. I
couldn't find a solution by searching the web, but I tested wget and
curl, and both fail in much the same way as my Python code. curl
behaves exactly the same: it gets the first few thousand bytes as text
and stops. wget tries a second time and downloads the remaining number
of bytes to match the actual compressed file size, but that second
part just looks like random bytes.
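A quick way to see what curl and wget actually saved is to look for the gzip magic number: the bytes 0x1f 0x8b, defined in RFC 1952. A file that starts with them is still compressed. A file that starts as mbox text but contains them later may be text followed by compressed bytes, which is one possible reading of the wget result. This is a rough heuristic sketch, and `classify_download` is a hypothetical name. The two magic bytes can also occur by chance in other data.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def classify_download(data: bytes) -> str:
    """Heuristically label downloaded bytes as 'gzip', 'mixed' or 'plain'."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"    # starts with a gzip header: still compressed
    if GZIP_MAGIC in data:
        return "mixed"   # non-gzip start, gzip-looking bytes later on
    return "plain"       # no gzip header bytes anywhere
```

For a saved file, `open(path, "rb").read()` gives the bytes to pass in.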

The same code works on other sites that host the same archive, but
those sites use http connections, not https.
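Since the only stated difference is http versus https, it may help to compare the response headers that control how a client decodes and sizes the body. For example, check whether one server sends a Content-Encoding header that the others do not. Below is a minimal Python 3 sketch. The helper name and the choice of headers are assumptions. With a real response it could be called as `body_headers(resp.headers)`.

```python
from typing import Dict, Mapping, Optional

# Headers that influence how a client decodes and measures a response body.
RELEVANT_HEADERS = ("Content-Type", "Content-Encoding",
                    "Content-Length", "Transfer-Encoding")


def body_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Return the body-related headers (case-insensitively), None if absent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name.lower()) for name in RELEVANT_HEADERS}
```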

Any ideas?

Barry

urllib2.urlopen+BadStatusLine+https	0	May 12, 2011
debugging https connections with urllib2?	2	Jun 18, 2011
urllib2 : https and proxy	3	Jun 28, 2007
urllib2, proxies and https	1	Aug 18, 2006
urllib2 opendirector versus request object	0	Jun 9, 2011
urllib2 and threading	6	May 1, 2009
Sending Error when attaching files	1	Aug 7, 2023
urllib2 - 403 that _should_ not occur.	11	Jan 12, 2009
