urllib2, https and gzipped files



I'm trying to use urllib2 to download some gzipped files from an https
server, but I cannot correctly open the file. It happens to be an mbox
file -- a mailing list archive to be exact.

As soon as I call open, the file starts being unzipped. The
Content-Length that gets read is the length of the first post in the
archive; exactly that much text is downloaded, and then it stops.

I can download the file manually in a browser, but not any other way.
Searching the web turned up no solution. I also tested wget and curl,
and both fail in a similar way to my Python code. curl behaves exactly
the same: it gets the first few thousand bytes as text and stops. wget
tries a second time and downloads the remaining bytes, enough to match
the actual compressed file size, but the second part just looks like
random bytes.

The same code works on other sites hosting the same archive; the
difference is that those are http connections, not https.
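
For anyone who wants to reproduce this, here is a minimal sketch of one way to test it: fetch the body as raw bytes, then decompress locally. It rests on assumptions I have not confirmed. It assumes the server honours an "Accept-Encoding: identity" request header, and that the archive may contain more than one gzip member (which would fit the "first post only" symptom, but is only a guess). It uses Python 3's urllib.request in place of urllib2, and the names fetch_raw and gunzip_all_members are made up for illustration.

```python
import gzip
import zlib
from urllib.request import Request, urlopen


def fetch_raw(url):
    """Return the response body as bytes, without decompressing it here."""
    # Ask the server not to apply any content coding of its own
    # (assumption: the server respects this header).
    req = Request(url, headers={"Accept-Encoding": "identity"})
    with urlopen(req) as resp:
        return resp.read()


def gunzip_all_members(raw):
    """Decompress every gzip member in raw, not only the first one."""
    chunks = []
    while raw:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(d.decompress(raw))
        chunks.append(d.flush())
        # Bytes left after one member ends belong to the next member.
        raw = d.unused_data
    return b"".join(chunks)


# Offline self-check with two concatenated gzip members.
sample = gzip.compress(b"first post\n") + gzip.compress(b"second post\n")
assert gunzip_all_members(sample) == b"first post\nsecond post\n"
```

If the bytes after the first member are not valid gzip data, gunzip_all_members raises zlib.error, which would at least show where the download stops making sense.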

Any ideas?


