socket timeout / m2crypto.urllib problems

John Hunter

I have a test script below which I use to fetch URLs into strings,
either over https or http. Over https I use M2Crypto's m2urllib, and
over http I use the standard urllib. Whenever I import socket and call
setdefaulttimeout, however, using m2urllib tends to cause an
httplib.BadStatusLine to be raised, even if the timeout is set very
large. All of the documents in the test script are publicly
accessible.

Any ideas? Is there a better/easier way to get https docs in python?

Thanks,
JDH

import urllib, socket
from cStringIO import StringIO
from M2Crypto import Rand, SSL, m2urllib

#comment out this line and the script generally works, but without it
#my zope process, which is using this code, hangs.
socket.setdefaulttimeout(200)


def url_to_string(source):
    """
    get url as string, for https and http
    """
    if source.startswith('https:'):
        sh = StringIO()
        url = m2urllib.FancyURLopener()
        url.addheader('Connection', 'close')
        u = url.open(source)

        while 1:
            data = u.read()
            if not data: break
            sh.write(data)
        return sh.getvalue()
    else:
        return urllib.urlopen(source).read()

if __name__=='__main__':

    s1 = url_to_string('https://crcdocs.bsd.uchicago.edu/crcdocs/Files/informatics.doc')

    s2 = url_to_string('http://yahoo.com')

    s3 = url_to_string('https://crcdocs.bsd.uchicago.edu/crcdocs/Files/facepage.doc')
    print len(s1), len(s2), len(s3)
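
For context on the socket.setdefaulttimeout(200) line in the script: the call sets a process-wide default that every socket created afterwards inherits, including sockets opened internally by other libraries. A minimal sketch using only the standard library (the helper name timeout_of_new_socket is hypothetical, not part of the original script):

```python
import socket


def timeout_of_new_socket():
    """Create a throwaway socket and report the timeout it inherited."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return s.gettimeout()
    finally:
        s.close()


previous = socket.getdefaulttimeout()  # None unless something already set one
before = timeout_of_new_socket()       # inherits whatever the default was

socket.setdefaulttimeout(200)          # same call as in the script
after = timeout_of_new_socket()        # new sockets now carry a 200 s timeout

socket.setdefaulttimeout(previous)     # restore so other code is unaffected
```

Because the setting is global, it may also reach the sockets that M2Crypto wraps, which is one possible (unconfirmed) reason the https fetches behave differently once it is set.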
 
John J. Lee

John Hunter said:
Any ideas? Is there a better/easier way to get https docs in python?
[...]

Python 2.3 has https support built-in even on Windows.
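
A minimal sketch of that suggestion, fetching both http and https through the standard library alone. It assumes a newer interpreter than the 2.3 mentioned above: the per-call timeout argument used here exists only from Python 2.6 on (urllib2) and in Python 3 (urllib.request), and the default of 200 seconds is simply carried over from the original script. On Python 2.3 itself, urllib.urlopen(source).read() should handle both schemes when the interpreter is built with SSL support, but without a per-call timeout.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2.6+, where urlopen accepts timeout


def url_to_string(source, timeout=200):
    """Fetch a URL (http or https) and return the response body."""
    # The timeout applies to this request only, so the process-wide
    # socket default is left untouched.
    response = urlopen(source, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Passing the timeout per request avoids calling socket.setdefaulttimeout at all, so unrelated sockets in the same process keep their normal blocking behaviour.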


John
 
