Deficiency in urllib/socket for https?


Gary Feldman

I think I've found a deficiency in the design of urllib related to https.

In order to complete an https connection, it appears that URLopener and
hence FancyURLopener require the key and cert files. Or at least, it's not
clear from the description of socket.ssl what it does if they're omitted.

However, urlopen has no way to specify such things. Nor should it - for
typical uses, a person simply trying to retrieve data from an ssl site
really doesn't want to know or care about keys and certificate directories.
One just wants to provide an https url and have it work. Ideally, there
should be defaults for the certificate files.

This implies that somewhere in the function hierarchy, I suspect in
socket.ssl, there need to be some clever defaults. I don't know if the
folks maintaining the Python distribution really want to be in the business
of maintaining key and certificate directories (probably not), but there at
least ought to be a way to specify default directories (oh, no, another
environment variable?). Thinking idealistically, it would be great if it
could share the default certs on the system (i.e. on UNIX, find a Netscape
or Mozilla install directory and use those, and on MS Windows, do whatever
it takes to use the Windows mechanism).
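
Python 3's ssl module (which postdates this thread) ended up doing roughly what is wished for here. A minimal sketch, assuming Python 3.4 or later and touching no network: ssl.create_default_context() loads the platform's default CA certificates (on Windows, from the system certificate store) and turns verification on, while ssl.get_default_verify_paths() reports where the local OpenSSL build looks for them. On the environment-variable question, OpenSSL already honours SSL_CERT_FILE and SSL_CERT_DIR, and the openssl_*_env fields report those names. describe_default_trust_store is a hypothetical helper used only for illustration.

```python
import ssl

def describe_default_trust_store():
    """Report where this Python's OpenSSL looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    # cafile/capath are None if the compiled-in locations don't exist here.
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "cafile_env_var": paths.openssl_cafile_env,  # env var OpenSSL reads
        "capath_env_var": paths.openssl_capath_env,
    }

# create_default_context() loads the platform's default CA certificates
# and enables both certificate and hostname verification.
ctx = ssl.create_default_context()
info = describe_default_trust_store()
print(info)
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```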

It's possible my analysis is flawed. I haven't taken the time to download
and read the _ssl code, just the socket.py code (and urllib and httplib).
So corrections are appreciated as much as comments.

Gary
 

John J. Lee

Gary Feldman said:
> I think I've found a deficiency in the design of urllib related to https.
>
> In order to complete an https connection, it appears that URLopener and
> hence FancyURLopener require the key and cert files. Or at least, it's not
> clear from the description of socket.ssl what it does if they're omitted.

Nor from urllib -- see below. In fact, it seems that verification is
just skipped if they're not there.

> However, urlopen has no way to specify such things. Nor should it - for
> typical uses, a person simply trying to retrieve data from an ssl site
> really doesn't want to know or care about keys and certificate directories.
> One just wants to provide an https url and have it work. Ideally, there
> should be defaults for the certificate files.

Hmm, looking at both urllib and urllib2, I see urllib2 doesn't use any
key or certificate files at all. So, two points: this is a deficiency
in urllib2 that should be fixed, and, if you're not bothered about key
verification, I'd guess just not providing key / cert files will work.

Hmm, urllib documentation seems wrong here:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

The fact that https works in urllib2 (which does not provide key /
cert files) seems to demonstrate that they're *not* required, and that
verification is skipped if they're not supplied.

If you *are* bothered about verification, use the x509 arg to
FancyURLopener (which is documented, see above). The urlopen function
is just a convenience -- just cut-n-paste the trivial code from
urllib.py and adapt it to your needs if you need something more
complicated.
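
In the same spirit of adapting urlopen rather than calling it, here is a Python 3 sketch (not the Python 2 urllib API discussed above) of building an opener with an explicit SSL context. It separates two jobs that are easy to conflate: load_cert_chain() supplies the client's own certificate and key, which is what httplib's key_file/cert_file were for, while the CA certificates in cafile (or the platform defaults) are what verify the server. make_https_opener is a hypothetical helper, and nothing is fetched.

```python
import ssl
import urllib.request

def make_https_opener(cafile=None, certfile=None, keyfile=None):
    """Build a urllib opener whose HTTPS handler uses an explicit context.

    cafile:           trust anchors used to verify the *server*
                      (None means the platform defaults).
    certfile/keyfile: the *client's* own certificate and key, only
                      needed if the server asks for client authentication.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    if certfile is not None:
        ctx.load_cert_chain(certfile, keyfile)
    handler = urllib.request.HTTPSHandler(context=ctx)
    return urllib.request.build_opener(handler), ctx

opener, ctx = make_https_opener()
# opener.open("https://example.org/") would now verify the server's cert.
```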

> This implies that somewhere in the function hierarchy, I suspect in
> socket.ssl, there need to be some clever defaults. I don't know if the
> folks maintaining the Python distribution really want to be in the business
> of maintaining key and certificate directories (probably not), but there at
> least ought to be a way to specify default directories (oh, no, another
> environment variable?). Thinking idealistically, it would be great if it
> could share the default certs on the system (i.e. on UNIX, find a Netscape
> or Mozilla install directory and use those, and on MS Windows, do whatever
> it takes to use the Windows mechanism).

That sounds great if you have the time to write the code. Nobody else
is likely to.


John
 

John J. Lee

Would you mind submitting a doc patch (both urllib and urllib2 docs
appear to need fixing -- urllib2 to say that it never verifies, urllib
to say that it skips verification if an appropriate x509 mapping isn't
supplied)?

Hmm, maybe I've got this wrong: the fact that key/cert args are passed
to httplib.HTTPS by urllib doesn't mean authentication happens, and
the fact that they're not passed by urllib2 doesn't mean
authentication doesn't happen. I'll investigate.


John
 

John J. Lee

> You're right -- with the caveat that it is useful to have https even
> without authentication (essentially all https traffic on the internet
> proves that ;-).
> [...]

I should have said "...it is useful to have *support* for https...".

The utility of https itself is another matter...


John
 

John J. Lee

> Hmm, maybe I've got this wrong: the fact that key/cert args are passed
> to httplib.HTTPS by urllib doesn't mean authentication happens, and
> the fact that they're not passed by urllib2 doesn't mean
> authentication doesn't happen. I'll investigate.

Bah! *After* reading the source, I found this in the ssl module docs:

| Warning: This does not do any certificate verification!

(which the _ssl.c source confirms: it uses SSL_VERIFY_NONE, but
doesn't call SSL_get_verify_result).
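
In Python 3 terms, SSL_VERIFY_NONE corresponds to ssl.CERT_NONE. A sketch, assuming Python 3.6 or later (for PROTOCOL_TLS_CLIENT), contrasting a context that mimics the old behaviour (encryption with no certificate checks) with the verifying default; old_style_context is a hypothetical name.

```python
import ssl

def old_style_context():
    """Mimic the old socket.ssl behaviour: encrypt, but verify nothing."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # PROTOCOL_TLS_CLIENT enables hostname checking and CERT_REQUIRED by
    # default; hostname checking must be turned off before verify_mode
    # can be lowered.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE  # OpenSSL's SSL_VERIFY_NONE
    return ctx

insecure = old_style_context()
secure = ssl.create_default_context()
for name, c in (("old-style", insecure), ("default", secure)):
    print(name, c.verify_mode, c.check_hostname)
```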

So the urllib docs are wrong:

| Additional keyword parameters, collected in x509, are used for
| authentication with the https: scheme. The keywords key_file and
| cert_file are supported; both are needed to actually retrieve a
| resource at an https: URL.

They're not needed, and they're never used for authentication (if you
don't count just checking the key without verifying it against the
certificate). Given this, the fact that urllib2 doesn't have
arguments for this starts to look like a feature, not a bug! Actually
(dredging up very hazy memories here) aren't you supposed to check a
revocation list, too? Is that given in a URL in the certificate? No
idea how this SSL stuff is supposed to work, really...
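
On revocation: certificates often do carry CRL distribution point URLs, but OpenSSL (and Python's ssl module on top of it) does not fetch them for you; you load CRLs yourself and ask for them to be checked. A minimal Python 3.4+ sketch; context_with_crl_check and the crl_path file are hypothetical, and with the flag set but no matching CRL loaded, verification fails rather than silently passing.

```python
import ssl

def context_with_crl_check(crl_path=None):
    """Verifying context that also demands a revocation (CRL) check.

    crl_path is a hypothetical local PEM file of CRLs; OpenSSL will not
    download CRLs from the URLs in a certificate by itself.
    """
    ctx = ssl.create_default_context()
    if crl_path is not None:
        # CRLs are loaded through load_verify_locations, like CA certs.
        ctx.load_verify_locations(cafile=crl_path)
    # Check the server's own certificate against a CRL from its issuer.
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx

ctx = context_with_crl_check()
```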

I'll upload a doc patch in a minute.

So, in summary, none of httplib, urllib and urllib2 in standard Python
do proper authentication (because the socket module doesn't). There
are third-party SSL libraries for Python: M2Crypto is one. If you
need it, and assuming M2Crypto has an ssl function with the same
interface that *does* do better auth, I suppose you could probably do

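    import socket
    # Hypothetical names: stand-ins for an ssl() function from a third-party
    # library such as M2Crypto, taking the same arguments as socket.ssl but
    # really verifying the server's certificate. Check its docs for real names.
    from some_ssl_library import verifying_ssl
    socket.ssl = verifying_ssl  # so httplib/urllib use it, with any luck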


And have urllib magically start working, with any luck.
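
Why the socket.ssl monkeypatch can work, and why it needs luck: assigning to a module attribute only affects code that looks the attribute up at call time (as the post assumes httplib does with socket.ssl); code that did "from socket import ssl" at import time keeps the old function. A self-contained sketch using a stand-in module; fake_socket and both connect functions are hypothetical.

```python
import types

# A stand-in for the socket module; all names here are illustrative only.
fake_socket = types.ModuleType("fake_socket")
fake_socket.ssl = lambda sock: "unverified wrapper around %s" % sock

def https_connect_late(sock):
    # Looks up fake_socket.ssl each time it is called.
    return fake_socket.ssl(sock)

# Like "from socket import ssl": binds the current function once.
early_ssl = fake_socket.ssl

def https_connect_early(sock):
    return early_ssl(sock)

# The monkeypatch from the post: replace the module attribute.
fake_socket.ssl = lambda sock: "verified wrapper around %s" % sock

print(https_connect_late("s"))   # verified wrapper around s
print(https_connect_early("s"))  # unverified wrapper around s
```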


John
 
