Deficiency in urllib/socket for https?

Discussion in 'Python' started by Gary Feldman, Aug 21, 2003.

  1. Gary Feldman

    Gary Feldman Guest

    I think I've found a deficiency in the design of urllib related to https.

    In order to complete an https connection, it appears that URLOpener and
    hence FancyURLOpener require the key and cert files. Or at least, it's not
    clear from the description of socket.ssl what it does if they're omitted.

    However, urlopen has no way to specify such things. Nor should it - for
    typical uses, a person simply trying to retrieve data from an ssl site
    really doesn't want to know or care about keys and certificate directories.
    One just wants to provide an https url and have it work. Ideally, there
    should be defaults for the certificate files.

    This implies that somewhere in the function hierarchy, I suspect in
    socket.ssl, there need to be some clever defaults. I don't know if the
    folks maintaining the Python distribution really want to be in the business
    of maintaining key and certificate directories (probably not), but there at
    least ought to be a way to specify default directories (oh, no, another
    environment variable?). Thinking idealistically, it would be great if it
    could share the default certs on the system (i.e. on UNIX, find a Netscape
    or Mozilla install directory and use those, and on MS Windows, do whatever
    it takes to use the Windows mechanism).
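For comparison, here is a minimal sketch of what such defaults look like in present-day Python 3. It assumes the standard-library `ssl` module, which did not exist when this thread was written. `describe_default_trust` is a hypothetical helper name; `ssl.get_default_verify_paths()` and `ssl.create_default_context()` are the real calls. The sketch reports where the underlying OpenSSL build looks for CA certificates, names the environment variables that can override those locations, and checks that the default context verifies the server.

```python
import ssl


def describe_default_trust():
    """Summarize where default CA certificates come from (hypothetical helper)."""
    paths = ssl.get_default_verify_paths()
    # create_default_context() loads the platform's trusted CA certificates
    # and turns on certificate and hostname verification.
    ctx = ssl.create_default_context()
    return {
        "cafile": paths.cafile,                  # may be None if no bundle file exists
        "capath": paths.capath,                  # may be None if no cert directory exists
        "cafile_env": paths.openssl_cafile_env,  # name of the overriding env variable
        "capath_env": paths.openssl_capath_env,
        "verifies": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


if __name__ == "__main__":
    for key, value in describe_default_trust().items():
        print(f"{key}: {value}")
```

The reported paths depend on how the local OpenSSL was built, so the output differs from system to system.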

    It's possible my analysis is flawed. I haven't taken the time to download
    and read the _ssl code, just the socket.py code (and urllib and httplib).
    So corrections are appreciated as much as comments.

    Gary
     
    Gary Feldman, Aug 21, 2003
    #1

  2. John J. Lee

    John J. Lee Guest

    Gary Feldman writes:

    > I think I've found a deficiency in the design of urllib related to https.
    >
    > In order to complete an https connection, it appears that URLOpener and
    > hence FancyURLOpener require the key and cert files. Or at least, it's not
    > clear from the description of socket.ssl what it does if they're omitted.


    Nor from urllib -- see below. In fact, it seems that verification is
    just skipped if they're not there.


    > However, urlopen has no way to specify such things. Nor should it - for
    > typical uses, a person simply trying to retrieve data from an ssl site
    > really doesn't want to know or care about keys and certificate directories.
    > One just wants to provide an https url and have it work. Ideally, there
    > should be defaults for the certificate files.


    Hmm, looking at both urllib and urllib2, I see urllib2 doesn't use any
    key or certificate files at all. So, two points: this is a deficiency
    in urllib2 that should be fixed, and, if you're not bothered about key
    verification, I'd guess just not providing key / cert files will work.

    Hmm, urllib documentation seems wrong here:

    Additional keyword parameters, collected in x509, are used for
    authentication with the https: scheme. The keywords key_file and
    cert_file are supported; both are needed to actually retrieve a
    resource at an https: URL.

    The fact that https works in urllib2 (which does not provide key /
    cert files) seems to demonstrate that they're *not* required, and that
    verification is skipped if they're not supplied.

    If you *are* bothered about verification, use the x509 arg to
    FancyURLOpener (which is documented, see above). The urlopen function
    is just a convenience -- just cut-n-paste the trivial code from
    urllib.py and adapt it to your needs if you need something more
    complicated.
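A sketch of that kind of adaptation in present-day Python 3, where `urllib.request` takes the place of `urllib` and `urllib2`. `build_https_opener` is a hypothetical helper name. The sketch assumes that `cert_file` and `key_file` name the client's own PEM certificate and private key, as the httplib documentation describes them; checking the server's certificate relies instead on the CA certificates loaded by `ssl.create_default_context()`.

```python
import ssl
import urllib.request


def build_https_opener(cert_file=None, key_file=None):
    """Return an opener whose https handler verifies the server (hypothetical helper)."""
    # System CA store, certificate verification and hostname checking enabled.
    ctx = ssl.create_default_context()
    if cert_file is not None:
        # Optional client certificate, presented to servers that request one.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    handler = urllib.request.HTTPSHandler(context=ctx)
    # build_opener() uses our handler in place of the default HTTPSHandler.
    return urllib.request.build_opener(handler)
```

Calling `build_https_opener().open(url)` then behaves like `urlopen(url)`, with verification failures surfacing as exceptions rather than being silently ignored.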


    > This implies that somewhere in the function hierarchy, I suspect in
    > socket.ssl, there need to be some clever defaults. I don't know if the
    > folks maintaining the Python distribution really want to be in the business
    > of maintaining key and certificate directories (probably not), but there at
    > least ought to be a way to specify default directories (oh, no, another
    > environment variable?). Thinking idealistically, it would be great if it
    > could share the default certs on the system (i.e. on UNIX, find a Netscape
    > or Mozilla install directory and use those, and on MS Windows, do whatever
    > it takes to use the Windows mechanism).


    That sounds great if you have the time to write the code. Nobody else
    is likely to.


    John
     
    John J. Lee, Aug 22, 2003
    #2

  3. John J. Lee

    John J. Lee Guest

    John J. Lee writes:
    [...]
    > Would you mind submitting a doc patch (both urllib and urllib2 docs
    > appear to need fixing -- urllib2 to say that it never verifies, urllib
    > to say that it skips verification if an appropriate x509 mapping isn't
    > supplied)?


    Hmm, maybe I've got this wrong: the fact that key/cert args are passed
    to httplib.HTTPS by urllib doesn't mean authentication happens, and
    the fact that they're not passed by urllib2 doesn't mean
    authentication doesn't happen. I'll investigate.


    John
     
    John J. Lee, Aug 22, 2003
    #3
  4. John J. Lee

    John J. Lee Guest

    John J. Lee writes:
    [...]
    > You're right -- with the caveat that it is useful to have https even
    > without authentication (essentially all https traffic on the internet
    > proves that ;-).

    [...]

    I should have said "...it is useful to have *support* for https...".

    The utility of https itself is another matter...


    John
     
    John J. Lee, Aug 22, 2003
    #4
  5. John J. Lee

    John J. Lee Guest

    John J. Lee writes:

    > John J. Lee writes:
    > [...]
    > > Would you mind submitting a doc patch (both urllib and urllib2 docs
    > > appear to need fixing -- urllib2 to say that it never verifies, urllib
    > > to say that it skips verification if an appropriate x509 mapping isn't
    > > supplied)?
    >
    > Hmm, maybe I've got this wrong: the fact that key/cert args are passed
    > to httplib.HTTPS by urllib doesn't mean authentication happens, and
    > the fact that they're not passed by urllib2 doesn't mean
    > authentication doesn't happen. I'll investigate.


    Bah! *After* reading the source, I found this in the ssl module docs:

    | Warning: This does not do any certificate verification!

    (which the _ssl.c source confirms: it uses SSL_VERIFY_NONE, and it
    never calls SSL_get_verify_result).
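To make the distinction concrete, here is a sketch using the present-day Python 3 `ssl` module rather than the `socket.ssl` function discussed in this thread. `make_client_context` is a hypothetical name. With `verify=False` it builds a context whose `CERT_NONE` mode corresponds to SSL_VERIFY_NONE: the handshake goes ahead whatever certificate the server presents. With `verify=True` it requires a chain that leads to a trusted CA and a certificate that matches the hostname.

```python
import ssl


def make_client_context(verify=True):
    """Build a client-side TLS context, verifying or not (hypothetical helper)."""
    # PROTOCOL_TLS_CLIENT starts out with CERT_REQUIRED and hostname checking on.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if verify:
        # Trust the CA certificates the platform provides.
        ctx.load_default_certs()
    else:
        # Hostname checking must be switched off before verify_mode can be
        # set to CERT_NONE; the reverse order raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

An unverified context still encrypts the connection, but it gives no assurance about who is on the other end.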

    So the urllib docs are wrong:

    | Additional keyword parameters, collected in x509, are used for
    | authentication with the https: scheme. The keywords key_file and
    | cert_file are supported; both are needed to actually retrieve a
    | resource at an https: URL.

    They're not needed, and they're never used for authentication (if you
    don't count just checking the key without verifying it against the
    certificate). Given this, the fact that urllib2 doesn't have
    arguments for this starts to look like a feature, not a bug! Actually
    (dredging up very hazy memories here) aren't you supposed to check a
    revocation list, too? Is that given in a URL in the certificate? No
    idea how this SSL stuff is supposed to work, really...

    I'll upload a doc patch in a minute.

    So, in summary, none of httplib, urllib and urllib2 in standard Python
    do proper authentication (because the socket module doesn't). There
    are third-party SSL libraries for Python: M2Crypto is one. If you
    need it, and assuming M2Crypto has an ssl function with the same
    interface that *does* do better auth, I suppose you could probably do

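    import socket
    # Hypothetical: assumes the third-party package exposes an ssl()
    # callable with the same signature as socket.ssl.
    from M2Crypto import ssl  # or whatever
    socket.ssl = ssl  # replace the stdlib wrapper process-wide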


    And have urllib magically start working, with any luck.


    John
     
    John J. Lee, Aug 23, 2003
    #5
