Obtaining SSL certificate info from SSL object - BUG?

John Nagle

The Python SSL object offers two methods for obtaining
the info from an SSL certificate, "server()" and "issuer()".
The actual values in the certificate are a series of name/value
pairs in ASN.1 binary format. But what "server()" and "issuer()"
return are strings, with the pairs separated by "/". The
documentation at "http://docs.python.org/lib/ssl-objects.html"
says "Returns a string containing the ASN.1 distinguished name identifying the
server's certificate. (See below for an example showing what distinguished names
look like.)" There is, however, no "below".

What you actually get back looks like this, which is Google's certificate:

"/C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com"

So, no problem; just split on "/", right?

Unfortunately, "/" is a legal character in certificate values.

Worse, this isn't just a theoretical problem. Verisign's issuer
information reads:

"/O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign".

Note the field

"OU=Terms of use at www.verisign.com/rpa (c)00"

which has a "/" in the middle of its value, just as the issuer's
"OU=www.verisign.com/CPS ..." field above does. So you hit this
problem on every cert issued by Verisign. Oops.
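To make the failure concrete, here is a minimal Python sketch of the "just split on /" approach, run against the two strings quoted above. The function name `naive_parse` is hypothetical; the only assumption is the "/K=V/K=V" layout shown in the post.

```python
def naive_parse(oneline):
    """Split a '/K1=V1/K2=V2...' string into a list of (key, value) pairs.

    Every '/'-separated piece is assumed to be one key=value pair, so a
    value that itself contains '/' produces a piece with no '=' and fails.
    """
    pairs = []
    for piece in oneline.split("/")[1:]:  # [1:] skips the empty text before the leading '/'
        key, sep, value = piece.partition("=")
        if not sep:
            raise ValueError("component without '=': %r" % piece)
        pairs.append((key, value))
    return pairs


google = "/C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com"
verisign = ("/O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International "
            "Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY "
            "LTD.(c)97 VeriSign")

print(naive_parse(google))
try:
    naive_parse(verisign)
except ValueError as exc:
    print("naive split fails:", exc)
```

The Google string parses into five pairs; the Verisign issuer raises on the piece "CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign", which is really the tail of the preceding OU value.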

Nor does there seem to be a way to get at the certificate itself
from within Python. There was some discussion of this in 2002 at

http://groups.google.com/group/comp...t&q=socket+ssl+issuer&rnum=4#eec124c606f56c0b

when someone wrote: "Furthermore, while the server and issuer are exposed
through undocumented attributes, the server_cert is not. So there is no way to
validate the cert manually, short of rewriting socketmodule.c. This is one case
where the batteries included have been sitting on the shelf too long."

Clearly, "server()" and "issuer()" should return lists, not strings. That
would resolve the ambiguity. ASN.1 is a representation for lists, and
hammering those lists into strings loses information.

Is there a workaround for this? Without rebuilding Python
and becoming incompatible?

John Nagle
Animats
 
Paul Rubin

John Nagle said:
Is there a workaround for this? Without rebuilding Python
and becoming incompatible?

I've parsed certs by calling openssl in a subprocess. Maybe that's
not what you wanted to hear. If you're really industrious you might
be able to extend the tlslite cert parsing code (written in pure
Python) to get those fields out.
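A rough Python sketch of the subprocess approach. It assumes an `openssl` binary on the PATH and a PEM certificate file; the `-nameopt rfc2253` option (discussed later in the thread) is an assumption on my part, not necessarily what the poster used, and both helper names are hypothetical. Only the output parsing is exercised here, on the sample output quoted later in the thread.

```python
import subprocess


def parse_openssl_names(text):
    """Turn 'subject= ...' / 'issuer= ...' lines into a dict of DN strings."""
    names = {}
    for line in text.splitlines():
        label, sep, rest = line.partition("=")  # split at the first '=' only
        if sep and label.strip() in ("subject", "issuer"):
            names[label.strip()] = rest.strip()
    return names


def openssl_names(pem_path):
    """Ask the openssl CLI for the subject and issuer of a PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-in", pem_path, "-noout",
         "-subject", "-issuer", "-nameopt", "rfc2253"],
        capture_output=True, text=True, check=True,
    )
    return parse_openssl_names(result.stdout)


sample = ("subject= OU=Class 1 Public Primary Certification Authority,O=VeriSign\\, Inc.,C=US\n"
          "issuer= OU=Class 1 Public Primary Certification Authority,O=VeriSign\\, Inc.,C=US\n")
print(parse_openssl_names(sample)["subject"])
```

The parse step splits each line only at its first "=", so the "=" signs inside the DN survive intact; unescaping sequences such as "\," is left to a real RFC 2253 parser.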
 
Donn Cave

John Nagle said:
The Python SSL object offers two methods for obtaining
the info from an SSL certificate, "server()" and "issuer()".
The actual values in the certificate are a series of name/value
pairs in ASN.1 binary format. But what "server()" and "issuer()"
return are strings, with the pairs separated by "/". The
documentation at "http://docs.python.org/lib/ssl-objects.html"
says "Returns a string containing the ASN.1 distinguished name identifying
the server's certificate. ....
"/O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign".

Note the field

"OU=Terms of use at www.verisign.com/rpa (c)00"

which has a "/" in the middle of its value. ....
Is there a workaround for this? Without rebuilding Python
and becoming incompatible?

As a practical matter, I think it's fairly safe to assume
there will be no values that include / in a context that
really looks like an X.500-style distinguished name.

So if you parse out that string in those terms, and require
each of those key = value pairs to have reasonable values -
key has no embedded spaces, value has non-zero length - then
you should be OK. Re-join any invalid component to its
predecessor's value.
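A minimal Python sketch of the re-join heuristic described above, assuming that "key" means a run of word characters with no spaces. The function name is hypothetical. Note that it is still fooled by any value that happens to contain a well-formed "/K=V" sequence.

```python
import re

KEY_RE = re.compile(r"\w+")  # a plausible attribute type: letters, digits, '_'; no spaces


def rejoin_parse(oneline):
    """Parse '/K=V/K=V...' with the re-join heuristic.

    A '/'-separated piece counts as a new pair only if it looks like
    key=value with a space-free key and a non-empty value; any other
    piece is glued back onto the previous value, restoring the '/'.
    """
    pairs = []
    for piece in oneline.split("/")[1:]:
        key, sep, value = piece.partition("=")
        if sep and value and KEY_RE.fullmatch(key):
            pairs.append([key, value])
        elif pairs:
            pairs[-1][1] += "/" + piece
        else:
            raise ValueError("string does not start with a key=value pair")
    return [tuple(pair) for pair in pairs]


verisign = ("/O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International "
            "Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY "
            "LTD.(c)97 VeriSign")
for key, value in rejoin_parse(verisign):
    print(key, "=", value)
print(rejoin_parse("/OU=Terms of use at www.verisign.com/rpa (c)00"))
```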

Donn Cave
 
John Nagle

Donn said:
As a practical matter, I think it's fairly safe to assume
there will be no values that include / in a context that
really looks like an X.500-style distinguished name.

Actually, we've just discovered an exploit. By
ordering a low-level certificate with a "/" in the right
place, you can create the illusion (at least for flawed
implementations like this one) that the certificate
belongs to someone else. Just order a certificate from
GoDaddy, enter something like this in the "Name" field

"Myphonyname/C=US/ST=California/L=San Jose/O=eBay Inc./OU=Site Operations/CN=signin.ebay.com"

and Python code will be spoofed into thinking you're eBay.
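The spoof is easy to reproduce in a few lines of Python. This sketch assumes a hypothetical certificate whose subject really has just two entries, C=US and a CN holding the whole injected string. It lays them out the way the thread describes X509_NAME_oneline() doing it (a "/" before each key, no escaping of "/" in values) and then parses the result with a simple "/"-splitting parser. Both helper names are made up for illustration.

```python
def oneline(pairs):
    """Lay out (key, value) pairs as '/K=V/K=V', with no escaping of '/' in values.

    This roughly mimics X509_NAME_oneline()'s format; it ignores that
    function's \\xnn escaping of non-ASCII bytes.
    """
    return "".join("/%s=%s" % (key, value) for key, value in pairs)


def split_parse(flat):
    """Treat every '/'-separated piece containing '=' as a new (key, value) pair."""
    pairs = []
    for piece in flat.split("/")[1:]:
        key, sep, value = piece.partition("=")
        if sep:
            pairs.append((key, value))
        elif pairs:  # no '=': glue it back onto the previous value
            pairs[-1] = (pairs[-1][0], pairs[-1][1] + "/" + piece)
    return pairs


# Hypothetical subject of a cheaply obtained certificate: only two real entries.
real_subject = [
    ("C", "US"),
    ("CN", "Myphonyname/C=US/ST=California/L=San Jose/O=eBay Inc."
           "/OU=Site Operations/CN=signin.ebay.com"),
]
parsed = split_parse(oneline(real_subject))
print(len(real_subject), "real entries ->", len(parsed), "parsed entries")
print("last CN seen by the parser:", [v for k, v in parsed if k == "CN"][-1])
```

This prints "2 real entries -> 8 parsed entries" and reports signin.ebay.com as the last CN, even though the certificate's only real CN is the whole injected string.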

Fortunately, browsers don't use Python code.

The actual bug is in

python/trunk/Modules/_ssl.c

at

if ((self->server_cert = SSL_get_peer_certificate(self->ssl))) {
    X509_NAME_oneline(X509_get_subject_name(self->server_cert),
                      self->server, X509_NAME_MAXLEN);
    X509_NAME_oneline(X509_get_issuer_name(self->server_cert),
                      self->issuer, X509_NAME_MAXLEN);

The "X509_NAME_oneline" function takes an X509_NAME structure, which is
the certificate system's representation of a list, and flattens it
into a printable string. This is a debug function, not one for use in
production code. The OpenSSL documentation for "X509_NAME_oneline" says:

"The functions X509_NAME_oneline() and X509_NAME_print() are legacy
functions which produce a non standard output form, they don't handle
multi character fields and have various quirks and inconsistencies.
Their use is strongly discouraged in new applications."

What OpenSSL callers are supposed to do is call X509_NAME_entry_count()
to get the number of entries in an X509_NAME structure, then get each
entry with X509_NAME_get_entry(). A few more calls (such as
X509_NAME_ENTRY_get_object(), X509_NAME_ENTRY_get_data() and
ASN1_STRING_to_UTF8()) will obtain the name/value pair from the entry
as UTF-8 strings, which should be converted to Python Unicode strings.

X509_NAME_oneline() doesn't handle Unicode; it converts non-ASCII
values to "\xnn" format. Again, it's for debug output only.

So what's needed are two new functions for Python's SSL sockets to
replace "issuer" and "server". The new functions should return
lists of Unicode strings representing the key/value pairs.
(A list is needed, not a dictionary; repeated keys, such as the several OU entries in Verisign's issuer,
are both possible and common.)
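For comparison, a small sketch of the list-of-pairs shape proposed above. It assumes the nested-tuple layout that `ssl.SSLSocket.getpeercert()` returns in later Python releases (the `ssl` module added in Python 2.6), where a name is a tuple of RDNs and each RDN is a tuple of (attribute, value) pairs. The helper names are hypothetical, and the sample data is hand-written from the Verisign issuer quoted earlier rather than fetched from a server.

```python
import socket
import ssl


def flatten_name(rdns):
    """Flatten a getpeercert()-style name into an ordered list of (key, value) pairs.

    A list is kept, not a dict, because keys such as
    'organizationalUnitName' may legitimately repeat.
    """
    return [(key, value) for rdn in rdns for (key, value) in rdn]


def peer_subject(host, port=443):
    """Connect with certificate verification and return the subject as pairs."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return flatten_name(tls.getpeercert()["subject"])


# Hand-written sample in the getpeercert() layout, so this runs offline.
sample_issuer = (
    (("organizationName", "VeriSign Trust Network"),),
    (("organizationalUnitName", "VeriSign, Inc."),),
    (("organizationalUnitName", "VeriSign International Server CA - Class 3"),),
    (("organizationalUnitName",
      "www.verisign.com/CPS Incorp.by Ref. LIABILITY LTD.(c)97 VeriSign"),),
)
for key, value in flatten_name(sample_issuer):
    print(key, "=", value)
```

Because each value arrives as a separate string, the "/" inside the last OU is just data; nothing has to be split.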

The reason this now matters is that new "high assurance" certs,
the ones that tell you how much a site can be trusted, are now being
deployed, and to use them effectively, you need that info. Support for
them is in Internet Explorer 7, so they're going to be widespread soon.
Python needs to catch up.

I'll submit a bug report.

John Nagle
Animats
 
Paul Rubin

John Nagle said:
The reason this now matters is that new "high assurance" certs,
the ones that tell you how much a site can be trusted, are now being
deployed,

Oh my, I hadn't heard about this. They come up with new scams all the
time. I guess I'll check for info. It sounds sort of like the terror
alert system, which tells us how scared to be on any particular day ;-)
 
John Nagle

Paul said:
Oh my, I hadn't heard about this. They come up with new scams all the
time. I guess I'll check for info. It sounds sort of like the terror
alert system, which tells us how scared to be on any particular day ;-)

Anyway, I've submitted it as a Python bug report:

[1583946] SSL "issuer" and "server" functions problems - security

And for the record, here's a workaround: do a split with this
regular expression:

import re
pparsecertstringre = re.compile(
    r"""(?:/)(\w(?:\w|))(?:=)""")

You'll get lists of the form

['', key1, value1, key2, value2 ...]

This isn't totally unspoofable, and won't work for Unicode certs,
but it works for the few dozen common certs I've run through it.
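A short usage sketch for the regular expression above, pairing up the split result. The helper name `regex_pairs` is hypothetical. Because the pattern only recognises one- or two-character keys, a long key such as emailAddress is folded into the preceding value, and the eBay-style string from earlier in the thread still splits into forged fields.

```python
import re

# The pattern from the post: a '/', a one- or two-character key, then '='.
pparsecertstringre = re.compile(r"""(?:/)(\w(?:\w|))(?:=)""")


def regex_pairs(oneline):
    """Split on the pattern and pair up the ['', k1, v1, k2, v2, ...] result."""
    parts = pparsecertstringre.split(oneline)
    # parts[0] is the text before the first '/K=' match ('' for a well-formed string).
    return list(zip(parts[1::2], parts[2::2]))


verisign = ("/O=VeriSign Trust Network/OU=VeriSign, Inc./OU=VeriSign International "
            "Server CA - Class 3/OU=www.verisign.com/CPS Incorp.by Ref. LIABILITY "
            "LTD.(c)97 VeriSign")
for key, value in regex_pairs(verisign):
    print(key, "=", value)
```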

John Nagle
Animats
 
Heikki Toivonen

John said:
The Python SSL object offers two methods for obtaining
the info from an SSL certificate, "server()" and "issuer()".
The actual values in the certificate are a series of name/value
pairs in ASN.1 binary format. But what "server()" and "issuer()"
return are strings, with the pairs separated by "/". ....

Is it an option for you to use 3rd party libraries (please note that the
Python stdlib SSL library does not do certificate validation etc. which
you'd typically want in a production application)?

With M2Crypto you could do something like this:

from M2Crypto import SSL

ctx = SSL.Context()
conn = SSL.Connection(ctx)
conn.connect(('www.verisign.com', 443))
cert = conn.get_peer_cert()
print(cert.get_issuer().as_text())
print(cert.get_subject().as_text())
try:
    print(cert.get_ext('subjectAltName').get_value())
except LookupError:
    print('no subjectAltName')
try:
    print(cert.get_subject().CN)
except AttributeError:
    print('no commonName')

Please note, however, that if you need the server name because you want
to validate that you connected to the server you intended to, it would
be better to let M2Crypto do it for you or use the M2Crypto SSL.Checker
class explicitly yourself.

Other Python crypto libraries probably have equivalent APIs.
 
Michael Ströder

John said:
The Python SSL object offers two methods for obtaining
the info from an SSL certificate, "server()" and "issuer()".
The actual values in the certificate are a series of name/value
pairs in ASN.1 binary format. But what "server()" and "issuer()"
return are strings, with the pairs separated by "/". The
documentation at "http://docs.python.org/lib/ssl-objects.html"
says "Returns a string containing the ASN.1 distinguished name
identifying the server's certificate. (See below for an example showing
what distinguished names look like.)" There is, however, no "below".

What you actually get back looks like this, which is Google's certificate:

"/C=US/ST=California/L=Mountain View/O=Google Inc/CN=www.google.com"

So, no problem; just split on "/", right?

Unfortunately, "/" is a legal character in certificate values.

You hit a really serious problem: There's no completely well-defined
string representation format for distinguished names used in X.509
certificates. The format above is what OpenSSL used in the beginning.
Yuck! IMO this is also a security problem in some cases.

The best thing would be to stick to RFC 4514 (formerly RFC 2253:
Lightweight Directory Access Protocol (LDAP): String Representation of
Distinguished Names). It defines a UTF-8-based string representation.

Play around with OpenSSL's command-line option 'nameopt':
openssl x509 -inform der -in VSIGN1.CER -subject -issuer -noout
subject= /C=US/O=VeriSign, Inc./OU=Class 1 Public Primary Certification Authority
issuer= /C=US/O=VeriSign, Inc./OU=Class 1 Public Primary Certification Authority

openssl x509 -inform der -in VSIGN1.CER -subject -issuer -noout -nameopt rfc2253
subject= OU=Class 1 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US
issuer= OU=Class 1 Public Primary Certification Authority,O=VeriSign\, Inc.,C=US

Guess the second is what Python SSL object also should return. No idea
whether this is available at OpenSSL's API level.
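To show what the second format involves, here is a minimal Python sketch that renders an ordered list of (key, value) pairs as an RFC 4514 string. It assumes the pairs are in certificate order with one attribute per RDN (multi-valued RDNs joined by "+" are not handled) and that the keys are already short names like CN, O, OU and C; the function names are hypothetical. It reproduces the -nameopt rfc2253 output quoted above for the Class 1 root.

```python
def escape_rfc4514(value):
    """Escape one attribute value as RFC 4514 section 2.4 requires."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in '"+,;<>\\':
            out.append("\\" + ch)  # always-special characters
        elif ch == "\x00":
            out.append("\\00")  # NUL is written as a hex pair
        elif (i == 0 and ch in "# ") or (i == last and ch == " "):
            out.append("\\" + ch)  # leading '#' or space, trailing space
        else:
            out.append(ch)
    return "".join(out)


def dn_to_rfc4514(pairs):
    """Render (key, value) pairs, given in certificate order, as an RFC 4514 DN.

    RFC 4514 writes the RDNs last-to-first, which is why OpenSSL's rfc2253
    output starts with OU and ends with C.
    """
    return ",".join("%s=%s" % (key, escape_rfc4514(value))
                    for key, value in reversed(pairs))


class1 = [("C", "US"), ("O", "VeriSign, Inc."),
          ("OU", "Class 1 Public Primary Certification Authority")]
print(dn_to_rfc4514(class1))
```

Unlike the "/" format, a "/" in a value needs no escaping here, and the characters that do act as separators ("," and "+") are always escaped, so a parser can recover the original entries.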

Ciao, Michael.
 
Michael Ströder

Donn said:
As a practical matter, I think it's fairly safe to assume
there will be no values that include / in a context that
really looks like an X.500-style distinguished name.

So if you parse out that string in those terms, and require
each of those key = value pairs to have reasonable values -
key has no embedded spaces, value has non-zero length - then
you should be OK. Re-join any invalid component to its
predecessor's value.

Don't make such assumptions when parsing DNs!
It's a major PITA in the long run.

Ciao, Michael.
 
John Nagle

Michael said:
You hit a really serious problem: There's no completely well-defined
string representation format for distinguished names used in X.509
certificates. The format above is what OpenSSL used in the beginning.
Yuck! IMO this is also a security problem in some cases.

The best thing would be to stick to RFC 4514 (formerly RFC 2253:
Lightweight Directory Access Protocol (LDAP): String Representation of
Distinguished Names). It defines a UTF-8-based string representation. ....
Guess the second is what Python SSL object also should return. No idea
whether this is available at OpenSSL's API level.

That's exactly what I suggested in my Python bug report update.

OpenSSL has all the right functions. Almost.

OpenSSL has "X509_NAME_oneline()" which is deprecated, which Python
is using, and which uses "/" as a delimiter without escaping "/" in
content.

OpenSSL also has "X509_NAME_print_ex", which does the right
thing - outputs a UTF8 string in RFC 2253 format, with all the
right escapes and Unicode compatibility if you ask for Unicode
output.

Unfortunately, "X509_NAME_print_ex" is set up to output to
an I/O port (an OpenSSL BIO), not a string. There's no comparable
function in OpenSSL to write that info to a string.

All the right machinery to do the job is in

openssl/crypto/asn1/a_strex.c

but they ran into a classic C problem. They have code designed
to output to a stream of infinite length, and don't have a way
to get the target length down to the copy function. Take a look at
"send_mem_chars" in that file, which is turned off. If it were
used, it would have buffer overflow potential. This could be
fixed, but it's a pain. It's local to that file, though;
someone who owns that code could fix it in an hour.

X509_NAME_oneline(), the deprecated function, is in a
completely separate file and doesn't handle the hard cases at all.

The same problem was reported in Apache mod_ssl back in 2004. See

http://mail-archives.apache.org/mod...x/<[email protected]>

And it had to be fixed in OpenCA. See

http://www.mail-archive.com/[email protected]/msg02672.html

Also, there may be an exploitable bug in MySQL that depends on this. See

http://bugs.mysql.com/bug.php?id=17208

Get the OpenSSL people to fix their API, and the Python fix will
be a one-line change.


John Nagle
 
John Nagle

Since I really need this, I'm looking at modifying the Python SSL
interface to SSL objects by adding a function "certificate()" which
returns an X.509 certificate in the following format:

SSL certificates are trees, represented in a format, "ASN.1", which
allows storing numbers, strings, and flags.
Fields are identified by names or by assigned "OID numbers"
(see RFC 2459).

The tree is returned as tuples. The first element of the tuple
is always a string giving the name of the field, and the second
element is a string, Boolean, or number giving the value, or
a list of more tuples. The result is a tree, which will
resemble the tree typically displayed by browsers displaying
SSL certificates.

The top tuple's field name is the domain for which the certificate
applies.

Note that it is straightforward to implement "issuer" and "subject"
using "certificate", which provides a way out of the current problems
with those fields.

Example:

( 'www.google.com',
  ( 'Certificate',
    [ ('Version', 3),
      ( 'Serial Number',
        '4B:A5:AE:59:DE:DD:1C:C7:80:7C:89:22:91:F0:E2:43'),
      ( 'Certificate Signature Algorithm',
        'PKCS #1 MD5 With RSA Encryption'),
      ( 'Issuer',
        [ ('CN', 'Thawte SGC CA'),
          ('O', 'Thawte Consulting (Pty) Ltd.'),
          ('C', 'ZA')]),
      ( 'Validity',
        [ ('Not Before', '5/15/2006 23:18:11 PM GMT'),
          ('Not After', '5/15/2007 23:18:11 PM GMT')]),
      ( 'Subject',
        [ ('CN', 'www.google.com'),
          ('O', 'Google Inc'),
          ('L', 'Mountain View'),
          ('ST', 'California'),
          ('C', 'US')]),
      ( 'Subject Public Key Info',
        [ ( 'Subjects Public Key Algorithm',
            'PKCS #1 RSA Encryption'),
          ( 'Subjects Public Key',
            '30 81 89 02 81 81 00 e6 c5 c6 8d cd 0b a3 03 04 '
            'dc ae cc c9 46 be bd cc 9d bc 73 34 48 fe d3 75 '
            '64 d0 c9 c9 76 27 72 0f a9 96 1a 3b 81 f3 14 f6 '
            'ae 90 56 e7 19 d2 73 68 a7 85 a4 ae ca 24 14 30 '
            '00 ba e8 36 5d 81 73 3a 71 05 8f b1 af 11 87 da '
            '5c f1 3e bf 53 51 84 6f 44 0e b7 e8 26 d7 2f b2 '
            '6f f2 f2 5d df a7 cf 8c a5 e9 1e 6f 30 48 94 21 '
            '0b 01 ad ba 0e 71 01 0d 10 ef bf ee 2c d3 8d fe '
            '54 a8 fe d3 97 8f cb 02 03 01 00 01')]),
      ( 'Certificate Signature Algorithm',
        'PKCS #1 MD5 With RSA Encryption'),
      ( 'Certificate Signature Value',
        '57 4b bc a4 43 e7 e0 01 92 a0 96 35 f9 18 08 88 '
        '1d 7b 70 19 8f f9 36 b2 05 3a 05 ca 14 59 4d 24 '
        '0e e5 8a af 4e 87 5a f7 1c 2a 96 8f cb 61 40 9e '
        'd2 b4 38 40 21 24 c1 4f 1f cb 13 4a 8f 95 02 df '
        '91 3d d6 40 eb 11 6f 9b 10 a1 6f ce 91 5e 30 f6 '
        '6d 13 5e 15 a4 2e c2 18 9e 00 c3 d8 32 67 47 fc '
        'b8 1e 9a d9 9a 8e cc ff 7c 12 b7 03 bf 52 20 cf '
        '21 f4 f3 77 dd 12 15 f0 94 fa 90 d5 e3 59 68 81')]
))
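The note above that "issuer" and "subject" are straightforward to implement using "certificate" can be sketched in a few lines of Python, assuming the tuple layout in the example (a (name, value) node whose value is a scalar, a single nested node, or a list of nodes). The helper name is hypothetical, and the sample tree is the example trimmed to its name fields.

```python
def find_field(node, name):
    """Depth-first search of a (name, value) tree; return the first value called `name`."""
    field, value = node
    if field == name:
        return value
    if isinstance(value, tuple):  # a single nested node, as under the top level
        return find_field(value, name)
    if isinstance(value, list):  # a list of child nodes
        for child in value:
            found = find_field(child, name)
            if found is not None:
                return found
    return None  # a scalar leaf that did not match


# The example certificate, trimmed to its name fields.
cert = ("www.google.com",
        ("Certificate",
         [("Version", 3),
          ("Issuer", [("CN", "Thawte SGC CA"),
                      ("O", "Thawte Consulting (Pty) Ltd."),
                      ("C", "ZA")]),
          ("Subject", [("CN", "www.google.com"),
                       ("O", "Google Inc"),
                       ("L", "Mountain View"),
                       ("ST", "California"),
                       ("C", "US")])]))

subject = find_field(cert, "Subject")
issuer = find_field(cert, "Issuer")
print(subject)
print([value for key, value in subject if key == "CN"])  # look up CN inside Subject only
```

Searching the whole tree for "CN" would hit the issuer's CN first, which is why the lookup is done inside the Subject list. Note also that the example lists CN first, the reverse of the order in the old "/C=US/.../CN=www.google.com" string.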

Comments?

John Nagle
 
"Martin v. Löwis"

John said:
SSL certificates are trees, represented in a format, "ASN.1", which
allows storing numbers, strings, and flags.
Fields are identified by names or by assigned "OID numbers"
(see RFC 2459).

The tree is returned as tuples. The first element of the tuple
is always a string giving the name of the field, and the second
element is a string, Boolean, or number giving the value, or
a list of more tuples. The result is a tree, which will
resemble the tree typically displayed by browsers displaying
SSL certificates.

That looks like a bad choice of interface to me. If you want to expose
the entire certificate, you should do that as a single byte
string, encoded in DER. The way you are representing it, you are losing
information (e.g. whether the string type was IA5String,
PrintableString, UTF8String), and I thought your complaint was that
the current interfaces lose information, so you should not add an
interface that makes the same mistake it tries to overcome.
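Martin's point about lost string types can be made concrete with a tiny DER decoder in Python. This sketch handles only a single TLV with a one-byte tag and a definite length (short or long form), which is enough for the primitive string types he names; the helper names and the small type table are illustrative, not a complete ASN.1 parser. The same text "US" decodes to the same Python string from two different encodings, and only the DER bytes record which one the certificate used.

```python
# Universal-class tags (X.680) for some string types, with a codec for each.
STRING_TYPES = {
    0x0C: ("UTF8String", "utf-8"),
    0x13: ("PrintableString", "ascii"),
    0x16: ("IA5String", "ascii"),
    0x1E: ("BMPString", "utf-16-be"),
}


def read_tlv(der, offset=0):
    """Decode one tag-length-value at `offset`; return (tag, value_bytes, next_offset)."""
    tag = der[offset]
    length = der[offset + 1]
    pos = offset + 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        count = length & 0x7F
        length = int.from_bytes(der[pos:pos + count], "big")
        pos += count
    return tag, der[pos:pos + length], pos + length


def decode_string(der):
    """Return (ASN.1 type name, text) for a DER-encoded string value."""
    tag, value, _ = read_tlv(der)
    type_name, codec = STRING_TYPES[tag]
    return type_name, value.decode(codec)


print(decode_string(b"\x13\x02US"))  # ('PrintableString', 'US')
print(decode_string(b"\x0c\x02US"))  # ('UTF8String', 'US')
```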

Regards,
Martin
 
