Is this secure?

M

mk

Hello,

I need to generate passwords and I think that pseudo-random generator is
not good enough, frankly. So I wrote this function:

import struct

def gen_rand_string():
fileobj = open('/dev/urandom','rb')
rstr = fileobj.read(4)
rnum = struct.unpack('L',rstr)[0]
rstr = '%i' % rnum
rnuml = []
while len(rstr) >= 2:
c = rstr[:2]
try:
num = int(c)
rnuml.append(num)
except ValueError:
pass
rstr = rstr[2:]
rnuml = map(lambda x: 97+x/4, rnuml)
rnumc = map(chr, rnuml)
return ''.join(rnumc)

if __name__ == "__main__":
print gen_rand_string()

(yes I know that this way generated string will not contain 'z' because
99/4 + 97 = 121 which is 'y')

The question is: is this secure? That is, can the string generated this
way be considered truly random? (I abstract from not-quite-perfect
nature of /dev/urandom at the moment; I can always switch to /dev/random
which is better)


Regards,
mk
 
P

Paul Rubin

mk said:
I need to generate passwords and I think that pseudo-random generator
is not good enough, frankly. So I wrote this function:...
The question is: is this secure? That is, can the string generated
this way be considered truly random? (I abstract from
not-quite-perfect nature of /dev/urandom at the moment; I can always
switch to /dev/random which is better)

urandom is fine and the entropy loss from the numeric conversions and
eliminating 'z' in that code before you get letters out is not too bad.
The code is pretty ugly. The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer. Passwords that
short are vulnerable to dictionary attacks. Longer passwords made from
random letters are difficult to remember.

I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters. Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program. The docs there are pretty
thoughtful and may help you understand the relevant issues.
 
M

mk

The code is pretty ugly.  The main problem is you end up with a password
that's usually 5 letters but sometimes just 4 or fewer.  

Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?
You might look at the sitewww.diceware.comfor an approach to this,
which you can implement with a program.  The docs there are pretty
thoughtful and may help you understand the relevant issues.

Thanks. But I would also be grateful for indicating what is wrong/ugly
in my code.

Regards,
mk
 
R

Robert Kern

I find it's most practical to use a few random words (chosen from a word
list like /usr/dict/words) rather than random letters. Words are easier
to remember and type.

You might look at the site www.diceware.com for an approach to this,
which you can implement with a program. The docs there are pretty
thoughtful and may help you understand the relevant issues.

I like RFC 1751 for this:

http://gitweb.pycrypto.org/?p=crypt...8a212c22066adabfee521b495eeb4f9d7232b;hb=HEAD

Shortened URL:

http://tr.im/Pv9B

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
R

Robert Kern

Well I didn't write the whole thing here, in actual use I'd write a
loop repeating the function until I have enough characters and then
I'd select a substring of specified length.

Anything else in the code that is ugly and I should correct?

I would recommend using random.SystemRandom.choice() on a sequence of acceptable
characters. E.g. (untested)

import random
import string


characters = string.letters + string.digits + '~!@#$%^&*()-+=,;./\?><|'
# ... or whatever.

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice(characters))
return ''.join(chars)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
L

Lawrence D'Oliveiro

I need to generate passwords and I think that pseudo-random generator is
not good enough, frankly. So I wrote this function:

Much simpler:

import subprocess

data, _ = subprocess.Popen \
(
args = ("pwgen", "-nc"),
stdout = subprocess.PIPE
).communicate()
print data
 
S

Steven D'Aprano

Hello,

I need to generate passwords and I think that pseudo-random generator is
not good enough, frankly. So I wrote this function: [snip]
(yes I know that this way generated string will not contain 'z' because
99/4 + 97 = 121 which is 'y')

You're worried about the security of the PRNG but then generate a TWO to
FIVE character lowercase password with no digits, punctuation or the
letter 'z'? That's priceless!

Python's PRNG is not suitable for producing cryptographically strong
streams of random bytes, but it is perfectly strong enough for generating
good passwords.


The question is: is this secure?

No.

You are wasting your time trying to fix something which isn't a problem,
and introducing a much bigger problem instead. You are MUCH MUCH MUCH
better off with a six or ten character password taken from upper and
lowercase letters, plus digits, plus punctuation, than a four digit
password taken from lowercase letters only. Even if the first case has
some subtle statistical deviation from uniformity, and the second is
"truly random" (whatever that means), it doesn't matter.

Nobody is going to crack your password because the password generator is
0.01% more likely to generate a "G" than a "q". But they *will* brute-
force your password if you have a four digit password taken from a-y only.


That is, can the string generated this
way be considered truly random?

Define truly random.
 
S

Steven D'Aprano

urandom is fine and the entropy loss from the numeric conversions and
eliminating 'z' in that code before you get letters out is not too bad.

What?

You're going from a possible alphabet of 62 (excluding punctuation) or 94
(inc punctuation available on an American keyboard) distinct letters down
to 25, and you say that's "not too bad"?

Paul, if you were anyone else, I'd be sneering uncontrollably about now,
but you're not clueless about cryptography, so what have I missed? Why is
reducing the number of distinct letters by more than 50% anything but a
disaster? This makes the task of brute-forcing the password exponentially
easier.

Add the fact that the passwords are so short (as little as two characters
in my tests) and this is about as far from secure as it is possible to be.
 
P

Paul Rubin

Steven D'Aprano said:
Paul, if you were anyone else, I'd be sneering uncontrollably about now,
but you're not clueless about cryptography, so what have I missed? Why is
reducing the number of distinct letters by more than 50% anything but a
disaster? This makes the task of brute-forcing the password exponentially
easier.

Reducing the number of distinct letters by 50% decreases the entropy per
character by 1 bit. That stuff about mixing letters and digits and
funny symbols just makes the password a worse nuisance to remember and
type, for a small gain in entropy that you can compute and make up for.
The main thing you have to make sure is that the min-entropy is
sufficient for your purposes, and it's generally more convenient to do
that by making the password a little bit longer than by imposing
contortions on the person typing it. Ross Anderson's "Security
Engineering" chapter about passwords is worth reading too:

http://www.cl.cam.ac.uk/~rja14/Papers/SE-03.pdf

When I mentioned entropy loss to the OP though, I mostly meant loss from
getting rid of the letter z. The (binary) Shannon entropy of the
uniform probability distribution on 26 letters is 4.7004397 bits; on 25
letters, it's 4.6438561 bits. The difference isn't enough to give an
attacker that much advantage.

I like the diceware approach to passphrase generation and I've been
using it for years. www.diceware.com explains it in detail and the docs
there are quite well-thought-out and informative. Keep in mind that the
entropy needed for an online password (attacker has to make a server
query for every guess, and hopefully gets locked out after n wrong
tries) and an offline one (attacker has something like a hash of the
password and can run a completely offline search) are different.

Here is a program that I use sometimes:

from math import log
dictfile = '/usr/share/dict/words'

def genrandom(nbytes):
with open('/dev/urandom') as f:
return int(f.read(nbytes).encode('hex'), 16)

def main():
wordlist = list(x.strip() for x in open(dictfile) if len(x) < 7)
nwords = len(wordlist)
print "%d words, entropy=%.3f bits/word"% (
nwords, log(nwords, 2))
print '-'.join(wordlist[genrandom(10)%nwords] for i in xrange(6))

main()
 
S

Steven D'Aprano

The question is: is this secure? That is, can the string generated this
way be considered truly random?

Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.

This isn't a sophisticated statistical test, it's the equivalent of a
back-of-the-envelope calculation: I generated 100,000 random strings with
your code, and counted how often each letter appears:

If the letters are uniformly distributed, you would expect all the
numbers to be quite close, but instead they range from 15063 to 25679:

{'a': 15063, 'c': 20105, 'b': 15100, 'e': 25465, 'd': 25458, 'g': 25597,
'f': 25589, 'i': 25045, 'h': 25679, 'k': 22945, 'j': 25531, 'm': 16187,
'l': 16252, 'o': 16076, 'n': 16012, 'q': 16069, 'p': 16119, 's': 16088,
'r': 16087, 'u': 15951, 't': 16081, 'w': 16236, 'v': 15893, 'y': 15834,
'x': 15956}

Eye-balling it, it looks vaguely two-humped, one hump around 15-16K, the
second around 22-25K. Sure enough, here's a quick-and-dirty graph:

a | ***********************************
b | ***********************************
c | ***********************************************
d | ***********************************************************
e | ***********************************************************
f | ************************************************************
g | ************************************************************
h | ************************************************************
i | ***********************************************************
j | ************************************************************
k | ******************************************************
l | **************************************
m | **************************************
n | *************************************
o | **************************************
p | **************************************
q | **************************************
r | **************************************
s | **************************************
t | **************************************
u | *************************************
v | *************************************
w | **************************************
x | *************************************
y | *************************************


The mean of the counts is 19056.72, and the mean deviation is 3992.28.
While none of this is statistically sophisticated, it does indicate to me
that your function is nowhere even close to uniform. It has a very strong
bias.
 
P

Paul Rubin

Steven D'Aprano said:
Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.

That is a good point, the way those letters are generated (through the
decimal conversion stuff), they won't be all that uniform.
 
S

Steven D'Aprano

Putting aside the philosophical question of what "truly random" means, I
presume you mean that the letters are uniformly distributed. The answer
to that is, they don't like uniformly distributed.

Er, they don't *look* uniformly distributed.

(Of course, being random, perhaps they are and I just got unlucky.)
 
P

Paul Rubin

mk said:
Thanks. But I would also be grateful for indicating what is wrong/ugly
in my code.

The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42. There are also some effects on
the lower digits. The total entropy loss probably isn't fatal but as
described, it's ugly.

I'd write your code something like this:

nletters = 5

def randomword(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

print randomword(nletters)

I wouldn't rely on a 5 letter combination for a high security
application, but it might be ok for some low security purposes. Two
random 5-letter combinations separated by a hyphen will be much better,
and is probably easier to type than a solid block of 10 letters.
 
R

Robert Kern

Er, they don't *look* uniformly distributed.

(Of course, being random, perhaps they are and I just got unlucky.)

You'd have to be very, *very* unlucky to get a sample of that size so far from
uniformly distributed if the generating process actually were uniform.

Of course, uniformity isn't really necessary. You just need enough entropy in
the distribution (amongst other things like protection of the seed from being
known or guessed). A skewed distribution of characters is perfectly fine
provided that you had enough characters in the password to meet the desired
entropy requirement. A skewed distribution does require more characters to meet
a specified entropy requirement than a uniform distribution, of course.

That said, for a naive strategy like "pick an independent random character,
repeat", you should just use a uniform distribution. It makes the analysis
easier. Worthwhile generators that give skewed distributions usually do so for a
good reason, like generating pronounceable passwords.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
L

Lie Ryan

You'd have to be very, *very* unlucky to get a sample of that size so
far from uniformly distributed if the generating process actually were
uniform.

Of course, uniformity isn't really necessary. You just need enough
entropy in the distribution (amongst other things like protection of the
seed from being known or guessed). A skewed distribution of characters
is perfectly fine provided that you had enough characters in the
password to meet the desired entropy requirement. A skewed distribution
does require more characters to meet a specified entropy requirement
than a uniform distribution, of course.

That said, for a naive strategy like "pick an independent random
character, repeat", you should just use a uniform distribution. It makes
the analysis easier. Worthwhile generators that give skewed
distributions usually do so for a good reason, like generating
pronounceable passwords.

If an attacker knows the that the random number generator have an
extreme skew and he knows the distribution of the letters, how much
advantage would it give the attacker? My initial guess is that the more
skewed the letters are, the better the advantage, since an attacker
using brute-force can write his program to prefer the most likely letters?
 
P

Paul Rubin

Lie Ryan said:
If an attacker knows the that the random number generator have an
extreme skew and he knows the distribution of the letters, how much
advantage would it give the attacker? My initial guess is that the more
skewed the letters are, the better the advantage, since an attacker
using brute-force can write his program to prefer the most likely letters?

A useable (conservative) estimate is that the attacker's workload is 1/p
where p is the probability of the most likely password. That basically
says the password strength can be measured by the min-entropy.
Cryptographers often use that approach. If you want to be more precise,
you can do a conditional probability calculation assuming the attacker
works down the list of possible passwords in order of decreasing
probability, stopping when they hit the right one.

More generally still, passwords regardless of their entropy content are
a sucky way to encapsulate cryptographic secrets. We keep using them
because every alternative has drawbacks of its own.
 
L

Lawrence D'Oliveiro

More generally still, passwords regardless of their entropy content are
a sucky way to encapsulate cryptographic secrets.

They’re a shared secret. How else would you represent a shared secret, if
not with a shared secret?
 
S

Steven D'Aprano

Reducing the number of distinct letters by 50% decreases the entropy per
character by 1 bit.

You say that as if 1 bit of entropy isn't much :)

Given a random six character password taken out of an alphabet of 52
characters, it takes over nine billion attempts to brute force it.
Reducing the alphabet by 50% cuts that down to less than 200 million. To
make up for that loss of 1 bit of entropy, you need two extra characters
in your password.

That stuff about mixing letters and digits and
funny symbols just makes the password a worse nuisance to remember and
type, for a small gain in entropy that you can compute and make up for.

Well, everybody has their own ways of remembering passwords, and I'd much
prefer to remember an eight character password with "funny symbols" that
I chose myself, than a six character password with nothing but letters
that was chosen for me.

Of course, I flatter myself that I know how to choose good passwords, and
I hate remembering long random strings even from a reduced alphabet (e.g.
I hate memorizing eight digit phone numbers, and am completely incapable
of remembering ten digit mobile phone numbers).
 
P

Paul Rubin

Steven D'Aprano said:
Given a random six character password taken out of an alphabet of 52
characters, it takes over nine billion attempts to brute force it.
Reducing the alphabet by 50% cuts that down to less than 200 million. To
make up for that loss of 1 bit of entropy, you need two extra characters
in your password.

One extra character comes pretty close (within 1.3 bits). Even two
extra chars is probably (subjective) easier for a user to deal with than
a completely random mixture of upper/lower case. You don't get the
extra bit per character if that distribution is anything other than
random, of course.

For something like a web password (each guess takes a server hit), where
the resource guarded is not very valuable, 5 chars is probably enough
for most purposes. For something like an encryption key subject to
offline attacks, 6 mixed-case characters will barely slow a real
attacker down.

As before, my suggestion is still diceware. I've used random
alphanumerics in the past but they're too big a hassle, they have to be
written down, etc.

And of course, if you're doing something serious, use a hardware token.
 
M

mk

The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42.

Yikes! I didn't think about that. This is probably where (some part of)
probability skewing comes from.

Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for them,
so they will not have to remember and type them in. So I have little in
the way of limitations of password length - even though in *some* cases
somebody might have to (or be ignorant enough) to retype the password
instead of pasting it in.

In that case the "diceware" approach is not necessary, even though I
will certainly remember this approach for a case when users will have to
remember & type the passwords in.

The main application will access the data using HTTP (probably), so the
main point is that an attacker is not able to guess passwords using
brute force.

Using A-z with 10-char password seems to provide 3 orders of magnitude
more combinations than a-z:
95367431640625L

Even though I'm not sure it is worth it, assuming 1000 brute-force
guesses per second (which over the web would amount pretty much to DOS),
this would take # days:
1103789L

Even then I'm not getting completely uniform distribution for some reason:

d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

if __name__ == "__main__":
chardict = {}
for i in range(100000):
w = gen_rand_word(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count
I'd write your code something like this:

nletters = 5

def randomword(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

print randomword(nletters)

Aw shucks when will I learn to do the stuff in 3 lines well instead of
20, poorly. :-/

Regards,
mk
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,013
Latest member
KatriceSwa

Latest Threads

Top