Generating unique row ID ints.

S

Simon Wittber

I'm building a web application using sqlalchemy in my db layer.

Some of the tables require single integer primary keys which might be
exposed in some parts of the web interface. If users can guess the next
key in a sequence, it might be possible for them to 'game' or
manipulate the system in unexpected ways. I want to avoid this by
generating a random key for each row ID, and have decided to use the
same approach for all my single key tables.

Are there any best practices for implementing this?

If the random module is suitable, does anyone have any good ideas on
how this could be implemented?

Some questions which came to mind are:
Would I need to save and restore the random module state when
generating id's for each table?
What would be an appropriate seed?
How many random integers can I generate before a repeat becomes
probable?

I've got my own ideas for implementing this, but am interested to see
how/if anyone else has tackled the same problem.


-Sw.
 
P

Paul Rubin

Simon Wittber said:
Some of the tables require single integer primary keys which might be
exposed in some parts of the web interface. If users can guess the next
key in a sequence, it might be possible for them to 'game' or
manipulate the system in unexpected ways. I want to avoid this by
generating a random key for each row ID, and have decided to use the
same approach for all my single key tables.

Normally primary keys are sequential but only live inside the system.
Users are not supposed to enter them.
If the random module is suitable, does anyone have any good ideas on
how this could be implemented?

The random module does not aim to be secure against knowledgeable
attackers trying to guess the output (i.e. it's not cryptographic
randomness). Use os.urandom instead.
I've got my own ideas for implementing this, but am interested to see
how/if anyone else has tackled the same problem.

The simplest thing to do is generate random strings, e.g.

key = os.urandom(16)

for a 16-byte binary string. You can of course encode it as printing
characters with your favorite binascii function. 16-byte strings like
that should be unguessable and collision-free until you have an
enormous number of them (on the order of 2**64).
 
N

Nick Vatamaniuc

The primary key is usually there to uniquely identify each row. There
are a couple of ways to generate them:

One is to always create a new integer id (sequentially or random)
when you insert a new row. This means that if you insert the same data
again, the row will always be duplicated. Usually you don't want this.


Then the primary key integer must somehow 'represent' the whole data
row. So you would want for the key to be a function of the whole data
row, such that when the data is the same the row ID is the same and
when the data is different the row ID is different. The answer to your
problem is to use a message digest (actually a message authentication
code function).

For example if your data row is in the tuple 'row' then you can do:
hex_digest_key=md5.new("|".join(row)).hexdigest(), you would have to
import the md5 module before doing this.

What that will do is your row will be contcatenated into one string
with each field separated by "|" then the md5 hash of that string will
be taken and the result returned to you in hexadecimal form. You'll get
back something like hex_digest_key='5d41402abc4b2a76b9719d911017c592'.
Then you can turn that into an integer by doing int(hex_digest_key,16)
-- and you can use that integer as your primary key.

But since one of the problems you want to solve is the user's ability
to predict the next key, you cannot just use a simple message digest
function. If the user finally figures out that you are running an MD5
algorithm, the user can also run the same algorithm and generate the
same message digest -- If that is a problem, then use a MAC (Message
Authentication Code) function. It works almost like a message digest
except you concatenate a secret key to the input so MD5 is run on the
row_as_text+secret_key.

Unless the user knows your secret key, they could not generate a
primary key from a given row even if they know you used MD5 and even if
the know the data content of your row.

NOTE: When using a message digest (and friends) it is important to
realize that there will be some collision between the keys if the
number of all possible digests (as limited by the digest algoritm) is
smaller than the number of the possible messages. In practice if you
have large enough integers (64) you shouldn't see any collisions
occur, but it is still good to be aware of them...

Hope this helps,
-Nick Vatamaniuc
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top