Generating random strings without duplicates

Discussion in 'C++' started by eric.gagnon@loto-quebec.com, Jul 7, 2005.

  1. Guest

    In a program randomly generating 10 000 000 alphanumeric codes of 16
    characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    efficient way to ensure that I do not generate duplicates?

    STL set, map?
    Could you give me a little code example?

    Thank you.
    , Jul 7, 2005
    #1
    1. Advertising

  2. On 7 Jul 2005 06:49:29 -0700, wrote:

    >In a program randomly generating 10 000 000 alphanumeric codes of 16
    >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    >efficient way to ensure that I do not generate duplicates?
    >
    >STL set, map?
    >Could you give me a little code example?


    160 MB with 1-byte characters, or at least twice that for wchar_t ...
    hmmm?

    Store them in a database, and put a unique index on that column. True
    randomness does not exclude duplicate values.

    --
    Bob Hairgrove
    Bob Hairgrove, Jul 7, 2005
    #2
    1. Advertising

  3. Guest

    Bob Hairgrove wrote:
    > On 7 Jul 2005 06:49:29 -0700, wrote:
    >
    > >In a program randomly generating 10 000 000 alphanumeric codes of 16
    > >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    > >efficient way to ensure that I do not generate duplicates?
    > >
    > >STL set, map?
    > >Could you give me a little code example?

    >
    > 160 MB with 1-byte characters, or at least twice that for wchar_t ...
    > hmmm?
    >
    > Store them in a database, and put a unique index on that column. True
    > randomness does not exclude duplicate values.


    Depending on what sort of "randomness" you require, you might be
    able to avoid any kind of storage/lookup at all. Certain pseudo-
    random number generators have the property that the same number
    never occurs more than once for the duration of the randomizer's
    "period" (after which the entire sequence of numbers repeats).
    The linear-congruential algorith used by many implementations of
    rand() has this property -- but unfortunately, from what I've
    read most tend to have periods of only 65,000 or so.

    If you could find a version with a period of 10,000,000 or more
    then your problem is 99% solved. All you then have to do is use
    each random number to generate a 16-character string in such a
    way that no two numbers yield the same string. (This should be
    easy enough to guarantee. If you limit yourselves to characters
    A-Z and 0-9 this gives you up to 36^16 possible strings which
    is more than enough.)

    This will give you an algorithm for generating up to N unique
    strings (N being the period of the randomizer), where uniqueness
    is guaranteed by the algorithm with no lookup required.
    , Jul 8, 2005
    #3
  4. Guest

    wrote:
    > Depending on what sort of "randomness" you require, you might be
    > able to avoid any kind of storage/lookup at all. Certain pseudo-
    > random number generators have the property that the same number
    > never occurs more than once for the duration of the randomizer's
    > "period" (after which the entire sequence of numbers repeats).
    > The linear-congruential algorith used by many implementations of
    > rand() has this property -- but unfortunately, from what I've
    > read most tend to have periods of only 65,000 or so.
    >
    > If you could find a version with a period of 10,000,000 or more
    > then your problem is 99% solved.


    Google: mersenne twister


    Bart.
    , Jul 8, 2005
    #4
  5. rossum Guest

    On 7 Jul 2005 06:49:29 -0700, wrote:

    >In a program randomly generating 10 000 000 alphanumeric codes of 16
    >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    >efficient way to ensure that I do not generate duplicates?
    >
    >STL set, map?
    >Could you give me a little code example?
    >
    >Thank you.


    An alternative:

    1 Generate a random string.
    2 Hash the string to a value in a range > 0 - 9 999 999.
    3 If this hash value has been used before then goto 1.
    4 If the hash value is a new one then the string is also new.

    Keep track of which hash values have appeared by using a bit array,
    one bit per value in the hash range.

    If you pick the hash range to be 25% larger than the number of strings
    then on average you will only have to generate 4 strings to find a new
    one at the end of the run.


    Another alternative:

    1 Generate a file of N > 10 000 000 unique strings in advance.
    2 Pick a random number R in the range 1 .. N, both inclusive..
    3 If R is already picked then pick another.
    4 If R is not already picked then output the Rth string.

    Again a bit array will keep track of the numbers that you have already
    picked.

    By generating the strings in advance you can take as long as you want
    to get them right. You can load the file from disk for the actual run.


    Yet another alternative:

    1 Generate a file of N >= 10 000 000 unique strings in advance.
    2 For IX = N downto 1
    3 Pick a random number R in the range 1 .. IX, both inclusive.
    4 Output string R.
    5 Copy string IX over string R
    6 Endfor



    rossum


    The ultimate truth is that there is no ultimate truth
    rossum, Jul 9, 2005
    #5
  6. Greg Guest

    Bob Hairgrove wrote:
    > On 7 Jul 2005 06:49:29 -0700, wrote:
    >
    > >In a program randomly generating 10 000 000 alphanumeric codes of 16
    > >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    > >efficient way to ensure that I do not generate duplicates?
    > >
    > >STL set, map?
    > >Could you give me a little code example?

    >
    > 160 MB with 1-byte characters, or at least twice that for wchar_t ...
    > hmmm?
    >
    > Store them in a database, and put a unique index on that column. True
    > randomness does not exclude duplicate values.
    >
    > --
    > Bob Hairgrove
    >


    Try "uuidgen"

    Greg
    Greg, Jul 9, 2005
    #6
  7. E. Mark Ping Guest

    In article <>,
    <> wrote:
    >In a program randomly generating 10 000 000 alphanumeric codes of 16
    >characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    >efficient way to ensure that I do not generate duplicates?


    More information would make it possible to answer in something other
    than guesses.

    Why are you generating all these strings (how will they be used)? Why
    is duplication an issue? etc.
    --
    Mark Ping
    E. Mark Ping, Jul 9, 2005
    #7
  8. Jim Langston Guest

    "E. Mark Ping" <> wrote in message
    news:dap02i$jta$...
    > In article <>,
    > <> wrote:
    >>In a program randomly generating 10 000 000 alphanumeric codes of 16
    >>characters in length (Ex.: "ZAZAZAZAZAZAZ156"), what would be an
    >>efficient way to ensure that I do not generate duplicates?

    >
    > More information would make it possible to answer in something other
    > than guesses.
    >
    > Why are you generating all these strings (how will they be used)? Why
    > is duplication an issue? etc.
    > --
    > Mark Ping
    >


    And why can't you just use a UUID generator which is guaranteed to be
    unique?
    Jim Langston, Jul 13, 2005
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    4
    Views:
    854
    Ingo R. Homann
    Jun 27, 2005
  2. Mike P

    Generating random strings

    Mike P, May 31, 2007, in forum: ASP .Net
    Replies:
    6
    Views:
    309
    Samuel R. Neff
    May 31, 2007
  3. globalrev
    Replies:
    4
    Views:
    756
    Gabriel Genellina
    Apr 20, 2008
  4. avanti
    Replies:
    14
    Views:
    208
    Chad Burggraf
    Jan 4, 2007
  5. VK
    Replies:
    15
    Views:
    1,160
    Dr J R Stockton
    May 2, 2010
Loading...

Share This Page