Eric said:

I need an efficient way (time is my main concern here) to generate
10 000 000 unique alphanumeric strings of 16 characters each.

I used the STL set and map, but after about 5 000 000 entries insertion
becomes very slow, even though I still have enough RAM available on my
computer.

Do you have any advice?

Here is a code sample that I used to ensure uniqueness.

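    // Needs <map>, <string>, <utility> and "using namespace std;".
    typedef pair<string, bool> StringBool_pair;
    map<string, bool> MapValues;
    pair<map<string, bool>::iterator, bool> pr;

    for (int iii = 0; iii < 10000000; iii++) {
        // random16CharacterCode() is defined elsewhere.
        string strCode = random16CharacterCode();
        // insert() leaves the map unchanged if the key already exists.
        pr = MapValues.insert(StringBool_pair(strCode, true));
        if (pr.second) {
            // Accepted...
        } else {
            // Rejected...
        }
    }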

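If the template library provided with your compiler has a hash_set or a
hash_map, you might want to use that: inserts and lookups then take
roughly constant time on average instead of growing with the depth of
the tree.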
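As a rough sketch of the hash-based approach: in C++11 and later the standard equivalent of hash_set is std::unordered_set. The helper random_code() below is a hypothetical stand-in for random16CharacterCode(), and the 36-character upper-case alphabet and the std::mt19937_64 generator are assumptions, not something the original code specifies.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for random16CharacterCode(); the alphabet is an assumption.
std::string random_code(std::mt19937_64& rng, std::size_t length = 16) {
    static const char alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // sizeof counts the trailing '\0', so the last valid index is sizeof - 2.
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);
    std::string code(length, '0');
    for (char& c : code) c = alphabet[pick(rng)];
    return code;
}

// Generates `count` distinct codes; a duplicate is simply drawn again.
std::unordered_set<std::string> generate_unique(std::size_t count,
                                                std::mt19937_64& rng,
                                                std::size_t length = 16) {
    std::unordered_set<std::string> codes;
    codes.reserve(count);  // size the bucket array once to avoid rehashing
    while (codes.size() < count) {
        codes.insert(random_code(rng, length));  // no effect if already present
    }
    return codes;
}
```

Calling generate_unique(10000000, rng) with a seeded std::mt19937_64 would produce the full set. The reserve() call matters at this size, since growing the table step by step forces repeated rehashing of every string.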

If not, and you are using the map only to ensure uniqueness of the
inserted strings, you might want to ditch it and go for a vector.

The risk that a few of the values already exist is small (but it is
there).
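A minimal sketch of the vector approach follows. It goes one step further than the suggestion above: instead of accepting the small risk, it sorts the vector once at the end and redraws any duplicates, which is an addition of mine and not part of the original advice. random_code() is again a hypothetical stand-in for random16CharacterCode(), with an assumed 36-character alphabet.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for random16CharacterCode(); the alphabet is an assumption.
std::string random_code(std::mt19937_64& rng, std::size_t length = 16) {
    static const char alphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);
    std::string code(length, '0');
    for (char& c : code) c = alphabet[pick(rng)];
    return code;
}

// Appends codes with no per-insert lookup, then sorts once and removes
// duplicates. The loop repeats only if duplicates were actually dropped.
std::vector<std::string> generate_with_vector(std::size_t count,
                                              std::mt19937_64& rng,
                                              std::size_t length = 16) {
    std::vector<std::string> codes;
    codes.reserve(count);
    while (codes.size() < count) {
        while (codes.size() < count) codes.push_back(random_code(rng, length));
        std::sort(codes.begin(), codes.end());
        codes.erase(std::unique(codes.begin(), codes.end()), codes.end());
    }
    return codes;  // sorted and free of duplicates
}
```

The result comes back in sorted order. If the codes must be handed out in random order, a std::shuffle over the vector afterwards would restore that.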

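You want only 1.0E+7 values, while the total number of available values
is 36^16, about 7.96E+24 (or 62^16, about 4.77E+28, if case sensitive).
The chance that a newly generated string matches one of the 1.0E+7
already generated (if you use a decent pRNG) is at most 1 divided by
7.96E+17 (or 1 divided by 4.77E+21 if case sensitive). Over the whole
run, the chance of even a single duplicate is about 6E-12 (or 1E-15 if
case sensitive).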
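The figures can be checked with a few lines of code. This is only a sketch: it assumes an alphabet of 36 or 62 symbols, 16 positions per string, and a generator that draws every string with equal probability. The function names are mine.

```cpp
#include <cmath>

// Number of distinct strings of `length` symbols over `symbols` characters.
double keyspace(double symbols, double length) {
    return std::pow(symbols, length);
}

// Upper bound on the chance that one new draw hits any of `n` stored values.
double per_draw_collision(double n, double space) {
    return n / space;
}

// Birthday approximation for at least one duplicate among `n` draws.
// Only meaningful while the result is much smaller than 1.
double any_collision(double n, double space) {
    return n * (n - 1.0) / (2.0 * space);
}
```

For example, any_collision(1e7, keyspace(36, 16)) comes out at roughly 6.3E-12.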

If this is too risky because you must have a guarantee of uniqueness,
another trick is to generate 36 (or 62) different maps and assign 10
million divided by 36 (or 62) values to each map, where each map stores
only strings starting with one specific alphanumeric character. Then do
a random access to one of the submaps when you need a value. You'll
want to generate a few more values for each map to decrease the chance
that one of the submaps will run out of values when you access it.

This trick reduces the number of values you need to insert into any one
map, so each insert walks a much smaller tree instead of one holding
several million values. The price is a constant amount of extra time
when creating each string and a constant amount of extra time when
retrieving a value.
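The submap idea could look roughly like the sketch below. It uses std::set instead of map<string, bool>, since only the keys matter, and it assumes a 36-character upper-case alphabet. The names Buckets, generate_bucketed and take_code are hypothetical, and falling through to the next bucket when the chosen one is empty is my own choice, not something described above.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <set>
#include <string>

const std::string kAlphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// One tree per leading character, so each holds about 1/36 of the strings.
using Buckets = std::array<std::set<std::string>, 36>;

// Fills every bucket with `per_bucket` distinct codes.
// Requires per_bucket <= 36^(length - 1), otherwise the loop cannot finish.
Buckets generate_bucketed(std::size_t per_bucket, std::mt19937_64& rng,
                          std::size_t length = 16) {
    std::uniform_int_distribution<std::size_t> pick(0, kAlphabet.size() - 1);
    Buckets buckets;
    for (std::size_t b = 0; b < buckets.size(); ++b) {
        while (buckets[b].size() < per_bucket) {
            std::string code(length, kAlphabet[b]);  // first char selects the bucket
            for (std::size_t i = 1; i < length; ++i) code[i] = kAlphabet[pick(rng)];
            buckets[b].insert(code);  // a duplicate within the bucket is ignored
        }
    }
    return buckets;
}

// Removes and returns one code, starting from a randomly chosen bucket and
// moving on to the next one if it is empty. Returns "" when all are empty.
std::string take_code(Buckets& buckets, std::mt19937_64& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, buckets.size() - 1);
    const std::size_t start = pick(rng);
    for (std::size_t step = 0; step < buckets.size(); ++step) {
        std::set<std::string>& bucket = buckets[(start + step) % buckets.size()];
        if (!bucket.empty()) {
            std::string code = *bucket.begin();
            bucket.erase(bucket.begin());
            return code;
        }
    }
    return std::string();
}
```

Note that take_code() always hands out the smallest remaining string of the chosen bucket, so within a bucket the codes leave in sorted order. If that ordering is a problem, copying each bucket into a shuffled vector after generation would avoid it.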