Very big array with combinations

B

Bengt Kappner

Hi everyone,

for an art project we also want to print a few books that contain unique combinations of the letters a-z. I am looking for an efficient way to create a text file with those combinations.

My idea was to start with a loop from 1 to 2million, that generates combinations and writes adds them to an array. Before adding them, the array is iterated and checked if the combination already exists.

This process seems to become very slow when the array gets big. Also, what would be the best definition of that array? Is it possible at all for an array to contain 2 million values of 26 charactes each and write that in a text file at the end?

Any help is greatly appreciated!!
Bengt
 
J

Jonathan Lee

My idea was to start with a loop from 1 to 2million, that
generates combinations and writes adds them to an array.
Before adding them, the array is iterated and checked if
the combination already exists.

It's much faster to check a sorted array for a value than
an unsorted one. Luckily, std::set is pretty much perfect
for this kind of job.

Just define a std::set over strings. Generate your random
strings and insert() them until you have enough. The
member function set::insert() will prune the duplicates.
This process seems to become very slow when the array gets
big.
Yup.

Also, what would be the best definition of that array?

A std::set.
Is it possible at all for an array to contain 2 million
values of 26 charactes each and write that in a text
file at the end?

Sure. Just open a file and iterate over the set, writing
out the values.

--Jonathan
 
Ö

Öö Tiib

It's much faster to check a sorted array for a value than
an unsorted one. Luckily, std::set is pretty much perfect
for this kind of job.

Just define a std::set over strings. Generate your random
strings and insert() them until you have enough. The
member function set::insert() will prune the duplicates.


A std::set.


Sure. Just open a file and iterate over the set, writing
out the values.

Unless he wants to write the words in same order in what these were
generated (e.g. not sorted). Set forgets that unsorted order.
 
M

Marcel Müller

Hi,

Bengt said:
for an art project we also want to print a few books that contain unique combinations of the letters a-z. I am looking for an efficient way to create a text file with those combinations.

My idea was to start with a loop from 1 to 2million, that generates combinations and writes adds them to an array. Before adding them, the array is iterated and checked if the combination already exists.

use unordered_set<> and your program will complete in a few seconds. On
older compilers it might be named hash_set<> because it was not
initially part of the STL.


Marcel
 
A

AnonMail2005

Hi everyone,

for an art project we also want to print a few books that contain unique combinations of the letters a-z. I am looking for an efficient way to create a text file with those combinations.

My idea was to start with a loop from 1 to 2million, that generates combinations and writes adds them to an array. Before adding them, the array is iterated and checked if the combination already exists.

This process seems to become very slow when the array gets big. Also, what would be the best definition of that array? Is it possible at all for an array to contain 2 million values of 26 charactes each and write that in a text file at the end?

Any help is greatly appreciated!!
Bengt

Why doesn't your looping logic generate unique combinations so that
you don't have to check if they exist already?

Perhaps I'm missing something?
 
P

Puppet_Sock

for an art project we also want to print a few books that contain unique combinations of the letters a-z. I am looking for an efficient way to create a text file with those combinations.

Art project? That smells like homework to me.
Socks
 
B

Bengt Kappner

Why doesn't your looping logic generate unique combinations so that
you don't have to check if they exist already?

Well, because I can't think of a way to generate unique combinations without having to check. So far I use simple random numbers to generate a combination.

Do you know a way to implement that kind of logic in the loop?
Thanks,
Bengt
 
B

Bengt Kappner

Jonathan said:
A std::set.
That's a very good start, thank you.
So:

#include <iostream>
#include <set>
using namespace std;

int main ()
{
std::set<std::string> myset;
myset.insert('abc');
return 0;
}

Would that do?

Questions:
1. I can't find any 'set.shuffle' method to make it appear random before writing it out. Any idea on this?

2. Just for my interest: Isn't the insert method doing the same as I would do by inserting manually at the right place and comparing to neighbours? Or is this insert() using some mechanism that's faster?

Thank you very much!!
Bengt
 
B

Bengt Kappner

Marcel said:
use unordered_set<> and your program will complete in a few seconds.
Thanks for the hint, but how would that work in a few seconds?
If the set is unordered, doesn't the insert-method also have to iterate all values and compare them?

Thank you,
Bengt
 
B

Bengt Kappner

Hi,

I also ran in a little problem when trying to print the contents of a set, as this is the first time I am using sets.

set<std::string> myset;
set<int>::iterator it;

myset.insert("abc");
myset.insert("abac");
myset.insert("aac");

for (it=myset.begin(); it!=myset.end(); it++)
cout << " " << *it;
return 0;

According to (http://www.cplusplus.com/reference/stl/set/find/) this does work when the set contains integers. But here I am get the "no match for operator"-error.

Any help is greatly appreciated!
Bengt
 
M

Marcel Müller

Bengt said:
Thanks for the hint, but how would that work in a few seconds?
If the set is unordered, doesn't the insert-method also have to iterate all values and compare them?

'Unordered' means that there is no guaranteed order. More exactly that
your data items have no logical ordering. Just a set of unique objects.

Technically it is a hash table. If you construct the hash table with
sufficient size, all of your operations will amortize O(1). So you have
just one or two million operations. No serious problem for modern computers.

Note: std::set<> is not a set in the mathematical point of view. It is
an ordered sequence. unordered_set<> corresponds to a mathematical set.
Nothing else is your requirement.


Marcel
 
J

Jonathan Lee

int main ()
{
  std::set<std::string> myset;
  myset.insert('abc');
  return 0;

}

Would that do?

That's the idea (Except double quotes around "abc").
Questions:
1. I can't find any 'set.shuffle' method to make it appear
random before writing it out. Any idea on this?

In <algorithm> there is a function called random_shuffle().
The only caveat is that it requires a random access iterator,
which std::set doesn't have.

The solution is simple, but kinda inelegant: once you're
done generating the strings, copy them into a std::vector.
Then use random_shuffle() on that. It's a bit of a waste
of memory, but on a one-off project, who cares? And it's
really only a few lines of code.
2. Just for my interest: Isn't the insert method doing
the same as I would do by inserting manually at the
right place and comparing to neighbours? Or is this
insert() using some mechanism that's faster?

set::insert() takes advantage of the fact that std::set
keeps things in sorted order. So it can find a duplicate
with a binary search. Check Wikipedia for a description.
Basically it reduces checking N items to checking log2 N
items. On an array of size N = 2000000 that means it
only checks about 21 items. i.e., it does 1/100000th the
work.
Thank you very much!!

No problem
--Jonathan
 
K

Kai-Uwe Bux

Bengt said:
That's a very good start, thank you.
So:

#include <iostream>
#include <set>
using namespace std;

int main ()
{
std::set<std::string> myset;
myset.insert('abc');
return 0;
}

Would that do?

Questions:
1. I can't find any 'set.shuffle' method to make it appear random before
writing it out. Any idea on this?
[...]

Are you populating the set in random order? if so, you could use the set
just to decide whether an item is new (and then print it right away) or
already dealt with (in which case you would silently ignore it).


Best

Kai-Uwe Bux
 
A

AnonMail2005

Well, because I can't think of a way to generate unique combinations without having to check. So far I use simple random numbers to generate a combination.

Do you know a way to implement that kind of logic in the loop?
Thanks,
Bengt

What is your definition of letter combinations? Here are a few
questions:
1. Are the restricted to a certain length?
2. Do they have to be wordS?

Based on the answers I can come up with suggestions.
 
T

Thomas J. Gritzan

Am 05.05.2010 11:19, schrieb Bengt Kappner:
Hi,

I also ran in a little problem when trying to print the contents of a set, as this is the first time I am using sets.

set<std::string> myset;
set<int>::iterator it;

Iterators only work with containers of the same type. set<int> is not
myset.insert("abc");
myset.insert("abac");
myset.insert("aac");

for (it=myset.begin(); it!=myset.end(); it++)
cout << " " << *it;
return 0;

According to (http://www.cplusplus.com/reference/stl/set/find/) this does work when the set contains integers. But here I am get the "no match for operator"-error.

Your set doesn't contain integers, it contains strings. Replace set<int>
with set<std::string>, and it should work.

It might be a good idea for you to read a good book about C++, Templates
and the STL.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,065
Latest member
OrderGreenAcreCBD

Latest Threads

Top