Circular lists

C

cartercc

I want to learn an effient way of handle circular lists.

If you want a circular list, you use a counter and modulus it by the
number of items in the list.

For example, if you have a ten element list in an array, and your
counter is modulus ten, 9 gets $array[9], but 10 gets (10 modulus 10 =
0) $array[0].

CC
 
J

Jim Gibson

gamo said:
On Thu, 15 Jan 2009, gamo wrote:

Ok, I rewrote something to assure that there are not duplicates in %clist
and run the test again, with similar results (speed, high use of mem).

The thing could change radically if there is a method to canonicalize all
the rotations of a list in a compact string. Did you say that is possible?

I did so several days ago. You can canonicalize or normalize your lists
by deciding upon a unique element within the list and rotating the list
so that element is first. You can do that by assigning a unique,
sequential key to each element and picking the element with the lowest
key as the first key. Since this element should always appear first,
you can save time in generating all of the lists, given the set of
elements, by just placing the lowest element in the first position and
then generating all of the permutations for the remaining elements.
This reduces your computational time from N! to (N-1)!.

As Xho pointed out, that method will still generate equivalent
"circular lists" if the first element is duplicated elsewhere in the
set. You might be able to reduce or eliminate generating duplicates if
you pick an element as the first element that does not have any
duplicates in the set, if that is possible.
 
J

Jürgen Exner

cartercc said:
I want to learn an effient way of handle circular lists.

If you want a circular list, you use a counter and modulus it by the
number of items in the list.

For example, if you have a ten element list in an array, and your
counter is modulus ten, 9 gets $array[9], but 10 gets (10 modulus 10 =
0) $array[0].

You are falling for the same red herring that has plagued me for the
past days. This is not about circular lists, quite the opposite
actually.

From my understanding now (thanks again to xhoster for the explanation)
the OP is looking for all equivalence classes of all permutations of a
given list, where two (permutated) lists belong to the same equivalence
class if and only if those two lists can be transformed into each other
by rotating/circling their elements.

jue
 
J

Jürgen Exner

gamo said:
The thing could change radically if there is a method to canonicalize all
the rotations of a list in a compact string. Did you say that is possible?

I don't think you have to. What you are looking for are equivalence
classes. For each class you need one representative element or normal
form. Then, if and only if your permute algorithm generates this very
representative, then you found a new equivalence class and you can print
that representative immediately even without storing it.
If a permutation returned by the permute algorithm is not a
representative/not in normal form then you can discard that permutation
right away.

How you define this normal form is up to you. You need to find a
definition, that does not require to generate all the other elements of
the equivalence class, i.e. all the other rotations, because that would
be expensive.

One idea: use alphabetical sorting and define the normal form to be the
smallest alphabetical list.
That means, if the list doesn't start with the smallest element then it
can't be in normal form. This will eliminate the vast majority of
candidates right away.
If a candidate does start with the smallest element and there is no
other element with the same small value, then you got a normal form and
the representative of a new equivalence class.
Only if the same smallest value appears multiple times in the list, then
you need to dig deeper and spend some time evaluating, if your candidate
is the smallest alphanumerical element. But this shouldn't be too
frequent, so the performance hit should be bearable.
I don't think so. If we have numbers instead of letters, I can't think in
a method that could describe the list of circular numbers in less space
that it has.

The trick is not to store all permutations or even all normal forms but
to develop a normal form criteria that allows you to determine if a
permutation is in normal form _without_ looking at previously generated
permutations and for performance reasons without looking at other
rotations, i.e. at other elements of the same equivalence class.

jue
 
S

sln

cartercc said:
I want to learn an effient way of handle circular lists.

If you want a circular list, you use a counter and modulus it by the
number of items in the list.

For example, if you have a ten element list in an array, and your
counter is modulus ten, 9 gets $array[9], but 10 gets (10 modulus 10 =
0) $array[0].

You are falling for the same red herring that has plagued me for the
past days. This is not about circular lists, quite the opposite
actually.

From my understanding now (thanks again to xhoster for the explanation)
the OP is looking for all equivalence classes of all permutations of a
given list, where two (permutated) lists belong to the same equivalence
class if and only if those two lists can be transformed into each other
by rotating/circling their elements.

jue

You guys make me laff. You should watch that new show on the science channel
"Million 2 One". They had a good one there, how many people in a room does it
take to statistically almost guarantee 2 people will have the same birth date
(day-month) out of 365 days/year?

Answer: 26

So funny!

sln
 
X

Xho Jingleheimerschmidt

gamo said:
The thing could change radically if there is a method to canonicalize all
the rotations of a list in a compact string. Did you say that is possible?

Of course. From a previous post:

my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
my ($canon) = sort map {substr $two,$_,length $s} 0..length($s) -1;

That last line canonicalizes by the simple method of computing all
rotations and taking the alphabetically least of them (you could use
List::Util::minstr instead of sort). You could use other
canonicalization rules if they made other optimizations possible. For
example, it might be a good idea to define the canonical representation
as the one the alphabetically first *among those that start with the
rarest element*. In your example, "a" was both the rarest and the
alphabetically first, but that wouldn't hold in general. By doing this,
you get the double advantage of fixing the first position and so not
needing to permute it, plus you don't have to compute all rotations but
only the few ones that would put another instance of the rarest element
first.
I don't think so. If we have numbers instead of letters, I can't think in
a method that could describe the list of circular numbers

If the numbers are single digits, then it works just the same as with
letters. If they can be integers of any number of digits, you just need
a non-digit delimiter to join on to make a canonical representation:

my @two=(@s,@s);
my ($canon) = sort map {join ";", @two[$_..$_+$#s]} 0..$#s;

in less space that it has.

An array packed into a string generally takes less space, in Perl, then
the array itself. Plus you are using one such string to represent not
just one array, but all rotations of it. Plus, you might find you don't
actually need to store the canonical representation in memory long-term,
just use it to test something else on the fly.

Xho
 
X

Xho Jingleheimerschmidt

Jürgen Exner said:
Maybe
http://search.cpan.org/~edpratomo/Algorithm-Permute-0.12/Permute.pm

It mentions some other permute implementations in a benchmark, too.

That (and the others I've seen ) handles sets of course, but not
multi-sets. Well, it handles them, just not well. For example,
the multiset of 10 'a's and 10 'b's has 184756 distinct permutations,
but Algorithm Permute will return each one of them 1.32E+013 times.

So something efficient for multi-sets, like dpermute or
Distinct::permute posted earlier, is essential.

However -as others have discussed already- I doubt that it will be very
useful for larger lists (more than 10 to 15 elements) , because the OP
will run out of space and time if he tries to generate all permutations
first and then filter second.
The filter for rotational equivalence has to be incorporated into the
generating algorithm.

It's far more important to filter for degenerate linear permutations in
the generating algorithm. Once you pin a rare element in the first
position, trying to implement more of the rotational equivalence
short-circuiting during the linear permutation generation doesn't get
you a whole lot of extra benefit for the effort it would take. Probably
better to take the simpler algorithm into C than make the Perl algorithm
insanely complex.

Xho
 
R

Rasmus Villemoes

how many people in a room does it take to statistically almost
guarantee 2 people will have the same birth date (day-month) out of
365 days/year?

Answer: 26

That's the number of people making the probability of at least two
people sharing a birthday > 0.59. That is certainly not the same as
"statistically almost guarantee", whatever that means. 23 people make
the probability > 0.5.
So funny!

No, just math (not that math can't be funny, but this is just too
trivial). And 26 just happens to be the smallest integer making the
quantity (365/365)*(364/365)* ... * ( (365-n+1)/365) smaller than
0.41, which is the probability that no pair out of n people share a
birthday (ignoring subtleties such as leap years and facts about
birthdays not actually being distributed uniformly over all 365 days).

But this is way OT. Since this is a perl newsgroup, here's a bit of
perl code:

my $n;
my $p = 1;
for $n (1..30) {
$p *= (366-$n)/365;
printf "%3s: %.5f , %.5f\n", $n, $p, 1-$p;
}
 
G

gamo

Of course. From a previous post:

my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
my ($canon) = sort map {substr $two,$_,length $s} 0..length($s) -1;

That last line canonicalizes by the simple method of computing all rotations
and taking the alphabetically least of them (you could use List::Util::minstr
instead of sort). You could use other canonicalization rules if they made
other optimizations possible. For example, it might be a good idea to define
the canonical representation as the one the alphabetically first *among those
that start with the rarest element*. In your example, "a" was both the rarest
and the alphabetically first, but that wouldn't hold in general. By doing
this, you get the double advantage of fixing the first position and so not
needing to permute it, plus you don't have to compute all rotations but only
the few ones that would put another instance of the rarest element first.

That's fine, but don't works for handling the circular permutations.
It detects distintinct permutations but not the circular ones, as far as
I tested. Probably the problem comes for taking the first element of the
sorted list (only).

@canon = sort map ...
$canon = join '', @canon;
I don't think so. If we have numbers instead of letters, I can't think in a
method that could describe the list of circular numbers

If the numbers are single digits, then it works just the same as with letters.
If they can be integers of any number of digits, you just need a non-digit
delimiter to join on to make a canonical representation:

my @two=(@s,@s);
my ($canon) = sort map {join ";", @two[$_..$_+$#s]} 0..$#s;

in less space that it has.

An array packed into a string generally takes less space, in Perl, then the
array itself. Plus you are using one such string to represent not just one
array, but all rotations of it. Plus, you might find you don't
actually need to store the canonical representation in memory long-term, just
use it to test something else on the fly.

In the case of 20 letters it has 20*20 rotations per item. That's a number
highly enough to think in using a msg-digest like sha and have a hash of
it.
About testing candidates on the fly I still don't imagine how could be
done without using hashes of the previous ones.

Best regards,
 
X

Xho Jingleheimerschmidt

gamo said:
That's fine, but don't works for handling the circular permutations.

Of course it does. That is the only thing it works for. That is the
whole point of the code. When I originally posted the code, it was
tested and verified to work, converging on the same answer as the other
implementation did.

It detects distintinct permutations

Distinct permutations need no canonicalization. They do not have
multiple different representations of the same underlying conceptual
thing, so canonicalization is neither necessary nor possible. (Unless
you actually did append a digit to each letter to indicate the original
order, as someone proposed as a pedagogical exercise. Then stripping
that digit off would be a form of canonicalization.)

but not the circular ones, as far as
I tested. Probably the problem comes for taking the first element of the
sorted list (only).

Taking just the first element of the sorted list (of strings, which are
themselves implicit lists of letters) is the whole point. If, for some
reason, you later need all the rotations, not just the canonical one,
you can recompute then on the fly given the canonical one. I can't
comment on what your tests were showing you, unless you
post the test code.

....
In the case of 20 letters it has 20*20 rotations per item.

There only 20 rotations, not 20*20. And you only save one of them, if that.
About testing candidates on the fly I still don't imagine how could be
done without using hashes of the previous ones.

Rather than using memory and hashes to explicitly compare to the past,
you use logic and rules to implicitly compare to the past and future.
To pull it off you, need to combine both the distinct permutation
generator and the circular canonicalization together, otherwise it won't
work.

1) Has this linear permutation been generated in past, or will it be
generated again in the future (of this particular execution of the
code)? No!

1a) How do you know that? Because I successfully implemented a distinct
permutation generator designed explicitly to have that behavior.

2) Will the underlying circular concept represented by this linear
permutation be represented by some other linear permutation, past or
future? Yes!

2a) So how can I efficiently communicate with those past and future
representations so that one and only one of us knows whether we are the
"it" one or not? Do it by rule, not by explicit communication. Each of
us tests whether our representation canonicalizes to itself. If it
does, I know I am "it". If it doesn't, I know that one of the other
representations is (or will be) "it", so I bow out gracefully.

Cheers,

Xho
 
G

gamo

Of course it does. That is the only thing it works for. That is the whole
point of the code. When I originally posted the code, it was tested and
verified to work, converging on the same answer as the other implementation
did.



Distinct permutations need no canonicalization. They do not have multiple
different representations of the same underlying conceptual thing, so
canonicalization is neither necessary nor possible. (Unless you actuallydid
append a digit to each letter to indicate the original order, as someone
proposed as a pedagogical exercise. Then stripping that digit off would be a
form of canonicalization.)



Taking just the first element of the sorted list (of strings, which are
themselves implicit lists of letters) is the whole point. If, for some
reason, you later need all the rotations, not just the canonical one, youcan
recompute then on the fly given the canonical one. I can't comment on what
your tests were showing you, unless you
post the test code.

...

You are right, again.
#!/usr/local/bin/perl -w
use List::Util qw(shuffle);
@a = qw(a a a a a r r g g n);
# @a = qw(a a b b b c c c c d d d d d e e e e e e);
$n = @a;
for (1..10_000_000){
@set = shuffle(@a);
$s = join '',@set;
$two = $s . $s;
my ($canon) = sort map {substr $two,$_,$n} 0..$n-1;
if (!defined $clist{$s} && !defined $hash{$canon}){
$clist{$s}++;
# print "$s -> length canon $canon ", length $canon, "\n";
$hash{$canon}++;
$exito++;
print "$0 $exito\n";
$kount=0;
}else{
$kount++;
last if $kount>=100_000;
}
}
print "El número de permutaciones circulares es $exito\n";

__END__
756
There only 20 rotations, not 20*20. And you only save one of them, if that.


Rather than using memory and hashes to explicitly compare to the past, you use
logic and rules to implicitly compare to the past and future.
To pull it off you, need to combine both the distinct permutation generator
and the circular canonicalization together, otherwise it won't work.

1) Has this linear permutation been generated in past, or will it be generated
again in the future (of this particular execution of the code)? No!

1a) How do you know that? Because I successfully implemented a distinct
permutation generator designed explicitly to have that behavior.

2) Will the underlying circular concept represented by this linear permutation
be represented by some other linear permutation, past or future? Yes!

2a) So how can I efficiently communicate with those past and future
representations so that one and only one of us knows whether we are the "it"
one or not? Do it by rule, not by explicit communication. Each of us tests
whether our representation canonicalizes to itself. If it does, I know Iam
"it". If it doesn't, I know that one of the other representations is (or
will be) "it", so I bow out gracefully.
Good idea.
Cheers,

Xho

Best regards,
 
G

gamo

Good idea.

....and it solves definitively the problem, but I wonder if it
could be optimized even more. I.e. given a large list, could be
predicted when the candidates would be of a pattern of success?

Thanks!
 
X

Xho Jingleheimerschmidt

gamo said:
....and it solves definitively the problem, but I wonder if it
could be optimized even more. I.e. given a large list, could be
predicted when the candidates would be of a pattern of success?


Yes, but that is not going to get you orders of magnitude improvement.

The first thing I did was translate the algorithm already described
(including the part about fixing one of the "a"s to be the first slot,
which means less permutations plus less rotations (you only have to find
the other "a" and rotate it, rather than doing all 20 rotations) into
C, because that will get you orders of magnitude improvement and my gut
tells me that nothing left that you can do in Perl will do so (other
than large scale parallelization, assuming you have a 100 CPU cluster
laying around). Also, because at this point it was pretty
straightforward to translate to C, but after more optimization done in
Perl the translation to C would become exponentially harder.

The next optimization is to hack the dpermute code itself (as opposed to
the callback made from it) in the manner you describe, so that as soon
as the second 'a' is placed into the variable 'prefix', I can start
testing to see if the "rotated" string is, to our knowledge so far,
going to be less than, greater than, or undecidable compared to the
original string. If the rotated will be less than the original, then we
will fail the self-canonicalization test and I can abort now before
filling in any more letters. If the rotated will be greater than the
original, then I know that any string I make by filling in more letters
will pass the canonicalization test, so I can set a flag so that from
here (recursively) on, no more lexical order testing is needed, it
passes by induction. If it is undecidable, then the I just have to do
the test-skip-flag thing when the next letter is added. (And of course
I only need to test the added letter to its mate, I don't have to test
all preceding letters as I know that they are equal because otherwise I
wouldn't have reached this point without the flag being set)

After all that, I got C code that could generate all 4,888,643,760
answers for the size 2-3-4-5-6 problem in about one hour, on a 900MHz
processor, using negligible memory.

Of course, if there were not exactly two instance of the rarest letter,
then this optimization would not work. A general-case optimization
should be possible, but much more difficult. (and if the rarest letter
were not also the alphabetically smallest, then rather than changing my
code to work under that scenario, I would just remap the alphabet to
make it be the case that the rarest was the smallest, and then use
either Perl's or linux's "tr" to map the answers back to the original
alphabet.)

But this all because I think it's fun. If I were doing this for real,
I'd probably be asking, "What the heck am I going to do with over 4
billion 20-letter strings, and maybe I should focus my optimization on
the use of them rather than the generation of them"


Xho
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,566
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top