Circular lists

xhoster · Jan 11, 2009

Jürgen Exner said:
Are you confirming Red's understanding of the problem or are you
correcting his understanding? To me both versions, his and yours, seem
to be the same.

If they are, then generating all your "circular lists" is trivial, as is
computing their number.

Obviously there are exactly as many circular lists as there are elements
in the original list, because each element can become the first element
in a result list.

ababab only has two linear representations,not 6.

I think your claim is true if and only if the counts of the occurrences of
distinct letters in the input @set have a lcf of 1.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

gamo · Jan 11, 2009

There only that many permutations if your "set" has only unique letters,
which in your example it does not. Anyway, may gut feeling is that to get

And how can I exploit that property reducing iterations?

an accurate count based on stochastic sampling, your will need to be about
as large as the underlying domain, anyway. But I could be wrong.

But I can have a lower bound for the number and the lists I can't have
with exaustive enumeration.

canonicalize! Or solve the problem analytically. If your set is such that
no permutation can have a self-rotation, then every circular list will
have exactly N linear representations.

Only the number of circular permutations could be calculated. I want to
know what are that lists.

Best regards,

PS: If you look at my corrected code, it works. I tought that it was
inefficient but I can't see where. A posible patch could be to use
only $p[] and don't compare with the past, because $p[] contains the
(number of circular perm * lenght of the original list) possibilities.

xhoster · Jan 11, 2009

gamo said:
And how can I exploit that property reducing iterations?

There may already be a module for that on CPAN, but if so finding it
seems as hard as making one. So I whipped up something easy to implement
(but not to use)

To make a good module out of it, recursion should be removed and
reimplemented with explicit state and as an iterator rather than a
callback.

With your example, it returns 5880, as it just generates, filters, and
counts the permutations, and doesn't look at circularity.

#efficiently make all distinct permutations

use strict;
use warnings;

sub dpermute {
my ($set, $prefix, $code)=@_;
my $count;
foreach (keys %$set) {
next unless $set->{$_};
$count++;
my %set = %$set;
$set{$_}--;
my @prefix = (@$prefix, $_);
dpermute(\%set, \@prefix, $code);
};
$code->(@$prefix) unless $count;
};

my %set;
$set{$_}++ foreach qw/a a a a a r r g g n/;
my $x;
dpermute(\%set,[],sub {my $y=join "", @_; $y.=$y; $x++ unless $y=~/gg/});
warn $x;

But I can have a lower bound for the number and the lists I can't have
with exaustive enumeration.

You can use a deterministic enumeration and just not let it run to
completion. I think it might start out less efficient than random, but
at some point will become more efficient as the random starts reproducing
old results. Or do it both and then take the max of the two.

PS: If you look at my corrected code, it works.

Oh, I didn't see the correction (well, I saw it, but thought it was a
botched posting as Ｉdidn't notice the changes just a copy of the
original). I'll take a look later and see if I have any suggestions.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

gamo · Jan 11, 2009

For some input sizes and structures, clearly it is intractable. But the
same is true for nearly all problems.

Certainly. Thanks for remember me that.

Where n is what, the size of @set, or the factorial of that size?

More related with size, but there are counter-examples like
(a a a a a a) or (a a a a a b). I will try it with the case
(a a b b b c c c c d d d d d e e e e e e) for try to learn a
relation.

Best regards,

Jürgen Exner · Jan 12, 2009

ababab only has two linear representations,not 6.

Good catch, you are right.

I think your claim is true if and only if the counts of the occurrences of
distinct letters in the input @set have a lcf of 1.

What's an lcf? http://en.wikipedia.org/wiki/LCF has numerous
suggestions, but none that seems to fit in this context.

jue

Jürgen Exner · Jan 12, 2009

There only that many permutations if your "set" has only unique letters,
which in your example it does not.

Well, I wonder what George is actually talking about. He is saying set,
but as you pointed out he is using a multi-set (or maybe a list?), and
then later he is talking about lists.

Anyway, may gut feeling is that to get
an accurate count based on stochastic sampling, your will need to be about
as large as the underlying domain, anyway. But I could be wrong.

Unless George manages to explain his problem precisely in terms that
everyone can understand and doesn't keep changing his story I have
little hope that we can even agree about the problem to be solved, let
alone agree on a solution.

A formal spec would be nice, but I guess that is too much to ask.

jue

xhoster · Jan 12, 2009

for $j (1..$precounter){
if ($s eq $p[$j]){
$ok=0;
last;
}
};

This would be better written as a hash look up. To do so, the easiest way
is move it above the hash population, and do away with @p altogether,
giving:

for (1..10_000_000){
my @set = shuffle(@a);
my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
unless (exists $hash{$s}){
$exito++;
print "$0 $exito\n";
}
for my $i (0..9){
my $r = substr $two,$i,10;
$hash{$r}++;
}
}

This should be a lot faster. (In addition to other things mentioned
previously, the hard-coding of 9 and 10 in the above is not a good idea)

If RAM becomes a premium, you could put only one canonical sequence
in the hash, rather than every rotation:

for (1..10_000_000){
my @set = shuffle(@a);
my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
my ($canon) = sort map {substr $two,$_,length $s} 0..length($s) -1;
unless (exists $hash{$canon}){
$hash{$canon}++;
$exito++;
print "$0 $exito\n";
}
}
print "El número de permutaciones circulares es $exito\n";

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

xhoster · Jan 12, 2009

Jürgen Exner said:
What's an lcf? http://en.wikipedia.org/wiki/LCF has numerous
suggestions, but none that seems to fit in this context.

Oops. I meant gcf, greatest common factor. (lcf, least common factor, is
pretty trivial, which is probably why there is no entry for it.)

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

xhoster · Jan 12, 2009

Previously, I wrote about getting only distinct permutations:

There may already be a module for that on CPAN, but if so finding it
seems as hard as making one. So I whipped up something easy to implement
(but not to use)

To make a good module out of it, recursion should be removed and
reimplemented with explicit state and as an iterator rather than a
callback.

I've done the above mentioned conversion:

my $iter=Distinct:

ermute->new(qw/a a a a a r r g g n/);
while (my @x=$iter->next) {
print "@x\n";
};

package Distinct:

ermute;
sub new {
"Distinct:

ermute" eq shift @_ or die "Not subclassable";
my %h;
foreach (@_) {
$h{$_}++;
};
my @alpha=keys %h;
my @counts=values %h;
my @first;
push @first, (($_) x $counts[$_]) foreach 0..$#alpha;
return bless {alpha => \@alpha, counts=> \@counts, now => \@first};
};

sub next {
my $self=shift;
return if exists $self->{done};
my @return = map $self->{alpha}[$_], @{$self->{now}};

## climb back off the length until we have backed off far enough that
## the next iteration doesn't need to change anything to the left of us.
my $climb=$#{$self->{now}};
my @left = (0)x @{$self->{alpha}};
my $max_left = -1; # highest digit with any left to choose
while ($climb >=0) {
last if $self->{now}[$climb] < $max_left;
$max_left = $self->{now}[$climb] if $self->{now}[$climb] > $max_left;
$left[ $self->{now}[$climb] ] ++;
$climb--;
};
if ($climb<0) {
$self->{done}=1;
return @return;
};

# roll over the digit that needs it:
$left[ $self->{now}[$climb] ]++;
do {$self->{now}[$climb]++} until $left[ $self->{now}[$climb] ];
$left[ $self->{now}[$climb] ]--;

# fill out rest
foreach my $digit (0..$#left) {
foreach (1..$left[$digit]) {
$self->{now}[++$climb]=$digit;
};
};

return @return

};

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

gamo · Jan 12, 2009

On Fri, 9 Jan 2009, gamo wrote:

Click to expand...

for $j (1..$precounter){
if ($s eq $p[$j]){
$ok=0;
last;
}
};

Click to expand...

This would be better written as a hash look up. To do so, the easiest way
is move it above the hash population, and do away with @p altogether,
giving:

for (1..10_000_000){
my @set = shuffle(@a);
my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
unless (exists $hash{$s}){
$exito++;
print "$0 $exito\n";
}
for my $i (0..9){
my $r = substr $two,$i,10;
$hash{$r}++;
}
}

This should be a lot faster. (In addition to other things mentioned
previously, the hard-coding of 9 and 10 in the above is not a good idea)

If RAM becomes a premium, you could put only one canonical sequence
in the hash, rather than every rotation:

for (1..10_000_000){
my @set = shuffle(@a);
my $s = join '',@set;
my $two = $s . $s;
next if ($two =~ /gg/);
my ($canon) = sort map {substr $two,$_,length $s} 0..length($s) -1;
unless (exists $hash{$canon}){
$hash{$canon}++;
$exito++;
print "$0 $exito\n";
}
}
print "El número de permutaciones circulares es $exito\n";

I don't agree with your simplifications. It needs two hashes:
one for the objective, the circular unique lists, and another
for all the rotations. Each candidate must be checked against
both.

Anyway the problem with lenght 20,
(a a b b b c c c c d d d d d e e e e e e)
is intractable in terms of memory to store the results (using
a 16GB RAM computer). I halt the program with 15,4 GB consumed.
The counter of circular lists marks 5,3 millons and running up
at devils speed.

I run your fast routine dpermute to figure out the dimension
of the problem, but the program consume many time in the calculation.
Still waiting for coment it.

I am hopeless about tryng a real problem of this kind.

Thanks for your efforts, best regards

Jürgen Exner · Jan 12, 2009

gamo said:
Anyway the problem with lenght 20,
(a a b b b c c c c d d d d d e e e e e e)
is intractable in terms of memory to store the results (using
a 16GB RAM computer). I halt the program with 15,4 GB consumed.

Now wait a second. While I agree with xhoster that my count of n rotated
lists for an original list with n elements is not correct in the general
case it is surely an upper bound, because according to xhoster (and I
agree) some generated lists may be identical and thus collapse.

The counter of circular lists marks 5,3 millons and running up
at devils speed.

Therefore there are only a max of 20 rotated lists for the example
above:

aabbbccccdddddeeeeee
abbbccccdddddeeeeeea
bbbccccdddddeeeeeeaa
bbccccdddddeeeeeeaab
bccccdddddeeeeeeaabb
ccccdddddeeeeeeaabbb
cccdddddeeeeeeaabbbc
ccdddddeeeeeeaabbbcc
cdddddeeeeeeaabbbccc
dddddeeeeeeaabbbcccc
ddddeeeeeeaabbbccccd
dddeeeeeeaabbbccccdd
ddeeeeeeaabbbccccddd
deeeeeeaabbbccccdddd
eeeeeeaabbbccccddddd
eeeeeaabbbccccddddde
eeeeaabbbccccdddddee
eeeaabbbccccdddddeee
eeaabbbccccdddddeeee
eaabbbccccdddddeeeee

Some of these may be identical and thus reduce the number of results
(which they are not in this example), but how did you get 5.3 million
candidates?

jue

xhoster · Jan 12, 2009

gamo said:
I don't agree with your simplifications. It needs two hashes:
one for the objective, the circular unique lists,

Why store those in memory? Print them to stdout as you find them (assuming
the first implementation) and read them from disk later. (But in the 2nd
implementation, you naturally save one and only one representation in
memory for each.)

and another
for all the rotations.

If you canonicalize every circular list to have a single canonical
representation, then you only need to test the canonical representation for
one circular-list against all previous canonical representations, not all
the previous rotations. That is what the second one does.

Anyway the problem with lenght 20,
(a a b b b c c c c d d d d d e e e e e e)
is intractable in terms of memory to store the results (using=20
a 16GB RAM computer). I halt the program with 15,4 GB consumed.
The counter of circular lists marks 5,3 millons and running up
at devils speed.

I run your fast routine dpermute to figure out the dimension=20
of the problem, but the program consume many time in the calculation.
=20 Still waiting for coment it.

No need to run it to discover the size. The number of distinct
permutations is 20!/2!/3!/4!/5!/6! which is 97,772,875,200. I don't think
your circular lists can have any internal circular symmetry, so each
underlying circular list should have exactly 20 representatives present,
giving 4,888,643,760 circular lists. (Without any filtering, like your /gg/
example previously.)

If you force one of the 'a's to be in the first slot, then you only need
to generate 9,777,287,520 permutations, getting each circular list twice.

Find the other 'a', and rotate the string to put that 'a' in the first
slot. If the original string is "lt" the new rotated one, print the
original. Otherwise, print nothing as the rotated string either already has
or will later come up in its own right, and get printed at that point.
Nothing needs to be stored in RAM. For speed, I might do it in C, not
Perl.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

gamo · Jan 13, 2009

eaabbbccccdddddeeeee

Some of these may be identical and thus reduce the number of results
(which they are not in this example), but how did you get 5.3 million
candidates?

jue

@a = qw(a a b b b c c c c d d d d d e e e e e e);
$n = @a;
for (1..10_000_000){
@set = shuffle(@a);
$s = join '',@set;
$two = $s . $s;
$clist{$s}++;
if ($clist{$s}==1 && !defined $hash{$s}){
for $i (0..$n-1){
$r = substr $two,$i,$n;
$hash{$r}++;
if ($hash{$r}==1){ $counter++; }
}
$exito++;
print "$0 $exito\n";
}
}
print "El número de permutaciones circulares es $exito\n";
print "Counter of distinct permutations $counter\n";

After a while, taking keys %clist gives you lists that neither
is a rotation of another, nor there are duplicates. But the
experiment fails because the numbers Xho shows. It's exponential,
and intractable.

Best regards,

PS: Xho, I need to have the $clist and $hash in ram to make
comparisons with a new candidate. C is no necesary since
perl fill the memory in a moment.

sln · Jan 13, 2009

@a = qw(a a b b b c c c c d d d d d e e e e e e);
$n = @a;
for (1..10_000_000){
@set = shuffle(@a);
$s = join '',@set;
$two = $s . $s;
$clist{$s}++;
if ($clist{$s}==1 && !defined $hash{$s}){
for $i (0..$n-1){
$r = substr $two,$i,$n;
$hash{$r}++;
if ($hash{$r}==1){ $counter++; }
}
$exito++;
print "$0 $exito\n";
}
}
print "El número de permutaciones circulares es $exito\n";
print "Counter of distinct permutations $counter\n";

After a while, taking keys %clist gives you lists that neither
is a rotation of another, nor there are duplicates. But the
experiment fails because the numbers Xho shows. It's exponential,
and intractable.

Best regards,

PS: Xho, I need to have the $clist and $hash in ram to make
comparisons with a new candidate. C is no necesary since
perl fill the memory in a moment.

jeez, give it a rest........

sln

Xho Jingleheimerschmidt · Jan 15, 2009

He's looking at all permutations of those letters, not just the original
specified order.

@a = qw(a a b b b c c c c d d d d d e e e e e e);
$n = @a;
for (1..10_000_000){
@set = shuffle(@a);
$s = join '',@set;
$two = $s . $s;
$clist{$s}++; ....

After a while, taking keys %clist gives you lists that neither
is a rotation of another, nor there are duplicates.

%clist is populated unconditionally. Being a hash, it of course will
not have duplicates, but there is nothing to prevent it from having
things that are rotations of previous things. So it will end up having
them.

But the
experiment fails because the numbers Xho shows. It's exponential,
and intractable.

Best regards,

PS: Xho, I need to have the $clist and $hash in ram to make
comparisons with a new candidate. C is no necesary since
perl fill the memory in a moment.

It fills memory only because that is they way you insist on implementing
it. If you made use of my distinct permutation generator, you would not
need to use RAM to guard against degenerate linear permutations. If you
made use of canonicalization, you would not need to use RAM to guard
against each different rotational representation. Finally, if you use
my last suggestion to store a linear permutation if and only if it's
rotational canonicalization is equal to the original linear permutation
itself, then you wouldn't even need RAM to guard against duplicate
canonical representations. You could write those representations
directly to disk without storing them in RAM, knowing that each one
would only end up on disk exactly once. Then the only practical issues
of tractability would be CPU time and disk space, not RAM at all.

Xho

Jürgen Exner · Jan 15, 2009

Xho Jingleheimerschmidt said:
He's looking at all permutations of those letters, not just the original
specified order.

Oh, ok, thanks. First time someone mentioned that.

But there are proven and established ways to generate all permutations
of a given (multi)-set of items. Why don't those work?

And second question: why is he talking about circular lists (of the
original order?)?

jue

xhoster · Jan 15, 2009

Jürgen Exner said:
Oh, ok, thanks. First time someone mentioned that.

It's an XYZ thing. He wants all distinct circular lists that could be made
with arrangements of those letters. Distinct linear permutations is a
useful intermediate step in getting there.

But there are proven and established ways to generate all permutations
of a given (multi)-set of items. Why don't those work?

Are any of them on CPAN? I couldn't find any so made a couple of my own.

And second question: why is he talking about circular lists (of the
original order?)?

Circle "lists" (I'd call it circularly distinct strings) are the ultimate
goal. Distinct linear permutations are a useful intermediate in reaching
that goal.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

gamo · Jan 15, 2009

%clist is populated unconditionally. Being a hash, it of course will
not have duplicates, but there is nothing to prevent it from having
things that are rotations of previous things. So it will end up having
them.

The string is tested against %hash, which cointains all the rotations of
previous unique lists. So, I don't expect there will be duplicates in
%clist. Note that every unique list as soon as is discovered populates
%hash with all of it's rotations (from the first).

It fills memory only because that is they way you insist on implementing
it. If you made use of my distinct permutation generator, you would not
need to use RAM to guard against degenerate linear permutations. If you
made use of canonicalization, you would not need to use RAM to guard

I didn't understood the method of "canonicalization"

against each different rotational representation. Finally, if you use
my last suggestion to store a linear permutation if and only if it's
rotational canonicalization is equal to the original linear permutation
itself, then you wouldn't even need RAM to guard against duplicate
canonical representations. You could write those representations
directly to disk without storing them in RAM, knowing that each one
would only end up on disk exactly once. Then the only practical issues
of tractability would be CPU time and disk space, not RAM at all.

Xho

That would be great. Best regards,

gamo · Jan 15, 2009

The string is tested against %hash, which cointains all the rotations of
previous unique lists. So, I don't expect there will be duplicates in
%clist. Note that every unique list as soon as is discovered populates
%hash with all of it's rotations (from the first).

Ok, I rewrote something to assure that there are not duplicates in %clist
and run the test again, with similar results (speed, high use of mem).

The thing could change radically if there is a method to canonicalize all
the rotations of a list in a compact string. Did you say that is possible?

I don't think so. If we have numbers instead of letters, I can't think in
a method that could describe the list of circular numbers in less space
that it has.

Best,

Jürgen Exner · Jan 15, 2009

It's an XYZ thing. He wants all distinct circular lists that could be made
with arrangements of those letters. Distinct linear permutations is a
useful intermediate step in getting there.

Oh, I think I FINALLY understand. The "circular" is not for _generating_
additional results by rotating the list after permutation but it is a
reduction or filter, to _collaps_ multiple lists into one single result,
if those permutated lists are "rotationally equivalent", i.e. if they
can be generated by rotating one of them.

Thank you, thank you, thank you for the explanation.

I'll reread the thread with that clarification in mind.

Are any of them on CPAN? I couldn't find any so made a couple of my own.

Maybe
http://search.cpan.org/~edpratomo/Algorithm-Permute-0.12/Permute.pm

It mentions some other permute implementations in a benchmark, too.

However -as others have discussed already- I doubt that it will be very
useful for larger lists (more than 10 to 15 elements) , because the OP
will run out of space and time if he tries to generate all permutations
first and then filter second.
The filter for rotational equivalence has to be incorporated into the
generating algorithm.

jue

FAQ 4.47 How do I handle circular lists?	0	Apr 6, 2011
Padding strings for a clean visual print out...	5	Dec 23, 2023
Command Line Arguments	0	Mar 7, 2023
circular buffering using substr() ?	9	Feb 3, 2008
Issue with textbox script?	0	Sep 5, 2022
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
calculation on lists	0	Dec 19, 2012
Average of MultiMode of a list of a list	1	Oct 28, 2022

Circular lists

xhoster

gamo

xhoster

gamo

Jürgen Exner

Jürgen Exner

xhoster

xhoster

xhoster

gamo

Jürgen Exner

xhoster

gamo

sln

Xho Jingleheimerschmidt

Jürgen Exner

xhoster

gamo

gamo

Jürgen Exner

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads