[perl-python] exercise: partition a list by equivalence

Xah Lee · Feb 17, 2005

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

For those of you unfamiliar with math lingoes, partition a list means
to create sublists, based on some criteria. In our case, the input list
comes in the form of pairs, and its members are the union of all
members in the pairs. The criterion for partition in our case is that
of equivalence, specified in the pairs input.

Try to write a Python code for this. In the Python code, the input
should be a list of couples. (for Perlers, sketch out the algorithm on
paper and try to code in Python directly.)

I'll post Perl & Python code tomorrow.

==This is brought to you by the Perl-Python community.==
== http://xahlee.org/perl-python/python.html ==

Xah
(e-mail address removed)
http://xahlee.org/PageTwo_dir/more.html

alex23 · Feb 18, 2005

For those of you unfamiliar with math lingoes, partition a list means

to create sublists, based on some criteria.

Typical moronic mathematicians with their exclusionary lingoes...why
can't they just say "create sublists based on some criteria" like
normal people?

- alex23

anton muhin · Feb 18, 2005

Xah said:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

Almost a joke:

from numarray import *

def merge(*pairs):
flattened = reduce(tuple.__add__, pairs, tuple())
m, M = min(flattened), max(flattened)

d = M - m + 1
matrix = zeros((d, d), type = Bool)

for x, y in pairs:
X, Y = x - m, y - m

matrix[X, X] = 1
matrix[X, Y] = 1
matrix[Y, X] = 1
matrix[Y, Y] = 1

while True:
next = greater(dot(matrix, matrix), 0)
if alltrue(ravel(next == matrix)):
break
matrix = next

results = []
for i in range(d):
eqls, = nonzero(matrix)
if eqls.size():
if i == eqls[0]:
results.append(tuple(x + m for x in eqls))

return results

Disclaimer: I'm not an expert in numarray and suppose the something can
be dramatically imporved.

John Lenton · Feb 18, 2005

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge($)
{
my %rev = ();
my @res = ();
my ($pairing, $first, $second, $has_first, $has_second);
foreach $pairing (@{$_[0]})
{
($first, $second) = @$pairing;
$has_first = exists $rev{$first};
$has_second = exists $rev{$second};
if ($has_first and $has_second)
{
push @{$rev{$first}}, @{$rev{$second}};
@{$rev{$second}} = ();
$rev{$second} = $rev{$first};
}
elsif (exists $rev{$first})
{
push @{$rev{$first}}, $second;
$rev{$second} = $rev{$first};
}
elsif (exists $rev{$second})
{
push @{$rev{$second}}, $first;
$rev{$first} = $rev{$second};
}
else
{
my @copy = ($first, $second);
push @res, \@copy;
$rev{$first} = $rev{$second} = \@copy;
}
}
return [grep @$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

--
John Lenton ([email protected]) -- Random fortune:
Noble cosa es, aún para un anciano, el aprender.
-- Sófocles. (497-406 a.C.) Filósofo griego.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFhXjgPqu395ykGsRAus5AJ9iCSbGsyP3QHZy/whP195GSVNIwwCcDyqf
hIA+oWaLBHdSGUi7t79Wfx8=
=BlC7
-----END PGP SIGNATURE-----

John Machin · Feb 18, 2005

John said:
here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

Click to expand...

in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev

Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

needs "if first == second: continue" here

if has_first and has_second:

needs "if rev[first] == rev[second]: continue" here

rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)

My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come

rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:
[snip]

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

In the real world, the objects under comparison (images, fingerprints,
customer records, etc) may not themselves be hashable and comparison
for equality may be expensive but a unique identifier would be assigned
to each object and that ID would of course be cheaply hashable and
comparable.

John Lenton · Feb 18, 2005

Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

oops, my bad.

needs "if first == second: continue" here

if has_first and has_second:

Click to expand...

needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.

rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)

Click to expand...

My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come

what is magic about it? is it really that horrible?

List appending is O(1) only in the amortised sense. Extending is not
O(1) in any sense. Neither is the list comparison that is necessary for
robustness (using your data structure, that is).

You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

Of course! I'd forgotten clean about union-find. And yes, it's
O(n*log(n)) union and find, and my implementation is O(n**2) for
union, O(1) for find; I'm pleased that, in spite of having forgotten
about union-find, I "reinvented" (heh) something that is better than
the "naive" implementation we saw in class.

Now I'm even more worried about your dismissal of this as magic and
horrible.

--
John Lenton ([email protected]) -- Random fortune:
Why don't you fix your little problem... and light this candle?
-- Alan Shepherd, the first man into space, Gemini program

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFoBIgPqu395ykGsRApUIAKCnNmctznqq1KTfZhi7Dl8YpjPnbQCguo2E
z/Ss9rWC64h9vuDrJM//Ge8=
=rVAU
-----END PGP SIGNATURE-----

Cyril BAZIN · Feb 19, 2005

Hello John,

Try your python code on this example:
merge([[1,2], [3,4], [1,2], [5,3]])

The result given by your function is:
[[3, 4, 5]]

Sorry...

To Xah: next time you propose an exercise, write some UNIT TESTS!!!
Then people will be able to test if there answers are correct or not.

Cyril

here's another interesting algorithmic exercise, again from part of a
larger program in the previous series.

Here's the original Perl documentation:

=pod

merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not specified.)

=cut

Click to expand...

in Python:

def merge(pairings):
rev = {}
res = []
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)
rev[first] = rev[second] = copy
return filter(None, res)

and in Perl:

sub merge($)
{
my %rev = ();
my @res = ();
my ($pairing, $first, $second, $has_first, $has_second);
foreach $pairing (@{$_[0]})
{
($first, $second) = @$pairing;
$has_first = exists $rev{$first};
$has_second = exists $rev{$second};
if ($has_first and $has_second)
{
push @{$rev{$first}}, @{$rev{$second}};
@{$rev{$second}} = ();
$rev{$second} = $rev{$first};
}
elsif (exists $rev{$first})
{
push @{$rev{$first}}, $second;
$rev{$second} = $rev{$first};
}
elsif (exists $rev{$second})
{
push @{$rev{$second}}, $first;
$rev{$first} = $rev{$second};
}
else
{
my @copy = ($first, $second);
push @res, \@copy;
$rev{$first} = $rev{$second} = \@copy;
}
}
return [grep @$_, @res];
}

although in Perl you wouldn't define it to take a reference to a list
(as you did), but rather a list, and you wouldn't define it to return
a reference to a list, but rather a list in list context (and possibly
a reference in scalar context).

Both versions are (IMHO) pretty clear (when the docstring/pod is
included), and O(n) because dict/hash lookup and list
appending/pushing is O(1) in both languages. Both versions can
probably be tweaked for speed quite a bit, but I don't *think* there's
a better-than-O(n) algorithm for this.

Note that the Python version assumes that the pairs' elements are
hashable; your example used numbers, so I thought it was pretty safe
assumption. The Perl version has no such restriction.

John Machin · Feb 19, 2005

John said:
Not robust in the face of input like:
[[1,1]]
or
[[1,2], [1,2]]
or
[[1,2], [2,1]]
or
[[1,2], [2,3], [3,1]]

Click to expand...

oops, my bad.

needs "if first == second: continue" here

if has_first and has_second:

Click to expand...

needs "if rev[first] == rev[second]: continue" here

Click to expand...

an 'is' is enough, and better.

Good point. You're redeeming yourself

rev[first].extend(rev[second])
rev[second][:] = []
rev[second] = rev[first]
elif has_first:
rev[first].append(second)
rev[second] = rev[first]
elif has_second:
rev[second].append(first)
rev[first] = rev[second]
else:
copy = [first, second]
res.append(copy)

Click to expand...

My reaction to the "magic" by which res grows was "omigod that's the
most horrible thing I've seen for quite a while" but there was worse to
come

Click to expand...

what is magic about it? is it really that horrible?

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

Brian Beck · Feb 19, 2005

Xah said:
merge($pairings) takes a list of pairs, each pair indicates the
sameness
of the two indexes. Returns a partitioned list of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the same. Therefore
1==2==4. The result returned is

[[4,2,1],[6,5]];

Not sure how efficient this is, but I decided to take advantage of the
operations provided by sets:

def merge(pairs):
pairs = set(tuple(sorted(p)) for p in pairings)
merged = []
# Each loop will result in a new, complete sublist in the result.
while pairs:
p = set(pairings.pop())
remove = set([])
for pair in pairs:
pairSet = set(pair)
if pairSet & p:
p |= pairSet
remove.add(pair)
pairs -= remove
merged.append(list(p))
return merged

Brian Beck · Feb 19, 2005

Brian said:

>

Code:

[/QUOTE]

Whoops, that should say:

def merge(pairs):
     pairs = set(tuple(sorted(p)) for p in pairs)
     merged = []
     # Each loop will result in a new, complete sublist in the result.
     while pairs:
         p = set(pairs.pop())
         remove = set([])
         for pair in pairs:
             pairSet = set(pair)
             if pairSet & p:
                 p |= pairSet
                 remove.add(pair)
         pairs -= remove
         merged.append(list(p))
     return merged

Brian Beck · Feb 19, 2005

Brian said:
Brian said:

Code:

[/QUOTE][/QUOTE] Ah heck, nevermind... it worked for my small test cases but the algorithm is just wrong.

Click to expand...

David Eppstein · Feb 19, 2005

"John Machin said:
You don't need to think. This problem has been extensively picked over
by folk who are a lot smarter than us, starting from 30 years ago.
Google for "disjoint set" and "union-find". One gets the impression
that the best possible algorithm would be slightly *worse* than O(n).

It can be solved by union-find
(e.g. with UnionFind from <http://www.ics.uci.edu/~eppstein/PADS/>):

import UnionFind
import sets

def merge(pairs):
uf = UnionFind.UnionFind()
items = sets.Set()
for a,b in pairs:
uf.union(a,b)
items.add(a)
items.add(b)
leaders = {}
for a in items:
leaders.setdefault(uf[a],sets.Set()).add(a)
return [list(component) for component in leaders.values()]

If you do all the unions before all the finds, as in this algorithm,
union-find is O(n).

However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm.

Brian Beck · Feb 19, 2005

Well, it looks like David posted a good solution, but I haven't tested
it (I'm assuming his works fine). I fixed mine up anyway... it actually
works now. If you're not using 2.4 you'll have to import sets.Set as set.

def merge(pairList):
pairList = set(tuple(sorted(p)) for p in pairList)
# Sort & set to remove duplicates, tuple to make hashable
merged = []
removePairs = set([])

# Each loop will result in a new, complete sublist in the result
while pairList:
if removePairs:
removePairs = set([])
else:
subList = set(pairList.pop()) # Start a new sublist

for pair in pairList:
pairSet = set(pair)
# True when pairSet and subList share at least one element
if pairSet & subList:
subList |= pairSet # Merge pair with subList
removePairs.add(pair) # Mark pair for removal

if removePairs:
pairList -= removePairs
else:
merged.append(list(subList))

return merged

John Lenton · Feb 19, 2005

needs "if rev[first] == rev[second]: continue" here

Click to expand...

an 'is' is enough, and better.

Click to expand...

Good point. You're redeeming yourself

this, together with you saying that it is hard to explain, makes me
think that you aren't comfortable thinking of lists as mutable
objects.

what is magic about it? is it really that horrible?

Click to expand...

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

understanding why that works, and why it is 'is' and not '==', are
both part of the same thing. Lists are mutable, and you can mutate
them, and they mutate. Unless you actually write code that uses the
fact you will forget it, and it will bite you. Of course, don't use it
just for the heck of it, but that creature you dismiss as a
slime-dripping mutation is actually quite useful.

While I'm at being unpolite, do you really think this code was harder
to understand than the code posted by anton, using numarray?

And, of course, if this code were for anything non-throw-awayable,
there would've been quite a bit of explaining going on between those
lines of code.

Ok, now back to being polite

--
John Lenton ([email protected]) -- Random fortune:
Hay más días que longanizas.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFq8rgPqu395ykGsRAnTNAJ4rtCpfoFUYJYLpQ6WmvHbmTLSsYgCeJzRE
dXMUU9lYxyECNtld9JjcdeA=
=2pzJ
-----END PGP SIGNATURE-----

David Eppstein · Feb 19, 2005

David Eppstein said:
It can be solved by union-find
(e.g. with UnionFind from <http://www.ics.uci.edu/~eppstein/PADS/>):

Here's a cleaned up version, after I added a proper iter() method to the
UnionFind data structure:

import UnionFind

def merge(pairs):
uf = UnionFind.UnionFind()
for a,b in pairs:
uf.union(a,b)
components = {}
for a in uf:
components.setdefault(uf[a],[]).append(a)
return components.values()

If you do all the unions before all the finds, as in this algorithm,
union-find is O(n).

That was a little too sloppy. It is possible to solve the union find
problem, with all unions before all finds, in time O(n). However the
usual union find data structure takes more time, O(n alpha(n)).

However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm.

This would be O(n), though.

John Machin · Feb 19, 2005

John said:
needs "if rev[first] == rev[second]: continue" here

an 'is' is enough, and better.

Click to expand...

Good point. You're redeeming yourself

Click to expand...

this, together with you saying that it is hard to explain, makes me
think that you aren't comfortable thinking of lists as mutable
objects.

How so? There is no connection between is/== and mutability. Let me
amplify: The point about 'is' is a good one, and aids your redemption
after your failure to have adequate guards caused your algorithm not to
work.

what is magic about it? is it really that horrible?

Click to expand...

Try explaining to the newbies over on the tutor list how despite "res"
only ever *explicitly* having little bits like [3, 4] appended to it,
it (or more properly the thing to which it refers) is actually
festering and growing and being mutated under the surface until at the
finale it bursts out dripping slime just like the creature from the
black lagoon ...

Click to expand...

understanding why that works, and why it is 'is' and not '==', are
both part of the same thing.

What same thing is that?

Lists are mutable, and you can mutate
them, and they mutate. Unless you actually write code that uses the
fact you will forget it, and it will bite you. Of course, don't use it
just for the heck of it, but that creature you dismiss as a
slime-dripping mutation is actually quite useful.

You are confusing mutability of lists (without which they would be
called tuples!) with my point that the 'res' list was having its
contents fiddled with implicitly through other "pointers".

While I'm at being unpolite, do you really think this code was harder
to understand than the code posted by anton, using numarray?

I only read it as far as the bit where it creating a matrix of size
O(N**2) -- in my app N can be over a million so I lost interest
rapidly.

And, of course, if this code were for anything non-throw-awayable,
there would've been quite a bit of explaining going on between those
lines of code.

Not of course, but of necessity.

Ok, now back to being polite

Welcome back.

John Lenton · Feb 19, 2005

How so? There is no connection between is/== and mutability. Let me
amplify: The point about 'is' is a good one, and aids your redemption
after your failure to have adequate guards caused your algorithm not to
work.

hey, it worked for all the test cases provided by the customer!

and
then some; I hadn't thought of checking for cycles nor repetetitions.

[snip]

You are confusing mutability of lists (without which they would be
called tuples!) with my point that the 'res' list was having its
contents fiddled with implicitly through other "pointers".

umm... nope, see, well, hair-splitting and all that, there is this
list that holds the partitions; the partitions are lists of
elements. There is a reverse mapping that, for each element, holds the
partition that element is in. So basically you go down the list of
pairings, modifying the partitions as you go. I'm certain if I had
commented the function appropriately, we wouldn't be having this
discussion

Let me see if I can remedy that:

def merge(pairings):
"""
merge(pairings) takes a list of pairs, each pair indicates
the sameness of the two indexes. Returns a partitioned list
of same indexes.

For example, if the input is
merge( [ [1,2], [2,4], [5,6] ] );

that means 1 and 2 are the same. 2 and 4 are the
same. Therefore 1==2==4. The result returned is

[[4,2,1],[6,5]];

(ordering of the returned list and sublists are not
specified.)
"""

# the list of partitions, or partitioned list.
# (each partition is a list of elements)
res = []

# reverse mapping (element -> partition it is in)
rev = {}
for pairing in pairings:
first, second = pairing
has_first = first in rev
has_second = second in rev
if has_first and has_second:
# both elements already in a partition...
if rev[first] is rev[second]:
# both elements in same partition, nothing to do
continue
# joining the two partitions:
# - grow one partition with the other one
rev[first].extend(rev[second])
# - clear the other one
rev[second][:] = []
# update reverse mapping
rev[second] = rev[first]
elif has_first:
# partition already exists, just add an element to it
rev[first].append(second)
# update reverse mapping
rev[second] = rev[first]
elif has_second:
# ditto
rev[second].append(first)
rev[first] = rev[second]
else:
# create partition from scratch
if first == second:
new = [first]
else:
new = [first, second]
# add it to list of partitions
res.append(new)
# update reverse mapping
rev[first] = rev[second] = new
# remove empty partitions
return filter(None, res)

hrmph, I should hit the sack. Sorry if this is still ugly, I'm too
tired to tell.

--
John Lenton ([email protected]) -- Random fortune:
Todo bicho que camina va a parar cuando se canse.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFCFulQgPqu395ykGsRAn9kAJ9bTmJYwo5L90AE3mUiQ30MINy0hwCZARCA
npF5v2EOF8fnMT8vS5AGed4=
=ELjA
-----END PGP SIGNATURE-----

Xah Lee · Feb 19, 2005

here's the answer to the partition by equivalence exercise.

-------------------------------------------

# Perl code

sub merge($) {
my @pairings = @{$_[0]};

my @interm; # array of hashs

# chop the first value of @pairings into @interm
$interm[0]={$pairings[0][0]=>'x'}; ${interm[0]}{$pairings[0][1]}='x';
shift @pairings;

N1: for my $aPair (@pairings) {
for my $aGroup (@interm) {
if (exists ${$aGroup}{$aPair->[0]})
{${$aGroup}{$aPair->[1]}='x'; next N1}
if (exists ${$aGroup}{$aPair->[1]})
{${$aGroup}{$aPair->[0]}='x'; next N1}
}
push @interm, {$aPair->[0]=>'x'}; ${interm[-1]}{$aPair->[1]}='x';
}

my @fin = shift @interm;

N2: for my $group (@interm) {
for my $newcoup (@fin) {
foreach my $k (keys %$group) {
if (exists ${$newcoup}{$k}) {map { ${$newcoup}{$_}='x'}
(keys %$group); next N2;}
}
}
push @fin, $group;
}
return map {[keys (%$_)]} @fin;
}

-----------------------------------------------
# Here's a direct translation of the Perl code above into python:
©
©def merge(pairings): # pairings is a list of couples. e.g.
[(9,2),(7,6),...]
©
© # interm is a list of groups. Each group is a list that hold
© # equivalent numbers. interm stands for interim result. Each
group
© # is a dictionary. Keys are numbers, values are all dummy
© # 'x'. Dictionary is used for ease of dealing with duplicates or
© # checking existence.
© interm=[];
©
© # move first pair of pairings into interm as the first group
© interm.append({pairings[0][0]:'x', pairings[0][1]:'x'}) ; del
pairings[0]
©
© # go thru pairings. For each pair, check if it is in any group in
© # interm. If any part of pair is in a group, then add the other
© # part into that group. If no part of the pair is in any group,
© # then add this pair into interm as a new group.
© for aPair in pairings:
© for aGroup in interm:
© if (aGroup.has_key(aPair[0])): aGroup[aPair[1]]='x';
break
© if (aGroup.has_key(aPair[1])): aGroup[aPair[0]]='x';
break
© else: interm.append( {aPair[0]:'x'} );
interm[-1][aPair[1]]='x'
©
© # now make another pass of the groups in interm, because some
pair
© # that may connect two groups (i.e. with one element in one
group,
© # and second element in another group), yet the pair is simply
© # consumed by a group.
© # This pass will check if there are any element in any other
© # group, if so, such two groups will be unioned. In this pass, we
© # move things from interm into fin. fin==final.
© fin=[]; fin.append(interm.pop(0))
© for group in interm:
© for newcoup in fin:
© for k in group.keys():
© if newcoup.has_key(k):
© for kk in group.keys(): newcoup[kk]='x';
© break
© break
© fin.append(group)
©
© # now turn the dictionaries into lists for return value
© result=[];
© for group in fin: result.append(group.keys())
© return result
©
-------------------
I wrote this (Perl) program in 2003-09, and now basically forgot all
about the internals. The original Perl code does not have inline
comments, nor public consumable documentation as this is my own
project. In the process of translation and the publication and
explanation on this page, i eventually have to re-acquaint the
algorithm i used as i go thru the lines. I was thinking of a quick
brainless translation word-for-word, but that turned out not possible
as i run into problems.

(While i'm learning Python, i run into frustrations with the Python
Documentation. (although it has far more quality than Perl
documentations). The frustrations with documentations will be appended
to this page later: How to write a tutorial )

The translation problem i run into is this. In Perl, there's a GOTO
construct where in a loop one can say "break XYZ" to jump to a
arbitrary outer loop labeled XYZ. Python has "break" but does not
provide a GOTO jump as in Perl. In the process, i have to actually
figure out (for the first time for me) how loops with GOTO jumps can be
translated to alternative structure. This turned out not to be too
hard. For a GOTO jump to a far outer group, one can use multiple breaks
at the end of each loop, possibly in addiction adding a "else"
clause to the different levels of the loops. (Python language can have
a "else" clause for "for" loops. It is executed when the loop
completes. (as opposed to when a break inside jumped out))

Here is a loop with GOTO, translated into Python without:

N1: for my $aPair (@pairings) {
for my $aGroup (@interm) {
if (exists ${$aGroup}{$aPair->[0]})
{${$aGroup}{$aPair->[1]}='x'; next N1}
if (exists ${$aGroup}{$aPair->[1]})
{${$aGroup}{$aPair->[0]}='x'; next N1}
}
push @interm, {$aPair->[0]=>'x'}; ${interm[-1]}{$aPair->[1]}='x';
}
©-----------------------------------
© for aPair in pairings:
© for aGroup in interm:
© if (aGroup.has_key(aPair[0])): aGroup[aPair[1]]='x';
break
© if (aGroup.has_key(aPair[1])): aGroup[aPair[0]]='x';
break
© else: interm.append( {aPair[0]:'x'} );
interm[-1][aPair[1]]='x'
©

Below is another such trans-structure, distinct from the above.

N2: for my $group (@interm) {
for my $newcoup (@fin) {
foreach my $k (keys %$group) {
if (exists ${$newcoup}{$k}) {map { ${$newcoup}{$_}='x'}
(keys %$group); next N2;}
}
}
push @fin, $group;
}
©-----------------------------------
© for group in interm:
© for newcoup in fin:
© for k in group.keys():
© if newcoup.has_key(k):
© for kk in group.keys(): newcoup[kk]='x';
© break
© break
© fin.append(group)
©

The Python version above has not been tested much, but i suppose it is
fine since it is basically a copy of the Perl code. The Perl version is
fairly reliable, as it went thru the gauntlet of special cases testing
and i've used it over the year in the larger program... however no any
formal proof or exhaustive machine testing has been done. Later i might
write some program to auto-test them... but that'd be another day. The
Python version can also use some clean up, or even rewrite as a program
in the Python language proper. Addenda or Errata will be added on this
page.

PS all together there are some 3 or so solutions posted on the
newsgroups. (one by private email) I will have to filter them out and
study them.

Any interesting or important Addenda or Errata will be emailed out
later.
In addition to being archived here:
http://xahlee.org/perl-python/partition_by_equiv.html

Xah
(e-mail address removed)
http://xahlee.org/PageTwo_dir/more.html

Reinhold Birkenfeld · Feb 19, 2005

Xah said:
here's the answer to the partition by equivalence exercise.

Your Python solution is, as expected, wrong (try with
[[10, 8], [7, 3], [1, 7], [5, 4], [2, 2], [3, 8], [7, 10], [2, 3], [6,
10], [3, 2]]
for example).

The solution by John Lenton is wrong, too.

The solution by Brian Beck delivers the correct result for most input,
but for some input lists like
[[3, 3], [8, 7], [3, 2], [8, 5], [5, 6], [6, 3], [10, 8], [8, 10], [4,
10], [10, 2]]
it chokes and returns the empty list.

My solution (which may not be the fastest or most effective, but till
now is the shortest <wink> and it works):

def merge(pairings):
sets = {}
for x1, x2 in pairings:
newset = (sets.get(x1, frozenset([x1]))
| sets.get(x2, frozenset([x2])))
for i in newset:
sets = newset

return [list(aset) for aset in set(sets.itervalues())]

Reinhold

bearophileHUGS · Feb 19, 2005

David Eppstein:

However it might be easier to treat the input pairs as the edges of a
graph and apply a depth-first-search based graph connected components
algorithm. || This would be O(n), though.

Is the DFS the best one here (instead of BFS)?

With the graph implementation that I've just shown here it becomes:

.. import graph
.. def merge(l):
.. g = graph.Graph()
.. g.addArcs(l)
.. return g.connectedComponents()
.. print merge( [ [1,2], [2,4], [5,6] ] )

I presume the complexity is O(n+a); n (the nodes) is proportional to
the number of pairs, and
a (the arcs) depends on the "intricacy" of the input pairs. For a
complete graph, this can
become slow (but there is a high probably that all the pairs are in the
same connected component).
Bye,
Bearophile

[perl-python] generic equivalence partition	10	Feb 24, 2005
[perl-python] 20050118 keyed list	6	Jan 18, 2005
[perl-python] generate all possible pairings	3	Feb 27, 2005
[perl-python] problem: reducing comparison	9	Feb 15, 2005
lang comparison: in-place algorithm for reversing a list in Perl,Python, Lisp	5	Mar 1, 2012
[perl-python] 20050211 generating expression by nested loops	0	Feb 11, 2005
a new Perl/Python a day	14	Jan 10, 2005
[perl-python] combinatorics fun	7	Feb 10, 2005

[perl-python] exercise: partition a list by equivalence

Xah Lee

alex23

anton muhin

John Lenton

John Machin

John Lenton

Cyril BAZIN

John Machin

Brian Beck

Brian Beck

Brian Beck

David Eppstein

Brian Beck

John Lenton

David Eppstein

John Machin

John Lenton

Xah Lee

Reinhold Birkenfeld

bearophileHUGS

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads