How to populate all possible hierarchical clusterings from a set of elements?

justin · Jan 12, 2011

The title sounds too complex, but my question is actually simple.

Suppose I have [1,2,3,4,5], then there are many ways of making
clustering.
Among them, I want to pair up terminals until there is only one left
at the end.
For example, ((((1,2),3),4),5), (1,(2,(3,(4,5)))), or (((1,2),(3,4)),
5) would be legitimate ones.

How do you think can I, using the modules of Python such as itertools
as much as possible, make all possible such clusterings?

Thanks in advance,
Justin.

Peter Otten · Jan 12, 2011

justin said:
The title sounds too complex, but my question is actually simple.

Suppose I have [1,2,3,4,5], then there are many ways of making
clustering.
Among them, I want to pair up terminals until there is only one left
at the end.
For example, ((((1,2),3),4),5), (1,(2,(3,(4,5)))), or (((1,2),(3,4)),
5) would be legitimate ones.

How do you think can I, using the modules of Python such as itertools
as much as possible, make all possible such clusterings?

Here's my first attempt:

def cluster(items):
if len(items) == 2:
yield items
return

for i in range(len(items)-1):
for c in cluster(items[:i] + (items[i:i+2],) + items[i+2:]):
yield c

def unique(items):
seen = set()
for item in items:
if item not in seen:
seen.add(item)
yield item

if __name__ == "__main__":
for item in unique(cluster(tuple("abcd"))):
print item

Unfortunately I get a lot of duplicates

You could define a kind of operator precedence using
itertools.combinations(), but I think it suffers from the same problem as

a 3 b 1 c 2 d

and

a 2 b 1 c 3 d

would both result in ((a, b), (c, d)).

DevPlayer · Jan 13, 2011

lst = [1, 2, 3, 4, 5]

def maketup(lst):
cur_item = lst[-1]
lst = lst[:-1]
if len(lst):
return maketup(lst), cur_item
else:
return cur_item

print maketup(lst)

((((1, 2), 3), 4), 5)

But I'm confused as to what you mean by :

Among them, I want to pair up terminals until there is only one left
at the end.

One what? one pair?, one terminal meaning one number?

DevPlayer · Jan 13, 2011

def maketup(lst):

if len(lst) == 1:
return lst[0]

elif len(lst) == 2:
return (lst[0],lst[1])

elif len(lst) > 2:
return ( (maketup(lst[:-2]), lst[-2]), lst[-1])

maketup(lst)

((((1, 2), 3), 4), 5)

Alain Ketterlin · Jan 13, 2011

justin said:
Suppose I have [1,2,3,4,5], then there are many ways of making
clustering.
Among them, I want to pair up terminals until there is only one left
at the end.

Are you trying "ascending hierarchical clustering" by any chance? In
that case you're supposed to use some kind of distance to select the
(unique) pair of elements to merge at each step.

For example, ((((1,2),3),4),5), (1,(2,(3,(4,5)))), or (((1,2),(3,4)),
5) would be legitimate ones.

How do you think can I, using the modules of Python such as itertools
as much as possible, make all possible such clusterings?

I don't know about itertools, but the basic idea is:

def clusterings(l):
if len(l) == 1:
print repr(l)
else:
n = len(l)
for i in xrange(n):
for j in xrange(i+1,n):
clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l,l[j]]])

Test this with:

import sys
clusterings([i for i in xrange(int(sys.argv[1]))])

Do you realize there are *many* such clusterings? (the exact number
should be (n!)*((n-1)!)/2^(n-1) given the code above, if I'm not
mistaken.)

-- Alain.

Alain Ketterlin · Jan 13, 2011

DevPlayer said:
def maketup(lst):

if len(lst) == 1:
return lst[0]

elif len(lst) == 2:
return (lst[0],lst[1])

elif len(lst) > 2:
return ( (maketup(lst[:-2]), lst[-2]), lst[-1])

The OP wants all binary trees over the elements, not just one.

-- Alain.

DevPlayer · Jan 13, 2011

tuple([ (tuple(lst[x-1:x+1]) if len(tuple(lst[x-1:x+1]))==2 else
lst[x-1]) for x in lst[::2]])

((1, 2), (3, 4), 5)

# or
x = ((tuple(lst[x-1:x+1]) if len(tuple(lst[x-1:x+1]))==2 else
lst[x-1]) for x in lst[::2])
x.next()
(1, 2)
x.next()
(3, 4)
x.next()
5

DevPlayer · Jan 13, 2011

Ah. out of my depth.

Richard Thomas · Jan 13, 2011

justin said:
justin said:

Suppose I have [1,2,3,4,5], then there are many ways of making
clustering.
Among them, I want to pair up terminals until there is only one left
at the end.

Click to expand...

Are you trying "ascending hierarchical clustering" by any chance? In
that case you're supposed to use some kind of distance to select the
(unique) pair of elements to merge at each step.

For example, ((((1,2),3),4),5), (1,(2,(3,(4,5)))), or (((1,2),(3,4)),
5) would be legitimate ones.

Click to expand...

How do you think can I, using the modules of Python such as itertools
as much as possible, make all possible such clusterings?

Click to expand...

I don't know about itertools, but the basic idea is:

def clusterings(l):
if len(l) == 1:
print repr(l)
else:
n = len(l)
for i in xrange(n):
for j in xrange(i+1,n):
clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l,l[j]]])

Test this with:

import sys
clusterings([i for i in xrange(int(sys.argv[1]))])

Do you realize there are *many* such clusterings? (the exact number
should be (n!)*((n-1)!)/2^(n-1) given the code above, if I'm not
mistaken.)

-- Alain.

Actually the number of such "clusterings" is the (n-1)th Catalan
number.

http://en.wikipedia.org/wiki/Catalan_numbers

Chard.

Alain Ketterlin · Jan 13, 2011

Richard Thomas said:
On Jan 13, 10:02Â am, Alain Ketterlin <[email protected]>

def clusterings(l):
Â Â if len(l) == 1:
Â Â Â Â print repr(l)
Â Â else:
Â Â Â Â n = len(l)
Â Â Â Â for i in xrange(n):
Â Â Â Â Â Â for j in xrange(i+1,n):
Â Â Â Â Â Â Â Â clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l,l[j]]])
[...] there are *many* such clusterings? (the exact number should be
(n!)*((n-1)!)/2^(n-1) given the code above, if I'm not mistaken.)

Click to expand...

Actually the number of such "clusterings" is the (n-1)th Catalan
number.

http://en.wikipedia.org/wiki/Catalan_numbers

Click to expand...

I don't think Catalan numbers exactly captures this number. As far as I
remember (and wikipedia seems to confirm this), Cn is the number of ways
you can repeatedly apply a binary operator to a sequence of objects,
where sequence means that the objects are ordered, which is not the case
here. To use wikipedia's example, C3 is 5 because you can do:

((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))

If we list clusterings we can also have:

((ac)b)d ((ac)d)b (ac)(bd) ...

Actually, for each of the 5 "catalan expressions" above, you have 4!
valid permutations of the objects (leading to a complete count of
n!*C(n-1)). But this leads to many "duplicates", because (ab)(cd) and
(cd)(ab) are considered the same.

I just realize that the code I've given above also produces duplicates
(in particular, the example I've just used). At least, my counting was
correct w.r.t. the code The plot thickens...

-- Alain.

Richard Thomas · Jan 13, 2011

Richard Thomas said:
Richard Thomas said:

On Jan 13 said:

def clusterings(l):
if len(l) == 1:
print repr(l)
else:
n = len(l)
for i in xrange(n):
for j in xrange(i+1,n):
clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l,l[j]]])
[...] there are *many* such clusterings? (the exact number should be
(n!)*((n-1)!)/2^(n-1) given the code above, if I'm not mistaken.)

Click to expand...

Actually the number of such "clusterings" is the (n-1)th Catalan
number.

Click to expand...

http://en.wikipedia.org/wiki/Catalan_numbers

Click to expand...

I don't think Catalan numbers exactly captures this number. As far as I
remember (and wikipedia seems to confirm this), Cn is the number of ways
you can repeatedly apply a binary operator to a sequence of objects,
where sequence means that the objects are ordered, which is not the case
here. To use wikipedia's example, C3 is 5 because you can do:

((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))

If we list clusterings we can also have:

((ac)b)d ((ac)d)b (ac)(bd) ...

Actually, for each of the 5 "catalan expressions" above, you have 4!
valid permutations of the objects (leading to a complete count of
n!*C(n-1)). But this leads to many "duplicates", because (ab)(cd) and
(cd)(ab) are considered the same.

I just realize that the code I've given above also produces duplicates
(in particular, the example I've just used). At least, my counting was
correct w.r.t. the code The plot thickens...

-- Alain.

Okay, I misunderstood the problem, sorry about that. This makes it
rather hard to define a nice recurrence relation for the number of
such clusterings:

C(1) = 1
C(2n+1) = Sigma(1; n; choose(2n+1, r) * C(r) * C(2n+1-r))
C(2n) = Sigma(1; n-1; choose(2n, r) * C(r) * C(2n-r))
+ choose(2n, n) * C(n) * C(n) / 2

See, very ugly. I can't reduce it to anything workable so I just
computed it. Clearly its more than exponential. Some values:

In [1]: [cluster(n) for n in xrange(1, 21)]
Out[1]:
[1,
1,
3,
15,
105,
945,
10395,
135135,
2027025,
34459425,
654729075,
13749310575L,
316234143225L,
7905853580625L,
213458046676875L,
6190283353629375L,
191898783962510625L,
6332659870762850625L,
221643095476699771875L,
8200794532637891559375L]

Anyway, I'm done counting things for now.

Chard.

Arnaud Delobelle · Jan 13, 2011

Peter Otten said:
justin said:

The title sounds too complex, but my question is actually simple.

Suppose I have [1,2,3,4,5], then there are many ways of making
clustering.
Among them, I want to pair up terminals until there is only one left
at the end.
For example, ((((1,2),3),4),5), (1,(2,(3,(4,5)))), or (((1,2),(3,4)),
5) would be legitimate ones.

How do you think can I, using the modules of Python such as itertools
as much as possible, make all possible such clusterings?

Click to expand...

Here's my first attempt:

def cluster(items):
if len(items) == 2:
yield items
return

for i in range(len(items)-1):
for c in cluster(items[:i] + (items[i:i+2],) + items[i+2:]):
yield c

def unique(items):
seen = set()
for item in items:
if item not in seen:
seen.add(item)
yield item

if __name__ == "__main__":
for item in unique(cluster(tuple("abcd"))):
print item

more simply:

def clusters(l):
if len(l) == 1:
yield l[0]
return
for i in range(1, len(l)):
for left in clusters(l[:i]):
for right in clusters(l[i:]):
yield (left, right)

That would give all solutions without duplicates. In fact, this is
simply finding all full binary trees which order l.

However, somewhere else in the thread someone claims that no ordering is
implied on the initial list of items (which is not at all obvious from
the OP).

Peter Otten · Jan 14, 2011

Arnaud said:
more simply:

def clusters(l):
if len(l) == 1:
yield l[0]
return
for i in range(1, len(l)):
for left in clusters(l[:i]):
for right in clusters(l[i:]):
yield (left, right)

That would give all solutions without duplicates. In fact, this is
simply finding all full binary trees which order l.

Easy, now that I see it

The devolution of English language and slothful c.l.p behaviors exposed!	50	Jan 24, 2012
How to retrieve a value from a set of radio buttons	2	Aug 26, 2004
FAQ 5.2 How do I change, delete, or insert a line in a file, or append to the beginning of a file?	0	Feb 24, 2011
In the Matter of Herb Schildt: a Detailed Analysis of "C: TheComplete Nonsense"	109	Apr 3, 2010
need help with a cart I inherited, need to increase number of total characters allowed	3	Oct 22, 2007
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006
comp.lang.c Changes to Answers to Frequently Asked Questions (FAQ)	1	Jul 4, 2004
To experienced OO people: please give some ideas/URLs for a course on object-orientation	1	Jul 31, 2003

How to populate all possible hierarchical clusterings from a set of elements?

justin

Peter Otten

DevPlayer

DevPlayer

Alain Ketterlin

Alain Ketterlin

DevPlayer

DevPlayer

Richard Thomas

Alain Ketterlin

Richard Thomas

Arnaud Delobelle

Peter Otten

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads