Enumerating k-segmentations of a sequence

  • Thread starter bullockbefriending bard
  • Start date
B

bullockbefriending bard

I'm not sure if my terminology is precise enough, but what I want to
do is:

Given an ordered sequence of n items, enumerate all its possible k-
segmentations.

This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:

((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))

The reason I want to do this is to use it in some simple one-
dimensional clustering, i.e. each set item (won't be just integers as
above) will have a value of interest, and i'll select the particular
set partition which minimises Sigma SSD (sum of squared differences
from mean) of these values.

It seems overkill to generate the full list of set partitions
[including e.g. ((1 4) (2) (3 5))] because I intend to begin by
sorting the input sequence such that element 1 < element 2 < ... <
element n.

Structural Recursion not being my strong point, any ideas on how to go
about this would be much appreciated!
 
M

marek.rocki

bullockbefriending bard napisa³(a):
I'm not sure if my terminology is precise enough, but what I want to
do is:

Given an ordered sequence of n items, enumerate all its possible k-
segmentations.

This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:

((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))

The reason I want to do this is to use it in some simple one-
dimensional clustering, i.e. each set item (won't be just integers as
above) will have a value of interest, and i'll select the particular
set partition which minimises Sigma SSD (sum of squared differences
from mean) of these values.

It seems overkill to generate the full list of set partitions
[including e.g. ((1 4) (2) (3 5))] because I intend to begin by
sorting the input sequence such that element 1 < element 2 < ... <
element n.

Structural Recursion not being my strong point, any ideas on how to go
about this would be much appreciated!

I'm not sure if I correctly understood the semantics of what you need.
Anyway it looks like a nice exercise in nested generators:

def segmentations(sequence, k):
assert 1 <= k <= len(sequence)
if k == 1:
yield [sequence]
else:
for headlen in xrange(1, len(sequence) - k + 2):
head, tail = sequence[:headlen], sequence[headlen:]
for tail_seg in segmentations(tail, k - 1):
yield [head] + tail_seg

for segmentation in segmentations((1, 2, 3, 4, 5), 3):
print segmentation

Regards,
Marek
 
M

Mark Dickinson

I'm not sure if my terminology is precise enough, but what I want to
do is:
[snip problem description]
Structural Recursion not being my strong point, any ideas on how to go
about this would be much appreciated!

If you have Python 2.6 available, itertools.combination might be
useful.
Here's an example:

from itertools import combinations

def partitions(l, k):
"""generate all partitions of a list l into k nonempty pieces"""
n = len(l)
for c in combinations(range(1, n), k-1):
c = [0] + list(c) + [n]
yield [l[c:c[i+1]] for i in xrange(k)]

for p in partitions(range(1, 6), 3):
print p


Mark
 
M

Mark Dickinson

If you have Python 2.6 available, itertools.combination might be

That should be itertools.combinations, of course.

The idea is that to give a partition of e.g., a 5-element list
into 3 nonempty pieces, all you have to do is say where to break
the list. There are 4 possible places to break (between the
1st and 2nd elements, 2nd and 3rd, etc.), and you've got to
make two breaks. Then combinations(range(1, 5), 2) generates
all possible pairs of breaking points.

Mark
 
B

bearophileHUGS

My version:

from itertools import combinations as xcombinations
from itertools import izip, islice

def xpairwise(iterable):
# docs and doctests removed
return izip(iterable, islice(iterable, 1, None))

def segmentations(seq, k):
for comb in xcombinations(range(1, len(seq)), k-1):
yield [seq[start:stop] for start,stop in xpairwise((None,) +
comb + (None,))]

for seg in segmentations(range(1, 6), 3):
print seg
print

for seg in segmentations(range(1, 7), 2):
print seg
print

for seg in segmentations(range(1, 7), 3):
print seg
print

for seg in segmentations(range(1, 7), 4):
print seg
print

Other people have already explained how this works.

Bye,
bearophile
 
A

Arnaud Delobelle

bullockbefriending bard said:
I'm not sure if my terminology is precise enough, but what I want to
do is:

Given an ordered sequence of n items, enumerate all its possible k-
segmentations.

This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:

((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))

All you need to do is find the 'cutting places'. Here you are cutting
at 2 places, the first cut has to be in position > 0 and the last in
position < 5. That makes 4 potential cutting places: 1, 2, 3, 4.

So you will have 4C2 = 6 "2-segmentations" of {1, 2, 3, 4, 5} which are
given by the subsets of {1, 2, 3, 4} of size 2, i.e the 2-combinations
of {1, 2, 3, 4}.

Generalising this, if you want to find the "k-segmentations" of a
sequence of length n, you will have (n-1)C(k-1) of them and their
cutting places will be given by (k-1)-combinations of {1, 2, ..., n-1}.

Generating the k-combinations is a standard problem so we're done!
 
G

Gerard flanagan

bullockbefriending said:
I'm not sure if my terminology is precise enough, but what I want to
do is:

Given an ordered sequence of n items, enumerate all its possible k-
segmentations.

This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.

e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:

((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))

What Arnaud said.


def nchoosek( n, k ):
'''
Generate all subsets of S(n) that have k items.
S(n) = {1, 2, 3, ..., n}
[[1, 2], [1, 3], [2, 3]]
'''
m = n - k + 1
indexer = range(0, k)
seq = range(1, k+1)
last = range(m, n+1)
yield seq[:]
while seq != last:
high_value = -1
high_index = -1
for i in indexer:
val = seq
if val > high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
seq[j+high_index] = high_value + j + 1
yield seq[:]

#(minimally tested)
def part(n,k):
rng = range(1, n+1)
for comb in nchoosek(n-1,k-1):
comb = [0] + comb + [n]
yield [rng[s:t] for (s,t) in (comb[i:i+2] for i in range(k))]


for item in part(5, 2):
print item

for item in part(5, 3):
print item

G.
 
B

bullockbefriending bard

bullockbefriending bard napisa³(a):


I'm not sure if my terminology is precise enough, but what I want to
do is:
Given an ordered sequence of n items, enumerate all its possible k-
segmentations.
This is *not* the same as enumerating the k set partitions of the n
items because I am only interested in those set partitions which
preserve the order of the items. i.e. if the input sequence is (1 2 3
4 5), then ((1 4) (2 3) (5)) is unacceptable, whereas ((1 2) (3) (4
5)) is acceptable. Hence use of term 'segmentations'.
e.g., for input sequence (1 2 3 4 5) and k = 3, I need a function
which enumerates the following 3-segmentations:
((1 2 3) (4) (5))
((1 2) (3 4) (5))
((1 2) (3) (4 5))
((1) (2 3 4) (5))
((1) (2 3) (4 5))
((1) (2) (3 4 5))
The reason I want to do this is to use it in some simple one-
dimensional clustering, i.e. each set item (won't be just integers as
above) will have a value of interest, and i'll select the particular
set partition which minimises Sigma SSD (sum of squared differences
from mean) of these values.
It seems overkill to generate the full list of set partitions
[including e.g. ((1 4) (2) (3 5))] because I intend to begin by
sorting the input sequence such that element 1 < element 2 < ... <
element n.
Structural Recursion not being my strong point, any ideas on how to go
about this would be much appreciated!

I'm not sure if I correctly understood the semantics of what you need.
Anyway it looks like a nice exercise in nested generators:

def segmentations(sequence, k):
        assert 1 <= k <= len(sequence)
        if k == 1:
                yield [sequence]
        else:
                for headlen in xrange(1, len(sequence) - k + 2):
                        head, tail = sequence[:headlen], sequence[headlen:]
                        for tail_seg in segmentations(tail, k - 1):
                                yield [head] + tail_seg

for segmentation in segmentations((1, 2, 3, 4, 5), 3):
        print segmentation

Regards,
Marek

Exactly what I needed. Thank you all for examples and also for
explaining how to think about problems of this nature!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top