Generating all ordered substrings of a string

girish · Jul 11, 2006

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]

Any ideas??

TIA,
girish

Bruno Desthuilliers · Jul 12, 2006

(e-mail address removed) a écrit :

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]

Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]

print listsubstrings("abcd")

girish · Jul 12, 2006

Quoting Bruno Desthuilliers said:
(e-mail address removed) a écrit :

Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

colocn = 'abcd'
k = 4
for i in range(k-1):

Click to expand...

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

rules

Click to expand...

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]

Any ideas??

Click to expand...

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]

print listsubstrings("abcd")

Thanks Bruno. I wanted to avoid permutations as it would take more time,
but i guess will have to deal with them now

(
Also this one gives each rule twice...i've to search for some double
counting...

girish · Jul 12, 2006

Quoting Bruno Desthuilliers said:
(e-mail address removed) a écrit :

Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

colocn = 'abcd'
k = 4
for i in range(k-1):

Click to expand...

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

rules

Click to expand...

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]

Any ideas??

Click to expand...

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]

print listsubstrings("abcd")

thanks Bruno....wanted to avoid permute function but i guess i've no no
option

(...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...

girish · Jul 12, 2006

Quoting (e-mail address removed):

Quoting Bruno Desthuilliers said:
Quoting Bruno Desthuilliers said:

(e-mail address removed) a écrit :

Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

colocn = 'abcd'
k = 4
for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

rules

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]

Any ideas??

Click to expand...

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]

print listsubstrings("abcd")

Click to expand...

thanks Bruno....wanted to avoid permute function but i guess i've no no
option (...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...

A Recursive Attempt:
def gen(s):
sList = [s[:i]+s[i+1:] for i in range(len(s))]
substringList = sList
s = sList
while len(s[0]) != 1:
substrings = []
for string in s:
sList = [string[:i]+string[i+1:] for i in range(len(string))]
substrings.extend(sList)
s = set(substrings)
s = list(s)
substringList.extend(s)
print substringList
return substringList

Tal Einat · Jul 12, 2006

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
a _substring_ of 'abcd'.

If you want to compute all possible (non-empty) sub-groups of a group
(a group of characters, in your case), that's really quite a common
algorthmic problem and you should be able to Google for a solution.

Once you have all possible subgroups, just make your (weird) pairs,
remove doubles (either by using a set or by sorting and removing
identical neighboring objects), and you're done.

If you're looking for a more efficient solution, specialized for your
specific problem, you'll have to explain more precisely what you're
trying to do, as well as why existing solutions aren't good enough.

- Tal

Thorsten Kampe · Jul 12, 2006

* (e-mail address removed) (2006-07-11 10:20 +0000)

I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

No, you don't want to generate all substrings, you want to generate
all partions of a given set with length 2:

filter(lambda x: len(x) == 2, part(['abcd']))

I've written an utility that generates all possible partions of a set;
the "pairing" as you call it, is trivial, so you can do it yourself

def part(seq):
import copy
partset = [[]]
for item in seq:
newpartset = []
for partition in partset:
for index in range(len(partition)):
newpartset.append(copy.deepcopy(partition))
newpartset[-1][index].append(item)
partition.append([item])
newpartset.append(partition)
partset = newpartset
return partset

Gerard Flanagan · Jul 12, 2006

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

from a previous post
(http://groups.google.com/group/comp...78bb00e61b?lnk=st&q=&rnum=23#cef5b578bb00e61b)

def nkRange(n,k):
m = n - k + 1
indexer = range(0, k)
vector = range(1, k+1)
last = range(m, n+1)
yield vector
while vector != last:
high_value = -1
high_index = -1
for i in indexer:
val = vector
if val > high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
vector[j+high_index] = high_value + j + 1
yield vector

def kSubsets(s, k):
for vector in nkRange(len(s),k):
yield ''.join( s[i-1] for i in vector )

print list( kSubsets('abcd',2) )

['ab', 'ac', 'ad', 'bc', 'bd', 'cd']

Thorsten Kampe · Jul 13, 2006

* Thorsten Kampe (2006-07-12 19:11 +0000)

filter(lambda x: len(x) == 2, part(['abcd']))

That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...

How to search for substrings of a string in a list?	3	Jun 5, 2006
How find all childrens values of a nested dictionary, fast!	8	Nov 4, 2010
Can't get my mind wrapped around simple permutations	1	May 16, 2008
variable length permutation code	5	Jan 28, 2012
split a string of space separated substrings - elegant solution?	6	Jul 31, 2007
Splitting a string into substrings of equal size	18	Aug 15, 2009
need a variation algorithm for Lists in Dictionaries	2	Jul 5, 2007
combinations of substrings	0	Sep 26, 2007

Generating all ordered substrings of a string

girish

Bruno Desthuilliers

girish

girish

girish

Tal Einat

Thorsten Kampe

Gerard Flanagan

Thorsten Kampe

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads