Generating all ordered substrings of a string

G

girish

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

TIA,
girish
 
B

Bruno Desthuilliers

(e-mail address removed) a écrit :
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:


for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")
 
G

girish

Quoting Bruno Desthuilliers said:
(e-mail address removed) a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

colocn = 'abcd'
k = 4
for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")
Thanks Bruno. I wanted to avoid permutations as it would take more time,
but i guess will have to deal with them now :((
Also this one gives each rule twice...i've to search for some double
counting...
 
G

girish

Quoting Bruno Desthuilliers said:
(e-mail address removed) a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

colocn = 'abcd'
k = 4
for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")

thanks Bruno....wanted to avoid permute function but i guess i've no no
option :((...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
 
G

girish

Quoting (e-mail address removed):
Quoting Bruno Desthuilliers said:
(e-mail address removed) a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:


colocn = 'abcd'
k = 4
for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)

rules

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) > 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]


print listsubstrings("abcd")

thanks Bruno....wanted to avoid permute function but i guess i've no no
option :((...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
A Recursive Attempt:
def gen(s):
sList = [s[:i]+s[i+1:] for i in range(len(s))]
substringList = sList
s = sList
while len(s[0]) != 1:
substrings = []
for string in s:
sList = [string[:i]+string[i+1:] for i in range(len(string))]
substrings.extend(sList)
s = set(substrings)
s = list(s)
substringList.extend(s)
print substringList
return substringList
 
T

Tal Einat

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
a _substring_ of 'abcd'.

If you want to compute all possible (non-empty) sub-groups of a group
(a group of characters, in your case), that's really quite a common
algorthmic problem and you should be able to Google for a solution.

Once you have all possible subgroups, just make your (weird) pairs,
remove doubles (either by using a set or by sorting and removing
identical neighboring objects), and you're done.

If you're looking for a more efficient solution, specialized for your
specific problem, you'll have to explain more precisely what you're
trying to do, as well as why existing solutions aren't good enough.

- Tal
 
T

Thorsten Kampe

* (e-mail address removed) (2006-07-11 10:20 +0000)
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

No, you don't want to generate all substrings, you want to generate
all partions of a given set with length 2:

filter(lambda x: len(x) == 2, part(['abcd']))

I've written an utility that generates all possible partions of a set;
the "pairing" as you call it, is trivial, so you can do it yourself

def part(seq):
import copy
partset = [[]]
for item in seq:
newpartset = []
for partition in partset:
for index in range(len(partition)):
newpartset.append(copy.deepcopy(partition))
newpartset[-1][index].append(item)
partition.append([item])
newpartset.append(partition)
partset = newpartset
return partset
 
G

Gerard Flanagan

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

from a previous post
(http://groups.google.com/group/comp...78bb00e61b?lnk=st&q=&rnum=23#cef5b578bb00e61b)

def nkRange(n,k):
m = n - k + 1
indexer = range(0, k)
vector = range(1, k+1)
last = range(m, n+1)
yield vector
while vector != last:
high_value = -1
high_index = -1
for i in indexer:
val = vector
if val > high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
vector[j+high_index] = high_value + j + 1
yield vector

def kSubsets(s, k):
for vector in nkRange(len(s),k):
yield ''.join( s[i-1] for i in vector )

print list( kSubsets('abcd',2) )

['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
 
T

Thorsten Kampe

* Thorsten Kampe (2006-07-12 19:11 +0000)
filter(lambda x: len(x) == 2, part(['abcd']))

That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top