Generating all ordered substrings of a string

Discussion in 'Python' started by girish@it.usyd.edu.au, Jul 11, 2006.

  1. Guest

    Hi,
    I want to generate all non-empty substrings of a string of length >=2.
    Also,
    each substring is to be paired with 'string - substring' part and vice
    versa.
    Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
    ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    'bd'],['bd','ac']]

    I've tried the following but i cant prevent duplicates and i'm missing
    some substrings:

    >>> colocn = 'abcd'
    >>> k = 4
    >>> for i in range(k-1):

    for j in range(1,k):
    rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
    rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
    rules.append(rule1)
    rules.append(rule2)
    >>> rules

    [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
    ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
    ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
    ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]


    Any ideas??

    TIA,
    girish



    ----------------------------------------------------------------
    This message was sent using IMP, the Internet Messaging Program.
     
    , Jul 11, 2006
    #1
    1. Advertising

  2. a écrit :
    > Hi,
    > I want to generate all non-empty substrings of a string of length >=2.
    > Also,
    > each substring is to be paired with 'string - substring' part and vice
    > versa.
    > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
    > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > 'bd'],['bd','ac']]
    >
    > I've tried the following but i cant prevent duplicates and i'm missing
    > some substrings:
    >
    >
    >>>>colocn = 'abcd'
    >>>>k = 4
    >>>>for i in range(k-1):

    >
    > for j in range(1,k):
    > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
    > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
    > rules.append(rule1)
    > rules.append(rule2)
    >
    >>>>rules

    >
    > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
    > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
    > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
    > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
    >
    >
    > Any ideas??


    Algorithmic problem.

    First, you need to get all permutations. This is a known algorithm, that
    you'll find examples of on the net. Then for each permutation, list
    possibles 'pair-splits'.

    Here's a (probably sub-optimal, but it's getting late here) possible
    implementation in functional style:

    def rotate(lst):
    yield lst
    max = len(lst)
    for i in range(1, max):
    yield lst[i:] + lst[:-(max-i)]

    def permute(lst):
    if len(lst) > 2:
    for rotated in rotate(lst):
    head, tail = rotated[0], rotated[1:]
    for permuted in permute(tail):
    yield [head] + permuted
    elif len(lst) == 2:
    yield lst
    yield lst[::-1]
    else:
    yield lst

    def splits(lst):
    for i in range(1, len(lst)):
    yield lst[0:i], lst[i:]

    def allsplits(lst):
    for permuted in permute(lst):
    for pair in splits(permuted):
    yield pair

    def listsubstrings(thestr):
    format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
    return [format(list(pair)) for pair in allsplits(list(thestr))]


    print listsubstrings("abcd")
     
    Bruno Desthuilliers, Jul 12, 2006
    #2
    1. Advertising

  3. Guest

    Quoting Bruno Desthuilliers <>:

    > a écrit :
    > > Hi,
    > > I want to generate all non-empty substrings of a string of length
    > >=2.
    > > Also,
    > > each substring is to be paired with 'string - substring' part and vice
    > > versa.
    > > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

    > 'c'],
    > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > > 'bd'],['bd','ac']]
    > >
    > > I've tried the following but i cant prevent duplicates and i'm

    > missing
    > > some substrings:
    > >
    > >
    > >>>>colocn = 'abcd'
    > >>>>k = 4
    > >>>>for i in range(k-1):

    > >
    > > for j in range(1,k):
    > > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
    > > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
    > > rules.append(rule1)
    > > rules.append(rule2)
    > >
    > >>>>rules

    > >
    > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
    > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
    > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
    > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
    > >
    > >
    > > Any ideas??

    >
    > Algorithmic problem.
    >
    > First, you need to get all permutations. This is a known algorithm, that
    > you'll find examples of on the net. Then for each permutation, list
    > possibles 'pair-splits'.
    >
    > Here's a (probably sub-optimal, but it's getting late here) possible
    > implementation in functional style:
    >
    > def rotate(lst):
    > yield lst
    > max = len(lst)
    > for i in range(1, max):
    > yield lst[i:] + lst[:-(max-i)]
    >
    > def permute(lst):
    > if len(lst) > 2:
    > for rotated in rotate(lst):
    > head, tail = rotated[0], rotated[1:]
    > for permuted in permute(tail):
    > yield [head] + permuted
    > elif len(lst) == 2:
    > yield lst
    > yield lst[::-1]
    > else:
    > yield lst
    >
    > def splits(lst):
    > for i in range(1, len(lst)):
    > yield lst[0:i], lst[i:]
    >
    > def allsplits(lst):
    > for permuted in permute(lst):
    > for pair in splits(permuted):
    > yield pair
    >
    > def listsubstrings(thestr):
    > format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
    > return [format(list(pair)) for pair in allsplits(list(thestr))]
    >
    >
    > print listsubstrings("abcd")

    Thanks Bruno. I wanted to avoid permutations as it would take more time,
    but i guess will have to deal with them now :((
    Also this one gives each rule twice...i've to search for some double
    counting...

    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >





    ----------------------------------------------------------------
    This message was sent using IMP, the Internet Messaging Program.
     
    , Jul 12, 2006
    #3
  4. Guest

    Re: How to generate all substrings of a string

    Quoting Bruno Desthuilliers <>:

    > a écrit :
    > > Hi,
    > > I want to generate all non-empty substrings of a string of length
    > >=2.
    > > Also,
    > > each substring is to be paired with 'string - substring' part and vice
    > > versa.
    > > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

    > 'c'],
    > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > > 'bd'],['bd','ac']]
    > >
    > > I've tried the following but i cant prevent duplicates and i'm

    > missing
    > > some substrings:
    > >
    > >
    > >>>>colocn = 'abcd'
    > >>>>k = 4
    > >>>>for i in range(k-1):

    > >
    > > for j in range(1,k):
    > > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
    > > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
    > > rules.append(rule1)
    > > rules.append(rule2)
    > >
    > >>>>rules

    > >
    > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
    > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
    > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
    > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
    > >
    > >
    > > Any ideas??

    >
    > Algorithmic problem.
    >
    > First, you need to get all permutations. This is a known algorithm, that
    > you'll find examples of on the net. Then for each permutation, list
    > possibles 'pair-splits'.
    >
    > Here's a (probably sub-optimal, but it's getting late here) possible
    > implementation in functional style:
    >
    > def rotate(lst):
    > yield lst
    > max = len(lst)
    > for i in range(1, max):
    > yield lst[i:] + lst[:-(max-i)]
    >
    > def permute(lst):
    > if len(lst) > 2:
    > for rotated in rotate(lst):
    > head, tail = rotated[0], rotated[1:]
    > for permuted in permute(tail):
    > yield [head] + permuted
    > elif len(lst) == 2:
    > yield lst
    > yield lst[::-1]
    > else:
    > yield lst
    >
    > def splits(lst):
    > for i in range(1, len(lst)):
    > yield lst[0:i], lst[i:]
    >
    > def allsplits(lst):
    > for permuted in permute(lst):
    > for pair in splits(permuted):
    > yield pair
    >
    > def listsubstrings(thestr):
    > format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
    > return [format(list(pair)) for pair in allsplits(list(thestr))]
    >
    >
    > print listsubstrings("abcd")
    >


    thanks Bruno....wanted to avoid permute function but i guess i've no no
    option :((...
    also there is some double counting in this one (all rules outputted
    twice)...i've to find out where...
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >





    ----------------------------------------------------------------
    This message was sent using IMP, the Internet Messaging Program.
     
    , Jul 12, 2006
    #4
  5. Guest

    Re: How to generate all substrings of a string

    Quoting :

    > Quoting Bruno Desthuilliers <>:
    >
    > > a écrit :
    > > > Hi,
    > > > I want to generate all non-empty substrings of a string of length
    > > >=2.
    > > > Also,
    > > > each substring is to be paired with 'string - substring' part and

    > vice
    > > > versa.
    > > > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'],

    > ['c',
    > > > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > > > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'],

    > ['abc',
    > > > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',

    > > 'c'],
    > > > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > > > 'bd'],['bd','ac']]
    > > >
    > > > I've tried the following but i cant prevent duplicates and i'm

    > > missing
    > > > some substrings:
    > > >
    > > >
    > > >>>>colocn = 'abcd'
    > > >>>>k = 4
    > > >>>>for i in range(k-1):
    > > >
    > > > for j in range(1,k):
    > > > rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
    > > > rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
    > > > rules.append(rule1)
    > > > rules.append(rule2)
    > > >
    > > >>>>rules
    > > >
    > > > [['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc',

    > 'd'],
    > > > ['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad',

    > 'bc'],
    > > > ['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd',

    > 'ab'],
    > > > ['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
    > > >
    > > >
    > > > Any ideas??

    > >
    > > Algorithmic problem.
    > >
    > > First, you need to get all permutations. This is a known algorithm,

    > that
    > > you'll find examples of on the net. Then for each permutation, list
    > > possibles 'pair-splits'.
    > >
    > > Here's a (probably sub-optimal, but it's getting late here) possible
    > > implementation in functional style:
    > >
    > > def rotate(lst):
    > > yield lst
    > > max = len(lst)
    > > for i in range(1, max):
    > > yield lst[i:] + lst[:-(max-i)]
    > >
    > > def permute(lst):
    > > if len(lst) > 2:
    > > for rotated in rotate(lst):
    > > head, tail = rotated[0], rotated[1:]
    > > for permuted in permute(tail):
    > > yield [head] + permuted
    > > elif len(lst) == 2:
    > > yield lst
    > > yield lst[::-1]
    > > else:
    > > yield lst
    > >
    > > def splits(lst):
    > > for i in range(1, len(lst)):
    > > yield lst[0:i], lst[i:]
    > >
    > > def allsplits(lst):
    > > for permuted in permute(lst):
    > > for pair in splits(permuted):
    > > yield pair
    > >
    > > def listsubstrings(thestr):
    > > format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
    > > return [format(list(pair)) for pair in allsplits(list(thestr))]
    > >
    > >
    > > print listsubstrings("abcd")
    > >

    >
    > thanks Bruno....wanted to avoid permute function but i guess i've no no
    > option :((...
    > also there is some double counting in this one (all rules outputted
    > twice)...i've to find out where...

    A Recursive Attempt:
    def gen(s):
    sList = [s[:i]+s[i+1:] for i in range(len(s))]
    substringList = sList
    s = sList
    while len(s[0]) != 1:
    substrings = []
    for string in s:
    sList = [string[:i]+string[i+1:] for i in range(len(string))]
    substrings.extend(sList)
    s = set(substrings)
    s = list(s)
    substringList.extend(s)
    print substringList
    return substringList
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > >
    > > --
    > > http://mail.python.org/mailman/listinfo/python-list
    > >

    >
    >
    >
    >
    > ----------------------------------------------------------------
    > This message was sent using IMP, the Internet Messaging Program.
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >





    ----------------------------------------------------------------
    This message was sent using IMP, the Internet Messaging Program.
     
    , Jul 12, 2006
    #5
  6. Tal Einat Guest

    wrote:
    > Hi,
    > I want to generate all non-empty substrings of a string of length >=2.
    > Also,
    > each substring is to be paired with 'string - substring' part and vice
    > versa.
    > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
    > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > 'bd'],['bd','ac']]
    >


    In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
    a _substring_ of 'abcd'.

    If you want to compute all possible (non-empty) sub-groups of a group
    (a group of characters, in your case), that's really quite a common
    algorthmic problem and you should be able to Google for a solution.

    Once you have all possible subgroups, just make your (weird) pairs,
    remove doubles (either by using a set or by sorting and removing
    identical neighboring objects), and you're done.

    If you're looking for a more efficient solution, specialized for your
    specific problem, you'll have to explain more precisely what you're
    trying to do, as well as why existing solutions aren't good enough.

    - Tal
     
    Tal Einat, Jul 12, 2006
    #6
  7. * (2006-07-11 10:20 +0000)
    > I want to generate all non-empty substrings of a string of length >=2.
    > Also,
    > each substring is to be paired with 'string - substring' part and vice
    > versa.
    > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
    > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > 'bd'],['bd','ac']]


    No, you don't want to generate all substrings, you want to generate
    all partions of a given set with length 2:

    filter(lambda x: len(x) == 2, part(['abcd']))

    I've written an utility that generates all possible partions of a set;
    the "pairing" as you call it, is trivial, so you can do it yourself

    def part(seq):
    import copy
    partset = [[]]
    for item in seq:
    newpartset = []
    for partition in partset:
    for index in range(len(partition)):
    newpartset.append(copy.deepcopy(partition))
    newpartset[-1][index].append(item)
    partition.append([item])
    newpartset.append(partition)
    partset = newpartset
    return partset
     
    Thorsten Kampe, Jul 12, 2006
    #7
  8. wrote:
    > Hi,
    > I want to generate all non-empty substrings of a string of length >=2.
    > Also,
    > each substring is to be paired with 'string - substring' part and vice
    > versa.
    > Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
    > 'ab'], ['b', 'ac'], ['ac', 'b']] etc.
    > Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
    > 'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
    > ['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
    > 'bd'],['bd','ac']]
    >


    from a previous post
    (http://groups.google.com/group/comp...78bb00e61b?lnk=st&q=&rnum=23#cef5b578bb00e61b)

    def nkRange(n,k):
    m = n - k + 1
    indexer = range(0, k)
    vector = range(1, k+1)
    last = range(m, n+1)
    yield vector
    while vector != last:
    high_value = -1
    high_index = -1
    for i in indexer:
    val = vector
    if val > high_value and val < m + i:
    high_value = val
    high_index = i
    for j in range(k - high_index):
    vector[j+high_index] = high_value + j + 1
    yield vector

    def kSubsets(s, k):
    for vector in nkRange(len(s),k):
    yield ''.join( s[i-1] for i in vector )

    print list( kSubsets('abcd',2) )

    ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
     
    Gerard Flanagan, Jul 12, 2006
    #8
  9. * Thorsten Kampe (2006-07-12 19:11 +0000)
    > filter(lambda x: len(x) == 2, part(['abcd']))


    That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...
     
    Thorsten Kampe, Jul 13, 2006
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Tung Chau
    Replies:
    1
    Views:
    470
    SM Ryan
    Aug 6, 2004
  2. Tung Chau
    Replies:
    0
    Views:
    377
    Tung Chau
    Aug 6, 2004
  3. Robert Reno

    How to get all possible substrings

    Robert Reno, Jan 26, 2010, in forum: C++
    Replies:
    4
    Views:
    568
    P. Lepin
    Jan 28, 2010
  4. kadau
    Replies:
    3
    Views:
    107
    Brian McCauley
    Mar 25, 2005
  5. DL

    Ordered list inside ordered list

    DL, Nov 9, 2009, in forum: Javascript
    Replies:
    6
    Views:
    331
    Dr J R Stockton
    Nov 21, 2009
Loading...

Share This Page