finding sublist

Discussion in 'Python' started by giampiero mu, Aug 2, 2005.

  1. giampiero mu

    giampiero mu Guest

    hi everyone

    my target is implement a function controlla(string - a binary string-)
    that check if there is in a string two equal not overlapping
    subsequences at least of length limitem:

    my code:

    def controlla(test):
    # print test
    limitem=4
    lunghezz=len(test)

    l1=lunghezz-limitem+1
    l2=lunghezz-limitem+1
    f=0

    for ii in range(l1):
    for kk in range(l2):
    t1=test[ii:ii+limitem]
    t2=test[kk:kk+limitem]
    if abs(ii-kk)>=limitem :
    if t1==t2 :
    f=1
    if f==1 :
    break
    break
    break
    if f==0 :
    return test
    else:
    return 'no'


    the question is: Is there a faster way doing that? I don't know,
    changing string into list or array, using map, or module, using
    different loop, regular expression,funcional programming , list
    comprehensions , sets, different looping techniques, i dont
    know.......(!)
     
    giampiero mu, Aug 2, 2005
    #1
    1. Advertising

  2. giampiero mu

    Kent Johnson Guest

    giampiero mu wrote:
    > hi everyone
    >
    > my target is implement a function controlla(string - a binary string-)
    > that check if there is in a string two equal not overlapping
    > subsequences at least of length limitem:


    You can do this with a regular expression:

    >>> import re
    >>> r=re.compile(r'(?P<seq>.{4,}).*(?P=seq)')
    >>> r.match('abcaoeaeoudabc')
    >>> r.match('abcdaeaeuabcd')

    <_sre.SRE_Match object at 0x008D67E0>
    >>> _.group(1)

    'abcd'
    >>> r.match('abcdefgaeaeuabcdefg')

    <_sre.SRE_Match object at 0x008D68E0>
    >>> _.group(1)

    'abcdefg'

    Kent


    >
    > my code:
    >
    > def controlla(test):
    > # print test
    > limitem=4
    > lunghezz=len(test)
    >
    > l1=lunghezz-limitem+1
    > l2=lunghezz-limitem+1
    > f=0
    >
    > for ii in range(l1):
    > for kk in range(l2):
    > t1=test[ii:ii+limitem]
    > t2=test[kk:kk+limitem]
    > if abs(ii-kk)>=limitem :
    > if t1==t2 :
    > f=1
    > if f==1 :
    > break
    > break
    > break
    > if f==0 :
    > return test
    > else:
    > return 'no'
    >
    >
    > the question is: Is there a faster way doing that? I don't know,
    > changing string into list or array, using map, or module, using
    > different loop, regular expression,funcional programming , list
    > comprehensions , sets, different looping techniques, i dont
    > know.......(!)
    >
     
    Kent Johnson, Aug 2, 2005
    #2
    1. Advertising

  3. giampiero mu

    Ron Adam Guest

    giampiero mu wrote:
    > hi everyone


    Hi, you appear to be fairly new to Python, so you might want to take a
    look at the tutorial at Python.org

    Kents suggestion to use RE is good.


    This should be faster than your example by quite a bit if you don't want
    to use RE.

    def controlla(test, size=4):
    # Return substring if a second substring is found.
    # Return None if no match found.

    for i in range(len(test)-size):
    match=test[i:i+size]
    left=test[:i]
    right=test[i+size:]
    if match in left or match in right:
    return match

    print controlla('qqqabcdrrabcd')

    --> 'abcd'


    Here's a few notes on your example for future reference:

    Multiple breaks don't work. The first break will jump out of the loop
    before the other breaks are reached.

    Any function that ends without a return will return None. So you don't
    need to return 'no'.

    Cheers,
    Ron
     
    Ron Adam, Aug 2, 2005
    #3
  4. Hello.

    > my target is implement a function controlla(string - a binary string-)
    > that check if there is in a string two equal not overlapping
    > subsequences at least of length limitem:
    >
    > my code:
    > [snip]
    >


    I may have misunderstood it, but does your function work the way you
    want it to?

    >>>controlla("testeststest")

    no

    I can't get it to print anything other than "no". But then again, I'm
    reading and posting via Google and I guess all those break statements
    shouldn't be on the same indent-level.

    > the question is: Is there a faster way doing that? I don't know,
    > changing string into list or array, using map, or module, using
    > different loop, regular expression,funcional programming , list
    > comprehensions , sets, different looping techniques, i dont
    > know.......(!)


    Since you're using nested for loops when searching the string you
    should make sure not to perform more iterations than neccessary. The
    function below returns a list of all, non-overlapping, substrings in
    text where len(substring)>= minLength. The outer loop is limited to
    about half of the length of the text which is where most of the speed
    comes from but I'm sure it can be tweaked for more.

    def foo(text, minLength):
    result= []
    for length in range(minLength, len(text)/ 2+ 1):
    for start in range(len(text)):
    end= start+ length
    if end< len(text):
    part= text[start:end]
    if text.find(part, end)!= -1:
    if part not in result:
    result.append(part)

    return result

    >>>foo("testeststest", 4)

    ['test', 'stes', 'stest']

    >>>foo("testeststest", 3)

    ['tes', 'est', 'ste', 'test', 'stes', 'stest']

    HTH
    Johan Lindberg
     
    Johan Lindberg, Aug 2, 2005
    #4
  5. giampiero mu

    giampiero mu Guest

    controlla("12345678") -> "12345678"

    thanks everyone. only a question. there is a way to advantage of binary
    sequences?
     
    giampiero mu, Aug 2, 2005
    #5
  6. > thanks everyone. only a question. there is a way to advantage of binary
    > sequences?


    I doubt you'll find any way to optimize the code that somehow only
    applies to binary sequences. You still have to find each possible
    subsequence of minimum length within the sequence and compare it to all
    other possible subsequences and that's what's going to take most of the
    time.

    If you haven't already, check out psyco
    (http://psyco.sourceforge.net/). It will most definitely make your code
    run faster.

    BR
    Johan Lindberg
     
    Johan Lindberg, Aug 3, 2005
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Paul Whipp
    Replies:
    1
    Views:
    372
    Alex Martelli
    Nov 7, 2003
  2. Bernard A.

    inner sublist positions ?

    Bernard A., Apr 20, 2005, in forum: Python
    Replies:
    2
    Views:
    347
    tiissa
    Apr 21, 2005
  3. Helmut Jarausch

    append to a sublist - please help

    Helmut Jarausch, Apr 6, 2008, in forum: Python
    Replies:
    2
    Views:
    355
  4. Matthias Gallé

    find sublist inside list

    Matthias Gallé, May 4, 2009, in forum: Python
    Replies:
    13
    Views:
    1,224
    mzdude
    May 5, 2009
  5. Johannes
    Replies:
    25
    Views:
    662
    Steven D'Aprano
    Aug 20, 2011
Loading...

Share This Page