finding sublist

G

giampiero mu

hi everyone

my target is implement a function controlla(string - a binary string-)
that check if there is in a string two equal not overlapping
subsequences at least of length limitem:

my code:

def controlla(test):
# print test
limitem=4
lunghezz=len(test)

l1=lunghezz-limitem+1
l2=lunghezz-limitem+1
f=0

for ii in range(l1):
for kk in range(l2):
t1=test[ii:ii+limitem]
t2=test[kk:kk+limitem]
if abs(ii-kk)>=limitem :
if t1==t2 :
f=1
if f==1 :
break
break
break
if f==0 :
return test
else:
return 'no'


the question is: Is there a faster way doing that? I don't know,
changing string into list or array, using map, or module, using
different loop, regular expression,funcional programming , list
comprehensions , sets, different looping techniques, i dont
know.......(!)
 
K

Kent Johnson

giampiero said:
hi everyone

my target is implement a function controlla(string - a binary string-)
that check if there is in a string two equal not overlapping
subsequences at least of length limitem:

You can do this with a regular expression:
'abcdefg'

Kent



my code:

def controlla(test):
# print test
limitem=4
lunghezz=len(test)

l1=lunghezz-limitem+1
l2=lunghezz-limitem+1
f=0

for ii in range(l1):
for kk in range(l2):
t1=test[ii:ii+limitem]
t2=test[kk:kk+limitem]
if abs(ii-kk)>=limitem :
if t1==t2 :
f=1
if f==1 :
break
break
break
if f==0 :
return test
else:
return 'no'


the question is: Is there a faster way doing that? I don't know,
changing string into list or array, using map, or module, using
different loop, regular expression,funcional programming , list
comprehensions , sets, different looping techniques, i dont
know.......(!)
 
R

Ron Adam

giampiero said:
hi everyone

Hi, you appear to be fairly new to Python, so you might want to take a
look at the tutorial at Python.org

Kents suggestion to use RE is good.


This should be faster than your example by quite a bit if you don't want
to use RE.

def controlla(test, size=4):
# Return substring if a second substring is found.
# Return None if no match found.

for i in range(len(test)-size):
match=test[i:i+size]
left=test[:i]
right=test[i+size:]
if match in left or match in right:
return match

print controlla('qqqabcdrrabcd')

--> 'abcd'


Here's a few notes on your example for future reference:

Multiple breaks don't work. The first break will jump out of the loop
before the other breaks are reached.

Any function that ends without a return will return None. So you don't
need to return 'no'.

Cheers,
Ron
 
J

Johan Lindberg

Hello.
my target is implement a function controlla(string - a binary string-)
that check if there is in a string two equal not overlapping
subsequences at least of length limitem:

my code:
[snip]

I may have misunderstood it, but does your function work the way you
want it to?
no

I can't get it to print anything other than "no". But then again, I'm
reading and posting via Google and I guess all those break statements
shouldn't be on the same indent-level.
the question is: Is there a faster way doing that? I don't know,
changing string into list or array, using map, or module, using
different loop, regular expression,funcional programming , list
comprehensions , sets, different looping techniques, i dont
know.......(!)

Since you're using nested for loops when searching the string you
should make sure not to perform more iterations than neccessary. The
function below returns a list of all, non-overlapping, substrings in
text where len(substring)>= minLength. The outer loop is limited to
about half of the length of the text which is where most of the speed
comes from but I'm sure it can be tweaked for more.

def foo(text, minLength):
result= []
for length in range(minLength, len(text)/ 2+ 1):
for start in range(len(text)):
end= start+ length
if end< len(text):
part= text[start:end]
if text.find(part, end)!= -1:
if part not in result:
result.append(part)

return result
['test', 'stes', 'stest']
['tes', 'est', 'ste', 'test', 'stes', 'stest']

HTH
Johan Lindberg
(e-mail address removed)
 
G

giampiero mu

controlla("12345678") -> "12345678"

thanks everyone. only a question. there is a way to advantage of binary
sequences?
 
J

Johan Lindberg

thanks everyone. only a question. there is a way to advantage of binary
sequences?

I doubt you'll find any way to optimize the code that somehow only
applies to binary sequences. You still have to find each possible
subsequence of minimum length within the sequence and compare it to all
other possible subsequences and that's what's going to take most of the
time.

If you haven't already, check out psyco
(http://psyco.sourceforge.net/). It will most definitely make your code
run faster.

BR
Johan Lindberg
(e-mail address removed)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top