Counting all permutations of a substring

C

Chris Lasher

Hi all,
How can one count all the permutations of a substring in a string? For
a more concrete example, let's say
targetstr = 'AAA'
and
probestr = 'AA'

I want to consider how many times one can count probestr ('AA') in
targetstr ('AAA'). The value in this example is, obviously, 2: you can
match the first and second A's as 'AA' and also the second and third
A's as 'AA'. Compiling a regular expression of the probestr and doing a
findall(targetstr) will not work because the RE stops on the first
combination of 'AA' that it finds (the first and second A's of the
targetstr). The regular expression re.compile(r'A{2,}'} will not work,
either; it simply consumes all three A's as one match. The string
method count('AA') acts similarly to the findall of the 'AA' RE,
returning 1, as it only matches the first and second A's.

Any suggestions on how to get at that 2nd pair of A's and count it?
Humbly, I'm a bit stumped. Any help would be appreciated.

Many thanks in advance,
Chris
 
M

mensanator

Chris said:
Hi all,
How can one count all the permutations of a substring in a string? For
a more concrete example, let's say
targetstr = 'AAA'
and
probestr = 'AA'

I want to consider how many times one can count probestr ('AA') in
targetstr ('AAA'). The value in this example is, obviously, 2: you can
match the first and second A's as 'AA' and also the second and third
A's as 'AA'. Compiling a regular expression of the probestr and doing a
findall(targetstr) will not work because the RE stops on the first
combination of 'AA' that it finds (the first and second A's of the
targetstr). The regular expression re.compile(r'A{2,}'} will not work,
either; it simply consumes all three A's as one match. The string
method count('AA') acts similarly to the findall of the 'AA' RE,
returning 1, as it only matches the first and second A's.

Any suggestions on how to get at that 2nd pair of A's and count it?
Humbly, I'm a bit stumped. Any help would be appreciated.
matches = 0
lc = len(a)-len(b)+1
for i in xrange(lc):
if a[i:i+lb]==b:
print a
print ' '*i+b
print
matches += 1
return matches
AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

AAABCDEAAAAABSAABAWA
AA

7
 
M

mensanator

Chris said:
Hi all,
How can one count all the permutations of a substring in a string? For
a more concrete example, let's say
targetstr = 'AAA'
and
probestr = 'AA'

I want to consider how many times one can count probestr ('AA') in
targetstr ('AAA'). The value in this example is, obviously, 2: you can
match the first and second A's as 'AA' and also the second and third
A's as 'AA'. Compiling a regular expression of the probestr and doing a
findall(targetstr) will not work because the RE stops on the first
combination of 'AA' that it finds (the first and second A's of the
targetstr). The regular expression re.compile(r'A{2,}'} will not work,
either; it simply consumes all three A's as one match. The string
method count('AA') acts similarly to the findall of the 'AA' RE,
returning 1, as it only matches the first and second A's.

Any suggestions on how to get at that 2nd pair of A's and count it?
Humbly, I'm a bit stumped. Any help would be appreciated.
matches = 0
lc = len(a)-len(b)+1
for i in xrange(lc):
if a[i:i+lb]==b:

Oops, should be
if a[i:i+len(b)]==b:
 
A

Andrew Gwozdziewycz

Hi all,
How can one count all the permutations of a substring in a string? For
a more concrete example, let's say
targetstr = 'AAA'
and
probestr = 'AA'

I want to consider how many times one can count probestr ('AA') in
targetstr ('AAA'). The value in this example is, obviously, 2: you can
match the first and second A's as 'AA' and also the second and third
A's as 'AA'. Compiling a regular expression of the probestr and
doing a
findall(targetstr) will not work because the RE stops on the first
combination of 'AA' that it finds (the first and second A's of the
targetstr). The regular expression re.compile(r'A{2,}'} will not work,
either; it simply consumes all three A's as one match. The string
method count('AA') acts similarly to the findall of the 'AA' RE,
returning 1, as it only matches the first and second A's.

Any suggestions on how to get at that 2nd pair of A's and count it?
Humbly, I'm a bit stumped. Any help would be appreciated.


I don't think you're describing a permutation. A permutation of a
string 'ABC' would be 'CBA', or 'BCA'.

You're describing something like the number of proper substrings of a
given value, and the easiest way to do that is to just check to see if
the probestring starts a string at the current index of the string...

def count_proper_substrings_equal_to(target, probe):
i = 0
count = 0
while i < len(target):
if target[i:].startswith(probe):
count = count + 1
i = i + 1
return i

And of course there's a more 'pythonic' way to do it, but this would
be a generalized algorithm suitable for most languages.
 
A

Azolex

[counting all (possibly overlapping) occurences of a substring in a string]

def count_subs(s,subs,pos=0) :
pos = 1+s.find(subs,pos)
return pos and 1+count_subs(s,subs,pos)

or equivalently

def count_subs(s,subs)
pos,cnt = 0,0
while True :
pos = 1+s.find(subs,pos)
if not pos :
return cnt
cnt += 1

or even (using the helper functions of my last post in the "small
challenge" thread)

def count_subs(s,subs) :
cnt = 0
for pos1 in echoback(1+s.find(subs,pos) for pos in itially(0)) :
if not pos1 :
return cnt
cnt += 1


(I've minimally tested only the first version)
 
C

Chris Lasher

Great suggestions, guys! Thanks so much!

And yes, I stand corrected. A better suited subject title would have
been "Counting all overlapping substrings".

Thanks again,
Chris
 
A

Azolex

I said:
[counting all (possibly overlapping) occurences of a substring in a string]

def count_subs(s,subs,pos=0) :
pos = 1+s.find(subs,pos)
return pos and 1+count_subs(s,subs,pos) .

now to push lisp-style to the extreme, a recursive one-liner solution
with presumably better stack behavior (assuming proper tail-recursion
elimination, which I doubt is the case in Python).

Python 2.5a1 (r25a1:43589, Apr 5 2006, 10:36:43) [MSC v.1310 32 bit
(Intel)] on win32
....2

note though that the first solution beats this one on token count: 37 vs 42
 
M

Michele Simionato

Azolex said:
Python 2.5a1 (r25a1:43589, Apr 5 2006, 10:36:43) [MSC v.1310 32 bit
(Intel)] on win32
...

<sarcastic-mode>
Ah, the pleasure of having a ternary operator in the language!
</sarcastic-mode>

Michele Simionato
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top