counting number of (overlapping) occurances

Discussion in 'Python' started by John, Mar 10, 2006.

  1. John

    John Guest

    I have two strings S1 and S2. I want to know how many times
    S2 occurs inside S1.

    For instance

    if S1 = "AAAA"
    and S2 = "AA"

    then the count is 3. Is there an easy way to do this in python?
    I was trying to use the "count" function but it does not do
    overlapping counts it seems.

    Thanks,
    --j
     
    John, Mar 10, 2006
    #1
    1. Advertising

  2. John

    Paul Rubin Guest

    "John" <> writes:
    > if S1 = "AAAA"
    > and S2 = "AA"
    >
    > then the count is 3. Is there an easy way to do this in python?
    > I was trying to use the "count" function but it does not do
    > overlapping counts it seems.


    len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])
     
    Paul Rubin, Mar 10, 2006
    #2
    1. Advertising

  3. John

    John Guest

    Thanks a lot,

    This works but is a bit slow, I guess I'll have to live with it.
    Any chance this could be sped up in python?

    Thanks once again,
    --j
     
    John, Mar 10, 2006
    #3
  4. John

    Paul Rubin Guest

    "John" <> writes:
    > This works but is a bit slow, I guess I'll have to live with it.
    > Any chance this could be sped up in python?


    Whoops, I meant to say:

    len([1 for i in xrange(len(s1)) if s1.startswith(s2,i)])

    That avoids creating a lot of small strings.

    If s1 is large you may want to use fancy algorithms like Boyer-Moore
    string search. If s2 is also large, you could have some kind of
    finite state machine scan through s1 so it recognizes overlapping
    occurrences of s2 in parallel. There might be some way to combine
    that with Boyer-Moore.
     
    Paul Rubin, Mar 10, 2006
    #4
  5. John wrote:
    > This works but is a bit slow, I guess I'll have to live with it.
    > Any chance this could be sped up in python?


    Sure, to a point. Instead of:

    def countoverlap(s1, s2):
    return len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])

    Try this version, which takes smaller slices (resulting in 2x-5x speed
    increase when dealing with a large s1 and a small s2):

    def countoverlap(s1, s2):
    L = len(s2)
    return len([1 for i in xrange(len(s1)-L+1) if s1[i:i+L] == s2])

    And for a minor extra boost, this version eliminates the list
    comprehension:

    def countoverlap(s1, s2):
    L = len(s2)
    cnt = 0
    for i in xrange(len(s1)-L+1):
    if s1[i:i+L] == s2:
    cnt += 1
    return cnt

    Finally, if the execution speed of this function is vital to your
    application, create a C extension. String functions like this one are
    generally excellent candidates for extensionizing.

    --Ben
     
    Ben Cartwright, Mar 10, 2006
    #5
  6. John <> wrote:

    > Thanks a lot,
    >
    > This works but is a bit slow, I guess I'll have to live with it.
    > Any chance this could be sped up in python?


    Sure (untested code):

    def count_with_overlaps(needle, haystack):
    count = 0
    pos = 0
    while True:
    where = haystack.index(needle, pos)
    if where == -1: return count
    count += 1
    pos = where + 1


    Alex
     
    Alex Martelli, Mar 10, 2006
    #6
  7. Alex Martelli <> wrote:

    > John <> wrote:
    >
    > > Thanks a lot,
    > >
    > > This works but is a bit slow, I guess I'll have to live with it.
    > > Any chance this could be sped up in python?

    >
    > Sure (untested code):
    >
    > def count_with_overlaps(needle, haystack):
    > count = 0
    > pos = 0
    > while True:
    > where = haystack.index(needle, pos)


    Oops -- 'find', not 'index' -- sorry (it WAS untested;-).

    > if where == -1: return count
    > count += 1
    > pos = where + 1



    Alex
     
    Alex Martelli, Mar 10, 2006
    #7
  8. On Thu, 09 Mar 2006 22:01:17 -0800, John wrote:

    > Thanks a lot,
    >
    > This works but is a bit slow, I guess I'll have to live with it.
    > Any chance this could be sped up in python?


    I don't know. What is it?

    You should always quote enough of the previous poster's message to give
    context. Messages sometimes go missing in Usenet, and people won't have
    the foggiest idea what you are talking about.

    --
    Steven.
     
    Steven D'Aprano, Mar 11, 2006
    #8
  9. Steven D'Aprano wrote

    > You should always quote enough of the previous poster's message to give
    > context. Messages sometimes go missing in Usenet, and people won't have
    > the foggiest idea what you are talking about.


    one would think that given how many Pythoneers we now have working
    for google, at least one of them could go fix the horridly broken posting
    interface in googlegroups...

    :::

    for those who use googlegroups and prefer to make a least a little sense
    to people read this newsgroup/mailing list using other tools, please click
    "show options" and use the "reply" link at the top of the message instead
    of the link with the same name at the bottom.

    </F>
     
    Fredrik Lundh, Mar 11, 2006
    #9
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Rodrick Brown

    counting word occurances

    Rodrick Brown, Jun 1, 2005, in forum: Perl
    Replies:
    2
    Views:
    2,825
    J├╝rgen Exner
    Jun 2, 2005
  2. =?Utf-8?B?SmltIEhlYXZleQ==?=

    Multiple Occurances Of Value In String

    =?Utf-8?B?SmltIEhlYXZleQ==?=, Jun 29, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    529
    Martin Marinov
    Jun 29, 2004
  3. Franz Steinhaeusler
    Replies:
    9
    Views:
    505
    Fredrik Lundh
    Dec 15, 2004
  4. Sandman
    Replies:
    7
    Views:
    223
    Anno Siegel
    Aug 3, 2004
  5. Rodrick Brown

    counting word occurances

    Rodrick Brown, Jun 3, 2005, in forum: Perl Misc
    Replies:
    22
    Views:
    299
Loading...

Share This Page