fastest count of instances in string?

Discussion in 'Perl Misc' started by bill, Oct 3, 2003.

  1. bill

    bill Guest

    What's the fastest way to count how many times a specific character
    appears in a string? Anything faster than something like

    for (split //, $string) { ++$count if $_ eq $char }

    ?

    TIA,

    -bill
     
    bill, Oct 3, 2003
    #1
    1. Advertising

  2. In article <blk164$cd5$>, bill wrote:
    >
    >
    > What's the fastest way to count how many times a specific character
    > appears in a string? Anything faster than something like
    >
    > for (split //, $string) { ++$count if $_ eq $char }


    Regexps are generally slow, but I didn't benchmark your solution
    against this:

    my $string = 'gtcgtcgtacgtgtgtagtattatgcgcgcgc';
    my $ch = 'c';

    my $pos = 0;
    my $count = 0;

    while ($pos = 1 + index($string, $ch, $pos)) {
    ++$count;
    }

    print "$count\n";

    --
    Andreas Kähäri
     
    Andreas Kahari, Oct 3, 2003
    #2
    1. Advertising

  3. bill

    Dan Rawson Guest

    bill wrote:
    > What's the fastest way to count how many times a specific character
    > appears in a string? Anything faster than something like
    >
    > for (split //, $string) { ++$count if $_ eq $char }
    >
    > ?
    >
    > TIA,
    >
    > -bill
    >

    From the camel:

    $cnt = tr/?/?/;
     
    Dan Rawson, Oct 3, 2003
    #3
  4. bill

    AlV Guest

    bill wrote:
    > What's the fastest way to count how many times a specific character
    > appears in a string? Anything faster than something like
    >
    > for (split //, $string) { ++$count if $_ eq $char }


    You could try (beware, result is undef when there is no schar in $_):

    # perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
    4

    Finding if this is faster than your method is left as an exercise ;o)
     
    AlV, Oct 3, 2003
    #4
  5. bill wrote:
    > What's the fastest way to count how many times a specific character
    > appears in a string? Anything faster than something like
    >
    > for (split //, $string) { ++$count if $_ eq $char }


    That code should be quite slow: running the RE engine, implicitely looping
    over the string and creating a temporary array, explicitely looping over
    that array, comparing each element against $char, ...
    Really, no, it should be very slow.

    My guess would be using tr() will be the fastest you can get.

    jue
     
    Jürgen Exner, Oct 3, 2003
    #5
  6. bill

    AlV Guest

    AlV wrote:
    > bill wrote:
    >
    >> What's the fastest way to count how many times a specific character
    >> appears in a string? Anything faster than something like
    >>
    >> for (split //, $string) { ++$count if $_ eq $char }

    >
    >
    > You could try (beware, result is undef when there is no schar in $_):
    >
    > # perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
    > 4
    >
    > Finding if this is faster than your method is left as an exercise ;o)
    >


    Oops,Dan Rawson is right tr// is, for obvious reasons, faster than s//g.

    Here are proofs:

    # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    $n=tr/$char/$char/; }'
    perl -e 1.01s user 0.00s system 98% cpu 1.026 total

    # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    $n=s/$char/$char/g; }'
    perl -e 5.54s user 0.00s system 99% cpu 5.546 total

    # time perl -e 'for my $i (1 .. 1000000) { $char="a";
    $string="cacababaA"; for (split //, $string) { ++$count if $_ eq $char } }'
    perl -e 13.32s user 0.02s system 99% cpu 13.351 total


    Conclusion: always trust the Camel book ;o)
     
    AlV, Oct 3, 2003
    #6
  7. bill <> wrote:
    >
    >
    > What's the fastest way to count how many times a specific character
    > appears in a string?



    perldoc -f tr


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Oct 3, 2003
    #7
  8. bill

    Uri Guttman Guest

    >>>>> "A" == AlV <> writes:

    A> Here are proofs:

    A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    A> $n=tr/$char/$char/; }'

    that is broken code. try it out and see what happens.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Oct 3, 2003
    #8
  9. bill

    AlV Guest

    Uri Guttman wrote:
    >>>>>>"A" == AlV <> writes:

    >
    >
    > A> Here are proofs:
    >
    > A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    > A> $n=tr/$char/$char/; }'
    >
    > that is broken code. try it out and see what happens.


    You're, of course, right (from perldoc -f perlop):

    Because the transliteration table is built at compile time, neither the
    SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
    interpolation. That means that if you want to use variables, you must
    use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;

    But that would give the very slow:

    # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    $n= eval "tr/$char/$char/"; } print $n, "\n"; '
    perl -e 31.11s user 0.06s system 99% cpu 31.259 total


    So, s//g would be a much faster way to count the occurences of $char...
    or am I still stupid? :eek:}
     
    AlV, Oct 3, 2003
    #9
  10. -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA1

    bill <> wrote in news:blk164$cd5$:

    >
    >
    > What's the fastest way to count how many times a specific character
    > appears in a string? Anything faster than something like
    >
    > for (split //, $string) { ++$count if $_ eq $char }


    Probably the fastest way would be to write a C function to do the counting,
    then link to it via XS (or Inline::C).

    - --
    Eric
    $_ = reverse sort $ /. r , qw p ekca lre uJ reh
    ts p , map $ _. $ " , qw e p h tona e and print

    -----BEGIN PGP SIGNATURE-----
    Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

    iQA/AwUBP32h92PeouIeTNHoEQIuJgCg9Tee/jyxGCNWGzpflde0awH68w4AmwQ4
    VuYqJqYZLU45jpFHTUyM3l+0
    =BBO6
    -----END PGP SIGNATURE-----
     
    Eric J. Roode, Oct 3, 2003
    #10
  11. bill

    Uri Guttman Guest

    >>>>> "A" == AlV <> writes:

    A> Uri Guttman wrote:
    >>>>>>> "A" == AlV <> writes:

    A> $n=tr/$char/$char/; }'
    >> that is broken code. try it out and see what happens.


    A> You're, of course, right (from perldoc -f perlop):

    A> But that would give the very slow:

    A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
    A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total

    remove the eval from the loop. create a sub with that eval tr// in it
    and call that in the loop. then you should make all the benchmark
    entries call subs to equalize them. tr will be much faster than the
    others.

    A> So, s//g would be a much faster way to count the occurences of
    A> $char... or am I still stupid? :eek:}

    not stupid, just not familiar with how to isolate the thing you really
    want to benchmark. and there are many benchmark techniques and also ways
    to speed up code like tr// with intrpolation (caching code refs
    generated by eval, etc.) so you don't call eval each time. eval is the
    killer in that loop.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Oct 3, 2003
    #11
  12. bill

    AlV Guest

    Uri Guttman wrote:
    >>>>>>"A" == AlV <> writes:

    >
    > A> But that would give the very slow:
    >
    > A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    > A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
    > A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total
    >
    > remove the eval from the loop. create a sub with that eval tr// in it
    > and call that in the loop. then you should make all the benchmark
    > entries call subs to equalize them. tr will be much faster than the
    > others.


    Huh... :eek:O

    I am certainly dense, but I can't figure how putting the tr// in a sub
    and calling that sub would mean anything other than a longer execution
    time...
    (I tried, but certainly not the way you expected ;o)

    In the first question of this thread, it was asked whether a faster
    method existed for counting character occurences in a string (other than
    splitting the string and looking through the array).

    For me, timing the speed of a given method is timing everything that is
    involved, and for the sake of reading a meaningful value, looping around
    the piece of code that is involved. For example, if it takes, 30ms to
    complete and my expected accuracy is about a second, I need to perform
    100 to 1000 loops before I can expect a "reasonably accurate" value for
    the execution time.

    Down to this particular problem, if using tr/// means using an eval
    around it in order to get interpolation, so be it...

    Suppose the content of the string or the char (that we are trying to
    count) come to change, eval should be called each time: IMHO, this
    overhead should be part of the timing (some sort of worst case).

    But feel free to contradict me :eek:)

    > A> So, s//g would be a much faster way to count the occurences of
    > A> $char... or am I still stupid? :eek:}
    >
    > not stupid, just not familiar with how to isolate the thing you really
    > want to benchmark. and there are many benchmark techniques and also ways
    > to speed up code like tr// with intrpolation (caching code refs
    > generated by eval, etc.) so you don't call eval each time.


    I fail to see how we could cache "code refs generated by eval" (whatever
    that could be, I am no Perl Guru ;o) if the string (in which we are
    looking for the character) and the character itself change...

    > eval is the killer in that loop.


    That, I had guessed... :eek:)
     
    AlV, Oct 3, 2003
    #12
  13. bill

    Uri Guttman Guest

    >>>>> "A" == AlV <> writes:

    A> I fail to see how we could cache "code refs generated by eval"
    A> (whatever that could be, I am no Perl Guru ;o) if the string (in which
    A> we are looking for the character) and the character itself change...

    simple:

    <untested>

    my %tr_subs ;

    sub tr_chars {

    my ( $tr_chars ) = @_

    $tr_subs{ $tr_chars } ||= eval "sub { tr/$tr_chars// }" ;

    return $tr_subs{ $tr_chars }->() ;
    }

    this could also be done using the memoize module i would guess but it is
    so short i didn't bother. also it does the tr on $_ and it could easily
    be changed to work with an argument or defaulting to $_.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Oct 3, 2003
    #13
  14. bill

    Roy Johnson Guest

    AlV <> wrote in message news:<blk2v6$ub8$>...
    > # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    > $n=tr/$char/$char/; }'


    You'll get a smidge faster if you don't bother to replace anything.
    tr/// doesn't delete unless you specify the /d modifier, so you can
    just do
    $n=tr/$char//;
    and it is non-destructive.
     
    Roy Johnson, Oct 3, 2003
    #14
  15. bill

    Uri Guttman Guest

    >>>>> "RJ" == Roy Johnson <> writes:

    RJ> AlV <> wrote in message news:<blk2v6$ub8$>...
    >> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
    >> $n=tr/$char/$char/; }'


    RJ> You'll get a smidge faster if you don't bother to replace anything.
    RJ> tr/// doesn't delete unless you specify the /d modifier, so you can
    RJ> just do
    RJ> $n=tr/$char//;
    RJ> and it is non-destructive.

    first, you have the same problem that tr/// doesn't intrpolate.

    secondly, replacing each char by itself is never destuctive. and your
    notion that not using the replacement string is different then you need
    to rtfm more carefully:

    If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
    This latter is useful for counting characters in a class or for
    squashing character sequences in a class.

    so not passing in the replacement string is just syntactic sugar and
    nothing special semantically.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
     
    Uri Guttman, Oct 3, 2003
    #15
  16. bill

    Roy Johnson Guest

    > Within this newsgroup, you are expected to provide valid
    > Perl code


    You did not include any code in your post. Or is it not required in
    posts where you chew someone out?

    The code I posted IS valid Perl code.

    > and you are expected to test your code


    It wasn't "my" code, it was code I modified ever so slightly, and I
    did test it, and compared to the other code, it was a smidge faster.

    Some code for you. Please try it out.

    delete $butt->{'stick'};
     
    Roy Johnson, Oct 3, 2003
    #16
  17. bill

    Roy Johnson Guest

    Purl Gurl <> wrote in message news:<>...

    > That is not valid Perl code unless is it your
    > intent to set $n equal to a bare word which
    > returns a value of zero. There exists an
    > additional reason why your code is invalid.


    Validity of code is not about what one intends it to do. It is whether
    it compiles and runs. My code did.

    > You did not test your code. You are lying.


    No, you are stupid. I did test my code. The test was how it did
    against the originally offered code, not what output it generated.

    > Ignorant troll


    You should make that your signature.

    > You certainly will not survive me.


    Yeah, keep stroking yourself.
     
    Roy Johnson, Oct 4, 2003
    #17
  18. bill

    Roy Johnson Guest

    Uri Guttman <> wrote in message news:<>...

    > first, you have the same problem that tr/// doesn't intrpolate.


    That's actually not my problem. Perhaps I wasn't clear enough in
    communicating that I was only interested in the relative merits of
    specifying a replacement list vs. not.

    > secondly, replacing each char by itself is never destuctive.


    I didn't say that it was. Again, I may have been unclear: the reason I
    mentioned destructiveness is that having a blank replacement set may
    *appear* to be destructive. That is the only reason I can think of for
    specifying that a character set be replaced with itself.

    > so not passing in the replacement string is just syntactic sugar and
    > nothing special semantically.


    In much the same way that
    "$stupid"
    is the same as
    $stupid
    , my point is that
    tr/stupid/stupid/
    is the same as
    tr/stupid//

    And for whatever reason, I got the barest improvement in speed when I
    benchmarked. The second version is less prone to error (say,
    transposing characters in one set) and is easier to grok at a glance:
    "oh, nothing is being replaced."
     
    Roy Johnson, Oct 4, 2003
    #18
  19. Roy Johnson wrote:
    >
    > In much the same way that
    > "$stupid"
    > is the same as
    > $stupid


    It is?

    $ perl -le'
    $stupid = \55;
    print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
    $stupid = [ 1,2,3 ];
    print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
    '
    <135267140> <0>
    <123> <>



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Oct 4, 2003
    #19
  20. bill

    AlV Guest

    John W. Krahn wrote:
    > Roy Johnson wrote:
    >
    >>In much the same way that
    >> "$stupid"
    >>is the same as
    >> $stupid

    >
    >
    > It is?
    >
    > $ perl -le'
    > $stupid = \55;
    > print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
    > $stupid = [ 1,2,3 ];
    > print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
    > '
    > <135267140> <0>
    > <123> <>


    Eh, eh, eh, nice shots! :eek:D

    But, I am not sure that bill (the person who posted the first question
    of this thread) is using Perl anymore...

    He most certainly is hard at work learning Java or Python or
    WhateverLanguageThatIsNotPerl while mumbling about "a bunch of irascible
    loonies" ;o)

    This last remark is not for you, as you certainly understood, John :eek:)

    Have a nice day,
     
    AlV, Oct 4, 2003
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Amy G
    Replies:
    9
    Views:
    322
    Peter Otten
    Nov 24, 2003
  2. John Wohlbier
    Replies:
    2
    Views:
    379
    Josiah Carlson
    Feb 22, 2004
  3. Cedric Fontaine
    Replies:
    1
    Views:
    320
    Jim Langston
    Dec 11, 2007
  4. Replies:
    8
    Views:
    477
    James Stroud
    Jan 29, 2009
  5. Julien Thewys
    Replies:
    10
    Views:
    166
    Robert Dober
    Jul 25, 2008
Loading...

Share This Page