fastest count of instances in string?

bill · Oct 3, 2003

What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

?

TIA,

-bill

Andreas Kahari · Oct 3, 2003

What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

Regexps are generally slow, but I didn't benchmark your solution
against this:

my $string = 'gtcgtcgtacgtgtgtagtattatgcgcgcgc';
my $ch = 'c';

my $pos = 0;
my $count = 0;

while ($pos = 1 + index($string, $ch, $pos)) {
++$count;
}

print "$count\n";

Dan Rawson · Oct 3, 2003

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

?

TIA,

-bill

From the camel:

$cnt = tr/?/?/;

AlV · Oct 3, 2003

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

You could try (beware, result is undef when there is no schar in $_):

# perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
4

Finding if this is faster than your method is left as an exercise ;o)

Jürgen Exner · Oct 3, 2003

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

That code should be quite slow: running the RE engine, implicitely looping
over the string and creating a temporary array, explicitely looping over
that array, comparing each element against $char, ...
Really, no, it should be very slow.

My guess would be using tr() will be the fastest you can get.

jue

AlV · Oct 3, 2003

AlV said:
You could try (beware, result is undef when there is no schar in $_):

# perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
4

Finding if this is faster than your method is left as an exercise ;o)

Oops,Dan Rawson is right tr// is, for obvious reasons, faster than s//g.

Here are proofs:

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=tr/$char/$char/; }'
perl -e 1.01s user 0.00s system 98% cpu 1.026 total

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=s/$char/$char/g; }'
perl -e 5.54s user 0.00s system 99% cpu 5.546 total

# time perl -e 'for my $i (1 .. 1000000) { $char="a";
$string="cacababaA"; for (split //, $string) { ++$count if $_ eq $char } }'
perl -e 13.32s user 0.02s system 99% cpu 13.351 total

Conclusion: always trust the Camel book ;o)

Tad McClellan · Oct 3, 2003

bill said:
What's the fastest way to count how many times a specific character
appears in a string?

perldoc -f tr

Uri Guttman · Oct 3, 2003

A> Here are proofs:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n=tr/$char/$char/; }'

that is broken code. try it out and see what happens.

uri

AlV · Oct 3, 2003

Uri said:
A> Here are proofs:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n=tr/$char/$char/; }'

that is broken code. try it out and see what happens.

You're, of course, right (from perldoc -f perlop):

Because the transliteration table is built at compile time, neither the
SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval():

eval "tr/$oldlist/$newlist/";
die $@ if $@;

eval "tr/$oldlist/$newlist/, 1" or die $@;

But that would give the very slow:

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n= eval "tr/$char/$char/"; } print $n, "\n"; '
perl -e 31.11s user 0.06s system 99% cpu 31.259 total

So, s//g would be a much faster way to count the occurences of $char...
or am I still stupid?

}

Eric J. Roode · Oct 3, 2003

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

Probably the fastest way would be to write a C function to do the counting,
then link to it via XS (or Inline::C).

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP32h92PeouIeTNHoEQIuJgCg9Tee/jyxGCNWGzpflde0awH68w4AmwQ4
VuYqJqYZLU45jpFHTUyM3l+0
=BBO6
-----END PGP SIGNATURE-----

Uri Guttman · Oct 3, 2003

A> You're, of course, right (from perldoc -f perlop):

A> But that would give the very slow:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total

remove the eval from the loop. create a sub with that eval tr// in it
and call that in the loop. then you should make all the benchmark
entries call subs to equalize them. tr will be much faster than the
others.

A> So, s//g would be a much faster way to count the occurences of
A> $char... or am I still stupid?

}

not stupid, just not familiar with how to isolate the thing you really
want to benchmark. and there are many benchmark techniques and also ways
to speed up code like tr// with intrpolation (caching code refs
generated by eval, etc.) so you don't call eval each time. eval is the
killer in that loop.

uri

AlV · Oct 3, 2003

Uri said:
A> But that would give the very slow:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total

remove the eval from the loop. create a sub with that eval tr// in it
and call that in the loop. then you should make all the benchmark
entries call subs to equalize them. tr will be much faster than the
others.

Huh...

O

I am certainly dense, but I can't figure how putting the tr// in a sub
and calling that sub would mean anything other than a longer execution
time...
(I tried, but certainly not the way you expected ;o)

In the first question of this thread, it was asked whether a faster
method existed for counting character occurences in a string (other than
splitting the string and looking through the array).

For me, timing the speed of a given method is timing everything that is
involved, and for the sake of reading a meaningful value, looping around
the piece of code that is involved. For example, if it takes, 30ms to
complete and my expected accuracy is about a second, I need to perform
100 to 1000 loops before I can expect a "reasonably accurate" value for
the execution time.

Down to this particular problem, if using tr/// means using an eval
around it in order to get interpolation, so be it...

Suppose the content of the string or the char (that we are trying to
count) come to change, eval should be called each time: IMHO, this
overhead should be part of the timing (some sort of worst case).

But feel free to contradict me

)

A> So, s//g would be a much faster way to count the occurences of
A> $char... or am I still stupid? }

not stupid, just not familiar with how to isolate the thing you really
want to benchmark. and there are many benchmark techniques and also ways
to speed up code like tr// with intrpolation (caching code refs
generated by eval, etc.) so you don't call eval each time.

I fail to see how we could cache "code refs generated by eval" (whatever
that could be, I am no Perl Guru ;o) if the string (in which we are
looking for the character) and the character itself change...

eval is the killer in that loop.

That, I had guessed...

)

Uri Guttman · Oct 3, 2003

A> I fail to see how we could cache "code refs generated by eval"
A> (whatever that could be, I am no Perl Guru ;o) if the string (in which
A> we are looking for the character) and the character itself change...

simple:

<untested>

my %tr_subs ;

sub tr_chars {

my ( $tr_chars ) = @_

$tr_subs{ $tr_chars } ||= eval "sub { tr/$tr_chars// }" ;

return $tr_subs{ $tr_chars }->() ;
}

this could also be done using the memoize module i would guess but it is
so short i didn't bother. also it does the tr on $_ and it could easily
be changed to work with an argument or defaulting to $_.

uri

Roy Johnson · Oct 3, 2003

AlV said:
# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=tr/$char/$char/; }'

You'll get a smidge faster if you don't bother to replace anything.
tr/// doesn't delete unless you specify the /d modifier, so you can
just do
$n=tr/$char//;
and it is non-destructive.

Uri Guttman · Oct 3, 2003

RJ> You'll get a smidge faster if you don't bother to replace anything.
RJ> tr/// doesn't delete unless you specify the /d modifier, so you can
RJ> just do
RJ> $n=tr/$char//;
RJ> and it is non-destructive.

first, you have the same problem that tr/// doesn't intrpolate.

secondly, replacing each char by itself is never destuctive. and your
notion that not using the replacement string is different then you need
to rtfm more carefully:

If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

so not passing in the replacement string is just syntactic sugar and
nothing special semantically.

uri

Roy Johnson · Oct 3, 2003

Within this newsgroup, you are expected to provide valid

Perl code

You did not include any code in your post. Or is it not required in
posts where you chew someone out?

The code I posted IS valid Perl code.

and you are expected to test your code

It wasn't "my" code, it was code I modified ever so slightly, and I
did test it, and compared to the other code, it was a smidge faster.

Some code for you. Please try it out.

delete $butt->{'stick'};

Roy Johnson · Oct 4, 2003

Purl Gurl said:
That is not valid Perl code unless is it your
intent to set $n equal to a bare word which
returns a value of zero. There exists an
additional reason why your code is invalid.

Validity of code is not about what one intends it to do. It is whether
it compiles and runs. My code did.

You did not test your code. You are lying.

No, you are stupid. I did test my code. The test was how it did
against the originally offered code, not what output it generated.

Ignorant troll

You should make that your signature.

You certainly will not survive me.

Yeah, keep stroking yourself.

Roy Johnson · Oct 4, 2003

Uri Guttman said:
first, you have the same problem that tr/// doesn't intrpolate.

That's actually not my problem. Perhaps I wasn't clear enough in
communicating that I was only interested in the relative merits of
specifying a replacement list vs. not.

secondly, replacing each char by itself is never destuctive.

I didn't say that it was. Again, I may have been unclear: the reason I
mentioned destructiveness is that having a blank replacement set may
*appear* to be destructive. That is the only reason I can think of for
specifying that a character set be replaced with itself.

so not passing in the replacement string is just syntactic sugar and
nothing special semantically.

In much the same way that
"$stupid"
is the same as
$stupid
, my point is that
tr/stupid/stupid/
is the same as
tr/stupid//

And for whatever reason, I got the barest improvement in speed when I
benchmarked. The second version is less prone to error (say,
transposing characters in one set) and is easier to grok at a glance:
"oh, nothing is being replaced."

John W. Krahn · Oct 4, 2003

Roy said:
In much the same way that
"$stupid"
is the same as
$stupid

It is?

$ perl -le'
$stupid = \55;
print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
$stupid = [ 1,2,3 ];
print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
'
<135267140> <0>
<123> <>

John

AlV · Oct 4, 2003

John said:
Roy said:

In much the same way that
"$stupid"
is the same as
$stupid

Click to expand...

It is?

$ perl -le'
$stupid = \55;
print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
$stupid = [ 1,2,3 ];
print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
'
<135267140> <0>
<123> <>

Eh, eh, eh, nice shots!

D

But, I am not sure that bill (the person who posted the first question
of this thread) is using Perl anymore...

He most certainly is hard at work learning Java or Python or
WhateverLanguageThatIsNotPerl while mumbling about "a bunch of irascible
loonies" ;o)

This last remark is not for you, as you certainly understood, John

)

Have a nice day,

Sort and count word pairs in a string	6	Jan 29, 2023
Find and count strings of text from multiple files	17	Dec 16, 2021
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Converting an Array to a String in JavaScript	7	Sep 22, 2023
How to keep count of right answer and wrong answers in C++?	0	Nov 3, 2021
Copy string from 2D array to a 1D array in C	1	Nov 1, 2023
Measuring a string of text	1	Sep 15, 2022
FAQ 5.3 How do I count the number of lines in a file?	0	Jan 31, 2011

fastest count of instances in string?

bill

Andreas Kahari

Dan Rawson

AlV

Jürgen Exner

AlV

Tad McClellan

Uri Guttman

AlV

Eric J. Roode

Uri Guttman

AlV

Uri Guttman

Roy Johnson

Uri Guttman

Roy Johnson

Roy Johnson

Roy Johnson

John W. Krahn

AlV

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads