fastest count of instances in string?

B

bill

What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

?

TIA,

-bill
 
A

Andreas Kahari

What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

Regexps are generally slow, but I didn't benchmark your solution
against this:

my $string = 'gtcgtcgtacgtgtgtagtattatgcgcgcgc';
my $ch = 'c';

my $pos = 0;
my $count = 0;

while ($pos = 1 + index($string, $ch, $pos)) {
++$count;
}

print "$count\n";
 
D

Dan Rawson

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

?

TIA,

-bill
From the camel:

$cnt = tr/?/?/;
 
A

AlV

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

You could try (beware, result is undef when there is no schar in $_):

# perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
4

Finding if this is faster than your method is left as an exercise ;o)
 
J

Jürgen Exner

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

That code should be quite slow: running the RE engine, implicitely looping
over the string and creating a temporary array, explicitely looping over
that array, comparing each element against $char, ...
Really, no, it should be very slow.

My guess would be using tr() will be the fastest you can get.

jue
 
A

AlV

AlV said:
You could try (beware, result is undef when there is no schar in $_):

# perl -e '$char="a"; $_="cacababaA"; $n=s/$char/$char/g; print "$n\n";'
4

Finding if this is faster than your method is left as an exercise ;o)

Oops,Dan Rawson is right tr// is, for obvious reasons, faster than s//g.

Here are proofs:

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=tr/$char/$char/; }'
perl -e 1.01s user 0.00s system 98% cpu 1.026 total

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=s/$char/$char/g; }'
perl -e 5.54s user 0.00s system 99% cpu 5.546 total

# time perl -e 'for my $i (1 .. 1000000) { $char="a";
$string="cacababaA"; for (split //, $string) { ++$count if $_ eq $char } }'
perl -e 13.32s user 0.02s system 99% cpu 13.351 total


Conclusion: always trust the Camel book ;o)
 
U

Uri Guttman

A> Here are proofs:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n=tr/$char/$char/; }'

that is broken code. try it out and see what happens.

uri
 
A

AlV

Uri said:
A> Here are proofs:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n=tr/$char/$char/; }'

that is broken code. try it out and see what happens.

You're, of course, right (from perldoc -f perlop):

Because the transliteration table is built at compile time, neither the
SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote
interpolation. That means that if you want to use variables, you must
use an eval():

eval "tr/$oldlist/$newlist/";
die $@ if $@;

eval "tr/$oldlist/$newlist/, 1" or die $@;

But that would give the very slow:

# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n= eval "tr/$char/$char/"; } print $n, "\n"; '
perl -e 31.11s user 0.06s system 99% cpu 31.259 total


So, s//g would be a much faster way to count the occurences of $char...
or am I still stupid? :eek:}
 
E

Eric J. Roode

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

bill said:
What's the fastest way to count how many times a specific character
appears in a string? Anything faster than something like

for (split //, $string) { ++$count if $_ eq $char }

Probably the fastest way would be to write a C function to do the counting,
then link to it via XS (or Inline::C).

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP32h92PeouIeTNHoEQIuJgCg9Tee/jyxGCNWGzpflde0awH68w4AmwQ4
VuYqJqYZLU45jpFHTUyM3l+0
=BBO6
-----END PGP SIGNATURE-----
 
U

Uri Guttman

A> You're, of course, right (from perldoc -f perlop):

A> But that would give the very slow:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total

remove the eval from the loop. create a sub with that eval tr// in it
and call that in the loop. then you should make all the benchmark
entries call subs to equalize them. tr will be much faster than the
others.

A> So, s//g would be a much faster way to count the occurences of
A> $char... or am I still stupid? :eek:}

not stupid, just not familiar with how to isolate the thing you really
want to benchmark. and there are many benchmark techniques and also ways
to speed up code like tr// with intrpolation (caching code refs
generated by eval, etc.) so you don't call eval each time. eval is the
killer in that loop.

uri
 
A

AlV

Uri said:
A> But that would give the very slow:

A> # time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
A> $n= eval "tr/$char/$char/"; } print $n, "\n"; '
A> perl -e 31.11s user 0.06s system 99% cpu 31.259 total

remove the eval from the loop. create a sub with that eval tr// in it
and call that in the loop. then you should make all the benchmark
entries call subs to equalize them. tr will be much faster than the
others.

Huh... :eek:O

I am certainly dense, but I can't figure how putting the tr// in a sub
and calling that sub would mean anything other than a longer execution
time...
(I tried, but certainly not the way you expected ;o)

In the first question of this thread, it was asked whether a faster
method existed for counting character occurences in a string (other than
splitting the string and looking through the array).

For me, timing the speed of a given method is timing everything that is
involved, and for the sake of reading a meaningful value, looping around
the piece of code that is involved. For example, if it takes, 30ms to
complete and my expected accuracy is about a second, I need to perform
100 to 1000 loops before I can expect a "reasonably accurate" value for
the execution time.

Down to this particular problem, if using tr/// means using an eval
around it in order to get interpolation, so be it...

Suppose the content of the string or the char (that we are trying to
count) come to change, eval should be called each time: IMHO, this
overhead should be part of the timing (some sort of worst case).

But feel free to contradict me :eek:)
A> So, s//g would be a much faster way to count the occurences of
A> $char... or am I still stupid? :eek:}

not stupid, just not familiar with how to isolate the thing you really
want to benchmark. and there are many benchmark techniques and also ways
to speed up code like tr// with intrpolation (caching code refs
generated by eval, etc.) so you don't call eval each time.

I fail to see how we could cache "code refs generated by eval" (whatever
that could be, I am no Perl Guru ;o) if the string (in which we are
looking for the character) and the character itself change...
eval is the killer in that loop.

That, I had guessed... :eek:)
 
U

Uri Guttman

A> I fail to see how we could cache "code refs generated by eval"
A> (whatever that could be, I am no Perl Guru ;o) if the string (in which
A> we are looking for the character) and the character itself change...

simple:

<untested>

my %tr_subs ;

sub tr_chars {

my ( $tr_chars ) = @_

$tr_subs{ $tr_chars } ||= eval "sub { tr/$tr_chars// }" ;

return $tr_subs{ $tr_chars }->() ;
}

this could also be done using the memoize module i would guess but it is
so short i didn't bother. also it does the tr on $_ and it could easily
be changed to work with an argument or defaulting to $_.

uri
 
R

Roy Johnson

AlV said:
# time perl -e 'for my $i (1 .. 1000000) { $char="a"; $_="cacababaA";
$n=tr/$char/$char/; }'

You'll get a smidge faster if you don't bother to replace anything.
tr/// doesn't delete unless you specify the /d modifier, so you can
just do
$n=tr/$char//;
and it is non-destructive.
 
U

Uri Guttman

RJ> You'll get a smidge faster if you don't bother to replace anything.
RJ> tr/// doesn't delete unless you specify the /d modifier, so you can
RJ> just do
RJ> $n=tr/$char//;
RJ> and it is non-destructive.

first, you have the same problem that tr/// doesn't intrpolate.

secondly, replacing each char by itself is never destuctive. and your
notion that not using the replacement string is different then you need
to rtfm more carefully:

If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated.
This latter is useful for counting characters in a class or for
squashing character sequences in a class.

so not passing in the replacement string is just syntactic sugar and
nothing special semantically.

uri
 
R

Roy Johnson

Within this newsgroup, you are expected to provide valid
Perl code

You did not include any code in your post. Or is it not required in
posts where you chew someone out?

The code I posted IS valid Perl code.
and you are expected to test your code

It wasn't "my" code, it was code I modified ever so slightly, and I
did test it, and compared to the other code, it was a smidge faster.

Some code for you. Please try it out.

delete $butt->{'stick'};
 
R

Roy Johnson

Purl Gurl said:
That is not valid Perl code unless is it your
intent to set $n equal to a bare word which
returns a value of zero. There exists an
additional reason why your code is invalid.

Validity of code is not about what one intends it to do. It is whether
it compiles and runs. My code did.
You did not test your code. You are lying.

No, you are stupid. I did test my code. The test was how it did
against the originally offered code, not what output it generated.
Ignorant troll

You should make that your signature.
You certainly will not survive me.

Yeah, keep stroking yourself.
 
R

Roy Johnson

Uri Guttman said:
first, you have the same problem that tr/// doesn't intrpolate.

That's actually not my problem. Perhaps I wasn't clear enough in
communicating that I was only interested in the relative merits of
specifying a replacement list vs. not.
secondly, replacing each char by itself is never destuctive.

I didn't say that it was. Again, I may have been unclear: the reason I
mentioned destructiveness is that having a blank replacement set may
*appear* to be destructive. That is the only reason I can think of for
specifying that a character set be replaced with itself.
so not passing in the replacement string is just syntactic sugar and
nothing special semantically.

In much the same way that
"$stupid"
is the same as
$stupid
, my point is that
tr/stupid/stupid/
is the same as
tr/stupid//

And for whatever reason, I got the barest improvement in speed when I
benchmarked. The second version is less prone to error (say,
transposing characters in one set) and is easier to grok at a glance:
"oh, nothing is being replaced."
 
J

John W. Krahn

Roy said:
In much the same way that
"$stupid"
is the same as
$stupid

It is?

$ perl -le'
$stupid = \55;
print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
$stupid = [ 1,2,3 ];
print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
'
<135267140> <0>
<123> <>



John
 
A

AlV

John said:
Roy said:
In much the same way that
"$stupid"
is the same as
$stupid


It is?

$ perl -le'
$stupid = \55;
print "<", $stupid + 0, "> <", "$stupid" + 0, ">";
$stupid = [ 1,2,3 ];
print "<", @{$stupid}, "> <", @{"$stupid"}, ">";
'
<135267140> <0>
<123> <>

Eh, eh, eh, nice shots! :eek:D

But, I am not sure that bill (the person who posted the first question
of this thread) is using Perl anymore...

He most certainly is hard at work learning Java or Python or
WhateverLanguageThatIsNotPerl while mumbling about "a bunch of irascible
loonies" ;o)

This last remark is not for you, as you certainly understood, John :eek:)

Have a nice day,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top