Can be this be optimized?

gamo · Jun 14, 2014

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

TIA

gamo · Jun 14, 2014

El 14/06/14 11:09, Wasell escribió:

return scalar( () = $string =~ /$char/g );

Thank you. It works and must be a lot faster.

Peter J. Holzer · Jun 14, 2014

On Sat, 14 Jun 2014 07:40:58 +0200, in article <lngn9a$k48$1

Maybe this:

sub count {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

Surprisingly, gamo's version is a bit faster on my systems.

hp

Peter J. Holzer · Jun 14, 2014

The usual way is to use the tr/// operator.

This works only for constant characters. tr/// doesn't do double quote
interpolation, so you would have to use string eval to implement a count
subroutine with tr (which I did and which is has (unsurprisingly) quite
a high overhead. It's faster for long strings (over ~ 2000 characters),
though).

In all cases you have to be careful with the $char argument, because it
may have a special meaning. In the regexp variants using /\Q$char/
instead of /$char/ should work, though.

hp

George Mpouras · Jun 14, 2014

#!/usr/bin/perl
# This is about 10 times faster

use strict;
use warnings;
use Benchmark;

my $char = 'l';
my $string = 'Hello Worldll';
my $veryfast = fast_iterator($char, $string);

Benchmark::cmpthese(100_000, {
orig => sub {count($char, $string)},
fast => $veryfast
});

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub fast_iterator {
my ($char, $string, $c) = (@_,0);
eval "sub { '$string'=~tr/$char/$char/ }"
}

gamo · Jun 14, 2014

El 14/06/14 15:06, George Mpouras escribió:

#!/usr/bin/perl
# This is about 10 times faster

use strict;
use warnings;
use Benchmark;

my $char = 'l';
my $string = 'Hello Worldll';
my $veryfast = fast_iterator($char, $string);

Benchmark::cmpthese(100_000, {
orig => sub {count($char, $string)},
fast => $veryfast
});

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub fast_iterator {
my ($char, $string, $c) = (@_,0);
eval "sub { '$string'=~tr/$char/$char/ }"
}

Well, thanks. At first sight, doesn't look good.
At a second sight, it seems to run fast_iterator()
only once and store the result in a scalar $veryfast.
I don't check the results: sure they are impressive.
It happened to me too, a great result is caused
for not doing rather than doing.

Rainer Weikusat · Jun 15, 2014

gamo said:
sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

----------------------
use Benchmark;

use constant LEN => 50;

my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
my $char = chr(rand(26) + 65);

print STDERR ("$string, $char\n");

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub count2 {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}

print STDERR (count3($char, $string), "\n");

timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});

gamo · Jun 15, 2014

El 15/06/14 15:47, Rainer Weikusat escribió:

gamo said:
gamo said:

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Click to expand...

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

----------------------
use Benchmark;

use constant LEN => 50;

my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
my $char = chr(rand(26) + 65);

print STDERR ("$string, $char\n");

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub count2 {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}

print STDERR (count3($char, $string), "\n");

timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Is there any chance that a solution with substr could be
efficient?

Peter J. Holzer · Jun 15, 2014

gamo said:
El 15/06/14 15:47, Rainer Weikusat escribió:

gamo said:

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Click to expand...

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

---------------------- [...]

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}

Click to expand...

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

No, it uses the same trick as Mpouras demonstrated before: It constructs
and compiles a custom subroutine counting the occurances of a specific
character using tr and then runs that the custom sub. The difference is
that Rainer's version creates subs for the characters 'A' to 'Z'
beforehand and then just calls the appropriate behind the scenes, while
with Mpouras' version you had to call the generator function
explicitely.

I think I would just create each custom functions on first call and use
a hash to cache them instead or precreateing them for a set of
characters, though. Just in case somebody wants to search for
"\x{1F4A9}" ...

Is there any chance that a solution with substr could be
efficient?

Why don't you try it? I doubt it, but then I was surprised that your
explicit loop was faster than using scalar to count the matches.

Using index() is about as fast as your original version.

hp

gamo · Jun 15, 2014

El 15/06/14 17:13, Peter J. Holzer escribió:

Why don't you try it? I doubt it, but then I was surprised that your
explicit loop was faster than using scalar to count the matches.

Using index() is about as fast as your original version.

Tried with substr, it's a lot slower.

Thanks

Rainer Weikusat · Jun 15, 2014

gamo said:
El 15/06/14 15:47, Rainer Weikusat escribió:[...]

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}

print STDERR (count3($char, $string), "\n");

timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});

Click to expand...

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Some explanations: For the purpose of this example, 'characters' are
restricted to the uppercase letters A - Z. Further, ASCII enconding is
assumed.

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

This builds an array of 26 subroutines each counting occurrence of one of
the possible input characters in its first argument. In order to avoid
copying this argument (and because the subroutine code is really
simple), it accesses that via @_ as $_[0].

sub count3 {
&{$counters[ord(shift) - 65]};
}

This is a subroutine which expects the character supposed to be counted
as first argument and the string as second. It shifts the first argument
of its @_ and converts that into an ASCII codepoint via ord. Subtracting
65 (the ASCII code of A) from this numbers results in the array index
for the counting subroutine counting the correct character. The result
of the expression

$counters[ord(shift) - 65]

is a reference to this subroutine. It is then invoked via & without
arguments which means the invoked subroutine uses the @_ of the invoking
subroutine. Since the original first argument was shifted away, the
string to be searched is now the new $_[0] which was what the called
subroutine expects.

This is also a nice example where the argument passing mechanism used by
Perl really shines: All the outer subroutine has to know about the
passed @_ is 'the first argument is mine'. Whatever remains is simply
passed on to the next subroutine and considering that Perl is strictly
'the caller decides whatever it likes to pass' (when prototypes are not
being used), calling a subroutine with more than one argument indirectly
in this way can be accomplished by passing more arguments to the
directly invoked subroutine which will pass them on.

gamo · Jun 15, 2014

El 15/06/14 20:41, Rainer Weikusat escribió:

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Click to expand...

Some explanations: For the purpose of this example, 'characters' are
restricted to the uppercase letters A - Z. Further, ASCII enconding is
assumed.

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

This builds an array of 26 subroutines each counting occurrence of one of
the possible input characters in its first argument. In order to avoid
copying this argument (and because the subroutine code is really
simple), it accesses that via @_ as $_[0].

sub count3 {
&{$counters[ord(shift) - 65]};
}

This is a subroutine which expects the character supposed to be counted
as first argument and the string as second. It shifts the first argument
of its @_ and converts that into an ASCII codepoint via ord. Subtracting
65 (the ASCII code of A) from this numbers results in the array index
for the counting subroutine counting the correct character. The result
of the expression

$counters[ord(shift) - 65]

is a reference to this subroutine. It is then invoked via & without
arguments which means the invoked subroutine uses the @_ of the invoking
subroutine. Since the original first argument was shifted away, the
string to be searched is now the new $_[0] which was what the called
subroutine expects.

This is also a nice example where the argument passing mechanism used by
Perl really shines: All the outer subroutine has to know about the
passed @_ is 'the first argument is mine'. Whatever remains is simply
passed on to the next subroutine and considering that Perl is strictly
'the caller decides whatever it likes to pass' (when prototypes are not
being used), calling a subroutine with more than one argument indirectly
in this way can be accomplished by passing more arguments to the
directly invoked subroutine which will pass them on.

Genial, thanks for the explanation.

G.B. · Jun 16, 2014

El 15/06/14 17:13, Peter J. Holzer escribió:

Tried with substr, it's a lot slower.

index(); maybe a little boring, but flexible, and a little
faster than your original,

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}

Rainer Weikusat · Jun 16, 2014

G.B. said:
El 15/06/14 17:13, Peter J. Holzer escribió:

Tried with substr, it's a lot slower.

Click to expand...

index(); maybe a little boring, but flexible, and a little
faster than your original,

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}

Here's another mid-field method:

sub count5
{
my $s = $_[1];
return $s =~ s/$_[0]/x/g;
}

Rainer Weikusat · Jun 16, 2014

Rainer Weikusat said:
"G.B." <[email protected]> writes:

[count characters in a string]

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}

Click to expand...

Here's another mid-field method:

sub count5
{
my $s = $_[1];
return $s =~ s/$_[0]/x/g;
}

Here's another which is bad for short strings but does (for me) amazingly
good for long ones:

sub count6
{
my $c = 0;
my $first;

for ($_[1]) {
/\G[^$_[0]]+/gc and $first = pos(), redo;
/\G$_[0]+/gc and $c += pos() - $first, redo;
}

return $c;
}

About writing optimized code.	4	May 2, 2023
What "might" I be doing wrong with this snippet?	6	Sep 25, 2023
Can this be optimized?	9	Jun 21, 2011
Can someone explain why i have to drag my mouse on one window and the shape to be printed on another	1	Feb 9, 2022
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Can this code be used? Github repository question	0	Sep 28, 2023
Prompt was to have user enter sentence and output should be the number of words, this code's correct but idk How "prevChar = currentChar"work?	2	Mar 8, 2024

Can be this be optimized?

gamo

gamo

Peter J. Holzer

Peter J. Holzer

George Mpouras

gamo

Rainer Weikusat

gamo

Peter J. Holzer

gamo

Rainer Weikusat

gamo

G.B.

Rainer Weikusat

Rainer Weikusat

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads