Can be this be optimized?

G

gamo

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

TIA
 
P

Peter J. Holzer

On Sat, 14 Jun 2014 07:40:58 +0200, in article <lngn9a$k48$1


Maybe this:

sub count {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

Surprisingly, gamo's version is a bit faster on my systems.

hp
 
P

Peter J. Holzer

The usual way is to use the tr/// operator.

This works only for constant characters. tr/// doesn't do double quote
interpolation, so you would have to use string eval to implement a count
subroutine with tr (which I did and which is has (unsurprisingly) quite
a high overhead. It's faster for long strings (over ~ 2000 characters),
though).

In all cases you have to be careful with the $char argument, because it
may have a special meaning. In the regexp variants using /\Q$char/
instead of /$char/ should work, though.

hp
 
G

George Mpouras

#!/usr/bin/perl
# This is about 10 times faster

use strict;
use warnings;
use Benchmark;

my $char = 'l';
my $string = 'Hello Worldll';
my $veryfast = fast_iterator($char, $string);


Benchmark::cmpthese(100_000, {
orig => sub {count($char, $string)},
fast => $veryfast
});


sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub fast_iterator {
my ($char, $string, $c) = (@_,0);
eval "sub { '$string'=~tr/$char/$char/ }"
}
 
G

gamo

El 14/06/14 15:06, George Mpouras escribió:
#!/usr/bin/perl
# This is about 10 times faster

use strict;
use warnings;
use Benchmark;

my $char = 'l';
my $string = 'Hello Worldll';
my $veryfast = fast_iterator($char, $string);


Benchmark::cmpthese(100_000, {
orig => sub {count($char, $string)},
fast => $veryfast
});


sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub fast_iterator {
my ($char, $string, $c) = (@_,0);
eval "sub { '$string'=~tr/$char/$char/ }"
}

Well, thanks. At first sight, doesn't look good.
At a second sight, it seems to run fast_iterator()
only once and store the result in a scalar $veryfast.
I don't check the results: sure they are impressive.
It happened to me too, a great result is caused
for not doing rather than doing.
 
R

Rainer Weikusat

gamo said:
sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

----------------------
use Benchmark;

use constant LEN => 50;

my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
my $char = chr(rand(26) + 65);


print STDERR ("$string, $char\n");

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub count2 {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}


print STDERR (count3($char, $string), "\n");


timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});
 
G

gamo

El 15/06/14 15:47, Rainer Weikusat escribió:
gamo said:
sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

----------------------
use Benchmark;

use constant LEN => 50;

my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
my $char = chr(rand(26) + 65);


print STDERR ("$string, $char\n");

sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

sub count2 {
my ($char, $string) = @_;
return scalar( () = $string =~ /$char/g );
}

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}


print STDERR (count3($char, $string), "\n");


timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Is there any chance that a solution with substr could be
efficient?
 
P

Peter J. Holzer

El 15/06/14 15:47, Rainer Weikusat escribió:
gamo said:
sub count{
my ($char, $string) = @_;
my $c = 0;
++$c while ($string =~ /$char/g);
return $c;
}

Depending on how long your strings are and how often you'll be calling
this function vs how often the program is going to be compiled, using a
set of "pre-compiled character-counting routines" might make sense:

---------------------- [...]

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

No, it uses the same trick as Mpouras demonstrated before: It constructs
and compiles a custom subroutine counting the occurances of a specific
character using tr and then runs that the custom sub. The difference is
that Rainer's version creates subs for the characters 'A' to 'Z'
beforehand and then just calls the appropriate behind the scenes, while
with Mpouras' version you had to call the generator function
explicitely.

I think I would just create each custom functions on first call and use
a hash to cache them instead or precreateing them for a set of
characters, though. Just in case somebody wants to search for
"\x{1F4A9}" ...

Is there any chance that a solution with substr could be
efficient?

Why don't you try it? I doubt it, but then I was surprised that your
explicit loop was faster than using scalar to count the matches.

Using index() is about as fast as your original version.

hp
 
G

gamo

El 15/06/14 17:13, Peter J. Holzer escribió:
Why don't you try it? I doubt it, but then I was surprised that your
explicit loop was faster than using scalar to count the matches.

Using index() is about as fast as your original version.

Tried with substr, it's a lot slower.

Thanks
 
R

Rainer Weikusat

gamo said:
El 15/06/14 15:47, Rainer Weikusat escribió:[...]
my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
sub count3 {
&{$counters[ord(shift) - 65]};
}


print STDERR (count3($char, $string), "\n");


timethese(-4,
{
count => sub { count($char, $string) },
count2 => sub { count2($char, $string)},
count3 => sub { count3($char, $string)}
});

Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Some explanations: For the purpose of this example, 'characters' are
restricted to the uppercase letters A - Z. Further, ASCII enconding is
assumed.

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

This builds an array of 26 subroutines each counting occurrence of one of
the possible input characters in its first argument. In order to avoid
copying this argument (and because the subroutine code is really
simple), it accesses that via @_ as $_[0].

sub count3 {
&{$counters[ord(shift) - 65]};
}

This is a subroutine which expects the character supposed to be counted
as first argument and the string as second. It shifts the first argument
of its @_ and converts that into an ASCII codepoint via ord. Subtracting
65 (the ASCII code of A) from this numbers results in the array index
for the counting subroutine counting the correct character. The result
of the expression

$counters[ord(shift) - 65]

is a reference to this subroutine. It is then invoked via & without
arguments which means the invoked subroutine uses the @_ of the invoking
subroutine. Since the original first argument was shifted away, the
string to be searched is now the new $_[0] which was what the called
subroutine expects.

This is also a nice example where the argument passing mechanism used by
Perl really shines: All the outer subroutine has to know about the
passed @_ is 'the first argument is mine'. Whatever remains is simply
passed on to the next subroutine and considering that Perl is strictly
'the caller decides whatever it likes to pass' (when prototypes are not
being used), calling a subroutine with more than one argument indirectly
in this way can be accomplished by passing more arguments to the
directly invoked subroutine which will pass them on.
 
G

gamo

El 15/06/14 20:41, Rainer Weikusat escribió:
Thank you. 'count3' is faster but do the counting before,
I think. It's confuse for me.

Some explanations: For the purpose of this example, 'characters' are
restricted to the uppercase letters A - Z. Further, ASCII enconding is
assumed.

my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

This builds an array of 26 subroutines each counting occurrence of one of
the possible input characters in its first argument. In order to avoid
copying this argument (and because the subroutine code is really
simple), it accesses that via @_ as $_[0].

sub count3 {
&{$counters[ord(shift) - 65]};
}

This is a subroutine which expects the character supposed to be counted
as first argument and the string as second. It shifts the first argument
of its @_ and converts that into an ASCII codepoint via ord. Subtracting
65 (the ASCII code of A) from this numbers results in the array index
for the counting subroutine counting the correct character. The result
of the expression

$counters[ord(shift) - 65]

is a reference to this subroutine. It is then invoked via & without
arguments which means the invoked subroutine uses the @_ of the invoking
subroutine. Since the original first argument was shifted away, the
string to be searched is now the new $_[0] which was what the called
subroutine expects.

This is also a nice example where the argument passing mechanism used by
Perl really shines: All the outer subroutine has to know about the
passed @_ is 'the first argument is mine'. Whatever remains is simply
passed on to the next subroutine and considering that Perl is strictly
'the caller decides whatever it likes to pass' (when prototypes are not
being used), calling a subroutine with more than one argument indirectly
in this way can be accomplished by passing more arguments to the
directly invoked subroutine which will pass them on.

Genial, thanks for the explanation.
 
G

G.B.

El 15/06/14 17:13, Peter J. Holzer escribió:

Tried with substr, it's a lot slower.

index(); maybe a little boring, but flexible, and a little
faster than your original,

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}
 
R

Rainer Weikusat

G.B. said:
El 15/06/14 17:13, Peter J. Holzer escribió:

Tried with substr, it's a lot slower.

index(); maybe a little boring, but flexible, and a little
faster than your original,

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}

Here's another mid-field method:

sub count5
{
my $s = $_[1];
return $s =~ s/$_[0]/x/g;
}
 
R

Rainer Weikusat

Rainer Weikusat said:

[count characters in a string]

sub count4 { # string, string -> int
my ($c, $p) = (0, 0);
while (($p = index($_[1], $_[0], $p)) > -1) {
++$c, ++$p;
}
return $c;
}

Here's another mid-field method:

sub count5
{
my $s = $_[1];
return $s =~ s/$_[0]/x/g;
}

Here's another which is bad for short strings but does (for me) amazingly
good for long ones:

sub count6
{
my $c = 0;
my $first;

for ($_[1]) {
/\G[^$_[0]]+/gc and $first = pos(), redo;
/\G$_[0]+/gc and $c += pos() - $first, redo;
}

return $c;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,733
Messages
2,569,439
Members
44,829
Latest member
PIXThurman

Latest Threads

Top