Can be this be optimized?

Discussion in 'Perl Misc' started by gamo, Jun 14, 2014.

  1. gamo

    gamo Guest

    sub count{
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while ($string =~ /$char/g);
    return $c;
    }

    TIA
     
    gamo, Jun 14, 2014
    #1
    1. Advertisements

  2. gamo

    gamo Guest

    El 14/06/14 11:09, Wasell escribió:
    Thank you. It works and must be a lot faster.
     
    gamo, Jun 14, 2014
    #2
    1. Advertisements

  3. Surprisingly, gamo's version is a bit faster on my systems.

    hp
     
    Peter J. Holzer, Jun 14, 2014
    #3
  4. This works only for constant characters. tr/// doesn't do double quote
    interpolation, so you would have to use string eval to implement a count
    subroutine with tr (which I did and which is has (unsurprisingly) quite
    a high overhead. It's faster for long strings (over ~ 2000 characters),
    though).

    In all cases you have to be careful with the $char argument, because it
    may have a special meaning. In the regexp variants using /\Q$char/
    instead of /$char/ should work, though.

    hp
     
    Peter J. Holzer, Jun 14, 2014
    #4
  5. #!/usr/bin/perl
    # This is about 10 times faster

    use strict;
    use warnings;
    use Benchmark;

    my $char = 'l';
    my $string = 'Hello Worldll';
    my $veryfast = fast_iterator($char, $string);


    Benchmark::cmpthese(100_000, {
    orig => sub {count($char, $string)},
    fast => $veryfast
    });


    sub count{
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while ($string =~ /$char/g);
    return $c;
    }

    sub fast_iterator {
    my ($char, $string, $c) = (@_,0);
    eval "sub { '$string'=~tr/$char/$char/ }"
    }
     
    George Mpouras, Jun 14, 2014
    #5
  6. gamo

    gamo Guest

    El 14/06/14 15:06, George Mpouras escribió:
    Well, thanks. At first sight, doesn't look good.
    At a second sight, it seems to run fast_iterator()
    only once and store the result in a scalar $veryfast.
    I don't check the results: sure they are impressive.
    It happened to me too, a great result is caused
    for not doing rather than doing.
     
    gamo, Jun 14, 2014
    #6
  7. Depending on how long your strings are and how often you'll be calling
    this function vs how often the program is going to be compiled, using a
    set of "pre-compiled character-counting routines" might make sense:

    ----------------------
    use Benchmark;

    use constant LEN => 50;

    my $string = join('', map { chr(rand(26) + 65) } 1 .. LEN);
    my $char = chr(rand(26) + 65);


    print STDERR ("$string, $char\n");

    sub count{
    my ($char, $string) = @_;
    my $c = 0;
    ++$c while ($string =~ /$char/g);
    return $c;
    }

    sub count2 {
    my ($char, $string) = @_;
    return scalar( () = $string =~ /$char/g );
    }

    my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';
    sub count3 {
    &{$counters[ord(shift) - 65]};
    }


    print STDERR (count3($char, $string), "\n");


    timethese(-4,
    {
    count => sub { count($char, $string) },
    count2 => sub { count2($char, $string)},
    count3 => sub { count3($char, $string)}
    });
     
    Rainer Weikusat, Jun 15, 2014
    #7
  8. gamo

    gamo Guest

    El 15/06/14 15:47, Rainer Weikusat escribió:
    Thank you. 'count3' is faster but do the counting before,
    I think. It's confuse for me.

    Is there any chance that a solution with substr could be
    efficient?
     
    gamo, Jun 15, 2014
    #8
  9. No, it uses the same trick as Mpouras demonstrated before: It constructs
    and compiles a custom subroutine counting the occurances of a specific
    character using tr and then runs that the custom sub. The difference is
    that Rainer's version creates subs for the characters 'A' to 'Z'
    beforehand and then just calls the appropriate behind the scenes, while
    with Mpouras' version you had to call the generator function
    explicitely.

    I think I would just create each custom functions on first call and use
    a hash to cache them instead or precreateing them for a set of
    characters, though. Just in case somebody wants to search for
    "\x{1F4A9}" ...

    Why don't you try it? I doubt it, but then I was surprised that your
    explicit loop was faster than using scalar to count the matches.

    Using index() is about as fast as your original version.

    hp
     
    Peter J. Holzer, Jun 15, 2014
    #9
  10. gamo

    gamo Guest

    El 15/06/14 17:13, Peter J. Holzer escribió:
    Tried with substr, it's a lot slower.

    Thanks
     
    gamo, Jun 15, 2014
    #10
  11. Some explanations: For the purpose of this example, 'characters' are
    restricted to the uppercase letters A - Z. Further, ASCII enconding is
    assumed.

    my @counters = map { eval("sub { \$_[0] =~ tr/$_// }") } 'A' .. 'Z';

    This builds an array of 26 subroutines each counting occurrence of one of
    the possible input characters in its first argument. In order to avoid
    copying this argument (and because the subroutine code is really
    simple), it accesses that via @_ as $_[0].

    sub count3 {
    &{$counters[ord(shift) - 65]};
    }

    This is a subroutine which expects the character supposed to be counted
    as first argument and the string as second. It shifts the first argument
    of its @_ and converts that into an ASCII codepoint via ord. Subtracting
    65 (the ASCII code of A) from this numbers results in the array index
    for the counting subroutine counting the correct character. The result
    of the expression

    $counters[ord(shift) - 65]

    is a reference to this subroutine. It is then invoked via & without
    arguments which means the invoked subroutine uses the @_ of the invoking
    subroutine. Since the original first argument was shifted away, the
    string to be searched is now the new $_[0] which was what the called
    subroutine expects.

    This is also a nice example where the argument passing mechanism used by
    Perl really shines: All the outer subroutine has to know about the
    passed @_ is 'the first argument is mine'. Whatever remains is simply
    passed on to the next subroutine and considering that Perl is strictly
    'the caller decides whatever it likes to pass' (when prototypes are not
    being used), calling a subroutine with more than one argument indirectly
    in this way can be accomplished by passing more arguments to the
    directly invoked subroutine which will pass them on.
     
    Rainer Weikusat, Jun 15, 2014
    #11
  12. gamo

    gamo Guest

    El 15/06/14 20:41, Rainer Weikusat escribió:
    Genial, thanks for the explanation.
     
    gamo, Jun 15, 2014
    #12
  13. gamo

    G.B. Guest

    index(); maybe a little boring, but flexible, and a little
    faster than your original,

    sub count4 { # string, string -> int
    my ($c, $p) = (0, 0);
    while (($p = index($_[1], $_[0], $p)) > -1) {
    ++$c, ++$p;
    }
    return $c;
    }
     
    G.B., Jun 16, 2014
    #13
  14. Here's another mid-field method:

    sub count5
    {
    my $s = $_[1];
    return $s =~ s/$_[0]/x/g;
    }
     
    Rainer Weikusat, Jun 16, 2014
    #14
  15. [count characters in a string]

    Here's another which is bad for short strings but does (for me) amazingly
    good for long ones:

    sub count6
    {
    my $c = 0;
    my $first;

    for ($_[1]) {
    /\G[^$_[0]]+/gc and $first = pos(), redo;
    /\G$_[0]+/gc and $c += pos() - $first, redo;
    }

    return $c;
    }
     
    Rainer Weikusat, Jun 16, 2014
    #15
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.