Anything to be done about utf8 regexp performance?

Discussion in 'Perl Misc' started by Jochen Lehmeier, Nov 3, 2009.

  1. Hello,

    > perl -V|head

    Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
    Platform:
    osname=linux, osvers=2.6.22-3-k7, archname=i486-linux-gnu-thread-multi
    uname='linux k 2.6.22-3-k7 #1 smp mon oct 22 22:51:54 utc 2007 i686
    gnulinux

    > cat test.pl

    #!/usr/local/bin/perl

    use strict;
    use warnings;

    my $a = "a".("x" x 1000);
    my $b = "\x{1234}".("x" x 1000);

    for (0..1000)
    {
        $a =~ s/r/xxx/;
        $a =~ s/r/xxx/i;
        $b =~ s/r/xxx/;
        $b =~ s/r/xxx/i;
    }

    > perl -d:SmallProf test.pl



    ================ SmallProf version 2.02 ================
    Profile of test.pl
    Page 94
    =================================================================
    count   wall tm  cpu time  line
        0   0.00000   0.00000   1:#!/usr/local/bin/perl
        0   0.00000   0.00000   2:
        0   0.00000   0.00000   3:use strict;
        0   0.00000   0.00000   4:use warnings;
        0   0.00000   0.00000   5:
        1   0.00005   0.00000   6:my $a = "a".("x" x 1000);
        1   0.00006   0.00000   7:my $b = "\x{1234}".("x" x 1000);
        0   0.00000   0.00000   8:
        1   0.00000   0.00000   9:for (0..1000)
        0   0.00000   0.00000  10:{
     1001   0.00596   0.07000  11: $a =~ s/r/xxx/;
     1001   0.01276   0.03000  12: $a =~ s/r/xxx/i;
     1001   0.04787   0.14000  13: $b =~ s/r/xxx/;
     1004   2.05547   2.10000  14: $b =~ s/r/xxx/i;
        0   0.00000   0.00000  15:}

    I can live with line 13, but line 14 is not funny anymore. By wall time
    it is about 344 times slower than the plain latin1 regexp (line 11), or
    161 times slower than the case-insensitive latin1 one (line 12).

    I understand that case calculations are much more complex in utf8 than
    latin1. Is there anything that can be done, anyway?
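
To confirm that the two test strings really take different code paths, it can help to check how Perl stores them internally. The following is a minimal sketch, not part of the original test script; the variable names $latin, $wide and $copy are introduced here purely for illustration. It relies only on the core functions utf8::is_utf8 and utf8::downgrade.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same test strings as in test.pl, renamed for illustration so they
# do not shadow the $a/$b variables used by sort.
my $latin = "a" . ("x" x 1000);
my $wide  = "\x{1234}" . ("x" x 1000);

# utf8::is_utf8 reports whether Perl stores the string in its internal
# UTF-8 form. It is always available and needs no "use utf8".
printf "latin: %s\n", utf8::is_utf8($latin) ? "utf8" : "bytes";
printf "wide:  %s\n", utf8::is_utf8($wide)  ? "utf8" : "bytes";

# utf8::downgrade with a true second argument (FAIL_OK) returns false
# instead of dying when the string holds a character above 0xFF.
my $copy = $wide;
printf "downgrade possible: %s\n",
    utf8::downgrade($copy, 1) ? "yes" : "no";
```

Because of the \x{1234} character, the second string cannot be downgraded to a byte string, so converting it back to latin1 is not an option here.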
    Jochen Lehmeier, Nov 3, 2009
    #1

  2. On 2009-11-03, Jochen Lehmeier <> wrote:
    *SKIP*
    > #!/usr/local/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > my $a = "a".("x" x 1000);
    > my $b = "\x{1234}".("x" x 1000);
    >
    > for (0..1000)
    > {
    > $a =~ s/r/xxx/;
    > $a =~ s/r/xxx/i;
    > $b =~ s/r/xxx/;
    > $b =~ s/r/xxx/i;
    > }
    >

    *SKIP*
    > I can live with line 13, but line 14 is not funny anymore. 344 times
    > slower than a latin1 regexp... or 161 times slower than a
    > latin1-case-insensitive one.
    >
    > I understand that case calculations are much more complex in utf8 than
    > latin1. Is there anything that can be done, anyway?


    HTH. Replacing /i with an explicit character class helps (as you can
    see, that idea has its limitations):

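    #!/usr/bin/perl

    use strict;
    use warnings;
    use Benchmark qw{ cmpthese timethese };

    my $a = "a" . ("x" x 1000);          # byte (latin1) string
    my $b = "\x{1234}" . ("x" x 1000);   # character above 0xFF forces utf8 storage

    # Run each case for at least 3 CPU seconds, then compare the rates.
    cmpthese timethese -3, {
        code00 => sub { $a =~ s/r/xxx/i; },    # latin1, /i
        code01 => sub { $b =~ s/r/xxx/i; },    # utf8, /i (the slow case)
        code02 => sub { $b =~ s/[rR]/xxx/; },  # utf8, explicit character class
    };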

    __END__
    Benchmark: running code00, code01, code02 for at least 3 CPU seconds...
    code00: 2 wallclock secs ( 3.02 usr + 0.00 sys = 3.02 CPU) @ 316342.72/s (n=955355)
    code01: 4 wallclock secs ( 3.20 usr + 0.00 sys = 3.20 CPU) @ 4509.38/s (n=14430)
    code02: 2 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 57964.86/s (n=181430)
               Rate code01 code02 code00
    code01   4509/s     --   -92%   -99%
    code02  57965/s  1185%     --   -82%
    code00 316343/s  6915%   446%     --
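
For patterns longer than a single letter, the character-class trick from code02 can be generated instead of written by hand. The following is only a sketch under stated assumptions: ascii_ci_pattern is a hypothetical helper name, not something from a module, and it handles plain ASCII literals only. It is also not a full replacement for /i on utf8 strings, since Unicode case folding lets a few non-ASCII characters match ASCII letters (for example U+212A KELVIN SIGN folds to "k"), and a class like [kK] will not match those.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn an ASCII literal into a compiled regexp in
# which every letter becomes a two-character class such as [rR], so the
# match needs no /i flag. Non-letters are escaped with quotemeta.
sub ascii_ci_pattern {
    my ($literal) = @_;
    die "ASCII literals only\n" if $literal =~ /[^\x00-\x7F]/;
    my $re = join '', map {
        /[A-Za-z]/ ? '[' . lc($_) . uc($_) . ']' : quotemeta($_)
    } split //, $literal;
    return qr/$re/;
}

# Usage on a utf8 string like the one in the benchmark, with an
# upper-case "R" appended so that the substitution has something to do.
my $wide = "\x{1234}" . ("x" x 1000) . "R";
my $re   = ascii_ci_pattern("r");
(my $out = $wide) =~ s/$re/xxx/;

# One character was replaced by three, so the string grew by two.
print length($out) - length($wide), "\n";
```

Whether this beats /i by the same margin as code02 did above depends on the perl version and the pattern, so it is worth rerunning the benchmark with the generated pattern before relying on it.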



    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
    Eric Pozharski, Nov 4, 2009
    #2
