Counting the repetition count of repeated substrings

Discussion in 'Perl Misc' started by Michele Dondi, Nov 15, 2007.

  1. This is the subject of the following PM node:

    http://perlmonks.org/?node_id=650761

    I am the author of the node and to briefly summarize, the problem is:
    "to calculate the repetition count of repeated substrings of length 2
    or more of a given string."

    Currently, the most interesting code suggestions are within the
    respective subs of the following benchmark:

    (For definiteness the subs take the given string as an argument and
    return a hashref of the counts.)


    #!/usr/bin/perl

    use strict;
    use warnings;
    use constant MIN => 2;
    use Benchmark qw/:all :hireswallclock/;

    my $str='aabcdabcabcecdecd';

    sub oha {
    my $s=shift;
    my %saw;

    while($s =~ /(..+)(?=.*?\1)/g) {
    for my $x (0..length $1) {
    @saw{ map {substr $1, $x, $_} $x+2..length $1 } = ();
    }
    }
    $saw{$_} =()= $s =~ /\Q$_/g for keys %saw;
    \%saw;
    }

    sub kramba {
    my %count;
    my $count;
    $count = sub {
    my( $string) = @_;
    my $length = length( $string );

    if ($length < MIN) {
    $count{$_} == 1 and delete $count{$_} for keys %count;
    return \%count;
    }

    for my $l (MIN..$length) {
    my $s = substr( $string, 0, $l );
    $count{ $s } += 1;
    }
    $count->( substr( $string, 1 ) );
    };
    $count->(shift);
    for (keys %count) {
    delete $count{$_} unless
    $count{$_} >= 2 and length($_) >= MIN;
    }
    \%count;
    }

    sub ikegami {
    my $str = shift;

    local our %counts;
    $str =~ /
    (.{2,}) # or (.+)
    (?(?{ !$counts{$1} })(?=.*\1))
    (?{ ++$counts{$1} })
    (?!)
    /x;
    \%counts;
    }

    sub lodin {
    my $str = shift;

    local our %count;
    $str =~ /
    (.{2,})
    (?(?{ $count{$1} })
    (?!)
    )
    .*
    \1
    (?{ ($count{$1} ||= 1)++ })
    (?!)
    /x;
    \%count;
    }

    cmpthese 10_000 => {
    oha => sub { oha $str },
    kramba => sub { kramba $str },
    ikegami => sub { ikegami $str },
    lodin => sub { lodin $str },
    };

    __END__


    Note: benchmarking is interesting but I'm also interested in other
    charachteristics, e.g. readability, intuitiveness and so on.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Nov 15, 2007
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Robert Dodier
    Replies:
    4
    Views:
    341
    Raymond Hettinger
    Mar 14, 2006
  2. arnuld
    Replies:
    10
    Views:
    1,893
    =?ISO-8859-1?Q?Erik_Wikstr=F6m?=
    Aug 3, 2007
  3. Andrew Tomazos

    Counting Palindrome Substrings ?

    Andrew Tomazos, Feb 5, 2011, in forum: C++
    Replies:
    1
    Views:
    1,080
    Ben Pfaff
    Feb 5, 2011
  4. Ray
    Replies:
    8
    Views:
    156
    Bart Lateur
    Mar 3, 2004
  5. dmonx
    Replies:
    0
    Views:
    366
    dmonx
    Sep 26, 2012
Loading...

Share This Page