Help on String to array !

Discussion in 'Perl Misc' started by jis, Mar 9, 2010.

  1. jis

    jis Guest

    Guys,

    I have a string $hex which holds, let's assume, "0012345689abcd".

    How can I split it into an array so that
    $arr[0] = 00, $arr[1] = 12, etc.?

    It works with the split command to some extent, like this:
    foreach (split(//, $hex)) {
    $arr[$i]=$_;
    $i++;
    }

    Unfortunately, when I read big files of 4 MB it takes
    about 10 minutes before it completes execution. No good.
    (I couldn't split it like 00,12 but only like 0,0,1,2.)

    Then I thought unpack would be a better idea:
    @arr = unpack("H2",$data); or
    @arr = unpack("H2*",$data);

    But only the first element got transferred, i.e. 00:
    $arr[0]=00 and $arr[1] is undefined.

    Can anyone help me with this?

    thanks,
    jis
    jis, Mar 9, 2010
    #1
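A note on the unpack("H2", $data) attempt above: in an unpack template, H2 converts a single byte of the input into its two hex digits and then stops, which matches the one-element result described. Repeating it across the whole input needs a parenthesized group with a * count. A minimal sketch, assuming $data holds the raw bytes; the sample bytes are made up for illustration:

```perl
use strict;
use warnings;

# Sample raw bytes standing in for the file contents (assumed data).
my $data = "\x00\x12\x34\x56\x89\xab\xcd";

# "H2" converts only the first byte to two hex digits.
my @first = unpack 'H2', $data;

# A parenthesized group with a repeat count of * covers every byte.
my @all = unpack '(H2)*', $data;

print "first: @first\n";
print "all:   @all\n";
```

Working from the raw bytes this way also skips the intermediate $hex string when the goal is one array element per byte.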

  2. Don Piven wrote:
    > jis wrote:
    >> Guys,
    >>
    >> I have a string $hex which has lets assume "0012345689abcd"
    >>
    >> How can I split them into to an array so that
    >> arr[0]=00 ,arr[1] =12..etc

    >
    > while ( $hex =~ /[[:xdigit:]]{2}/g ) { push @arr, $1 }


    No need for a loop:

    my @arr = $hex =~ /[[:xdigit:]]{2}/g;

    Also, you don't use capturing parentheses in your regular expression so
    $1 will always be empty.


    > The "g" flag on the regex tells Perl to do its search from where the
    > previous search left off, so this will just walk through your string two
    > characters at a time and relieve you from having to keep track of where
    > you are in the string and in your array.
    >
    > The "o" flag may also be useful; check "Regexp Quote-Like Operators" in
    > perlop for more info.


    The /o option would not be useful in this case as there are no variables
    in the regular expression to interpolate and in any case modern versions
    of perl would not re-interpolate a variable that doesn't change.

    perldoc -q /o




    John
    --
    The programmer is fighting against the two most
    destructive forces in the universe: entropy and
    human stupidity. -- Damian Conway
    John W. Krahn, Mar 9, 2010
    #2
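The two forms discussed in this reply can be compared side by side. This is only an illustrative sketch using the sample string from the question: in list context a //g match without capture groups returns every full match, while the scalar-context loop needs capturing parentheses for $1 to be set.

```perl
use strict;
use warnings;

my $hex = '0012345689abcd';

# List context: with no capture groups, //g returns every full match.
my @list = $hex =~ /[[:xdigit:]]{2}/g;

# Scalar context: each iteration resumes where the previous match ended.
# The parentheses are what make $1 hold the matched pair.
my @loop;
while ( $hex =~ /([[:xdigit:]]{2})/g ) {
    push @loop, $1;
}

print "@list\n";
print "@loop\n";
```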

  3. jis wrote:
    > Guys,
    >
    > I have a string $hex which has lets assume "0012345689abcd"
    >
    > How can I split them into to an array so that
    > arr[0]=00 ,arr[1] =12..etc


    my @arr = unpack '(a2)*', $hex;



    John
    --
    The programmer is fighting against the two most
    destructive forces in the universe: entropy and
    human stupidity. -- Damian Conway
    John W. Krahn, Mar 9, 2010
    #3
  4. jis

    Guest

    On Tue, 9 Mar 2010 03:34:48 -0800 (PST), jis <> wrote:

    >Guys,
    >
    >I have a string $hex which has lets assume "0012345689abcd"


    >[snip]


    >Unfortunately when i read big files of 4MB size it takes
    >like 10mins before it completes execution. No good.
    >(i couldnt split it like 00,12 but only like 0,0,1,2)
    >
    >Then I thought unpack wud be a better idea.
    > @arr = unpack("H2",$data); or
    >@arr = unpack("H2*",$data);
    >

    Perl distributions for Win32 have a problem with the
    native realloc(). On these, the larger the dynamic list
    generated by the function, the longer it takes.
    Linux doesn't have this problem.

    In general, if you expect to be splitting up very
    large data segments, it's better to control the list
    outside the function, where push() is better.

    Of the three basic methods (substr/unpack/regexp),
    the one that's fastest seems to be substr().
    Additionally, on Win32 platforms, any method using a
    push is far better.

    My platform is Windows in generating the data below.
    If you have Linux, your results will be different.
    Post your numbers if you can.

    -sln

    Output:
    --------------------
    Size of bigstring = 560

    Substr/push took: 0.00030303 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Unpack/list took: 0.000344038 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Unpack/push took: 0.000586033 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Regexp/list took: 0.000608206 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Regexp/push took: 0.000404835 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

    --------------------
    Size of bigstring = 5600

    Substr/push took: 0.002841 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Unpack/list took: 0.00334311 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Unpack/push took: 0.00657105 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Regexp/list took: 0.00673795 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
    Regexp/push took: 0.004076 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)

    --------------------
    Size of bigstring = 56000

    Substr/push took: 0.0301139 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
    Unpack/list took: 0.0458951 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
    Unpack/push took: 0.0644789 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
    Regexp/list took: 0.07149 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
    Regexp/push took: 0.03965 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)

    --------------------
    Size of bigstring = 560000

    Substr/push took: 0.309315 wallclock secs ( 0.30 usr + 0.02 sys = 0.31 CPU)
    Unpack/list took: 0.723145 wallclock secs ( 0.61 usr + 0.11 sys = 0.72 CPU)
    Unpack/push took: 0.640141 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU)
    Regexp/list took: 0.927701 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU)
    Regexp/push took: 0.516143 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU)

    --------------------
    Size of bigstring = 5600000

    Substr/push took: 3.79988 wallclock secs ( 3.75 usr + 0.06 sys = 3.81 CPU)
    Unpack/list took: 40.0264 wallclock secs (34.97 usr + 5.06 sys = 40.03 CPU)
    Unpack/push took: 6.71793 wallclock secs ( 6.70 usr + 0.01 sys = 6.72 CPU)
    Regexp/list took: 34.6208 wallclock secs (34.56 usr + 0.06 sys = 34.63 CPU)
    Regexp/push took: 7.93654 wallclock secs ( 7.89 usr + 0.05 sys = 7.94 CPU)

    =======
    for my $multiplier (40, 400, 4_000, 40_000, 400_000)
    {
        my $bigstring = '0012345689abcd' x $multiplier;
        print "\n",'-'x20,"\nSize of bigstring = ",length($bigstring),"\n\n";

        ##
        {
            my ($val, $offs, @pairs) = ('',0);
            my $t0 = new Benchmark;
            while ($val=substr( $bigstring, $offs, 2))
            {
                push @pairs, $val;
                $offs+=2;
            }
            my $t1 = new Benchmark;
            print "Substr/push took: ",timestr(timediff($t1, $t0)),"\n";
        }
        ##
        {
            my $t0 = new Benchmark;
            my @pairs = unpack '(a2)*', $bigstring;
            my $t1 = new Benchmark;
            print "Unpack/list took: ",timestr(timediff($t1, $t0)),"\n";
        }
        ##
        {
            my ($val, $offs, @pairs) = ('',0);
            my $t0 = new Benchmark;
            while ($val=unpack("x$offs a2", $bigstring) )
            {
                push @pairs, $val;
                $offs+=2;
            }
            my $t1 = new Benchmark;
            print "Unpack/push took: ",timestr(timediff($t1, $t0)),"\n";
        }
        ##
        {
            my $t0 = new Benchmark;
            my @pairs = $bigstring =~ /[0-9a-f]{2}/g;
            my $t1 = new Benchmark;
            print "Regexp/list took: ",timestr(timediff($t1, $t0)),"\n";
        }
        ##
        {
            my @pairs;
            my $t0 = new Benchmark;
            while ( $bigstring =~ /([0-9a-f]{2})/g ) {
                push @pairs, $1;
            }
            my $t1 = new Benchmark;
            print "Regexp/push took: ",timestr(timediff($t1, $t0)),"\n";
        }
    }

    __END__
    , Mar 9, 2010
    #4
  5. jis

    Guest

    On Tue, 09 Mar 2010 09:57:23 -0800, wrote:
    >=======

    use strict;
    use warnings;
    use Benchmark ':hireswallclock';

    >for my $multiplier (40, 400, 4_000, 40_000, 400_000)
    , Mar 9, 2010
    #5
  6. jis

    jis Guest

    On Mar 9, 10:59 pm, wrote:
    > On Tue, 09 Mar 2010 09:57:23 -0800, wrote:
    > >=======

    >
    > use strict;
    > use warnings;
    > use Benchmark ':hireswallclock';
    >
    >
    >
    > >for my $multiplier (40, 400, 4_000, 40_000, 400_000)


    Thanks for the replies.

    As you said, regex and unpack took longer than substr.
    I use Windows. These are the times taken to read a 4 MB file
    into an array:

    1. Regex: @arr = $hex =~ /[[:xdigit:]]{2}/g; took 1 min 7 seconds.
    2. Unpack: @arr = unpack("(C2)*",$hex); took 3 min 26 seconds.
    3. Substr:
       while ($val=substr( $hex, $offs, 2))
       {
           push @arr, $val;
           $offs+=2;
       }
       took 11 seconds.


    thanks,
    jis
    jis, Mar 10, 2010
    #6
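One property of the substr loop quoted above is worth noting: it ends as soon as substr returns a false value, and the string "0" is false in Perl. For the even-length strings produced by unpack 'H*' this never triggers, because every chunk has two characters, but a lone trailing "0" would be dropped. A small sketch with an assumed odd-length sample string, comparing that loop with one bounded by length:

```perl
use strict;
use warnings;

# Assumed sample: odd length, so the last chunk is a lone "0".
my $hex = '0012345689abcd0';

# Loop from the post: stops as soon as substr returns a false value.
my ( $val, $offs, @by_truth ) = ( '', 0 );
while ( $val = substr( $hex, $offs, 2 ) ) {
    push @by_truth, $val;
    $offs += 2;
}

# Bounded by length instead, so a "0" chunk is kept.
my @by_length;
for ( my $pos = 0; $pos < length($hex); $pos += 2 ) {
    push @by_length, substr( $hex, $pos, 2 );
}

print scalar(@by_truth),  " chunks: @by_truth\n";
print scalar(@by_length), " chunks: @by_length\n";
```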
  7. jis

    Uri Guttman Guest

    >>>>> "j" == jis <> writes:

    j> As said regex and unpack took longer time than substr.
    j> I use Windows. The following are the time taken.

    j> 1. Regex : @arr = $hex =~ /[[:xdigit:]]{2}/g; - To read 4Mb file
    j> into an array it took 1min 7 seconds.
    j> 2. Unpack : @arr = unpack("(C2)*",$hex); - To read 4Mb file into
    j> an array it took 3min 26seconds.
    j> 3. Substr: while ($val=substr( $hex, $offs, 2))
    j> {
    j> push @arr, $val;
    j> $offs+=2;
    j> } - To read 4Mb file into an array it took 11 seconds.


    i am sorry, i can't believe it took on the order of minutes to read in a
    file and convert from hex to binary. this is not possible on anything
    but an abacus. given you haven't shown the complete script for each
    version i have to assume your code is broken in some way. also there is
    no way a substr loop would be faster than unpack or a regex. both of
    those would spend all their time in perl's guts while the substr version
    spends most of its time doing slow perl ops in a loop. i say this from
    plenty of experience benchmarking perl code. you can easily write an
    incorrect test of this so i must ask you to post complete working
    programs that exhibit the slowness you claim. i will wager large amounts
    of quatloos i can fix them so the substr will be outed as the slowest
    one.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 10, 2010
    #7
  8. jis

    jis Guest

    Even I want to believe it should take much less time.
    I am posting the scripts I used for testing.

    1. Unpack version (regex commented out):

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $binary_file="28247101.bin";
    open FILE, $binary_file or die "Can't open $binary_file $!\n";
    # binmode FILE to suppress conversion of line endings
    binmode FILE;
    undef $/;
    my $data = <FILE>;
    close FILE;
    # convert data to hex form
    my $hex = unpack 'H*', $data;
    my ($val, $offs, @arr) = ('',0);
    #@arr = $hex =~ /[[:xdigit:]]{2}/g;
    @arr = unpack("(C2)*",$hex);
    print "bye";
    print $arr[2];    # this took 3 minutes 25 sec

    If I uncomment the regex portion and comment out the unpack, it takes
    1 minute 25 sec.

    2. Substr version:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $binary_file="28247101.bin";
    open FILE, $binary_file or die "Can't open $binary_file $!\n";
    # binmode FILE to suppress conversion of line endings
    binmode FILE;
    undef $/;
    my $data = <FILE>;
    close FILE;
    # convert data to hex form
    my $hex = unpack 'H*', $data;
    my $i=0;

    my ($val, $offs, @arr) = ('',0);
    while ($val=substr( $hex, $offs, 2)){
        push @arr, $val;
        $offs+=2;
    }
    print "bye";
    print $arr[2];    # this would take only 9 seconds

    I used a stopwatch to measure the time.

    I'd appreciate your help in finding how it can be improved.

    thanks,
    jis









    On Mar 10, 12:51 pm, "Uri Guttman" <> wrote:
    > >>>>> "j" == jis  <> writes:

    >
    >   j> As said regex and unpack took longer time than substr.
    >   j> I use Windows. The following are the time taken.
    >
    >   j> 1. Regex : @arr = $hex =~ /[[:xdigit:]]{2}/g;  - To read  4Mb file
    >   j> into an array it took  1min 7 seconds.
    >   j> 2. Unpack : @arr = unpack("(C2)*",$hex);    - To read  4Mbfile into
    >   j> an array it took  3min 26seconds.
    >   j> 3. Substr: while ($val=substr( $hex, $offs, 2))
    >   j>     {
    >   j>         push @arr, $val;
    >   j>         $offs+=2;
    >   j>     } -  To read  4Mb file into an array it took  11 seconds.
    >
    > i am sorry, i can't believe it took on the order of minutes to read in a
    > file and convert from hex to binary. this is not possible on anything
    > but an abacus. given you haven't shown the complete script for each
    > version i have to assume your code is broken in some way. also there is
    > no way a substr loop would be faster than unpack or a regex. both of
    > those would spend all their time in perl's guts while the substr version
    > spends most of its time doing slow perl ops in a loop. i say this from
    > plenty of experience benchmarking perl code. you can easily write an
    > incorrect test of this so i must ask you to post complete working
    > programs that exhibit the slowness you claim. i will wager large amounts
    > of quatloos i can fix them so the substr will be outed as the slowest
    > one.
    >
    > uri
    >
    > --
    > Uri Guttman  ------    --------  http://www.sysarch.com--
    > -----  Perl Code Review , Architecture, Development, Training, Support ------
    > ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com---------
    jis, Mar 11, 2010
    #8
  9. jis

    Uri Guttman Guest

    >>>>> "j" == jis <> writes:

    j> Even I want to beleive it should take very less time.
    j> I post the scripts I used for testing.

    j> 1. #!/usr/bin/perl

    j> # convert data to hex form
    j> my $hex = unpack 'H*', $data;
    j> my ($val, $offs, @arr) = ('',0);
    j> #@arr = $hex =~ /[[:xdigit:]]{2}/g;
    j> @arr = unpack("(C2)*",$hex);

    j> my $data = <FILE>;
    j> close FILE;
    j> # convert data to hex form
    j> my $hex = unpack 'H*', $data;
    j> my $i=0;

    j> my ($val, $offs, @arr) = ('',0);
    j> while ($val=substr( $hex, $offs, 2)){
    j> push @arr, $val;
    j> $offs+=2;
    j> }
    j> print "bye";
    j> print $arr[2]; This would take only 9 seconds.

    j> I have used a stopwatch to calculate time.

    a stopwatch? you need to learn how to use the Benchmark.pm module.

    j> Appreciate your help in finding how it can be improved.

    easy. let me do a proper benchmark.

    and you should learn how to properly bottom post and not leave my entire
    post in the message.

    uri


    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 11, 2010
    #9
  10. jis

    Uri Guttman Guest

    >>>>> "j" == jis <> writes:

    j> if i uncommment regex protion and comment unpack it would take
    j> 1minute 25 sec

    j> print "bye";
    j> print $arr[2]; This would take only 9 seconds.

    j> I have used a stopwatch to calculate time.

    as i said, that is a silly way to time programs. and there is no way it
    would take minutes to do this unless you are on a severely slow cpu or
    you are low on ram and are disk thrashing. here is my benchmarked
    version which shows that unpacking (fixed to use A and not C) is the
    fastest and regex (also fixed to do the simplest but correct thing which
    is grab 2 chars) ties your code.

    uncomment out those commented lines to see that this does the same and
    correct thing in all cases.

    here is the timing result run for 10 seconds each:

              s/iter     regex substring unpacking
    regex       2.11        --       -0%      -25%
    substring   2.11        0%        --      -25%
    unpacking   1.58       33%       33%        --

    uri


    use strict;
    use warnings;

    use File::Slurp ;
    use Benchmark qw(:all) ;

    my $duration = shift || -2 ;

    my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;

    my $data = read_file( $file_name, binary => 1 ) ;

    #$data = "\x00\x10" ;

    my $hex = unpack 'H*', $data;

    # unpacking() ;
    # regex() ;
    # substring() ;
    # exit ;

    cmpthese( $duration, {

        unpacking => \&unpacking,
        regex     => \&regex,
        substring => \&substring,
    } ) ;

    sub unpacking {
        my @arr = unpack( '(A2)*' , $hex) ;
    #   print "@arr\n"
    }

    sub regex {
        my @arr = $hex =~ /(..{2})/g ;
    #   print "@arr\n"
    }

    sub substring {

        my ($val, $offs, @arr) = ('',0);
        while ($val=substr( $hex, $offs, 2)){
            push @arr, $val;
            $offs+=2;
        }

    #   print "@arr\n"
    }


    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 11, 2010
    #10
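The "fixed to use A and not C" remark in the post above can be illustrated on a short string. This is a sketch with a made-up sample value: C yields one numeric character code per input character, whereas A2 yields two-character strings, so the (C2)* version builds twice as many elements and does not do the same job as the other two methods.

```perl
use strict;
use warnings;

my $hex = '0012ab';

# C yields one unsigned character code per input character.
my @codes = unpack '(C2)*', $hex;

# A2 yields two-character strings.
my @pairs = unpack '(A2)*', $hex;

print scalar(@codes), " codes: @codes\n";
print scalar(@pairs), " pairs: @pairs\n";
```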
  11. jis

    jis Guest

    On Mar 11, 11:15 am, "Uri Guttman" <> wrote:
    > [snip]


    Uri,

    I used the script you posted, changing only the input file,
    and I get the following results.
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
    (warning: too few iterations for a reliable count)
              s/iter unpacking     regex substring
    unpacking   9.06        --      -27%      -34%
    regex       6.59       37%        --       -9%
    substring   6.01       51%       10%        --

    Unpacking still takes the longest to finish.

    I use Windows XP Professional with 2 GB of RAM. I also have 45 GB of
    free space on my C drive.

    Do you see something else different?

    thanks,
    jis
    jis, Mar 11, 2010
    #11
  12. Uri Guttman wrote:
    >
    > sub regex {
    > my @arr = $hex =~ /(..{2})/g ;
    > # print "@arr\n"
    > }


    Shouldn't that be:

    my @arr = $hex =~ /../g ;

    Or:

    my @arr = $hex =~ /.{2}/g ;

    You are capturing *three* characters instead of two.



    John
    --
    The programmer is fighting against the two most
    destructive forces in the universe: entropy and
    human stupidity. -- Damian Conway
    John W. Krahn, Mar 11, 2010
    #12
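The point made in this reply is easy to confirm on the sample string from the original question. A short illustrative sketch: /(..{2})/ is one dot followed by .{2}, so it consumes three characters per match, while /.{2}/ consumes two.

```perl
use strict;
use warnings;

my $hex = '0012345689abcd';    # 14 characters

my @three = $hex =~ /(..{2})/g;    # . followed by .{2}: three characters
my @two   = $hex =~ /.{2}/g;       # exactly two characters

print "@three\n";
print "@two\n";
```

With the three-character pattern the final "cd" is left unmatched, so the result is both wrong and shorter, which also affects the timing comparison.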
  13. jis

    Guest

    On Thu, 11 Mar 2010 04:43:45 -0800 (PST), jis <> wrote:

    >On Mar 11, 11:15 am, "Uri Guttman" <> wrote:
    >> >>>>> "j" == jis  <> writes:

    >>

    >Uri,
    >
    >I have used the script you have posted with only change in input file
    >i get the following results.
    > (warning: too few iterations for a reliable count)
    > (warning: too few iterations for a reliable count)
    > (warning: too few iterations for a reliable count)
    > s/iter unpacking regex substring
    >unpacking 9.06 -- -27% -34%
    >regex 6.59 37% -- -9%
    >substring 6.01 51% 10% --
    >
    >Unpacking still remains the longest to finish.
    >
    >I use Windows XP professional with a 2Gb RAM. I also have got a 45GB
    >free space in my C drive.
    >
    >DO you see something else different?
    >
    >thanks,
    >jis


    You have Windows!
    Try the test below. It uses timethis() for $count iterations.
    You don't want a partial iteration result given a small time interval.

    After you run the code as written, run it again, plugging in your file
    information and changing $count to 3 iterations.
    Go for a coffee break. Post back.

    My results:

    Unpacking: 12.7929 wallclock secs ( 9.94 usr + 2.84 sys = 12.78 CPU) @ 0.08/s (n=1)
    Regex: 29.6103 wallclock secs (29.53 usr + 0.08 sys = 29.61 CPU) @ 0.03/s (n=1)
    Substring: 2.85185 wallclock secs ( 2.81 usr + 0.03 sys = 2.84 CPU) @ 0.35/s (n=1)

    -sln

    -----------------
    use strict;
    use warnings;

    use Benchmark qw(:all :hireswallclock) ;

    #---- Uncomment, plug in filename ---------
    # use File::Slurp ;
    # my $file_name = '/boot/vmlinuz-2.6.28-15-generic' ;
    # my $data = read_file( $file_name, binary => 1 ) ;
    # #$data = "\x00\x10" ;
    # my $hex = unpack 'H*', $data;
    #------------------------------------------

    my $count = 1; # increase count to 3 after first testing 1

    #---- Comment out $hex -------------------
    my $hex = 'a0b0c1d2e3f411aabbcc' x 200_000; # about 4 MB
    #-----------------------------------------

    timethis ($count, \&unpacking, "Unpacking");
    timethis ($count, \&regex, "Regex");
    timethis ($count, \&substring, "Substring");

    sub unpacking {
        my @arr = unpack( '(A2)*' , $hex) ;
    #   print "@arr\n"
    }

    sub regex {
        my @arr = $hex =~ /.{2}/g ; # regex modified
    #   print "@arr\n"
    }

    sub substring {
        my ($val, $offs, @arr) = ('',0);
        while ($val=substr( $hex, $offs, 2)) {
            push @arr, $val;
            $offs+=2;
        }
    #   print "@arr\n"
    }
    __END__
    , Mar 11, 2010
    #13
  14. jis

    Uri Guttman Guest

    >>>>> "j" == jis <> writes:

    j> On Mar 11, 11:15 am, "Uri Guttman" <> wrote:
    >> [snip]


    j> Uri,

    j> I have used the script you have posted with only change in input file
    j> i get the following results.
    j> (warning: too few iterations for a reliable count)
    j> (warning: too few iterations for a reliable count)
    j> (warning: too few iterations for a reliable count)
    j> s/iter unpacking regex substring
    j> unpacking 9.06 -- -27% -34%
    j> regex 6.59 37% -- -9%
    j> substring 6.01 51% 10% --

    j> Unpacking still remains the longest to finish.

    j> I use Windows XP professional with a 2Gb RAM. I also have got a 45GB
    j> free space in my C drive.

    j> DO you see something else different?

    i don't have 45GB files nor do i intend to do that. you are disk
    thrashing which is the cause of your slowdowns. you are not properly
    testing the perl code as your OS I/O is the limiting factor here. learn
    how to understand benchmarks better. your test is not legitimate in
    comparing the algorithms as the disk I/O dominates.

    try it with smaller files that will fit in your ram. not more than .5 gb
    given your system. and with files that large, i would do the conversion
    in large chunks in a loop to mitigate the i/o and then see which does
    better.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 11, 2010
    #14
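The "conversion in large chunks in a loop" suggestion above could look roughly like the following. This is a sketch under stated assumptions, not code from the thread: hex_pairs_chunked is a hypothetical helper name, the 64 KB default chunk size is arbitrary, and the demo bytes are made up. It still builds the full array in memory, and whether chunking helps on the Windows setup described in this thread is untested here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $path in fixed-size chunks and collect
# one two-digit hex string per byte.
sub hex_pairs_chunked {
    my ( $path, $chunk_size ) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default, tune as needed

    open my $fh, '<:raw', $path or die "Can't open $path: $!\n";

    my ( $buf, @pairs );
    # read() returns the number of bytes read, 0 at end of file.
    while ( read( $fh, $buf, $chunk_size ) ) {
        push @pairs, unpack '(H2)*', $buf;
    }
    close $fh;
    return \@pairs;
}

# Demo on a small temporary file (sample bytes are made up).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "\x00\x12\x34\x56\x89\xab\xcd";
close $tmp_fh;

# A tiny chunk size forces several reads for the demo.
my $pairs = hex_pairs_chunked( $tmp_name, 3 );
print "@$pairs\n";
```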
  15. jis

    Uri Guttman Guest

    >>>>> "JWK" == John W Krahn <> writes:

    JWK> Uri Guttman wrote:
    >>
    >> sub regex {
    >> my @arr = $hex =~ /(..{2})/g ;
    >> # print "@arr\n"
    >> }


    JWK> Shouldn't that be:

    JWK> my @arr = $hex =~ /../g ;

    JWK> Or:

    JWK> my @arr = $hex =~ /.{2}/g ;

    JWK> You are capturing *three* characters instead of two.

    true. i did my output test and must have optimized this without running
    the tests again. anyhow, this whole thing is moot. the OP never said he
    had a 25GB file on a 2gb system. slurping in the whole file and then
    processing it is disk bound and the 2 char algorithm is irrelevant. i am
    out of this thread. the OP doesn't seem to get the concept of
    benchmarking or optimizing. let him stick to his substr and stopwatch.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 11, 2010
    #15
  16. On 2010-03-11 18:30, Uri Guttman <> wrote:
    >>>>>> "JWK" == John W Krahn <> writes:

    > anyhow, this whole thing is moot. the OP never said he had a 25GB file
    > on a 2gb system.


    Right. He never said that. So where did you get that information?

    He said he had a 4 MB file and 45 GB of free space (the latter is rather
    irrelevant, of course).

    hp
    Peter J. Holzer, Mar 11, 2010
    #16
  17. jis

    Uri Guttman Guest

    >>>>> "PJH" == Peter J Holzer <> writes:

    PJH> On 2010-03-11 18:30, Uri Guttman <> wrote:
    >>>>>>> "JWK" == John W Krahn <> writes:

    >> anyhow, this whole thing is moot. the OP never said he had a 25GB file
    >> on a 2gb system.


    PJH> Right. He never said that. So where did you get that information?

    PJH> He said he had a 4 MB file and 45 GB of free space (the latter is rather
    PJH> irrelevant, of course).

    i misread the 45Gb free disk as the file size. he still never mentioned
    the file size. as i showed, the unpack is fastest with the data in
    ram. i still would want to know his setup (file size included) to see
    why his substr would be fastest. it has to be some very odd thing he is
    doing and not telling us. there is no way a substr loop could be faster
    than a single call to unpack.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Mar 12, 2010
    #17
  18. On 2010-03-12 02:47, Uri Guttman <> wrote:
    >>>>>> "PJH" == Peter J Holzer <> writes:

    > PJH> On 2010-03-11 18:30, Uri Guttman <> wrote:
    > >>>>>>> "JWK" == John W Krahn <> writes:
    > >> anyhow, this whole thing is moot. the OP never said he had a 25GB file
    > >> on a 2gb system.

    >
    > PJH> Right. He never said that. So where did you get that information?
    >
    > PJH> He said he had a 4 MB file and 45 GB of free space (the latter is rather
    > PJH> irrelevant, of course).
    >
    > i misread the 45Gb free disk as the file size. he still never mentioned
    > the file size. as i showed, the unpack is fastest with the data in
    > ram. i still would want to know his setup (file size included) to see
    > why his substr would be fastest. it has to be some very odd thing he is
    > doing and not telling us.


    The odd thing he is doing seems to be "using perl on Windows". Sln has
    repeatedly pointed out that growing strings or arrays on Windows is
    extremely slow (yes, sln sometimes makes strange claims, but he not only
    provided benchmark results but also a link to a perlmonks thread, so he
    isn't the only one who noticed this). I don't have access to a Windows
    machine where I could test this myself, though.

    hp
    Peter J. Holzer, Mar 15, 2010
    #18