Guys,
I have a string $hex which holds, let's assume, "0012345689abcd".
Unfortunately, when I read big files of 4 MB it takes
about 10 minutes before execution completes. No good.
(I couldn't split it like 00,12 but only like 0,0,1,2.)
Then I thought unpack would be a better idea.
@arr = unpack("H2",$data); or
@arr = unpack("H2*",$data);
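A note on those two templates, as a minimal sketch rather than a definitive answer: "H2" converts only one byte and returns a single pair, and "H2*" is not a valid template. With the grouping syntax (Perl 5.8 or later) the repeat count goes on a parenthesised group. The variable names @from_hex and @from_bin are made up for illustration, and I am assuming $hex is already hex text while $data is raw bytes.

```perl
use strict;
use warnings;

# $hex is already text made of hex digits: cut it into 2-character pieces.
my $hex      = '0012345689abcd';
my @from_hex = unpack '(a2)*', $hex;

# $data is raw binary (7 bytes here): render each byte as 2 hex digits.
my $data     = pack 'H*', $hex;
my @from_bin = unpack '(H2)*', $data;

# A bare "H2" stops after the first byte.
my @first_only = unpack 'H2', $data;

print "@from_hex\n";    # 00 12 34 56 89 ab cd
print "@from_bin\n";    # 00 12 34 56 89 ab cd
print "@first_only\n";  # 00
```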
Perl distributions for Win32 have a problem with
the native realloc(). On these, the larger the dynamic list
generated by the function, the longer it takes.
Linux doesn't have this problem.
In general, if you expect to be splitting up very
large data segments, it's better to control the list
outside the function, where push() is better.
Of the 3 basic methods (substr/unpack/regexp),
the fastest seems to be substr().
Additionally, on Win32 platforms, any method that builds the
list with push() is far faster once the data gets big: at
5.6 MB below, unpack/list took 40 s against 6.7 s for unpack/push.
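The substr/push approach can be wrapped in a small helper. This is only a sketch: the name split_pairs is hypothetical and is not part of the benchmark. It loops on the offset instead of testing the chunk for truth, so a lone trailing "0" in an odd-length string is not dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into 2-character chunks,
# growing the list with push() instead of returning one huge list
# from a single unpack or regexp call.
sub split_pairs {
    my ($str) = @_;
    my $len = length $str;
    my @pairs;
    for (my $offs = 0; $offs < $len; $offs += 2) {
        push @pairs, substr($str, $offs, 2);
    }
    return \@pairs;    # return a reference to avoid copying the list
}

my $pairs = split_pairs('0012345689abcd');
print join(',', @$pairs), "\n";    # prints 00,12,34,56,89,ab,cd
```

Returning a reference keeps the caller from copying a multi-million element list on return.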
The data below was generated on Windows.
If you have Linux, your results will be different.
Post your numbers if you can.
-sln
Output:
--------------------
Size of bigstring = 560
Substr/push took: 0.00030303 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.000344038 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.000586033 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.000608206 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.000404835 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
--------------------
Size of bigstring = 5600
Substr/push took: 0.002841 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/list took: 0.00334311 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Unpack/push took: 0.00657105 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/list took: 0.00673795 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Regexp/push took: 0.004076 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
--------------------
Size of bigstring = 56000
Substr/push took: 0.0301139 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
Unpack/list took: 0.0458951 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)
Unpack/push took: 0.0644789 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/list took: 0.07149 wallclock secs ( 0.06 usr + 0.00 sys = 0.06 CPU)
Regexp/push took: 0.03965 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU)
--------------------
Size of bigstring = 560000
Substr/push took: 0.309315 wallclock secs ( 0.30 usr + 0.02 sys = 0.31 CPU)
Unpack/list took: 0.723145 wallclock secs ( 0.61 usr + 0.11 sys = 0.72 CPU)
Unpack/push took: 0.640141 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU)
Regexp/list took: 0.927701 wallclock secs ( 0.92 usr + 0.00 sys = 0.92 CPU)
Regexp/push took: 0.516143 wallclock secs ( 0.52 usr + 0.00 sys = 0.52 CPU)
--------------------
Size of bigstring = 5600000
Substr/push took: 3.79988 wallclock secs ( 3.75 usr + 0.06 sys = 3.81 CPU)
Unpack/list took: 40.0264 wallclock secs (34.97 usr + 5.06 sys = 40.03 CPU)
Unpack/push took: 6.71793 wallclock secs ( 6.70 usr + 0.01 sys = 6.72 CPU)
Regexp/list took: 34.6208 wallclock secs (34.56 usr + 0.06 sys = 34.63 CPU)
Regexp/push took: 7.93654 wallclock secs ( 7.89 usr + 0.05 sys = 7.94 CPU)
=======
use strict;
use warnings;
use Benchmark ':hireswallclock';   # sub-second wallclock, as in the timings above

for my $multiplier (40, 400, 4_000, 40_000, 400_000)
{
    my $bigstring = '0012345689abcd' x $multiplier;
    print "\n", '-' x 20, "\nSize of bigstring = ", length($bigstring), "\n\n";

    ## substr: take one pair per call, push it onto the list
    {
        my ($val, $offs, @pairs) = ('', 0);
        my $t0 = Benchmark->new;
        while ($val = substr($bigstring, $offs, 2))
        {
            push @pairs, $val;
            $offs += 2;
        }
        my $t1 = Benchmark->new;
        print "Substr/push took: ", timestr(timediff($t1, $t0)), "\n";
    }

    ## unpack: one call returns the whole list
    {
        my $t0 = Benchmark->new;
        my @pairs = unpack '(a2)*', $bigstring;
        my $t1 = Benchmark->new;
        print "Unpack/list took: ", timestr(timediff($t1, $t0)), "\n";
    }

    ## unpack: skip $offs bytes, take one pair per call, push it
    {
        my ($val, $offs, @pairs) = ('', 0);
        my $t0 = Benchmark->new;
        while ($val = unpack("x$offs a2", $bigstring))
        {
            push @pairs, $val;
            $offs += 2;
        }
        my $t1 = Benchmark->new;
        print "Unpack/push took: ", timestr(timediff($t1, $t0)), "\n";
    }

    ## regexp: global match in list context returns the whole list
    {
        my $t0 = Benchmark->new;
        my @pairs = $bigstring =~ /[0-9a-f]{2}/g;
        my $t1 = Benchmark->new;
        print "Regexp/list took: ", timestr(timediff($t1, $t0)), "\n";
    }

    ## regexp: global match in scalar context, one pair per iteration, push it
    {
        my @pairs;
        my $t0 = Benchmark->new;
        while ($bigstring =~ /([0-9a-f]{2})/g) {
            push @pairs, $1;
        }
        my $t1 = Benchmark->new;
        print "Regexp/push took: ", timestr(timediff($t1, $t0)), "\n";
    }
}
__END__