substr taking time

Stu · Aug 23, 2006

I am reading through a file of 2,432 lines with a record length of
450 bytes. I want to get the first 8 bytes from each line and stick them
into an array. When I execute the following piece of code, it takes 7 to
8 seconds.

In comparison, when I run cut -c1-8 < data from MKS, it
takes one second on the same data.

Does anybody know how these lines of code can be optimized?

BTW, I am using ActivePerl on a Windows 2003 platform, but that should
not make a difference.

print scalar ( localtime() ) . "\n";

while ($NextLine = <INPUT>)
{
    @docidarray = (@docidarray, substr($NextLine, 0, 8));
}

print scalar ( localtime() ) . "\n";

Paul Lalli · Aug 23, 2006

Stu said:
print scalar ( localtime() ) . "\n";

while ($NextLine = <INPUT>)
{
@docidarray = (@docidarray, substr($NextLine, 0, 8));

Why are you recopying the entire array every time through the loop?

perldoc -f push

}

print scalar ( localtime() ) . "\n";

Paul Lalli
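
A minimal sketch of the push() approach pointed to above, for anyone following along. The in-memory sample data, the lexical handle $in, and the helper name first_fields are made up for illustration; the original code reads from a bareword INPUT handle opened elsewhere.

```perl
use strict;
use warnings;

# Collect the first $width characters of every line read from $fh.
# push appends in place, so the array is not recopied on each iteration.
sub first_fields {
    my ($fh, $width) = @_;
    my @ids;
    while (my $line = <$fh>) {
        push @ids, substr($line, 0, $width);
    }
    return @ids;
}

# Demo on an in-memory file (sample records are hypothetical).
my $data = "DOC00001 rest of record one\nDOC00002 rest of record two\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @docidarray = first_fields($in, 8);
close $in;

print "$_\n" for @docidarray;
```

The original assignment @docidarray = (@docidarray, ...) builds a new list from the whole array on every pass, so the work grows with the square of the line count; push only adds one element each time.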

Ted Zlatanov · Aug 23, 2006

While the advice to use push() is valid, you should realize that a
simple task like this does not require Perl, and you are incurring
quite a bit of overhead when you use Perl. You'll never be as fast as
`cut' with Perl at doing `cut's job (well, some examples could be
contrived).

Starting up Perl, in particular, takes a while (depending on the
machine, of course). The modules you are using may also slow you
down. So the 7-8 seconds may not be processing time alone.

Then, of course, each statement is run by the Perl interpreter, unlike
a program like `cut', which is written in C and optimized to do just one task.

So you have to decide - if the task requires just `cut', use just
that. You can produce the localtime() output with the `date'
command. If, however, this is part of a bigger program, you may have
to live with the slight performance hit.

Ted
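
To see how much of the 7 to 8 seconds is the loop itself, one option is to time only the loop with sub-second resolution, since localtime() only shows whole seconds. A rough sketch, assuming the core Time::HiRes module and made-up in-memory data shaped like the file in the question (2,432 records of 450 bytes):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build hypothetical data shaped like the file in the question:
# 2,432 records of 450 bytes each (8-digit id, 441 filler bytes, newline).
my $data = join '', map { sprintf("%08d", $_) . ('x' x 441) . "\n" } 1 .. 2432;

open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my $start = time;    # fractional seconds since the epoch
my @docidarray;
while (my $line = <$in>) {
    push @docidarray, substr($line, 0, 8);
}
my $elapsed = time - $start;
close $in;

printf "%d records, loop took %.4f seconds\n", scalar @docidarray, $elapsed;
```

Because the timer starts after the interpreter is already running, the figure printed excludes startup and module loading, which makes it easier to tell the two costs apart.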

alpha_beta_release · Aug 24, 2006

hi,

A Perl script takes longer than a compiled program.

As for how fast substr() is, I recently tried testing it. I found
it to be OK, and quite fast.

I ran the following test to compare substr() with a less primitive
technique, also in Perl. My objective was to find which technique is
faster. (You can extend the string if you want.)

----------------------------
use Benchmark ':all';

my $str1 = "1234567890123456789012345678901234567890";
my $str2 = "1234567890123456789012345678901234567890";
my @a = split '', $str1;
my @b = split '', $str2;

sub code1
{
    my $dumb;
    for(0..length $str1) {
        $dumb = 1 if(substr($str1, $_, 1) eq substr($str2, $_, 1));
    }
}
sub code2
{
    my $dumb;
    for(0..$#a) {
        $dumb = 1 if($a[$_] eq $b[$_]);
    }
}

timethese(100, {
    style1 => 'code1',
    style2 => 'code2'
});

John W. Krahn · Aug 24, 2006

[ Please do not top-post. TIA ]

Maybe because you are reading past the end of the strings:

for(0..length $str1) {

Should be:

for(0..length($str1) - 1) {

John
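
With the loop bound corrected, the comparison could be rerun along these lines. This is only a sketch: the sub names by_substr and by_array, the match counter, the iteration count of 50,000, and the use of code references with cmpthese are choices made here for illustration, not part of the original test.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $str1 = "1234567890" x 4;
my $str2 = "1234567890" x 4;
my @a = split //, $str1;
my @b = split //, $str2;

# Compare the strings character by character using substr().
# Returns the number of matching positions.
sub by_substr {
    my $matches = 0;
    for my $i (0 .. length($str1) - 1) {    # stop at the last valid index
        $matches++ if substr($str1, $i, 1) eq substr($str2, $i, 1);
    }
    return $matches;
}

# Same comparison over the pre-split arrays.
sub by_array {
    my $matches = 0;
    for my $i (0 .. $#a) {
        $matches++ if $a[$i] eq $b[$i];
    }
    return $matches;
}

# Code references are passed instead of strings, so nothing is eval'ed.
my $results = timethese(50_000, {
    substr_loop => \&by_substr,
    array_loop  => \&by_array,
});
cmpthese($results);
```

Both loops now visit the same 40 positions, so the timings compare like with like. The array version does not include the cost of the two split calls, which happen once before the benchmark starts.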
