Stefan Fischerländer
I've got the following situation:
My data is stored in a file; 5 bytes form one record. The bytes 0 to 3
are a long integer, which is regarded as the key, whereas byte 4 is the
value. I have to read this data structure into a hash.
What I'm doing at the moment is:
open(IN,"<file");
while(read(IN, $buf, 5))
{
$hash{unpack("L", substr($buf,0,4))} = substr($buf,4,1);
}
close(IN);
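For reference, a single record in this layout can be built and decoded in memory like this (a sketch; the key 123456 and value byte 7 are made-up sample data, and "L" is a 32-bit unsigned integer in Perl's pack templates):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One 5-byte record: 4-byte unsigned long key, then one value byte.
my $record = pack("L", 123456) . chr(7);
die "record is not 5 bytes" unless length($record) == 5;

# Decode it the same way the reading loop above does.
my $key   = unpack("L", substr($record, 0, 4));
my $value = substr($record, 4, 1);
print "$key => ", ord($value), "\n";   # 123456 => 7
```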
This takes about 5 seconds to read a file with about 100,000 records,
which is too long for my needs. (Celeron 800, IDE 7200 rpm)
I figured out that the bottleneck isn't the I/O, because this version
takes as long as the script above:
open(IN,"<file");
$bread = read(IN, $buf, 1000000) / 5;
close(IN);
for($i=0;$i<$bread;$i++)
{
$hash{unpack("L", substr($buf,$i*5,4))} = substr($buf,$i*5+4,1);
}
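If the per-record unpack and substr calls are the real cost, one possible sketch (assuming Perl 5.8+, which supports group templates in unpack; the sample buffer here is made-up test data) decodes the whole buffer with a single unpack call:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a sample buffer of 5-byte records (4-byte "L" key + 1 value byte).
my $buf = '';
$buf .= pack("L", $_) . chr($_ * 10) for 1 .. 3;

# "(L a)*" repeats the key/value template over the whole buffer and
# returns a flat key, value, key, value, ... list, which assigns
# straight into the hash -- one unpack call instead of one per record.
my %hash = unpack("(L a)*", $buf);

print "$_ => ", ord($hash{$_}), "\n" for sort { $a <=> $b } keys %hash;
```

This avoids the Perl-level loop over substr offsets entirely; whether it is faster in practice would need benchmarking on the real data.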
Does anyone have any ideas on how to build this hash faster? It would
also be possible to change the format of the data file. I have already
experimented with store and retrieve from the Storable module and with
a persistent hash via DB_File. In both cases the data file grew, and
the script was even slower.
Stefan