Chris
I've come across the Perl issue of inefficient memory use when
dealing with large datasets. What are people's opinions on the best way
to work around this problem?
e.g.
My input file has this layout:
# Input 1_8:
0.28496 0.10340 0.33403 0.86176 0.06723 0.15316 0.46009 0.09535 ...
# Output 1_8:
0 0 1
# Input 1_9:
0.38225 0.98944 0.03805 0.04031 0.05417 0.19623 0.07656 0.07944 ...
# Output 1_9:
0 0 1
# Input 1_10:
0.11106 0.02792 0.69635 0.37519 0.01326 0.95435 0.15976 0.01406 ...
# Output 1_10:
0 0 1
There are ~73,000 input/output pairs, and the file is ~260MB in size.
However, reading the file into an array with the following code
snippet results in ~1.2GB of memory usage:
#!/usr/bin/perl
use strict;
use warnings;

my ($patfile) = @ARGV;
open(my $FH, '<', $patfile) or die "Can't open $patfile: $!";

my @array;
my $flag = 0;
my $i    = 0;

while (<$FH>) {
    $flag = 0 if /^# Output/;           # stop capturing at an output header
    $flag = 1 and next if /^# Input/;   # start capturing after an input header
    if ($flag) {
        chomp;
        print "$i\n";
        $array[$i] = [ split ];         # one anonymous array per input row
        ++$i;
    }
}
exit;
I've read about the various work-arounds that access the array via a file
on disk, but they don't seem very conducive to working with
complex data structures. Can you guys/gals let me know your
favourite method for working more efficiently? At the moment I'm just
reading/writing the files a bit at a time.
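For what it's worth, one work-around I've read about is packing each row into a single string of doubles with pack, so each value costs 8 bytes instead of a whole Perl scalar (which carries tens of bytes of overhead). A rough sketch of the idea (the sample line is just illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each row becomes ONE scalar (a packed string of 8-byte doubles)
# instead of an anonymous array holding one Perl scalar per value.
my @rows;
my $line = "0.28496 0.10340 0.33403";    # stand-in for a real input row
push @rows, pack "d*", split ' ', $line;

# Unpack a row back into numbers only when it is actually needed:
my @values = unpack "d*", $rows[0];
printf "%d values, first = %.5f\n", scalar @values, $values[0];
```

No idea how it behaves at the 73,000-row scale, though, and you lose the convenience of indexing individual values without unpacking first.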
TIA