Huge memory load when reading a file into memory


rahulthathoo

Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;


Thanks for any help.

Rahul
 

xhoster

rahulthathoo said:
Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;

You are storing one array (with an unknown number of scalars) for every
line. Arrays have overhead, and so do scalars (though not as much as
arrays). For small arrays or small scalars, the overhead can easily be
greater than the actual user data.
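
As a rough illustration (assuming the CPAN module Devel::Size is installed; the sample line below is made up), one entry of such a structure can be compared against the raw text it came from:

use strict;
use warnings;
use Devel::Size qw(total_size);   # CPAN module, not in core

# Made-up line in the same key:value-list shape as the original data.
my $line = "user42:101 2002 33 4444 55\n";

# Build one hash-of-arrays entry the same way the original loop does.
my ($key, $rest) = split /:/, $line;
my @movies       = split /\s/, $rest;
my %mapHash      = ( $key => [@movies] );

printf "raw text:       %d bytes\n", length $line;
printf "perl structure: %d bytes\n", total_size(\%mapHash);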

Xho
 

Mumia W. (reading news)

Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:
[...]

There is documentation on your system that discusses keeping memory usage
low in Perl. From a shell prompt, run:

perldoc -q memory
 

Ted Zlatanov

I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;

Approach 1:

Make the inner loop

@tokens = split(/:/, $_, 2);
$mapHash{$tokens[0]} = $tokens[1];

and then parse out the tokens later as needed.
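
For concreteness, a strict-safe version of that idea might look like the sketch below; the file name and the lookup helper are made up, and the point is only that each line is kept as a single scalar and split at lookup time:

use strict;
use warnings;

my $map_file = 'Usermap.txt';   # stand-in for MAP_FILE
open my $um, '<', $map_file or die "Can't open $map_file: $!";

my %mapHash;
while (my $line = <$um>) {
    chomp $line;
    my ($key, $rest) = split /:/, $line, 2;
    $mapHash{$key} = $rest;     # one scalar per line, no per-line array
}
close $um;

# Split a value only when it is actually needed.
sub movies_for {
    my ($key) = @_;
    return () unless defined $mapHash{$key};
    return split ' ', $mapHash{$key};
}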

Approach 2:

Use a database (DBM, SQLite, an RDBMS like MySQL, etc.) to hold your data.
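
A minimal sketch of the DBM variant, assuming the DB_File module (with an underlying Berkeley DB) is available; any other DBM module, or DBI with DBD::SQLite, would work along the same lines. The file names here are stand-ins:

use strict;
use warnings;
use Fcntl;        # for O_RDWR and O_CREAT
use DB_File;      # assumes Berkeley DB is installed

# Values end up in the usermap.db file on disk rather than in RAM.
tie my %mapHash, 'DB_File', 'usermap.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Can't tie usermap.db: $!";

my $map_file = 'Usermap.txt';   # stand-in for MAP_FILE
open my $um, '<', $map_file or die "Can't open $map_file: $!";
while (my $line = <$um>) {
    chomp $line;
    my ($key, $rest) = split /:/, $line, 2;
    $mapHash{$key} = $rest;
}
close $um;
untie %mapHash;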

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

Ted
 

David Squire

Ted said:
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;
[snip]

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above),
something of the order of 60 bytes per entry, independent of the size of
the key or value. If the keys and values are only a few bytes each, then
a 10x overhead is not implausible.

Would someone with real knowledge of the relevant internals like to comment?


DS
 

xhoster

David Squire said:
Ted said:
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;
[snip]

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above)

There is only one hash used, so its overhead would be minimal (other than
the key overhead, but that is generally less than the overhead of an
equivalent scalar string). There are, however, a lot of arrays, which have
overhead, and a lot of scalars, which do too: a "real" undefined scalar
takes ~20 bytes, and one holding the empty string takes over 40 bytes, on
one of my machines.
... something of the order of 60 bytes per entry, independent of the
size of the key or value. If the keys and values are only a few bytes
each, then a 10x overhead is not implausible.

Would someone with real knowledge of the relevant internals like to
comment?

Modern Perls share their keys among all hashes, and last time I
researched it, the first use of a key in any hash used the length of the
key plus ~20 bytes. Further use of the same key in other hashes used just
12 bytes per key. The value is just a scalar and has the storage
requirements of a scalar. This was on a 32-bit machine; presumably 64-bit
machines have substantially higher overhead.

But storage is highly context dependent, and is hysteretic.

If you are storing numbers, taking care to not store the stringy version
of them can cut storage by more than half.

my @x = split; #stores as strings
my @x = map $_+0, split; #stores as ints/floats
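
If the CPAN module Devel::Size is installed, the difference is easy to measure; the data below is made up:

use strict;
use warnings;
use Devel::Size qw(total_size);

my $data = join ' ', 1 .. 1000;

my @as_strings = split ' ', $data;                 # scalars holding strings
my @as_numbers = map { $_ + 0 } split ' ', $data;  # scalars holding numbers

printf "as strings: %d bytes\n", total_size(\@as_strings);
printf "as numbers: %d bytes\n", total_size(\@as_numbers);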

Xho
 

Ted Zlatanov

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above)
... something of the order of 60 bytes per entry, independent of the
size of the key or value. If the keys and values are only a few bytes
each, then a 10x overhead is not implausible.

OK, so my tip of keeping the data in a string as long as possible
should help. I don't know the exact data the OP used so it's hard to
tell how that will affect performance.

I don't know the relevant internals, but 10x overhead on any kind of
in-memory storage is clearly excessive; it points to either bad
data-structure design (the case here) or bad interpreter/VM design. I know
the Perl interpreter is not all that bad with memory usage, so I went with
the former assumption. But it's not impossible that the OP simply had
multiple copies of the data around inadvertently.

Ted
 
