Huge memory load when reading a file into memory


rahulthathoo

Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;


Thanks for any help.

Rahul
 

xhoster

rahulthathoo said:
Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;

You are storing one array (with an unknown number of scalars) for every
line. Arrays have overhead, and so do scalars (though not as much as
arrays). For small arrays or small scalars, the overhead can easily be
greater than the actual user data.
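
As a rough illustration (assuming the CPAN module Devel::Size is installed; the sample line below is made up), one entry of such a structure can be compared against the raw text it came from:

use strict;
use warnings;
use Devel::Size qw(total_size);   # CPAN module, not in core

# Made-up line in the same key:value-list shape as the original data.
my $line = "user42:101 2002 33 4444 55\n";

# Build one hash-of-arrays entry the same way the original loop does.
my ($key, $rest) = split /:/, $line;
my @movies       = split /\s/, $rest;
my %mapHash      = ( $key => [@movies] );

printf "raw text:       %d bytes\n", length $line;
printf "perl structure: %d bytes\n", total_size(\%mapHash);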

Xho
 

Mumia W. (reading news)

Hi,
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:
[...]

There is documentation on your system that discusses keeping memory usage
low in Perl. From a shell prompt, run:

perldoc -q memory
 

Ted Zlatanov

I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;

Approach 1:

Make the inner loop

@tokens = split(/:/, $_, 2);
$mapHash{$tokens[0]} = $tokens[1];

and then parse out the tokens later as needed.
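
For concreteness, a strict-safe version of that idea might look like the sketch below; the file name and the lookup helper are made up, and the point is only that each line is kept as a single scalar and split at lookup time:

use strict;
use warnings;

my $map_file = 'Usermap.txt';   # stand-in for MAP_FILE
open my $um, '<', $map_file or die "Can't open $map_file: $!";

my %mapHash;
while (my $line = <$um>) {
    chomp $line;
    my ($key, $rest) = split /:/, $line, 2;
    $mapHash{$key} = $rest;     # one scalar per line, no per-line array
}
close $um;

# Split a value only when it is actually needed.
sub movies_for {
    my ($key) = @_;
    return () unless defined $mapHash{$key};
    return split ' ', $mapHash{$key};
}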

Approach 2:

Use a database (DBM, SQLite, an RDBMS like MySQL, etc.) to hold your data.
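
A minimal sketch of the DBM variant, assuming the DB_File module (with an underlying Berkeley DB) is available; any other DBM module, or DBI with DBD::SQLite, would work along the same lines. The file names here are stand-ins:

use strict;
use warnings;
use Fcntl;        # for O_RDWR and O_CREAT
use DB_File;      # assumes Berkeley DB is installed

# Values end up in the usermap.db file on disk rather than in RAM.
tie my %mapHash, 'DB_File', 'usermap.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Can't tie usermap.db: $!";

my $map_file = 'Usermap.txt';   # stand-in for MAP_FILE
open my $um, '<', $map_file or die "Can't open $map_file: $!";
while (my $line = <$um>) {
    chomp $line;
    my ($key, $rest) = split /:/, $line, 2;
    $mapHash{$key} = $rest;
}
close $um;
untie %mapHash;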

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

Ted
 

David Squire

Ted said:
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;
[snip]

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above),
something of the order of 60 bytes per entry, independent of the size of
the key or value. If the keys and values are only a few bytes each, then
a 10x overhead is not implausible.

Would someone with real knowledge of the relevant internals like to comment?


DS
 

xhoster

David Squire said:
Ted said:
I am trying to load a 600 MB file into memory with the code below, but when I run top on the system I see that over the duration of the program run the memory usage reaches 7.5 GB (top reports VIRT: 7449m and Res: 7.289m). I do have 8 GB of RAM at my disposal, but why is this happening for a file that is only 600 MB? Here is the code:

open UM, MAP_FILE or die "Can't open Usermap.\n";
my %mapHash;
while (<UM>) {
    @tokens = split(/:/, $_);
    $zHashKey = $tokens[0];
    @zMovArr = split(/\s/, $tokens[1]);
    $mapHash{$zHashKey} = [@zMovArr];
}
close UM;
[snip]

Is what you showed really all the code? I doubt there's a 10x
overhead from 600 MB of text data in any case. Look for copies of the
hash, or other data that's using the memory.

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above)

There is only one hash used, so its overhead would be minimal (other than
the key overhead, but that is generally less than the overhead of an
equivalent scalar string). There are, however, a lot of arrays, which have
overhead, and a lot of scalars, which do too: a "real" undefined scalar
takes ~20 bytes, and one holding the empty string takes over 40 bytes, on
one of my machines.
... something of the order of 60 bytes per entry, independent of the
size of the key or value. If the keys and values are only a few bytes
each, then a 10x overhead is not implausible.

Would someone with real knowledge of the relevant internals like to
comment?

Modern Perls share their keys among all hashes, and last time I
researched it, the first use of a key in any hash used the length of the
key plus ~20 bytes. Further use of the same key in other hashes used just
12 bytes per key. The value is just a scalar and has the storage
requirements of a scalar. This was on a 32-bit machine; presumably 64-bit
machines have substantially higher overhead.

But storage is highly context dependent, and is hysteretic.

If you are storing numbers, taking care to not store the stringy version
of them can cut storage by more than half.

my @x = split; #stores as strings
my @x = map $_+0, split; #stores as ints/floats
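
If the CPAN module Devel::Size is installed, the difference is easy to measure; the data below is made up:

use strict;
use warnings;
use Devel::Size qw(total_size);

my $data = join ' ', 1 .. 1000;

my @as_strings = split ' ', $data;                 # scalars holding strings
my @as_numbers = map { $_ + 0 } split ' ', $data;  # scalars holding numbers

printf "as strings: %d bytes\n", total_size(\@as_strings);
printf "as numbers: %d bytes\n", total_size(\@as_numbers);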

Xho
 

Ted Zlatanov

I seem to recall reading here in the last couple of months that there
*is* indeed an overhead that could do that for hashes (as used above)
... something of the order of 60 bytes per entry, independent of the
size of the key or value. If the keys and values are only a few bytes
each, then a 10x overhead is not implausible.

OK, so my tip of keeping the data in a string as long as possible
should help. I don't know the exact data the OP used so it's hard to
tell how that will affect performance.

I don't know the relevant internals, but 10x overhead on any kind of
in-memory storage is clearly excessive; it points to either bad
data-structure design (the case here) or bad interpreter/VM design. I know
the Perl interpreter is not all that bad with memory usage, so I went with
the former assumption. But it's not impossible that the OP simply had
multiple copies of the data around inadvertently.

Ted
 
