Perl hash of hash efficiency.


tak

Hello,

Here is the partial code that reads the data from a txt file.

open(mainfile) or die("Could not open master file '$mainfile'.");
foreach my $line (<mainfile>) {
    $i++;
    chomp($line);
    my @values = split(/\|/, $line);

    $Master_Hash{$values[3]} = \@values;

    if ($i % 10000 == 0) {
        #print("loaded $i lines in hash so far - last entry was: $values[3] \n");
        my $size = keys(%Master_Hash);
        my $scalarSize = scalar %Master_Hash;
        print "Loaded $i entries - #ofKeys: $size - ScalarSize: $scalarSize\n";
    }
}

And each line is about 420 characters.

$ wc -l -L 07*file
238348 449 07302006file

After it finishes loading these 240k lines into the hash, the Windows XP
Task Manager reports 1.91 GB of memory usage.

-Tak
 

tak

Oh, I should add that each of the 240k lines is about 420 characters,
and becomes 97 elements in the array (after the split) that is stored
in the hash.
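One way to shrink a hash like this (a sketch, not advice given verbatim in the thread): storing a 97-element array reference per record costs per-scalar overhead on every field, whereas storing the raw line as a single string and splitting on demand keeps one scalar per record. The data and key position below are made up to match the description (pipe-delimited, field 3 as key); the scalar ref stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the real pipe-delimited file.
my $data = "a|b|c|key1|e\nf|g|h|key2|j\n";
open my $fh, '<', \$data or die "open: $!";

my %master;
while (my $line = <$fh>) {
    chomp $line;
    # split only far enough to reach the key field (field 3)
    my @head = split /\|/, $line, 5;
    $master{$head[3]} = $line;    # one string per record, not 97 scalars
}
close $fh;

# Split on demand when a record is actually used:
my @fields = split /\|/, $master{key1};
print "first field: $fields[0]\n";
```

The trade-off is CPU (re-splitting on each access) for memory, which fits a load-once, look-up-rarely workload like the one described.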
 

John W. Krahn

tak said:
Here is the partial code that reads the data from a txt file.

open(mainfile) or die("Could not open master file '$mainfile'.");
foreach my $line (<mainfile>) {
    $i++;
    chomp($line);
    my @values = split(/\|/, $line);

    $Master_Hash{$values[3]} = \@values;

    if ($i % 10000 == 0) {
        #print("loaded $i lines in hash so far - last entry was: $values[3] \n");
        my $size = keys(%Master_Hash);
        my $scalarSize = scalar %Master_Hash;
        print "Loaded $i entries - #ofKeys: $size - ScalarSize: $scalarSize\n";
    }
}

And each line is about 420 characters.

$ wc -l -L 07*file
238348 449 07302006file

After it finishes loading these 240k lines into the hash - the xp task
manager reports 1.91 GB of usage.

foreach operates on lists so the contents of $mainfile are first loaded into a
list in memory before the contents are processed. Use a while loop instead so
that only one line is loaded into memory at a time. So change:

foreach my $line (<mainfile>) {

To:

while ( my $line = <mainfile> ) {
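Putting that change into a minimal self-contained sketch (modernized with a lexical filehandle and three-arg open, which the thread's original code did not use; the in-memory filehandle stands in for the real data file):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data standing in for the real pipe-delimited file.
my $data = "a|b|c|k1|e\nf|g|h|k2|j\n";
open my $fh, '<', \$data or die "Could not open master file: $!";

my %Master_Hash;
my $i = 0;

# while() reads ONE line per iteration, so only the current line
# (plus the hash being built) is held in memory at any time.
while (my $line = <$fh>) {
    $i++;
    chomp $line;
    my @values = split /\|/, $line;
    $Master_Hash{$values[3]} = \@values;

    if ($i % 10000 == 0) {
        my $size = keys %Master_Hash;
        print "Loaded $i entries - #ofKeys: $size\n";
    }
}
close $fh;

print "total: $i lines, ", scalar(keys %Master_Hash), " keys\n";
```

The key point is the loop condition: `<$fh>` in scalar context returns one line per call, while the same operator inside `foreach (...)` is evaluated in list context and returns every line at once.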



John
 

Ben Morrow

Quoth "A. Sinan Unur":

open(mainfile) is valid (Perl 4 style) code, which opens *mainfile{IO} on
the file named in $mainfile (or rather $::mainfile, since it always uses
the global).
Haven't you read any of the responses??? I pointed this out as a distinct
possibility. You are slurping the entire file, just to process it line by
line.

Given how common this mistake is (for vs. while for reading a file), and
given that for is probably the more obvious idiom for iterating over all
the lines in a file, would there be any point suggesting this to p5p as an
optimization along the lines of for (1..100_000)? I can't see a good
reason why for (<$FH>) {} actually *has* to build the list of lines
first, or indeed when it would be helpful for it to do so. I suppose some
obscure situation involving seeking and overwriting inside the loop?

Ben
 

tak

A. Sinan Unur said:
Please read the posting guidelines for this group, especially the sections
on quoting and posting code (along with data).

use strict;
use warnings;

are missing.

You are right, I did not read the posting guidelines on posting and
quoting code. This is the first time I am posting something on Google
Groups..



What? mainfile is the path to the file.... I thought it made
sense...

Haven't you read any of the responses??? I pointed this out as a distinct
possibility. You are slurping the entire file, just to process it line by
line.


I did read all the responses. I do not know what you mean by
"slurping"; sorry, I am not a native English speaker. Slurping is not a
technical term that you can learn in a Perl book, nor in any of the tech
books that you can find in Barnes and Noble.

Given this snippet, I presume the rest of your code is just as silly, and
it is no shock that you run out of memory.

Please read the posting guidelines. Especially the section about posting
code.

You have just wasted our time trying to guess what your problem was. I
won't be seeing you.

Sinan

The problem was resolved before you responded. Have you not read any of
the responses that I made to several posters before you? At first I
thought it was a collision issue; later on, I figured out it was a memory
issue.

And if you read my post - I mentioned that I am NOT a Perl programmer;
this is the FIRST time I am doing things in Perl. In Java, there is no
foreach; there is a for, there is a while, and a do loop. But there is
no FOREACH. And looking at the useful comments made by the next
poster - he mentioned that I should not be using foreach, and should use
while instead. You could've said the same thing - but you are not in a
position to say I wasted your time trying to guess what my issue was,
b/c the issue was resolved before you posted, and your distinctive
guess was not in technical terms that I could understand. I posted
the code for you to review b/c I saw that even though the issue was
resolved, there were still posters interested in it, posting more
replies. And that was the reason why I posted it. Furthermore, that
was the original code that sucks up 1.9 GB; I did get more advice on how
to make it smaller.

And as for you not seeing me again - OK. If that makes you feel
better.
 

tak

i see. thx


tak said:
Here is the partial code that reads the data from a txt file.
[snip]

foreach operates on lists so the contents of $mainfile are first loaded into a
list in memory before the contents are processed. Use a while loop instead so
that only one line is loaded into memory at a time. So change:

foreach my $line (<mainfile>) {

To:

while ( my $line = <mainfile> ) {

John
 

David Squire

tak wrote:

[snip]
You are right, I did not read the posting guidelines on posting and
quoting code. This is the first time I am posting something on Google
Groups..

This is not "Google Groups". This is a Usenet group. Usenet has been
around since before HTML and the WWW, and long before Google. All Google
is doing here is providing a gateway (for you; I, like many others
here, am using a real newsreader).


DS
 

tak

How many responses did it take to diagnose the issue? How many responses
would it have taken had you posted code in the first place? The difference
corresponds to wasted time because you did not take the time to post enough
information.

Mister, look at the replies list. When you replied for the first time,
the issue was already resolved. That was what I was trying to explain
to you. And the code that I posted AFTER you replied was the original
code that was sucking up memory. I posted it b/c I saw that there were
people who posted questions about the code. And it was already
revised after other people (Ben, Paul, Xhos...) posted replies to my
original question. So, even though you replied - and diagnosed the issue,
and even though it is correct - I already had the answer before you
posted. So, maybe yes, I did not post code, and wasted Paul's, Ben's, and
Xhos's time, but how did I waste YOUR time??!?!?!?!?!?!? You posted one
message - and I posted the code that sucks up memory for you to look at -
I thought you might be interested to see what was wrong with it... and
then you said I am wasting your time... I could've just NOT posted any
code...
Well, I had forgotten to add you to my killfile.

Sinan

OK, then please remember to add me to your killfile. If possible,
please do it right after you view this message. B/c it hurts...
 

Ian Wilson

tak said:
Mister, look at the replies list. when you replied for the first time -
the issue was already resolved. That was what i was trying to explain
to you. And the code that I posted AFTER you replied - was the original

<snip>

Tak,

If you are not already familiar with it, I think you will find it
helpful to read this:
http://www.catb.org/~esr/faqs/smart-questions.html
Especially
http://www.catb.org/~esr/faqs/smart-questions.html#keepcool

I think it pays to read the posting guidelines for comp.lang.perl.misc
that others have pointed out earlier in this discussion.
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

When someone posts a clear example and quickly gets a clear solution
then I might also benefit by learning something about the subject. When
the newsgroup is filled with vagueness, misunderstandings and
recriminations, we all lose.
 
