Quoth "tak":
the available physical memory are down to about 10 MB when loading...
But the CPU usage remains about 5% only...
So, you are thrashing. You've run out of memory: I would suggest using
one of the DBM modules, probably DB_File. This stores the contents of
the hash in a (structured, binary, fast-to-index) file on disk, which
will probably make things faster.
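A minimal sketch of what that might look like, assuming DB_File is installed on your perl. The file name, the sample records and the "letter/key" composite key are all made up for illustration. A DBM file only stores plain strings, so a nested hash cannot be stored as-is; flattening the two levels into one key is one way round that.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;                    # exports $DB_HASH
use File::Temp qw(tempdir);

# Hypothetical location for the on-disk hash; removed again at exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records.db";

# %records now lives in the file rather than in memory.
tie my %records, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $file: $!";

# Made-up sample records, keyed the same way as the hash of hashes.
for my $value ('A000-KEY000001-rest', 'B000-KEY000002-rest') {
    my $letter = substr $value, 0, 1;
    my $myKey  = substr $value, 5, 9;
    $records{"$letter/$myKey"} = $value;    # composite key, flat value
}

my $stored = $records{'A/KEY000001'};
print "stored: $stored\n";

untie %records;
```

Whether this ends up faster than the in-memory version depends on the access pattern, so it is worth timing on a sample of the real data.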
I have 1 main_hash, which stores 27 hashes in it. And out of each 27
hashes, it averages about 9k unique strings. print scalar %hash
reports, 23/32. What does this number mean?
From perldoc perldata:
| If you evaluate a hash in scalar context, it returns false if the hash
| is empty. If there are any key/value pairs, it returns true; more
| precisely, the value returned is a string consisting of the number of
| used buckets and the number of allocated buckets, separated by a slash.
| This is pretty much useful only to find out whether Perl's internal
| hashing algorithm is performing poorly on your data set. For example,
| you stick 10,000 things in a hash, but evaluating %HASH in scalar
| context reveals "1/16", which means only one out of sixteen buckets has
| been touched, and presumably contains all 10,000 of your items. This
| isn't supposed to happen.
(Note that this is not meant as a rebuke: no one can be expected to have
all the arcana in Perl's std docs memorized. It is meant so that you may
remember where to find it next time.)
So, your main hash is using 23 buckets to store your 27 subhashes... not
such a useful thing to know. The real question is, how many buckets
does your original hash (with all the data in it) use? For instance, on
my perl

    my %h;
    for (1..240_000) {
        $h{$_} = 1;
    }
    print scalar %h;

prints '157199/262144', so the hash is using 157199 buckets, and each
bucket has on average 240000/157199 ~ 1.5 entries in it, which should
not be a problem.
Can you elaborate on what you mean by a swapping problem?
Your system has started thrashing: the working set (the pages in current
use) has exceeded the size of physical memory, and the system is
spending all its time swapping things in and out.
And I thought
about assigning higher number of bucket to the hash itself , but i
cannot find the related function to set that... I am a Java programmer,
and this is my first perl script.. I tried looking into the constructor
for the hash itself, but it doesnt seem like it accepts argument...?
The next para after my previous quote:
| You can preallocate space for a hash by assigning to the keys()
| function. This rounds up the allocated buckets to the next power of two:
|
| keys(%users) = 1000; # allocate 1024 buckets
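To see the effect of the preallocation, something like the following should do. It is a sketch only: it assumes perl 5.26 or later for Hash::Util::bucket_ratio, and it adds one dummy entry first because I would not rely on an empty hash reporting a ratio. Note that preallocating buckets saves some rehashing while the hash grows; it does not make the data any smaller, so it will not cure thrashing.

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);    # assumes perl 5.26 or later

my %users;
keys(%users) = 1000;    # request at least 1000 buckets up front

$users{tak} = 1;        # dummy entry so there is something to report

my (undef, $allocated) = split m{/}, bucket_ratio(%users);
print "allocated buckets: $allocated\n";
```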
Last question,
How Do you delete an element within a hoh? Say i have a hash of hash,
like the following.
my %hoh();
Did you even try this? Perl Is Not Java: this is a syntax error. You
don't need the parens.
loop() { # say this is the loop of each line of my txtFile
What is this 'loop()'? Have you been reading about Perl6? Or did you
mean
sub loop {
?
my $value = "TheRecordFromMyTxtFile";
(You really want to sort out your indentation. Makes life easier for
both you and us.)
my $letter = substr $value, 0, 1; # say, i am using the first letter
as the key for subhash.
my $myKey = substr $value, 5, 9; # Say position 5 - 9 is the key for
the element.
$hoh{$letter}{$myKey} = $value
}
Now, I want to delete a particular value from one of the subhash...
I tried doing this,
delete $hoh{$letter}{$value};
That's correct (assuming $value corresponds to $myKey in the above, not
to $value there: that is, you delete an element by specifying its key).
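A small self-contained sketch of that, using made-up records and the same substr calls as above. Note that 'substr $value, 5, 9' takes 9 characters starting at offset 5, which may or may not be what was intended by "position 5 - 9".

```perl
use strict;
use warnings;

my %hoh;    # no parens needed

# Made-up records standing in for the lines of the text file.
for my $value ('A000-KEY000001-rest', 'A000-KEY000002-rest',
               'B000-KEY000003-rest') {
    my $letter = substr $value, 0, 1;    # first character picks the subhash
    my $myKey  = substr $value, 5, 9;    # 9 characters from offset 5
    $hoh{$letter}{$myKey} = $value;
}

my $before  = keys %{ $hoh{A} };               # number of keys in subhash A
my $removed = delete $hoh{A}{KEY000001};       # delete by key, not by value
my $after   = keys %{ $hoh{A} };

print "before: $before, after: $after, removed: $removed\n";
```

delete returns the value it removed (or undef if the key was not there), which is a handy way to check that the key you passed actually existed.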
But it doesnt seem like it is deleting... B/c if I try to get the
length of the $hoh{$letter}, it still reports the same number...
You really need to learn some basic Perl. I'd recommend a book:
'Learning Perl' published by O'Reilly is universally recommended as a
good place to start. An alternative would be to read through the
perldocs, but that's not an easy way to learn.
length (see perldoc -f length) treats its argument as a string and
returns the length of that string. $hoh{$letter} contains a hash
*reference*: see perldoc perldsc and perldoc perlreftut for how
multi-level data structures are implemented in Perl. Or, again, a decent
book will cover it. Now, when you stringify a hash ref, you get
something that looks like 'HASH(0x80142180)', which is basically
useless, and is always the same length.
To find the number of keys in a hash, you do as it says in perldoc -f
length: 'scalar keys %hash'. This is somewhat complicated by the fact
that what you have is not a hash but a hash ref, so we apply 'Use Rule
1' from perlreftut:
    # an ordinary hash
    print scalar keys %hash;
    # replace the var name with { }
    print scalar keys %{ }
    # put the hashref inside the braces
    print scalar keys %{ $hoh{$letter} };
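Put together as something runnable, with a made-up two-level hash, the difference between length and a key count looks roughly like this:

```perl
use strict;
use warnings;

# Made-up data: two subhashes of different sizes.
my %hoh = (
    A => { KEY000001 => 'first', KEY000002 => 'second' },
    B => { KEY000003 => 'third' },
);

# Stringifying a hash ref gives something like HASH(0x80142180);
# its length says nothing about what the hash contains.
my $as_string = "$hoh{A}";
print "stringified: $as_string, length ", length($as_string), "\n";

# The number of keys in one subhash.
my $count_a = scalar keys %{ $hoh{A} };
print "keys in A: $count_a\n";

# The total over all subhashes.
my $total = 0;
$total += scalar keys %{ $hoh{$_} } for keys %hoh;
print "total keys: $total\n";
```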
Yes, I agree this is a little icky, but that's what you get when you
graft complex data structures onto a language (Perl4) that doesn't
really support them.
A useful tool for examining data structures is the module Data::Dumper
(obviously, you want to run a test on a smaller dataset rather than
dumping a hash of 240k entries).
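For instance, a sketch with a tiny made-up hash of hashes. The two settings are optional: Sortkeys makes the output order stable and Indent = 1 keeps it compact.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order between runs
$Data::Dumper::Indent   = 1;    # compact indentation

# Made-up sample, small enough to read by eye.
my %hoh = (
    A => { KEY000001 => 'A000-KEY000001-rest' },
    B => { KEY000003 => 'B000-KEY000003-rest' },
);

# Pass a reference so the whole structure is dumped as one value.
my $dump = Dumper(\%hoh);
print $dump;
```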
Ben