Getting huge data into memory in Perl


rahulthathoo

Hi

I have a huge data set: over 10,000 files, each averaging about 50KB of data.
Assuming I have sufficient RAM, I need to be able to load all of this into
memory so that when I have to look something up I don't have to do any I/O.
How do I go about this in Perl?

Rahul
 

rahulthathoo

When I print the result of the following code, it gives me just the first
line of the file. What do you think is wrong?

my %table1 = ();
for ($var = 1; $var < 2; $var++)
{
    ....
    .......
    .........
    open A, $mov_i || die "Shit there is some prob here $!";
    $table1{$var} = <A>;
    close A;
}

foreach $row ($table1{1})
{
    print "$row\n";
}
 

rahulthathoo

That actually works fine. My problem is that I want to store each file as
part of an associative array, and I am not able to do that. The first part
of the associative array should be the name of the file and the second part
should be the entire file itself. Is there a way to do this?
Rahul
 

anno4000

[top posting corrected. please don't do that]

rahulthathoo said:
That actually works fine.

No, it doesn't. The code will only ever die when $mov_i contains a
boolean false value. It doesn't catch a failed open() as intended.
Learn to test your code.
rahulthathoo said:
My problem is that I want to store each file as part of an associative
array, and I am not able to do that. The first part of the associative array
should be the name of the file and the second part should be the entire file
itself. Is there a way to do this?

See $/ in perlvar; look for "slurp mode". Also see the CPAN module
File::Slurp.
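
For example, a minimal slurping sketch (untested; it reads the files named
on the command line into a hash keyed by filename):

#!/usr/bin/perl
use strict;
use warnings;

my %table1;
foreach my $file (@ARGV) {
    local $/;    # undef $/ turns on slurp mode for this block
    open my $fh, '<', $file or die "Can't open $file: $!";
    $table1{$file} = <$fh>;    # the entire file as one string
    close $fh;
}

With File::Slurp the loop body reduces to something like
$table1{$file} = read_file($file).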

The code in your OP catches only the first line because
$table1{$var} = <A> reads the filehandle in scalar context, which returns
a single line. Your loop also runs only once:

for ($var = 1; $var < 2; $var++)

so at most one file is read.

Anno
 

xhoster

rahulthathoo said:
Hi

I have a huge data set: over 10,000 files, each averaging about 50KB of data.
Assuming I have sufficient RAM, I need to be able to load all of this into
memory so that when I have to look something up I don't have to do any I/O.
How do I go about this in Perl?

Generally, you don't. File system caching is a job for your OS, not for
Perl. If you load everything into (virtual) memory, some of it will
probably get paged out anyway, meaning you still need I/O to get at it.

Xho
 

Ted Zlatanov

rahulthathoo said:
I have a huge data set: over 10,000 files, each averaging about 50KB of data.
Assuming I have sufficient RAM, I need to be able to load all of this into
memory so that when I have to look something up I don't have to do any I/O.
How do I go about this in Perl?

Call the script below with: program.pl FILE1 FILE2 FILE3

This is hardly a good use of memory. You probably want to optimize this to
index keywords instead, for instance. Why do you need each file in memory?
Are you searching them?
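
A rough sketch of that keyword-index idea (untested; it naively splits on
whitespace and maps each word to the names of the files containing it):

#!/usr/bin/perl
use strict;
use warnings;

my %index;
foreach my $file (@ARGV) {
    open my $fh, '<', $file or next;    # silently skip unreadable files
    while (my $line = <$fh>) {
        $index{$_}{$file} = 1 for split ' ', $line;
    }
    close $fh;
}
# the keys of %{ $index{$word} } are now the files containing $word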

Ted


#!/usr/bin/perl

use warnings;
use strict;
use Data::Dumper;

my %files;
foreach my $file (@ARGV)
{
    # three-arg open with a lexical filehandle; skip files we can't read
    open my $fh, '<', $file or do {
        warn "Couldn't open file $file: $!";
        next;
    };
    $files{$file} = [<$fh>];    # one array element per line of the file
    close $fh;
}

print Dumper \%files;
 

Charles DeRykus

rahulthathoo said:
That actually works fine...

Appearances can be deceiving. That appears to work but will hide the
actual open errors that do occur. [It'd only work with an oddball
filename that evaluates to false, e.g. "0"]

Try it with a non-existent filename for instance:

open A, "no_such_file" || die $!;

In fact that'll be parsed just as though you wrote:

open A, "no_such_file";
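
The usual fix is the low-precedence "or", or explicit parentheses around
open's arguments, for example:

open A, "no_such_file" or die "Couldn't open: $!";
open(A, "no_such_file") || die "Couldn't open: $!";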

Take a look at Dr. Ruud's perlopentut suggestion.
 

Charles DeRykus

Charles said:
rahulthathoo said:
That actually works fine...

Appearances can be deceiving. That appears to work but will hide the
actual open errors that do occur. [It'd only work with an oddball
filename that evaluates to false, e.g. "0"]

Remember what I said about oddballs working...forget it. It gets
worse :).

It doesn't even work with a filename such as "0", because "0" is boolean
false, so the die always fires... in other words,

open A, "0" || die $!;  # dies even if "0" exists and can be opened

and, if "0" does exist and can't be opened, it'll die but probably won't
report the correct error... except by accident.
 

Tim Shoppa

rahulthathoo said:
That actually works fine. My problem is that I want to store each file
as part of an associative array, and I am not able to do that.

Actually, you can. You can either read them all in (slurp mode, most
likely), or, if you want to learn about tie-ing, you can tie a hash so
that whenever you access a key, it caches the corresponding file in RAM
and gives you its contents.

With the quantity of data you have (500MB by my count) it should
be possible to read it all into RAM if that's what you want to do.

But... you probably don't really want to do any of the above. A file
system is already a kind of associative array. Various database systems
(Berkeley DB, for example) have excellent lookup-by-key abilities that can
be tied to a Perl hash with appropriate modules. Relational databases, and
tie-ing to them, will give you the same thing (and a lot more if you
choose to use it). And very likely there's a much better abstraction
for your data than lookup-by-filename.
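
As a rough sketch of the Berkeley DB route (untested; the database file
name files.db is just a placeholder):

#!/usr/bin/perl
use strict;
use warnings;
use DB_File;
use Fcntl;

# tie a hash to an on-disk Berkeley DB; lookups go through the DB file,
# so nothing needs to be held in RAM up front
tie my %contents, 'DB_File', 'files.db', O_RDWR|O_CREAT, 0666, $DB_HASH
    or die "Can't tie files.db: $!";

$contents{'some/file/name'} = "file contents here";    # stored on disk
print $contents{'some/file/name'};                     # fetched by key

untie %contents;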

Tim.
 
