Parsing Huge File

Majnu

Hello,

I have a strange problem. I have a flat file of about 6 million rows,
each row of 600 bytes. I read the file line by line after opening like
this:

open(IN, "cat $InputFile |") or die "Failed to open the File";
## This was changed from open(IN, "< $InputFile") because Perl
## outright refused to open the file.
while(<IN>) {......

The problem is that, at times, Perl just stops after reading 3,334,601
records, with no error message printed. It doesn't happen every time;
it happens sporadically, which makes it hard to track down, because if I
re-process the file, it gets read completely.

Would someone please shed light on how this could be happening? Is
this something related to memory?
 
Ian Wilson

Majnu said:
> Hello,
>
> I have a strange problem. I have a flat file of about 6 million rows,
> each row of 600 bytes. I read the file line by line after opening like
> this:
>
> open(IN, "cat $InputFile |") or die "Failed to open the File";
> ## This was changed from open(IN, "< $InputFile") because Perl
> ## outright refused to open the file.

I'd write that as

open my $in, '<', $InputFile
    or die "Failed to open '$InputFile' because $!";

Your `cat` doesn't seem to be doing anything useful.
I always print information about the cause of the error ($!).

I'd get Perl to say exactly *why* it "refused to open the file" before
introducing further complications like `cat`.
> while(<IN>) {......
>
> The problem is that, at times, Perl just stops after reading 3,334,601
> records, with no error message printed. It doesn't happen every time;
> it happens sporadically, which makes it hard to track down, because if I
> re-process the file, it gets read completely.
>
> Would someone please shed light on how this could be happening? Is
> this something related to memory?

I guess that would depend on what sort of processing you are doing.

Perhaps a side effect of your script changes the run-time environment
for the re-run, e.g. it creates a log file that didn't exist before and
that your script then expects to be present. Maybe there are resource
conflicts with other running processes. Still, it is unusual to get no
messages. Have you been assiduous in checking for errors in all
statements that can potentially report run-time errors?
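A minimal sketch of what that error checking could look like for the read loop in this thread. The helper name `count_records` is hypothetical and the per-record processing is left out. Besides reporting `$!` when the open fails, it asks the handle whether a read error occurred (the `error` method from the core `IO::Handle` module), confirms the loop really reached end-of-file, and checks the return value of `close`. With the original piped open (`"cat $InputFile |"`), `close` also returns false when `cat` exits with a non-zero status, leaving that status in `$?`.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the ->error method on filehandles

# Hypothetical helper: read a file line by line, count the records, and
# die with a message instead of stopping silently partway through.
sub count_records {
    my ($path) = @_;

    # Three-argument open; $! explains why an open failed.
    open my $in, '<', $path
        or die "Failed to open '$path': $!";

    my $count = 0;
    while (defined(my $line = <$in>)) {
        $count++;
        # ... per-record processing would go here ...
    }

    # <$in> returns undef on a read error as well as at end-of-file,
    # so check which of the two ended the loop.
    die "Read error on '$path' after $count records: $!"
        if $in->error;
    die "Stopped after $count records before end of '$path'"
        unless eof $in;

    # For a piped open, close also fails if the child exited non-zero.
    close $in
        or die "Failed to close '$path': $! (status $?)";

    return $count;
}

# Usage: pass the file to read as the first command-line argument.
if (@ARGV) {
    my $n = count_records($ARGV[0]);
    print "Read $n records\n";
}
```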
 
xhoster

Majnu said:
> Hello,
>
> I have a strange problem. I have a flat file of about 6 million rows,
> each row of 600 bytes. I read the file line by line after opening like
> this:
>
> open(IN, "cat $InputFile |") or die "Failed to open the File";
> ## This was changed from open(IN, "< $InputFile") because Perl
> ## outright refused to open the file.

Instead of doing silly things to circumvent the problem, perhaps you
should figure out why Perl refused to open the file. That is what $! is
for. If you put electrical tape over the "oil pressure" light on your
dashboard instead of fixing the problem, you shouldn't be surprised when
your car stops working.

Xho
 
Majnu

You neglected to mention which version of perl and which OS, but I
can see the cause of both of your problems.

% perl -le 'print 600*3_334_601'
2000760600

% perl -V | grep large
useperlio=define d_sfio=undef uselargefiles=define

You cannot process more than 2 gigabytes (can't even open a file
bigger than 2 GB) when using a version of perl that expects file
sizes to fit into a signed 32-bit int.
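The arithmetic can be reproduced in a few lines of Perl. This is only a rough cross-check: the 600-byte row length is the original poster's approximate figure, and the script simply compares the estimated byte count at the stopping point with the largest offset a signed 32-bit integer can hold (2**31 - 1 = 2,147,483,647 bytes).

```perl
use strict;
use warnings;

# Figures from the thread: about 600 bytes per row (approximate),
# and the number of records read before processing stopped.
my $row_bytes    = 600;
my $records_read = 3_334_601;

# Largest byte offset representable in a signed 32-bit integer.
my $max_off32 = 2**31 - 1;

my $bytes_read = $row_bytes * $records_read;

printf "estimated bytes read: %d\n", $bytes_read;
printf "signed 32-bit limit:  %d\n", $max_off32;
printf "fraction of limit:    %.3f\n", $bytes_read / $max_off32;
```

At exactly 600 bytes per row the estimate comes to 2,000,760,600 bytes, roughly 93% of 2,147,483,647; since the row length is only approximate, the stopping point is close to the 32-bit boundary rather than exactly on it.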

Check your 'perl -V'; I'm sure it is _not_ compiled with 'uselargefiles=define'.
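The same information is available from inside a script through the core `Config` module, whose `%Config` hash holds the build settings that `perl -V` prints. A small sketch follows; the helper name `has_large_file_support` is hypothetical, and treating `uselargefiles=define` together with an 8-byte `lseeksize` (the size of the file-offset type) as "large files OK" is a heuristic rather than an official test.

```perl
use strict;
use Config;    # exports %Config, the build settings shown by perl -V

# Hypothetical helper: true if this perl was built with large-file
# support and uses a file-offset type of at least 8 bytes.
sub has_large_file_support {
    my $largefiles = defined $Config{uselargefiles} ? $Config{uselargefiles} : '';
    my $lseeksize  = defined $Config{lseeksize}     ? $Config{lseeksize}     : 0;
    return $largefiles eq 'define' && $lseeksize >= 8;
}

my $msg = has_large_file_support()
    ? "Built with large-file support.\n"
    : "No large-file support: files over 2 GB are likely to fail.\n";
print $msg;
```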

You need to upgrade to perl v5.8.x immediately.

-Joe

Thanks for the replies.

Yes, the Perl version seems to be the problem here. Although the perl
available was 5.8, the script was using 5.2, which unfortunately was
not compiled with the uselargefiles option.
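One way to keep a script from quietly running under an older interpreter than intended (for example, one named on the `#!` line) is a `use VERSION` declaration, which makes perl die at compile time if it is older than the version given. A minimal sketch, assuming 5.8.0 is the minimum wanted; printing `$^X` (the path of the running perl binary) and `$]` (its version) also shows which interpreter actually picked up the script.

```perl
# Die at compile time on any perl older than 5.8.0.
use 5.008;
use strict;

# $^X: path of the perl binary running this script; $]: its version.
print "Running under perl $] at $^X\n";
```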
 
