Memory usage

Witold Rugowski

Hi!
I have the following problem. My Perl script parses large log files (to be exact, many medium-sized ones), up to 3 or 4 GB of data in total. All it does is extract some data (IP address and volume of traffic).

Data is read line by line, file by file, with something like this (not real code):

while (@files) {
open FILE, files[0] or die;
shift;
while (<FILE>) { if ( $_ =~ /(MY_PATTERN)/ ) call_sum_function($1); }
}
print_results();


All the hashes storing data are very compact (after processing 90% of 1.6 GB of data, each of my hashes has fewer than 30 entries, each holding two integers).

After that 90%, the script dies with:

Out of memory during request for 156 bytes, total sbrk() is 536711168 bytes!

And I have no idea why. Of course I can see that perl has allocated 512 MB and that this is its limit, but I have no idea:

a) what consumed this memory
b) how to avoid this error


Any help?
 
Matt Garrish

Witold Rugowski said:
Hi!
I have the following problem. My Perl script parses large log files (to be
exact, many medium-sized ones), up to 3 or 4 GB of data in total. All it
does is extract some data (IP address and volume of traffic).

Data is read line by line, file by file, with something like this (not
real code):

Then why ask for help? If you don't post real code how is anyone supposed to
guess what you've done wrong?

Matt
 
John Bokma

Witold Rugowski said:
(not real code):

Reduce your problem to the smallest possible program that still has this
problem and post the *real* code.

Most of the time people find the problem this way themselves. If not,
people can try to reproduce it.
 
Witold Rugowski

Xicheng said:
This is not like *nix shell scripts: this bare "shift" statement
shifts @ARGV instead of @files. Use foreach to iterate over your array
@files...

This is not the real code from the script. I wrote it to show in general how I'm doing it (I currently have no access to the code here). I'm shifting the right variable, and for less than 1.5 GB of data it works very well.
 
Witold Rugowski

John said:
Reduce your problem to the smallest possible program that still has this
problem and post the *real* code.

OK. I will post it in a few hours...
 
Dr.Ruud

Witold Rugowski wrote:
(not real code):

use strict;
use warnings;
while (@files) {
open FILE, files[0] or die;

See: perldoc -f open
Use the three-arg form.
If your Perl is 5.8+, use a Perl scalar filehandle.
Use $! in the die.

What is "files[0]"?


Why the "shift"?

Did you maybe start with something like the following code,
and change all underscores to "files", s/_/files/g?

while (@_) {
open FILE, $_[0] or die;
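
A minimal sketch of how the advice above (three-argument open, a lexical filehandle, $! in the die) might look when applied to the posted loop. The sub name sum_traffic, the %bytes_for hash, and the pattern are hypothetical placeholders, since the real script and its MY_PATTERN were never posted.

```perl
use strict;
use warnings;

# Hypothetical helper: sums a byte count per IP address over several files.
# The pattern and the "IP<space>bytes" line layout are assumptions made
# for illustration only; they are not the poster's real format.
sub sum_traffic {
    my @files = @_;
    my %bytes_for;

    # foreach walks @files directly, so no bare "shift" is needed.
    for my $file (@files) {
        # Three-argument open, lexical filehandle, and $! in the message.
        open my $fh, '<', $file or die "Cannot open '$file': $!";

        while ( my $line = <$fh> ) {
            # Placeholder pattern: an IPv4-looking token, then a number.
            if ( $line =~ /^(\d+\.\d+\.\d+\.\d+)\s+(\d+)/ ) {
                $bytes_for{$1} += $2;
            }
        }

        close $fh or die "Cannot close '$file': $!";
    }

    return \%bytes_for;
}
```

Reading with while (one line at a time) keeps memory use independent of file size, so any growth has to come from what the loop body stores.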
 
Dr.Ruud

Paul Lalli wrote:
Dr.Ruud:
Witold Rugowski:
open FILE, files[0] or die;

If your Perl is 5.8+, use a Perl scalar filehandle.

$ perl -v

This is perl, v5.6.1 built for sun4-solaris

$ perl -e'open my $fh, "<", "tmp.txt" or die $!; print while <$fh>'
Line 1
Line 2

Oops. Now I think it's >= 5.6.0. I reread `perldoc -f open` and found
out that yesterday I misread several things on Perl versions there.
 
Witold Rugowski

John said:
Care to share?

I had changed the regexp that selects data from the logs (I added a new () group). Later in the code I used the wrong capture variable ($2 in place of $3), so this was producing a huge amount of data... That's all...
 
A. Sinan Unur

I had changed the regexp that selects data from the logs (I added a new
() group). Later in the code I used the wrong capture variable ($2 in
place of $3), so this was producing a huge amount of data... That's
all...

By not posting a short but complete script or any real code, you have
shown that you do not appreciate the value of others' time thereby
significantly reducing or eliminating the chance to get responses to any
future posts.

Sinan
 
Witold Rugowski

A. Sinan Unur said:
By not posting a short but complete script or any real code, you have
shown that you do not appreciate the value of others' time thereby
significantly reducing or eliminating the chance to get responses to any

That is not true (about not appreciating others' time). The error was caused by the logical meaning of the variables $1, $2, etc. from the regexp. When I sent my first post I was desperate :)), tired, and without access to the code. Once I got access to the code I found the bug, and I think it would have been hard to spot without knowing what the code was supposed to do.

So, the buggy part was the following:

if ( $_ =~ /([\w\d\-\_\.]*) (\w\w\w \d\d \d\d\d\d \d\d:\d\d:\d\d) [\w\d\-\_\.: ]* %PIX-(\d)-(\d*): (.*)/ )
{
    [a lot of good code]

    $logID = "PIX-$2-$3"; #reassembling PIX syslog log ID
    if (!defined ($MSG{$logID}) ) {
        print "New comm - $logID\n" if ($opts{v});
        chomp;
        $MSG{$logID} = $_;
    }

So using the wrong $2 and $3 (instead of $4 and $5) caused the %MSG hash to grow...

Checking out the previous version and removing the changes one by one helped me find the real cause...

Finally, a question: can Data::Dumper dump ALL variables in scope without listing them as arguments? I used it to dump my variables by naming them explicitly, and I was sure that none of my variables was responsible for the memory consumption. I was wrong, since I had omitted %MSG.
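
Data::Dumper itself only dumps what it is handed, so the enumeration has to come from somewhere else. A minimal sketch, assuming the hashes are package variables (declared with "our" or used without "my"): walk the package's symbol table and report how many keys each hash holds. The sub name hash_sizes is hypothetical. Lexical ("my") variables do not appear in the symbol table; for those, the CPAN module PadWalker and its peek_my function would be one option.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { hash name => number of keys } for every
# package hash in the given package (default: main). Lexical "my" hashes
# are NOT visible here, only package variables.
sub hash_sizes {
    my ($package) = @_;
    $package = 'main' unless defined $package;

    my %size_of;
    no strict 'refs';    # looking up a symbol table by name is symbolic

    for my $name ( keys %{"${package}::"} ) {
        # Keep plain identifiers only; this skips punctuation variables
        # and nested packages (whose entries end in "::").
        next unless $name =~ /^[A-Za-z_]\w*$/;

        my $glob = ${"${package}::"}{$name};
        next unless ref( \$glob ) eq 'GLOB';    # skip non-glob entries

        my $hash = *{$glob}{HASH} or next;      # no hash in this glob
        $size_of{$name} = scalar keys %$hash;
    }

    return \%size_of;
}
```

Passing the result to Dumper from Data::Dumper, for example print Dumper( hash_sizes() ), every few thousand input lines would list each package hash with its key count, so a forgotten hash that keeps growing stands out.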
 
Glenn Jackman

At 2006-02-02 02:27PM said:
if ( $_ =~ /([\w\d\-\_\.]*) (\w\w\w \d\d \d\d\d\d \d\d:\d\d:\d\d) [\w\d\-\_\.: ]* %PIX-(\d)-(\d*): (.*)/ )

That can be simplified to:
my $date_re = qr(\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2});
if (/([\w.-]*) ($date_re) [\w.: -]* %PIX-(\d)-(\d*): (.*)/) {
... use $1, $2, $3, $4, $5 ...
}

- $_ is the default target of the match operator
- the \w class includes underscore and digits.
- dot is not special in a character class.
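
One way to make this kind of mix-up less likely, sketched here as an assumption rather than as what the poster did: named captures (Perl 5.10 or later), so that adding a group does not renumber the others. In the regex as posted, $2 is the timestamp, so a log ID built from it differs on almost every line. The sub name pix_log_id and the capture names are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

my $date_re = qr/\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2}/;

# Hypothetical helper: returns the reassembled PIX log ID for a line,
# or undef if the line does not match.
sub pix_log_id {
    my ($line) = @_;

    # Same pattern as above, with each group given a name.
    return undef
        unless $line =~ /(?<host>[\w.-]*) (?<date>$date_re) [\w.: -]* %PIX-(?<severity>\d)-(?<msg_id>\d*): (?<text>.*)/;

    # Fields are picked by name, so inserting another group earlier in
    # the pattern cannot silently shift which value ends up in the ID.
    return 'PIX-' . $+{severity} . '-' . $+{msg_id};
}
```

For an invented line such as "fw1 Feb 02 2006 14:27:00 10.0.0.1 : %PIX-6-302013: Built outbound TCP connection", pix_log_id returns "PIX-6-302013".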