counting matched lines in extremely large files.

M

mikester

First off I'll say - I am a bad perl programmer.

I want to be better and with your help I'll get there and then be able
to contribute more here.

That being said, I have a simple problem compounded by file size.

I have a PIX that logs a ton of items to my syslog server - my log
sizes get extremely large, ~13 gigabytes daily, and they are rotated
daily.

I'm trying to set up some intrusion detection, but with files that big,
just counting incidents to start getting a baseline gets time, CPU and
memory intensive using shell commands like grep. So I wanted to do
something in Perl, but I don't know whether the file size and memory
limitations will let me.

Here's the shell command based perl script I run to get a basic count
on a certain number of incidents.

#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `zgrep -c $VARIABLE $LOG`;
print "$GREP\n";

I print out the number and another program uses that output to put the
number into a database.

How would I accomplish this simply in Perl?

More complicated would be to match multiple variables against the same
log at one time. I would just pull the log into memory if it were a
manageable size but it is not...

Anyway - your help is appreciated.

The Mikester
 
M

mikester


Sorry, typo. It is actually:
#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `grep -c $VARIABLE $LOG`;   # <---- grep, not zgrep
print "$GREP\n";

Thanks
 
J

Jim Gibson

[snip]
Here's the shell command based perl script I run to get a basic count
on a certain number of incidents.

#!/usr/bin/perl
$LOG = "$ARGV[1]";
$VARIABLE = "$ARGV[0]";
$GREP = `zgrep -c $VARIABLE $LOG`;
print "$GREP\n";

I print out the number and another program uses that output to put the
number into a database.

How would I accomplish this simply in Perl?

Here is a simple perl program that will do that:

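#!/usr/bin/perl

use strict;
use warnings;

# Arguments: PATTERN LOGFILE (same order as the original script)
my ($pattern, $log) = @ARGV;
my $re    = qr/$pattern/;   # compile the pattern once
my $count = 0;

# Three-argument open with a lexical filehandle
open(my $fh, '<', $log) or die("Can't open $log: $!");
while (<$fh>) {             # reads one line per iteration
    $count++ if /$re/;
}
close($fh);
print "count of '$pattern' in $log is $count\n";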

More complicated would be to match multiple variables against the same
log at one time. I would just pull the log into memory if it were a
manageable size but it is not...

Scanning one line at a time is better. You can make the regular
expression (/$ARGV[0]/ above) as complicated as you want it.
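
For the "multiple variables at one time" case, here is a rough sketch of counting several patterns in a single pass. It is only an illustration, not tested against real PIX logs: the sub name count_patterns and the sample lines are made up for the demonstration, and the sample is read from an in-memory filehandle so the snippet runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, in a single pass, how many lines match each of several patterns.
# Takes an open filehandle and a list of pattern strings; returns a hash
# reference mapping each pattern string to its number of matching lines.
# (count_patterns is a hypothetical helper name, not from the thread.)
sub count_patterns {
    my ($fh, @patterns) = @_;

    # Compile each pattern once, outside the read loop.
    my (%regex, %count);
    for my $pattern (@patterns) {
        $regex{$pattern} = qr/$pattern/;
        $count{$pattern} = 0;
    }

    # Only the current line is held in memory at any time.
    while (my $line = <$fh>) {
        for my $pattern (@patterns) {
            $count{$pattern}++ if $line =~ $regex{$pattern};
        }
    }
    return \%count;
}

# Demonstration on an in-memory filehandle holding made-up sample lines.
my $sample = join "\n",
    'Deny tcp src outside:10.0.0.1',
    'Built outbound TCP connection',
    'Deny udp src outside:10.0.0.2',
    'Teardown TCP connection',
    '';
open(my $fh, '<', \$sample) or die("Can't open in-memory sample: $!");
my $counts = count_patterns($fh, 'Deny', 'TCP connection', 'icmp');
close($fh);

print "$_: $counts->{$_}\n" for sort keys %$counts;
```

To use it on a real log, open the file with the three-argument open and pass that filehandle instead of the in-memory one. The file is still read only once, however many patterns are given.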
 
M

mikester

I'll give that a shot tomorrow, Thanks - I'll let you know how it goes.
 
M

mikester

It works great - but not with the large files. The files are around
13GB in size and I just don't have the memory to load that up.
 
J

Jim Gibson

mikester said:
It works great - but not with the large files. The files are around
13GB in size and I just don't have the memory to load that up.

It shouldn't take much more memory to run that program on a 13GB file
than it does on a small one. The program only reads in one line at a
time. What doesn't "work great" with the large file? What happens?
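
While waiting for the actual output, here is a small diagnostic sketch that might help narrow it down. It rests on two guesses about what could be going wrong, neither confirmed by anything in this thread: that the perl in use was built without large file support (reported by the Config module under the key uselargefiles), or that the loop was changed from while to for, which reads every line into a list before iterating. The sub names has_large_file_support and scan are made up for the example, and the sample data is an in-memory string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# True if this perl was built with large file support. Without it,
# files larger than 2GB may not be readable. (Hypothetical helper name.)
sub has_large_file_support {
    return defined $Config{uselargefiles}
        && $Config{uselargefiles} eq 'define';
}

# Stream a filehandle line by line and return (lines read, lines matched).
# while (<$fh>) reads one line per iteration; for (<$fh>) would first
# build a list of every line in the file, which needs memory for all of it.
sub scan {
    my ($fh, $re) = @_;
    my ($lines, $matched) = (0, 0);
    while (my $line = <$fh>) {
        $lines++;
        $matched++ if $line =~ $re;
    }
    return ($lines, $matched);
}

print "large file support: ",
    (has_large_file_support() ? "yes" : "no"), "\n";

# Demonstration on made-up in-memory data.
my $sample = "alpha\nbeta\nalphabet\n";
open(my $fh, '<', \$sample) or die("Can't open in-memory sample: $!");
my ($lines, $matched) = scan($fh, qr/alpha/);
close($fh);
print "read $lines lines, $matched matched\n";
```

If the first line printed says "no", the file size rather than memory may be the limit. If the number of lines read from the real log is far lower than expected, that would point the same way.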
 
M

mikester

I'll post the output after the holiday.
 
