counting matched lines in extremely large files.

Discussion in 'Perl' started by mikester, Dec 18, 2003.

  1. mikester

    mikester Guest

    First off I'll say - I am a bad perl programmer.

    I want to be better and with your help I'll get there and then be able
    to contribute more here.

    That being said, I have a simple problem compounded by file size.

    I have a PIX that logs a ton of items to my syslog server - my log
    files get extremely large, ~13 GIGABYTEs daily, and they are rotated
    daily.

    I'm trying to set up some intrusion detection, but with files that
    big, just counting incidents to start getting a baseline gets time,
    CPU and memory intensive using shell commands like grep. So I wanted
    to do it in Perl, but I don't know whether the file size and memory
    limitations will let me.

    Here's the shell command based perl script I run to get a basic count
    on a certain number of incidents.

    #!/usr/bin/perl
    $LOG = "$ARGV[1]";
    $VARIABLE = "$ARGV[0]";
    $GREP = `zgrep -c $VARIABLE $LOG`;
    print "$GREP\n";

    I print out the number and another program uses that output to put the
    number into a database.

    How would I accomplish this simply in Perl?

    More complicated would be to match multiple variables against the same
    log at one time. I would just pull the log into memory if it were a
    manageable size but it is not...

    Anyway - your help is appreciated.

    The Mikester
     
    mikester, Dec 18, 2003
    #1

  2. mikester

    mikester Guest

    (mikester) wrote in message news:<>...
    > First off I'll say - I am a bad perl programmer.
    >
    > I want to be better and with your help I'll get there and then be able
    > to contribute more here.
    >
    > That being said, I have a simple problem compounded by file size.
    >
    > I have a PIX that logs to my syslog server for a ton of items - my
    > logs sizes get extremely large; ~13 GIGABYTEs daily and they are
    > rotated daily.
    >
    > I'm trying to set up some intrusion detection but with file sizes that
    > big just counting incidents to start getting a baseline gets time, cpu
    > and memory intensive using shell commands like grep. So I wanted to do
    > something in perl but I don't know if because of the file size and
    > memory limitations I can do that.
    >
    > Here's the shell command based perl script I run to get a basic count
    > on a certain number of incidents.
    >
    > #!/usr/bin/perl
    > $LOG = "$ARGV[1]";
    > $VARIABLE = "$ARGV[0]";
    > $GREP = `zgrep -c $VARIABLE $LOG`;
    > print "$GREP\n";
    >
    > I print out the number and another program uses that output to put the
    > number into a database.
    >
    > How would I accomplish this simply in perl?
    >
    > More complicated would be to match multiple variables against the same
    > log at one time. I would just pull the log into memory if it were a
    > manageable size but it is not...
    >
    > Anyway - your help is appreciated.
    >
    > The Mikester



    Sorry, that was a typo. It is actually:

    > #!/usr/bin/perl
    > $LOG = "$ARGV[1]";
    > $VARIABLE = "$ARGV[0]";
    > $GREP = `grep -c $VARIABLE $LOG`; <----
    > print "$GREP\n";


    Thanks
     
    mikester, Dec 18, 2003
    #2

  3. Jim Gibson

    Jim Gibson Guest

    In article <>, mikester
    <> wrote:

    [snip]

    >
    > Here's the shell command based perl script I run to get a basic count
    > on a certain number of incidents.
    >
    > #!/usr/bin/perl
    > $LOG = "$ARGV[1]";
    > $VARIABLE = "$ARGV[0]";
    > $GREP = `zgrep -c $VARIABLE $LOG`;
    > print "$GREP\n";
    >
    > I print out the number and another program uses that output to put the
    > number into a database.
    >
    > How would I accomplish this simply in perl?


    Here is a simple perl program that will do that:

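    #!/usr/bin/perl

    use strict;
    use warnings;

    # Usage: script PATTERN LOGFILE
    my $log = $ARGV[1];
    my $count = 0;

    # Read one line at a time, so memory use does not grow with file size.
    open(LOG, '<', $log) or die("Can't open $log: $!");
    while (<LOG>) {
        $count++ if /$ARGV[0]/;
    }
    close(LOG);
    print "count of '$ARGV[0]' in $log is $count\n";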


    >
    > More complicated would be to match multiple variables against the same
    > log at one time. I would just pull the log into memory if it were a
    > manageable size but it is not...


    Scanning one line at a time is better. You can make the regular
    expression (/$ARGV[0]/ above) as complicated as you want it.
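
    For the "multiple variables at one time" case, a minimal sketch
    along the same lines might look like the following. It is not from
    the thread: the sub name count_matches and the argument order
    (log file first, then any number of patterns) are assumptions made
    for illustration. Each pattern is compiled once with qr// and every
    line is tested against all of them, so the log is read only once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count, in a single pass over $fh, how many lines match each pattern.
# Returns a list of counts in the same order as @patterns.
# (count_matches is a hypothetical helper name, not from the thread.)
sub count_matches {
    my ($fh, @patterns) = @_;

    # Compile each pattern once, outside the read loop.
    my @compiled = map { qr/$_/ } @patterns;
    my @counts   = (0) x @patterns;

    # Only one line is held in memory at a time.
    while (my $line = <$fh>) {
        for my $i (0 .. $#compiled) {
            $counts[$i]++ if $line =~ $compiled[$i];
        }
    }
    return @counts;
}

# When run with arguments: script LOGFILE PATTERN [PATTERN ...]
if (@ARGV >= 2) {
    my ($log, @patterns) = @ARGV;
    open(my $fh, '<', $log) or die "Can't open $log: $!";
    my @counts = count_matches($fh, @patterns);
    close($fh);
    print "$patterns[$_]\t$counts[$_]\n" for 0 .. $#patterns;
}
```

    The output is one "pattern<TAB>count" line per pattern, which
    another program could parse before writing to a database. A line
    that matches several patterns is counted once for each of them.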

    >
    > Anyway - your help is appreciated.
    >
    > The Mikester
     
    Jim Gibson, Dec 19, 2003
    #3
  4. mikester

    mikester Guest

    Jim Gibson <> wrote in message news:<191220031038058768%>...
    > In article <>, mikester
    > <> wrote:
    >
    > [snip]
    >
    > >
    > > Here's the shell command based perl script I run to get a basic count
    > > on a certain number of incidents.
    > >
    > > #!/usr/bin/perl
    > > $LOG = "$ARGV[1]";
    > > $VARIABLE = "$ARGV[0]";
    > > $GREP = `zgrep -c $VARIABLE $LOG`;
    > > print "$GREP\n";
    > >
    > > I print out the number and another program uses that output to put the
    > > number into a database.
    > >
    > > How would I accomplish this simply in perl?

    >
    > Here is a simple perl program that will do that:
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > my $log = $ARGV[1];
    > my $count = 0;
    >
    > open(LOG,$log) or die("Can't open $log: $!");
    > while(<LOG>) {
    > $count++ if /$ARGV[0]/;
    > }
    > print "count of '$ARGV[0]' in $log is $count\n";
    >
    >
    > >
    > > More complicated would be to match multiple variables against the same
    > > log at one time. I would just pull the log into memory if it were a
    > > manageable size but it is not...

    >
    > Scanning one line at a time is better. You can make the regular
    > expression (/$ARGV[0]/ above) as complicated as you want it.
    >
    > >
    > > Anyway - your help is appreciated.
    > >
    > > The Mikester



    I'll give that a shot tomorrow, Thanks - I'll let you know how it goes.
     
    mikester, Dec 22, 2003
    #4
  6. mikester

    mikester Guest

    (mikester) wrote in message news:<>...
    > Jim Gibson <> wrote in message news:<191220031038058768%>...
    > > In article <>, mikester
    > > <> wrote:
    > >
    > > [snip]
    > >
    > > >
    > > > Here's the shell command based perl script I run to get a basic count
    > > > on a certain number of incidents.
    > > >
    > > > #!/usr/bin/perl
    > > > $LOG = "$ARGV[1]";
    > > > $VARIABLE = "$ARGV[0]";
    > > > $GREP = `zgrep -c $VARIABLE $LOG`;
    > > > print "$GREP\n";
    > > >
    > > > I print out the number and another program uses that output to put the
    > > > number into a database.
    > > >
    > > > How would I accomplish this simply in perl?

    > >
    > > Here is a simple perl program that will do that:
    > >
    > > #!/usr/bin/perl
    > >
    > > use strict;
    > > use warnings;
    > >
    > > my $log = $ARGV[1];
    > > my $count = 0;
    > >
    > > open(LOG,$log) or die("Can't open $log: $!");
    > > while(<LOG>) {
    > > $count++ if /$ARGV[0]/;
    > > }
    > > print "count of '$ARGV[0]' in $log is $count\n";
    > >
    > >
    > > >
    > > > More complicated would be to match multiple variables against the same
    > > > log at one time. I would just pull the log into memory if it were a
    > > > manageable size but it is not...

    > >
    > > Scanning one line at a time is better. You can make the regular
    > > expression (/$ARGV[0]/ above) as complicated as you want it.
    > >
    > > >
    > > > Anyway - your help is appreciated.
    > > >
    > > > The Mikester

    >
    >
    > I'll give that a shot tomorrow, Thanks - I'll let you know how it goes.



    It works great - but not with the large files. The files are around
    13GB in size and I just don't have the memory to load that up.
     
    mikester, Dec 23, 2003
    #6
  7. Jim Gibson

    Jim Gibson Guest

    In article <>, mikester
    <> wrote:

    > (mikester) wrote in message
    > news:<>...
    > > Jim Gibson <> wrote in message
    > > news:<191220031038058768%>...
    > > > In article <>, mikester
    > > > <> wrote:
    > > >
    > > > [snip]
    > > >
    > > > >
    > > > > Here's the shell command based perl script I run to get a basic count
    > > > > on a certain number of incidents.
    > > > >


    [snip]

    > > > Here is a simple perl program that will do that:
    > > >
    > > > #!/usr/bin/perl
    > > >
    > > > use strict;
    > > > use warnings;
    > > >
    > > > my $log = $ARGV[1];
    > > > my $count = 0;
    > > >
    > > > open(LOG,$log) or die("Can't open $log: $!");
    > > > while(<LOG>) {
    > > > $count++ if /$ARGV[0]/;
    > > > }
    > > > print "count of '$ARGV[0]' in $log is $count\n";
    > > >


    >
    > It works great - but not with the large files. The files are around
    > 13GB in size and I just don't have the memory to load that up.


    It shouldn't take much more memory to run that program on a 13GB file
    than it does on a small one. The program only reads in one line at a
    time. What doesn't "work great" with the large file? What happens?
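
    One possible cause, which nothing in this thread confirms, is a
    perl binary built without large-file support; such a build may fail
    on files larger than 2GB no matter how little memory the script
    uses. A minimal sketch for checking this is below. The sub name
    large_file_report is made up for illustration; the uselargefiles
    key comes from the standard Config module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Report whether this perl was built with large-file support.
# (large_file_report is a hypothetical helper name.)
sub large_file_report {
    my $flag = $Config{uselargefiles};
    my $ok   = defined $flag && $flag eq 'define';
    return sprintf "perl %s: large file support %s",
        $], $ok ? 'enabled' : 'NOT enabled';
}

print large_file_report(), "\n";
```

    The same setting can be read from the shell with
    "perl -V:uselargefiles". If it reports 'define', the problem lies
    elsewhere and the actual error output is needed to go further.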
     
    Jim Gibson, Dec 23, 2003
    #7
  8. mikester

    mikester Guest

    Jim Gibson <> wrote in message news:<231220031527288072%>...
    > In article <>, mikester
    > <> wrote:
    >
    > > (mikester) wrote in message
    > > news:<>...
    > > > Jim Gibson <> wrote in message
    > > > news:<191220031038058768%>...
    > > > > In article <>, mikester
    > > > > <> wrote:
    > > > >
    > > > > [snip]
    > > > >
    > > > > >
    > > > > > Here's the shell command based perl script I run to get a basic count
    > > > > > on a certain number of incidents.
    > > > > >

    >
    > [snip]
    >
    > > > > Here is a simple perl program that will do that:
    > > > >
    > > > > #!/usr/bin/perl
    > > > >
    > > > > use strict;
    > > > > use warnings;
    > > > >
    > > > > my $log = $ARGV[1];
    > > > > my $count = 0;
    > > > >
    > > > > open(LOG,$log) or die("Can't open $log: $!");
    > > > > while(<LOG>) {
    > > > > $count++ if /$ARGV[0]/;
    > > > > }
    > > > > print "count of '$ARGV[0]' in $log is $count\n";
    > > > >

    >
    > >
    > > It works great - but not with the large files. The files are around
    > > 13GB in size and I just don't have the memory to load that up.

    >
    > It shouldn't take much more memory to run that program on a 13GB file
    > than it does on a small one. The program only reads in one line at a
    > time. What doesn't "work great" with the large file? What happens?


    I'll post the output after the holiday.
     
    mikester, Dec 25, 2003
    #8
