matching a pattern with a space or no space??

Discussion in 'Perl Misc' started by erik, Nov 9, 2005.

  1. erik

    erik Guest

    I am trying to chop up some netscreen firewall logs where I just want
    certain fields. In Perl, I am doing a "cut" and picking the fields that
    I want. The problem is, silly netscreens insert spaces in their service
    name at will. For example it might have:

    start_time="2005-11-08 service=https proto=6 src src_port=3873
    dst_port=443 src-xlated ip=x.x.x.x
    (notice there is no space in the service name, it is just https)

    start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    src_port=123 dst_port=123 src-xlated
    (notice the space between Network and Time.)

    If my cut is space-delimited, the space in the service name throws me
    off by 1 field of course. How can I regex a data flow that is always
    changing? I am stuck...

    Now I can do a "find and replace" for ALL the possible service names
    that contain spaces, but that has a high level of effort. Any
    ideas?
    erik, Nov 9, 2005
    #1

  2. erik

    Degz Guest

    Does the "proto" string always come after the service?

    If so, you can find the start position of "service=" and the start
    position of "proto", then take the substring between them.

    e.g. my $pos1 = index($instring, "service=") + length("service=");
    my $pos2 = index($instring, " proto=");
    # substr takes a length, not an end position
    my $servicename = substr($instring, $pos1, $pos2 - $pos1);
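
A minimal, self-contained sketch of the index/substr idea above. It assumes "proto=" always follows "service=" on the same line; `service_name` is a hypothetical helper name, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between "service=" and " proto=",
# or nothing when either marker is missing from the line.
sub service_name {
    my ($line) = @_;
    my $key   = 'service=';
    my $start = index($line, $key);
    return if $start < 0;
    $start += length $key;                         # skip past "service="
    my $end = index($line, ' proto=', $start);     # search only after the key
    return if $end < 0;
    return substr($line, $start, $end - $start);   # third argument is a length
}

print service_name('service=https proto=6 src_port=3873'), "\n";        # https
print service_name('service=Network Time proto=17 dst=x.x.x.x'), "\n";  # Network Time
```

Checking the result of index against -1 matters here: a line without "service=" would otherwise yield a substring taken from the wrong place.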

    Degz
    Degz, Nov 9, 2005
    #2
  4. On 9 Nov 2005 07:45:55 -0800, erik <> wrote:
    > I am trying to chop up some netscreen firewall logs where I just want
    > certain fields. In perl, I am doing a "cut" and picking the fields that
    > I want. The problem is, silly netscreens insert spaces in their service
    > name at will. For example it might have:
    >
    > start_time="2005-11-08 service=https proto=6 src src_port=3873
    > dst_port=443 src-xlated ip=x.x.x.x
    > (notice there is no space in the service name, it is just https)
    >
    > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    > src_port=123 dst_port=123 src-xlated
    > (notice the space between Network and Time.)


    As you discovered, simply splitting on whitespace doesn't work. There
    might be ways to make it work, but it's better to match the parts you
    want explicitly with capturing groups and skip the rest. For an
    example on a cut-down version of your data:

    #!/usr/bin/perl

    use strict;
    use warnings;

    while ( <DATA> ) {
        chomp;
        # Non-greedy service value stops at the whitespace before "proto="
        my ($service, $proto) = /service=(.*?)\s+proto=(\d+)/
            or next;    # skip lines that lack either field
        print "$service: $proto\n";
    }

    __DATA__
    service=https proto=6
    service=Network Time proto=17
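
One way to make whitespace splitting work is to split only where the next token looks like `name=`. This is a sketch under assumptions: names consist of word characters, and no value contains a `word=` sequence of its own. The sample line is a shortened version of the one in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'service=Network Time proto=17 dst=x.x.x.x src_port=123';

# Split on whitespace only when it is followed by "name=", so the space
# inside "Network Time" is left alone.
my @pairs = split /\s+(?=\w+=)/, $line;

# Break each "name=value" chunk at the first "=" only.
my %field = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } @pairs;

print "$_ => '$field{$_}'\n" for qw(service proto dst src_port);
```

The lookahead `(?=\w+=)` checks what follows the whitespace without consuming it, so each name stays attached to its own chunk.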


    Mike

    --
    Michael Zawrotny
    Institute of Molecular Biophysics
    Florida State University | email:
    Tallahassee, FL 32306-4380 | phone: (850) 644-0069
    Michael Zawrotny, Nov 9, 2005
    #4
  5. erik wrote:
    > I am trying to chop up some netscreen firewall logs where I just want
    > certain fields. In perl, I am doing a "cut" and picking the fields that
    > I want. The problem is, silly netscreens insert spaces in their service
    > name at will. For example it might have:
    >
    > start_time="2005-11-08 service=https proto=6 src src_port=3873
    > dst_port=443 src-xlated ip=x.x.x.x
    > (notice there is no space in the service name, it is just https)
    >
    > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    > src_port=123 dst_port=123 src-xlated
    > (notice the space between Network and Time.)
    >
    > If my cut is space-delimited, the space in the service name throws me
    > off by 1 field of course. How can I regex a data flow that is always
    > changing? I am stuck...
    >
    > Now I can do a "find and replace" for ALL the possible service names
    > that contain spaces, but that has a high level of effort. Any
    > ideas?


    i'm going to guess that the 'name' in the name/value pair cannot
    contain spaces. i'm also going to guess that all name/value pairs are
    delimited by spaces. if this is true, you can match on this pattern:

    /(\w+)=([\w\s]+)\s\w+=/

    ...i haven't tested this, just using it to convey the concept. you can
    also probably do positive look-ahead, but i'm not too familiar with
    that.
    it_says_BALLS_on_your forehead, Nov 9, 2005
    #5
  6. it_says_BALLS_on_your forehead wrote:
    > erik wrote:
    > > I am trying to chop up some netscreen firewall logs where I just want
    > > certain fields. In perl, I am doing a "cut" and picking the fields that
    > > I want. The problem is, silly netscreens insert spaces in their service
    > > name at will. For example it might have:
    > >
    > > start_time="2005-11-08 service=https proto=6 src src_port=3873
    > > dst_port=443 src-xlated ip=x.x.x.x
    > > (notice there is no space in the service name, it is just https)
    > >
    > > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    > > src_port=123 dst_port=123 src-xlated
    > > (notice the space between Network and Time.)
    > >
    > > If my cut is space-delimited, the space in the service name throws me
    > > off by 1 field of course. How can I regex a data flow that is always
    > > changing? I am stuck...
    > >
    > > Now I can do a "find and replace" for ALL the possible service names
    > > that contain spaces, but that has a high level of effort. Any
    > > ideas?

    >
    > i'm going to guess that the 'name' in the name/value pair cannot
    > contain spaces. i'm also going to guess that all name/value pairs are
    > delimited by spaces. if this is true, you can match on this pattern:
    >
    > /(\w+)=([\w\s]+)\s\w+=/
    >
    > ...i haven't tested this, just using it to convey the concept. you can
    > also probably do positive look-ahead, but i'm not too familiar with
    > that.


    ok, tested my pattern:

    #!/apps/webstats/bin/perl

    use strict;

    my $string1 = "service=https proto=6";
    my $string2 = "service=Network Time proto=17";

    # Inside a character class "|" is a literal pipe, so it is left out here
    my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)\s\w+=/;
    my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)\s\w+=/;

    print "1: -$name1- = -$value1-\n";
    print "2: -$name2- = -$value2-\n";

    #---OUTPUT
    1: -service- = -https-
    2: -service- = -Network Time-


    ...again, positive lookahead may be more efficient.
    it_says_BALLS_on_your forehead, Nov 9, 2005
    #6
  7. Purl Gurl wrote:
    > Purl Gurl wrote:
    >
    > (snipped)
    >
    > > > start_time="2005-11-08 service=https proto=6 src src_port=3873
    > > > dst_port=443 src-xlated ip=x.x.x.x

    >
    > > > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    > > > src_port=123 dst_port=123 src-xlated

    >
    > > There exists discrepancies between your two log formats suggesting
    > > your examples are fabricated.

    >
    > I forgot to add, there is a glaring error in both of your examples
    > which directly indicates your examples are fabricated.
    >
    > Purl Gurl


    are you referring to the double quotes?

    ...anyway, in the interest of productive conversation, here is the code
    with positive lookaheads:

    #!/apps/webstats/bin/perl

    use strict; use warnings;

    my $string1 = "service=https proto=6";
    my $string2 = "service=Network Time proto=17";

    # The lookahead checks for the next "name=" without consuming it
    my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;
    my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;

    print "1: -$name1- = -$value1-\n";
    print "2: -$name2- = -$value2-\n";

    #--OUTPUT
    1: -service- = -https-
    2: -service- = -Network Time-
    it_says_BALLS_on_your forehead, Nov 9, 2005
    #7
  8. erik

    erik Guest

    Thanks everyone!! I completely forgot about substringing. That'll do it.
    erik, Nov 9, 2005
    #8
  9. erik

    Samwyse Guest

    Purl Gurl wrote:
    > simon.chao wrote:
    >
    >>Purl Gurl wrote:
    >>
    >>>Purl Gurl wrote:

    >
    >
    > (snipped)
    >
    >
    >>are you referring to the double quotes?

    >
    >
    > There is no such critter "double quotes."
    >
    > My presumption is you meant to write,
    >
    > "...the single quote mark?"


    Or perhaps "... the single double quote?"
    Samwyse, Nov 10, 2005
    #9
  10. erik

    Samwyse Guest

    Purl Gurl wrote:
    > I suppose you could write a couple dozen snippets to index, return
    > true or false, then select an appropriate substring function. Strikes
    > me substring would be the most difficult method to use.


    No, 'unpack' would be the most difficult.
    Samwyse, Nov 10, 2005
    #10
  11. erik

    Anno Siegel Guest

    erik <> wrote in comp.lang.perl.misc:
    > I am trying to chop up some netscreen firewall logs where I just want
    > certain fields. In perl, I am doing a "cut" and picking the fields that


    What "cut"? Perl doesn't have that function.

    > I want. The problem is, silly netscreens insert spaces in their service
    > name at will. For example it might have:


    It's not silly netscreen that inserts the space, the space *is* part
    of the service name. Netscreen would be broken if it didn't show it.

    > start_time="2005-11-08 service=https proto=6 src src_port=3873
    > dst_port=443 src-xlated ip=x.x.x.x
    > (notice there is no space in the service name, it is just https)
    >
    > start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
    > src_port=123 dst_port=123 src-xlated
    > (notice the space between Network and Time.)
    >
    > If my cut is space-delimited, the space in the service name throws me
    > off by 1 field of course. How can I regex a data flow that is always
    > changing? I am stuck...


    To regex a data flow? Oh dear...

    You could parse it as what it is -- a series of assignments of values
    to names. What follows shows a regex that does that. It isn't
    thoroughly tested, but it works with your examples and simple variations
    thereof.

    A name is an identifier like in Perl. Assignment is an equals sign "="
    surrounded by optional white space. A value can be any string, including
    blanks, but leading and trailing blanks are stripped.

    A single regex that does all this would be rather long and hard to read.
    Also, it turns out, we want to do a lookahead after each value for the
    next name and assignment. So it is useful, as with all complicated regexes,
    to build it stepwise.

    # Describe an identifier
    my $name_re = qr/
        [[:alpha:][:digit:]]
        \w*
    /xsm; # ...or was that /xms? :)

    # a "=" surrounded by optional white space
    my $equals_re = qr/\s*=\s*/;

    # the possible values
    my $value_re = qr/
        .*?                         # the value proper can be anything,
        (?=                         # up to, and excluding...
            \s*                     # trailing white space
            (?:$name_re$equals_re)  # plus another name followed by "="
        |
            \Z                      # ...unless we're at the end of the string
        )
    /xsm;

    Note how $name_re and $equals_re are used to delimit what $value_re
    matches. To be used like this:

    my $l2 = 'start_time=2005-11-08 service = Network Time ' .
             'proto=17 dst=x.x.x.x src_port=123 dst_port=123 src-xlated';

    while ( $l2 =~ /($name_re)$equals_re($value_re)/g ) {
        print "$1 => '$2'\n";
    }

    That prints:

    start_time => '2005-11-08'
    service => 'Network Time'
    proto => '17'
    dst => 'x.x.x.x'
    src_port => '123'
    dst_port => '123 src-xlated'
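
Once the pairs are parsed, picking certain fields (the "cut" from the original question) becomes a hash lookup. The following is a sketch under assumptions, not the code from this post: it folds the name, "=" and value patterns into one simplified regex, and `parse_pairs`, `@wanted` and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse "name=value" pairs into a hash. A value runs
# until the next "name=" or the end of the line, so it may contain spaces.
sub parse_pairs {
    my ($line) = @_;
    my %field = $line =~ /(\w+)\s*=\s*(.*?)(?=\s+\w+\s*=|\s*\z)/g;
    return %field;
}

my @wanted = qw(start_time service proto dst_port);   # fields to keep

my @lines = (
    'start_time=2005-11-08 service=https proto=6 src_port=3873 dst_port=443',
    'start_time=2005-11-08 service=Network Time proto=17 src_port=123 dst_port=123',
);

for my $line (@lines) {
    my %field = parse_pairs($line);
    # Missing fields are printed as "-" instead of triggering a warning.
    print join("\t", map { exists $field{$_} ? $field{$_} : '-' } @wanted), "\n";
}
```

Selecting by name rather than by position means a space inside a value no longer shifts the later columns.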


    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
    Anno Siegel, Nov 11, 2005
    #11