matching a pattern with a space or no space??

erik

I am trying to chop up some netscreen firewall logs where I just want
certain fields. In perl, I am doing a "cut" and picking the fields that
I want. The problem is, silly netscreens insert spaces in their service
name at will. For example it might have:

start_time="2005-11-08 service=https proto=6 src src_port=3873
dst_port=443 src-xlated ip=x.x.x.x
(notice there is no space in the service name, it is just https)

start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
src_port=123 dst_port=123 src-xlated
(notice the space between Network and Time.)

If my cut is space delimited, the space in the service name throws me
off by 1 field of course. How can I regex a data flow that is always
changing? I am stuck...

Now I can do a "find and replace" for ALL the possible space
delimited service names, but that has a high Level of Effort. Any
ideas?
 
Degz

Does the "proto" string always come after the service?

If so, you can find the start position of "service=" and the start
position of " proto=", then take the substring between them:

# substr's third argument is a length, not an end position
my $pos1 = index($instring, "service=") + length("service=");
my $pos2 = index($instring, " proto=");
my $servicename = substr($instring, $pos1, $pos2 - $pos1);

Degz
 
Michael Zawrotny

I am trying to chop up some netscreen firewall logs where I just want
certain fields. In perl, I am doing a "cut" and picking the fields that
I want. The problem is, silly netscreens insert spaces in their service
name at will. For example it might have:

start_time="2005-11-08 service=https proto=6 src src_port=3873
dst_port=443 src-xlated ip=x.x.x.x
(notice there is no space in the service name, it is just https)

start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
src_port=123 dst_port=123 src-xlated
(notice the space between Network and Time.)

As you discovered, simply splitting on whitespace doesn't work. There
might be ways to make it work, but it's better to match the parts you
want explicitly and skip over the rest. For an
example on a cut-down version of your data:

#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
    chomp;
    # (.*?) takes everything up to " proto=", spaces included;
    # (\S+) stops at the next space, so this also works on a full log line.
    my ($service, $proto) = /service=(.*?)\s+proto=(\S+)/ or next;
    print "$service: $proto\n";
}

__DATA__
service=https proto=6
service=Network Time proto=17


Mike
 
it_says_BALLS_on_your forehead

erik said:
I am trying to chop up some netscreen firewall logs where I just want
certain fields. In perl, I am doing a "cut" and picking the fields that
I want. The problem is, silly netscreens insert spaces in their service
name at will. For example it might have:

start_time="2005-11-08 service=https proto=6 src src_port=3873
dst_port=443 src-xlated ip=x.x.x.x
(notice there is no space in the service name, it is just https)

start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
src_port=123 dst_port=123 src-xlated
(notice the space between Network and Time.)

If my cut is space delimited, the space in the service name throws me
off by 1 field of course. How can I regex a data flow that is always
changing? I am stuck...

Now I can do a "find and replace" for ALL the possible space
delimited service names, but that has a high Level of Effort. Any
ideas?

i'm going to guess that the 'name' in the name/value pair cannot
contain spaces. i'm also going to guess that all name/value pairs are
delimited by spaces. if this is true, you can match on this pattern:

/(\w+)=([\w\s]+)\s\w+=/

...i haven't tested this, just using it to convey the concept. you can
also probably do positive look-ahead, but i'm not too familiar with
that.
 
it_says_BALLS_on_your forehead

ok, tested my pattern:

#!/apps/webstats/bin/perl

use strict;

my $string1 = "service=https proto=6";
my $string2 = "service=Network Time proto=17";

my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)\s\w+=/;
my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)\s\w+=/;

print "1: -$name1- = -$value1-\n";
print "2: -$name2- = -$value2-\n";

#---OUTPUT
1: -service- = -https-
2: -service- = -Network Time-


...again, positive lookahead may be more efficient.
 
it_says_BALLS_on_your forehead

Purl said:
Purl Gurl wrote:

(snipped)



I forgot to add, there is a glaring error in both of your examples
which directly indicates your examples are fabricated.

Purl Gurl

are you referring to the double quotes?

...anyway, in the interest of productive conversation, here is the code
with positive lookaheads:

#!/apps/webstats/bin/perl

use strict; use warnings;

my $string1 = "service=https proto=6";
my $string2 = "service=Network Time proto=17";

my ($name1, $value1) = $string1 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;
my ($name2, $value2) = $string2 =~ m/(\w+)=([\w\s]+)(?=\s\w+=)/;

print "1: -$name1- = -$value1-\n";
print "2: -$name2- = -$value2-\n";

#--OUTPUT
1: -service- = -https-
2: -service- = -Network Time-
 
Samwyse

Purl said:
There is no such critter "double quotes."

My presumption is you meant to write,

"...the single quote mark?"

Or perhaps "... the single double quote?"
 
Samwyse

Purl said:
I suppose you could write a couple dozen snippets to index, return
true or false, then select an appropriate substring function. Strikes
me substring would be the most difficult method to use.

No, 'unpack' would be the most difficult.
 
Anno Siegel

erik said:
I am trying to chop up some netscreen firewall logs where I just want
certain fields. In perl, I am doing a "cut" and picking the fields that

What "cut"? Perl doesn't have that function.

I want. The problem is, silly netscreens insert spaces in their service
name at will. For example it might have:

It's not silly netscreen that inserts the space, the space *is* part
of the service name. Netscreen would be broken if it didn't show it.

start_time="2005-11-08 service=https proto=6 src src_port=3873
dst_port=443 src-xlated ip=x.x.x.x
(notice there is no space in the service name, it is just https)

start_time="2005-11-08 service=Network Time proto=17 dst=x.x.x.x
src_port=123 dst_port=123 src-xlated
(notice the space between Network and Time.)

If my cut is space delimited, the space in the service name throws me
off by 1 field of course. How can I regex a data flow that is always
changing? I am stuck...

To regex a data flow? Oh dear...

You could parse it as what it is -- a series of assignments of values
to names. What follows shows a regex that does that. It isn't
thoroughly tested, but it works with your examples and simple variations
thereof.

A name is an identifier like in Perl. Assignment is an equals sign "="
surrounded by optional white space. A value can be any string, including
blanks, but leading and trailing blanks are stripped.

A single regex that does all this would be rather long and hard to read.
Also, it turns out, we want to do a lookahead after each value for the
next name and assignment. So it is useful, as with all complicated regexes,
to build it stepwise.

# Describe an identifier
my $name_re = qr/
    [[:alpha:][:digit:]]
    \w*
/xsm; # ...or was that /xms? :)

# a "=" surrounded by optional white space
my $equals_re = qr/\s*=\s*/;

# the possible values
my $value_re = qr/
    .*?                             # the value proper can be anything,
    (?=                             # up to, and excluding...
        \s*                         # trailing white space
        (?:$name_re$equals_re)      # plus another name followed by "="
    |
        \Z                          # ...unless we're at the end of the string
    )
/xsm;

Note how $name_re and $equals_re are used to delimit what $value_re
matches. To be used like this:

my $l2 = 'start_time=2005-11-08 service = Network Time ' .
'proto=17 dst=x.x.x.x src_port=123 dst_port=123 src-xlated';

while ( $l2 =~ /($name_re)$equals_re($value_re)/g ) {
    print "$1 => '$2'\n";
}

That prints:

start_time => '2005-11-08'
service => 'Network Time'
proto => '17'
dst => 'x.x.x.x'
src_port => '123'
dst_port => '123 src-xlated'
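
Since the original goal was to pick out only certain fields, one way to use this kind of parsing is to collect the pairs into a hash and then read the wanted names from it. The following is only a sketch under the same assumptions (names are \w+ and each value runs up to the next "name="). The helper name parse_pairs and the sample line are made up for illustration, and it uses a simpler lookahead split instead of the regexes above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one log line into a hash of name => value.
sub parse_pairs {
    my ($line) = @_;
    my %field;
    # Split only on whitespace that is followed by "name=", so spaces
    # inside a value (e.g. "Network Time") stay part of that value.
    for my $pair ( split /\s+(?=\w+\s*=)/, $line ) {
        my ($name, $value) = $pair =~ /^(\w+)\s*=\s*(.*?)\s*$/ or next;
        $field{$name} = $value;
    }
    return %field;
}

my %f = parse_pairs(
    'start_time=2005-11-08 service = Network Time proto=17 '
  . 'dst=x.x.x.x src_port=123 dst_port=123'
);

# Pick only the fields of interest, in the order wanted.
print join( ' | ', map { $f{$_} } qw(service proto dst_port) ), "\n";
```

As in the output above, a bare word with no "=" (such as src-xlated) would end up attached to the preceding value, so it needs separate handling if it matters.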


Anno
 
