Rubish Way of extracting elements

Daniel Völkerts

I started written a little script to analyse my syslogs. The development
went on very fast, but today I'm searching the rubish way to dissect a
string into some parts. For example in my syslog there is a line (valid
as described in RFC 3164)

<165> Aug 16 17:01:35 localhost Just a test

I was trying to reach this form

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

But how do I accomplish this? I read the pickaxe book, but the example I
found was about repeating values with e.g. | as separator. Is a suitable
regexp the way, or should I use another technique, e.g. String#index?


Thanks for your time helping me, I'll pay it back if I become a little
more rubisher ;)
 
David A. Black

Hi --

You could match it to a regular expression, and grab the results in
()-expressions:

str = "<165> Aug 16 17:01:35 localhost Just a test"

pri, timestamp, device, msg =
/<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

require 'scanf'
pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")


(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)
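
One thing worth guarding against with the one-line regexp: match returns nil for a line that does not fit the pattern, and calling captures on nil raises NoMethodError. Below is a minimal sketch of a guarded version. The names SYSLOG_RX and parse_syslog_line are made up for this illustration; the pattern is the one from the post, anchored with \A as an extra assumption.

```ruby
# Hypothetical helper, not from the thread: wraps the regexp shown above
# and returns nil instead of raising when a line does not match.
SYSLOG_RX = /\A<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/

def parse_syslog_line(line)
  md = SYSLOG_RX.match(line)
  return nil unless md

  pri, timestamp, device, msg = md.captures
  # Captures are always strings; convert pri explicitly if a number is wanted.
  { pri: pri.to_i, timestamp: timestamp, device: device, msg: msg }
end

p parse_syslog_line("<165> Aug 16 17:01:35 localhost Just a test")
p parse_syslog_line("not a syslog line")   # => nil
```

Returning nil keeps the caller in charge of what to do with malformed lines (skip, count, or log them).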


David
 
Charles Mills

Probably use regular expressions. You could have one big regexp or one
for each field like so:
var =~ /<([0-9]+)>/
pri = $1
$' =~ /some regexp/ # I'm lazy
timestamp = $1
# etc
You could also use \A along with the post match ($') to make sure the
fields come in the order you expect.
-Charlie
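
The field-by-field idea can be sketched roughly as follows. This is only one possible reading of it: the three patterns are assumptions filled in for the sample line, and MatchData#post_match is used in place of the global $' (it holds the same string).

```ruby
# Sketch only: peel fields off the front one at a time. Each pattern is
# anchored with \A, so a field must start exactly where the previous one ended.
line = "<165> Aug 16 17:01:35 localhost Just a test"

rest = line
fields = []
[
  /\A<(\d+)>\s+/,                         # pri
  /\A(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+/,  # timestamp
  /\A(\S+)\s+/                            # device
].each do |rx|
  md = rx.match(rest)
  raise ArgumentError, "unexpected format: #{rest.inspect}" if md.nil?

  fields << md[1]
  rest = md.post_match   # the same string $' would hold after this match
end

pri, timestamp, device = fields
msg = rest               # whatever is left over is the message

p [pri, timestamp, device, msg]
```

Compared with one big regexp, this version can report which field failed to parse, at the cost of more code.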
 
Florian Gross

Daniel said:
<165> Aug 16 17:01:35 localhost Just a test
I was trying to reach this form

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
  pri, timestamp, device, msg = *md.captures
  # Do something with the captures
end

Regards,
Florian Gross
 
Daniel Völkerts

Daniel said:
I feel sorry, 'I started writting..' is the correct way.

What the hell, writting is also wrong, tzzz. Too much caffeine in my head!

After I posted the above thread I have written this line

pri,timestamp,device,msg = aMsg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)

Is this the right way? Please feel free to post comments. I'm looking
forward to them to improve my Ruby skills.
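
For reference, here is a small sketch of what that scan call returns on the sample line. Without capture groups, scan collects every match of the whole alternation in order, so the angle brackets stay attached to the priority and the message is split into single words. The variable name a_msg is just a stand-in for aMsg.

```ruby
a_msg = "<165> Aug 16 17:01:35 localhost Just a test"

parts = a_msg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)
p parts
# => ["<165>", "Aug 16 17:01:35", "localhost", "Just", "a", "test"]

# Only the first four elements are assigned; the rest are dropped.
pri, timestamp, device, msg = parts
p pri   # => "<165>"
p msg   # => "Just"
```

So the line runs, but msg ends up holding only the first word of the message, which is one reason a single match with capture groups may fit this job better.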
 
Daniel Völkerts

David said:
You could match it to a regular expression, and grab the results in
()-expressions:

str = "<165> Aug 16 17:01:35 localhost Just a test"

pri, timestamp, device, msg =
/<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

require 'scanf'
pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")


(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)

Thanks a lot. That's the way I would expect it. Simple and nice to
understand. I'll try it.

Many regards.
 
Robert Klemme

Florian Gross said:
This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
  pri, timestamp, device, msg = *md.captures
  # Do something with the captures
end

Some more admittedly ugly constructions:

val = "<165> Aug 16 17:01:35 localhost Just a test"
# MatchData#to_a starts with the whole match, hence the extra "line" target.
unless ( ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
  puts "matched"
end

line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
    \s+ (\S+) \s+ (.*)$/x.match(val).to_a
if pri
  puts "matched"
end

LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

unless ( ( line, pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
  puts "matched"
end

:)

robert
 
Daniel Völkerts

*boom* That blew my mind away! No no, thanks a lot for that piece of code.

But I prefer the scanf and one-line-regexp.

I'll test which kind performs better for my needs. As I said, I'm a Ruby
newbie and my personal programming rule is: keep it simple! ;) I have to
understand the things I write.

If the point is reached where my little script becomes interesting for
others besides me, I'll post an [Ann] thread.


Bye,
 
Robert Klemme

(1) This one converts the RX MatchData into an array and tests for emptiness
to determine whether it matched. And along the way values are assigned to
local vars.

(2) Similar, but now just one local var is used as match check: if "pri" is
not nil, the RX matched.

(3) Same approach as (1) but the regexp is defined as a constant to make
stuff more readable.

(4) Similar approach to (2) but the test is included ("&& line"). Note that
this time no explicit conversion to array is done; the splat takes care of
it. The additional local "line" receives the complete match, which is always
the first element.

(5) Same as (4) but with regexp in constant as in (3).
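
The variants above lean on two facts that are easy to check in isolation: nil.to_a is an empty array, and the array form of a MatchData starts with the whole match, followed by the captures. A minimal sketch, with DEMO_RX as a made-up constant name holding the same pattern:

```ruby
# Same pattern as in the examples; the constant name is only for this sketch.
DEMO_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

hit  = DEMO_RX.match("<165> Aug 16 17:01:35 localhost Just a test")
miss = DEMO_RX.match("garbage")

p miss.to_a        # => []  (nil.to_a), so .empty? means "no match"
p hit.to_a.first   # => "<165> Aug 16 17:01:35 localhost Just a test"
p hit.captures     # => ["165", "Aug 16 17:01:35", "localhost", "Just a test"]

# Splatting a failed match leaves every target nil.
line, pri, timestamp, device, msg = *miss.to_a
p [line, pri, msg] # => [nil, nil, nil]
```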
Daniel said:
*boom* That blew my mind away! No no, thanks a lot for that piece of code.

:) I *should've* put some comments in... OK, inserting them above now.

Daniel said:
But I prefer the scanf and one-line-regexp.

Basically I used extended regular expressions (switched on by the "/x" flag).
Whitespace in the pattern is ignored, so every real space has to be written
as "\s+". That's why the regexp is longer.
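
As a rough illustration of the "/x" flag, the sketch below writes the same pattern twice: once compact, once in extended mode spread over several lines with comments inside the pattern. The variable names and the layout are assumptions made for this example.

```ruby
compact = /^<(\d+)>\s+(\S+\s+\d+\s+\d+:\d+:\d+)\s+(\S+)\s+(.*)$/

# With /x, whitespace and "#" comments inside the pattern are ignored,
# so the fields can be laid out one per line.
extended = /
  ^<(\d+)>                        \s+  # priority
  (\S+ \s+ \d+ \s+ \d+:\d+:\d+)   \s+  # timestamp
  (\S+)                           \s+  # device
  (.*)$                                # message
/x

sample = "<165> Aug 16 17:01:35 localhost Just a test"
p compact.match(sample).captures == extended.match(sample).captures   # => true
```

Both forms capture the same four fields; the extended one is longer but leaves room to say what each group is for.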

Daniel said:
I'll test which kind performs better for my needs. As I said, I'm a Ruby
newbie and my personal programming rule is: keep it simple! ;) I have to
understand the things I write.

That's an excellent road to walk down! Handcrafted, simple code is better
than a mindless copy of something found somewhere.

Kind regards

robert
 
Daniel Völkerts

Hi Robert,

thank you very much for your short lesson. It's very interesting and
I'll see how I can profit from this information.

Ruby becomes more and more usable for me (normally my language of choice
is Java, but for such little scripts Ruby is great fun!).

Bye,
 
Robert Klemme

Daniel Völkerts said:
Hi Robert,

thank you very much for your short lesson. It's very interesting and
I'll see how I can profit from this information.

I'm glad I could be of any help.

Daniel Völkerts said:
Ruby becomes more and more usable for me (normally my language of choice
is Java, but for such little scripts Ruby is great fun!).

Same here. I even use Ruby sometimes to manipulate Java code or search
through piles of Java code... :)

Daniel Völkerts said:
"I have a dragon, and I WILL use it!" - Donkey in Shrek

Ohooo... *shake in fear*
:)

Kind regards

robert
 
Dany Cayouette

Thanks for taking the time to put in explanations. I'm also a Ruby newbie who has never written anything useful yet, but I've started to follow this list a bit. I always look forward to your postings since I'm sure you'll put in some line of code I won't understand.. ;-) Part of my learning is 'trying' to understand them. Thanks for the extra hand on this one!

Dany
 
