Rubish Way of extracting elements

Discussion in 'Ruby' started by Daniel Völkerts, Aug 16, 2004.

  1. I started written a little script to analyse my syslogs. The development
    went on very fast, but today I'm searching the rubish way to dissect a
    string into some parts. For example in my syslog there is a line (valid
    as described in RFC 3164)

    <165> Aug 16 17:01:35 localhost Just a test

    I was trying to reach this form

    var = content

    pri = 165
    timestamp = Aug 16 17:01:35
    device = localhost
    msg = Just a test

    But how do I accomplish this? I read the pickaxe book, but the example I
    found was about repeating values, e.g. with | as a separator. Is a suitable
    regexp the way, or should I use another technique, e.g. String#index?


    Thanks for your time helping me, I'll pay it back if I become a little
    more rubisher ;)

    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 16, 2004
    #1

  2. Daniel Völkerts wrote:

    > I started written a little script to analyse my syslogs.


    I feel sorry, 'I started writting..' is the correct way.

    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 16, 2004
    #2

  3. Hi --

    On Tue, 17 Aug 2004, Daniel Völkerts wrote:

    > I started written a little script to analyse my syslogs. The development
    > went on very fast, but today I'm searching the rubish way to dissect a
    > string into some parts. For example in my syslog there is a line (valid
    > as described in rfc3146)
    >
    > <165> Aug 16 17:01:35 localhost Just a test
    >
    > I was trying to reach this form
    >
    > var = content
    >
    > pri = 165
    > timestamp = Aug 16 17:01:35
    > device = localhost
    > msg = Just a test
    >
    > But how do I accomplish this? I read the pickaxe book, but the example I
    > found was about repeating values e.g. | as seperator. Is a suitable
    > regexp the way or should use another technique e.g. String#index etc.?
    >
    >
    > Thanks for your time helping me, I'll pay it back if I become a little
    > more rubisher ;)


    You could match it to a regular expression, and grab the results in
    ()-expressions:

    str = "<165> Aug 16 17:01:35 localhost Just a test"

    pri, timestamp, device, msg =
    /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

    Another way would be to use scanf. This has the advantage that you
    get your 165 as an integer (if that's important):

    require 'scanf'
    pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")


    (You might have to adjust either the regex or the format string
    depending on how consistent and predictable the lines are.)
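A minimal sketch of the one-regexp approach wrapped in a method, for anyone who wants the priority as an Integer without scanf. The names `SYSLOG_LINE` and `parse_syslog_line` are made up for illustration, and the named groups assume Ruby 1.9 or later; the pattern itself is the one shown above.

```ruby
# Hypothetical names; the pattern mirrors the one-line regexp from the post.
SYSLOG_LINE = /\A<(?<pri>\d+)>\s+(?<timestamp>\w+\s+\d+\s+[\d:]+)\s+(?<device>\S+)\s+(?<msg>.*)/

def parse_syslog_line(str)
  md = SYSLOG_LINE.match(str)
  return nil unless md # no match: let the caller decide what to do

  {
    pri:       Integer(md[:pri], 10), # convert the priority to an Integer
    timestamp: md[:timestamp],
    device:    md[:device],
    msg:       md[:msg]
  }
end

p parse_syslog_line("<165> Aug 16 17:01:35 localhost Just a test")
```

Returning nil for lines that do not match avoids the NoMethodError you would get from calling `captures` on a failed match.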


    David

    --
    David A. Black
     
    David A. Black, Aug 16, 2004
    #3
  4. On Aug 16, 2004, at 8:06 AM, Daniel Völkerts wrote:

    > I started written a little script to analyse my syslogs. The
    > development went on very fast, but today I'm searching the rubish way
    > to dissect a string into some parts. For example in my syslog there is
    > a line (valid as described in rfc3146)
    >
    > <165> Aug 16 17:01:35 localhost Just a test
    >
    > I was trying to reach this form
    >
    > var = content
    >
    > pri = 165
    > timestamp = Aug 16 17:01:35
    > device = localhost
    > msg = Just a test
    >
    > But how do I accomplish this? I read the pickaxe book, but the example
    > I found was about repeating values e.g. | as seperator. Is a suitable
    > regexp the way or should use another technique e.g. String#index etc.?
    >

    Probably use regular expressions. You could have one big regexp or one
    for each field like so:
    var =~ /<([0-9]+)>/
    pri = $1
    $' =~ /some regexp/ # I'm lazy
    timestamp = $1
    # etc
    You could also use \A along with the post match ($') to make sure the
    fields come in the order you expect.
    -Charlie
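The field-by-field idea could be sketched roughly like this. It uses `MatchData#post_match` (the same text `$'` would hold) and anchors every pattern with `\A`, so a field that is missing or out of order makes the whole parse fail. The method name and the individual patterns are assumptions chosen for illustration, not something from the post.

```ruby
# Hypothetical helper: consume one field at a time from the front of the line.
def split_syslog_fields(line)
  rest = line
  fields = []
  [
    /\A<(\d+)>\s*/,                   # priority
    /\A(\w+\s+\d+\s+\d+:\d+:\d+)\s+/, # timestamp
    /\A(\S+)\s+/                      # device
  ].each do |rx|
    md = rx.match(rest)
    return nil unless md # field missing or out of order
    fields << md[1]
    rest = md.post_match # the remainder after this field, like $'
  end
  fields << rest         # whatever is left is the message
end

p split_syslog_fields("<165> Aug 16 17:01:35 localhost Just a test")
```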

    >
    > Thanks for your time helping me, I'll pay it back if I become a little
    > more rubisher ;)
    >
    > --
    > Daniel Völkerts ::
    > "I have a dragon, and I WILL use it!" - Donkey in Shrek
    >
     
    Charles Mills, Aug 16, 2004
    #4
  5. Daniel Völkerts wrote:

    > <165> Aug 16 17:01:35 localhost Just a test
    > I was trying to reach this form
    >
    > var = content
    >
    > pri = 165
    > timestamp = Aug 16 17:01:35
    > device = localhost
    > msg = Just a test


    This ought to work, but there might be other ways to do this:

    if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
      pri, timestamp, device, msg = *md.captures
      # Do something with the captures
    end

    Regards,
    Florian Gross
     
    Florian Gross, Aug 16, 2004
    #5
  6. Daniel Völkerts wrote:

    > I feel sorry, 'I started writting..' is the correct way.


    What the hell, writting is also wrong, tzzz. Too much caffeine in my head!

    After I posted the above, I wrote this line:

    pri,timestamp,device,msg = aMsg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)

    Is this the right way? Please feel free to post comments. I'm looking
    forward to them to improve my Ruby skills.
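A quick way to check what that scan call returns, as a sketch only (the constant and method names are made up here). Since String#scan collects every non-overlapping match of the alternation, the message comes back one word at a time, and the priority keeps its angle brackets because the pattern has no capture group.

```ruby
# Hypothetical names; the regexp is the one from the post, joined onto one line.
SCAN_RX = /<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/

def scan_pieces(line)
  line.scan(SCAN_RX) # every match of any alternative, in order
end

pieces = scan_pieces("<165> Aug 16 17:01:35 localhost Just a test")
p pieces
# With four variables on the left-hand side, only the first four pieces are kept:
pri, timestamp, device, msg = pieces
p [pri, timestamp, device, msg]
```

So with this input `msg` ends up holding only the first word of the message, which is one reason the capture-group versions in the other replies may be the easier route.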

    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 16, 2004
    #6
  7. David A. Black wrote:

    > You could match it to a regular expression, and grab the results in
    > ()-expressions:
    >
    > str = "<165> Aug 16 17:01:35 localhost Just a test"
    >
    > pri, timestamp, device, msg =
    > /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures
    >
    > Another way would be to use scanf. This has the advantage that you
    > get your 165 as an integer (if that's important):
    >
    > require 'scanf'
    > pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")
    >
    >
    > (You might have to adjust either the regex or the format string
    > depending on how consistent and predictable the lines are.)


    Thanks a lot. That's the way I expected it to be: simple and easy to
    understand. I'll try it.

    Many regards.

    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 16, 2004
    #7
  8. "Florian Gross" <> wrote in message
    news:...
    > Daniel Völkerts wrote:
    >
    > > <165> Aug 16 17:01:35 localhost Just a test
    > > I was trying to reach this form
    > >
    > > var = content
    > >
    > > pri = 165
    > > timestamp = Aug 16 17:01:35
    > > device = localhost
    > > msg = Just a test

    >
    > This ought to work, but there might be other ways to do this:
    >
    > if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
    > pri, timestamp, device, msg = *md.captures
    > # Do something with the captures
    > end


    Some more admittedly ugly constructions:

    val = "<165> Aug 16 17:01:35 localhost Just a test"
    unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
    puts "matched"
    end

    pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
    \s+ (\S+) \s+ (.*)$/x.match(val).to_a
    if pri
    puts "matched"
    end

    LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

    unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
    puts "matched"
    end

    if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
    puts "matched"
    end

    if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
    puts "matched"
    end
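One detail that may be worth checking before picking one of these variants: MatchData#to_a returns the whole match as its first element, followed by the groups, while MatchData#captures returns only the groups. The sketch below just prints both so you can see which value would land in which variable; `LOG_RX_DEMO` and `match_views` are hypothetical names, and the pattern is the `LOG_RX` one from above.

```ruby
# Hypothetical names; same extended pattern as LOG_RX in the post.
LOG_RX_DEMO = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

def match_views(line)
  md = LOG_RX_DEMO.match(line)
  {
    to_a:     md.to_a,                # nil.to_a is [], so this is safe on no match
    captures: md ? md.captures : []   # captures needs an explicit nil check
  }
end

views = match_views("<165> Aug 16 17:01:35 localhost Just a test")
p views[:to_a]     # five elements: the full line first, then the four groups
p views[:captures] # four elements: just the groups
```

If the `.to_a` form is used with four variables on the left, the first variable receives the full line, so adding a leading throwaway variable (as the last two variants do with `line`) keeps the fields aligned.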

    :)

    robert
     
    Robert Klemme, Aug 16, 2004
    #8
  9. Robert Klemme wrote:

    > Some more admittedly ugly constructions:
    >
    > val = "<165> Aug 16 17:01:35 localhost Just a test"
    > unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
    > puts "matched"
    > end
    >
    > pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
    > \s+ (\S+) \s+ (.*)$/x.match(val).to_a
    > if pri
    > puts "matched"
    > end
    >
    > LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
    >
    > unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
    > puts "matched"
    > end
    >
    > if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
    > puts "matched"
    > end
    >
    > if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
    > puts "matched"
    > end
    >
    > :)
    >
    > robert
    >


    *boom* That blew my mind! No no, thanks a lot for that piece of code.

    But I prefer the scanf and the one-line regexp.

    I'll test which one works better for my needs. As I said, I'm a Ruby
    newbie and my personal programming rule is: keep it simple! ;) I have to
    understand the things I write.

    If my little script ever becomes interesting for anyone other than me,
    I'll post an [Ann] thread.


    Bye,
    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 16, 2004
    #9
  10. "Daniel Völkerts" <> wrote in message
    news:cfr4ml$d13$00$-online.com...
    > Robert Klemme wrote:
    >
    > > Some more admittedly ugly constructions:
    > >
    > > val = "<165> Aug 16 17:01:35 localhost Just a test"


    (1) This one converts the regexp's MatchData into an array and tests for
    emptiness to determine whether it matched. Along the way, values are
    assigned to local vars.

    > > unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
    > > puts "matched"
    > > end


    (2) Similar, but now just one local var is used as match check: if "pri" is
    not nil, the RX matched.

    > > pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
    > > \s+ (\S+) \s+ (.*)$/x.match(val).to_a
    > > if pri
    > > puts "matched"
    > > end


    (3) Same approach as (1) but the regexp is defined as a constant to make
    stuff more readable.

    > > LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
    > >
    > > unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
    > > puts "matched"
    > > end


    (4) Similar approach to (2) but the test is included ("&& line"). Note that
    this time no conversion to array is done here so we need the additional
    local "line" to receive the complete capture.

    > > if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
    > > puts "matched"
    > > end


    (5) Same as (4) but with regexp in constant as in (3).

    > > if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
    > > puts "matched"
    > > end
    > >
    > > :)
    > >
    > > robert
    > >

    >
    > *boom* That blow my mind away! No no, thanks a lot for that piece of code.


    :) I *should've* put some comments in... Ok, inserting them above now.

    > But I prefer the scanf and one-line-regexp.


    Basically I used extended regular expressions (switched on by the "/x" flag).
    Whitespace in the pattern is ignored, which is why you see more "\s+" in
    there and why the regexp is longer.
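To illustrate what the "/x" flag buys you, here is a small sketch with hypothetical constant names. In extended mode the pattern can be spread over several lines and carry `#` comments; unescaped whitespace is dropped, so any space you actually want to match has to be written as `\s` (or escaped). The compact pattern next to it is the same expression without the flag.

```ruby
# Hypothetical names; both patterns are meant to describe the same line format.
COMMENTED_LOG_RX = /
  ^ <(\d+)>                          # priority
  \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)  # timestamp
  \s+ (\S+)                          # device
  \s+ (.*) $                         # message
/x

COMPACT_LOG_RX = /^<(\d+)>\s+(\S+\s+\d+\s+\d+:\d+:\d+)\s+(\S+)\s+(.*)$/

def captures_for(rx, line)
  md = rx.match(line)
  md && md.captures # nil when the line does not match
end

line = "<165> Aug 16 17:01:35 localhost Just a test"
p captures_for(COMMENTED_LOG_RX, line)
p captures_for(COMPACT_LOG_RX, line)
```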

    > I'll test which kind performs better for my needs. As I said, I'm a ruby
    > newbie and personal programming rule is: keep it simple! ;) I've to
    > understand the things I wrote.


    That's an excellent road to walk down! Handcrafted, simple code is better
    than a mindless copy of something found somewhere.

    Kind regards

    robert
     
    Robert Klemme, Aug 17, 2004
    #10
  11. Hi Robert,

    thank you very, very much for your short lesson. It's very interesting and
    I'll see how I can profit from this information.

    Ruby is becoming more and more useful to me (normally my language of choice
    is Java, but for little scripts like this Ruby is great fun!).

    Bye,
    --
    Daniel Völkerts ::
    "I have a dragon, and I WILL use it!" - Donkey in Shrek
     
    Daniel Völkerts, Aug 17, 2004
    #11
  12. "Daniel Völkerts" <> wrote in message
    news:cft54c$872$02$-online.com...
    > Hi Robert,
    >
    > thank you very very much for your short lesson. It's very intresting and
    > I'll see how I can profite from these information.


    I'm glad I could be of help.

    > Ruby becomes more and more usable for me (normally my language of choice
    > is java but for such little scripts ruby is a great of fun!).


    Same here. I even use Ruby sometimes to manipulate Java code or search
    through piles of Java code... :)

    > Daniel Völkerts ::
    > "I have a dragon, and I WILL use it!" - Donkey in Shrek


    Ohooo... *shake in fear*
    :)

    Kind regards

    robert
     
    Robert Klemme, Aug 18, 2004
    #12
  13. Thanks for taking the time to put in explanations. I'm also a Ruby newbie who hasn't written anything useful yet, but I've started to follow this list a bit. I always look forward to your postings, since I'm sure you'll put in some line of code I won't understand. ;-) Part of my learning is trying to understand them. Thanks for the extra hand on this one!

    Dany



    On Tue, 17 Aug 2004 09:52:32 +0200
    "Robert Klemme" <> wrote:

    >
    > "Daniel Völkerts" <> wrote in message
    > news:cfr4ml$d13$00$-online.com...
    > > Robert Klemme wrote:
    > >
    > > > Some more admittedly ugly constructions:
    > > >
    > > > val = "<165> Aug 16 17:01:35 localhost Just a test"

    >
    > (1) This one converts the RX MatchData into an array and tests for emptyness
    > to determine whether it matched. And along the way values are assigned to
    > local vars.
    >
    > > > unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > > > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
    > > > puts "matched"
    > > > end

    >
    > (2) Similar, but now just one local var is used as match check: if "pri" is
    > not nil, the RX matched.
    >
    > > > pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)
    > > > \s+ (\S+) \s+ (.*)$/x.match(val).to_a
    > > > if pri
    > > > puts "matched"
    > > > end

    >
    > (3) Same approach as (1) but the regexp is defined as a constant to make
    > stuff more readable.
    >
    > > > LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
    > > >
    > > > unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
    > > > puts "matched"
    > > > end

    >
    > (4) Similar approach to (2) but the test is included ("&& line"). Note that
    > this time no conversion to array is done here so we need the additional
    > local "line" to receive the complete capture.
    >
    > > > if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
    > > > \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
    > > > puts "matched"
    > > > end

    >
    > (5) Same as (4) but with regexp in constant as in (3).
    >
    > > > if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
    > > > puts "matched"
    > > > end
    > > >
    > > > :)
    > > >
    > > > robert
    > > >

    > >
    > > *boom* That blow my mind away! No no, thanks a lot for that piece of code.

    >
    > :) I *should've* put some comments in... Ok, inserting them above now.
    >
    > > But I prefer the scanf and one-line-regexp.

    >
    > Basically I used extended regular expressions (switched by the "/x" flag).
    > Whitespace is ignored, that's why you see more "\s+" in there. And that's
    > why the regexp is longer.
    >
    > > I'll test which kind performs better for my needs. As I said, I'm a ruby
    > > newbie and personal programming rule is: keep it simple! ;) I've to
    > > understand the things I wrote.

    >
    > That's an excellent road to walk down! Handcrafted, simple code is better
    > than a mindless copy of something found somewhere.
    >
    > Kind regards
    >
    > robert
    >
     
    Dany Cayouette, Aug 18, 2004
    #13
