Easy Method to 'slurp' a line tokenized by split

Discussion in 'Perl Misc' started by Rhugga, Feb 9, 2005.

  1. Rhugga

    Rhugga Guest

    I am writing a syslog parser that normalizes log entries and then loads
    them into an Oracle database. I open the input file, process each
    line, and tokenize it for use, but I am running into a roadblock that I
    can't seem to find a clean solution for. (My Perl skills have grown
    rusty over the years.)

    Here is a sample entry from my input file:

    1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice]
    TLD(0) opening robotic path /dev/sg/c3t0l0

    This is basically 7 fields of info:
    <log count> <timestamp (3 fields)> <hostname> <process info> <log
    content>

    (<log count> is the number of times an identical log entry was detected
    and truncated)

    So using split I break this down into components:

    @ARGS = (split / /, $line);
    $line =~ s/ +/ /g;
    $line =~ s/^ +//g;
    $log_count = $ARGS[0];
    $log_month = $ARGS[1];
    $log_day = $ARGS[2];
    $log_time = $ARGS[3];
    $log_hostname = $ARGS[4];
    $log_proc_info = $ARGS[5];
    $log_message = $ARGS[6];

    My problem is I want $log_message to contain everything after the
    process info field. (In the sample entry above, $log_proc_info will
    contain tldcd[928].) However, $log_message will only contain the next
    space-delimited field, in this case '[ID'. What I want to do
    is, after I glean $log_proc_info, set $log_message to the
    remaining bytes up to but not including EOL. (i.e., I want $log_message =
    "[ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0"
    )

    I hope this is making sense; I have been working on no sleep as usual.

    Thanks for any help,
    CC
    Rhugga, Feb 9, 2005
    #1

  2. Rhugga

    Paul Lalli Guest

    "Rhugga" <> wrote in message
    news:...
    >
    > @ARGS = (split / /, $line);
    > $line =~ s/ +/ /g;
    > $line =~ s/^ +//g;
    > $log_count = $ARGS[0];
    > $log_month = $ARGS[1];
    > $log_day = $ARGS[2];
    > $log_time = $ARGS[3];
    > $log_hostname = $ARGS[4];
    > $log_proc_info = $ARGS[5];
    > $log_message = $ARGS[6];
    >
    > My problem is I want $log_message to contain everything after the
    > process info field.


    Have you considered reading the documentation for the function you're
    using? The solution is readily available to anyone who reads
    perldoc -f split
    (fourth paragraph)

    Paul Lalli
    Paul Lalli, Feb 9, 2005
    #2

  3. Rhugga

    Guest

    Rhugga wrote:

    > @ARGS = (split / /, $line);


    > $log_count = $ARGS[0];
    > $log_month = $ARGS[1];
    > $log_day = $ARGS[2];
    > $log_time = $ARGS[3];
    > $log_hostname = $ARGS[4];
    > $log_proc_info = $ARGS[5];
    > $log_message = $ARGS[6];


    This is more simply written

    my ($log_count,$log_month,$log_day,$log_time,$log_hostname,
    $log_proc_info,$log_message) = split / /, $line;

    Note: I've guessed that the omission of my() was a mistake since it
    probably was. An assignment without a my() overwrites one or more
    existing variables rather than introducing new ones. Unless you have a
    positive reason to modify an existing variable it is generally not a
    good idea to do so.

    Note: It would probably be more natural to represent the parsed record
    as a hash rather than 7 separate scalars. It is generally a good idea
    to use the natural representations of things unless you have a positive
    reason to do otherwise.

    my %log;
    @log{'count','month','day','time','hostname',
    'proc_info','message'} = split / /, $line;

    > My problem is I want $log_message to contain everything after the
    > process info field. (in the sample entry above, $log_proc_info will
    > contain tldcd[928]).


    You appear to have a question about the split() function.

    Have you considered the radical option of looking-up the description of
    the split function in the reference manual?

    Pay particular attention to the semantics of the third argument (LIMIT).
    , Feb 9, 2005
    #3
  4. Rhugga

    John Bokma Guest

    John Bokma, Feb 9, 2005
    #4
  5. Rhugga

    Rhugga Guest

    Maybe I am seeing a bug on SLES 9 then. I initially tried an approach
    like this:

    my ($log_count, $log_month, $log_day, $log_time, $log_hostname,
    $log_proc_info, $log_message) = split( / /, $line);

    I was getting erratic, inconsistent results (which is why I dumbed
    it down to the brute-force approach I posted in the original post). The
    code I listed above is in a debug type of setup right now. I'll look at
    it again; I'm just working on not much sleep and haven't written much Perl
    in the last 5 years, so I'm very rusty. Ironically, I parse the local
    timestamp into vars in the same way as you are suggesting for my data
    and that works fine.

    @ARGS is in all caps because it is my preference to set all caps for
    var names of 'un-refined' data. By un-refined I mean something I
    intend to do something with later on. When I see @ARGS or @MYVAR for example,
    that tells me this is raw data generated from a split() or something
    similar. Just a personal preference. I'm the only one that sees my
    code so I don't need to adhere to coding standards as one would in a
    team environment.

    The reason why my() is missing is because the code you see is inside a
    loop, I define all those vars before the loop using my(). Once one
    iteration of the loop ends I no longer have a need for anything stored
    in those vars. (as this is getting shoved into oracle)

    Thanks for all the suggestions, I definitely have much more to go on
    now.

    Thx
    Rhugga, Feb 9, 2005
    #5
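
One possible cause of the inconsistent results, offered as an assumption since the thread never confirms it: traditional syslog pads single-digit days with an extra space ("Feb  8"), and `split / /` returns an empty field for every extra space, which shifts all later fields by one. Note also that the code in the first post collapses runs of spaces only after split has already run. The special pattern `' '` splits on runs of whitespace and skips leading whitespace. A minimal sketch, using a made-up variant of the sample line with a doubled space:

```perl
use strict;
use warnings;

# Hypothetical input: the sample entry with two spaces before the day,
# as traditional syslog writes single-digit days.
my $line = '1 Feb  8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0';

# A literal single-space pattern produces an empty field between the two spaces.
my @by_space = split / /, $line, 7;

# The special pattern ' ' treats any run of whitespace as one separator.
my @by_ws = split ' ', $line, 7;

print "split / /: day='$by_space[2]' time='$by_space[3]'\n";
print "split ' ': day='$by_ws[2]' time='$by_ws[3]'\n";
```

With `/ /` the day field comes back empty and the day lands where the time was expected; with `' '` the fields line up whether the day has one digit or two.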
  7. Rhugga

    Rhugga Guest

    Just FYI:

    @ARGS = (split / /, $line, 7);

    This works perfectly, so much thanks for that suggestion. I just need
    to revert back to my ($count, $month, ...) = (split / /, $line, 7);

    Thx all.
    Rhugga, Feb 9, 2005
    #7
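
The LIMIT argument caps the number of fields split returns, so the last field keeps the rest of the line, spaces included. A minimal sketch using the sample entry from the first post (joined onto one line here); the `%log` hash follows the hash-slice suggestion from earlier in the thread, and the key names are the ones used there:

```perl
use strict;
use warnings;

# Sample entry from the first post, joined onto a single line.
my $line = "1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0\n";
chomp $line;    # drop the EOL so it does not end up in the message

# LIMIT of 7: at most seven fields, the seventh holding everything that remains.
my ($log_count, $log_month, $log_day, $log_time,
    $log_hostname, $log_proc_info, $log_message) = split / /, $line, 7;

# The same split stored in a hash slice instead of seven scalars.
my %log;
@log{qw(count month day time hostname proc_info message)} = split / /, $line, 7;

print "$log_proc_info\n";
print "$log_message\n";
```

Note that $log_proc_info comes out as `tldcd[928]:` with the trailing colon, since split only breaks on the spaces.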
  8. Rhugga

    Eric Bohlman Guest

    "Rhugga" <> wrote in news:1107977068.579418.259720@o13g2000cwo.googlegroups.com:

    > The reason why my() is missing is because the code you see is inside a
    > loop, I define all those vars before the loop using my(). Once one
    > iteration of the loop ends I no longer have a need for anything stored
    > in those vars. (as this is getting shoved into oracle)


    If the variables are used only inside the loop, you should be declaring
    them inside the loop. Variables should have the narrowest possible scope;
    among other things, it will make it easier to understand/modify your code
    when you come back to it six months later.
    Eric Bohlman, Feb 9, 2005
    #8
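
As a sketch of that advice, the variables can be declared with my() at the point of the split, so each iteration gets fresh copies and nothing leaks past the loop. The in-memory filehandle, the second input line, and the `@messages` array are stand-ins invented for illustration, not part of the original parser:

```perl
use strict;
use warnings;

# Stand-in input: an in-memory filehandle instead of the real log file.
# The second line is a made-up entry for illustration.
my $input = "1 Feb 8 00:05:41 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) opening robotic path /dev/sg/c3t0l0\n"
          . "3 Feb 8 00:06:02 back-0202 tldcd[928]: [ID 359804 daemon.notice] TLD(0) closing robotic path\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @messages;    # collected only so the result can be inspected after the loop
while (my $line = <$fh>) {
    chomp $line;
    # Declared inside the loop: each variable is scoped to a single iteration.
    my ($count, $month, $day, $time, $hostname, $proc_info, $message)
        = split / /, $line, 7;
    push @messages, $message;
}
close $fh;

print "$_\n" for @messages;
```

Under `use strict`, any reference to `$message` after the loop would be a compile-time error, which is the kind of mistake narrow scoping is meant to catch.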