Parsing Log records with regular expressions

Kris K. · Feb 3, 2011

I have a text-based log file whose records come in two formats,
as follows:
`
A|B|C|D\n
A|B|C|D|E\n
\n
Exception\n
\n
\tstack trace line1\n
\tstack trace line2\n
\tstack trace line3\n
\n
A|B|C|D\n`

The first form (A|B|C|D) has statically defined columns delimited by a
pipe symbol. The second form has the last character "E" which implies an
exception record. If it is an exception record the information about the
exception follows. The exception information starts with a line
"Exception", followed by another newline and stacktrace on multiple
lines. Each stacktrace element starts with a tab.

I am parsing this file with Ruby. Currently I am reading line by line
and building the log records. This is working fine.

I am wondering if I could rely on regular expressions to do it instead
of reading line by line - I could read a chunk of the file and apply two
regular expressions to see if there is a match and if I find the match
process the record and move to the next record. If there is no match,
then I combine multiple chunks until I find a match. Is this approach a
valid consideration? Is this doable with Ruby? If there are any open source
projects that do something like this, can someone point me to them? Also,
any thoughts on which one is more efficient, and why? I appreciate any
feedback.

Robert Klemme · Feb 4, 2011

I am wondering if I could rely on regular expressions to do it instead
of reading line by line - I could read a chunk of the file and apply two
regular expressions to see if there is a match and if I find the match
process the record and move to the next record. If there is no match,
then I combine multiple chunks until I find a match. Is this approach a
valid consideration?

Question is: why do you want to do that? Line-based parsing is simple
and has the advantage that you always get a complete record. Note
also that underneath, Ruby uses buffered reading, in case you are
wondering about IO efficiency.

Is this doable with Ruby?

Yes, certainly.
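For illustration, here is a minimal sketch of the chunk-oriented idea from the question: one multi-line pattern applied with String#scan to text that is already in memory. The constant name RECORD_RE and the sample string are made up for this example. It assumes the text ends on a record boundary; if an exception block is cut off mid-chunk, the optional part fails to match and the record would be misread as a plain one, which is the main weakness of this approach.

```ruby
# Hypothetical pattern: one record, optionally followed by its exception block.
RECORD_RE = /
  ^([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)   # four pipe-delimited columns
  (?:
    \|E\n                    # the E marker ends the record line
    \n
    Exception\n
    \n
    ((?:\t[^\n]*\n)+)        # one or more tab-indented stack trace lines
  )?
/x

# Sample input modelled on the format described in the question.
sample = "A|B|C|D\n" \
         "A|B|C|D|E\n" \
         "\n" \
         "Exception\n" \
         "\n" \
         "\tstack trace line1\n" \
         "\tstack trace line2\n" \
         "\tstack trace line3\n" \
         "\n" \
         "A|B|C|D\n"

# scan returns one array of captures per match; trace is nil for plain records.
records = sample.scan(RECORD_RE).map do |a, b, c, d, trace|
  { columns: [a, b, c, d], trace: trace }
end

records.each { |rec| p rec }
```

The line-by-line version never has to worry about where a chunk ends, which is why it tends to be the simpler choice.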

If there are any open source
projects, that do something like this, can someone point me to it? Also
any thoughts which one is more efficient and why? Appreciate any
feedback.

My implementation of this would use a single regular expression with
an optional part for the "|E". That way you need to match only once
and you can immediately distinguish record types.

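# untested
Record = Struct.new :a, :b, :c, :d, :e

def parse
  # Both variables must live inside the method: a def cannot see locals
  # defined outside it, and they have to survive across iterations.
  last = nil    # record currently being built
  ex = false    # truthy while last is an exception record

  ARGF.each do |line|
    # [^|\n] keeps the trailing newline out of the fourth column
    if %r{^([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)(\|E)?} =~ line
      ex = $5
      r = Record.new $1, $2, $3, $4
      r.e = "" if ex

      # a new record line means the previous record is complete
      yield last if last

      last = r
    elsif ex
      # everything after an exception record belongs to its exception text
      last.e << line
    else
      warn "Dunno what to do with line #{line.inspect}"
    end
  end

  yield last if last
end

parse do |rec|
  p rec
end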
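
To try the same idea without a file on the command line, here is a sketch of a variant that takes any IO-like object, so it can be fed a StringIO. The names LogRecord, LINE_RE and each_record are invented for this example and are not part of the code above; the sample text is the one from the original question.

```ruby
require "stringio"

# Hypothetical names for this sketch; :e holds the raw exception text.
LogRecord = Struct.new(:a, :b, :c, :d, :e)
LINE_RE = /^([^|\n]*)\|([^|\n]*)\|([^|\n]*)\|([^|\n]*)(\|E)?/

# Yields one LogRecord per record line. Lines following an exception
# record are appended to that record's :e field.
def each_record(io)
  last = nil
  io.each_line do |line|
    if (m = LINE_RE.match(line))
      yield last if last
      # +"" gives a mutable string to collect the exception text in
      last = LogRecord.new(m[1], m[2], m[3], m[4], m[5] ? +"" : nil)
    elsif last && last.e
      last.e << line
    else
      warn "Skipping unexpected line #{line.inspect}"
    end
  end
  yield last if last
end

sample = StringIO.new(
  "A|B|C|D\nA|B|C|D|E\n\nException\n\n" \
  "\tstack trace line1\n\tstack trace line2\n\tstack trace line3\n\nA|B|C|D\n"
)

records = []
each_record(sample) { |rec| records << rec }
records.each { |rec| p rec }
```

Note that the exception text is stored as is, including the blank lines and the "Exception" line; splitting it into stack trace elements would be a separate step.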

Cheers

robert

Kris K. · Feb 4, 2011

Thanks for the prompt response. I appreciate your taking the time to
respond with sample code. I have just started on this as a pet project
to learn Ruby. The task is to build a log analysis web application. The
log file is not a standard one, in the sense that it is dynamically
constructed and some columns are optional, but all of them are
separated by the '|' character. Initially I am starting with reading a
static file, but at some point my plan is to use SSH to read the live
file contents and provide real-time information. So I was considering what
other alternatives might work well in the real-time scenario as well.
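
For the live case, one possible arrangement is to keep the line-based parser and put a small buffer in front of it that turns arbitrary chunks into complete lines. The sketch below is only an assumption about how that could look: the LineBuffer class and its feed method are made up for illustration, and how the chunks actually arrive (SSH or otherwise) is left out entirely.

```ruby
# Hypothetical helper: collects arbitrary chunks and returns only complete lines.
class LineBuffer
  def initialize
    @pending = +""   # mutable buffer for the unfinished tail
  end

  # Appends a chunk and returns every line it completed (newline included).
  # Any trailing partial line stays buffered until a later chunk finishes it.
  def feed(chunk)
    @pending << chunk
    lines = []
    while (idx = @pending.index("\n"))
      lines << @pending.slice!(0..idx)
    end
    lines
  end
end

buffer = LineBuffer.new

# Chunks deliberately split in the middle of lines.
chunks = ["A|B|C", "|D\nA|B|C|D|E\n\nExcep", "tion\n"]
lines = chunks.flat_map { |chunk| buffer.feed(chunk) }

p lines
```

One thing to keep in mind with a live file: a parser that treats the next record line as the end of the previous record cannot know that the newest record is finished until more data arrives.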

