RE to extract info from a logfile


Gerhard M

hi,

I have to extract some information from a logfile. The logfile has the
format:
[timestamp].[hostname].[pid]:[type] '[xml]' [some text]

e.g.
131112.dumbo.domain.tld.414:create_customer '<!xml...>..</..>' additional info

but if the xml is too long, the line is truncated. Then it looks like:
131112.dumbo.domain.tld.414:create_customer '<!xml...>.....................
and the xml is not enclosed in quotes.

What I need are the fields ($time,$process,$info,$xml), where $xml is
either the xml enclosed in the quotes or, if the line is truncated,
the rest of the line after the first quote.

Looking for an RE to match this, I have not found one that stops at the
closing quote or at EOL:


while (<>) {
    ($time,$process,$info,$xml) = /^(\d+).*\.(\d+):(\w+)\s*'(.+)'/;
    # this one will not match too long (truncated) xml-lines

    ($time,$process,$info,$xml) = /^(\d+).*\.(\d+):(\w+)\s*'(.+)$/;
    # this one will set $xml="[xml]' [text]"

    ($time,$process,$info,$xml) =
        m#^(\d+).*\.(\d+):(\w+)\s*'([\w<>!&;,\.]+)#;
    # not really nice: all possible chars have to be listed in the pattern

    ($time,$process,$info,$xml) = m#^(\d+).*\.(\d+):(\w+)\s*'(.*>)#;
    # does not work, [text] can also contain <>'s

    ($time,$process,$info,$xml) = m#^(\d+).*\.(\d+):(\w+)\s*'(.*)'?#;
    # will always match to eol
}


does anyone have an RE which will extract the xml correctly?

thx
gerhard
 

Anno Siegel

Gerhard M said:
while (<>) {
($time,$process,$info,$xml) = /^(\d+).*\.(\d+):(\w+)\s*'(.+)'/;
# this one will not match too long xml-lines

"too long" meaning truncated?

Instead of matching everything up to a "'" (which may never come),
match everything that is not a "'". So change the last bit of
the regex from "...'(.+)'/" to "...'([^']*)/".

Anno
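
For illustration, the negated character class suggested above could be tried against both kinds of line roughly as follows. This is only a sketch: the helper name parse_log_line and the two sample lines are made up for the example (modelled on the lines in the question), and the front part of the pattern is taken unchanged from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: returns
# ($time, $process, $info, $xml), or an empty list if the line does not match.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;    # [^'] also matches "\n", so drop the line ending first
    # Same prefix as in the question; the tail captures everything after the
    # opening quote up to the next quote or, failing that, the end of the line.
    return $line =~ /^(\d+).*\.(\d+):(\w+)\s*'([^']*)/;
}

# Sample lines assumed for the example: one complete, one truncated.
my @samples = (
    "131112.dumbo.domain.tld.414:create_customer '<!xml...>..</..>' additional info",
    "131112.dumbo.domain.tld.414:create_customer '<!xml...>.........",
);

for my $line (@samples) {
    my ($time, $process, $info, $xml) = parse_log_line($line)
        or next;    # skip lines that do not look like log entries
    print "$time|$process|$info|$xml\n";
}
```

One caveat with this approach: if the xml itself can contain a single quote (for example in an attribute value), the capture stops at that quote, so the pattern would need to be adapted for such data.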
 
