Converting LF/FF delimited logs to XML w/ Python?

Kadin2048

This is a very noob-ish question so I apologize in advance, but I'm
hoping to get some input and advice before I get in over my head.

I'm trying to convert some log files from a formfeed- and
linefeed-delimited form into XML. I'd been thinking of using Python to
do this, but I'll be honest and say that I'm very inexperienced with
Python, so before I dive in I wanted to see whether some more
experienced minds thought I was choosing the right tool.

Basically, what I want to do is convert instant messaging logs
produced by CenterIM, which look like this (where "^L" represents ASCII
12, the formfeed character):

^L
IN
MSG
1190126325
1190126325
hi
^L
OUT
MSG
1190126383
1190126383
hello

To an XML-based format* like this:

<chat account="joeblow" service="AIM" version="0.4">
<message sender="janedoe" time="1190126325">hi</message>
<message sender="joeblow" time="1190126383">hello</message>
</chat>

Obviously there's information in the bottom example not present in the
top (account names, protocol), but I'll grab those from the file name or
prompt the user.

Given that I'd be learning as I go along, is Python a good tool for
doing this? (Am I totally insane to be trying this as a beginner?) And
if so, where should I start? I'd like to avoid massive
wheel-reinvention if at all possible.

I'm not afraid to RTFM but there's a lot of information around on Python
and I'm not sure what's most relevant. Suggestions on what to read,
books to buy, etc., are all welcomed.

Thanks in advance,
Kadin.

* For the curious, this is sort of a poor attempt at the "Universal Log
Format" as used by Adium on OS X.
 
Chris Mellon

This is a pretty simple problem and is well suited for a beginner
project. The file() builtin will get you the data in your log file.
Using the split() method of the string object, you can break your
logfile into chunks.
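
As a rough sketch of that splitting step (not tested against real CenterIM files): the function name parse_centerim_log and the dict keys are made up for illustration, and the field order (direction, type, two timestamps, then the message body) is only inferred from the sample in the question. Which of the two timestamps is "sent" and which is "received" is a guess, so the sketch simply keeps the first one. It uses open() rather than file(), since file() no longer exists in Python 3.

```python
def parse_centerim_log(text):
    """Split CenterIM history text into a list of message dicts.

    Assumed record layout (inferred from the sample, not from any spec):
    direction, type, timestamp, timestamp, then one or more body lines.
    """
    records = []
    # Every record starts with a formfeed (ASCII 12, written "\f" in Python).
    for chunk in text.split("\f"):
        lines = chunk.strip("\n").split("\n")
        if len(lines) < 4:
            continue  # e.g. the empty chunk before the first formfeed
        direction, kind, first_ts, _second_ts = lines[:4]
        records.append({
            "direction": direction,          # "IN" or "OUT"
            "kind": kind,                    # "MSG" in the sample
            "time": first_ts,                # assumption: first timestamp
            "text": "\n".join(lines[4:]),    # body may span several lines
        })
    return records


# The sample from the question, written out as a Python string.
sample = (
    "\fIN\nMSG\n1190126325\n1190126325\nhi\n"
    "\fOUT\nMSG\n1190126383\n1190126383\nhello\n"
)
records = parse_centerim_log(sample)

# For a real file, something like this should do:
#     with open("history") as f:
#         records = parse_centerim_log(f.read())
```

Here records ends up as a list of two dicts, one per message, which is a convenient shape to hand to whatever writes the XML.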

There are a number of XML libraries in the standard lib, but xml.etree
is my preferred one. It is documented in the stdlib docs, and on the
effbot site.
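
A minimal sketch of the output side with xml.etree.ElementTree might look like the following. The function build_chat_xml, its parameters, and the record dicts are hypothetical; the element and attribute names are copied from the target format in the question, and treating "IN" as a message from the other person is an assumption based on the sample.

```python
import xml.etree.ElementTree as ET


def build_chat_xml(records, account, buddy, service):
    """Return a <chat> document as a string, one <message> per record."""
    chat = ET.Element(
        "chat", {"account": account, "service": service, "version": "0.4"}
    )
    for rec in records:
        # Assumption: "IN" means the buddy wrote it, anything else is ours.
        sender = buddy if rec["direction"] == "IN" else account
        msg = ET.SubElement(
            chat, "message", {"sender": sender, "time": rec["time"]}
        )
        # ElementTree escapes &, < and > in the text when serializing.
        msg.text = rec["text"]
    return ET.tostring(chat, encoding="unicode")


# Hypothetical records, matching the two messages in the sample log.
records = [
    {"direction": "IN", "time": "1190126325", "text": "hi"},
    {"direction": "OUT", "time": "1190126383", "text": "hello"},
]
xml_text = build_chat_xml(
    records, account="joeblow", buddy="janedoe", service="AIM"
)
```

The main reason to go through a library instead of pasting strings together is the escaping: a message containing "<" or "&" would otherwise produce broken XML.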
 
kyosohma

I've used lxml and DOM/minidom. Both took me a while to figure out, and
I still don't always understand them. Anyway, lxml's API is similar to
the xml.etree module Chris mentioned.

http://docs.python.org/lib/module-xml.dom.html
http://www.oreilly.com/catalog/pythonxml/chapter/ch01.html
http://pyxml.sourceforge.net/topics/

Mike
 
Kadin2048

Chris Mellon said:
This is a pretty simple problem and is well suited for a beginner
project. The file() builtin will get you the data in your log file.
Using the split() method of the string object, you can break your
logfile into chunks.

Glad to hear I'm not jumping too far into the deep end, then. I figured
this had to be a fairly common/basic task.

Chris Mellon said:
There are a number of XML libraries in the standard lib, but xml.etree
is my preferred one. It is documented in the stdlib docs, and on the
effbot site.

Excellent, I'll check it out further. effbot looks like a good resource.

Thanks,
Kadin.
 
