Combining several text files

Eric

This is my first post, so please advise if I'm not using proper
etiquette. I've searched around a bit, and while I think I can
do this, I can't think of a clean, elegant way. I'm pretty new to
Python, but what I've learned so far is that there is almost
always an easier way.

I have to parse several log files. I've already written a working
parser. The log files are plain text files that, once they reach a
certain size, are renamed with a number appended. So, you might end up
with:

filename.log.2
filename.log.1
filename.log

The higher the number, the older the file. I want to search for all
the files in a directory with "filename.log" as part of their name.
Then I can do one of two things. First I could combine them so that
the resulting file ends up with the oldest on top and newest on the
bottom. Otherwise, I could just iterate over the multiple files within
my parser.

I don't need working code (that makes things too easy), just clear
suggestions to a Python newcomer to speed me on my way.

Thanks
 
MRAB

My suggestion is to list the filenames, sort them into descending order
by the numeric suffix (converted to an int, treating an unnumbered
filename as if it had the suffix ".0"), and then parse them in the
resulting order.
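The sorting step described above could be sketched roughly as follows. This is only an illustration: the helper names `suffix_number` and `rotated_logs` are made up for this example, and it assumes the rotated files are named exactly `filename.log` or `filename.log.<digits>`.

```python
def suffix_number(name, base="filename.log"):
    """Return the numeric suffix of a rotated log name.

    The unnumbered file is treated as if its suffix were 0.
    (Hypothetical helper, not part of any library.)
    """
    rest = name[len(base):]          # "" for the live file, ".1", ".2", ... otherwise
    return int(rest[1:]) if rest else 0


def rotated_logs(names, base="filename.log"):
    """Keep only base and base.<digits>, ordered oldest (highest number) first."""
    wanted = [
        n for n in names
        if n == base
        or (n.startswith(base + ".") and n[len(base) + 1:].isdigit())
    ]
    # Descending numeric order, so that ".10" sorts before ".2".
    return sorted(wanted, key=lambda n: suffix_number(n, base), reverse=True)


names = ["filename.log", "filename.log.10", "filename.log.2",
         "filename.log.1", "other.txt"]
print(rotated_logs(names))
```

Converting the suffix to an int matters once there are ten or more files, since a plain string sort would put ".10" between ".1" and ".2".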
 
Chris Rebert

For listing the filenames, you'll want to use os.listdir:
http://docs.python.org/library/os.html#os.listdir

or possibly the `glob` module depending on your needs:
http://docs.python.org/library/glob.html
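
As a rough sketch of the two options, assuming Python 3 and assuming that "part of their name" means the name starts with `filename.log` (the function names here are invented for illustration):

```python
import glob
import os


def list_logs_glob(directory, base="filename.log"):
    """Return paths in `directory` whose names start with `base`, using glob."""
    # glob.escape protects any wildcard characters that happen to be
    # in the directory or base name themselves.
    pattern = os.path.join(glob.escape(directory), glob.escape(base) + "*")
    return glob.glob(pattern)


def list_logs_listdir(directory, base="filename.log"):
    """Same result using os.listdir and a plain string test."""
    return [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(base)
    ]
```

Neither function promises any particular order, so the result still needs to be sorted before parsing.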

Cheers,
Chris
 
rdmurray

My first thought would be to do something like this (assuming
you are on a Unix variant, where `ls -tr` lists files by modification
time, oldest first):

yourscript `ls -tr filename.log*`

and then in 'yourscript' do:

import fileinput
for line in fileinput.input():
    pass  # process each line here

This has the nice advantage of giving you access, for each line, to
the name of the file it came from (fileinput.filename()) and its line
number within that file (fileinput.filelineno()), in case that is
useful.
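
For instance, a minimal sketch that passes the file list in directly rather than on the command line might look like this. It assumes Python 3, and `tagged_lines` is a made-up name for the example:

```python
import fileinput


def tagged_lines(paths):
    """Yield (filename, line number within that file, line text) for each line.

    `paths` should already be in the order you want them read,
    for example oldest file first.
    """
    if not paths:
        # With an empty list, fileinput would fall back to reading stdin.
        return
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

A parser could then loop with `for name, lineno, text in tagged_lines(paths):` and report the file and line whenever it hits a record it cannot handle.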

This is an example of why Python is referred to as "Batteries Included" :)

--RDM
 
