File processing - is Python suitable?

ferrad

I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

ferrad
 
Helmut Jarausch

ferrad said:
I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

I think that's one of the great strengths of Python.

Just some pointers:

http://gnosis.cx/TPiP/
http://www.egenix.com/products/python/mxBase/mxTextTools/


--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany
 
Alan Isaac

ferrad said:
I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically.
Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files?

Someone can. ;-)
However, if the file is structured, awk may be faster, since this
sounds like the kind of report generation it was designed for.

Alan Isaac
 
Peter Otten

ferrad said:
I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

Yes, and if you are a non-programmer, the entry barrier for Python is as low
as it can get. However, what a programming language treats as a rule is
much stricter than what a human being might expect. For example, appending
an 's' to the first word in a sentence is "easy" in Python; changing the
subject's grammatical number to plural is "hard". Both are doable, but the
less technical your rules are, the harder they become to translate.
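
To make the "easy" case concrete, here is a minimal sketch. The function name
pluralize_first_word is made up for illustration, and the code assumes words
are separated by single spaces. Note that it is purely mechanical: a one-word
sentence such as "Cat." would become "Cat.s", which is exactly the kind of
strictness described above.

```python
def pluralize_first_word(sentence):
    """Append 's' to the first space-separated word of a sentence.

    Hypothetical helper for illustration only; it knows nothing
    about grammar or punctuation.
    """
    if not sentence:
        return sentence
    # partition() splits at the first space into (head, separator, tail).
    first, sep, rest = sentence.partition(" ")
    return first + "s" + sep + rest


print(pluralize_first_word("Cat sat on the mat."))  # Cats sat on the mat.
```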

You often have to compromise either by proofreading the results of any
automated processing, or by having your program ask a human operator in the
cases it can't decide upon.

I recommend that you play around a bit in the interactive interpreter to get
a feel for the kind of operations that are easily available on strings.
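
As a starting point, these are a few of the built-in string operations you
might try. The sample line is invented; I am only assuming that your files
contain something vaguely like it, with junk after a '#' character.

```python
line = "  Total:   42 items  # junk  "   # made-up sample input

clean = line.strip()                      # remove surrounding whitespace
content = clean.split("#")[0].rstrip()    # keep only the text before '#'
words = content.split()                   # split on runs of whitespace
numbers = [int(w) for w in words if w.isdigit()]  # pick out whole numbers

print(content)                            # Total:   42 items
print(words)                              # ['Total:', '42', 'items']
print(numbers)                            # [42]
print(content.replace("items", "rows"))   # Total:   42 rows
```

Typing these one at a time at the >>> prompt shows each intermediate result
immediately.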

Then write the processing rules into a script, and always start your
conversion from the original data (of which you have a backup in some
locker), not some intermediate output. That way you can try processing
without losing information in the data or about the process -- until you
find the results acceptable. Make backups of your script, too, before you
try something new.
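
A rough sketch of such a script might look like this. The two rules in
apply_rules are placeholders I made up (I do not know your real rules), and
the file names in the usage note are hypothetical. The point is only the
shape: read the original, write to a separate output file, and refuse to
overwrite the input.

```python
from pathlib import Path


def apply_rules(lines):
    """Yield processed lines. Both rules are made-up placeholders."""
    for line in lines:
        if line.startswith("#"):       # hypothetical rule 1: drop junk lines
            continue
        yield line.rstrip() + "\n"     # hypothetical rule 2: trim trailing blanks


def convert(original, output):
    """Read the original file and write the processed text to a new file."""
    original, output = Path(original), Path(output)
    if original.resolve() == output.resolve():
        raise ValueError("refusing to overwrite the original data")
    with original.open() as src, output.open("w") as dst:
        dst.writelines(apply_rules(src))
```

You would call it as, say, convert("original.txt", "processed.txt"), and rerun
it from the untouched original every time you change a rule.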

Peter
 
Jorgen Grahn

ferrad said:
I have not used Python before, but believe it may be what I need.

I have large text files containing text, numbers, and junk. I want to
delete large chunks, process other bits, etc., much like I'd do in an
editor, but want to do it automatically. I have a set of generic
rules that my fingers follow to process these files, which all follow
a similar template.

Doesn't your text editor have recordable macros?

ferrad said:
Question: can I translate these types of rules into programmatic
constructs that Python can use to process these files? Can Python do
the trick?

Impossible to tell, since we do not know these rules. If they need
your good judgement, intelligence, knowledge or taste, a good text
editor, with careful application of recorded macros, is the way to go.
Maybe in combination with a few Perl or Python scripts and special
features of the text editor. I often find myself doing that kind of
work, when the text I start with is too irregular to be easily machine
parsable.

On the other hand, if the work is purely mechanical, tedious stuff,
there is a fair chance that it can be completely automated using
Python. (IMHO, Perl is often a better tool for this kind of
work, but few other languages beat Python in this area.)
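
To illustrate the purely mechanical case, here is a small Python sketch that
deletes whole chunks of a file. The marker lines "BEGIN JUNK" and "END JUNK"
are invented for the example; I am assuming the chunks to delete can be
recognised by some fixed start and end line, which may not hold for your
files.

```python
def drop_chunks(lines, start="BEGIN JUNK", end="END JUNK"):
    """Yield lines, skipping everything from a start marker to its end marker.

    The marker strings are hypothetical; substitute whatever actually
    delimits the chunks in the real files.
    """
    skipping = False
    for line in lines:
        if not skipping and line.strip() == start:
            skipping = True            # entering a chunk to delete
        elif skipping and line.strip() == end:
            skipping = False           # leaving it; the marker is dropped too
        elif not skipping:
            yield line


sample = ["keep 1", "BEGIN JUNK", "noise", "END JUNK", "keep 2"]
print(list(drop_chunks(sample)))  # ['keep 1', 'keep 2']
```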

/Jorgen
 