An Odd Little Script

Greg Lindstrom · Mar 9, 2005

Hello-

I have a task which -- dare I say -- would be easy in <asbestos_undies>
Perl </asbestos_undies> but which I would rather do in Python (our primary
language at NovaSys). I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line. Currently, some vendors supply files with
linefeeds, others don't, and some split the file every 80 bytes. In
Perl I would operate on the file in place and be on my way. The files
can be quite large, so I'd rather not be making extra copies unless it's
absolutely essential/required.

I turn to the collective wisdom/trickery of the list to point me in the
right direction. How can I perform the above task while keeping my sanity?

Thanks!
--greg
--
Greg Lindstrom 501 975.4859
Computer Programmer (e-mail address removed)
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams." W.W.

Michael Hoffman · Mar 9, 2005

Greg said:
I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line.

Hmmmm... here's one way of doing it:

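    import mmap
    import sys

    DELIMITER_OFFSET = 107
    LINEFEED = ord("\n")

    # "r+b" opens the file for in-place binary update
    data_file = open(sys.argv[1], "r+b")
    data_file.seek(0, 2)
    data_length = data_file.tell()
    data = mmap.mmap(data_file.fileno(), data_length, access=mmap.ACCESS_WRITE)
    # in Python 3, indexing an mmap yields an integer byte value
    delimiter = data[DELIMITER_OFFSET]

    # overwrite every delimiter byte with a linefeed
    for index in range(data_length):
        if data[index] == delimiter:
            data[index] = LINEFEED

    data.flush()
    data.close()
    data_file.close()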

There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every character, but that's an exercise for
the reader. And personally I would make extra copies ANYWAY--not doing
so is asking for trouble.
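
For what it's worth, here is a rough sketch of the mmap.mmap.find() variant mentioned above. It assumes Python 3 and a non-empty file; the function name replace_delimiter_in_place and its delimiter_offset parameter are made up for illustration. Like the loop version, it only overwrites delimiter bytes and leaves any existing linefeeds alone, since an in-place edit cannot shrink the file.

```python
import mmap


def replace_delimiter_in_place(path, delimiter_offset=107):
    """Overwrite each end-of-segment byte with a linefeed; return the count.

    Hypothetical helper: the name and the default offset are assumptions.
    """
    with open(path, "r+b") as data_file:
        # Length 0 maps the whole file.
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            # Slice rather than index so we get a one-byte bytes object.
            delimiter = data[delimiter_offset:delimiter_offset + 1]
            if len(delimiter) != 1:
                raise ValueError("file is shorter than the delimiter offset")

            count = 0
            index = data.find(delimiter)
            while index != -1:
                data[index:index + 1] = b"\n"
                count += 1
                # Resume the search just past the byte we replaced.
                index = data.find(delimiter, index + 1)

            data.flush()
            return count
```

Calling replace_delimiter_in_place(sys.argv[1]) would stand in for the loop; find() lets the search run in C rather than byte by byte in Python.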

M.E.Farmer · Mar 9, 2005

Greg said:
Hello-

I have a task which -- dare I say -- would be easy in <asbestos_undies>
Perl </asbestos_undies> but which I would rather do in Python (our primary
language at NovaSys). I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line. Currently, some vendors supply files with
linefeeds, others don't, and some split the file every 80 bytes. In
Perl I would operate on the file in place and be on my way. The files
can be quite large, so I'd rather not be making extra copies unless it's
absolutely essential/required.

I turn to the collective wisdom/trickery of the list to point me in the
right direction. How can I perform the above task while keeping my sanity?

This should be fairly simple, but maybe not:

    # read the whole file into memory
    # this is not optimal but should be a start
    f = open('yourrecord', 'r')
    r = f.read()
    f.close()
    # strip line breaks first so vendor line wrapping cannot shift the offset
    r = r.replace('\r', '')
    r = r.replace('\n', '')
    # get the end of segment character (zero-based index 107)
    eos = r[107]
    r = r.replace(eos, '\n')
    f = open('yourrecord', 'w')
    f.write(r)
    f.close()
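
If the files are too big to hold in memory, the same steps can probably be done a chunk at a time. The sketch below assumes Python 3, and the names normalize_segments, delimiter_offset and chunk_size are invented for illustration. It strips line breaks, reads the delimiter at the zero-based offset counted after stripping, and writes through a temporary file in the same directory, so it does make one extra copy on disk.

```python
import os
import tempfile


def normalize_segments(path, delimiter_offset=107, chunk_size=1 << 20):
    """Rewrite the file at `path` so each segment sits on its own line.

    Hypothetical helper: names and defaults are assumptions.
    """
    def stripped_chunks(source):
        # Yield the file chunk by chunk with CR and LF bytes removed.
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                return
            yield chunk.translate(None, b"\r\n")

    # First pass: read just far enough to see the delimiter byte.
    head = b""
    with open(path, "rb") as source:
        for chunk in stripped_chunks(source):
            head += chunk
            if len(head) > delimiter_offset:
                break
    if len(head) <= delimiter_offset:
        raise ValueError("file is shorter than the delimiter offset")
    delimiter = head[delimiter_offset:delimiter_offset + 1]

    # Second pass: stream into a temporary file, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as target, open(path, "rb") as source:
            for chunk in stripped_chunks(source):
                target.write(chunk.replace(delimiter, b"\n"))
        os.replace(temp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.unlink(temp_path)
        raise
```

The original file is only replaced once the new copy has been written out completely, so a failure partway through should leave the vendor's file as it was.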

hth,
M.E.Farmer

Michael Hoffman · Mar 9, 2005

Michael said:
Hmmmm... here's one way of doing it:

import mmap
import sys

DELIMITER_OFFSET = 107

N.B. this is a zero-based 107. If you are using one-based coordinates,
then this is actually position 108.
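
To make the difference concrete, here is a tiny illustration. The sample data is made up, and it assumes the delimiter comes directly after a 107-byte first record, which the original post does not spell out.

```python
# Hypothetical sample: a 107-byte first record followed by a "~" delimiter.
record = b"A" * 107 + b"~" + b"NEXT SEGMENT~"

ZERO_BASED_OFFSET = 107                      # what DELIMITER_OFFSET means
ONE_BASED_POSITION = ZERO_BASED_OFFSET + 1   # the same byte, counting from one

# Python counts from zero, so offset 107 is the 108th byte of the data.
delimiter = record[ZERO_BASED_OFFSET:ZERO_BASED_OFFSET + 1]

# If "position 107" was meant one-based, the byte in question is offset 106.
byte_at_position_107 = record[107 - 1:107]
```

If the delimiter really is the 107th byte of the file, DELIMITER_OFFSET would need to be 106 instead.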
