An Odd Little Script

Greg Lindstrom

Hello-

I have a task which -- dare I say -- would be easy in <asbestos_undies>
Perl </asbestos_undies> but which I would rather do in Python (our primary
language at Novasys). I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line. Currently, some vendors supply files with
linefeeds, others don't, and some split the file every 80 bytes. In
Perl I would operate on the file in place and be on my way. The files
can be quite large, so I'd rather not be making extra copies unless it's
absolutely essential/required.

I turn to the collective wisdom/trickery of the list to point me in the
right direction. How can I perform the above task while keeping my sanity?

Thanks!
--greg
--
Greg Lindstrom 501 975.4859
Computer Programmer (e-mail address removed)
NovaSys Health
Little Rock, Arkansas

"We are the music makers, and we are the dreamers of dreams." W.W.
 
Michael Hoffman

Greg said:
I have a file with varying length records. All
but the first record, that is; it's always 107 bytes long. What I would
like to do is strip out all linefeeds from the file, read the character
in position 107 (the end of segment delimiter) and then replace all of
the end of segment characters with linefeeds, making a file where each
segment is on its own line.

Hmmmm... here's one way of doing it:

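import mmap
import sys

DELIMITER_OFFSET = 107  # zero-based offset of the end of segment delimiter

# Open for in-place binary update ("r+w" is not a valid mode for open()).
data_file = open(sys.argv[1], "r+b")
data_file.seek(0, 2)
data_length = data_file.tell()
data = mmap.mmap(data_file.fileno(), data_length, access=mmap.ACCESS_WRITE)
# One-byte slices behave the same whether indexing yields a str or an int.
delimiter = data[DELIMITER_OFFSET:DELIMITER_OFFSET + 1]

# Overwrite each delimiter byte with a linefeed, in place.
for index in range(data_length):
    if data[index:index + 1] == delimiter:
        data[index:index + 1] = b"\n"

data.flush()
data.close()
data_file.close()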

There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every character but that's an exercise for
the reader. And personally I would make extra copies ANYWAY--not doing
so is asking for trouble.
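
For what it's worth, the find() variant mentioned above might look something like the sketch below. The function name replace_delimiter_in_place and the throwaway demo file are made up for illustration, and the offset is assumed to be a zero-based 107. Like the loop above, it only swaps delimiter bytes for linefeeds; it does not strip linefeeds that are already in the file, since an mmap cannot shrink the file in place.

```python
import mmap
import os
import tempfile


def replace_delimiter_in_place(path, delimiter_offset=107):
    """Overwrite every end-of-segment byte with a linefeed, in place.

    Hypothetical helper: delimiter_offset is assumed to be zero-based.
    Returns the number of bytes that were replaced.
    """
    with open(path, "r+b") as data_file:
        # Length 0 maps the whole file.
        data = mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE)
        try:
            delimiter = data[delimiter_offset:delimiter_offset + 1]
            if len(delimiter) != 1:
                raise ValueError("file is shorter than the delimiter offset")
            count = 0
            index = data.find(delimiter)
            while index != -1:
                data[index:index + 1] = b"\n"
                count += 1
                # Resume the search just past the byte we replaced.
                index = data.find(delimiter, index + 1)
            data.flush()
        finally:
            data.close()
    return count


# Demo on a throwaway file: a 107-byte header, then "~" as the delimiter.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as demo:
    demo.write(b"H" * 107 + b"~" + b"SEG1~SEG2~")
replaced = replace_delimiter_in_place(demo_path)
with open(demo_path, "rb") as demo:
    result = demo.read()
os.unlink(demo_path)
```

The search skips straight from one delimiter to the next, so the Python-level loop runs once per segment rather than once per byte.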
 
M.E.Farmer

This should be fairly simple, but maybe not ;)
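# Read the whole file, strip the line breaks, then get the end of
# segment character. This is not optimal (the whole file is held in
# memory) but should be a start.
f = open('yourrecord', 'rb')
r = f.read()
f.close()
r = r.replace(b'\r', b'')
r = r.replace(b'\n', b'')
# Zero-based offset 107, read after stripping so that stray line
# breaks in the first record cannot shift the position.
eos = r[107:108]
r = r.replace(eos, b'\n')
f = open('yourrecord', 'wb')
f.write(r)
f.close()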
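
If the files are too large to hold in memory, the same steps could be done in chunks. The sketch below is only one possible take: normalize_segments, the 64 KB chunk size and the temporary-file swap are my own assumptions, not anything from the original post. It does write a second copy to disk before replacing the original, which trades the "no extra copies" wish for safety if something fails halfway.

```python
import os
import tempfile


def normalize_segments(path, delimiter_offset=107, chunk_size=64 * 1024):
    """Strip CR/LF, then turn each end-of-segment byte into a linefeed.

    Hypothetical helper: delimiter_offset is assumed to be zero-based and
    is applied to the data after the line breaks have been removed.
    """
    delimiter = None
    pending = b""  # stripped bytes held back until the delimiter is known
    directory = os.path.dirname(os.path.abspath(path))
    out_fd, out_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(out_fd, "wb") as dst, open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                chunk = chunk.replace(b"\r", b"").replace(b"\n", b"")
                if delimiter is None:
                    pending += chunk
                    if len(pending) <= delimiter_offset:
                        continue  # not enough data yet to see the delimiter
                    delimiter = pending[delimiter_offset:delimiter_offset + 1]
                    chunk, pending = pending, b""
                dst.write(chunk.replace(delimiter, b"\n"))
            if delimiter is None:
                raise ValueError("file is shorter than the delimiter offset")
        # Swap the finished copy into place only after it is complete.
        os.replace(out_path, path)
    except BaseException:
        os.unlink(out_path)
        raise


# Demo: a 107-byte header plus segments, wrapped every 80 bytes.
raw = b"H" * 107 + b"~" + b"AAA~BBB~"
wrapped = b"\n".join(raw[i:i + 80] for i in range(0, len(raw), 80))
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as demo:
    demo.write(wrapped)
normalize_segments(demo_path, chunk_size=50)
with open(demo_path, "rb") as demo:
    normalized = demo.read()
os.unlink(demo_path)
```

Because the delimiter is a single byte, replacing it chunk by chunk is safe: a match can never straddle a chunk boundary.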

hth,
M.E.Farmer
 
Michael Hoffman

Michael said:
Hmmmm... here's one way of doing it:

import mmap
import sys

DELIMITER_OFFSET = 107

N.B. this is a zero-based 107. If you are using one-based coordinates,
then this is actually position 108.
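
To make the off-by-one concrete, here is a tiny sketch. The helper name delimiter_at and its one_based flag are invented for illustration; which convention the real files follow is something only the file specification can settle.

```python
def delimiter_at(data, position, one_based=False):
    """Return the single byte at `position` under either convention."""
    # One-based position N corresponds to zero-based index N - 1.
    index = position - 1 if one_based else position
    return data[index:index + 1]


sample = b"0123456789"
zero_based = delimiter_at(sample, 3)
one_based = delimiter_at(sample, 3, one_based=True)
```

With the same number 3, the zero-based reading picks the fourth byte and the one-based reading picks the third, so the two conventions disagree by exactly one byte.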
 
