An Odd Little Script

Discussion in 'Python' started by Greg Lindstrom, Mar 9, 2005.

  1. Hello-

    I have a task which -- dare I say -- would be easy in <asbestos_undies>
    Perl </asbestos_undies> but would rather do in Python (our primary
    language at Novasys). I have a file with varying length records. All
    but the first record, that is; it's always 107 bytes long. What I would
    like to do is strip out all linefeeds from the file, read the character
    in position 107 (the end of segment delimiter) and then replace all of
    the end of segment characters with linefeeds, making a file where each
    segment is on its own line. Currently, some vendors supply files with
    linefeeds, others don't, and some split the file every 80 bytes. In
    Perl I would operate on the file in place and be on my way. The files
    can be quite large, so I'd rather not be making extra copies unless it's
    absolutely essential/required.
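To pin down the transformation being asked for, here is a minimal Python 3 sketch that works on the whole file contents as a bytes object. It assumes the end of segment character sits at zero-based offset 107 once every CR and LF has been removed. The name split_segments is hypothetical, and holding the whole file in memory may not suit the very large files mentioned above.

```python
def split_segments(raw: bytes, delimiter_offset: int = 107) -> bytes:
    """Return raw with CR/LF removed and each segment on its own line.

    delimiter_offset is the (assumed) zero-based index of the end of
    segment character once all line breaks have been stripped.
    """
    # Remove whatever line breaks the vendor inserted (none, CRLF, or every 80 bytes).
    compact = raw.replace(b"\r", b"").replace(b"\n", b"")
    # The first record has a fixed length, so the delimiter sits at a known offset.
    delimiter = compact[delimiter_offset:delimiter_offset + 1]
    if not delimiter:
        raise ValueError("data is too short to contain the first delimiter")
    # Turn every end of segment character into a linefeed.
    return compact.replace(delimiter, b"\n")


# Made-up sample: a 107-byte first record, "~" as the delimiter, stray CRLF.
sample = b"A" * 107 + b"~" + b"BB~\r\nCC~"
print(split_segments(sample))
```

Because the delimiter is read only after stripping, the result is the same whether a vendor sent no linefeeds, CRLF pairs, or a break every 80 bytes.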

    I turn to the collective wisdom/trickery of the list to point me in the
    right direction. How can I perform the above task while keeping my sanity?

    Thanks!
    --greg
    --
    Greg Lindstrom 501 975.4859
    Computer Programmer
    NovaSys Health
    Little Rock, Arkansas

    "We are the music makers, and we are the dreamers of dreams." W.W.
     
    Greg Lindstrom, Mar 9, 2005
    #1

  2. Greg Lindstrom wrote:

    > I have a file with varying length records. All
    > but the first record, that is; it's always 107 bytes long. What I would
    > like to do is strip out all linefeeds from the file, read the character
    > in position 107 (the end of segment delimiter) and then replace all of
    > the end of segment characters with linefeeds, making a file where each
    > segment is on its own line.


    Hmmmm... here's one way of doing it:

    import mmap
    import sys

    DELIMITER_OFFSET = 107

    data_file = open(sys.argv[1], "r+b")
    data_file.seek(0, 2)
    data_length = data_file.tell()
    data = mmap.mmap(data_file.fileno(), data_length, access=mmap.ACCESS_WRITE)
    delimiter = data[DELIMITER_OFFSET]  # an int (the byte value) in Python 3

    # overwrite each end of segment byte with a linefeed, in place
    for index in range(data_length):
        if data[index] == delimiter:
            data[index] = ord("\n")

    data.flush()
    data.close()
    data_file.close()

    There are doubtless more efficient ways, like using mmap.mmap.find()
    instead of iterating over every character, but that's an exercise for
    the reader. And personally I would make extra copies ANYWAY--not doing
    so is asking for trouble.
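Taking up the exercise, a sketch of the mmap.find() variant in Python 3, again assuming a zero-based offset of 107. It jumps from one delimiter to the next instead of testing every byte in a Python loop. Like the loop above, it does not remove existing linefeeds, so on its own it only suits files that arrive without them. The name replace_delimiters_in_place is hypothetical.

```python
import mmap
import sys

DELIMITER_OFFSET = 107  # assumed zero-based offset of the end of segment byte


def replace_delimiters_in_place(path: str, offset: int = DELIMITER_OFFSET) -> int:
    """Overwrite every end of segment byte in the file with a linefeed.

    Returns the number of bytes replaced.
    """
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
        delimiter = data[offset:offset + 1]  # slicing gives a one-byte bytes object
        if not delimiter:
            raise ValueError("file is too short to contain the first delimiter")
        count = 0
        pos = data.find(delimiter)
        while pos != -1:
            data[pos] = ord("\n")  # single-byte item assignment takes an int
            count += 1
            pos = data.find(delimiter, pos + 1)
        data.flush()
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    print(replace_delimiters_in_place(sys.argv[1]), "delimiters replaced")
```

Passing 0 as the mmap length maps the whole file, which avoids the seek/tell dance; note that mmap refuses to map an empty file.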
    --
    Michael Hoffman
     
    Michael Hoffman, Mar 9, 2005
    #2

  3. Greg Lindstrom wrote:
    > Hello-
    >
    > I have a task which -- dare I say -- would be easy in <asbestos_undies>
    > Perl </asbestos_undies> but would rather do in Python (our primary
    > language at Novasys). I have a file with varying length records. All
    > but the first record, that is; it's always 107 bytes long. What I would
    > like to do is strip out all linefeeds from the file, read the character
    > in position 107 (the end of segment delimiter) and then replace all of
    > the end of segment characters with linefeeds, making a file where each
    > segment is on its own line. Currently, some vendors supply files with
    > linefeeds, others don't, and some split the file every 80 bytes. In
    > Perl I would operate on the file in place and be on my way. The files
    > can be quite large, so I'd rather not be making extra copies unless it's
    > absolutely essential/required.
    >
    > I turn to the collective wisdom/trickery of the list to point me in the
    > right direction. How can I perform the above task while keeping my sanity?
    >
    > Thanks!
    > --greg
    > --
    > Greg Lindstrom 501 975.4859
    > Computer Programmer
    > NovaSys Health
    > Little Rock, Arkansas
    >
    > "We are the music makers, and we are the dreamers of dreams." W.W.


    This should be fairly simple, but maybe not ;)
    # strip the line breaks first, so that offset 107 (zero-based) really
    # is the end of segment character
    # this is not optimal (the whole file is held in memory) but should be a start
    with open('yourrecord', 'rb') as f:
        r = f.read()
    r = r.replace(b'\r', b'')
    r = r.replace(b'\n', b'')
    eos = r[107:108]
    r = r.replace(eos, b'\n')
    with open('yourrecord', 'wb') as f:
        f.write(r)
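Since stripping linefeeds only ever makes the data shorter, the whole job can also be done in place with no second copy: walk a read position through the mapped file, write each kept byte at a trailing write position, then truncate the file. Below is a sketch of that two-pointer compaction under the same assumption of a zero-based offset of 107; normalize_in_place is a hypothetical name. The per-byte Python loop is slow on very large files, and an interrupted run leaves the file half-rewritten, which is one argument for keeping a backup copy anyway.

```python
import mmap
import sys

LINE_BREAKS = (ord("\r"), ord("\n"))


def normalize_in_place(path: str, delimiter_offset: int = 107) -> int:
    """Strip CR/LF and turn each end of segment byte into a linefeed, in place.

    Returns the new length of the file.
    """
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            # Pass 1: the delimiter is the byte at delimiter_offset
            # once line breaks are ignored.
            delimiter = None
            kept = 0
            for i in range(len(data)):
                if data[i] in LINE_BREAKS:
                    continue
                if kept == delimiter_offset:
                    delimiter = data[i]
                    break
                kept += 1
            if delimiter is None:
                raise ValueError("file is too short to contain the first delimiter")

            # Pass 2: copy kept bytes down to a write position that never
            # overtakes the read position, so nothing unread is overwritten.
            write = 0
            for read in range(len(data)):
                byte = data[read]  # an int in Python 3
                if byte in LINE_BREAKS:
                    continue
                data[write] = ord("\n") if byte == delimiter else byte
                write += 1
            data.flush()
        # The map is closed at this point; drop the now-unused tail.
        f.truncate(write)
    return write


if __name__ == "__main__" and len(sys.argv) > 1:
    print(normalize_in_place(sys.argv[1]), "bytes after normalizing")
```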

    hth,
    M.E.Farmer
     
    M.E.Farmer, Mar 9, 2005
    #3
  4. Michael Hoffman wrote:
    > Greg Lindstrom wrote:
    >
    >> I have a file with varying length records. All but the first record,
    >> that is; it's always 107 bytes long. What I would like to do is strip
    >> out all linefeeds from the file, read the character in position 107
    >> (the end of segment delimiter) and then replace all of the end of
    >> segment characters with linefeeds, making a file where each segment is
    >> on its own line.

    >
    >
    > Hmmmm... here's one way of doing it:
    >
    > import mmap
    > import sys
    >
    > DELIMITER_OFFSET = 107


    N.B. this is a zero-based 107. If you are using one-based coordinates,
    then this is actually position 108.
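In other words, one-based position p is zero-based index p - 1. A small illustration with a hypothetical helper, delimiter_at, and made-up data in which byte n has the value n:

```python
def delimiter_at(data: bytes, position: int, one_based: bool = False) -> bytes:
    """Return the single byte at `position`, counted from 0 or from 1."""
    index = position - 1 if one_based else position
    return data[index:index + 1]


header = bytes(range(120))  # dummy data: byte n has value n
print(delimiter_at(header, 107))                  # b'k' (value 107, the 108th byte)
print(delimiter_at(header, 108, one_based=True))  # b'k' again: the same byte
```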
    --
    Michael Hoffman
     
    Michael Hoffman, Mar 9, 2005
    #4
