convert non-delimited to delimited

Discussion in 'Python' started by RyanL, Aug 27, 2007.

  1. RyanL

    RyanL Guest

    I'm a newbie! I have a non-delimited data file that I'd like to
    convert to delimited.

    Example...
    Line in non-delimited file:
    0139725635999992000010100534+42050-102800FM-15+1198KAIA

    Should be:
    0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

    What is the best way to go about this? I've looked all over for
    examples, help, suggestions, but have not found much. CSV module
    doesn't seem to do exactly what I want. Maybe I'm just missing
    something or not using the correct terminology in my searches. Any
    assistance is greatly appreaciated! Using Python 2.4
     
    RyanL, Aug 27, 2007
    #1
    1. Advertising

  2. RyanL

    Mark Elston Guest

    * RyanL wrote (on 8/27/2007 10:59 AM):
    > I'm a newbie! I have a non-delimited data file that I'd like to
    > convert to delimited.
    >
    > Example...
    > Line in non-delimited file:
    > 0139725635999992000010100534+42050-102800FM-15+1198KAIA
    >
    > Should be:
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
    >
    > What is the best way to go about this? I've looked all over for
    > examples, help, suggestions, but have not found much. CSV module
    > doesn't seem to do exactly what I want. Maybe I'm just missing
    > something or not using the correct terminology in my searches. Any
    > assistance is greatly appreaciated! Using Python 2.4
    >


    Since you have to know, a priori, how to break the input string I
    assume that these fields are of fixed length. You can use the following
    to do what you want:

    >>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
    >>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %

    (a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
    a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])

    which results in the following output:

    0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

    Mark
     
    Mark Elston, Aug 27, 2007
    #2
    1. Advertising

  3. RyanL

    Guest

    On Aug 27, 1:35 pm, Mark Elston <> wrote:
    > * RyanL wrote (on 8/27/2007 10:59 AM):
    >
    >
    >
    >
    >
    > > I'm a newbie! I have a non-delimited data file that I'd like to
    > > convert to delimited.

    >
    > > Example...
    > > Line in non-delimited file:
    > > 0139725635999992000010100534+42050-102800FM-15+1198KAIA

    >
    > > Should be:
    > > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

    >
    > > What is the best way to go about this? I've looked all over for
    > > examples, help, suggestions, but have not found much. CSV module
    > > doesn't seem to do exactly what I want. Maybe I'm just missing
    > > something or not using the correct terminology in my searches. Any
    > > assistance is greatly appreaciated! Using Python 2.4

    >
    > Since you have to know, a priori, how to break the input string I
    > assume that these fields are of fixed length. You can use the following
    > to do what you want:
    >
    > >>> a="0139725635999992000010100534+42050-102800FM-15+1198KAIA"
    > >>> print "%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s" %

    > (a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
    > a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])
    >
    > which results in the following output:
    >
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
    >
    > Mark


    Or try this:

    import struct
    test = '0139725635999992000010100534+42050-102800FM-15+1198KAIA'
    template = '4s6s5s4s2s2s2s2s1s6s7s5s5s4s'
    the_line = struct.unpack(template,test)
    print the_line
    print ','.join(the_line)

    ## ('0139', '725635', '99999', '2000', '01', '01', '00', '53', '4',
    '+42050', '-102800', 'FM-15', '+1198', 'KAIA')
    ##
    ##
    0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
     
    , Aug 27, 2007
    #3
  4. RyanL

    Matimus Guest

    On Aug 27, 10:59 am, RyanL <> wrote:
    > I'm a newbie! I have a non-delimited data file that I'd like to
    > convert to delimited.
    >
    > Example...
    > Line in non-delimited file:
    > 0139725635999992000010100534+42050-102800FM-15+1198KAIA
    >
    > Should be:
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
    >
    > What is the best way to go about this? I've looked all over for
    > examples, help, suggestions, but have not found much. CSV module
    > doesn't seem to do exactly what I want. Maybe I'm just missing
    > something or not using the correct terminology in my searches. Any
    > assistance is greatly appreaciated! Using Python 2.4


    I don't think you are going to find anything that will just do this
    for you. You are going to have read the file, figure out where to
    split the string, and reprint it delimited with commas. As for
    suggesting code... I can't tell how you actually want to delimit the
    stuff from the above example? Are the fields always a fixed number of
    characters? If they aren't then is there some other method for
    determining how many characters to group into a field? From the looks
    of it you could split that string any way you want and get something
    that looks right, but isn't.

    Matt
     
    Matimus, Aug 27, 2007
    #4
  5. On Aug 27, 2007, at 10:59 AM, RyanL wrote:

    > I'm a newbie! I have a non-delimited data file that I'd like to
    > convert to delimited.
    >
    > Example...
    > Line in non-delimited file:
    > 0139725635999992000010100534+42050-102800FM-15+1198KAIA
    >
    > Should be:
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
    >
    > What is the best way to go about this? I've looked all over for
    > examples, help, suggestions, but have not found much. CSV module
    > doesn't seem to do exactly what I want. Maybe I'm just missing
    > something or not using the correct terminology in my searches. Any
    > assistance is greatly appreaciated! Using Python 2.4


    Is each data element a fixed size?
     
    Michael Bentley, Aug 27, 2007
    #5
  6. RyanL

    Neil Cerutti Guest

    On 2007-08-27, RyanL <> wrote:
    > I'm a newbie! I have a non-delimited data file that I'd like to
    > convert to delimited.
    >
    > Example...
    > Line in non-delimited file:
    > 0139725635999992000010100534+42050-102800FM-15+1198KAIA
    >
    > Should be:
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA


    It looks like a fixed format data file, also called flat-record
    files, in which fields are of fixed lengths, and records are
    separated by newlines.

    > What is the best way to go about this?


    Check out chapter 2 of _Text Processing in Python_ for one
    solution.

    http://gnosis.cx/TPiP/chap2.txt

    --
    Neil Cerutti
    Weight Watchers will meet at 7 p.m. Please use large double door at the side
    entrance. --Church Bulletin Blooper
     
    Neil Cerutti, Aug 27, 2007
    #6
  7. RyanL

    Paul McGuire Guest

    On Aug 27, 12:59 pm, RyanL <> wrote:
    > I'm a newbie! I have a non-delimited data file that I'd like to
    > convert to delimited.
    >
    > Example...
    > Line in non-delimited file:
    > 0139725635999992000010100534+42050-102800FM-15+1198KAIA
    >
    > Should be:
    > 0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
    >
    > What is the best way to go about this? I've looked all over for
    > examples, help, suggestions, but have not found much. CSV module
    > doesn't seem to do exactly what I want. Maybe I'm just missing
    > something or not using the correct terminology in my searches. Any
    > assistance is greatly appreaciated! Using Python 2.4


    I'm guessing that these lines *aren't* fixed-length, especially those
    signed integer fields. I used the patented Paul McGuire CrystalBall
    module to come up with this pyparsing rendition. (OP may adjust to
    suit.)

    -- Paul

    data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
    """to be parsed as:
    0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

    from pyparsing import *
    import time
    def convertTimeStamp(t):
    t["date"] = map(int,t.date)
    t["time"] = map(int,t.time)
    return time.strftime("%Y-%m-%dT%H:%M",
    tuple(t.date)+tuple(t.time)+(0,0,0,0))

    yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) +
    Word(nums,exact=2)
    hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
    timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
    timestamp.setParseAction(convertTimeStamp)
    signedInteger = Word("+-",nums)

    fieldA = Word(nums,exact=4)("A")
    fieldB = Word(nums,exact=6)("B")
    fieldC = Word(nums,exact=5)("C")
    fieldD = timestamp("timestamp")
    fieldE = Word(nums)("E")
    fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/
    1000.0)
    fieldG = signedInteger("longitude").setParseAction(lambda t :
    int(t[0])/1000.0)
    fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
    fieldI = signedInteger("I")
    fieldJ = Word(alphas)("J")
    dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
    fieldF + fieldG + fieldH + fieldI + fieldJ

    res = dataFields.parseString(data)
    print res.dump()

    prints:

    ['0139', '725635', '99999', '2000-01-01T00:53', '4',
    42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
    - A: 0139
    - B: 725635
    - C: 99999
    - E: 4
    - H: FM-15
    - I: +1198
    - J: KAIA
    - latitude: 42.05
    - longitude: -102.8
    - timestamp: 2000-01-01T00:53
     
    Paul McGuire, Aug 28, 2007
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Panu Kinnari

    Parsing non-delimited text file

    Panu Kinnari, Oct 2, 2008, in forum: Ruby
    Replies:
    6
    Views:
    179
    Panu Kinnari
    Oct 4, 2008
  2. Julian Hsiao
    Replies:
    6
    Views:
    114
    Tassilo v. Parseval
    Jul 29, 2003
  3. Replies:
    7
    Views:
    285
    Guest
    May 27, 2006
  4. bruce
    Replies:
    38
    Views:
    316
    Mark Lawrence
    Nov 1, 2013
  5. MRAB
    Replies:
    0
    Views:
    112
Loading...

Share This Page