Newbie question: replacing nulls in CSV with preceding value

Discussion in 'Python' started by Matt Waite, Feb 1, 2007.

  1. Matt Waite

    Matt Waite Guest

    My first post, my first real python use, please be gentle:

    I have a CSV file, exported from Excel, that has blank records in it,
    and I need to fill them in with the values from the record just above
    it until it hits a non-blank value. Here's an example of the data,
    which is in a file called test2.csv:

    Zone,City,Event
    1,Anytown,Event
    ,,Event1
    ,,Event2
    ,,Event44
    2,Anothertown,Event3
    ,,Event5
    ,,Event7

    What I need it to look like is:

    Zone,City,Event
    1,Anytown,Event1
    1,Anytown,Event2
    1,Anytown,Event44
    2,Anothertown,Event3
    2,Anothertown,Event5
    2,Anothertown,Event7

    Pretty much everything I've tried has failed, and I've been searching
    for hours for something similar and haven't found anything. The best
    I've come up with --which is half baked and I don't even know if it
    works -- is this:

    import csv
    citynew=''
    reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
    for row in reader:
    row['CITY'] == citynew
    else:
    citynew=row['CITY']

    The logic here -- in this case trying to fill in the city information
    -- would seem to work, but I'm not sure. And I'm not sure how to write
    the results of this back to the file.

    Any help anyone can offer is greatly appreciated. I'm trying hard to
    learn, but have been frustrated by this problem.

    Matt

    Matthew Waite
    www.mattwaite.com
     
    Matt Waite, Feb 1, 2007
    #1
    1. Advertisements

  2. Matt Waite

    skip Guest

    Matt> I have a CSV file, exported from Excel, that has blank records in
    Matt> it, and I need to fill them in with the values from the record
    Matt> just above it until it hits a non-blank value.

    Try something like:

    #!/usr/bin/env python

    import sys
    import csv

    last = {}
    reader = csv.DictReader(open("test1.csv", "rb"))
    writer = csv.DictWriter(open("test2.csv", "wb"),
    sys.stdout, fieldnames="Zone City Event".split())
    for row in reader:
    for key in row:
    if not row[key]:
    row[key] = last.get(key, "")
    writer.writerow(row)
    last = row

    Skip
     
    skip, Feb 1, 2007
    #2
    1. Advertisements

  3. Matt Waite

    Laszlo Nagy Guest

    I recommend that you create a writer and a new file. After you processed
    the input, you can delete the original file and replace it with the new
    one. My experience is that this is highly efficient and works with very
    large files. I would do something like this (not tested)

    import csv
    import os
    import shutil
    fin = file("/home/mwaite/test/test2.csv", "rb")
    reader = csv.DictReader(fin)
    fout = file(fin.name + '.tmp',"wb+")
    writer = csv.writer(fout)
    prev = {}
    columns = ('CITY','ZIP','NAME')
    for curr in reader:
    foreach col in columns:
    if (not curr.has_key(col)) or (curr[col] == ''):
    if prev.has_key(col):
    curr[col] = prev[col]
    else:
    curr[col] = ''
    row = [curr[col] for col in columns ]
    writer.write(row)
    prev = curr
    os.unlink(fin.name)
    fout.close()
    shutil.rename(fout.name,fin.name)

    I really did not test it, but you can get the idea.

    Best,

    Laszlo
     
    Laszlo Nagy, Feb 1, 2007
    #3
  4. Matt Waite a écrit :
    Did you actually tried it ? And itf yes, do you really think it 'works'?

    Ok, the solution is quite easy - assuming the very first data row has no
    blank values, and that either both Zone and City are blank or both are
    filled:

    prev_zone = prev_city = ''
    for row in reader:
    if row['Zone']:
    prev_zone, prev_city = row['Zone'], row['City']
    else:
    row['Zone'], row['City'] = prev_zone, prev_city
    # now do something with row...
    look for csv.DictWriter.writerow

    NB : note that you cannot "write back to the file" - just write to
    *another* file (then eventually os.rename(olfile, newfile))
     
    Bruno Desthuilliers, Feb 1, 2007
    #4
  5. Matt Waite

    Matt Waite Guest

    Thanks everyone for your help. I got Skip's answer to work (mine is
    pasted below):

    import sys
    import csv

    last = {}
    reader = csv.DictReader(open("/home/mwaite/test/test2.csv", "rb"))
    writer = csv.DictWriter(open("/home/mwaite/test/test3.csv", "wb"),
    ['ZONE','CITY','EVENT'], dialect='excel')
    for row in reader:
    for key in row:
    if not row[key]:
    row[key] = last.get(key, "")
    writer.writerow(row)
    last = row
     
    Matt Waite, Feb 1, 2007
    #5
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.