Grouping on and exporting to csv files


A

accessnewbie

My current Test_Sort.csv looks like this (which constantly changes-althoughnever more than 4 records, thus far anyway):

RecNo,Count,District,Fruit,StoreNo
1,100,0,oranges,1254
2,30,3,apples,654
3,100,0,bananas,21
4,0,4,grapes,478

I wrote the following python 2.7 script (stuck with 2.7)

---------------------
import sys
import csv
import operator


source = "E:\\Data\\Test_Sort.csv"
result = "E:\\Data\\Test_Sort1.csv"

with open(source,"rb") as source:
rdr= csv.reader( source )
with open(result,"wb") as result:
next(rdr, None) # Skip header
wtr= csv.writer( result )
for r in rdr:
wtr.writerow( (r[2], r[3])) # Only write columns 3 and 4 to a csv file
exit
exit

data = csv.reader(open("E:\\Data\\Test_Sort1.csv"),delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(0,1)) # sort by column 1, the 2
print sortedlist

-----------------------------

My sortedlist results are:

[['0', 'bananas'], ['0', 'oranges'], ['3', 'apples'], ['4', 'grapes']]

I want to output all fruits with the same District number out to a csv filecalled by its District number. (No headers)

Thus in above example I would ultimately have three csv files:

0.csv
bananas
oranges

3.csv
apples

4.csv
grapes

Question is : Any ideas as to how export this sortedlist to the tables described above?

I already have a script that already will look at each file and count the number of lines and insert the correct number of commas and the word "and" where appropriate in a sentence.

My ultimate goal is to read each file and print out a sentence based on whether it has record or not, and if so how many records and then add the word"and" and/or commas between records if greater than 1. I will then concatenate all the sentences together to form a paragraph.

Thus, the above example would say:

The western district sold bananas and oranges. The central district sold apples. The southern district sold grapes.

or if the files were:

0.csv
apples
bananas
oranges

2.csv
grapes

The resulting paragraph would read:

The western district sold apples, bananas, and oranges. The eastern district sold grapes.

It may seem like a crazy way about getting my results but it was the best Icould come up with given my limited knowledge of python.

Thanks for any and all help.
 
Ad

Advertisements

P

Peter Otten

My current Test_Sort.csv looks like this (which constantly
changes-although never more than 4 records, thus far anyway):

RecNo,Count,District,Fruit,StoreNo
1,100,0,oranges,1254
2,30,3,apples,654
3,100,0,bananas,21
4,0,4,grapes,478

I wrote the following python 2.7 script (stuck with 2.7)

---------------------
import sys
import csv
import operator


source = "E:\\Data\\Test_Sort.csv"
result = "E:\\Data\\Test_Sort1.csv"

with open(source,"rb") as source:
rdr= csv.reader( source )
with open(result,"wb") as result:
next(rdr, None) # Skip header
wtr= csv.writer( result )
for r in rdr:
wtr.writerow( (r[2], r[3])) # Only write columns 3 and 4 to a
csv file
exit
exit

data = csv.reader(open("E:\\Data\\Test_Sort1.csv"),delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(0,1)) # sort by column
1, the 2 print sortedlist

-----------------------------

My sortedlist results are:

[['0', 'bananas'], ['0', 'oranges'], ['3', 'apples'], ['4', 'grapes']]

I want to output all fruits with the same District number out to a csv
file called by its District number. (No headers)

Thus in above example I would ultimately have three csv files:

0.csv
bananas
oranges

3.csv
apples

4.csv
grapes

Question is : Any ideas as to how export this sortedlist to the tables
described above?

The standard library offers itertools.groupby()

for district, group in itertools.groupby(
sortedlist, key=operator.itemgetter(0)):
with open(district + ".csv", "wb") as outfile:
csv.writer(outfile).writerows(group)

If you want try to do it by hand as an exercise -- you need a variable to
keep track of the current district:

last_district = None
for row in sortedlist:
district = row[0]
if last_district != district:
# open new file/writer
last_district = district
# write row
I already have a script that already will look at each file and count the
number of lines and insert the correct number of commas and the word "and"
where appropriate in a sentence.

My ultimate goal is to read each file and print out a sentence based on
whether it has record or not, and if so how many records and then add the
word "and" and/or commas between records if greater than 1. I will then
concatenate all the sentences together to form a paragraph.

Thus, the above example would say:

The western district sold bananas and oranges. The central district sold
apples. The southern district sold grapes.

Have a look at str.join():
colors = ["red", "yellow", "blue"]
print "Who's afraid of {}, and {}?".format(", ".join(colors[:-1]),
colors[-1])
Who's afraid of red, yellow, and blue?
or if the files were:

0.csv
apples
bananas
oranges

2.csv
grapes

The resulting paragraph would read:

The western district sold apples, bananas, and oranges. The eastern
district sold grapes.

It may seem like a crazy way about getting my results but it was the best
I could come up with given my limited knowledge of python.

No, it is what experienced programmers do all the time -- break a complex
task into manageable steps, come up with a simple solution for each step,
and then improve these steps as needed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top