Data aggregation

vedranp · Mar 6, 2008

Hi,

I have a case where I should aggregate data from the CSV file, which
contains data in this way:

DATE TIME COUNTRY ZIP CITY VALUE1 VALUE2 VALUE3
21.2.2008 00:00 A 1000 CITY1 1 2 3
21.2.2008 00:00 A 1000 CITY2 4 5 6
21.2.2008 00:00 A 1000 CITY3 7 8 9
21.2.2008 00:00 A 1000 CITY4 1 2 3
21.2.2008 00:15 A 1000 CITY1 4 5 6
21.2.2008 00:15 A 1000 CITY2 7 8 9
21.2.2008 00:15 A 1000 CITY3 1 2 3
21.2.2008 00:15 A 1000 CITY4 4 5 6
21.2.2008 00:00 A 2000 CITY10 7 8 9
21.2.2008 00:00 A 2000 CITY20 1 2 3
21.2.2008 00:00 A 2000 CITY30 4 5 6
21.2.2008 00:00 A 2000 CITY40 1 2 3
21.2.2008 00:15 A 2000 CITY10 7 8 9
21.2.2008 00:15 A 2000 CITY20 1 2 3
21.2.2008 00:15 A 2000 CITY30 4 5 6
21.2.2008 00:15 A 2000 CITY40 1 2 3

I need to aggregate data from file1, so the result would be a CSV file
(file2) in this format:

DATE COUNTRY ZIP CITY SumOfVALUE1 SumOfVALUE2 SumOfVALUE3 formula1
21.2.2008 A 1000 CITY1 5 7 9 12
21.2.2008 A 1000 CITY2 11 13 15 24
21.2.2008 A 1000 CITY3 8 10 12 18
21.2.2008 A 1000 CITY4 5 7 9 12
21.2.2008 A 2000 CITY10 14 16 18 30
21.2.2008 A 2000 CITY20 2 4 6 6
21.2.2008 A 2000 CITY30 8 10 12 18
21.2.2008 A 2000 CITY40 2 4 6 6

So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some
calculation) the values and do some calculation from summed fields
(e.g.: formula1 = SumOfVALUE1+SumOfVALUE2). I am able to do this by
first loading file1 in SQL, perform a query there, which returns the
file2 results and then load it back in the SQL in the different table.

I would like to avoid the step of taking data out from database in
order to process it. I would like to process the file1 in Python and
load the result (file2) in SQL.

From some little experience with Perl, I think this is managable with
double hash tables (1: basic hash with key/value = CITY/pointer-to-
other-hash, 2: hash table with values for CITY1), so I assume that
there would be also a way in Python, maybe with dictionaries? Any
ideas?

Regards,
Vedran.

jay graves · Mar 6, 2008

So, group by DATE, COUNTRY, ZIP and CITY and sum (or do some

You are soooo close. Look up itertools.groupby
Don't forget to sort your data first.

http://aspn.activestate.com/ASPN/search?query=groupby&x=0&y=0&section=PYTHONCKBK&type=Subsection
http://mail.python.org/pipermail/python-list/2006-June/388004.html

From some little experience with Perl, I think this is managable with
double hash tables (1: basic hash with key/value = CITY/pointer-to-
other-hash, 2: hash table with values for CITY1), so I assume that
there would be also a way in Python, maybe with dictionaries? Any
ideas?

Sometimes it makes sense to do this with dictionaries. For example,
if you need to do counts on various combinations of columns.

count of unique values in column 'A'
count of unique values in column 'C'
count of unique combinations of columns 'A' and 'B'
count of unique combinations of columns 'A' and 'C'
count of unique combinations of columns 'B' and 'C'
in all cases, sum(D) and avg(E)

Since I need 'C' by itself, and 'A' and 'C' together, I can't just
sort and break on 'A','B','C'.

HTH
....
jay graves

John Nagle · Mar 6, 2008

vedranp said:
I would like to avoid the step of taking data out from database in
order to process it.

You can probably do this entirely within SQL. Most SQL databases,
including MySQL, will let you put the result of a SELECT into a new
table.

John Nagle

petr.jakes.tpc · Mar 6, 2008

You can probably do this entirely within SQL. Most SQL databases,
including MySQL, will let you put the result of a SELECT into a new
table.

John Nagle

I agree,

maybe following can help

http://tinyurl.com/32colp

Petr Jakes

On-air script	46	Jul 23, 2023
Machine Learning.. Endless Struggle	3	Feb 16, 2023
How do I make this in C with for loop	3	Jan 16, 2023
Minimum Total Difficulty	0	Nov 15, 2023
PDF extraction of specific data	1	Jun 13, 2021
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
New baby here	2	Jan 3, 2020
C# How to convert date into en-US when thread culture is ar-SA	1	Feb 18, 2021

Data aggregation

vedranp

jay graves

John Nagle

petr.jakes.tpc

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads