Pivot Table/Groupby/Sum question

Discussion in 'Python' started by patrick.waldo@gmail.com, Dec 27, 2007.

  1. Guest

    Hi all,

    I tried reading http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334695
    on the same subject, but it didn't work for me. I'm trying to learn
    how to make pivot tables from some excel sheets and I am trying to
    abstract this into a simple sort of example. Essentially I want to
    take input data like this:

    Name Time of day Amount
    Bob Morn 240
    Bob Aft 300
    Joe Morn 70
    Joe Aft 80
    Jil Morn 100
    Jil Aft 150

    And output it as:

    Name Total Morning Afternoon
    Bob 540 240 300
    Joe 150 70 80
    Jil 250 100 150
    Total 940 410 530

    The writing the output part is the easy part. However, I have a
    couple problems. 1) Grouping by name seems to work perfectly, but
    working by time does not. ie

    I will get:
    Bob
    240
    300
    Joe
    70
    80
    Jil
    100
    150
    which is great but...
    Morn
    240
    Aft
    300
    Morn
    70
    Aft
    80
    Morn
    100
    Aft
    150
    And not
    Morn
    240
    70
    100
    Aft
    300
    80
    150

    2) I can't figure out how to sum these values because of the
    iteration. I always get an error like: TypeError: iteration over non-
    sequence

    Here's the code:

    from itertools import groupby

    data = [['Bob', 'Morn', 240],['Bob', 'Aft', 300],['Joe', 'Morn', 70],
    ['Joe', 'Aft', 80],\
    ['Jil', 'Morn', 100],['Jil', 'Aft', 150]]

    NAME, TIME, AMOUNT=range(3)
    for k, g in groupby(data, key=lambda r: r[NAME]):
    print k
    for record in g:
    print "\t", record[AMOUNT]
    for k, g in groupby(data, key=lambda r: r[TIME]):
    print k
    for record in g:
    print "\t", record[AMOUNT]

    Thanks for any comments
    , Dec 27, 2007
    #1
    1. Advertising

  2. John Machin Guest

    On Dec 28, 4:56 am, wrote:

    > from itertools import groupby


    You seem to have overlooked this important sentence in the
    documentation: "Generally, the iterable needs to already be sorted on
    the same key function"
    John Machin, Dec 27, 2007
    #2
    1. Advertising

  3. Guest

    On Dec 27, 10:59 pm, John Machin <> wrote:
    > On Dec 28, 4:56 am, wrote:
    >
    > > from itertools import groupby

    >
    > You seem to have overlooked this important sentence in the
    > documentation: "Generally, the iterable needs to already be sorted on
    > the same key function"


    Yes, but I imagine this shouldn't prevent me from using and
    manipulating the data. It also doesn't explain why the names get
    sorted correctly and the time does not.

    I was trying to do this:

    count_tot = []
    for k, g in groupby(data, key=lambda r: r[NAME]):
    for record in g:
    count_tot.append((k,record[SALARY]))
    for i in count_tot:
    here I want to say add all the numbers for each person, but I'm
    missing something.

    If you have any ideas about how to solve this pivot table issue, which
    seems to be scant on Google, I'd much appreciate it. I know I can do
    this in Excel easily with the automated wizard, but I want to know how
    to do it myself and format it to my needs.
    , Dec 27, 2007
    #3
  4. John Machin Guest

    On Dec 28, 10:05 am, wrote:
    > On Dec 27, 10:59 pm, John Machin <> wrote:
    >
    > > On Dec 28, 4:56 am, wrote:

    >
    > > > from itertools import groupby

    >
    > > You seem to have overlooked this important sentence in the
    > > documentation: "Generally, the iterable needs to already be sorted on
    > > the same key function"

    >
    > Yes, but I imagine this shouldn't prevent me from using and
    > manipulating the data.


    You imagine correctly (and pointlessly) in general; however in
    particular it prevents you using itertools.groupby simplistically to
    manipulate the data in the way you want to manipulate it.

    > It also doesn't explain why the names get
    > sorted correctly and the time does not.


    The names in your example were NOT sorted, "correctly" or otherwise.
    The output order is the same as the input order: Bob, Joe, Jil.

    >>> seq = ['Bob', 'Joe', 'Jil']
    >>> sorted(seq)

    ['Bob', 'Jil', 'Joe']
    >>> seq == sorted(seq)

    False
    >>>


    >
    > I was trying to do this:
    >
    > count_tot = []
    > for k, g in groupby(data, key=lambda r: r[NAME]):
    > for record in g:
    > count_tot.append((k,record[SALARY]))
    > for i in count_tot:
    > here I want to say add all the numbers for each person, but I'm
    > missing something.
    >
    > If you have any ideas about how to solve this pivot table issue, which
    > seems to be scant on Google, I'd much appreciate it. I know I can do
    > this in Excel easily with the automated wizard, but I want to know how
    > to do it myself and format it to my needs.


    Watch this space.
    John Machin, Dec 28, 2007
    #4
  5. John Machin Guest

    On Dec 28, 11:48 am, John Machin <> wrote:
    > On Dec 28, 10:05 am, wrote:
    >
    >
    > > If you have any ideas about how to solve this pivot table issue, which
    > > seems to be scant on Google, I'd much appreciate it. I know I can do
    > > this in Excel easily with the automated wizard, but I want to know how
    > > to do it myself and format it to my needs.

    >
    > Watch this space.


    Tested as much as you see:

    8<---
    class SimplePivotTable(object):

    def __init__(
    self,
    row_order=None, col_order=None, # see example
    missing=0, # what to return for an empty cell. Alternatives:
    '', 0.0, None, 'NULL'
    ):
    self.row_order = row_order
    self.col_order = col_order
    self.missing = missing
    self.cell_dict = {}
    self.row_total = {}
    self.col_total = {}
    self.grand_total = 0
    self.headings_OK = False

    def add_item(self, row_key, col_key, value):
    self.grand_total += value
    try:
    self.col_total[col_key] += value
    except KeyError:
    self.col_total[col_key] = value
    try:
    self.cell_dict[row_key][col_key] += value
    self.row_total[row_key] += value
    except KeyError:
    try:
    self.cell_dict[row_key][col_key] = value
    self.row_total[row_key] += value
    except KeyError:
    self.cell_dict[row_key] = {col_key: value}
    self.row_total[row_key] = value

    def _process_headings(self):
    if self.headings_OK:
    return
    self.row_headings = self.row_order or
    list(sorted(self.row_total.keys()))
    self.col_headings = self.col_order or
    list(sorted(self.col_total.keys()))
    self.headings_OK = True

    def get_col_headings(self):
    self._process_headings()
    return self.col_headings

    def generate_row_info(self):
    self._process_headings()
    for row_key in self.row_headings:
    row_dict = self.cell_dict[row_key]
    row_vals = [row_dict.get(col_key, self.missing) for
    col_key in self.col_headings]
    yield row_key, self.row_total[row_key], row_vals

    def get_col_totals(self):
    self._process_headings()
    row_dict = self.col_total
    row_vals = [row_dict.get(col_key, self.missing) for col_key in
    self.col_headings]
    return self.grand_total, row_vals

    if __name__ == "__main__":

    data = [
    ['Bob', 'Morn', 240],
    ['Bob', 'Aft', 300],
    ['Joe', 'Morn', 70],
    ['Joe', 'Aft', 80],
    ['Jil', 'Morn', 100],
    ['Jil', 'Aft', 150],
    ['Bob', 'Aft', 40],
    ['Bob', 'Aft', 5],
    ['Dozy', 'Aft', 1], # Dozy doesn't show up till lunch-time
    ]
    NAME, TIME, AMOUNT = range(3)

    print
    ptab = SimplePivotTable(
    col_order=['Morn', 'Aft'],
    missing='uh-oh',
    )
    for s in data:
    ptab.add_item(row_key=s[NAME], col_key=s[TIME],
    value=s[AMOUNT])
    print ptab.get_col_headings()
    for x in ptab.generate_row_info():
    print x
    print 'Tots', ptab.get_col_totals()
    8<---
    John Machin, Dec 28, 2007
    #5
  6. Guest

    Wow, I did not realize it would be this complicated! I'm fairly new
    to Python and somehow I thought I could find a simpler solution. I'll
    have to mull over this to fully understand how it works for a bit.

    Thanks a lot!

    On Dec 28, 4:03 am, John Machin <> wrote:
    > On Dec 28, 11:48 am, John Machin <> wrote:
    >
    > > On Dec 28, 10:05 am, wrote:

    >
    > > > If you have any ideas about how to solve this pivot table issue, which
    > > > seems to be scant on Google, I'd much appreciate it. I know I can do
    > > > this in Excel easily with the automated wizard, but I want to know how
    > > > to do it myself and format it to my needs.

    >
    > > Watch this space.

    >
    > Tested as much as you see:
    >
    > 8<---
    > class SimplePivotTable(object):
    >
    > def __init__(
    > self,
    > row_order=None, col_order=None, # see example
    > missing=0, # what to return for an empty cell. Alternatives:
    > '', 0.0, None, 'NULL'
    > ):
    > self.row_order = row_order
    > self.col_order = col_order
    > self.missing = missing
    > self.cell_dict = {}
    > self.row_total = {}
    > self.col_total = {}
    > self.grand_total = 0
    > self.headings_OK = False
    >
    > def add_item(self, row_key, col_key, value):
    > self.grand_total += value
    > try:
    > self.col_total[col_key] += value
    > except KeyError:
    > self.col_total[col_key] = value
    > try:
    > self.cell_dict[row_key][col_key] += value
    > self.row_total[row_key] += value
    > except KeyError:
    > try:
    > self.cell_dict[row_key][col_key] = value
    > self.row_total[row_key] += value
    > except KeyError:
    > self.cell_dict[row_key] = {col_key: value}
    > self.row_total[row_key] = value
    >
    > def _process_headings(self):
    > if self.headings_OK:
    > return
    > self.row_headings = self.row_order or
    > list(sorted(self.row_total.keys()))
    > self.col_headings = self.col_order or
    > list(sorted(self.col_total.keys()))
    > self.headings_OK = True
    >
    > def get_col_headings(self):
    > self._process_headings()
    > return self.col_headings
    >
    > def generate_row_info(self):
    > self._process_headings()
    > for row_key in self.row_headings:
    > row_dict = self.cell_dict[row_key]
    > row_vals = [row_dict.get(col_key, self.missing) for
    > col_key in self.col_headings]
    > yield row_key, self.row_total[row_key], row_vals
    >
    > def get_col_totals(self):
    > self._process_headings()
    > row_dict = self.col_total
    > row_vals = [row_dict.get(col_key, self.missing) for col_key in
    > self.col_headings]
    > return self.grand_total, row_vals
    >
    > if __name__ == "__main__":
    >
    > data = [
    > ['Bob', 'Morn', 240],
    > ['Bob', 'Aft', 300],
    > ['Joe', 'Morn', 70],
    > ['Joe', 'Aft', 80],
    > ['Jil', 'Morn', 100],
    > ['Jil', 'Aft', 150],
    > ['Bob', 'Aft', 40],
    > ['Bob', 'Aft', 5],
    > ['Dozy', 'Aft', 1], # Dozy doesn't show up till lunch-time
    > ]
    > NAME, TIME, AMOUNT = range(3)
    >
    > print
    > ptab = SimplePivotTable(
    > col_order=['Morn', 'Aft'],
    > missing='uh-oh',
    > )
    > for s in data:
    > ptab.add_item(row_key=s[NAME], col_key=s[TIME],
    > value=s[AMOUNT])
    > print ptab.get_col_headings()
    > for x in ptab.generate_row_info():
    > print x
    > print 'Tots', ptab.get_col_totals()
    > 8<---
    , Dec 28, 2007
    #6
  7. Guest

    What about to let SQL to work for you.
    HTH

    Petr Jakes

    Tested on Python 2.5.1
    8<----------------------

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sqlite3
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    inputData=(
    ('Bob', 'Morn', 240),
    ('Bob', 'Aft', 300),
    ('Joe', 'Morn', 70),
    ('Joe', 'Aft', 80),
    ('Jil', 'Morn', 100),
    ('Jil', 'Aft', 150),
    ('Jil', 'Aft', 150),
    ('Jil', 'Aft', 150))

    def data_generator(dataSet):
    for dataSetRow in dataSet:
    yield dataSetRow

    pivotSelect='''
    SELECT
    NAME,
    sum (AMOUNT) as TOTAL,
    sum (case when (TIME_OF_DAY) = 'Morn' then AMOUNT else 0 END) as
    MORN,
    sum (case when (TIME_OF_DAY) = 'Aft' then AMOUNT else 0 END) as AFT
    FROM MY_NAMES
    GROUP BY 1'''

    cur.execute("create table MY_NAMES(NAME, TIME_OF_DAY, AMOUNT)")
    cur.executemany("""insert into MY_NAMES(NAME, TIME_OF_DAY, AMOUNT)
    values (?,?,?)""", data_generator(inputData))
    cur.execute(pivotSelect)
    for row in cur.fetchall():
    print row
    , Dec 28, 2007
    #7
  8. John Machin Guest

    On Dec 29, 9:58 am, wrote:
    > What about to let SQL to work for you.


    The OP is "trying to learn how to make pivot tables from some excel
    sheets". You had better give him a clue on how to use ODBC on an
    "excel sheet" :)

    [snip]
    > SELECT
    > NAME,
    > sum (AMOUNT) as TOTAL,
    > sum (case when (TIME_OF_DAY) = 'Morn' then AMOUNT else 0 END) as
    > MORN,
    > sum (case when (TIME_OF_DAY) = 'Aft' then AMOUNT else 0 END) as AFT


    This technique requires advance knowledge of what the column key
    values are (the hard-coded 'Morn' and 'Aft').

    <rant>
    It is the sort of thing that one sees when %SQL% is the *only*
    language used to produce end-user reports. Innocuous when there are
    only 2 possible columns, but bletchworthy when there are more than 20
    and the conditions are complex and the whole thing is replicated
    several times in the %SQL% script because either %SQL% doesn't support
    temporary procedures/functions or the BOsFH won't permit their use...
    not in front of the newbies, please!
    </rant>
    John Machin, Dec 29, 2007
    #8
  9. Guest

    Petr, thanks for the SQL suggestion, but I'm having enough trouble in
    Python.

    John would you mind walking me through your class in normal speak? I
    only have a vague idea of why it works and this would help me a lot to
    get a grip on classes and this sort of particular problem. The next
    step is to imagine if there was another variable, like departments and
    add up the information by name, department, and time, and so on...that
    will come another day.

    Thanks.



    On Dec 29, 1:00 am, John Machin <> wrote:
    > On Dec 29, 9:58 am, wrote:
    >
    > > What about to let SQL to work for you.

    >
    > The OP is "trying to learn how to make pivot tables from some excel
    > sheets". You had better give him a clue on how to use ODBC on an
    > "excel sheet" :)
    >
    > [snip]
    >
    > > SELECT
    > > NAME,
    > > sum (AMOUNT) as TOTAL,
    > > sum (case when (TIME_OF_DAY) = 'Morn' then AMOUNT else 0 END) as
    > > MORN,
    > > sum (case when (TIME_OF_DAY) = 'Aft' then AMOUNT else 0 END) as AFT

    >
    > This technique requires advance knowledge of what the column key
    > values are (the hard-coded 'Morn' and 'Aft').
    >
    > <rant>
    > It is the sort of thing that one sees when %SQL% is the *only*
    > language used to produce end-user reports. Innocuous when there are
    > only 2 possible columns, but bletchworthy when there are more than 20
    > and the conditions are complex and the whole thing is replicated
    > several times in the %SQL% script because either %SQL% doesn't support
    > temporary procedures/functions or the BOsFH won't permit their use...
    > not in front of the newbies, please!
    > </rant>
    , Dec 29, 2007
    #9
  10. John Machin Guest

    On Dec 29, 11:51 am, wrote:

    > John would you mind walking me through your class in normal speak?


    Yes.

    > I
    > only have a vague idea of why it works and this would help me a lot to
    > get a grip on classes and this sort of particular problem.


    It's about time you got a *concrete* idea of how something works. Grab
    a pencil and a large piece of paper, pretend you are python.exe and
    follow through what happens when it executes
    ptab = SimplePivotTable(.......)
    and the ptab.add_item(......) loop with this set of data:
    data = [
    ['Bob', 'Morn', 240],
    ['Bob', 'Aft', 300],
    ['Bob', 'Morn', 40],
    ]
    with the goal of being able to say what is in ptab.cell_dict and
    understanding how it got there.

    Cheers,
    John
    John Machin, Dec 29, 2007
    #10
  11. Guest

    Patrick,

    in your first posting you are writing "... I'm trying to learn how to
    make pivot tables from some excel sheets...". Can you be more specific
    please? AFIK Excel offers very good support for pivot tables. So why
    to read tabular data from the Excel sheet and than transform it to
    pivot tabel in Python?

    Petr
    , Dec 29, 2007
    #11
  12. Guest

    On Dec 29, 3:00 pm, wrote:
    > Patrick,
    >
    > in your first posting you are writing "... I'm trying to learn how to
    > make pivot tables from some excel sheets...". Can you be more specific
    > please? AFIK Excel offers very good support for pivot tables. So why
    > to read tabular data from the Excel sheet and than transform it to
    > pivot tabel in Python?
    >
    > Petr


    Yes, I realize Excel has excellent support for pivot tables. However,
    I hate how Excel does it and, for my particular excel files, I need
    them to be formated in an automated way because I will have a number
    of them over time and I'd prefer just to have python do it in a flash
    than to do it every time with Excel.

    >It's about time you got a *concrete* idea of how something works.


    Absolutely right. I tend to take on ideas that I'm not ready for, in
    the sense that I only started using Python some months ago for some
    basic tasks and now I'm trying on some more complicated ones. With
    time, though, I will get a concrete idea of what python.exe does, but,
    for someone who studied art history and not comp sci, I'm doing my
    best to get a handle on all of it. I think a pad of paper might be a
    good way to visualize it.
    , Dec 29, 2007
    #12
  13. Guest

    > Yes, I realize Excel has excellent support for pivot tables. However,
    > I hate how Excel does it and, for my particular excel files, I need
    > them to be formated in an automated way because I will have a number
    > of them over time and I'd prefer just to have python do it in a flash
    > than to do it every time with Excel.
    >


    Patrick,

    Few more questions:
    - Where the data come from (I mean: are your data in Excel already
    when you get them)?

    - If your primary source of data is the Excel file, how do you read
    data from the Excel
    file to Python (I mean did you solve this part of the task already)?

    Petr
    , Dec 29, 2007
    #13
  14. Guest

    Sorry for the delay in my response. New Year's Eve and moving
    apartment

    > - Where the data come from (I mean: are your data in Excel already
    > when you get them)?
    > - If your primary source of data is the Excel file, how do you read
    > data from the Excel file to Python (I mean did you solve this part of the task already)?


    Yes, the data comes from Excel and I use xlrd and PyExcelerator to
    read and write, respectively.
    #open for reading
    path_file = "c:\\1\\data.xls"
    book = xlrd.open_workbook(path_file)
    Counts = book.sheet_by_index(1)
    #get data
    n=1
    data = []
    while n<Counts.nrows:
    data.append(Counts.row_values(n, start_colx=0, end_colx=None))
    n=n+1
    #
    #Heres the part where I need to do the pivot table
    #
    #open a new Excel file for writing
    wb=pyExcelerator.Workbook()
    matrix = wb.add_sheet("matrix")
    wb.save('c:\\1\\matrix.xls')


    So the data comes in as a long list. I'm dealing with some
    information on various countries with 6 pieces of information to
    pivot. Just to make it simple it's like a video store database. The
    data is like [Country, Category, Sub Category, Film Title, Director,
    Number of Copies]. data = [['Italy', 'Horror', '70s', 'Suspiria',
    'Dario Argento', 4],['Italy', 'Classics', 'Neo-Realist', 'Otto e
    Mezzo', 'Fellini', 3],['Italy', 'Horror', '70s', 'Profondo Rosso',
    'Dario Argento', 4],...]. So there are 4 copies of Suspiria and 3 of
    8 1/2. What I want is the total number of films for each country,
    category and subcategory, ie there are 11 Italian films and 8 Italian
    horror films from the 70s, etc...I will then output the data like this
    | Horror | Classics ...
    Total | 70s Slasher | Neo-Realist Western ...
    Total
    America 200 20 30 0 10 ...
    Argentina 304 1 0 0 0 ...
    .....
    Italy 11 7 0 3 0 ...

    I'm just making up data here, but that's the idea. As I said, Excel
    does it in it's own way, but that whole idea here is to organize it
    exactly how I want it in an automated way. It's an interesting
    problem, one that I think a lot of people might want to use. I'm
    surprised that Programming Python, with all it's talk of dictionaries,
    doesn't have a section on organizing tabular data like this, or I'm
    more likely missing something or not making a connection.

    Any ideas, hints, or pointers on where I could learn more about this
    would be helpful. Otherwise, I'm off to the drawing board!
    , Jan 2, 2008
    #14
  15. Guest

    > So the data comes in as a long list. I'm dealing with some
    > information on various countries with 6 pieces of information to
    > pivot. Just to make it simple it's like a video store database. The
    > data is like [Country, Category, Sub Category, Film Title, Director,
    > Number of Copies]. data = [['Italy', 'Horror', '70s', 'Suspiria',
    > 'Dario Argento', 4],['Italy', 'Classics', 'Neo-Realist', 'Otto e
    > Mezzo', 'Fellini', 3],['Italy', 'Horror', '70s', 'Profondo Rosso',
    > 'Dario Argento', 4],...]. So there are 4 copies of Suspiria and 3 of
    > 8 1/2. What I want is the total number of films for each country,
    > category and subcategory, ie there are 11 Italian films and 8 Italian
    > horror films from the 70s, etc...I will then output the data like this
    > | Horror | Classics ...
    > Total | 70s Slasher | Neo-Realist Western ...
    > Total
    > America 200 20 30 0 10 ...
    > Argentina 304 1 0 0 0 ...
    > ....
    > Italy 11 7 0 3 0 ...


    Did you mean your table has to look like the following?

    | Horror | Horror | Classics | Classics
    Total | 70s | Slasher | Neo-Realist | Western ...
    Total
    America 200 20 30 0 10 ...
    Argentina 304 1 0 0 0 ...
    .....
    , Jan 3, 2008
    #15
  16. Guest

    Yes in the sense that the top part will have merged cells so that
    Horror and Classics don't need to be repeated every time, but the
    headers aren't the important part. At this point I'm more interested
    in organizing the data itself and i can worry about putting it into a
    new excel file later.
    , Jan 3, 2008
    #16
  17. Guest

    On Jan 3, 3:41 pm, wrote:
    > Yes in the sense that the top part will have merged cells so that
    > Horror and Classics don't need to be repeated every time, but the
    > headers aren't the important part. At this point I'm more interested
    > in organizing the data itself and i can worry about putting it into a
    > new excel file later.


    What I am able to do is to use SQL, I think it is quite simple, about
    50 rows of code including sample data and comments. It works for me
    and IMHO it is easy tu understand and I think you can use as it is.
    Otherwise you can control Excel from Python using win32api, win32com
    ( http://tinyurl.com/2m3x3v )

    HTH

    Petr Jakes

    #!/usr/bin/env python
    # -*- coding: cp1250 -*-

    import sqlite3
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    normalizedInputData=[]
    subCategories=[]

    rawData = [['Italy', 'Horror', '70s', 'Suspiria','Dario Argento', 4],
    ['Italy', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
    3],
    ['GB', 'Classics', 'Neo-Humoristic', 'Otto eMezzo', 'Fellini',
    3],
    ['Fr', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 8],
    ['Fr', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
    55],
    ['GB', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 5],
    ['Italy', 'Horror', '70s', 'Profondo Rosso','Dario Argento',
    4]]

    def alphanum(s):
    """only letters, numbers and '_' are acceptable for column names
    by SQL"""
    filtered=''
    for ch in s:
    if ch.isalnum() or ch in '_':
    filtered+=ch
    return filtered

    for myRow in rawData :
    cat_SubCat = alphanum("_".join(myRow[1:3]))
    if cat_SubCat not in subCategories:
    subCategories.append(cat_SubCat)
    myRow[1:3] = [cat_SubCat]
    normalizedInputData.append(myRow)

    def data_generator(dataSet):
    for dataSetRow in dataSet:
    yield dataSetRow

    subCategories=sorted(subCategories)

    # create SQL table named "MOVIES" with the apropriate fields (the tabe
    is store in the memory only)
    cur.execute("create table MOVIES(COUNTRY, CATEGORY, TITLE, DIRECTOR,
    QUANTITY)")

    # fill the table with data
    cur.executemany("""insert into MOVIES(COUNTRY, CATEGORY, TITLE,
    DIRECTOR, QUANTITY) values (?,?,?,?,?)""",
    data_generator(normalizedInputData))

    # assemble dynamic SQL SELECT query, which returns PIVOT TABLE
    prologue = "select COUNTRY, SUM(QUANTITY) AS TOTAL, "
    template = "SUM (CASE CATEGORY WHEN '%s' THEN QUANTITY ELSE 0 END) %s"
    epilogue = " FROM MOVIES GROUP BY 1 ORDER BY 1"
    pivotSelect = prologue + ", ".join([template % (x, x) for x in
    subCategories]) + epilogue

    # execute SQL SELECT and return data row by row
    cur.execute(pivotSelect)
    for row in cur.fetchall():
    print row
    , Jan 3, 2008
    #17
  18. Guest

    Petr thanks so much for your input. I'll try to learn SQL, especially
    if I'll do a lot of database work.

    I tried to do it John's way as en exercise and I'm happy to say I
    understand a lot more. Basically I didn't realize I could nest
    dictionaries like db = {country:{genre:{sub_genre:3}}} and call them
    like db[country][genre][sub_genre]. The Python Cookbook was quite
    helpful to figure out why items needed to be added the way they did.
    Also using the structure of the dictionary was a conceptually easier
    solution than what I found on http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334695.

    So, now I need to work on writing it to Excel. I'll update with the
    final code.

    Thanks again.

    #Movie Store Example
    class PivotData:
    def __init__(self):
    self.total_mov = 0
    self.total_cou = {}
    self.total_gen = {}
    self.total_sub = {}
    self.total_cou_gen ={}
    self.db = {}
    def add_data(self,country,genre,sub_genre,value):
    self.total_mov += value
    try:
    self.total_cou[country] += value
    except KeyError:
    self.total_cou[country] = value
    try:
    self.total_gen[genre] += value
    except:
    self.total_gen[genre] = value
    try:
    self.total_sub[sub_genre] += value
    except:
    self.total_sub[sub_genre] = value
    try:
    self.total_cou_gen[country][genre] += value
    except KeyError:
    try:
    self.total_cou_gen[country][genre] = value
    except KeyError:
    self.total_cou_gen[country] = {genre:value}
    try:
    self.db[country][genre][sub_genre] += value
    except KeyError:
    try:
    self.db[country][genre][sub_genre] = value
    except KeyError:
    try:
    self.db[country][genre] = {sub_genre:value}
    except:
    self.db[country] = {genre:{sub_genre:value}}

    data = [['argentina','Horror', 'Slasher',4],
    ['argentina','Horror', 'Halloween',6],
    ['argentina','Drama','Romance',5],
    ['argentina','Drama','Romance',1],
    ['argentina','Drama','True Life',1],
    ['japan','Classics','WWII',1],
    ['japan','Cartoons','Anime',1],
    ['america','Comedy','Stand-Up',1],
    ['america','Cartoons','WB',10],
    ['america','Cartoons','WB',3]]

    COUNTRY, GENRE, SUB_GENRE, VALUE =range(4)
    x=PivotData()
    for s in data:
    x.add_data(s[COUNTRY],s[GENRE],s[SUB_GENRE],s[VALUE])
    print
    print 'Total Movies:\n', x.total_mov
    print 'Total for each country\n', x.total_cou
    print 'Total Genres\n', x.total_gen
    print 'Total Sub Genres\n', x.total_sub
    print 'Total Genres for each Country\n', x.total_cou_gen
    print
    print x.db
    , Jan 4, 2008
    #18
  19. Guest

    On Jan 4, 4:55 pm, wrote:
    > Petr thanks so much for your input. I'll try to learnSQL, especially
    > if I'll do a lot of database work.
    >
    > I tried to do it John's way as en exercise and I'm happy to say I
    > understand a lot more. Basically I didn't realize I could nest
    > dictionaries like db = {country:{genre:{sub_genre:3}}} and call them
    > like db[country][genre][sub_genre]. The Python Cookbook was quite
    > helpful to figure out why items needed to be added the way they did.
    > Also using the structure of the dictionary was a conceptually easier
    > solution than what I found onhttp://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/334695.
    >
    > So, now I need to work on writing it to Excel. I'll update with the
    > final code.
    >


    Hi, good to know you have succeded. I think it is matter of taste
    which way to go (dictionary or database). My feelig is: for data use
    database! If you are trying to work with data, you will need it sooner
    or later anyway. Again: database is made for data! :)

    Writing your data to excel?
    Just save your numbers separated by commas in the file with the
    extension csv (my_data.csv) and you can open it directly in Excel.

    Good luck :)

    Petr
    , Jan 4, 2008
    #19
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?QWpheQ==?=

    Create Pivot table

    =?Utf-8?B?QWpheQ==?=, Feb 13, 2004, in forum: ASP .Net
    Replies:
    4
    Views:
    7,443
    Alvin Bruney [MVP]
    Feb 16, 2004
  2. =?Utf-8?B?UG9udGlNYXg=?=

    Pivot table control

    =?Utf-8?B?UG9udGlNYXg=?=, Jun 15, 2004, in forum: ASP .Net
    Replies:
    2
    Views:
    4,360
    Ken Cox [Microsoft MVP]
    Jun 15, 2004
  3. Phil Winstanley [Microsoft MVP ASP.NET]

    Re: Pivot table control

    Phil Winstanley [Microsoft MVP ASP.NET], Jun 15, 2004, in forum: ASP .Net
    Replies:
    0
    Views:
    355
    Phil Winstanley [Microsoft MVP ASP.NET]
    Jun 15, 2004
  4. Phil Winstanley [Microsoft MVP ASP.NET]

    Re: Pivot table control

    Phil Winstanley [Microsoft MVP ASP.NET], Jun 15, 2004, in forum: ASP .Net
    Replies:
    2
    Views:
    409
    Kevin Spencer
    Jun 15, 2004
  5. Lee
    Replies:
    14
    Views:
    2,288
Loading...

Share This Page