What is the best data structure for a very simple spreadsheet?

Discussion in 'Python' started by vsoler, Jan 3, 2010.

  1. vsoler

    vsoler Guest

    Hi,

    Not sure this is the best group to post, but I cannot think of any
    other.

    My application would contain a limited set of "cells" represented by
    the instances of a Cell class:

    class Cell:
    ....

    A1=Cell(7)
    A2=Cell(2*A1)
    A3=Cell(3*A1+A2)
    A4=Cell(A3*4)

    Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

    Now, I somehow want to be able to show a dependency tree

    1 level dependency trees
    A1: None
    A2: A1
    A3: A1, A2
    A4: A3

    All levels dependency trees

    A1: None
    A2: A1
    A3: A1, A2
    A4: A3, A2, A1

    Leaf + values dependency trees:

    A1: 7
    A2: A1=7, 2
    A3: 3, A1=7, 2
    A4: 3, A1=7, 2, 4

    What I'd like to know is:

    1) what are, in your opinion, the basic elements of the Cell class?
    2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
    recommend one library that already contains one?
    3) Do I need a tree data structure to represent my data? would the
    tree be an attribute of the class instance?

    I imagine a lot can be said on these questions. What I am looking for
    is some hints that help me get out of where I am now.

    Any help is highly appreciated.

    Vicente Soler
    vsoler, Jan 3, 2010
    #1
    1. Advertising

  2. On Sun, 03 Jan 2010 03:27:46 -0800, vsoler wrote:

    > My application would contain a limited set of "cells" represented by the
    > instances of a Cell class:
    >
    > class Cell:
    > ...
    >
    > A1=Cell(7)
    > A2=Cell(2*A1)
    > A3=Cell(3*A1+A2)
    > A4=Cell(A3*4)
    >
    > Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
    >
    > Now, I somehow want to be able to show a dependency tree
    >
    > 1 level dependency trees
    > A1: None
    > A2: A1
    > A3: A1, A2
    > A4: A3
    >
    > All levels dependency trees
    >
    > A1: None
    > A2: A1
    > A3: A1, A2
    > A4: A3, A2, A1
    >
    > Leaf + values dependency trees:
    >
    > A1: 7
    > A2: A1=7, 2
    > A3: 3, A1=7, 2
    > A4: 3, A1=7, 2, 4
    >
    > What I'd like to know is:
    >
    > 1) what are, in your opinion, the basic elements of the Cell class?


    def Cell(object):
    def __init__(self, payload):
    self.payload = payload
    def __str__(self):
    return str(self.payload)
    def __float__(self):
    return float(self.payload)
    def dependency(self):
    try:
    return self.payload.dependency()
    except AttributeError:
    return ['None']


    Cells can contain one of three things: a number, a string, or a formula.
    The first two can be supported by providing a built-in Python object
    (float or str) as payload. You can support formulae two ways that I can
    think of:

    (1) Provide a formula as a string, with a leading '='. Then, whenever you
    want to evaluate such a cell, you fetch the string from the cell, parse
    it, generate an arithmetic expression, and calculate it.

    (2) Instead of parsing the formula on every single spreadsheet refresh,
    use a couple of helper classes:

    class Indirect(object):
    def __init__(self, ref, sheet=None):
    if sheet is None:
    self.sheet = default_sheet()
    else:
    self.sheet = sheet
    self.ref = ref
    def __str__(self):
    return str(self.sheet[self.ref])
    def float(self):
    return float(self.sheet[self.ref])
    def dependency(self):
    return [self.ref]

    class Formula(object):
    def __init__(self, x, y, op):
    self.x = x
    self.y = y
    self.op = op
    def eval(self):
    return self.op(float(x), float(y))
    def dependency(self):
    return self.x.dependency(level) + self.y.dependency(level)



    Then do something like this:

    sheet = {}
    sheet['A1'] = Cell(7)
    sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
    sheet['A3'] = Cell(
    Formula(
    Formula(3, Indirect('A1'), operator.mul),
    Indirect('A2'),
    operator.add
    ))
    sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))


    Then you only need to parse each human-readable formula like '=2*A1' once.


    > 2) Do I need a parser to evaluate the formulas like “3*A1+A2�


    Yes.

    > Can you recommend one library that already contains one?


    Try PyParsing.


    > 3) Do I need a tree
    > data structure to represent my data? would the tree be an attribute of
    > the class instance?


    I suspect a dict will be faster.


    To get the dependencies of each cell:

    for key, value in sheet.items():
    print key, value.dependency()


    Keep in mind I haven't tested ANY of this -- it is entirely stream of
    consciousness. I've never actually done this, so I have no idea whether
    it is a good approach or not, but it seems to me that it might be.



    --
    Steven
    Steven D'Aprano, Jan 3, 2010
    #2
    1. Advertising

  3. vsoler

    vsoler Guest

    On Jan 3, 1:28 pm, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sun, 03 Jan 2010 03:27:46 -0800, vsoler wrote:
    > > My application would contain a limited set of "cells" represented by the
    > > instances of a Cell class:

    >
    > > class Cell:
    > > ...

    >
    > > A1=Cell(7)
    > > A2=Cell(2*A1)
    > > A3=Cell(3*A1+A2)
    > > A4=Cell(A3*4)

    >
    > > Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

    >
    > > Now, I somehow want to be able to show a dependency tree

    >
    > > 1 level dependency trees
    > >   A1: None
    > >   A2: A1
    > >   A3: A1, A2
    > >   A4: A3

    >
    > > All levels dependency trees

    >
    > >   A1: None
    > >   A2: A1
    > >   A3: A1, A2
    > >   A4: A3, A2, A1

    >
    > > Leaf + values dependency trees:

    >
    > >   A1: 7
    > >   A2: A1=7, 2
    > >   A3: 3, A1=7, 2
    > >   A4: 3, A1=7, 2, 4

    >
    > > What I'd like to know is:

    >
    > > 1) what are, in your opinion, the basic elements of the Cell class?

    >
    > def Cell(object):
    >     def __init__(self, payload):
    >         self.payload = payload
    >     def __str__(self):
    >         return str(self.payload)
    >     def __float__(self):
    >         return float(self.payload)
    >     def dependency(self):
    >         try:
    >             return self.payload.dependency()
    >         except AttributeError:
    >             return ['None']
    >
    > Cells can contain one of three things: a number, a string, or a formula.
    > The first two can be supported by providing a built-in Python object
    > (float or str) as payload. You can support formulae two ways that I can
    > think of:
    >
    > (1) Provide a formula as a string, with a leading '='. Then, whenever you
    > want to evaluate such a cell, you fetch the string from the cell, parse
    > it, generate an arithmetic expression, and calculate it.
    >
    > (2) Instead of parsing the formula on every single spreadsheet refresh,
    > use a couple of helper classes:
    >
    > class Indirect(object):
    >     def __init__(self, ref, sheet=None):
    >         if sheet is None:
    >             self.sheet = default_sheet()
    >         else:
    >             self.sheet = sheet
    >         self.ref = ref
    >     def __str__(self):
    >         return str(self.sheet[self.ref])
    >     def float(self):
    >         return float(self.sheet[self.ref])
    >     def dependency(self):
    >         return [self.ref]
    >
    > class Formula(object):
    >     def __init__(self, x, y, op):
    >         self.x = x
    >         self.y = y
    >         self.op = op
    >     def eval(self):
    >         return self.op(float(x), float(y))
    >     def dependency(self):
    >         return self.x.dependency(level) + self.y.dependency(level)
    >
    > Then do something like this:
    >
    > sheet = {}
    > sheet['A1'] = Cell(7)
    > sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
    > sheet['A3'] = Cell(
    >   Formula(
    >     Formula(3, Indirect('A1'), operator.mul),
    >     Indirect('A2'),
    >     operator.add
    >     ))
    > sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))
    >
    > Then you only need to parse each human-readable formula like '=2*A1' once.
    >
    > > 2) Do I need a parser to evaluate the formulas like “3*A1+A2”?

    >
    > Yes.
    >
    > > Can you recommend one library that already contains one?

    >
    > Try PyParsing.
    >
    > > 3) Do I need a tree
    > > data structure to represent my data? would the tree be an attribute of
    > > the class instance?

    >
    > I suspect a dict will be faster.
    >
    > To get the dependencies of each cell:
    >
    > for key, value in sheet.items():
    >     print key, value.dependency()
    >
    > Keep in mind I haven't tested ANY of this -- it is entirely stream of
    > consciousness. I've never actually done this, so I have no idea whether
    > it is a good approach or not, but it seems to me that it might be.
    >
    > --
    > Steven


    WOW!!!
    After lunch I am going to read your post thoroughly, but I can already
    see that you've put into it a lot of time and expertise.

    Thank you
    vsoler, Jan 3, 2010
    #3
  4. vsoler

    Lie Ryan Guest

    On 1/3/2010 10:27 PM, vsoler wrote:

    > 1) what are, in your opinion, the basic elements of the Cell class?


    The "user-entered formula" and "effective value". A Cell containing a
    formula "abc" has a value of "abc"; a cell containing the formula "=1+5"
    has a value of "6". You could use the 'property' decorator for the
    "effective value" attribute.

    > 2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
    > recommend one library that already contains one?


    Yes you do. PyParsing, regex, write your own parser; choose your own
    poison depending on the expected complexity of the formulaes.

    > 3) Do I need a tree data structure to represent my data?


    There are two main options for your "primary data structure":
    - table (2x2 list) -- if you expect most of your cells to be filled,
    this is simple, but uses lots of memory
    - sparse table (list/dicts of cells) -- if you have a large table that
    is mostly empty cells

    The Dependency Tree is the "side data structure"; a "Cell" is a "Node";
    a Cell's "dependencies" is the Node's "children"; a Cell's "dependant"
    is the Node's "parent". Cells maintains a list of dependencies (or
    dynamically generates them when requested) by analyzing the "formula"
    attribute for references to other cells.

    PS: Strictly speaking, this "side data structure" is not a "Tree" as
    there are multiple root nodes. I think it's called "Directed Acyclic
    Graph". However, it's true that, given a cell as a root, the cell's
    dependencies is a proper tree.

    > would the
    > tree be an attribute of the class instance?


    you could use 'property' decorator and make the dependency tree a
    "computed attribute" of the cell; many people recommended against a
    'property' that is heavy to compute though and calculating dependencies
    may be quite heavy depending on the complexity of the spreadsheet.

    > I imagine a lot can be said on these questions. What I am looking for
    > is some hints that help me get out of where I am now.
    >
    > Any help is highly appreciated.
    >
    > Vicente Soler
    Lie Ryan, Jan 3, 2010
    #4
  5. vsoler

    Jon Clements Guest

    On Jan 3, 2:58 pm, Lie Ryan <> wrote:
    > On 1/3/2010 10:27 PM, vsoler wrote:
    >
    > > 1) what are, in your opinion, the basic elements of the Cell class?

    >
    > The "user-entered formula" and "effective value". A Cell containing a
    > formula "abc" has a value of "abc"; a cell containing the formula "=1+5"
    > has a value of "6". You could use the 'property' decorator for the
    > "effective value" attribute.
    >
    > > 2) Do I need a parser to evaluate the formulas like 3*A1+A2 ? Can you
    > > recommend one library that already contains one?

    >
    > Yes you do. PyParsing, regex, write your own parser; choose your own
    > poison depending on the expected complexity of the formulaes.
    >
    > > 3) Do I need a tree data structure to represent my data?

    >
    > There are two main options for your "primary data structure":
    > - table (2x2 list) -- if you expect most of your cells to be filled,
    > this is simple, but uses lots of memory
    > - sparse table (list/dicts of cells) -- if you have a large table that
    > is mostly empty cells
    >
    > The Dependency Tree is the "side data structure"; a "Cell" is a "Node";
    > a Cell's "dependencies" is the Node's "children"; a Cell's "dependant"
    > is the Node's "parent". Cells maintains a list of dependencies (or
    > dynamically generates them when requested) by analyzing the "formula"
    > attribute for references to other cells.
    >
    > PS: Strictly speaking, this "side data structure" is not a "Tree" as
    > there are multiple root nodes. I think it's called "Directed Acyclic
    > Graph". However, it's true that, given a cell as a root, the cell's
    > dependencies is a proper tree.
    >
    > > would the
    > > tree be an attribute of the class instance?

    >
    > you could use 'property' decorator and make the dependency tree a
    > "computed attribute" of the cell; many people recommended against a
    > 'property' that is heavy to compute though and calculating dependencies
    > may be quite heavy depending on the complexity of the spreadsheet.
    >
    > > I imagine a lot can be said on these questions. What I am looking for
    > > is some hints that help me get out of where I am now.

    >
    > > Any help is highly appreciated.

    >
    > > Vicente Soler

    >
    >


    As well as what Steven & Lie have mentioned, maybe you could get
    inspiration from http://pyspread.sourceforge.net/

    Although I've not looked at it, my gut feeling is the author would've
    had to address these issues (take that as a disclaimer of some
    sort :) )

    From not having to had a need for this sort of thing, I'd probably go
    for a sparse representation as a graph.

    But again, depends why you're segmenting the "spreadsheet" from "need
    to tell dependencies".

    Jon.
    Jon Clements, Jan 3, 2010
    #5
  6. vsoler

    MRAB Guest

    vsoler wrote:
    > Hi,
    >
    > Not sure this is the best group to post, but I cannot think of any
    > other.
    >
    > My application would contain a limited set of "cells" represented by
    > the instances of a Cell class:
    >
    > class Cell:
    > ...
    >
    > A1=Cell(7)
    > A2=Cell(2*A1)
    > A3=Cell(3*A1+A2)
    > A4=Cell(A3*4)
    >
    > Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
    >
    > Now, I somehow want to be able to show a dependency tree
    >
    > 1 level dependency trees
    > A1: None
    > A2: A1
    > A3: A1, A2
    > A4: A3
    >
    > All levels dependency trees
    >
    > A1: None
    > A2: A1
    > A3: A1, A2
    > A4: A3, A2, A1
    >
    > Leaf + values dependency trees:
    >
    > A1: 7
    > A2: A1=7, 2
    > A3: 3, A1=7, 2
    > A4: 3, A1=7, 2, 4
    >
    > What I'd like to know is:
    >
    > 1) what are, in your opinion, the basic elements of the Cell class?
    > 2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
    > recommend one library that already contains one?
    > 3) Do I need a tree data structure to represent my data? would the
    > tree be an attribute of the class instance?
    >
    > I imagine a lot can be said on these questions. What I am looking for
    > is some hints that help me get out of where I am now.
    >
    > Any help is highly appreciated.
    >

    As well as considering the other replies, you don't necessarily need to
    parse the expressions yourself.

    If A1 is a cell then 2*A1 tries to call the __rmul__ method of A1 (and
    A1*2 tries to call the __mul__ method of A1), so you could have it
    return a formula object that will multiply and return the numeric value
    of self (A1) by 2 when its own numeric value is requested. This means
    that a cell could be initialised with either a number or a formula.
    MRAB, Jan 3, 2010
    #6
  7. vsoler

    mdipierro Guest

    Perhaps this can be useful:
    http://www.web2py.com/examples/spreadsheet

    The code is in a single file with not dependencies and it does not
    require web2py to run:
    http://code.google.com/p/web2py/source/browse/gluon/contrib/spreadsheet.py

    Here is a sample controller that shows you how to embed the
    spreadsheet in web page:
    http://code.google.com/p/web2py/source/browse/applications/examples/controllers/spreadsheet.py

    Massimo



    On Jan 3, 5:27 am, vsoler <> wrote:
    > Hi,
    >
    > Not sure this is the best group to post, but I cannot think of any
    > other.
    >
    > My application would contain a limited set of "cells" represented by
    > the instances of a Cell class:
    >
    > class Cell:
    > ...
    >
    > A1=Cell(7)
    > A2=Cell(2*A1)
    > A3=Cell(3*A1+A2)
    > A4=Cell(A3*4)
    >
    > Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
    >
    > Now, I somehow want to be able to show a dependency tree
    >
    > 1 level dependency trees
    >   A1: None
    >   A2: A1
    >   A3: A1, A2
    >   A4: A3
    >
    > All levels dependency trees
    >
    >   A1: None
    >   A2: A1
    >   A3: A1, A2
    >   A4: A3, A2, A1
    >
    > Leaf + values dependency trees:
    >
    >   A1: 7
    >   A2: A1=7, 2
    >   A3: 3, A1=7, 2
    >   A4: 3, A1=7, 2, 4
    >
    > What I'd like to know is:
    >
    > 1) what are, in your opinion, the basic elements of the Cell class?
    > 2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
    > recommend one library that already contains one?
    > 3) Do I need a tree data structure to represent my data? would the
    > tree be an attribute of the class instance?
    >
    > I imagine a lot can be said on these questions. What I am looking for
    > is some hints that help me get out of where I am now.
    >
    > Any help is highly appreciated.
    >
    > Vicente Soler
    mdipierro, Jan 3, 2010
    #7
  8. vsoler

    vsoler Guest

    On 3 ene, 22:40, mdipierro <> wrote:
    > Perhaps this can be useful:http://www.web2py.com/examples/spreadsheet
    >
    > The code is in a single file with not dependencies and it does not
    > require web2py to run:http://code.google.com/p/web2py/source/browse/gluon/contrib/spreadshe...
    >
    > Here is a sample controller that shows you how to embed the
    > spreadsheet in web page:http://code.google.com/p/web2py/source/browse/applications/examples/c...
    >
    > Massimo
    >
    > On Jan 3, 5:27 am, vsoler <> wrote:
    >
    > > Hi,

    >
    > > Not sure this is the best group to post, but I cannot think of any
    > > other.

    >
    > > My application would contain a limited set of "cells" represented by
    > > the instances of a Cell class:

    >
    > > class Cell:
    > > ...

    >
    > > A1=Cell(7)
    > > A2=Cell(2*A1)
    > > A3=Cell(3*A1+A2)
    > > A4=Cell(A3*4)

    >
    > > Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

    >
    > > Now, I somehow want to be able to show a dependency tree

    >
    > > 1 level dependency trees
    > >   A1: None
    > >   A2: A1
    > >   A3: A1, A2
    > >   A4: A3

    >
    > > All levels dependency trees

    >
    > >   A1: None
    > >   A2: A1
    > >   A3: A1, A2
    > >   A4: A3, A2, A1

    >
    > > Leaf + values dependency trees:

    >
    > >   A1: 7
    > >   A2: A1=7, 2
    > >   A3: 3, A1=7, 2
    > >   A4: 3, A1=7, 2, 4

    >
    > > What I'd like to know is:

    >
    > > 1) what are, in your opinion, the basic elements of the Cell class?
    > > 2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
    > > recommend one library that already contains one?
    > > 3) Do I need a tree data structure to represent my data? would the
    > > tree be an attribute of the class instance?

    >
    > > I imagine a lot can be said on these questions. What I am looking for
    > > is some hints that help me get out of where I am now.

    >
    > > Any help is highly appreciated.

    >
    > > Vicente Soler

    >
    >


    There is something that I appreciate in this group, and it is the high
    degree of knowledge of all the participants that are helping with
    their answers.

    After studying your suggestions, I'll come back to you.

    Thank you very much.

    Vicente Soler
    vsoler, Jan 5, 2010
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Raymond Arthur St. Marie II of III

    very Very VERY dumb Question About The new Set( ) 's

    Raymond Arthur St. Marie II of III, Jul 23, 2003, in forum: Python
    Replies:
    4
    Views:
    442
    Raymond Hettinger
    Jul 27, 2003
  2. shanx__=|;-

    very very very long integer

    shanx__=|;-, Oct 16, 2004, in forum: C Programming
    Replies:
    19
    Views:
    1,577
    Merrill & Michele
    Oct 19, 2004
  3. Abhishek Jha

    very very very long integer

    Abhishek Jha, Oct 16, 2004, in forum: C Programming
    Replies:
    4
    Views:
    404
    jacob navia
    Oct 17, 2004
  4. Peter

    Very very very basic question

    Peter, Feb 8, 2005, in forum: C Programming
    Replies:
    14
    Views:
    490
    Dave Thompson
    Feb 14, 2005
  5. olivier.melcher

    Help running a very very very simple code

    olivier.melcher, May 12, 2008, in forum: Java
    Replies:
    8
    Views:
    2,230
Loading...

Share This Page