What is the best data structure for a very simple spreadsheet?

V

vsoler

Hi,

Not sure this is the best group to post, but I cannot think of any
other.

My application would contain a limited set of "cells" represented by
the instances of a Cell class:

class Cell:
....

A1=Cell(7)
A2=Cell(2*A1)
A3=Cell(3*A1+A2)
A4=Cell(A3*4)

Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

Now, I somehow want to be able to show a dependency tree

1 level dependency trees
A1: None
A2: A1
A3: A1, A2
A4: A3

All levels dependency trees

A1: None
A2: A1
A3: A1, A2
A4: A3, A2, A1

Leaf + values dependency trees:

A1: 7
A2: A1=7, 2
A3: 3, A1=7, 2
A4: 3, A1=7, 2, 4

What I'd like to know is:

1) what are, in your opinion, the basic elements of the Cell class?
2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
recommend one library that already contains one?
3) Do I need a tree data structure to represent my data? would the
tree be an attribute of the class instance?

I imagine a lot can be said on these questions. What I am looking for
is some hints that help me get out of where I am now.

Any help is highly appreciated.

Vicente Soler
 
S

Steven D'Aprano

My application would contain a limited set of "cells" represented by the
instances of a Cell class:

class Cell:
...

A1=Cell(7)
A2=Cell(2*A1)
A3=Cell(3*A1+A2)
A4=Cell(A3*4)

Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

Now, I somehow want to be able to show a dependency tree

1 level dependency trees
A1: None
A2: A1
A3: A1, A2
A4: A3

All levels dependency trees

A1: None
A2: A1
A3: A1, A2
A4: A3, A2, A1

Leaf + values dependency trees:

A1: 7
A2: A1=7, 2
A3: 3, A1=7, 2
A4: 3, A1=7, 2, 4

What I'd like to know is:

1) what are, in your opinion, the basic elements of the Cell class?

def Cell(object):
def __init__(self, payload):
self.payload = payload
def __str__(self):
return str(self.payload)
def __float__(self):
return float(self.payload)
def dependency(self):
try:
return self.payload.dependency()
except AttributeError:
return ['None']


Cells can contain one of three things: a number, a string, or a formula.
The first two can be supported by providing a built-in Python object
(float or str) as payload. You can support formulae two ways that I can
think of:

(1) Provide a formula as a string, with a leading '='. Then, whenever you
want to evaluate such a cell, you fetch the string from the cell, parse
it, generate an arithmetic expression, and calculate it.

(2) Instead of parsing the formula on every single spreadsheet refresh,
use a couple of helper classes:

class Indirect(object):
def __init__(self, ref, sheet=None):
if sheet is None:
self.sheet = default_sheet()
else:
self.sheet = sheet
self.ref = ref
def __str__(self):
return str(self.sheet[self.ref])
def float(self):
return float(self.sheet[self.ref])
def dependency(self):
return [self.ref]

class Formula(object):
def __init__(self, x, y, op):
self.x = x
self.y = y
self.op = op
def eval(self):
return self.op(float(x), float(y))
def dependency(self):
return self.x.dependency(level) + self.y.dependency(level)



Then do something like this:

sheet = {}
sheet['A1'] = Cell(7)
sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
sheet['A3'] = Cell(
Formula(
Formula(3, Indirect('A1'), operator.mul),
Indirect('A2'),
operator.add
))
sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))


Then you only need to parse each human-readable formula like '=2*A1' once.

2) Do I need a parser to evaluate the formulas like “3*A1+A2�
Yes.

Can you recommend one library that already contains one?

Try PyParsing.

3) Do I need a tree
data structure to represent my data? would the tree be an attribute of
the class instance?

I suspect a dict will be faster.


To get the dependencies of each cell:

for key, value in sheet.items():
print key, value.dependency()


Keep in mind I haven't tested ANY of this -- it is entirely stream of
consciousness. I've never actually done this, so I have no idea whether
it is a good approach or not, but it seems to me that it might be.
 
V

vsoler

My application would contain a limited set of "cells" represented by the
instances of a Cell class:
class Cell:
...

Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
Now, I somehow want to be able to show a dependency tree
1 level dependency trees
  A1: None
  A2: A1
  A3: A1, A2
  A4: A3
All levels dependency trees
  A1: None
  A2: A1
  A3: A1, A2
  A4: A3, A2, A1
Leaf + values dependency trees:
  A1: 7
  A2: A1=7, 2
  A3: 3, A1=7, 2
  A4: 3, A1=7, 2, 4
What I'd like to know is:
1) what are, in your opinion, the basic elements of the Cell class?

def Cell(object):
    def __init__(self, payload):
        self.payload = payload
    def __str__(self):
        return str(self.payload)
    def __float__(self):
        return float(self.payload)
    def dependency(self):
        try:
            return self.payload.dependency()
        except AttributeError:
            return ['None']

Cells can contain one of three things: a number, a string, or a formula.
The first two can be supported by providing a built-in Python object
(float or str) as payload. You can support formulae two ways that I can
think of:

(1) Provide a formula as a string, with a leading '='. Then, whenever you
want to evaluate such a cell, you fetch the string from the cell, parse
it, generate an arithmetic expression, and calculate it.

(2) Instead of parsing the formula on every single spreadsheet refresh,
use a couple of helper classes:

class Indirect(object):
    def __init__(self, ref, sheet=None):
        if sheet is None:
            self.sheet = default_sheet()
        else:
            self.sheet = sheet
        self.ref = ref
    def __str__(self):
        return str(self.sheet[self.ref])
    def float(self):
        return float(self.sheet[self.ref])
    def dependency(self):
        return [self.ref]

class Formula(object):
    def __init__(self, x, y, op):
        self.x = x
        self.y = y
        self.op = op
    def eval(self):
        return self.op(float(x), float(y))
    def dependency(self):
        return self.x.dependency(level) + self.y.dependency(level)

Then do something like this:

sheet = {}
sheet['A1'] = Cell(7)
sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
sheet['A3'] = Cell(
  Formula(
    Formula(3, Indirect('A1'), operator.mul),
    Indirect('A2'),
    operator.add
    ))
sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))

Then you only need to parse each human-readable formula like '=2*A1' once.
2) Do I need a parser to evaluate the formulas like “3*A1+A2”?
Yes.

Can you recommend one library that already contains one?

Try PyParsing.
3) Do I need a tree
data structure to represent my data? would the tree be an attribute of
the class instance?

I suspect a dict will be faster.

To get the dependencies of each cell:

for key, value in sheet.items():
    print key, value.dependency()

Keep in mind I haven't tested ANY of this -- it is entirely stream of
consciousness. I've never actually done this, so I have no idea whether
it is a good approach or not, but it seems to me that it might be.

WOW!!!
After lunch I am going to read your post thoroughly, but I can already
see that you've put into it a lot of time and expertise.

Thank you
 
L

Lie Ryan

1) what are, in your opinion, the basic elements of the Cell class?

The "user-entered formula" and "effective value". A Cell containing a
formula "abc" has a value of "abc"; a cell containing the formula "=1+5"
has a value of "6". You could use the 'property' decorator for the
"effective value" attribute.
2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
recommend one library that already contains one?

Yes you do. PyParsing, regex, write your own parser; choose your own
poison depending on the expected complexity of the formulaes.
3) Do I need a tree data structure to represent my data?

There are two main options for your "primary data structure":
- table (2x2 list) -- if you expect most of your cells to be filled,
this is simple, but uses lots of memory
- sparse table (list/dicts of cells) -- if you have a large table that
is mostly empty cells

The Dependency Tree is the "side data structure"; a "Cell" is a "Node";
a Cell's "dependencies" is the Node's "children"; a Cell's "dependant"
is the Node's "parent". Cells maintains a list of dependencies (or
dynamically generates them when requested) by analyzing the "formula"
attribute for references to other cells.

PS: Strictly speaking, this "side data structure" is not a "Tree" as
there are multiple root nodes. I think it's called "Directed Acyclic
Graph". However, it's true that, given a cell as a root, the cell's
dependencies is a proper tree.
would the
tree be an attribute of the class instance?

you could use 'property' decorator and make the dependency tree a
"computed attribute" of the cell; many people recommended against a
'property' that is heavy to compute though and calculating dependencies
may be quite heavy depending on the complexity of the spreadsheet.
 
J

Jon Clements

The "user-entered formula" and "effective value". A Cell containing a
formula "abc" has a value of "abc"; a cell containing the formula "=1+5"
has a value of "6". You could use the 'property' decorator for the
"effective value" attribute.


Yes you do. PyParsing, regex, write your own parser; choose your own
poison depending on the expected complexity of the formulaes.


There are two main options for your "primary data structure":
- table (2x2 list) -- if you expect most of your cells to be filled,
this is simple, but uses lots of memory
- sparse table (list/dicts of cells) -- if you have a large table that
is mostly empty cells

The Dependency Tree is the "side data structure"; a "Cell" is a "Node";
a Cell's "dependencies" is the Node's "children"; a Cell's "dependant"
is the Node's "parent". Cells maintains a list of dependencies (or
dynamically generates them when requested) by analyzing the "formula"
attribute for references to other cells.

PS: Strictly speaking, this "side data structure" is not a "Tree" as
there are multiple root nodes. I think it's called "Directed Acyclic
Graph". However, it's true that, given a cell as a root, the cell's
dependencies is a proper tree.


you could use 'property' decorator and make the dependency tree a
"computed attribute" of the cell; many people recommended against a
'property' that is heavy to compute though and calculating dependencies
may be quite heavy depending on the complexity of the spreadsheet.

As well as what Steven & Lie have mentioned, maybe you could get
inspiration from http://pyspread.sourceforge.net/

Although I've not looked at it, my gut feeling is the author would've
had to address these issues (take that as a disclaimer of some
sort :) )

From not having to had a need for this sort of thing, I'd probably go
for a sparse representation as a graph.

But again, depends why you're segmenting the "spreadsheet" from "need
to tell dependencies".

Jon.
 
M

MRAB

vsoler said:
Hi,

Not sure this is the best group to post, but I cannot think of any
other.

My application would contain a limited set of "cells" represented by
the instances of a Cell class:

class Cell:
...

A1=Cell(7)
A2=Cell(2*A1)
A3=Cell(3*A1+A2)
A4=Cell(A3*4)

Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140

Now, I somehow want to be able to show a dependency tree

1 level dependency trees
A1: None
A2: A1
A3: A1, A2
A4: A3

All levels dependency trees

A1: None
A2: A1
A3: A1, A2
A4: A3, A2, A1

Leaf + values dependency trees:

A1: 7
A2: A1=7, 2
A3: 3, A1=7, 2
A4: 3, A1=7, 2, 4

What I'd like to know is:

1) what are, in your opinion, the basic elements of the Cell class?
2) Do I need a parser to evaluate the formulas like “3*A1+A2”? Can you
recommend one library that already contains one?
3) Do I need a tree data structure to represent my data? would the
tree be an attribute of the class instance?

I imagine a lot can be said on these questions. What I am looking for
is some hints that help me get out of where I am now.

Any help is highly appreciated.
As well as considering the other replies, you don't necessarily need to
parse the expressions yourself.

If A1 is a cell then 2*A1 tries to call the __rmul__ method of A1 (and
A1*2 tries to call the __mul__ method of A1), so you could have it
return a formula object that will multiply and return the numeric value
of self (A1) by 2 when its own numeric value is requested. This means
that a cell could be initialised with either a number or a formula.
 
V

vsoler

Perhaps this can be useful:http://www.web2py.com/examples/spreadsheet

The code is in a single file with not dependencies and it does not
require web2py to run:http://code.google.com/p/web2py/source/browse/gluon/contrib/spreadshe...

Here is a sample controller that shows you how to embed the
spreadsheet in web page:http://code.google.com/p/web2py/source/browse/applications/examples/c...

Massimo

There is something that I appreciate in this group, and it is the high
degree of knowledge of all the participants that are helping with
their answers.

After studying your suggestions, I'll come back to you.

Thank you very much.

Vicente Soler
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top