Spreadsheet-style dependency tracking

Florian Weimer · Oct 16, 2010

Are there libraries which implement some form of spreadsheet-style
dependency tracking? The idea is to enable incremental updates to
some fairly convoluted computation. I hope that a general dependency
tracking framework would avoid making the computation even more
convoluted and difficult to change.

It would also be preferable if the whole dataset would not have to be
kept in memory for the whole computation. (It's rather smallish,
though, so it wouldn't be impossible to keep it resident, I guess.)

Chris Torek · Oct 17, 2010

Are there libraries which implement some form of spreadsheet-style
dependency tracking? The idea is to enable incremental updates to
some fairly convoluted computation. I hope that a general dependency
tracking framework would avoid making the computation even more
convoluted and difficult to change.

Don't know of any libraries myself, but I wrote some code to do
topological sort for dependencies, which I can paste here. It
is also worth noting that a lot of spreadsheets cheat: they just
repeat a sheet-wide computation until values stop changing (or a
cycle count limit runs out).

The code below uses a datatype that has a get_deps() that gives
you a list of "cells I depend on for my value", i.e., for a
spreadsheet, if a cell had the rule "my value = (A1 + B2) / C3"
then get_deps() would have to return the list [cell_holding_A1,
cell_holding_B2, cell_holding_C3] (in any order). It assumes that
it can write on cell.visited. It returns (as a tuple) a list giving
"some evaluation order", and a list of cycles found. If the cycle
list is empty, evaluating the cells in the given evaluation order,
once, is sufficient to update the sheet. If the cycle list is
nonempty, these are cells containing some kind of circular
dependencies, and each element of the cycle list is itself a list
of cells.

(This code was part of an exercise, as it were, and I never finished
the whole thing, just enough to test the topo-sorting, and some
other exercise-y bits. I added cycle-collecting as a later item.
My testing was not terribly thorough either, but I think the code
is still functional and correct.)

--------

# Topological sort due to Tarjan (1974, 1976)
# <http://en.wikipedia.org/wiki/Topological_sorting>
#
# Return a list of cells in some suitable order, plus
# a list (not necessarily complete) that says "these loops
# were found" (in which case the list is not truly sorted).
def tsort(sheet):
class Locals(object):
def __init__(self):
self.S = {}
self.L = []
self.cycles = []
class FakeCell(object):
def __init__(self):
self.deps = []
def __str__(self):
return 'mark'

# initial result list (r.L) is empty
r = Locals()

# Do a depth first search of "cell" and its dependents.
#
# The recursion is flattened here to avoid blowing out
# Python's function-call stack, regardless of how deep
# the code gets. See alternative recursive version below.
def visit(cell):
mark = FakeCell() # marks "recursion points" w/in worklist
worklist = [cell] # list of cells we still have to visit
nstack = [] # cell stack during (flattened) recursion
while worklist:
cell = worklist.pop()
# If we hit the "mark", we want to pop out of a flattened
# recursion. We could just store the cell itself, but:
# - we need a separate stack of just the "pushed" cells
# that are being recursed-on, and
# - we need to know when we have reached that point
# so worklist has "mark"s wherever we should pop a cells
# off the cells-stack "nstack" instead. (At which point
# the only thing left to do is add it to the result-list.)
if cell is mark:
cell = nstack.pop()
#print 'popped out of recursion, cell:', cell
r.L.append(cell)
continue
# If we have seen this cell before, don't re-work it.
# If the way we saw this cell before was "it's on our
# cells stack", we have a cycle (otherwise this a
# harmless revisit due to being somewhere else in the
# overall graph).
if cell.visited:
#print 'revisit:', cell, 'stack:', nstack
if cell in nstack:
# The cycle runs from nstack[nstack.index(cell)]
# through the end of nstack.
r.cycles.append(nstack[nstack.index(cell):])
continue
#print 'first visit for:', cell, 'deps:', cell.get_deps()
#print 'nstack:', nstack
cell.visited = True
# TODO: optimize if cell.get_deps() is empty (we'll add a mark
# and then just pop it)
#print 'do deps:'
nstack.append(cell)
#print 'worklist before:', worklist
worklist = (worklist + [mark] +
[r.S[ref.get()] for ref in cell.get_deps()])
#print 'worklist after:', worklist

# def visit(node, nstack = None):
# if nstack is None:
# nstack = []
# if node.visited:
# if node in nstack:
# r.cycles = True
# return
# node.visited = True
# nstack.append(node)
# for c in node.deps:
# visit(r.S[c], nstack)
# nstack.pop()
# r.L.append(node)

# Build "set S of all cells" (r.S) which gives their dependencies.
# By indexing by cell, we can find cells from dependencies in visit().
for row in sheet:
for cell in row:
if cell:
r.S[cell] = cell
cell.visited = False

# Now simply (initial-)visit all the cells.
for cell in r.S.itervalues():
visit(cell)

# Now r.L defines an evaluation order; it has at least one cycle in it
# if r.cycles is nonempty.
return (r.L, r.cycles)

Florian Weimer · Oct 17, 2010

* Chris Torek:

Don't know of any libraries myself, but I wrote some code to do
topological sort for dependencies, which I can paste here. It
is also worth noting that a lot of spreadsheets cheat: they just
repeat a sheet-wide computation until values stop changing (or a
cycle count limit runs out).

I think most of the relevant implementations use dependency
information to speed up incremental recomputation. For instance,
gnumeric seems to have code for this. This is the part I'm most
interested in. I already have got an explicit ordering of the
computations (fortunately, that part is fairly simple).

Fuzzyman · Oct 18, 2010

* Chris Torek:

I think most of the relevant implementations use dependency
information to speed up incremental recomputation. For instance,
gnumeric seems to have code for this. This is the part I'm most
interested in. I already have got an explicit ordering of the
computations (fortunately, that part is fairly simple).

And note that the repeating sheet-wide computation is a feature not a
"cheat" (the default should still be to report cycles). Allowing
calculations to complete even in the presence of cycles can be very
useful.

Michael Foord

Lawrence D'Oliveiro · Oct 19, 2010

In message

Allowing calculations to complete even in the presence of cycles can be
very useful.

But then the answer is no longer completely deterministic.

Carl Banks · Oct 19, 2010

In message

But then the answer is no longer completely deterministic.

Yes it is, unless you're using your Geiger counter random bit
generator function.

Carl Banks

Lawrence D'Oliveiro · Oct 20, 2010

In message

Carl said:
Yes it is ...

No, because it will change depending on the iteration number. And if the
computation doesnâ€™t converge, this is a recipe for oscillations, including
chaotic ones.

Hmmm, I wonder how much of Wall Streetâ€™s crash-proneness can be blamed on
dependency cycles in damned Excel spreadsheets...

Carl Banks · Oct 20, 2010

In message

No, because it will change depending on the iteration number. And if the
computation doesn’t converge, this is a recipe for oscillations, including
chaotic ones.

Go look up deterministic in Wikipedia.

Carl Banks

Stefaan Himpe · Oct 21, 2010

Florian said:
Are there libraries which implement some form of spreadsheet-style
dependency tracking?

The first that come to mind (I have very limited experience with them):

trellis http://peak.telecommunity.com/DevCenter/Trellis
pycells http://pycells.pdxcb.net/

Best regards,
Stefaan.

Adding and removing many style elements	28	May 6, 2010
should we Go now?	12	Feb 24, 2013
[ANN] webgen 0.5.8 released	0	May 7, 2009
[ANN] pr-zlib 1.0.0 - A Pure Ruby zlib library	5	Jun 13, 2009
ANN: Sequel 1.5.0 Released	0	Apr 29, 2008
YASS (Yet Another Success Story)	1	Jun 20, 2009
PyPy 1.0: JIT compilers for free and more	29	Mar 27, 2007
BioRuby & Google Summer of Code 2011	0	Mar 25, 2011

Spreadsheet-style dependency tracking

Florian Weimer

Chris Torek

Florian Weimer

Fuzzyman

Lawrence D'Oliveiro

Carl Banks

Lawrence D'Oliveiro

Carl Banks

Stefaan Himpe

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads