Spreadsheet-style dependency tracking

F

Florian Weimer

Are there libraries which implement some form of spreadsheet-style
dependency tracking? The idea is to enable incremental updates to
some fairly convoluted computation. I hope that a general dependency
tracking framework would avoid making the computation even more
convoluted and difficult to change.

It would also be preferable if the whole dataset would not have to be
kept in memory for the whole computation. (It's rather smallish,
though, so it wouldn't be impossible to keep it resident, I guess.)
 
C

Chris Torek

Are there libraries which implement some form of spreadsheet-style
dependency tracking? The idea is to enable incremental updates to
some fairly convoluted computation. I hope that a general dependency
tracking framework would avoid making the computation even more
convoluted and difficult to change.

Don't know of any libraries myself, but I wrote some code to do
topological sort for dependencies, which I can paste here. It
is also worth noting that a lot of spreadsheets cheat: they just
repeat a sheet-wide computation until values stop changing (or a
cycle count limit runs out).

The code below uses a datatype that has a get_deps() that gives
you a list of "cells I depend on for my value", i.e., for a
spreadsheet, if a cell had the rule "my value = (A1 + B2) / C3"
then get_deps() would have to return the list [cell_holding_A1,
cell_holding_B2, cell_holding_C3] (in any order). It assumes that
it can write on cell.visited. It returns (as a tuple) a list giving
"some evaluation order", and a list of cycles found. If the cycle
list is empty, evaluating the cells in the given evaluation order,
once, is sufficient to update the sheet. If the cycle list is
nonempty, these are cells containing some kind of circular
dependencies, and each element of the cycle list is itself a list
of cells.

(This code was part of an exercise, as it were, and I never finished
the whole thing, just enough to test the topo-sorting, and some
other exercise-y bits. I added cycle-collecting as a later item.
My testing was not terribly thorough either, but I think the code
is still functional and correct.)

--------

# Topological sort due to Tarjan (1974, 1976)
# <http://en.wikipedia.org/wiki/Topological_sorting>
#
# Return a list of cells in some suitable order, plus
# a list (not necessarily complete) that says "these loops
# were found" (in which case the list is not truly sorted).
def tsort(sheet):
class Locals(object):
def __init__(self):
self.S = {}
self.L = []
self.cycles = []
class FakeCell(object):
def __init__(self):
self.deps = []
def __str__(self):
return 'mark'

# initial result list (r.L) is empty
r = Locals()

# Do a depth first search of "cell" and its dependents.
#
# The recursion is flattened here to avoid blowing out
# Python's function-call stack, regardless of how deep
# the code gets. See alternative recursive version below.
def visit(cell):
mark = FakeCell() # marks "recursion points" w/in worklist
worklist = [cell] # list of cells we still have to visit
nstack = [] # cell stack during (flattened) recursion
while worklist:
cell = worklist.pop()
# If we hit the "mark", we want to pop out of a flattened
# recursion. We could just store the cell itself, but:
# - we need a separate stack of just the "pushed" cells
# that are being recursed-on, and
# - we need to know when we have reached that point
# so worklist has "mark"s wherever we should pop a cells
# off the cells-stack "nstack" instead. (At which point
# the only thing left to do is add it to the result-list.)
if cell is mark:
cell = nstack.pop()
#print 'popped out of recursion, cell:', cell
r.L.append(cell)
continue
# If we have seen this cell before, don't re-work it.
# If the way we saw this cell before was "it's on our
# cells stack", we have a cycle (otherwise this a
# harmless revisit due to being somewhere else in the
# overall graph).
if cell.visited:
#print 'revisit:', cell, 'stack:', nstack
if cell in nstack:
# The cycle runs from nstack[nstack.index(cell)]
# through the end of nstack.
r.cycles.append(nstack[nstack.index(cell):])
continue
#print 'first visit for:', cell, 'deps:', cell.get_deps()
#print 'nstack:', nstack
cell.visited = True
# TODO: optimize if cell.get_deps() is empty (we'll add a mark
# and then just pop it)
#print 'do deps:'
nstack.append(cell)
#print 'worklist before:', worklist
worklist = (worklist + [mark] +
[r.S[ref.get()] for ref in cell.get_deps()])
#print 'worklist after:', worklist

# def visit(node, nstack = None):
# if nstack is None:
# nstack = []
# if node.visited:
# if node in nstack:
# r.cycles = True
# return
# node.visited = True
# nstack.append(node)
# for c in node.deps:
# visit(r.S[c], nstack)
# nstack.pop()
# r.L.append(node)

# Build "set S of all cells" (r.S) which gives their dependencies.
# By indexing by cell, we can find cells from dependencies in visit().
for row in sheet:
for cell in row:
if cell:
r.S[cell] = cell
cell.visited = False

# Now simply (initial-)visit all the cells.
for cell in r.S.itervalues():
visit(cell)

# Now r.L defines an evaluation order; it has at least one cycle in it
# if r.cycles is nonempty.
return (r.L, r.cycles)
 
F

Florian Weimer

* Chris Torek:
Don't know of any libraries myself, but I wrote some code to do
topological sort for dependencies, which I can paste here. It
is also worth noting that a lot of spreadsheets cheat: they just
repeat a sheet-wide computation until values stop changing (or a
cycle count limit runs out).

I think most of the relevant implementations use dependency
information to speed up incremental recomputation. For instance,
gnumeric seems to have code for this. This is the part I'm most
interested in. I already have got an explicit ordering of the
computations (fortunately, that part is fairly simple).
 
F

Fuzzyman

* Chris Torek:



I think most of the relevant implementations use dependency
information to speed up incremental recomputation.  For instance,
gnumeric seems to have code for this.  This is the part I'm most
interested in.  I already have got an explicit ordering of the
computations (fortunately, that part is fairly simple).

And note that the repeating sheet-wide computation is a feature not a
"cheat" (the default should still be to report cycles). Allowing
calculations to complete even in the presence of cycles can be very
useful.

Michael Foord
 
L

Lawrence D'Oliveiro

In message
Allowing calculations to complete even in the presence of cycles can be
very useful.

But then the answer is no longer completely deterministic.
 
C

Carl Banks

In message


But then the answer is no longer completely deterministic.

Yes it is, unless you're using your Geiger counter random bit
generator function.


Carl Banks
 
L

Lawrence D'Oliveiro

In message
Carl said:
Yes it is ...

No, because it will change depending on the iteration number. And if the
computation doesn’t converge, this is a recipe for oscillations, including
chaotic ones.

Hmmm, I wonder how much of Wall Street’s crash-proneness can be blamed on
dependency cycles in damned Excel spreadsheets...
 
C

Carl Banks

In message



No, because it will change depending on the iteration number. And if the
computation doesn’t converge, this is a recipe for oscillations, including
chaotic ones.

Go look up deterministic in Wikipedia.


Carl Banks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,127
Latest member
CyberDefense
Top