Pattern for "make"-like dependencies between objects

P

Phil Endecott

Dear Experts,

Say I have an application with various data inputs (files, network
connections, peripherals etc), various data outputs (files, display
etc.) and various intermediate internal data objects. I would like
changes in the inputs to propogate through to the outputs. In my mind,
I'm imagining the dependencies between the objects like "make" dependencies.

There are various ways of doing this, including "push" and "pull"
methods, with various pros and cons. Is anyone aware of any literature
("patterns") describing something like this?

One thought that has occured to me is that C++ has built-in compile-time
dependency graph logic in the multiple inheritance constructors. So,
say I have these objects:

DataSource src;
InternalObj obj1;
InternalObj obj2;
Display disp;

If obj1 and obj2 get their data from the DataSrc, and the Display gets
its data from obj1 and obj2 (i.e. a diamond-shape dependency graph), and
each object has an "update" method, then I can write:

struct make_src {
make_src() { src.update(); }
};

struct make_obj1: make_src {
make_obj1() { obj1.update(); }
};

struct make_obj2: make_src {
make_obj2() { obj2.update(); }
};

struct make_disp: virtual make_obj1, virtual make_obj2 {
make_disp { disp.update(); }
};

Now, when I want to update the display I can just create a temporary
make_disp object, and all of the update() methods will be called in the
right order. (I think. I'm a bit rusty on what the virtual base
classes do. I hope that src.update() gets called only once.)

I bet I'm not the first one to think of this, but google doesn't find
anything - perhaps because of the zillions of hits for extracting
#include dependencies for make from C++ source.


Any thoughts? What else is needed to make this useful? For example,
what's the best way to tag objects with the equivalent of make's timestamps?


Regards,

Phil.
 
A

alan

Dear Experts,

Say I have an application with various data inputs (files, network
connections, peripherals etc), various data outputs (files, display
etc.) and various intermediate internal data objects. I would like
changes in the inputs to propogate through to the outputs. In my mind,
I'm imagining the dependencies between the objects like "make" dependencies.

There are various ways of doing this, including "push" and "pull"
methods, with various pros and cons. Is anyone aware of any literature
("patterns") describing something like this?

One thought that has occured to me is that C++ has built-in compile-time
dependency graph logic in the multiple inheritance constructors. So,
say I have these objects:

DataSource src;
InternalObj obj1;
InternalObj obj2;
Display disp;

If obj1 and obj2 get their data from the DataSrc, and the Display gets
its data from obj1 and obj2 (i.e. a diamond-shape dependency graph), and
each object has an "update" method, then I can write:

struct make_src {
make_src() { src.update(); }

};

struct make_obj1: make_src {
make_obj1() { obj1.update(); }

};

struct make_obj2: make_src {
make_obj2() { obj2.update(); }

};

struct make_disp: virtual make_obj1, virtual make_obj2 {
make_disp { disp.update(); }

};

Now, when I want to update the display I can just create a temporary
make_disp object, and all of the update() methods will be called in the
right order. (I think. I'm a bit rusty on what the virtual base
classes do. I hope that src.update() gets called only once.)

I bet I'm not the first one to think of this, but google doesn't find
anything - perhaps because of the zillions of hits for extracting
#include dependencies for make from C++ source.

Any thoughts? What else is needed to make this useful? For example,
what's the best way to tag objects with the equivalent of make's timestamps?
Use a cache with a "dirty" bit to indicate that the cache's contents
are no longer valid. Each object's update() method then checks if the
cache is clean, and if so returns the data in the cache. Otherwise it
calls the data sources' update() functions and performs the
computation for that node, then stores the result in the cache and
cleans the cache.

The sourcemost data nodes, of course, will probably have to overload
the operator= or whatever equivalent to dirty itself. Each data node
would then have a dirty() method which not only dirties the cache but
also invokes dirty() for each data sink connected to it (as an
optimization, of course, the if the data node sees it is itself dirty,
then it doesn't have to propagate the call to each data sink).

However this requires pointers both from the data sink to the data
source (so that the sink knows who to query) and a corresponding
pointer from the source to the sink (so that if the source becomes
dirty, it will also dirty each sink). You probably have to implement
your own smart pointer here, so that it will update both the source
and the sink objects whenever the sink disconnects from the source
(assuming, of course, that you may need to change the connection at
runtime).
 
P

Phil Endecott

Use a cache with a "dirty" bit to indicate that the cache's contents
are no longer valid. Each object's update() method then checks if the
cache is clean, and if so returns the data in the cache. Otherwise it
calls the data sources' update() functions and performs the
computation for that node, then stores the result in the cache and
cleans the cache.

I don't think that single bits are sufficient in general, e.g. if B
depends on A and C also depends on A, and A changes, and I ask to
re-make B but not C, what happens? I need to record that C is
out-of-date wrt A but B is up-to-date wrt A.

Single timestamps (like make) also don't work in the case where C is
generated from B and B is generated from A, but some changes to A don't
influence B (e.g. B is a sub-set of A). In this case, when I re-make C,
I first re-make B from A; but if B doesn't change I then don't need to
update C. So I need to record that B is update-to-date with the latest
version of A, but has not changed since some previous time. Can this be
solved with just a pair of timestamps, one for "last changed" and one
for "last updated"?

(By timestamp, I really mean some sort of generation number.)
The sourcemost data nodes, of course, will probably have to overload
the operator= or whatever equivalent to dirty itself.

Right, there's a whole issue there about modifying an object to know
when it changes. I've worried about this for implementing copy-on-write
semantics. Most objects have more than just operator= that change them.
Each data node
would then have a dirty() method which not only dirties the cache but
also invokes dirty() for each data sink connected to it (as an
optimization, of course, the if the data node sees it is itself dirty,
then it doesn't have to propagate the call to each data sink).

However this requires pointers both from the data sink to the data
source (so that the sink knows who to query) and a corresponding
pointer from the source to the sink (so that if the source becomes
dirty, it will also dirty each sink).

One interesting feature of the structs that I describe above is that
they're non-intrusive of the objects whose updates they control, which
seems like a nice feature. It's also entirely compile-time.

I'd love to hear some feedback about that idea, by the way.


Phil.
 
S

Sohail Somani

There are various ways of doing this, including "push" and "pull"
methods, with various pros and cons. Is anyone aware of any literature
("patterns") describing something like this?

The observer pattern may align with your push method concept. The only
problem then is that all your calculations may happen at once. You could
alternatively only set a flag that a given node needs to be recalculated.

So say you have objects that describe:

A = B + C

If B changes, A is notified of the change, but A does not recalculate the
expression B + C until A is asked for it's value (at which point, it
signals similarly.)

Hope that makes sense. I think this is an interesting but not difficult
problem and I think I will be solving a similar one soon.
 
K

Kaz Kylheku

I don't think that single bits are sufficient in general, e.g. if B
depends on A and C also depends on A, and A changes, and I ask to
re-make B but not C, what happens? I need to record that C is
out-of-date wrt A but B is up-to-date wrt A.

Single timestamps (like make) also don't work in the case where C is
generated from B and B is generated from A, but some changes to A don't
influence B (e.g. B is a sub-set of A).

Timestamps, for tracking dependency updates, are garbage. The problem
is that clocks in the real world can be wrong. Have you ever seen the
message "clock skew detected, your build may be incomplete" from make?

The virtue of timestamps is that they are a /cheap/ way to detect that
something has changed. However, it's best to compare them for exact
equality. That is to say, have a cached copy of the original
timestamps somewhere. If the object's timestamp is different from the
cached copy, then suspect it has been changed. That is to say, do not
compare the timestamp of a prerequisite object A and its dependent B
to determine that B is out of date with respect to A, but rather
detect that A has changed because its timestamp no longer matches the
cached copy.

That still leaves the problem of what happens in a partial update.
Once you update the cached timestamp of A to reflect the actual one, A
looks up-to-date. If only B was updated because of this, but C wasn't,
that's a problem.

The solution to this is to cache A's timestamp multiple times:
replicate it in every object that depends on A, as an attribute of
that dependency relationship.

The update is triggered if an object has a dependency whose timestamp
doesn't match the prerequisite object. When the update of an object
completes, all of the timestamps in its dependency relationships are
made up-to-date.

A further refinement is to use hashes in addition to timestamps.

Use a timestamp as a hint to determine that an object has been
changed. If an object's actual timestamp doesn't match its cached
timestamp, then recompute and cache that object's hash, and bring its
cached timestamp up to date.

The dependency relationships store just the hash for every dependency.
If B depends on A, then you rebuild B if A's hash has changed compared
to the last hash that B remembers about A.

Of course, hashes, even good ones have potential problems with
collisions, however unlikely.

For the case that A depends on only certain relevant parts of B, you
could have a hashing function which is specific to the A -> B
dependency. What B remembers about A is not a total hash affected by
the entire state of A, but a hash which covers just the parts that are
relevant to B.

The ccache program does something like this: it has a mode that can
ignore trivial whitespace differences. If you make a formatting change
to a header file, it won't change the hash code for the compilation
units which #include it, and so the cached object files for those
units are still valid.
 
T

terminator

Dear Experts,

Say I have an application with various data inputs (files, network
connections, peripherals etc), various data outputs (files, display
etc.) and various intermediate internal data objects. I would like
changes in the inputs to propogate through to the outputs. In my mind,
I'm imagining the dependencies between the objects like "make" dependencies.

There are various ways of doing this, including "push" and "pull"
methods, with various pros and cons. Is anyone aware of any literature
("patterns") describing something like this?

One thought that has occured to me is that C++ has built-in compile-time
dependency graph logic in the multiple inheritance constructors. So,
say I have these objects:

DataSource src;
InternalObj obj1;
InternalObj obj2;
Display disp;

If obj1 and obj2 get their data from the DataSrc, and the Display gets
its data from obj1 and obj2 (i.e. a diamond-shape dependency graph), and
each object has an "update" method, then I can write:

struct make_src {
make_src() { src.update(); }

};

struct make_obj1: make_src {
make_obj1() { obj1.update(); }

};

struct make_obj2: make_src {
make_obj2() { obj2.update(); }

};

struct make_disp: virtual make_obj1, virtual make_obj2 {
make_disp { disp.update(); }

};

Now, when I want to update the display I can just create a temporary
make_disp object, and all of the update() methods will be called in the
right order. (I think. I'm a bit rusty on what the virtual base
classes do. I hope that src.update() gets called only once.)

I did not get the purpose but if you need virtual inheritance you
should:

struct make_src {
make_src() { src.update(); }



};


struct make_obj1: virtual make_src { //virtual base
make_obj1() { obj1.update(); }


};


struct make_obj2: virtual make_src { //virtual base
make_obj2() { obj2.update(); }


};


struct make_disp: make_obj1, make_obj2 {/*now make_disp has only one
subobject of type make_src*/
make_disp { disp.update(); }


};

regards,
FM.
 
J

James Kanze

I don't think that single bits are sufficient in general, e.g.
if B depends on A and C also depends on A, and A changes, and
I ask to re-make B but not C, what happens? I need to record
that C is out-of-date wrt A but B is up-to-date wrt A.
Single timestamps (like make) also don't work in the case
where C is generated from B and B is generated from A, but
some changes to A don't influence B (e.g. B is a sub-set of
A).

That's the classical problem of the granularity of make. Update
a comment in a header, and watch the entire system get
rebuilt:). You can use a variant of the classical solution
used in makefiles: do a diff on the object with its old value
before updating its timestamp. Thus, B knows whether the
changes to A have modified it or not; it only updates its last
modified timestamp if they have. (In makefiles, this is
classically used when a program generates both a header and a
source file, and the header changes very rarely. The rules in
make generate the header locally, then do something like:
cmp local.h target.h || cp local.h target.h
. Other rules only look at the timestamp for target.h.)
In this case, when I re-make C, I first re-make B from A; but
if B doesn't change I then don't need to update C.

Nor update the timestamp associated with B.
So I need to record that B is update-to-date with the latest
version of A, but has not changed since some previous time.
Can this be solved with just a pair of timestamps, one for
"last changed" and one for "last updated"?

Do you even need the last updated?

[...]
One interesting feature of the structs that I describe above
is that they're non-intrusive of the objects whose updates
they control, which seems like a nice feature. It's also
entirely compile-time.
I'd love to hear some feedback about that idea, by the way.

It's interesting, since it involves letting the compiler manage
the DAG, rather than doing it manually at run-time. On the
other hand, I'm not too sure about its readability. It looks a
little too subtle to me. You'd need a lot of comments, at any
rate.
 
P

Phil Endecott

James said:
That's the classical problem of the granularity of make. Update
a comment in a header, and watch the entire system get
rebuilt:). You can use a variant of the classical solution
used in makefiles: do a diff on the object with its old value
before updating its timestamp. Thus, B knows whether the
changes to A have modified it or not; it only updates its last
modified timestamp if they have. (In makefiles, this is
classically used when a program generates both a header and a
source file, and the header changes very rarely. The rules in
make generate the header locally, then do something like:
cmp local.h target.h || cp local.h target.h
. Other rules only look at the timestamp for target.h.)


Nor update the timestamp associated with B.


Do you even need the last updated?

You need something so that if A has not changed when you next make C,
then B does not need to be re-made. If you haven't recorded B's "last
updated" timestamp, then you don't know whether you looked for any
changes resulting from the last change to A. (I have seen the same
thing with makefiles that use the move-if-changed idea that you describe.)
It's interesting, since it involves letting the compiler manage
the DAG, rather than doing it manually at run-time. On the
other hand, I'm not too sure about its readability. It looks a
little too subtle to me. You'd need a lot of comments, at any
rate.

Yes, it would certainly need a couple of paragraphs of introduction.


Phil.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,048
Latest member
verona

Latest Threads

Top