dependency-detection in java - Take 2

Andreas Leitgeb

Roedy Green said:
... It looks for the string "public static final".

It's not only the static finals that need to be taken care of.
They are special only insofar as they triggered this thread
in the first place, and in that they aren't "documented" in
the using class's .class file, unlike all other dependencies.
If it sees it it redoes a clean compile, if not, incremental.

The "conditional" clean compile seems to be the best one can get.
However, the "condition" still needs some fleshing out. I'll probably
start a new thread for that.
This is different from doing the compiles at the client machines since
it is triggered as soon as possible.

Anyway, the concept of central compilation and class-file distribution
is entirely orthogonal to the question of which class-files are actually
regenerated during a particular build-run on a particular machine.

I understand your point to be: the time saved by central full-compilation
is still more than by any intelligent-incremental build on the developer
machine...
Our developer machines are actually almost dumb terminals (with some
X11-server/emulation) to the central Sun workstation, and most of the
compilation happens there *before* checkin, so we avoid checking in
broken sources.
 
Andreas Leitgeb

Mike Schilling said:
I've thought about this a bit, though not to the point of creating a design,
much less building prototypes. It seems to me that this approach is worth
investigating:

1. The interface of each class C in the system needs to be captured and
stored persistently, where "interface" means method signatures, field
definitions, and constant values. Superclass name too, to cover the changes
that can occur if what C inherits changes.

Yes, that's a good start.
2. Dependencies also need to be captured and stored persistently. This will
be information of the form:
. Class D depends on (some feature of) the interface of class C

While I originally thought in that direction, I now have the
impression that this is really too difficult. After all, there
are a couple of Java features that would need to be taken care
of (inheritance, compile-time resolving of fields, static methods
and the choice among overloaded methods).

As long as none of the recently changed classes changes its interface
in an incompatible way(*), the current normal incremental build
suffices.

If any class (well, except for private nested or anonymous ones)
changes its interface such that it is possible for dependent classes
to notice the difference, then a full-rebuild is worth it.

My original point of avoiding full-builds is still fulfilled, because
the full rebuild wouldn't be necessary *every time*! At the outset,
the problem was that one either needs to *always* do full-rebuilds,
or risk inconsistencies. Now, it's about finding a reliable indicator
that still doesn't fire too often.
A. How granular should the dependency information be?

I'd already be happy if any .class file's change could be reliably
characterized as incompatible(*) or not.
It wouldn't be a question of which java-file changed, but rather:
Was any .class file changed incompatibly(*) during a normal incremental
build?
B. How to generate the dependency information. I presume it can be
calculated from .class file analysis

No, not possible (unless javac was modified), because usage of static
finals is not explicitly "mentioned" in the .class file.
Inheritance adds some wrinkles. If Sub overrides a method it inherits from
Super, that doesn't really change its interface. Classes which previously
called Sub.meth() don't have to be recompiled.

Unless it's a static method :-/ ... where dependent classes might continue
to call Super's version, even if they refer to Sub.meth().
The same goes with fields and overloaded (even non-static) methods.

(*): Chapter 13 of the JLS-3.0 (Java Language Specification 3rd Edition)
mentions (among other allowed changes):
* Adding new fields, methods, or constructors to an existing class or interface.
as a binary compatible change, but it isn't always compatible in our sense.
Our rules for compatibility are stricter, in that they demand
not only linkability, but also "equivalence in behaviour whether
or not a dependent class is recompiled as well".
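
Editorial illustration: the gap between "binary compatible" and "compatible in our sense" shows up with overloads, because javac chooses the overload at compile time and writes the chosen signature into the caller's .class file. The sketch below is only an approximation of the scenario; `LibV1` and `LibV2` are hypothetical names standing in for two versions of the same library class, and the two caller methods stand in for the same source line compiled against each version.

```java
class OverloadDemo {
    // Version 1 of a library class: a single method accepting Object.
    static class LibV1 {
        static String describe(Object o) { return "describe(Object)"; }
    }

    // Version 2 adds an overload; JLS chapter 13 counts this as binary compatible.
    static class LibV2 {
        static String describe(Object o) { return "describe(Object)"; }
        static String describe(String s) { return "describe(String)"; }
    }

    // A caller compiled against version 1 has describe(Object) fixed in its bytecode.
    static String callerBuiltAgainstV1() { return LibV1.describe("hello"); }

    // The same source line compiled against version 2 selects the new overload.
    static String callerBuiltAgainstV2() { return LibV2.describe("hello"); }

    public static void main(String[] args) {
        System.out.println(callerBuiltAgainstV1()); // describe(Object)
        System.out.println(callerBuiltAgainstV2()); // describe(String)
    }
}
```

A dependent that is not recompiled keeps linking and keeps calling the old overload, so its behaviour differs from that of a recompiled dependent, which is exactly the case the stricter rule is meant to catch.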
I'm sure there are many more of these which further analysis would reveal.
I also fear so ... but once I leave out the problem of listing all
dependents, I think that what remains should be possible and still useful.
One more note: this is an ideal open source project, since it could be
greatly useful to the development community and there is no money to be made
by solving it.

I wouldn't say so. Making builds reliable without resorting to always
doing full rebuilds might save some costs. However, I'm a fan of not
only using open-source, but also contributing to it ...
 
Michael Jung

Andreas Leitgeb said:
Michael Jung said:
Andreas Leitgeb said:
Is this what you want: every time a file A changes, all dependent files (B)
should be recompiled automatically?
For some definition of "automatically", yes :)
I do *not* expect javac to handle reverse-dependencies (B) as it does
forward-dependencies. That would be a bad thing.
What I want instead is some team-work of ant and javac.
Example 1: ant passes all files (of the codebase) to javac (plus
some new option) and javac will first find all the changed ones,
and then all the type-"B" ones among the others and compile those
as well, but not those unrelated to all regenerated ones.
[...]
Sticking to this: This would require ant/javac to walk through all of the
codebase and then through all of the imports.
"Walk through all the codebase" ... this sounds quite expensive, but
actually this happens with every incremental build already: ant checks
every .java-file in the codebase, whether it's newer than its .class
file.

No, it only checks the file that is scheduled for compiling. That's "just" a tree
out of your graph.
The advantage of javac doing the reverse-dependencies itself could be
that it could trigger those recompilations only if the depended-on
source has not just changed, but has also changed its interface.
That would mean: a central java-class adding a new method/field or
changing only implementation could even skip the reverse-dependency-
handling. If, however, some class changed its interface (like changing
a static final, adding abstract methods, or removing non-abstract ones),
then anything else than following reverse-dependencies leaves an
inconsistent state among your .class-files.

How would you do that? A would be the target of javac, right? Then see "walk
through the whole codebase".
Perhaps following reverse-dependencies isn't the only solution.
Having a way to detect binary-incompatible interface-changes
in any of the recompiled classes (during the normal incremental
build, like ant already supports) would let me know when to start
a full-compile. This might even be enough.

That is still the same thing. You must walk the whole codebase.
Actually, now that I think of that, this could even be done
without any enhancements on javac.

I'm pretty sure it must be done without javac, because javac works only
"downward". Complete "upward" walking is only possible when everything is
available, generally at runtime.
I never made any claims that a build should be auto-started on
file-change. It's quite comfortable for the guy with the big machine
(to whom a background compile is hardly noticeable). The other guys rather
turn off auto-build-on-file-save (as well as auto-build-on-key-typed :)

The guy with the big machine can spare a few moments for a rebuild of a
subproject with a change in constants. Even IDL-constants shouldn't be spread
about the whole project. You should know how far they carry.
That's what I feared since the start of this discussion. Is
dependency-analysis necessarily more (or almost as) expensive
than a full compile?

I'm pretty sure it's of the same order with regards to code base size in
general. What the factor between them is, will depend on your needs.

There is a way to circumvent that by keeping the reverse-dependencies in some
database, which you update as B's are checked in and which you query when A's
are checked in.

Michael
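
Editorial illustration of the reverse-dependency database suggested above, as a minimal in-memory sketch. The class and method names (`ReverseDependencyIndex`, `record`, `affectedBy`) are hypothetical, and how the dependency edges are obtained in the first place is deliberately left open, since the thread notes that references to static finals cannot be recovered from the .class file. The query follows the edges transitively, so a class that depends on B is reported when A changes and B depends on A.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class ReverseDependencyIndex {
    // Maps a class name to the classes that directly depend on it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    // Called when `dependent` is checked in: adds one reverse edge per dependency.
    void record(String dependent, String... dependsOn) {
        for (String target : dependsOn) {
            dependents.computeIfAbsent(target, k -> new TreeSet<>()).add(dependent);
        }
    }

    // Called when `changed` is checked in: collects every class reachable
    // over reverse edges, directly or through other dependents.
    SortedSet<String> affectedBy(String changed) {
        SortedSet<String> result = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(changed);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String d : dependents.getOrDefault(current, Collections.emptySet())) {
                if (result.add(d)) {
                    queue.add(d); // visit each dependent only once
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ReverseDependencyIndex index = new ReverseDependencyIndex();
        index.record("B", "A");
        index.record("C", "B");
        index.record("D", "X");
        System.out.println(index.affectedBy("A")); // [B, C]
    }
}
```

A persistent version would store the same map in a file or database and would also need to drop stale edges when a class stops depending on another; that bookkeeping is omitted here.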
 
Mike Schilling

Andreas Leitgeb said:
No, not possible (unless javac was modified), because usage of static
finals is not explicitly "mentioned" in the .class file.

I didn't quite believe this, so I created an example, got out my handy class
file analyzer and found that you're completely correct. Even compiling with
debugging information, there is no information put in the class file about
which constants were referenced, or even which classes constants were
referenced from.
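
Editorial illustration: the inlining can also be observed without a class file analyzer. Because javac copies a compile-time constant into the using class, reading it does not even initialize the class that declares it. The sketch below relies on that side effect; all names in it (`ConstantInlineDemo`, `Inlined`, `NotInlined`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantInlineDemo {
    // Records which holder classes have run their static initializer.
    static final List<String> INITIALIZED = new ArrayList<>();

    static class Inlined {
        static { INITIALIZED.add("Inlined"); }
        // Compile-time constant: javac copies the value 10 into every user.
        static final int MAX = 10;
    }

    static class NotInlined {
        static { INITIALIZED.add("NotInlined"); }
        // Not a constant expression, so users keep a real field reference.
        static final int MAX = Integer.parseInt("10");
    }

    static int readInlined() { return Inlined.MAX; }

    static int readNotInlined() { return NotInlined.MAX; }

    public static void main(String[] args) {
        System.out.println(readInlined() + " " + INITIALIZED);    // 10 []
        System.out.println(readNotInlined() + " " + INITIALIZED); // 10 [NotInlined]
    }
}
```

The first read leaves no trace of `Inlined` at run time, which matches the observation that the using class carries no reference to where the constant came from.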
Making builds reliable without resorting to always
doing full rebuilds might save some costs.

As someone who was in the software tools business for years, I feel
confident in predicting that you wouldn't find enough people that would pay
for it to recover your development costs.
 
Andreas Leitgeb

Mike Schilling said:
I didn't quite believe this, so I created an example, got out my handy class
file analyzer ...

I'm curious as to which tool you use for this task. The way you said that
("got out my ...") seems to me to indicate that you've also made your own...

I have written one myself (in Tcl, not in Java, so I do the parsing completely
myself), because I didn't like javap hiding away private fields and methods.
Unfortunately the user-interface of my script is still somewhat cryptic (not
yet good enough for prime time).

PS: this is not meant as a general question about such tools. Meanwhile I know
some already, but back then, when I wrote my own, all I knew then was javap.
 
Mike Schilling

Andreas Leitgeb said:
I'm curious as to which tool you use for this task. The way you said that
("got out my ...") seems to me to indicate that you've also made your
own...

I did. It's rudimentary so far, just parses out and prints all of the bits
including the constant pool entries. When I have time I'll flesh it out to a
tool that helps determine class dependencies, so that I can safely
re-organize a large existing code base.
 
Andreas Leitgeb

Michael Jung said:
I'm pretty sure it's of the same order with regards to code base size in
general. What the factor between them is, will depend on your needs.
There is a way to circumvent that by keeping the reverse-dependencies in some
database, which you update as B's are checked in and which you query when A's
are checked in.

The current approach is a bit more reluctant. I'd just like to
determine "relevant" changes between each new .class file and its
previous version (stored in some database or file).

My current focus would be finding out what changes in a class are
relevant, and which are not. Based on the existence of relevant
changes, a full-compile of the project would be triggered.

What is "relevant"? The definition would be: any change in
a class A for which a class B exists in the project's codebase
that, compiled once with the old and once with the new version of
A.java in place, would result in different generated .class files
(not only B.class!).
This definition is not yet cast in stone, since even changes
in the classes generated from such B.java could also be
subject to qualification as effective or non-effective.

But how do we practically determine "relevant"ness?
I think we can only carefully *approach* it for now.

As a very first step, any change in the class' interface could be
considered relevant, which would save full-compiles when only
method-implementations or private members were changed.
(Note that private nested classes are classes on their own, so
adding one, or even removing one is irrelevant, and for changing
one, the same rules apply on that private class itself)
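
Editorial illustration of this first step, as a rough sketch only. It uses reflection on loaded classes for brevity, whereas a real tool would more likely parse the old and new .class files directly; constructors and implemented interfaces are left out, and "constant value" is approximated as the value of any static final primitive or String field. The names `InterfaceSnapshot`, `ConfigV1`, `ConfigV1b` and `ConfigV2` are hypothetical; the three `Config` classes stand in for successive versions of one class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

class InterfaceSnapshot {
    // Modifier bits treated as part of the interface in this sketch.
    private static final int MASK = Modifier.PUBLIC | Modifier.PROTECTED
            | Modifier.STATIC | Modifier.FINAL | Modifier.ABSTRACT;

    // Collects the superclass plus all non-private fields and methods.
    static SortedSet<String> of(Class<?> c) {
        SortedSet<String> out = new TreeSet<>();
        out.add("extends " + c.getSuperclass());
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isPrivate(f.getModifiers()) || f.isSynthetic()) continue;
            out.add("field " + Modifier.toString(f.getModifiers() & MASK) + " "
                    + f.getType().getName() + " " + f.getName() + constantValue(f));
        }
        for (Method m : c.getDeclaredMethods()) {
            if (Modifier.isPrivate(m.getModifiers()) || m.isSynthetic()) continue;
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getName).collect(Collectors.joining(","));
            out.add("method " + Modifier.toString(m.getModifiers() & MASK) + " "
                    + m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
        }
        return out;
    }

    // Approximation: the value of static final primitive or String fields.
    private static String constantValue(Field f) {
        int mods = f.getModifiers();
        boolean candidate = Modifier.isStatic(mods) && Modifier.isFinal(mods)
                && (f.getType().isPrimitive() || f.getType() == String.class);
        if (!candidate) return "";
        try {
            f.setAccessible(true);
            return " = " + f.get(null);
        } catch (IllegalAccessException | RuntimeException e) {
            return " = ?"; // value unavailable; only name and type are compared
        }
    }

    public static void main(String[] args) {
        System.out.println(of(ConfigV1.class).equals(of(ConfigV1b.class))); // true
        System.out.println(of(ConfigV1.class).equals(of(ConfigV2.class)));  // false
    }
}

class ConfigV1 {
    public static final int TIMEOUT = 30;
    private int cache;
    public int timeout() { return TIMEOUT; }
}

// Same interface as ConfigV1: only a method body and private members differ.
class ConfigV1b {
    public static final int TIMEOUT = 30;
    private long cache, hits;
    public int timeout() { return TIMEOUT + 0; }
}

// Interface change: the constant's value differs and a static method was added.
class ConfigV2 {
    public static final int TIMEOUT = 60;
    public int timeout() { return TIMEOUT; }
    public static int defaultTimeout() { return TIMEOUT; }
}
```

Comparing the stored snapshot with the fresh one as plain sets gives the coarse "any interface change is relevant" rule; the refinements discussed next would be filters applied to the set difference.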

For the next step, adding a new non-static method (whose name wasn't
yet used in that class, not even with a different signature), or certain
types of private methods or data could be considered harmless.

Also, as long as no variadic methods exist yet, even overloaded method
names with a previously nonexistent arity should be harmless. (There is
no way for the compiler to pick these instead of the existing ones in
the course of recompiling any dependent class.)

Any other interface changes, that are provably irrelevant to
recompiling depending classes?
 
Michael Jung

Andreas Leitgeb said:
The current approach is a bit more reluctant. I'd just like to
determine "relevant" changes between each new .class file and its
previous version (stored in some database or file)

I'd try to keep it simple. If you try to restrict dependency too much, you
might need a compile to determine dependency and that would gain you nothing.
Go through imports. Also beware of successive dependency, e.g. classes B that
inherit from classes that inherit from A.

Michael
 
Andreas Leitgeb

I'd try to keep it simple.
That's indeed my intention.
If you try to restrict dependency too much, you
might need a compile to determine dependency and that would gain you nothing.
The point is: I've given up the task of finding reverse-dependencies
altogether, since I now think it's really impossible to get right.

I just scan the tree for class-files which are newer than they were at
the last scan, and for each new class file I compare its current interface
with the stored old one, and if even one differs, I raise a flag that
suggests/triggers a full compile.
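
Editorial illustration of the scanning half of this approach, as a minimal sketch. It only finds the .class files modified since the previous scan; the interface comparison would then run on each result. `ChangedClassScanner`, `changedSince` and `demo` are hypothetical names, and the timestamps in `demo` are arbitrary values chosen to make the outcome predictable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ChangedClassScanner {
    // Returns all .class files under root that were modified after lastScan.
    static List<Path> changedSince(Path root, FileTime lastScan) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".class"))
                    .filter(p -> lastModified(p).compareTo(lastScan) > 0)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static FileTime lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway tree with one stale and one fresh class file
    // and returns the file names reported as changed.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("classes");
            Path oldFile = Files.createFile(root.resolve("Old.class"));
            Path newFile = Files.createFile(root.resolve("New.class"));
            FileTime lastScan = FileTime.fromMillis(1_000_000_000_000L);
            Files.setLastModifiedTime(oldFile, FileTime.fromMillis(999_999_000_000L));
            Files.setLastModifiedTime(newFile, FileTime.fromMillis(1_000_001_000_000L));
            return changedSince(root, lastScan).stream()
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [New.class]
    }
}
```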

E.g.: I see a class with a static method which it didn't have before,
then I know, that in principle there *could* exist some other class,
which *might* be affected in some way, which is enough to say: "Hey
developer! better recompile all!"

The bonus is, that if none of these relevant changes happened, the
developer *knows* that the previous incremental build was safe.

PS: there are still more caveats to my approach, e.g. if a build
runs into errors, some class-files might be already new,
whereas others are still old, and if this mixed state gets fed
into the interface-database, it turns into garbage.
 
