Incremental Java Compile

Joshua Maurice

I'm back. I've been learning a lot over the last couple of months,
toying around with solutions, and realizing the inadequacies of each
approach I've tried.

From a high level perspective, I realized what I want from a build
system: A developer should be able to do any combination of the
following actions and trigger a small / minimal / incremental build,
and it should be correct without any corner cases of incorrectness.
1- Add, remove, and/or edit a source file, such as a Java file, cpp
file, etc.
2- Add, remove, and/or edit a build system script file to invoke a
standard rule, such as adding a new jar to the build, modifying a
classpath of an existing jar, removing a jar from the build, etc.

Specifically, some actions would require a full clean build:
1- If the logic which tracks incremental dependencies is changed, then
the developer must do a full clean.
2- If any other tool changes, such as the Javac version, then the
developer must do a full clean.
These are akin to modifying the build system itself. That I understand
has very hard "backwards compatibility" issues. That's outside the
scope of what I want. However, the aforementioned activities of
messing with source files, and invoking build system macros / rules to
create new standard binaries should "just work", and it should
"just work" quickly.

So, I've been trying to do this for Java. Man this is actually quite
hard, harder the more I learn. I think I finally "struck gold" when I
found this wonderful paper here:
http://www.jot.fm/issues/issue_2004_12/article4.pdf
For build systems focusing on Java, I rank it as just as important as
Recursive Make Considered Harmful.

However, the paper assumes that the build will "cascade", or recompile
everything downstream. I am trying very hard to avoid this if
possible, to get a much smaller rebuild without writing my own Java
compiler ala Eclipse. I think my current solution in my head will
work, using a combination of
1- Ghost Dependencies
2- Each class file depends on the last not-no-op build of all
previously used class files from the last build.
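
One possible reading of item 2, as a rough Java sketch: remember a digest of each class file's bytes, and only advance a class's "last changed" build number when a recompile actually produces different bytes. A dependent is then stale only if something it used changed after the dependent was last built. The class name `LastChangeTracker`, its methods, and the use of integer build numbers are all hypothetical, chosen for illustration only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping; not taken from any real build tool.
class LastChangeTracker {
    private final Map<String, byte[]> digests = new HashMap<>();
    private final Map<String, Integer> lastChangedBuild = new HashMap<>();

    /** Records a compile output; returns true if the bytes differ from the previous output. */
    boolean recordOutput(String className, byte[] classBytes, int buildNumber) {
        byte[] digest = sha256(classBytes);
        byte[] previous = digests.put(className, digest);
        if (previous != null && MessageDigest.isEqual(previous, digest)) {
            return false; // no-op rebuild: keep the old "last changed" build number
        }
        lastChangedBuild.put(className, buildNumber);
        return true;
    }

    /** A dependent built in build N is stale if any class it used changed after N. */
    boolean isStale(int dependentBuiltIn, Collection<String> usedClasses) {
        for (String used : usedClasses) {
            Integer changed = lastChangedBuild.get(used);
            if (changed != null && changed > dependentBuiltIn) {
                return true;
            }
        }
        return false;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LastChangeTracker tracker = new LastChangeTracker();
        tracker.recordOutput("A", new byte[] {1, 2, 3}, 1);
        tracker.recordOutput("A", new byte[] {1, 2, 3}, 2); // identical bytes, so a no-op
        System.out.println(tracker.isStale(1, Collections.singletonList("A"))); // false
    }
}
```

The point of the design is that a recompile which yields byte-identical output does not cascade to downstream classes.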

I finally finished an implementation of part 1. Part 2 is much easier
if I rely on Javac's verbose output, but to do that I need to do a
compile up front, passing all the out of date java files, and then a
separate compile per java file to get useful information from Javac's
output, to know exactly which java file caused each class to be loaded.
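
A minimal sketch of the "rely on Javac's verbose output" idea, assuming the captured `javac -verbose` output contains lines that begin with `[loading ` and end with `]`. The text between those markers is not a documented format and differs between JDK versions, so this only extracts it as an opaque string. The class name `VerboseLoadParser` and the sample lines in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: picks the "[loading ...]" entries out of captured javac -verbose output.
class VerboseLoadParser {
    private static final String PREFIX = "[loading ";

    static List<String> loadedEntries(List<String> verboseLines) {
        List<String> loaded = new ArrayList<>();
        for (String line : verboseLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX) && trimmed.endsWith("]")) {
                // Keep whatever javac printed between the prefix and the final bracket.
                loaded.add(trimmed.substring(PREFIX.length(), trimmed.length() - 1));
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Illustrative input only; real output depends on the JDK in use.
        List<String> sample = Arrays.asList(
                "[parsing started Foo.java]",
                "[loading /modules/java.base/java/lang/Object.class]",
                "[wrote Foo.class]");
        System.out.println(loadedEntries(sample));
    }
}
```

This yields the set of classes loaded by one javac invocation as a whole. It does not say which source file in that invocation needed which class, which is why the one-compile-per-file step comes up above.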

So, I post here because I feel better prepared to discuss this
subject. I still disagree that "build from clean" is the correct
answer. That would make our product's build still around ~25 minutes
for just the Java compilation of around ~20,000 source files (and
growing). There must / should be something better. Separate
translation units make so much sense. I just wish Java had them.

And yes, we're also working on "componentizing" to some degree, but
when all of the components are under active development, I would still
very much like builds to be as fast as possible to do regular
integration tests on an automated build machine.

So, does anyone know a quick and easy way to get the list of class
files loaded during a compile, and to know exactly which subset of the
java files in the compile needs each class file? Invoking a separate
Javac after the fact (using tools.jar analyze API) almost quadruples my
overall from-clean build time for a subset of real code in my
company's codebase.
 
Lew

Joshua said:
So, I post here because I feel better prepared to discuss this
subject. I still disagree that "build from clean" is the correct
answer. That would make our product's build still around ~25 minutes
for just the Java compilation of around ~20,000 source files (and
growing). There must / should be something better. Separate
translation units make so much sense. I just wish Java had them.

What, you never heard of JAR files?

There's no excuse for "build clean" having to touch all 20K files.
 
Mike Schilling

markspace said:
Ant can do most kinds of Java source dependencies for you:

http://ant.apache.org/manual/OptionalTasks/depend.html

"The most obvious example of these limitations is that the task can't tell
which classes to recompile when a constant primitive data type exported by
other classes is changed. For example, a change in the definition of
something like
public final class Constants {
public final static boolean DEBUG=false;
}

will not be picked up by other classes. "

That is, it's an incremental (no pun intended) improvement on the usual Ant
algorithm of "recompile what obviously needs recompilation; if that doesn't
seem to work, do a clean build".
 
Lew

Mike said:
"The most obvious example of these limitations is that the task can't tell
which classes to recompile when a constant primitive data type exported by
other classes is changed. For example, a change in the definition of
something like
public final class Constants {
public final static boolean DEBUG=false;
}

will not be picked up by other classes. "

That is, it's an incremental (no pun intended) improvement on the usual Ant
algorithm of "recompile what obviously needs recompilation; if that doesn't
seem to work, do a clean build".

You can't blame Ant for that one. The class that depends on the compile-time
constant, such as 'DEBUG' in your example, compiles the constant into its
class, not the symbol. Without some external indication of the dependency,
there's not any way for a compiler or build tool to detect that it exists.

With respect to dependencies where the symbol is stored in the class rather
than its value, even 'javac' handles the situation pretty well.
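
A small self-contained illustration of the two cases, relying on the rule that reading a compile-time constant does not initialize the class that declares it, while reading a non-constant static field does. The class and field names here are invented for the demo. `Inlined.DEBUG` is a constant expression, so its value is compiled into the caller. `Referenced.DEBUG` is not, so the caller keeps a symbolic reference to the field.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantInliningDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Inlined {
        static final boolean DEBUG = false; // constant expression: value is copied into users
        static { LOG.add("Inlined initialized"); }
    }

    static class Referenced {
        static final boolean DEBUG = Boolean.parseBoolean("false"); // not a constant expression
        static { LOG.add("Referenced initialized"); }
    }

    static List<String> run() {
        boolean a = Inlined.DEBUG;    // no field access at run time, Inlined is never initialized
        boolean b = Referenced.DEBUG; // real field access, triggers Referenced's initializer
        if (a || b) {
            LOG.add("debugging enabled");
        }
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [Referenced initialized]
    }
}
```

Because the user of `Inlined.DEBUG` carries no trace of where the value came from, a tool that only inspects class files cannot see that dependency.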
 
Mike Schilling

Lew said:
You can't blame Ant for that one.

True; it's the design of Java that doesn't lend itself to calculating
dependencies with a reasonable amount of effort.

Lew said:
The class that depends on the
compile-time constant, such as 'DEBUG' in your example, compiles the
constant into its class, not the symbol. Without some external
indication of the dependency, there's not any way for a compiler or
build tool to detect that it exists.

Other than by noting it when the symbol is used during compilation, and
storing that bit of information somewhere. But the details of that get
messy.

Lew said:
With respect to dependencies where the symbol is stored in the class
rather than its value, even 'javac' handles the situation pretty well.

There are other difficult cases, like a method being added in class A that
results in a method in B (one of A's descendants) becoming overloaded, such
that a client of B should now choose the new overload. Java really makes
this stuff hard. (C# is no better.)
 
Lew

Mike said:
Other than by noting it when the symbol is used during compilation, and
storing that bit of information somewhere. But the details of that get
messy.

I said that there is no way, not that there couldn't be a way.

Mike said:
There are other difficult cases, like a method being added in class A that
results in a method in B (one of A's descendants) becoming overloaded, such
that a client of B should now choose the new overload. Java really makes
this stuff hard. (C# is no better.)

How would any language handle this?

Short of a class being aware of every possible past, present and future
extension of it.

You do present a good argument against overuse of inheritance.
 
markspace

Lew said:
How would any language handle this?

Class A exports a method:

public class A {
public void m( Object o ) {}
}

which B uses:

public class B {
public void b(A a) { a.m( "Hello" ); }
}

Now B has a dependency on A. If A changes, for any reason:

public class A {
public void m( Object o ) {}
public void m( String s ) {}
}

or:

public class A {
public void x( Object o ) {}
}


then B has to be recompiled. That's pretty standard stuff I think. There
really isn't any need to detect an overloaded method, this simple
dependency graph catches it, and many other cases too.
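
A minimal sketch of such a class-level dependency graph, under the assumption that the build already knows which classes each class mentions. The class name `ClassDependencyGraph` and its methods are hypothetical. It offers both the direct dependents (the "B has to be recompiled" case) and the transitive closure (the cascading rebuild).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class-granularity dependency graph.
class ClassDependencyGraph {
    // dependents.get("A") holds the classes that use A
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void addDependency(String client, String supplier) {
        dependents.computeIfAbsent(supplier, k -> new TreeSet<>()).add(client);
    }

    /** Classes that use the changed class directly. */
    Set<String> directDependents(String changed) {
        return new TreeSet<>(dependents.getOrDefault(changed, Collections.<String>emptySet()));
    }

    /** Direct dependents plus everything that depends on them, transitively. */
    Set<String> transitiveDependents(String changed) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(changed);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String d : dependents.getOrDefault(current, Collections.<String>emptySet())) {
                if (result.add(d)) {
                    work.push(d); // first time we see d, so visit its dependents too
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassDependencyGraph graph = new ClassDependencyGraph();
        graph.addDependency("B", "A"); // B uses A
        graph.addDependency("C", "B"); // C uses B
        System.out.println(graph.directDependents("A"));     // [B]
        System.out.println(graph.transitiveDependents("A")); // [B, C]
    }
}
```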
 
Mike Schilling

Lew said:
I said that there is no way, not that there couldn't be a way.




How would any language handle this?

Languages that store the definition of a class in a different file than its
implementation (e.g. C++) handle it by simple comparisons of file dates.
It's also possible to do this by having the compiler update a repository of
class definitions (I used to develop a system that did just that.)

Lew said:
Short of a class being aware of every possible past, present and
future extension of it.

You'd need to do it the other way around -- have the client of B ask B if
its definition had changed, and have B in turn ask A.
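
The "ask B, and have B in turn ask A" direction could look roughly like the following, assuming the tool knows each type's superclass and which types had their own definition edited. Everything here (`DefinitionChangeCheck`, `declare`, `markChanged`) is a hypothetical name, and interfaces are ignored to keep the sketch short.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pull-style check: a type's definition counts as changed
// if the type itself or any of its ancestors changed.
class DefinitionChangeCheck {
    private final Map<String, String> superclassOf = new HashMap<>();
    private final Set<String> changed = new HashSet<>();

    void declare(String type, String superclass) {
        superclassOf.put(type, superclass);
    }

    void markChanged(String type) {
        changed.add(type);
    }

    boolean definitionChanged(String type) {
        Set<String> seen = new HashSet<>(); // guards against malformed, cyclic input
        String current = type;
        while (current != null && seen.add(current)) {
            if (changed.contains(current)) {
                return true;
            }
            current = superclassOf.get(current); // "B in turn asks A"
        }
        return false;
    }

    public static void main(String[] args) {
        DefinitionChangeCheck check = new DefinitionChangeCheck();
        check.declare("B", "A"); // B extends A
        check.markChanged("A");  // a method was added to A
        System.out.println(check.definitionChanged("B")); // true
    }
}
```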
 
Mike Schilling

markspace said:
Class A exports a method:

public class A {
public void m( Object o ) {}
}

which B uses:

public class B {
public void b(A a) { a.m( "Hello" ); }
}

Now B has a dependency on A. If A changes, for any reason:

public class A {
public void m( Object o ) {}
public void m( String s ) {}
}

or:

public class A {
public void x( Object o ) {}
}


then B has to be recompiled. That's pretty standard stuff I think. There
really isn't any need to detect an overloaded method, this
simple dependency graph catches it, and many other cases too.

The overload is a subtle case, because the added method isn't actually used
by anyone, so dependencies at the granularity of method usage won't catch
it. (Unless you conflate all overloads as being "the same method", which is
probably a good idea.)
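
A rough sketch of the "conflate all overloads" idea, assuming method usages and class APIs are recorded as strings of the form `Owner.name(ParamTypes)`. That string format and the class name `OverloadAwareDeps` are assumptions for illustration. Inheritance, which the harder case above depends on, is not modeled here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical: dependencies are tracked per method name, not per exact signature.
class OverloadAwareDeps {
    /** "A.m(Object)" becomes "A.m", so all overloads share one key. */
    static String conflate(String signature) {
        int paren = signature.indexOf('(');
        return paren < 0 ? signature : signature.substring(0, paren);
    }

    /** Overload families whose set of signatures differs between two API snapshots. */
    static Set<String> changedFamilies(Set<String> oldApi, Set<String> newApi) {
        Set<String> difference = new TreeSet<>(oldApi);
        difference.addAll(newApi);
        Set<String> common = new TreeSet<>(oldApi);
        common.retainAll(newApi);
        difference.removeAll(common); // symmetric difference: added or removed signatures
        Set<String> families = new TreeSet<>();
        for (String signature : difference) {
            families.add(conflate(signature));
        }
        return families;
    }

    static boolean needsRecompile(Set<String> usedSignatures, Set<String> oldApi, Set<String> newApi) {
        Set<String> families = changedFamilies(oldApi, newApi);
        for (String used : usedSignatures) {
            if (families.contains(conflate(used))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> oldApi = Collections.singleton("A.m(Object)");
        Set<String> newApi = new HashSet<>(Arrays.asList("A.m(Object)", "A.m(String)"));
        Set<String> usedByB = Collections.singleton("A.m(Object)");
        // B never used m(String), yet it is flagged because the m family changed.
        System.out.println(needsRecompile(usedByB, oldApi, newApi)); // true
    }
}
```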
 
Andreas Leitgeb

Mike Schilling said:
Now B has a dependency on A. If A changes, for any reason: [...]
then B has to recompiled. That's pretty standard stuff I think. There
really isn't any need to detect an overloaded method, this
simple dependency graph catches it, and many other cases too.
The overload is a subtle case, because the added method isn't actually used
by anyone, so dependencies at the granularity of method usage won't catch
it. (Unless you conflate all overloads as being "the same method", which is
probably a good idea.)

Or, somewhat finer-grained: conflate all overloaded methods that take the
same number of arguments - with some special reasoning about varargs...

Another approach was: Maintain a database of each .class's API,
and if, after an incremental build, any recompiled class has a
changed API, or any class was removed, or added, then do a clean
build. Insert "non-private" wherever you find it appropriate.
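
One way to approximate such an API database, sketched with reflection on loaded classes rather than by parsing .class files directly. The class name `ApiSnapshot` and the nested `Sample` class are invented for the example. Note that this snapshot records member declarations only, not the values of constants, so on its own it would not notice a changed `DEBUG` value.

```java
import java.lang.reflect.Member;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical: a class's API is the sorted set of its non-private member declarations.
class ApiSnapshot {
    static SortedSet<String> nonPrivateApi(Class<?> type) {
        List<Member> members = new ArrayList<>();
        members.addAll(Arrays.asList(type.getDeclaredFields()));
        members.addAll(Arrays.asList(type.getDeclaredConstructors()));
        members.addAll(Arrays.asList(type.getDeclaredMethods()));
        SortedSet<String> api = new TreeSet<>();
        for (Member member : members) {
            // Skip private members and anything the compiler generated.
            if (!Modifier.isPrivate(member.getModifiers()) && !member.isSynthetic()) {
                api.add(member.toString());
            }
        }
        return api;
    }

    /** The build would fall back to a clean build when this returns true. */
    static boolean apiChanged(SortedSet<String> stored, SortedSet<String> current) {
        return !stored.equals(current);
    }

    static class Sample {
        public int visible;
        private int hidden;
        public void m(Object o) {}
        private void secret() {}
    }

    public static void main(String[] args) {
        for (String declaration : nonPrivateApi(Sample.class)) {
            System.out.println(declaration);
        }
    }
}
```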
 
Joshua Maurice

What, you never heard of JAR files?

There's no excuse for "build clean" having to touch all 20K files.

And what if all of the code is under active development, aka new
features are being added to each layer on a weekly basis?

And what if a large portion of that Java code is generated from a
model file to facilitate serialization between C++ and Java? Thus a
change to a single file would require recompiling a large amount of
generated "interface" files, which theoretically touches a large
portion of the 20,000 Java files.

And that's still no excuse for not having an incremental compile. Even
if componentized, with a full clean build every time, that could be 5
to 10 minutes of my work lost on every compile, which is just wasted
time.
 
Joshua Maurice

You can't blame Ant for that one.  The class that depends on the compile-time
constant, such as 'DEBUG' in your example, compiles the constant into its
class, not the symbol.  Without some external indication of the dependency,
there's not any way for a compiler or build tool to detect that it exists.

With respect to dependencies where the symbol is stored in the class rather
than its value, even 'javac' handles the situation pretty well.

I'm not blaming anyone in particular. I just want to know how to get a
fully correct, aka 100% incremental build under the actions: adding,
removing, modifying java files, and adding, removing, or modifying
build steps of "take these jars, compile them to class files, then jar
them", aka the standard developer actions.
 
Joshua Maurice

I said that there is no way, not that there couldn't be a way.



How would any language handle this?

Short of a class being aware of every possible past, present and future
extension of it.

You do present a good argument against overuse of inheritance.

Read the paper in my opening post, Ghost Dependencies.
 
Joshua Maurice

Languages that store the definition of a class in a different file than its
implementation (e.g. C++) handle it by simple comparisons of file dates.
It's also possible to do this by having the compiler update a repository of
class definitions (I used to develop a system that did just that.)

Actually no. It's not that. It's that the search path lookup is
handled in a fundamentally different way. In Java, any piece of code
anywhere in the file can result in a file system lookup, and this
lookup is "ambiguous", where I mean subject to change, or depends on
context. See the paper in my opening post Ghost Dependencies.

C++ gets around this by having 2 separate compilation steps, the
preprocessor, and the compiler proper. The preprocessor has a very
simple well defined lookup process which is not context dependent or
"ambiguous" unlike Java's lookup process. The preprocessor produces a
single file which the compiler proper takes and produces a single
output file. The entire thing is much less black box and much less
complex than Java's classpath lookup which makes it much easier to
produce a correct incremental build.
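
To make the context dependence concrete, here is a simplified sketch of where a simple type name might be looked up: the current package first, then each on-demand import, with `java.lang` implicitly among them. Candidates that do not exist today are the "ghosts": creating one later can change what the name means, or make it ambiguous, without any existing file being edited. The class name `SimpleNameCandidates` is hypothetical, and nested types, inherited member types and single-type imports are deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of simple-name lookup for a type.
class SimpleNameCandidates {
    /** Fully qualified names that could satisfy the simple name, same package first. */
    static List<String> candidates(String currentPackage, List<String> onDemandImports, String simpleName) {
        Set<String> ordered = new LinkedHashSet<>();
        // A type in the current package shadows anything imported on demand.
        ordered.add(currentPackage.isEmpty() ? simpleName : currentPackage + "." + simpleName);
        for (String pkg : onDemandImports) {
            ordered.add(pkg + "." + simpleName);
        }
        ordered.add("java.lang." + simpleName); // java.lang is always imported on demand
        return new ArrayList<>(ordered);
    }

    /** Candidates that do not exist right now. */
    static List<String> ghosts(List<String> candidates, Set<String> existingTypes) {
        List<String> missing = new ArrayList<>(candidates);
        missing.removeAll(existingTypes);
        return missing;
    }

    public static void main(String[] args) {
        List<String> all = candidates("p", Arrays.asList("java.util"), "List");
        Set<String> existing = new HashSet<>(Arrays.asList("java.util.List"));
        System.out.println(all);                   // [p.List, java.util.List, java.lang.List]
        System.out.println(ghosts(all, existing)); // [p.List, java.lang.List]
    }
}
```

An incremental build that wants to be correct would have to record the ghost entries as dependencies too, not just the file that was actually found.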
 
Joshua Maurice

Mike Schilling said:
Now B has a dependency on A.  If A changes, for any reason: [...]
then B has to recompiled.  That's pretty standard stuff I think. There
really isn't any need to detect an overloaded method, this
simple dependency graph catches it, and many other cases too.
The overload is a subtle case, because the added method isn't actually used
by anyone, so dependencies at the granularity of method usage won't catch
it.  (Unless you conflate all overloads as being "the same method", which is
probably a good idea.)

Or, somewhat finer-grained: conflate all overloaded methods that take the
same number of arguments - with some special reasoning about varargs...

Another approach was:  Maintain a database of each .class's API,
and if, after an incremental build, any recompiled class has a
changed API, or any class was removed, or added, then do a clean
build.  Insert "non-private" wherever you find it appropriate.

This would work, but it's "overkill", and not very incremental. I was
hoping for a much more "minimal" rebuild.
 
Lew

Joshua said:
And what if all of the code is under active development, aka new
features are being added to each layer on a weekly basis?

You have not designed your system in a very modular way.
 
Joshua Maurice

You have not designed your system in a very modular way.

Agreed. Sadly, as I'm a more junior developer, not much I can do about
it for such a large codebase, a fair share of which predates C++98
standardization.
 
Arne Vajhøj

I'm not blaming anyone in particular. I just want to know how to get a
fully correct, aka 100% incremental build under the actions: adding,
removing, modifying java files, and adding, removing, or modifying
build steps of "take these jars, compile them to class files, then jar
them", aka the standard developer actions.

To me the entire idea is rather pointless.

The tool can not be made to work with binaries.

It should be possible to do it working with source code.

But it would require a huge effort to create a 100% working tool.

You could solve your build problems for much less effort
by working on the structure of the project.

The project does not provide bang for the buck.

Arne
 
