Incremental Java Compile

Joshua Maurice

I've worked in systems roughly as large as yours (tens of thousands of
source files) which were layered, so that each separately compiled subsystem
had at most a few hundred files. At that point, there's no particular
advantage to avoiding clean builds.

During development, a developer works on a small set of subsystems.  He
knows when he's changing interfaces rather than implementations, and at that
point can afford the clean build.

The automated build-and-test might spend an hour or so on the clean build,
but that's a small fraction of the time the tests take.

We recently were handed out this book describing Scrum, a variant of
agile development. I agree with what the author bolds and italicizes,
that interfaces need to be \stable\ (just the word \stable\).

I would love what you describe. However, my fellow employees and
managers understand little and respect little of what decoupled,
relatively well thought out, well defined interfaces can do for them.
It's always about the new feature. No code cleanup ever really gets
done. My only real option to attack that front is to vote with my
feet. (As an example, I remember this one time that an architect at
the company in question said it was perfectly fine to use a finalizer
to manage C standard library heap memory allocated via JNI. I
protested quite vigorously.)

Also, as a potentially incorrect observation, do you think most java
developers use notepad or some other text editor to do their work? I
would suspect that most people use Eclipse nowadays. Eclipse is almost
exactly what I want from a build system, except it's limited to Java.
It's a nearly fully incrementally correct build system, and is a lot
better than I could ever do on my own as a side project. Would you all
be saying the same straw man arguments if you lost your incremental
IDE and had to use notepad / wordpad / emacs without all your cool
java-specific stuff to do your work? I think not.
 
Mike Schilling

Joshua said:
We recently were handed out this book describing Scrum, a variant of
agile development. I agree with what the author bolds and italicizes,
that interfaces need to be \stable\ (just the word \stable\).

I would love what you describe. However, my fellow employees and
managers understand little and respect little of what decoupled,
relatively well thought out, well defined interfaces can do for them.
It's always about the new feature. No code cleanup ever really gets
done. My only real option to attack that front is to vote with my
feet. (As an example, I remember this one time that an architect at
the company in question said it was perfectly fine to use a finalizer
to manage C standard library heap memory allocated via JNI. I
protested quite vigorously.)

If the situation is completely (^&%ed and no one with the power to fix it
will do anything, by all means find a better place.

Also, as a potentially incorrect observation, do you think most java
developers use notepad or some other text editor to do their work? I
would suspect that most people use Eclipse nowadays. Eclipse is almost
exactly what I want from a build system, except it's limited to Java.
It's a nearly fully incrementally correct build system, and is a lot
better than I could ever do on my own as a side project. Would you all
be saying the same straw man arguments if you lost your incremental
IDE and had to use notepad / wordpad / emacs without all your cool
java-specific stuff to do your work? I think not.

I don't use Eclipse, nor do I do builds from within the IDE I do use
(IntelliJ), since our build system is complicated enough to require ANT.
(I could probably make IntelliJ call ANT, but I've never bothered.)

Anyway, I don't think people are dismissing you because they have a solution
they won't tell you about. Your company has built a horrific system far
outside the parameters of what Java was intended to handle, and they're
paying the price for that. It's a lot like someone who puts 100 novels in a
single Word document and complains that it's slow. Yes, it is.
 
Joshua Cranmer

Also, as a potentially incorrect observation, do you think most java
developers use notepad or some other text editor to do their work? I
would suspect that most people use Eclipse nowadays. Eclipse is almost
exactly what I want from a build system, except it's limited to Java.

As an aside, I've noticed that not using a fully-featured IDE seems to
increase my productivity. Trying to figure out how to get it to just add
on a single -classpath argument to the build step was an hour of my life
wasted. Not to mention the length of time it takes to start up on my
system, as well as the braindead autocompletion it attempts to do.

It's a nearly fully incrementally correct build system, and is a lot
better than I could ever do on my own as a side project. Would you all
be saying the same straw man arguments if you lost your incremental
IDE and had to use notepad / wordpad / emacs without all your cool
java-specific stuff to do your work? I think not.

Actually, I think IDEs tend to fall flat on their face when presented
with humongous heterogeneous heaps of code which defy standard build
system logic and which require large databases of tag information (i.e.,
in the 100s of MB range or higher).

I do work on a project which requires about 2 hours to build on my
laptop whenever I update the source, aggravated by the fact that the
thermal setpoints on my laptop appear to be misset and not attending the
build would frequently cause it to shut down due to overheating.

As people have repeatedly stated:
1. You can't abstract dependency data out of class files alone.
2. If your project were properly compartmentalized, this wouldn't be an
issue.
3. If fixing the design is really so big a deal, then you are up a creek
without a paddle.
 
Tom Anderson

And perhaps you won't be so damn rude next time. What the hell?

You have consistently rejected every piece of good advice, given
complete nonsense excuses for doing so, and thrown mud in the face of
people who try to help you. What a piece of work!

He hasn't been given good advice. He's been given advice completely
unrelated to the problem he's explained at great length, and which is not
nonsense. The only mystifying thing is that he's still here, rather than
having long since buggered off in search of people who might engage with
his problem.

tom
 
Tom Anderson

As an aside, I've noticed that not using a fully-featured IDE seems to
increase my productivity. Trying to figure out how to get it to just add on a
single -classpath argument to the build step was an hour of my life wasted.

Could you expand on that? What do you mean by 'build step', and how did
you do it? Was this some specific and unusual need, or did you just not
know about the Configure Build Path dialogue box?

Not to mention the length of time it takes to start up on my system,

Fair enough!

as well as the braindead autocompletion it attempts to do.

Could you expand on that too?

As people have repeatedly stated:
1. You can't abstract dependency data out of class files alone.

He knows that, and has stated so.

2. If your project were properly compartmentalized, this wouldn't be an
issue.

He knows that, and has stated so.

3. If fixing the design is really so big a deal, then you are up a creek
without a paddle.

Which is exactly why he's trying to build a paddle!

tom
 
Arne Vajhøj

He hasn't been given good advice. He's been given advice completely
unrelated to the problem he's explained at great length, and which is
not nonsense. The only mystifying thing is that he's still here, rather
than having long since buggered off in search of people who might engage
with his problem.

The advice is not unrelated to his problem (except for Roedy's,
but that is not special to his questions).

He has been given the correct advice.

They need to fix their way of building software.

If they do that then their build time problems will
disappear. And a lot of other problems.

He has gotten an explanation of some of the technical
difficulties that the tool he asks for will face.

He has not gotten such a tool. Because nobody wants
to spend lots of time (thousands of hours) developing
a tool that is only needed in completely fucked up
environments.

Arne
 
Arne Vajhøj

We recently were handed out this book describing Scrum, a variant of
agile development. I agree with what the author bolds and italicizes,
that interfaces need to be \stable\ (just the word \stable\).

I would love what you describe. However, my fellow employees and
managers understand little and respect little of what decoupled,
relatively well thought out, well defined interfaces can do for them.
It's always about the new feature. No code cleanup ever really gets
done. My only real option to attack that front is to vote with my
feet. (As an example, I remember this one time that an architect at
the company in question said it was perfectly fine to use a finalizer
to manage C standard library heap memory allocated via JNI. I
protested quite vigorously.)

You should fix that problem instead of searching for the
magic tool that can compensate for those problems.

Also, as a potentially incorrect observation, do you think most java
developers use notepad or some other text editor to do their work? I
would suspect that most people use Eclipse nowadays. Eclipse is almost
exactly what I want from a build system, except it's limited to Java.
It's a nearly fully incrementally correct build system, and is a lot
better than I could ever do on my own as a side project. Would you all
be saying the same straw man arguments if you lost your incremental
IDE and had to use notepad / wordpad / emacs without all your cool
java-specific stuff to do your work? I think not.

If you think it is useful, then the Eclipse compiler is
open source and you can grab it and hack it to do what you
want.

Arne
 
Mike Schilling

Arne said:
The advice is not unrelated to his problem (except for Roedy's,
but that is not special to his questions).

He has been given the correct advice.

They need to fix their way of building software.

If they do that then their build time problems will
disappear. And a lot of other problems.

He has gotten an explanation of some of the technical
difficulties that the tool he asks for will face.

He has not gotten such a tool. Because nobody wants
to spend lots of time (thousands of hours) developing
a tool that is only needed in completely fucked up
environments.

I'm going to quibble. First, it wouldn't take thousands of hours; hundreds
at most. (My latest idea is to modify javac to create dependency files.
That way you wouldn't need to do a separate source analysis to find where,
e.g., constants are used.) Second, it would be useful, though not required,
in all environments with a big source tree. If I pull down the latest
changes from the SCM and see that some are in low-level utility routines, I
probably don't need to recompile all the code that uses them, but I don't
know for sure. That's annoying. And if I start to see odd errors, not
knowing whether taking the 20 or 30 minutes to build everything clean is the
fix or a waste of time is annoying as well. If such a tool existed, I'd use
it.

But I'll agree that the reason no such tool exists is that almost all of us
get on well enough without it.
 
Joshua Maurice

If anyone else cares, I managed to inadvertently stumble across a
solution. On impulse, I asked a co-worker at lunch. It seems that
class files do not contain sufficient information with default javac
options. However, when compiled with -g, it contains a listing of all
types used in the compile. When combined with Ghost Dependencies, I
think this can result in a correct incremental build at the file level
which will not cascade endlessly downstream. I'm working on the
finishing touches to my prototype now.
 
Mike Schilling

Joshua said:
If anyone else cares, I managed to inadvertently stumble across a
solution. On impulse, I asked a co-worker at lunch. It seems that
class files do not contain sufficient information with default javac
options. However, when compiled with -g, it contains a listing of all
types used in the compile. When combined with Ghost Dependencies, I
think this can result in a correct incremental build at the file level
which will not cascade endlessly downstream. I'm working on the
finishing touches to my prototype now.

You realize that you're now going to recompile a class when it refers to
another class to which a comment was added.
 
Lew

Unplonk. Whatever.

If I plonked everyone who's rude here I wouldn't be allowed to post either.
 
Joshua Maurice

You realize that you're now going to recompile a class when it refers to
another class to which a comment was added.

Yes. I'm pretty sure that it would be better than doing a full clean
build or a cascading jar-dir-unit incremental build.
 
Mike Schilling

Joshua said:
Yes. I'm pretty sure that it would be better than doing a full clean
build or a cascading jar-dir-unit incremental build.

No doubt, but the result isn't the minimal amount of recompilation we were
discussing earlier.
 
Tom Anderson

Are you sure?

$ javac -version
javac 1.6.0_16
$ echo "class Foo {public static final int X=23;}" >Foo.java
$ echo "class Bar {public static final int Y=Foo.X;}" >Bar.java
$ javac -g Foo.java Bar.java
$ grep Foo Bar.class
$

I can see no sign of Bar.class containing any mention of Foo.
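grep on a binary file is a fairly blunt check. A more direct one is to list the class entries in the constant pool, which is where a class file records the other types it refers to. The following is a minimal sketch, assuming the standard class file layout from the JVM specification; `ClassRefLister` and `referencedClasses` are names made up for this illustration, not part of any existing tool.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class ClassRefLister {

    /** Returns the internal names (e.g. "java/lang/Object") of all CONSTANT_Class entries. */
    static Set<String> referencedClasses(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();              // minor version
        in.readUnsignedShort();              // major version
        int count = in.readUnsignedShort();  // entries are numbered 1..count-1
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  utf8[i] = in.readUTF(); break;                      // Utf8
                case 7:  classNameIndex[i] = in.readUnsignedShort(); break;  // Class
                case 8: case 16: case 19: case 20:
                    in.readFully(new byte[2]); break;
                case 15:
                    in.readFully(new byte[3]); break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readFully(new byte[4]); break;
                case 5: case 6:
                    in.readFully(new byte[8]); i++; break;  // long and double use two slots
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int nameIndex : classNameIndex) {
            if (nameIndex != 0) {
                names.add(utf8[nameIndex]);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassRefLister.java Bar.class
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            System.out.println(path + " -> " + referencedClasses(bytes));
        }
    }
}
```

Run against the Bar.class from the transcript, this should agree with grep: the constant is inlined at compile time, so no class entry for Foo is expected.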
Yes. I'm pretty sure that it would be better than doing a full clean
build or a cascading jar-dir-unit incremental build.

The previous time we discussed this, the idea came up of looking at
changed class files to see if the changes were consequential -
essentially, if the change changed the interface of the class (added a
method, changed a method's signature, changed the value of a constant,
etc). If you did that, you could filter the changes so that only
consequential ones triggered recompilation of dependents. That would avoid
the unnecessary recompilation Mike mentions, wouldn't it?

tom
 
Joshua Maurice

Are you sure?

$ javac -version
javac 1.6.0_16
$ echo "class Foo {public static final int X=23;}" >Foo.java
$ echo "class Bar {public static final int Y=Foo.X;}" >Bar.java
$ javac -g Foo.java Bar.java
$ grep Foo Bar.class
$

I can see no sign of Bar.class containing any mention of Foo.

Apparently I am mistaken. I would suggest looking for "Bar" and not
"Bar.class", but the result is the same. static finals might be an
exception to the debug information, which is sad. I'm wondering how
I'll work around this now. I am still in the process of implementing,
so I haven't really been able to test, or I would have caught this
eventually. Thanks for letting me catch it earlier. I could still
catch this through Ghost Dependency analysis, but it becomes more
tricky. I'll have to think about it. At a minimum, I could detect all
class files which have static final fields, and force all classes
downstream to be out of date. Not very incremental in this case, but
at least it's correct. Hopefully this is the only such corner case. I
need more tests.
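The conservative fallback described above (flag every class that declares static final fields) could be sketched with reflection as follows. This is only an illustration under stated assumptions: `InlinableFieldScan` and its method are hypothetical names, the check works on loaded classes rather than class files, and it over-approximates, because a static final primitive or String field initialized at run time also matches even though it is never inlined.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class InlinableFieldScan {

    /**
     * Names of non-private static final fields of primitive or String type.
     * Only such fields can be compile-time constants that javac copies into
     * other classes, so a class with none of them needs no special handling.
     */
    static List<String> possiblyInlinedFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            boolean constantType = field.getType().isPrimitive() || field.getType() == String.class;
            if (Modifier.isStatic(mods) && Modifier.isFinal(mods)
                    && !Modifier.isPrivate(mods) && constantType && !field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // Sample classes mirroring the Foo example in this thread.
    static class Foo { public static final int X = 23; }
    static class Plain { int counter; static final Object LOCK = new Object(); }

    public static void main(String[] args) {
        System.out.println("Foo:   " + possiblyInlinedFields(Foo.class));
        System.out.println("Plain: " + possiblyInlinedFields(Plain.class));
    }
}
```

Private fields are skipped on the assumption that everything in one source file is recompiled together anyway.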
 
Joshua Maurice

No doubt, but the result isn't the minimal amount of recompilation we were
discussing earlier.

I'm not sure what this minimal recompile which we were discussing is.
It is technically impossible to do a true minimal recompile
algorithmically. Let's define it as "Let a build be a set of file
compilations. Let the minimum recompile be the minimum such set for
which the output class files are equivalent to the class files of a
full clean build." First, we'd have to prove such a minimum exists.
That's relatively straightforward. With that out of the way, I think I
could then prove that the problem is equivalent to the Halting
problem. If you define "equivalent" generously, I'm pretty sure this
is the case. If you define it as "same binary file content", then
perhaps not, though still possibly yes.

Either way, this is not my goal. If someone modifies comments to a
Java source file, I'm not going to try and catch that. What I will do
is recompile all files which depend directly on that changed-source
Java file, any files affected by Ghost Dependencies, and continue
cascading this change down until all of the "leaves" of the cascading
recompile are binary equivalent class files to the class files before
the recompile. Perhaps too conservative, but I think that's easy
enough to show that it's correct. Perhaps I'll make it a "tighter fit"
later, though honestly I'm still fumbling around in the dark at the
moment, still learning.
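The cascading strategy described above might look roughly like the worklist below. It is a sketch under several assumptions that are not from the thread: the dependency graph is given as a map and is acyclic, Ghost Dependencies are not modeled, and `recompileChangesOutput` is a hypothetical callback that recompiles one file and reports whether its class file bytes changed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch for illustration only.
final class CascadingRecompile {

    /** Returns every file that was recompiled, in the order it was first reached. */
    static Set<String> run(Set<String> editedSources,
                           Map<String, List<String>> directDependents,
                           Predicate<String> recompileChangesOutput) {
        Set<String> recompiled = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(editedSources);
        while (!pending.isEmpty()) {
            String file = pending.removeFirst();
            recompiled.add(file);
            boolean changed = recompileChangesOutput.test(file);
            // Direct dependents of an edited source are always rebuilt; further
            // down, the cascade stops at files whose output is byte-identical.
            if (!changed && !editedSources.contains(file)) {
                continue;
            }
            for (String dependent : directDependents.getOrDefault(file, Collections.emptyList())) {
                if (!pending.contains(dependent)) {
                    pending.addLast(dependent);
                }
            }
        }
        return recompiled;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = new HashMap<>();
        dependents.put("A.java", Arrays.asList("B.java", "C.java"));
        dependents.put("B.java", Arrays.asList("D.java"));
        dependents.put("C.java", Arrays.asList("E.java"));
        // Toy stand-in for the compiler: only B's class file changes.
        Set<String> changing = new HashSet<>(Arrays.asList("B.java"));
        Set<String> edited = new HashSet<>(Arrays.asList("A.java"));
        System.out.println(run(edited, dependents, changing::contains));
    }
}
```

In the toy graph, A is edited, B's output changes and C's does not, so D is rebuilt and E is left alone.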
 
Tom Anderson

I'm not sure what this minimal recompile which we were discussing is. It
is technically impossible to do a true minimal recompile
algorithmically. Let's define it as "Let a build be a set of file
compilations. Let the minimum recompile be the minimum such set for
which the output class files are equivalent to the class files of a full
clean build." First, we'd have to prove such a minimum exists. That's
relatively straightforward.

Extremely so.

With that out of the way, I think I could then prove that the problem is
equivalent to the Halting problem.

Certainly not.

If you define "equivalent" generously, I'm pretty sure this is the case.
If you define it as "same binary file content", then perhaps not, though
still possibly yes.

I'm not sure what you mean by 'generously'. Is there a kind of equivalence
less strict than binary equivalence which would actually work?

Anyway, here's a straightforward but slow algorithm to find the minimal
recompile:

1. Copy all your source code somewhere and do a clean build on it; call
the output the reference output
2. Count your source files, and call the total number N
3. Number all your source files, starting at 0 and going up to N - 1
4. Let M be the set of all source files
5. For each integer i between 0 and 2**N - 1:
5a. Let S be the set of source files for whose number j, the jth bit in i
is set
5b. Do a recompilation of just the files in S
5c. Compare the output to the reference output, and if it is identical,
and the size of S is smaller than the size of M, let M be S
5d. Restore the class files to how they were before recompilation

M now contains the set of source files needed for a minimal recompile. It
doesn't follow from this algorithm that it's the only minimal set,
although i suspect that in practice it will be.

I wouldn't suggest you do this in practice, but it shows that the minimal
set exists, can be found algorithmically, and can be found in O(2**N)
time, with a rather large constant. Your task is thus merely to improve
the speed!
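The brute-force search above can be written down in a few lines. This is a sketch, not something to run on a real tree: `reproducesCleanBuild` is a hypothetical oracle standing in for "recompile exactly these files, compare the output to the reference output, then restore the class files", and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch for illustration only.
final class MinimalRecompileSearch {

    static Set<String> minimalSet(List<String> sources, Predicate<Set<String>> reproducesCleanBuild) {
        int n = sources.size();
        if (n > 20) {
            throw new IllegalArgumentException("exponential search; only usable for tiny inputs");
        }
        Set<String> best = new TreeSet<>(sources);  // recompiling everything always works
        for (long mask = 0; mask < (1L << n); mask++) {
            if (Long.bitCount(mask) >= best.size()) {
                continue;  // cannot be smaller than the best set found so far
            }
            Set<String> candidate = new TreeSet<>();
            for (int j = 0; j < n; j++) {
                if ((mask & (1L << j)) != 0) {
                    candidate.add(sources.get(j));
                }
            }
            if (reproducesCleanBuild.test(candidate)) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("A.java", "B.java", "C.java", "D.java");
        // Toy oracle: the build is only correct if both A and C are recompiled.
        Predicate<Set<String>> oracle = s -> s.contains("A.java") && s.contains("C.java");
        System.out.println(minimalSet(sources, oracle));  // prints [A.java, C.java]
    }
}
```

The only departure from the listed steps is that candidates no smaller than the current best are skipped before recompiling rather than after, which does not change the result.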
Either way, this is not my goal. If someone modifies comments to a Java
source file, I'm not going to try and catch that. What I will do is
recompile all files which depend directly on that changed-source Java
file, any files affected by Ghost Dependencies, and continue cascading
this change down until all of the "leaves" of the cascading recompile
are binary equivalent class files to the class files before the
recompile. Perhaps too conservative, but I think that's easy enough to
show that it's correct.

Agreed. If you were a bit more aggressive about the consequentiality of
changes, you could prune off a lot of the leaves of the tree, but it
wouldn't be an asymptotic speedup.

That said, i don't think it would be that hard to work out
consequentiality. The output of javap is almost exactly what you need - i
think the only thing it's missing is those bloody constant values. Adding
them doesn't look hard. This is the relevant bit of javap's source:

https://openjdk.dev.java.net/source...s/javap/JavapPrinter.java?rev=257&view=markup

You need to change the line that says:

out.println(fields[f].getType()+" " +fields[f].getName()+";");

To say:

int cpx = fields[f].getConstantValueIndex();
if (cpx == 0) out.println(fields[f].getType()+" " +fields[f].getName()+";");
else out.println(fields[f].getType()+" " +fields[f].getName()+" = "+cls.getCpoolEntryobj(cpx)+";");

That adds the value of any compile-time constants to the output. I'm not
sure if it will also add values to instance fields which have initial
values; i think those are handled in the constructors, rather than as
ConstantValue attributes.

You could then compare the output of javap, or hashes of that output, to
determine if the interface of the class had changed. If it hasn't, then
any changes to the class file are inconsequential in terms of
recompilation.
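As a rough illustration of comparing hashes of a class's interface, here is a sketch that builds the member list with reflection instead of javap. That substitution is an assumption made to keep the example self-contained, and `ApiFingerprint` is a made-up name. Like unpatched javap, it leaves out constant values, so a real tool would still need the ConstantValue data discussed above.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Member;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch for illustration only.
final class ApiFingerprint {

    /** Sorted, javap-like description of the non-private members of a class. */
    static List<String> apiLines(Class<?> type) {
        List<String> lines = new ArrayList<>();
        lines.add(type.toGenericString()
                + " extends " + type.getGenericSuperclass()
                + " implements " + Arrays.toString(type.getGenericInterfaces()));
        for (Field f : type.getDeclaredFields()) addIfVisible(lines, f, f.toGenericString());
        for (Constructor<?> c : type.getDeclaredConstructors()) addIfVisible(lines, c, c.toGenericString());
        for (Method m : type.getDeclaredMethods()) addIfVisible(lines, m, m.toGenericString());
        Collections.sort(lines);  // reflection does not guarantee member order
        return lines;
    }

    private static void addIfVisible(List<String> lines, Member member, String text) {
        if (!Modifier.isPrivate(member.getModifiers()) && !member.isSynthetic()) {
            lines.add(text);
        }
    }

    /** Hex SHA-256 of the description; equal hashes mean no visible signature change. */
    static String fingerprint(Class<?> type) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(String.join("\n", apiLines(type)).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Sample class: the private field must not influence the fingerprint.
    static class Sample {
        public static final int LIMIT = 23;
        private int hidden;
        public int size() { return hidden; }
    }

    public static void main(String[] args) {
        apiLines(Sample.class).forEach(System.out::println);
        System.out.println(fingerprint(Sample.class));
    }
}
```

Storing the fingerprint next to each class file and comparing it after a recompile would give the "did the interface change" test, subject to the constant-value gap noted above.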

Unless i've missed something. Notably absent from javap output is
annotations - can the annotations on a class affect compilation of other
classes which refer to it? @Override only affects the declaring class.
@Deprecated could affect another class, but could only cause a warning to
be generated.

Perhaps I'll make it a "tighter fit" later, though honestly I'm still
fumbling around in the dark at the moment, still learning.

PROTIP: that phase never actually ends.

tom
 
Tom Anderson

Apparently I am mistaken. I would suggest looking for "Bar" and not
"Bar.class", but the result is the same. static finals might be an
exception to the debug information, which is sad. I'm wondering how
I'll work around this now.

I wonder how hard it would be to modify javac to add an attribute to
classes to record the origins of any inlined constants. That would let you
pull the information out from the class file later on, just as you can
already do with all the other types of dependency.

A bit of a poke around in the javac source code suggests it already has
code for tracking dependencies, which is there to support JWS in some way.
There are some flags that should switch it on (-xdepend, -Xdepend, -Xjws),
but they don't work on the version i have installed. There's also some
flag you can set on the compilation environment object that will make it
print dependencies, so if you're driving compilation from code, you should
be able to set that. You'd have to parse the compiler output, but that's
not that bad.

I am still in the process of implementing, so I haven't really been able
to test,

You're doing it wrong. Test first, test incrementally, build things in a
way you can test as you go. Before you start work for real, do a
higher-level test, a 'spike solution', to make sure that all your
assumptions (like this one) are valid.

Hopefully this is the only such corner case. I need more tests.

Always true!

Since you have this vast and terrifying codebase, you can probably
generate some pretty thorough functional tests from it. Take pairs of
adjacent revisions from source control, do full builds on each, find the
differences in output, then apply your tool and see if it comes up with
the right answers. You should be able to automate the process of turning a
pair of revisions into a test suite, and then you can just leave it to
crank away generating them for a few days.
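The "find the differences in output" step could be sketched as a byte-wise comparison of two build output directories. This is a minimal illustration; `BuildOutputDiff` is a hypothetical name and the two directories are assumed to hold full clean builds of adjacent revisions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch for illustration only.
final class BuildOutputDiff {

    /** Relative paths (with '/' separators) of all .class files under root. */
    static Set<String> classFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.toString().endsWith(".class"))
                       .map(p -> root.relativize(p).toString().replace('\\', '/'))
                       .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    /** Class files that were added, removed, or whose bytes differ between two builds. */
    static Set<String> changedClassFiles(Path before, Path after) throws IOException {
        Set<String> all = new TreeSet<>(classFiles(before));
        all.addAll(classFiles(after));
        Set<String> changed = new TreeSet<>();
        for (String relative : all) {
            Path a = before.resolve(relative);
            Path b = after.resolve(relative);
            if (!Files.exists(a) || !Files.exists(b)
                    || !Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b))) {
                changed.add(relative);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BuildOutputDiff.java <clean-build-of-rev-N> <clean-build-of-rev-N+1>
        if (args.length != 2) {
            System.err.println("expected two output directories");
            return;
        }
        changedClassFiles(Paths.get(args[0]), Paths.get(args[1])).forEach(System.out::println);
    }
}
```

The resulting set is the expected answer for that revision pair: an incremental build of the second revision on top of the first must reproduce at least these files byte for byte.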

tom
 
Mike Schilling

Joshua said:
I'm not sure what this minimal recompile which we were discussing is.
It is technically impossible to do a true minimal recompile
algorithmically. Let's define it as "Let a build be a set of file
compilations. Let the minimum recompile be the minimum such set for
which the output class files are equivalent to the class files of a
full clean build." First, we'd have to prove such a minimum exists.
That's relatively straightforward. With that out of the way, I think I
could then prove that the problem is equivalent to the Halting
problem. If you define "equivalent" generously, I'm pretty sure this
is the case. If you define it as "same binary file content", then
perhaps not, though still possibly yes.

Why would you think that? The ways in which a change to A can affect B is
finite and well-defined.
 
