Incremental Java Compile

J

Joshua Maurice

Why would you think that?  The ways in which a change to A can affect B is
finite and well-defined.

Let's suppose the source changed from "2" to "(1+1)". Using the strict
interpretation, the output class files would probably be equivalent
(ignoring debug information like original source code). Thus a rebuild
is technically not required.

Or, a slightly less trivial case: Suppose we have class A which has 2
public static functions A1 and A2 which have independent
implementations. Class C uses A1. Suppose someone comes along and
changes A2 in some meaningful way. Class C does not need to be
recompiled, but any sort of file level dependency analysis which is
correct would recompile Class C.

If you define it as binary file contents as equivalent output files,
then it's not equivalent to the Halting problem, I think. However, if
you define it as "The output class files display the same visible
behavior across all valid input", then I think the general case \is\
the Halting problem. You would have to prove for all allowed inputs to
the built program that the behavior is the same with the "minimal"
rebuild and with a full clean build, and that there is no smaller
rebuild which would display the same behavior.

If you instead simplify it to "If a file is touched, then all direct
dependencies and ghost dependencies should be recompiled, and repeat
until the leaves of the cascade are unchanged", then I think this is
quite doable. However, someone did mention "But what if you change
just a comment?". I can extend this too "but what if you just changed
a '2' to '1+1'?", which can probably be extended further. Hopefully
you see my point.
 
J

Joshua Maurice

On 10/05/2010 1:26 AM, Tom Anderson wrote:

Doesn't/didn't 'jikes' do all this?

As far as I can tell, Jikes is no longer in development or supported.
Also, I have doubts as to its correctness. Also, my company would
probably feel better if the compiler in use was Sun' javac and not
Jikes.
 
J

Joshua Maurice

You're doing it wrong. Test first, test incrementally, build things in a
way you can test as you go. Before you start work for real, do a
higher-level test, a 'spike solution', to make sure that all your
assumptions (like this one) are valid.

I am. It's just that implementing this class file parser, ghost
dependency, build system, etc., from scratch will take some time. I
almost have it working on a toy example, at which point I can throw
what little tests I have against it.
 
M

Mike Schilling

EJP said:
Doesn't/didn't 'jikes' do all this?

It did something of the sort (I don't know how thoroughly), but it's very
obsolete, never having supported the language features that were new in JDK
1.5. But having thought about this a bit, I think that the onlt feasible
way to support anything like minimal recompilation is by having the compiler
calculate and persist dependency information.
 
M

Mike Schilling

Joshua said:
Let's suppose the source changed from "2" to "(1+1)". Using the strict
interpretation, the output class files would probably be equivalent
(ignoring debug information like original source code). Thus a rebuild
is technically not required.

If the change is from

public static final int I = 2;

to

public static final int I = 1 + 1;

I'd accept a system that recompiles all users of I as still "minimal". (Not
that it's a difficult optimization to make, since the rules for what's a
compile-time constant are straightforward.) If this change isn't to a
compile-time constant, it would have no effect on anything defined in a
different source file.
Or, a slightly less trivial case: Suppose we have class A which has 2
public static functions A1 and A2 which have independent
implementations. Class C uses A1. Suppose someone comes along and
changes A2 in some meaningful way.

That is, chaged its signature. Merely chaning its implementation would not
rquire C to be recompiled.
..
Class C does not need to be
recompiled, but any sort of file level dependency analysis which is
correct would recompile Class C.

Right, strictly minimal recompilation wouild need to know not only that C
uses A but what parts of A it uses. Quite strightforward, though perhaps not
a good idea. (It's possbile that gathering too much dependency information,
and evaluationing it at too granular a level, slows the whole process down,
compared to allowing some recompilations that are strictly speaking
unnecessary.)
If you define it as binary file contents as equivalent output files,
then it's not equivalent to the Halting problem, I think. However, if
you define it as "The output class files display the same visible
behavior across all valid input", then I think the general case \is\
the Halting problem.

If I understand you, you're distinguishing between:

1. A recompilation would result in the same class file that already
exists, and
2. A recompilation would result in a change to the class file, but the
result of running the two will always be the same

I agree that a system that tries to determine 2 is infeasible. And my point
about adding the comment isn't that the file itself would be recompiled.
That's acceptable, and might even be necessary to get the line numbers seen
by a debugger to be correct. My point is that all files which refer to it
are recompiled, even though nothing they use (the class definitions, the
field and method definitions, and the values of the compile-time constants)
has changed. The same is true when e.g. method implementations change or an
init block is added; I was using "add a comment" as the limiting case of a
change no other file would care about.
 
L

Lew

Mike said:
If the change is from

public static final int I = 2;

to

public static final int I = 1 + 1;

I'd accept a system that recompiles all users of I as still "minimal". (Not
that it's a difficult optimization to make, since the rules for what's a
compile-time constant are straightforward.) If this change isn't to a
compile-time constant, it would have no effect on anything defined in a
different source file.

That source change would produce no change in the bytecode of the class
wherein 'I' is defined.

Ergo there would be no need for users of I to be recompiled.

So a system that recompiles no users of I in that case would be "minimal".

I'm having a hard time wrapping my mind around your last sentence in that
paragraph.
 
M

Mike Schilling

Lew said:
That source change would produce no change in the bytecode of the
class wherein 'I' is defined.

Ergo there would be no need for users of I to be recompiled.

So a system that recompiles no users of I in that case would be
"minimal".

I'm saying that one that says "definition of I changed, we need to recompile
I's users" would be minimal enough for me.
I'm having a hard time wrapping my mind around your last sentence in
that paragraph.

If the sole change were

void doit()
{
int i = 1 + 1; //formerly int i = 2
}

it's still true that the class's bytecode doesn't change. It's also true
that nothing about A that could possibly require anything else to recompile
changed, since there are never dependencies on method implementations. Thus
there's no need to recognize that A's bytecode didn't change, because it's
really irrelevant; there's also be no need to recompile any other files if
the sole change were

void doit()
{
int i = 1 + 2; //formerly int i = 2
}
 
J

Joshua Maurice

Bad news. It appears that class files do not contain the necessary
dependency information for my goal of not rebuilding all java files
downstream. Ex:

//AA.java
public class AA { public final int x = 1; }

//BB.java
public class BB { public int x = new AA().x; }

//javap -verbose -classpath . BB
Compiled from "BB.java"
public class BB extends java.lang.Object
SourceFile: "BB.java"
minor version: 0
major version: 50
Constant pool:
const #1 = Method #7.#19; // java/lang/Object."<init>":()V
const #2 = class #20; // AA
const #3 = Method #2.#19; // AA."<init>":()V
const #4 = Method #7.#21; // java/lang/Object.getClass:()Ljava/
lang/Class
;
const #5 = Field #6.#22; // BB.x:I
const #6 = class #23; // BB
const #7 = class #24; // java/lang/Object
const #8 = Asciz x;
const #9 = Asciz I;
const #10 = Asciz <init>;
const #11 = Asciz ()V;
const #12 = Asciz Code;
const #13 = Asciz LineNumberTable;
const #14 = Asciz LocalVariableTable;
const #15 = Asciz this;
const #16 = Asciz LBB;;
const #17 = Asciz SourceFile;
const #18 = Asciz BB.java;
const #19 = NameAndType #10:#11;// "<init>":()V
const #20 = Asciz AA;
const #21 = NameAndType #25:#26;// getClass:()Ljava/lang/Class;
const #22 = NameAndType #8:#9;// x:I
const #23 = Asciz BB;
const #24 = Asciz java/lang/Object;
const #25 = Asciz getClass;
const #26 = Asciz ()Ljava/lang/Class;;

{
public int x;

public BB();
Code:
Stack=3, Locals=1, Args_size=1
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: aload_0
5: new #2; //class AA
8: dup
9: invokespecial #3; //Method AA."<init>":()V
12: invokevirtual #4; //Method java/lang/Object.getClass:()Ljava/
lang/Clas
s;
15: pop
16: iconst_1
17: putfield #5; //Field x:I
20: return
LineNumberTable:
line 1: 0

LocalVariableTable:
Start Length Slot Name Signature
0 21 0 this LBB;


}
/////
/////

Specifically note that the instructions to initialize BB.x involve
"iconst_1", which, as I understand it, puts the constant 1 on the
stack. javac, even with -g, inlined the value of a final not-static
int field. If it can do this, I don't know what else it can do, and I
don't want to try to write tests to catch all possible variations. So,
I'm back to calling javac -verbose once per java file (but still only
one JVM), and parsing the verbose output to get the actual class
compile dependencies. From initial testing, this slows down a clean
build by a factor of 4x to 5x for my code base.

So, I'm back to square one. Anyone know a way to get the dependency
information javac -verbose would supply per java file without calling
javac -verbose (using the tools.jar API) once per java file?

Anyone have an inkling of how hard this would be to just modify javac
itself to output the information I need? I assume the change would be
quite minor, relatively speaking. It already prints out a message when
it loads a class file, and it has to know which java file it's loading
it for; it's just that it only prints the message the first time
loading the class file (presumably caching its contents), and does not
print out the java file for which it's being loaded.
 
A

Arne Vajhøj

Bad news. It appears that class files do not contain the necessary
dependency information for my goal of not rebuilding all java files
downstream. Ex:

//AA.java
public class AA { public final int x = 1; }

//BB.java
public class BB { public int x = new AA().x; }

Well - we told you that last week, so that should not
come as a surprise.

Arne
 
J

Joshua Maurice

Well - we told you that last week, so that should not
come as a surprise.

I believe no one suggested specifically that "The dependency
information contained in class files compiled with debug information
ala javac -g and ghost dependencies obtained via JavacTask in
tools.jar (which alone is highly underspecified) is insufficient to do
a Java file level incremental build which does not cascade endlessly
downstream."

Nevertheless, I have finished the implementation, and it appears to be
working quite well, passing a small suite of tests I wrote beforehand.
I plan to implement this on ~1/8 of a piece of my product's code base,
~100 Maven poms, and see how well it performs. I've never written
something quite like this in Java before (yes, I'm a C++ guy in case
you couldn't tell), and the design is new as well. I'm cautiously
optimistic.
 
A

Arne Vajhøj

I believe no one suggested specifically that "The dependency
information contained in class files compiled with debug information
ala javac -g and ghost dependencies obtained via JavacTask in
tools.jar (which alone is highly underspecified) is insufficient to do
a Java file level incremental build which does not cascade endlessly
downstream."

No but the case of constants not being in the class file
where explicitly discussed.

Arne
 
J

Joshua Maurice

No but the case of constants not being in the class file
where explicitly discussed.

I'm sorry. Perhaps I am mistaken, but I recall only discussion of
"static finals", not "constants". It does appear that "constants" is
the correct conclusion.
 
Joined
May 14, 2010
Messages
1
Reaction score
0
eclipse has a good incremental compiler, Why do you reinvent the wheel?

please read this paper, "Language-specific make technology for the Java programming language"

groups.csail.mit.edu/pag/reading-group/dmitriev02make.pdf

the paper abstract

Keeping the code of a Java application consistent (code is consistent if all of the project classes can be recompiled together without errors) prevents late linking errors, and thus may significantly improve development turnaround time. In this paper we describe a make technology for the Java programming language, that is based on smart dependency checking, guarantees consistency of the project code, and at the same time reduces the number of source code recompilations to the minimum. After project code consistency is initially assured by complete recompilation, the information extracted from the binary classes is stored in a so-called project database. Whenever the source code for some class C is changed, its recompiled binary is compared to the old version of C preserved in the project database. As a result, we find a minimum subset of classes that depend on C and may be affected by the particular change made to it. These are recompiled in turn, and absence of compilation errors at this phase guarantees the consistency of the new project code. To determine which dependent classes to recompile, we categorize all source incompatible changes, and for each category establish a criterion for finding the smallest possible subset of dependent classes.
 
J

Joshua Maurice

Surely, you /do/ have enough information in this case, as BB.class
refers to AA in order to call 'new AA()' in order to get AA#x.  You
don't specifically need to know that BB uses AA#x, do you?  ...just that
BB uses AA.

Only if AA#x was static would you be able to write AA.x, which would be
inlined with no reference to AA.

A simple example of my goal is the following, for the appropriate
definition of "using":
A uses B
B uses C.

I modify C. I need to rebuild C. If C's classfile has the same binary
contents, then no further work needs to be done. Otherwise, I need to
rebuild B. If B's classfile has the same binary contents, then no
further work needs to be done. Otherwise, I need to rebuild A.

Put more simply, I want a system where I do not have to rebuild all
java files downstream. I think that it is unnecessary to do such a
thing, and a lot of time could be saved if you could identify a point
where a change no longer "ripples" down the dependency chain.


Let's take a slight alteration of the example from the previous
post:

//AA.java
public class AA { public final int x = 1; }
//BB.java
public class BB extends AA {}
//CC.java
public class CC { public final int x = new BB().x; }

When javac compiles CC.java, it loads BB.class, looks for a member
named x, finds no such member, then loads AA.class, and finds member
x. javac's verbose output contains this information.

The class file CC.class does not refer to "AA" or "x". It calls BB's
constructor, and it hardcodes "1" through the JVM instruction
iconst_1.

With just the information available in the class files, I don't think
it would be possible to detect when the change cascading down the
dependency graph can have no further effects. To know when the cascade
is done, I need the full compile dependencies, specifically "CC uses
AA". (I also need ghost dependencies, as mentioned else-thread.)


Note that there is no analogy between the first and second examples of
this post. The first example is "A uses B, and B uses C". For the
second example, I would say "BB uses AA, CC uses BB, and CC uses AA."


Maybe one day I'll get even fancier, and instead of "same class file
binary contents", I'll use something like "same super class, same
interface, same methods".
 
J

Joshua Maurice

Maybe one day I'll get even fancier, and instead of "same class file
binary contents", I'll use something like "same super class, same
interface, same methods".

Ack, that should read "same super class, same interfaces*, same
members*".
 
T

Tom Anderson

//AA.java
public class AA { public final int x = 1; }

//BB.java
public class BB { public int x = new AA().x; }

//javap -verbose -classpath . BB
public BB();
Code:
Stack=3, Locals=1, Args_size=1
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: aload_0
5: new #2; //class AA
8: dup
9: invokespecial #3; //Method AA."<init>":()V
12: invokevirtual #4; //Method java/lang/Object.getClass:()Ljava/lang/Class;
15: pop
16: iconst_1
17: putfield #5; //Field x:I
20: return

Specifically note that the instructions to initialize BB.x involve
"iconst_1", which, as I understand it, puts the constant 1 on the
stack. javac, even with -g, inlined the value of a final not-static
int field.

Yeah, this is a bit weird.

Also, what the hell is that getClass call all about? I see that in code i
compile too (javac 1.6.0_16). A bit of googling reveals it's the code
generated to force a null check of a variable, and this is used in
compiling certain contortions involving inner classes. But there's no
inner class here, and there is no way in a month of sundays that the top
of stack can be null at instruction 12 - it's produced by applying dup to
the result of new, and new can never produce a null (right?). So what's it
doing?

Anyway, turning back to the initialisation of x. if you look at the
bytecode of AA, that's also weird. It has a constructor which does
iconst_1 + putfield to initialise x - but x *also* has a ConstantValue
attribute, giving it the value 1. Why both? If you write a verion of AA
where x is static, then there's only a ConstantValue, and no synthetic
clinit or anything touching it. Or instead make it non-final, and of
course it keeps the constructor but loses the ConstantValue.

The good news is that it looks like you can detect 'silently inlinable'
variables by the presence of a ConstantValue attribute. The bad news is
that javac does seem to be violating the VM spec (AIUI) here.

And on the gripping hand, you still have no way to discover the relevance
of AA from CC (the class you mention in a later post).

When i looked into this a while ago, my planned approach was:

1. Keep a table of explicit dependencies between classes (ie CC -> BB, but
not CC -> AA)

2. Keep a tree of direct inheritance relationships, probably including
interface implementation (ie BB -> AA)

3. Define the 'signature' of a class to be the aggregation of its
kind (class or interface), name, list of direct supertypes, the names
and types of its non-private fields, the values of its constant fields,
and the names, parameter types, return types, and exception lists of
its methods. Anything else?

4. When a source file changes, recompile, and compare the signature of the
new class to that of the old class

5. If the signature has changed, walk the inheritance tree, and build
the set of all classes which descend from the class - call this,
including the original class, the family.

6. Use the dependency table to find every class which depends on a member
of the family. Call these the friends.

7. Recompile the family and friends.

8. Repeat the analysis on the newly recompiled files; this is necessary
because changes to constant values can propagate.

If you extend javap to report constant field values, then you can use the
hash of the output of javap has a practical stand-in for a complete
signature. It's a bit oversensitive, because it will change if you add or
remove a static block, or cause the set of secret inner-class backdoor
methods to change, neither of which really change the signature.

I didn't know about ghost dependencies, so i didn't deal with those at
all. But on that subject - am i right in thinking that to build the set of
ghost dependencies, you need to know every name used by the class? If so,
doesn't that already cover this situation? CC uses the name BB.x, and
presumably you have to have an inheritance rule like the above that means
that a change to AA.x means a change to BB.x if there is no actual BB.x.

It is really bloody annoying that compile-time constants can be inlined
like this. Would it be legal for a compiler to *not* inline them? If so,
an option to javac to tell it to do that would be incredibly useful in a
situation like this.

tom
 
J

Joshua Maurice

Yeah, this is a bit weird.

Also, what the hell is that getClass call all about? I see that in code i
compile too (javac 1.6.0_16). A bit of googling reveals it's the code
generated to force a null check of a variable, and this is used in
compiling certain contortions involving inner classes. But there's no
inner class here, and there is no way in a month of sundays that the top
of stack can be null at instruction 12 - it's produced by applying dup to
the result of new, and new can never produce a null (right?). So what's it
doing?

Anyway, turning back to the initialisation of x. if you look at the
bytecode of AA, that's also weird. It has a constructor which does
iconst_1 + putfield to initialise x - but x *also* has a ConstantValue
attribute, giving it the value 1. Why both? If you write a verion of AA
where x is static, then there's only a ConstantValue, and no synthetic
clinit or anything touching it. Or instead make it non-final, and of
course it keeps the constructor but loses the ConstantValue.

The good news is that it looks like you can detect 'silently inlinable'
variables by the presence of a ConstantValue attribute. The bad news is
that javac does seem to be violating the VM spec (AIUI) here.

And on the gripping hand, you still have no way to discover the relevance
of AA from CC (the class you mention in a later post).

When i looked into this a while ago, my planned approach was:

1. Keep a table of explicit dependencies between classes (ie CC -> BB, but
    not CC -> AA)

2. Keep a tree of direct inheritance relationships, probably including
    interface implementation (ie BB -> AA)

3. Define the 'signature' of a class to be the aggregation of its
    kind (class or interface), name, list of direct supertypes, the names
    and types of its non-private fields, the values of its constant fields,
    and the names, parameter types, return types, and exception lists of
    its methods. Anything else?

4. When a source file changes, recompile, and compare the signature of the
    new class to that of the old class

5. If the signature has changed, walk the inheritance tree, and build
    the set of all classes which descend from the class - call this,
    including the original class, the family.

6. Use the dependency table to find every class which depends on a member
    of the family. Call these the friends.

7. Recompile the family and friends.

8. Repeat the analysis on the newly recompiled files; this is necessary
    because changes to constant values can propagate.

If you extend javap to report constant field values, then you can use the
hash of the output of javap has a practical stand-in for a complete
signature. It's a bit oversensitive, because it will change if you add or
remove a static block, or cause the set of secret inner-class backdoor
methods to change, neither of which really change the signature.

I didn't know about ghost dependencies, so i didn't deal with those at
all. But on that subject - am i right in thinking that to build the set of
ghost dependencies, you need to know every name used by the class? If so,
doesn't that already cover this situation? CC uses the name BB.x, and
presumably you have to have an inheritance rule like the above that means
that a change to AA.x means a change to BB.x if there is no actual BB.x.

See
http://www.jot.fm/issues/issue_2004_12/article4.pdf
for "ghost dependencies".

I don't think as presented in the paper that ghost dependencies will
catch this. Again, take the example
//AA.java
public class AA { public final int x = 1; }
//BB.java
public class BB extends AA {}
//CC.java
public class CC { public final int x = new BB().x; }

CC.java has ghost dependencies "CC", "BB", "x", aka all names in the
class file (using the Java technical definition of "name" as a single
identifier, or a list of identifiers separated by dots '.'), then get
all possible interpretations under all imports (including the implicit
import <this-package>.*;), then close over all such prefixes. (Or
something like that. The details are somewhat involved. See the
paper.)

AA.class exports the name "AA", aka the full name of the class.
BB.class exports the name "BB", aka the full name of the class.

I'm not sure offhand if there is a good way to extend ghost
dependencies to catch this case without introduces a lot of false
positives.

--
I've also given some thought as you had to maintain this list keeping
track of super classes. I'm not sure how it would interact with this
example:

//AAA.java
public class AAA { public static int aaa = 1; }
//BBB.java
public class BBB { public static AAA bbb = null; }
//CCC.java
public class CCC { public static BBB ccc = null; }
//DDD.java
public class DDD { public final int ddd = CCC.ccc.bbb.aaa; }

If we chance AAA.aaa to "public static double aaa = 2", then BBB.class
would be a noop recompile, CCC.class would be a noop recompile, but
DDD.class would need a recompile. Again, I think I would need the same
information to make this work without endless cascading; I would need
to know that DDD (directly) uses AAA. I thus think that your / my
scheme of keeping tracking of super classes would not be terribly
effective / productive.
 
A

Arne Vajhøj

I'm sorry. Perhaps I am mistaken, but I recall only discussion of
"static finals", not "constants". It does appear that "constants" is
the correct conclusion.

static final is constant in Java.

Arne
 
J

Joshua Maurice

static final is constant in Java.

Ok. Let me try again. I recall only discussions of static finals. The
problem of expanding inline "things" also happens with non-static
final fields. I do not recall any discussion previous to that post of
mine about non-static final fields being a problem. I'm sorry; I did
not know that "constant" means "static final" in Java.
 
L

lewis

Joshua Maurice said:
Ok. Let me try again. I recall only discussions of static finals. The
problem of expanding inline "things" also happens with non-static
final fields. I do not recall any discussion previous to that post of
mine about non-static final fields being a problem. I'm sorry; I did
not know that "constant" means "static final" in Java.

Strictly speaking, it doesn't. It's a really, really, really good
idea to study the language spec if you actually want to learn Java.
The relevant section is:
<http://java.sun.com/docs/books/jls/third_edition/html/
typesValues.html#10931>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Staff online

Members online

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,040
Latest member
papereejit

Latest Threads

Top