Incremental Java Compile

Arne Vajhøj

Agreed. Sadly, as I'm a more junior developer, not much I can do about
it for such a large codebase, a fair share of which predates C++98
standardization.

It would give more benefits for the effort to try and clean
up things.

Arne
 
Arne Vajhøj

And what if all of the code is under active development, aka new
features are being added to each layer on a weekly basis?

Still not a good excuse.

Each team/stream/whatever you call it should work on their own
component and a stable binary release of all other components.

And what if a large portion of that Java code is generated from a
model file to facilitate serialization between C++ and Java? Thus a
change to a single file would require recompiling a large amount of
generated "interface" files, which theoretically touches a large
portion of the 20,000 Java files.

Structure things better.

With a good OO model a series of related changes should not
require changes in "a large portion of 20000 Java files".

Arne
 
Joshua Maurice

Structure things better.

With a good OO model a series of related changes should not
require changes in "a large portion of 20000 Java files".

So, how do you suggest doing that when there's a code generator under
active development which generates Java code, and a large portion of
the Java code directly or indirectly works with the output of this
code generator? We model the object domains in a simple modeling
language which is then compiled to C++ and Java code to allow
serializing a description of a unit of work from the Java tools to the
C++ tools and back. Most of the infrastructure and apps work with the
output of this code generator in some form or another.

Unfortunately, one cannot fiat interfaces into being stable.
 
Mike Schilling

Arne said:
To me the entire idea is rather pointless.

The tool can not be made to work with binaries.

I think it could. At least, reducing the problem to:

* You have a complete correct build of the system
* You have a set of changes to the source since that build was done
* Do a minimal (more or less) amount of recompilation to arrive at a new
complete and correct build

is feasible.

But I don't see the point, really. It's simpler, cheaper, and more reliable
to throw hardware at the problem. Buy a fast machine to do a continuous
build and archive the last N days worth of builds. You can now fetch a
completely built system at any release level with no compilation required at
all.
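(For illustration: the reduction above boils down to taking the transitive closure of the change set over a file dependency graph. A minimal sketch, where the dependents map is a hypothetical input that a real build tool would extract from class files or its own records:)

```java
import java.util.*;

// Given a map from each file to the files that depend on it, the set of
// files needing recompilation after a change is the transitive closure
// of the changed set. This is a sketch, not any particular tool's code.
public class RebuildSet {
    static Set<String> filesToRecompile(Map<String, Set<String>> dependents,
                                        Set<String> changed) {
        Set<String> stale = new LinkedHashSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String f = work.pop();
            for (String d : dependents.getOrDefault(f, Set.of())) {
                if (stale.add(d)) {   // newly discovered stale file
                    work.push(d);
                }
            }
        }
        return stale;
    }
}
```

The hard part in practice is not this closure but keeping the dependents map correct, which is exactly where timestamp-only schemes fall down.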
 
Joshua Maurice

I think it could.  At least, reducing the problem to:

    * You have a complete correct build of the system
    * You have a set of changes to the source since that build was done
    * Do a minimal (more or less) amount of recompilation to arrive at a new
complete and correct build

is feasible.

But I don't see the point, really.  It's simpler, cheaper, and more reliable
to throw hardware at the problem.  Buy a fast machine to do a continuous
build and archive the last N days worth of builds.  You can now fetch a
completely built system at any release level with no compilation required at
all.

My company tried to do that, but I think they missed the important
part of the memo: it only works when the code is decoupled and
modular, with relatively stable, well-defined interfaces, instead of
the ~25,000-source-file mess we have now. It's made me really hate
Maven (~800 POMs and counting), though I accept there may be
situations in which it's a decent build tool.
 
Lew

So, how do you suggest doing that when there's a code generator under
active development which generates Java code, and a large portion of
the Java code directly or indirectly works with the output of this
code generator? We model the object domains in a simple modeling

Generate code into packages. Generate different parts of the project into
separate modules.

language which is then compiled to C++ and Java code to allow
serializing a description of a unit of work from the Java tools to the
C++ tools and back. Most of the infrastructure and apps work with the
output of this code generator in some form or another.

The generator can be forced to follow good practices, rather than have bad
practices use "the generator" as an excuse.

Unfortunately, one cannot fiat interfaces into being stable.

But one can *design* interfaces to be modular. Try it.
 
Mike Schilling

Joshua said:
My company tried to do that, but I think they missed the important
part of the memo: it only works when the code is decoupled and
modular, with relatively stable, well-defined interfaces, instead of
the ~25,000-source-file mess we have now.

Why doesn't it work, even with the mess you have now?
 
Roedy Green

system: A developer should be able to do any combination of the
following actions and trigger a small / minimal / incremental build,

IF you use Ant, you don't need to bother with this. The time in a
traditional compile is mostly loading javac. With Ant it gets
loaded only once. Further, javac looks at the dates of *.java and
*.class files and avoids most unnecessary recompilation.

Compiling is almost inconsequential. Building Jars and Zips takes
much more of the time.

See http://mindprod.com/jgloss/ant.html
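(The date check described above amounts to a per-source-file timestamp comparison, roughly like this sketch; the file layout is an assumption for illustration:)

```java
import java.io.File;

// Roughly the staleness test a timestamp-based build applies per source
// file: recompile when the .class file is missing or older than the
// .java file. Note this is exactly the check that misses indirect
// dependencies, which is the complaint elsewhere in this thread.
public class Staleness {
    static boolean needsRecompile(File javaFile, File classFile) {
        return !classFile.exists()
            || classFile.lastModified() < javaFile.lastModified();
    }
}
```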
 
Joshua Maurice

Generate code into packages.  Generate different parts of the project into
separate modules.


The generator can be forced to follow good practices, rather than have bad
practices use "the generator" as an excuse.

So, I ask again: what if the generator changes, which it does
"somewhat" frequently? I'd like to do a build in that case. The
generator is an example of what ties all of the code together, though
there are a couple more things. What is good practice for the
generator: never change? Well, ideally yes, but that's beyond my
control.

But one can *design* interfaces to be modular.  Try it.

Again, I do not hold sufficient sway, and we're dealing with a product
with a code level published API which wasn't well designed, so we've
coded ourselves into a corner, so to speak.
 
Joshua Maurice

IF you use Ant, you don't need to bother with this.  The time in a
traditional compile is mostly loading javac.  With Ant it gets
loaded only once.  Further, javac looks at the dates of *.java and
*.class files and avoids most unnecessary recompilation.

Did you even read any of my other posts in this thread? Ant's
incremental compile is woefully incorrect, so incorrect as to be near
useless on an automated build machine. As a developer, I would rather
take the extra 10 min - 1.5 hours to do a full clean build to not have
to debug bizarre obscure issues which result from a clean build.
There's nothing quite like debugging a system in which you have
inconsistent dlls / jars for a day straight; it's quite aggravating.

Compiling is almost inconsequential.  Building Jars and Zips takes
much more of the time.

Do you actually have timing numbers for any of this? I rewrote an
ant / make system to load Sun's tools.jar and invoke javac through
its Java interface, so javac is loaded into memory just once, as with
Ant. The full clean compilation of a small portion of my product,
~3000 files, took ~3 minutes, whereas a separate build invocation to
produce the jars from no jars took ~15 seconds (5-8 sec of which is
just reading in the build script files aka makefiles, stat-ing files,
checking dependencies, etc.). It seems the conventional wisdom is
quite wrong here: making jars is actually quite quick. Well, it's at
least quick if you turn off compression with the "0" flag to jar, as
you should during development.
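(For illustration, a minimal sketch of the in-process approach described above, using the standard javax.tools API, which is the supported way to call into the compiler that tools.jar exposes; the class name here is made up:)

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Invoke javac in-process so the compiler is loaded once per JVM rather
// than once per fork. Sketch only; a real build would also capture and
// report diagnostics.
public class InProcessCompile {
    static int compile(String... args) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("running on a JRE, not a JDK");
        }
        // Same exit-code convention as the command-line tool: 0 on success.
        return javac.run(null, null, null, args);
    }
}
```

And for the jar side, `jar c0f app.jar -C classes .` stores entries uncompressed (the `0` flag), which is what makes packaging cheap during development.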
 
Joshua Maurice

As a developer, I would rather
take the extra 10 min - 1.5 hours to do a full clean build to not have
to debug bizarre obscure issues which result from a clean build.

Ack. Typo. It should read "[...] which result from an *inconsistent*
build."
 
Joshua Maurice

Why doesn't it work, even with the mess you have now?

~3-7 hours turnaround time for any change on the build machine. Double
that for the common developer machine. That really hurts
productivity.

The whole thing is a mess, and some degree of componentization is
required, and is being done, but that's still no excuse to have a 30
min compile time for developers for a clean build when they could have
5 seconds + minimal rebuild time. Dittos for the automated build
machine.
 
Mike Schilling

Joshua said:
~3-7 hours turnaround time for any change on the build machine. Double
that for the common developer machine. That really hurts
productivity.

The whole thing is a mess, and some degree of componentization is
required, and is being done, but that's still no excuse to have a 30
min compile time for developers for a clean build when they could have
5 seconds + minimal rebuild time. Dittos for the automated build
machine.

Sorry, still confused. Is the recompilation time 30 min. or 3-7 hours?
 
Joshua Maurice

Sorry, still confused. Is the recompilation time 30 min. or 3-7 hours?

Well, the 30 min compile is only for the hypothetical situation after
a "realistic" level of componentization, and depending on the level of
tests run.

Currently ~145 min compile and package, no tests, on the automated
build machine. 188 min more for the standard regression / acceptance /
integration test suite. Some of the tests are currently distributed
across several automated build machines, with the longest suite at 87
min. Double those times, or thereabouts, for a lower end developer
computer. Any change requires a full clean build, as we have no
incrementally correct build, and it has not been componentized into
smaller chunks. For example, the serialization framework
implementation changes somewhat frequently, which affects a lot of the
code, such as the file persistence, database persistence, engine, and
GUI "components".

Throwing more hardware at the tests is easy for the automated build
machine(s). Throwing more hardware at the compile for the automated
build is hard. Throwing more hardware at it for the developer is
really hard, and really expensive in cash. (I can't imagine a quick
solution to giving each developer 5-10 computers, and the
maintenance nightmare of trying to have them all maintain their own
build farm.)
 
Mike Schilling

Joshua said:
Currently ~145 min compile and package, no tests, on the automated
build machine. 188 min more for the standard regression / acceptance /
integration test suite. Some of the tests are currently distributed
across several automated build machines, with the longest suite at 87
min. Double those times, or thereabouts, for a lower end developer
computer. Any change requires a full clean build, as we have no
incrementally correct build, and it has not been componentized into
smaller chunks. For example, the serialization framework
implementation changes somewhat frequently, which affects a lot of the
code, such as the file persistence, database persistence, engine, and
GUI "components".

Does the serialization framework change often? That would be horrific, and
there's probably nothing to be done to improve the build cost when it does.
But I presume that it changes as the result of some feature being added, so
that can be mitigated by not checking the change into source control until
the feature (or better yet, set of features) is complete.

Also, developers are usually good at optimizing their own work. If a
developer is adding new classes or changing implementation rather than
interface, there's no need to recompile the world. Even when changing
interfaces, the developer usually has a good idea of which bits of the
system use those interfaces, and can recompile just those parts.

Anyway, I'd suggest:

1. Invest in a good SCM system, one that handles multiple branches and
shared branches well.
2. Encourage developers to stay isolated, rather than integrating often and
updating other developers' changes often.
3. Do a continuous build that allows developers to grab the most recent
complete, tested code, so they can recompile only the code they have checked
out and the code that depends on it. Throw lots of hardware at this, so
that failures are found early.
 
Lew

Joshua said:
Did you even read any of my other posts in this thread? Ant's
incremental compile is woefully incorrect, so incorrect as to be near
useless on an automated build machine. As a developer, I would rather

That's a damn snarky tone to take with Roedy, who was just giving you good
advice, especially considering how you keep blaming Ant, Java and everything
else when it's clear from your own admission that it's your own process that's
at fault, as you keep throwing back at us every time someone makes a useful
suggestion.

It's not the tools' fault, it's your'n.

take the extra 10 min - 1.5 hours to do a full clean build to not have
to debug bizarre obscure issues which result from a clean build.
There's nothing quite like debugging a system in which you have
inconsistent dlls / jars for a day straight; it's quite aggravating.

So fix your system and quit whining about it.
 
Joshua Maurice

That's a damn snarky tone to take with Roedy, who was just giving you good
advice, especially considering how you keep blaming Ant, Java and everything
else when it's clear from your own admission that it's your own process that's
at fault, as you keep throwing back at us every time someone makes a useful
suggestion.

It's snarky because of what I consider to be this near absurd level of
deference given to the tools. If this was any other piece of software,
and there was a product out there which ran 10x to 100x faster, it
would be a no-brainer which to use. Instead, I see far too many
people say "Meh. Just do a clean build. It's not that bad."

It's not the tools' fault, it's your'n.

No. If you read my posts, you would know that I blame both process and
tool. The best the process could do is divide the build into more
manageable chunks, but as a developer I would still have to spend an
hour or so waiting on a build when most of the work is extraneous.

So fix your system and quit whining about it.

I am fixing it. I am not whining. I was asking for help on how to do
it. I have asked for real solutions to the real problems I am facing
writing it, such as how to get a list of class files per compiled java
file as if I called javac once per java file in the dir.
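(For the curious, a sketch of one way to get that source-to-class-files mapping with the standard javax.tools API: wrap the file manager and record every class file the compiler opens for output. The class and method names here are illustrative; javac normally passes the originating source file as the "sibling" argument, which is what this relies on.)

```java
import java.io.IOException;
import java.util.*;
import javax.tools.*;

// Compile the given sources and record, per source file, the class files
// (including nested/anonymous classes) the compiler emitted for it.
public class ClassFileRecorder {
    public static Map<String, List<String>> compileAndRecord(
            List<String> options, List<String> sources) throws IOException {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        Map<String, List<String>> produced = new TreeMap<>();
        try (StandardJavaFileManager std =
                 javac.getStandardFileManager(null, null, null)) {
            JavaFileManager recording =
                new ForwardingJavaFileManager<StandardJavaFileManager>(std) {
                    @Override
                    public JavaFileObject getJavaFileForOutput(
                            JavaFileManager.Location location, String className,
                            JavaFileObject.Kind kind, FileObject sibling)
                            throws IOException {
                        if (kind == JavaFileObject.Kind.CLASS && sibling != null) {
                            // sibling is (normally) the originating source file
                            produced.computeIfAbsent(sibling.getName(),
                                k -> new ArrayList<>()).add(className);
                        }
                        return super.getJavaFileForOutput(
                            location, className, kind, sibling);
                    }
                };
            Iterable<? extends JavaFileObject> units =
                std.getJavaFileObjectsFromStrings(sources);
            javac.getTask(null, recording, null, options, null, units).call();
        }
        return produced;
    }
}
```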
 
Joshua Maurice

Does the serialization framework change often?  That would be horrific, and
there's probably nothing to be done to improve the build cost when it does.
But I presume that it changes as the result of some feature being added, so
that can be mitigated by not checking the change into source control until
the feature (or better yet, set of features) is complete.

I wish I knew. I just got an email today from the serialization team
asking "What's with this error?" I "hacked" the C++ Maven plugin to
report "<> has detected Visual Studio warning <>, deletion of a
pointer to an incomplete type. This is formally undefined behavior
according to the C++ spec. Fix it." Apparently changes are still
ongoing.

Also, developers are usually good at optimizing their own work.  If a
developer is adding new classes or changing implementation rather than
interface, there's no need to recompile the world.  Even when changing
interfaces, the developer usually has a good idea of which bits of the
system use those interfaces, and can recompile just those parts.

As such a developer, perhaps, but when I mess up, I break the
mainline build, and because the build on the automated build machine,
or on a private Perforce branch build machine, can take the better
part of a day, it's sometimes hard to isolate who broke it; when the
mainline is broken, this leaves people in a bind. Currently we lock
the mainline on such events. Rollback is possible; devops is floating
that idea around at the moment.

Anyway, I'd suggest:

1. Invest in a good SCM system, one that handles multiple branches and
shared branches well.

Done. Perforce is awesome, for the record.

2. Encourage developers to stay isolated, rather than integrating often and
updating other developers' changes often.

Sounds like integration hell. We do have separate teams working on
their own little view for weeks or a month or two on end, and each
team has their own private branch in perforce which is integrated
roughly weekly with mainline.

3. Do a continuous build that allows developers to grab the most recent
complete, tested code, so they can recompile only the code they have checked
out and the code that depends on it.  Throw lots of hardware at this, so
that failures are found early.

Also done.

The problem is that it's not helping. It's way too much code, way too
many tests, taking way too long to build.
 
Arne Vajhøj

Did you even read any of my other posts in this thread?

Most likely not.

Ant's
incremental compile is woefully incorrect, so incorrect as to be near
useless on an automated build machine.

Ant is very useful for automated builds.

But it is common practice to clean and rebuild.

Your project structure is just not suited for this.

Arne
 
Arne Vajhøj

So, how do you suggest doing that when there's a code generator under
active development which generates Java code, and a large portion of
the Java code directly or indirectly works with the output of this
code generator? We model the object domains in a simple modeling
language which is then compiled to C++ and Java code to allow
serializing a description of a unit of work from the Java tools to the
C++ tools and back. Most of the infrastructure and apps work with the
output of this code generator in some form or another.

Unfortunately, one cannot fiat interfaces into being stable.

You are working on fixing the symptoms not the problem.

Something is horribly wrong with the object model if
so many classes change all the time.

If you fix that problem (better requirements, more time
spent designing before coding, or whatever is necessary), then
you will be much better off.

Arne
 
