Incremental build systems, infamake


J

Joshua Maurice

I mentioned this little side project I had here quite a while ago in
comp.lang.c++. The context was me asking for an incrementally correct
build system.

To explain, let me define some terms.

A build system tool - is a distributed, customizable tool that let's
you create a build system. Ex: GNU make.

A build system - is a particular build system, usually built on top of
some build tool. This is the in-fact implementation that does your
build. GNU make is a build tool, which can be used to create a build
system by writing a makefile.

An incremental build - is a build on top of a pre-exsting build, where
there have been some source code changes between the old build and
now. Usually such builds do not rebuild everything and instead use
some "intelligence" to build less than the full clean rebuild, but
enough to get results /as if/ it did a full clean rebuild.

A correct incremental build - is an incremental build which produces
results "equivalent" to that of a hypothetical full clean rebuild.

An incrementally correct build system - is a build system which can
only produce incrementally correct builds. That is, there is no source
code changes which a developer may do, inadvertently or purposefully,
which can result in the build system doing an incrementally incorrect
build.

The key part to realize about this requirement is that the build
system scripts, such as makefiles, are themselves source code under
this model. At least, I claim that they ought to be considered as
source code for the purposes of determining if your incremental build
system is incrementally correct.

Now, at this point, there is a blurring of lines between build system
tool and build system. When the makefiles are both source code and
build system, it becomes hard to distinguish between them. I'm merely
trying to say that I want a tool in which it is exceptionally hard, if
not impossible short of maliciousness, to "break" the build system
such that it might produce an incrementally incorrect build.

Now, why do I want this? I hate waiting for hours each day doing a
build. Yes, I know that the source code base in question has other
problems if it takes that long to build, but unfortunately I am not
the king of the company, and so I have to make due. Besides, even
spending 10 minutes on a full clean rebuild is 10 minutes
(potentially) wasted, if another better solution exists. And even more
importantly, 4 hours spent tracking down a bug which was simply the
result of a broken incremental build is 4 hours wasted.

I claim that it's not that hard. However, as far as I can tell, very
few people have attempted such a thing. In the C++ world at least,
most ended with something that was "good enough" that caught 95% of
the cases. A good example AFAIK is the Recursive Make Considered
Harmful solution, found here:
http://miller.emu.id.au/pmiller/books/rmch/
It's good, but it still misses cases. Depending on the exact details
of implementation, it won't do a rebuild if you change compiler or
linker options, such as a new preprocessor define. It won't relink an
executable if you delete one of its associated cpp source files. It
won't recompile an object file if you introduce a new header file
which hides a previously included header file on the include path.

I'm not presumptuous enough to claim that what I'm about to link to is
fully incrementally correct, but AFAIK it's much further along than
anything else I've ever found, and it has the good potential to get
there.

Do note that you have to build trust on trust. To take a side trip,
security works by trust. You have to have some trusted agent in order
to have secure communication online. In a similar token, the
incremental correctness of a build system cannot be guaranteed if
someone "accidentally" modifies some of the make source code badly,
and installs it locally. The incremental correctness of a build system
is thus contingent on some facts, like on the correctness of some core
build system executables. It would also be dependent on the
correctness of the compiler and other such things.

My aim is simply to catch as much as I can, focusing on common
developer activities, such as changing preprocessor defines, adding
files, removing files, and ultimately using source control to sync to
a new revision of the code, which will include changes of which he is
not aware, and the incremental build system ought to just "work".

So, I'm experimenting around with ideas now. My initial plan was to do
a make clone. What I have is somewhat close to a drop in replacement
to GNU make 3.81. It's missing a lot of the bells and whistles - for
example it doesn't have a job server, nor a bunch of built-in
functions. I also hate the tab syntax, and I would be loathe to add
that back in. However, it has user definable functions, the same basic
scripting language, a bunch of the built-in functions, and simple and
recursive variables.

In fact, it has enough functionality in similar that I am prepared to
even do some speed comparisons between the two, my make, infamake, and
GNU make. On a standard Linux box, infamake parses makefiles about 20
times faster (not 20%, but 20x). On a standard Linux box, infamake
also does an "all from all" build about 8 times faster (not 8%, but
8x) than GNU make. An "all from clean" build invoking g++ on basically
empty cpp files runs about just as fast with my infamake and GNU make
- stuck on IO I presume. I have included the generators of these
tests, so that other people can look at them and confirm, or point out
weaknesses in my testing approach.

While I lack a lot of the functionality of GNU make, I believe that I
have enough of it that the missing features would not noticeably
change the results of these tests.

With a little work, I could put the rule syntax with all of its evil
tabs in, and make it an actual drop-in replacement for GNU make (as
long as you don't do things like job control, vpath, etc.).

https://sourceforge.net/projects/infamake/files/

(Documentation is sorely lacking, I know. It comes with a bunch of
unit tests, which should make it clear if you spend a lot of time with
it. I'll try to add some documentation sometime soon if anyone
cares.)

So, why did I rewrite it instead of fixing GNU make? A couple of
reasons. First, it was a fun exercise. Second, I didn't want to deal
with the existing C source code. Third, I felt that the original make
way of doing incremental builds is fundamentally broken, so I wanted
to diverge a bit from that legacy.

Let me explain why on that third point, why I think the GNU make
approach is fundamentally broken. GNU make, and in fact make in
general, uses a directed acyclic graph of nodes to control which
portions of the build are skipped on an incremental build. The nodes
usually correspond to files. This works out well in a lot of cases,
but it fails spectacularly on some common developer actions. As I
mentioned before, example include adding files, deleting files,
changing build commands. None of this is well captured by looking at a
DAG where the nodes are existing files. There are some clever hacks to
get around some of these issues, namely the .d file generation of
Recursive Make Considered Harmful, but it doesn't catch all of them.
(The src/tests2 folder includes a bunch of tests I run against my own
infamake built in C++ rule to ensure that it produces correct
results.)

Moreover, this problem is even worse for Java. Fortunately /
unfortunately, my company also uses Java, and apart from Eclipse's
compiler (which is almost fully incrementally correct for Java, but
not quite, AFAIK), there isn't anything even resembling an good
attempt an incremental Java builds. Part of my motivation is to create
a build system which handles C++ and Java, and incrementally
correctly, and with fast builds. None of this is uploaded right now,
but I have a worked out implementation lying around that I just need
to merge in.

Currently, the build logic is done largely in C++. I did this for
speed, and honestly because I'm still not quite sure if there's a good
way to generalize it to something easy to use in make's scripting
language. I wanted the full power of C++ when doing the incremental C+
+ build logic. Perhaps I can fix that later. I don't know.

So where do I go form here? I don't know. Just thought I'd post this
up, as I made some pretty large claims a while ago on comp.lang.c++,
and felt I might as well back them up. Took a bit longer than expected
for legal to get back to me to open source it.

So, any comments?
 
Ad

Advertisements

I

Ian Collins

On 07/25/11 07:25 PM, Joshua Maurice wrote:

An incrementally correct build system - is a build system which can
only produce incrementally correct builds. That is, there is no source
code changes which a developer may do, inadvertently or purposefully,
which can result in the build system doing an incrementally incorrect
build.

The key part to realize about this requirement is that the build
system scripts, such as makefiles, are themselves source code under
this model. At least, I claim that they ought to be considered as
source code for the purposes of determining if your incremental build
system is incrementally correct.

Now, at this point, there is a blurring of lines between build system
tool and build system. When the makefiles are both source code and
build system, it becomes hard to distinguish between them. I'm merely
trying to say that I want a tool in which it is exceptionally hard, if
not impossible short of maliciousness, to "break" the build system
such that it might produce an incrementally incorrect build.

Makefiles aren't source code and build system, the make utility is the
build system.
Now, why do I want this? I hate waiting for hours each day doing a
build. Yes, I know that the source code base in question has other
problems if it takes that long to build, but unfortunately I am not
the king of the company, and so I have to make due. Besides, even
spending 10 minutes on a full clean rebuild is 10 minutes
(potentially) wasted, if another better solution exists. And even more
importantly, 4 hours spent tracking down a bug which was simply the
result of a broken incremental build is 4 hours wasted.

Wouldn't the time spent working on a build tool be better spent cleaning
up the code and the existing system?
I claim that it's not that hard. However, as far as I can tell, very
few people have attempted such a thing. In the C++ world at least,
most ended with something that was "good enough" that caught 95% of
the cases. A good example AFAIK is the Recursive Make Considered
Harmful solution, found here:
http://miller.emu.id.au/pmiller/books/rmch/

Flat makefiles rule!
It's good, but it still misses cases. Depending on the exact details
of implementation, it won't do a rebuild if you change compiler or
linker options, such as a new preprocessor define. It won't relink an
executable if you delete one of its associated cpp source files. It
won't recompile an object file if you introduce a new header file
which hides a previously included header file on the include path.

Good old make will cover all of those cases. All you need is a make
that maintains state.

I work for myself, so wasted time is my money wasted. I've found the
most effective way to minimise build times is to design the code and
makefile(s) to build fast. That normally means more, smaller files and
a single flat makefile. These combine to get the best performance out
of a distributed or parallel make.

If the build still isn't fast enough, I just throw more hardware at it.
 
J

Joshua Maurice

Makefiles aren't source code and build system, the make utility is the
build system.

Makefiles are source code insofaras incremental correctness is
concerned. make, the executable, in a sense, doesn't distinguish
between cpp source files and makefiles. They're both input to the
executable. Both are regularly modified by developers. Both get
checked into source control. In terms of build correctness, I see no
reason to distinguish between them.
Wouldn't the time spent working on a build tool be better spent cleaning
up the code and the existing system?

Perhaps. If the cost is amortized across a bunch of people, then no.
If someone can just take a tool, plug it in with minimal effort, and
get a lot of bang for it, then it would be worth it.
Flat makefiles rule!

Heh. Are you saying only 1 makefile? 1 makefile vs a bunch of
makefiles with GNU make include isn't that big of a difference, if at
all.
Good old make will cover all of those cases.  All you need is a make
that maintains state.

It will? I admit - if you hack the bejesus out of it as compared to
traditional solutions you find online, such as the Recursive Make
Considered Harmful solution, then yes. I would be greatly surprised if
anyone but a remote fraction of actual build systems in use would
actually do a recompile of an object file if you added a new header
file which hid a previously included header file on the include path.
I work for myself, so wasted time is my money wasted.  I've found the
most effective way to minimise build times is to design the code and
makefile(s) to build fast.  That normally means more, smaller files and
a single flat makefile.  These combine to get the best performance out
of a distributed or parallel make.

If the build still isn't fast enough, I just throw more hardware at it.

So, do you always do a full clean rebuild on every change? Presumably
no. Are you a linux guy, or windows guy, e.g. visual studios devenv?
If you're a Linux guy, I presume you use a text editor of some kind,
possibly with some cool integration with the build tools, so that you
can trigger of some sort of incremental build after doing some
changes. That is, you don't do a full clean rebuild every time. I bet
further in fact that when working on a cpp file, you just recompile
that cpp file until you get it working and done, and only then do you
do the full build - maybe from clean, maybe an incremental. Here's the
point where I want an incrementally correct build. I want to know if I
broke the other team's code. Sure - it would be nice if I had well
documented, well defined stable interfaces in order to minimize the
impact of a change and hence the time of a full clean rebuild, but
well I don't control the coding process of my company unfortunately.
 
I

Ian Collins

Makefiles are source code insofaras incremental correctness is
concerned. make, the executable, in a sense, doesn't distinguish
between cpp source files and makefiles. They're both input to the
executable. Both are regularly modified by developers. Both get
checked into source control. In terms of build correctness, I see no
reason to distinguish between them.

Sorry I wasn't clear: makefiles are source, the make utility is the
build system.
Perhaps. If the cost is amortized across a bunch of people, then no.
If someone can just take a tool, plug it in with minimal effort, and
get a lot of bang for it, then it would be worth it.

But the someone could still spend their time fixing the current mess.
Heh. Are you saying only 1 makefile? 1 makefile vs a bunch of
makefiles with GNU make include isn't that big of a difference, if at
all.

So long as the make utility can schedule the building of all the file in
the tree, then yes.
It will? I admit - if you hack the bejesus out of it as compared to
traditional solutions you find online, such as the Recursive Make
Considered Harmful solution, then yes. I would be greatly surprised if
anyone but a remote fraction of actual build systems in use would
actually do a recompile of an object file if you added a new header
file which hid a previously included header file on the include path.

Well Sun (d)make certainly does.

If I have a rule

x.o: x.cc
CC x.cc -o x -I b -I a

and a header in a, adding a header with the same name to b will trigger
a rebuild.
So, do you always do a full clean rebuild on every change?

No, I trust dmake.
Presumably
no. Are you a linux guy, or windows guy, e.g. visual studios devenv?

Mainly Solaris.
If you're a Linux guy, I presume you use a text editor of some kind,
possibly with some cool integration with the build tools, so that you
can trigger of some sort of incremental build after doing some
changes. That is, you don't do a full clean rebuild every time. I bet
further in fact that when working on a cpp file, you just recompile
that cpp file until you get it working and done, and only then do you
do the full build - maybe from clean, maybe an incremental.

No, I run an incremental build after every change in order to run the
unit tests (make for me is "make test"). That's the main reason I put
so much effort into optimising my builds.
Here's the
point where I want an incrementally correct build. I want to know if I
broke the other team's code. Sure - it would be nice if I had well
documented, well defined stable interfaces in order to minimize the
impact of a change and hence the time of a full clean rebuild, but
well I don't control the coding process of my company unfortunately.

Trust your tools. I use the same tools as the OS developers. If they
are good enough for them (and more to the point, their QA!), they are
good enough for me...
 
J

Joshua Maurice

Well Sun (d)make certainly does.

If I have a rule

x.o: x.cc
        CC x.cc -o x -I b -I a

and a header in a, adding a header with the same name to b will trigger
a rebuild.

Well, apparently I must do some more learning. I am surprised. I
wonder exactly what sort of hackery is done here. This is kind of what
I want. It would be nice if it had already been done. I will comment
later...
 
I

Ian Collins

Well, apparently I must do some more learning. I am surprised. I
wonder exactly what sort of hackery is done here. This is kind of what
I want. It would be nice if it had already been done. I will comment
later...

It's very simple, dmake keeps a state file (if asked). In this case,
the contents are:

..MAKE_VERSION: VERSION-1.0
..BUILT_LAST_MAKE_RUN:
x.o: a/x.h /lib/libc.so /lib/libm.so /opt/studio/lib/libCrun.so
/opt/studio/lib/libCstd.so
CC x.cc -o x -I b -I a

So it knows a/x.h was used in the last build and it knows the compiler
options and libraries used.
 
Ad

Advertisements

J

Joshua Maurice

Well, apparently I must do some more learning. I am surprised. I
wonder exactly what sort of hackery is done here. This is kind of what
I want. It would be nice if it had already been done. I will comment
later...

Well, indeed there it is.

http://download.oracle.com/docs/cd/E19205-01/819-5262/aeucf/index.html
3.1.6 .KEEP_STATE and Special Dependency Checking

Use the special target .KEEP_STATE to check for command dependencies
and hidden dependencies.

When the .KEEP_STATE: target is effective, make checks the command for
building a target against the state file. If the command has changed
since the last make run, make rebuilds the target.

When the .KEEP_STATE: target is effective, make reads reports from
cpp(1) and other compilation processors for any "hidden" files, such
as #include files. If the target is out of date with respect to any of
these files, make rebuilds it.

It seems to be a solution very integrated with the Sun C compiler from
what my google-fu can uncover. Specifically, it does not seem
portable. Hell, it's not even really specified what it does and does
not do. I think the above quote is from the actual manual, and I'm not
going to find better offhand.

So, I feel a little better, as I may not have been wasting my time
doing something that's already been done. Portability is a rather
stringent requirement for a build system for my company. It's also
woefully inadequate for my company as we do more than simple C
compilation. We do C++, Java, some custom code generation, an Eclipse
plugin build, and a few more odds and ends.

Still, I'm pretty impressed that they pulled that. Glad to see I'm not
the first who's wanted this.
 
I

Ian Collins

Well, apparently I must do some more learning. I am surprised. I
wonder exactly what sort of hackery is done here. This is kind of what
I want. It would be nice if it had already been done. I will comment
later...

Well, indeed there it is.

http://download.oracle.com/docs/cd/E19205-01/819-5262/aeucf/index.html
3.1.6 .KEEP_STATE and Special Dependency Checking

It seems to be a solution very integrated with the Sun C compiler from
what my google-fu can uncover. Specifically, it does not seem
portable. Hell, it's not even really specified what it does and does
not do. I think the above quote is from the actual manual, and I'm not
going to find better offhand.

It works with gcc as well, on Solaris and Linux. I use both compilers
on both platforms. If I have to deploy elsewhere, I still develop on
one of my preferred platforms, where I know incremental builds work.
So, I feel a little better, as I may not have been wasting my time
doing something that's already been done. Portability is a rather
stringent requirement for a build system for my company. It's also
woefully inadequate for my company as we do more than simple C
compilation. We do C++, Java, some custom code generation, an Eclipse
plugin build, and a few more odds and ends.

So do I, excluding (spit) Java.
 
J

Joshua Maurice

Well, indeed there it is.



It works with gcc as well, on Solaris and Linux.  I use both compilers
on both platforms.  If I have to deploy elsewhere, I still develop on
one of my preferred platforms, where I know incremental builds work.


So do I, excluding (spit) Java.

Know if it's open source?

Alas, my current company uses lots of Java, and most developers are
windows people.
 
N

Nobody

An incrementally correct build system - is a build system which can
only produce incrementally correct builds. That is, there is no source
code changes which a developer may do, inadvertently or purposefully,
which can result in the build system doing an incrementally incorrect
build.

The key part to realize about this requirement is that the build
system scripts, such as makefiles, are themselves source code under
this model.

"Source code" is the wrong term; the correct term is "prerequisite".
Nothing prevents you from specifying makefiles as prerequisites.

Ensuring correctness of incremental builds boils down to ensuring that the
list of prerequisites for each target is "adequate". How far you take the
concept of adequacy determines how easy or hard it is for an incremental
build to be incorrect (i.e. differ from a full build).

If you want to be pedantic, then the tools (compiler, linker, make, etc)
need to be prerequisites of the files which they generate, as do the
libraries which those tools use, and anything which influences their
behaviour. And I mean *anything*; e.g. for every prerequisite, you would
need to treat all of its parent directories as prerequisites (to deal with
renaming, changes to mounted filesystems, etc).

If you want to be really pedantic, you would have to treat the data
returned from system calls as prerequisites. For an extreme example, any C
source file which uses the __TIME__ macro will need to be rebuilt whenever
the value returned from time() changes (i.e. always).
 
Ad

Advertisements

J

Joshua Maurice

"Source code" is the wrong term; the correct term is "prerequisite".
Nothing prevents you from specifying makefiles as prerequisites.

Ensuring correctness of incremental builds boils down to ensuring that the
list of prerequisites for each target is "adequate". How far you take the
concept of adequacy determines how easy or hard it is for an incremental
build to be incorrect (i.e. differ from a full build).

If you want to be pedantic, then the tools (compiler, linker, make, etc)
need to be prerequisites of the files which they generate, as do the
libraries which those tools use, and anything which influences their
behaviour. And I mean *anything*; e.g. for every prerequisite, you would
need to treat all of its parent directories as prerequisites (to deal with
renaming, changes to mounted filesystems, etc).

If you want to be really pedantic, you would have to treat the data
returned from system calls as prerequisites. For an extreme example, any C
source file which uses the __TIME__ macro will need to be rebuilt whenever
the value returned from time() changes (i.e. always).

I don't know if you read my entire post, and I don't know if you mean
a literal "list of dependencies" or a figurative list and some more
complex handling. If you merely keep track of a list of /file/
dependencies, then this is completely insufficient. Changing command
line options won't trigger a rebuild, removing a cpp file won't delete
its corresponding object file and relink its corresponding lib or
executable, adding a new header file which hides a previously included
header file on the include path won't trigger a rebuild, etc. These
are not some obscure errors. These are things that can happen from
normal developer actions. As I said, I want to at least capture the
common actions so that a developer feels safe doing a source control
get latest, then an incremental build, and trust that things will just
work.

Apparently Sun dmake is awesome, and will do that for certain kinds of
build steps which it natively knows. If you have a different kind of
build step, you're out of luck, and if you need it on a different
platform, you are again out of luck.

I imagine that most GNU make solutions do not correctly handle all of
the above. Please, however, continue to amaze me by correcting me, and
perhaps even pointing me towards a portable solution which I can
extend to handle custom build steps. Preferably open source. I would
be in your debt.

Oh, and finally, I forgot another motivation that made me want to do
this. The make way of doing things is whenever it encounters an out of
date file, it will rebuild that file, and it will rebuild all of its
direct and transitive dependencies, unconditionally. This
unconditional cascading rebuild may be fine for the C and C++
compilation model, but it is completely inadequate for the Java
compilation model, which I also want to handle in the same tool. Ex:
A.java uses directly B.java
B.java uses directly C.java
A.java does not use directly C.java

A change to C.java requires a rebuild of B.java, but it does not
require a rebuild of A.java (usually). You need something more
interesting than an unconditional cascading rebuild to make this work
out right while doing the incremental thing and trying to skip
unnecessary parts of the build.
 
J

Jorgen Grahn

["Followup-To:" header set to comp.software.config-mgmt.]

I mentioned this little side project I had here quite a while ago in
comp.lang.c++. The context was me asking for an incrementally correct
build system.

To explain, let me define some terms.
....

Now, at this point, there is a blurring of lines between build system
tool and build system. When the makefiles are both source code and
build system, it becomes hard to distinguish between them. I'm merely
trying to say that I want a tool in which it is exceptionally hard, if
not impossible short of maliciousness, to "break" the build system
such that it might produce an incrementally incorrect build.

Now, why do I want this? I hate waiting for hours each day doing a
build. Yes, I know that the source code base in question has other
problems if it takes that long to build, but unfortunately I am not
the king of the company, and so I have to make due.
....

My aim is simply to catch as much as I can, focusing on common
developer activities, such as changing preprocessor defines, adding
files, removing files, and ultimately using source control to sync to
a new revision of the code, which will include changes of which he is
not aware, and the incremental build system ought to just "work".

So, I'm experimenting around with ideas now. My initial plan was to do
a make clone. What I have is somewhat close to a drop in replacement
to GNU make 3.81. It's missing a lot of the bells and whistles - for
example it doesn't have a job server, nor a bunch of built-in
functions. ....

So, why did I rewrite it instead of fixing GNU make? A couple of
reasons. First, it was a fun exercise. Second, I didn't want to deal
with the existing C source code. Third, I felt that the original make
way of doing incremental builds is fundamentally broken, so I wanted
to diverge a bit from that legacy.

Let me explain why on that third point, why I think the GNU make
approach is fundamentally broken. GNU make, and in fact make in
general, uses a directed acyclic graph of nodes to control which
portions of the build are skipped on an incremental build. ....
Moreover, this problem is even worse for Java. ....

So where do I go form here? I don't know. Just thought I'd post this
up, as I made some pretty large claims a while ago on comp.lang.c++,
and felt I might as well back them up. Took a bit longer than expected
for legal to get back to me to open source it.

So, any comments?

You describe the problem well, and I agree there is a problem. Since
the last time we discussed this on Usenet I've been exposed to even
worse failed build systems, evolved over the years into such a mess
that I don't think it's possible to fix.

But I fail to see the solution. Are you saying you've "only" come as
far as creating a make clone, and now you'll use it as a base for
further hacking? What is your design for fixing the problem? I
couldn't see it mentioned above, but perhaps I missed it?

/Jorgen
 
J

Jorgen Grahn

Well, apparently I must do some more learning. I am surprised. I
wonder exactly what sort of hackery is done here. This is kind of what
I want. It would be nice if it had already been done. I will comment
later...

Well, indeed there it is.

http://download.oracle.com/docs/cd/E19205-01/819-5262/aeucf/index.html
3.1.6 .KEEP_STATE and Special Dependency Checking ....

Still, I'm pretty impressed that they pulled that. Glad to see I'm not
the first who's wanted this.

For a while earlier this year, I had the idea to:

- use a Makefile with no sane dependency information
- make a clean build wrapped in strace -f (i.e. catch system calls
like open(2) in Make itself and recursively into the compiler and
other tools called from it
- parse the strace output and generate dependencies from it

I think this is perfectly doable, and it shouldn't have to be
compiler-dependant: for one rule, make forks off a process, and that
process -- with its children - reads N different files to create M
other files.

The tricky part (which made me abandon the project) is:

- compilers like g++ work as a set of cooperating processes which pass
data between them, write temporary files and so on. It's not as
simple as seeing foo.c and a number of header files being opened for
reading, and foo.o opened for writing.

- you'd have to track chdir(2) calls

- probably other things I've forgotten

Perhaps this is how dmake does it too.

/Jorgen
 
J

Joshua Maurice

Well, indeed there it is.

For a while earlier this year, I had the idea to:

- use a Makefile with no sane dependency information
- make a clean build wrapped in strace -f (i.e. catch system calls
  like open(2) in Make itself and recursively into the compiler and
  other tools called from it
- parse the strace output and generate dependencies from it

I think this is perfectly doable, and it shouldn't have to be
compiler-dependant: for one rule, make forks off a process, and that
process -- with its children - reads N different files to create M
other files.

The tricky part (which made me abandon the project) is:

- compilers like g++ work as a set of cooperating processes which pass
  data between them, write temporary files and so on. It's not as
  simple as seeing foo.c and a number of header files being opened for
  reading, and foo.o opened for writing.

- you'd have to track chdir(2) calls

- probably other things I've forgotten

Perhaps this is how dmake does it too.

I doubt it. I'd be willing to bet that dmake does it by seeing that
the target is an .o file, that the command will build that .o file,
and does some /very/ specific hackery whereby it passes additional
options to the compiler to get full dependency information, like gcc -
MD. In other words, I strongly suspect that it has some very specific
hacks when it sees a C/C++ compiler call, and for nothing else. Maybe
another built-in hack for linking. In other words, I doubt they've
pursued a "general purpose" strategy that will work for all possible
build steps and build kinds.

In fact, they could not have. That's the beauty, and pain, of doing an
incrementally correct build system. In order to get C++ compilation
correct, you must understand in detail how it is done, because of the
include path. Because of the include path, the list of dependencies
can change from build to build, specifically my above example of a
newly created header hiding a previously included header on the
include path. This search path, because that's what it is, requires
special handling. Java compilation has its own search path, the class
path, which has its own specific and interesting rules. (Much harder
to work with rules, but I've managed something.)

Simply knowing what files it happened to use on the previous build is
not good enough. You need to know what files it /might/ have used, so
that if one of those "might" files gets created, then you know that in
the next build you need to rebuild that file.

Using wildcards is another kind of "search path" problem. The example
I used above is if you delete a cpp file, this ought to delete its
object file, and that ought to trigger a relink. I imagine a lot of
build systems won't do that on an incremental build.

I've since given up on trying a general purpose approach. I'm still
thinking in the back of my mind how I could refactor what I've done to
be general purpose in some way, but I don't see it yet.

My philosophy is pretty different than the /naive/ make solution (and
I am not claiming that most actually do this in practice). The naive
solution is to create a new rule from scratch every time you need to
build a new file. The next step is at least to use some make variables
to capture the common parts so that you can easily change compiler
options, or compilers.

My next step is to remove the user's ability to define a new rule
altogether. Instead, I envision the common developer using something
like an Ant built-in macro. He can specify the source directory, the
object directory, the output link path, if it's a shared-obj aka dll
or an executable, specify the include path, and so on. He doesn't get
to specify the make rule itself. When you do this, it centralizes all
of the crucial logic in one place so that the common developer may no
longer specify something incorrectly. Instead, the exposed "Ant macro -
like" interface will always produce an incrementally-correct build.

Again, this is my current strategy because I don't see right now an
obvious and simple way to do it in the current makefile syntax and
idioms. Instead, people need to take a lot of time getting the logic
just write, then shove that into a reusable Ant-like macro which can
be used by common developers. Common developer's don't need to unit
test their build scripts, but the person working on the implementation
of the macro will need to unit test the bejesus out of it.

In a sense, the macro implementation under my model is no longer part
of the "source code" nor "build scripts", but becomes part of the
build tool itself. Now, incremental correctness is contingent on the
correctness of the macro implementations, just like it is contingent
on the correctness of the make executable, on the correctness of the
file system, and so on.
 
J

Joshua Maurice

I doubt it. I'd be willing to bet that dmake does it by seeing that
the target is an .o file, that the command will build that .o file,
and does some /very/ specific hackery whereby it passes additional
options to the compiler to get full dependency information, like gcc -
MD. In other words, I strongly suspect that it has some very specific
hacks when it sees a C/C++ compiler call, and for nothing else. Maybe
another built-in hack for linking. In other words, I doubt they've
pursued a "general purpose" strategy that will work for all possible
build steps and build kinds.

In fact, they could not have. That's the beauty, and pain, of doing an
incrementally correct build system. In order to get C++ compilation
correct, you must understand in detail how it is done, because of the
include path. Because of the include path, the list of dependencies
can change from build to build, specifically my above example of a
newly created header hiding a previously included header on the
include path. This search path, because that's what it is, requires
special handling. Java compilation has its own search path, the class
path, which has its own specific and interesting rules. (Much harder
to work with rules, but I've managed something.)

Simply knowing what files it happened to use on the previous build is
not good enough. You need to know what files it /might/ have used, so
that if one of those "might" files gets created, then you know that in
the next build you need to rebuild that file.

Using wildcards is another kind of "search path" problem. The example
I used above is if you delete a cpp file, this ought to delete its
object file, and that ought to trigger a relink. I imagine a lot of
build systems won't do that on an incremental build.

I've since given up on trying a general purpose approach. I'm still
thinking in the back of my mind how I could refactor what I've done to
be general purpose in some way, but I don't see it yet.

My philosophy is pretty different than the /naive/ make solution (and
I am not claiming that most actually do this in practice). The naive
solution is to create a new rule from scratch every time you need to
build a new file. The next step is at least to use some make variables
to capture the common parts so that you can easily change compiler
options, or compilers.

My next step is to remove the user's ability to define a new rule
altogether. Instead, I envision the common developer using something
like an Ant built-in macro. He can specify the source directory, the
object directory, the output link path, if it's a shared-obj aka dll
or an executable, specify the include path, and so on. He doesn't get
to specify the make rule itself. When you do this, it centralizes all
of the crucial logic in one place so that the common developer may no
longer specify something incorrectly. Instead, the exposed "Ant macro -
like" interface will always produce an incrementally-correct build.

Again, this is my current strategy because I don't see right now an
obvious and simple way to do it in the current makefile syntax and
idioms. Instead, people need to take a lot of time getting the logic
just write, then shove that into a reusable Ant-like macro which can
be used by common developers. Common developer's don't need to unit
test their build scripts, but the person working on the implementation
of the macro will need to unit test the bejesus out of it.

In a sense, the macro implementation under my model is no longer part
of the "source code" nor "build scripts", but becomes part of the
build tool itself. Now, incremental correctness is contingent on the
correctness of the macro implementations, just like it is contingent
on the correctness of the make executable, on the correctness of the
file system, and so on.

In fact, I would be willing to bet that dmake does almost exactly what
I want to do, except it has almost undocumented hacks that work with
existing make rules, whereas I would expose a new interface, something
like an Ant built-in macro. Currently infamake has this macro look
like any other make built-in function to makefile code. Also, in fact,
the implementation of this macro, what I've called $(cpp-to-exe ), is
nothing special compared to the implementation of $(foreach ) and $
(patsubst ) - it's just another make built-in function to the make
parsing facilities.
 
Ad

Advertisements

J

Joshua Maurice

In fact, they could not have. That's the beauty, and pain, of doing an
incrementally correct build system. In order to get C++ compilation
correct, you must understand in detail how it is done, because of the
include path. Because of the include path, the list of dependencies
can change from build to build, specifically my above example of a
newly created header hiding a previously included header on the
include path. This search path, because that's what it is, requires
special handling. Java compilation has its own search path, the class
path, which has its own specific and interesting rules. (Much harder
to work with rules, but I've managed something.)

Simply knowing what files it happened to use on the previous build is
not good enough. You need to know what files it /might/ have used, so
that if one of those "might" files gets created, then you know that in
the next build you need to rebuild that file.

Oh, and the funny part is a day after I posted it, I realized my
handling of the include path is broken. It works for most cases, but
when I added a couple more test cases, it started breaking.

The problem is that AFAIK the msvc c preprocessor, and the GNU C
preprocessor, and most of the other common desktop C preprocessors,
look up include files in the following way:

For
#include "some file"
The C preprocessor will try to find that file as a relative path,
relative to the directory of the includer file - which is not
necessarily the cpp file. (I feel silly for not knowing this before,
but it's never come up.) If not found, then use the #include <some
file> rules.

For
#include <some file>
The C preprocessor will try to find that file by consulting the -I
command line options, and then consult the system include dir(s).

The example which breaks it is:
<root>/a.cpp
<root>/b/c.hpp
<root>/b/d.hpp

Suppose the command line includes "-I some_dir -I <root>/b" in that
order. Suppose that a build with the GNU gcc option -MD shows that the
compilation of a.cpp included headers c.hpp and d.hpp. The question is
will a new file in some_dir/d.hpp hide <root>/b/d.hpp? If d.hpp was
found using the #include "some file" lookup rule of being in a path
relative to the directory of the includer, then no, such files cannot
be hidden by another file on the search path. However, if d.hpp was
found using the #include <some file> rules, specifically if it was
found from the "-I <root>/b" directory, then it can be hidden and
moreover some_dir/d.hpp will in fact hide it.

The output from GNU gcc -MD is insufficient to distinguish between
these cases. I can't take a "conservative" approach either. If I
assume that it can't be hidden, then a new file may in fact hide it
and the build will be incrementally incorrect. If I assume that it can
be hidden, then the incremental build will always infer that it's out
of date, and always do the build - the GNU gcc -MD result will be the
same, so it will always rebuild, and never be up to date.

The annoying part about this is that not only do I need to know the
include paths, but I need to know how they were found, whether they
were found as a path relative to the directory of the includer, or
found from a command line include path option. In the above posted
infamake code, I used a hack that I thought was sufficient, but it is
not. To do this correctly, I need to see the include line as-is in the
source, to see the exact path specified in the C++ source, and whether
it was specified with "" or <>. That's going to be a relative pain to
do. Shouldn't be too bad with the available options of the GNU C
preprocessor. I wonder if the other preprocessors have similar
options. Otherwise I may have to find ports of GNU cpp on all systems
- that or write my own C preprocessor which sounds like loads of not-
fun.
 
N

Nobody

I don't know if you read my entire post, and I don't know if you mean
a literal "list of dependencies" or a figurative list and some more
complex handling. If you merely keep track of a list of /file/
dependencies, then this is completely insufficient. Changing command
line options won't trigger a rebuild, removing a cpp file won't delete
its corresponding object file and relink its corresponding lib or
executable, adding a new header file which hides a previously included
header file on the include path won't trigger a rebuild, etc.

As I said:

You can handle the above issues with any "make" program; you just have to
specify the correct prerequisites.

E.g. any directory which will be searched by the compiler or linker needs
to be specified as a prerequisite, as does all of its parent directories
(and also /etc/mtab; mounting a directory won't update the modification
time of the mount point). This will cause the target to be rebuilt if
files are added or removed from those directories, or if renaming
causes the directory's pathname to refer to a different directory).
However, it will do so regardless of whether the change would actually
affect the target (i.e. it's a sufficient condition rather than a
necessary one).

For data which isn't a file (this includes devices and pseudo-files such
as /proc/* whose timestamps aren't meaningful), you can achieve the
desired behavour with e.g.:

echo $ENVVAR > envvar-value.tmp
if cmp envvar-value envvar-value.tmp ; then \
rm envvar-value.tmp ; \
else \
mv envvar-value.tmp envvar-value ; \
fi

A similar approach could be used for a more accurate version of the header
search issue above, by using e.g. "gcc -E -MM ..." (list dependencies) to
generate a dependency list, and using that as a prerequisite.

then listing envvar-value as a prerequisite to any target which will
be affected by changes to $ENVVAR.
As I said, I want to at least capture the common actions so that a
developer feels safe doing a source control get latest, then an
incremental build, and trust that things will just work.

So you want make to automatically determine the prerequisites for a target?

You could do that by ptrace()ing all of the commands used to build a
target. Determining whether a target needs to be rebuilt would involve
determining whether any of the system calls involved will return different
data. But that would result in unnecessary work; just because a system
call returns different data, it doesn't automatically follow that the
target will actually change; the target may actually only use a small
portion of the data returned (e.g. a single macro from a large header file).

Also, even if the target would change, it doesn't follow that it should be
rebuilt. E.g. any source file using __TIME__ will result in an object file
which always changes, but that doesn't mean that you want to rebuild it.
A change to C.java requires a rebuild of B.java, but it does not require
a rebuild of A.java (usually). You need something more interesting than
an unconditional cascading rebuild to make this work out right while
doing the incremental thing and trying to skip unnecessary parts of the
build.

That would require a slight change to "make". Specifically, you'd need to
check whether a target's timestamp was actually changed, and only cascade
if it did. But note that this would reduce the potential for parallelism.

If you did that, then you could deal with the issue by restoring the
timestamp if the new file is "equivalent" to the old one.
 
J

Joshua Maurice

As I said:


You can handle the above issues with any "make" program; you just have to
specify the correct prerequisites.

E.g. any directory which will be searched by the compiler or linker needs
to be specified as a prerequisite, as does all of its parent directories
(and also /etc/mtab; mounting a directory won't update the modification
time of the mount point). This will cause the target to be rebuilt if
files are added or removed from those directories, or if renaming
causes the directory's pathname to refer to a different directory).
However, it will do so regardless of whether the change would actually
affect the target (i.e. it's a sufficient condition rather than a
necessary one).

Let's walk this through.

One. As soon as I add or delete a cpp file, that will change the cpp
file's directory's last modified timestamp. There will be other cpp
files in that directory, and the effective include search rules
include directory of the includer, which means that under your naive
rules, changing any cpp file will cause all of the object files of all
cpp files in that directory to be considered out of date. Similar
problems, but bigger in effect, happen whenever I add or remove a
header file. Instead of causing just a full clean rebuild of that
directory of cpp files, I've caused a full clean rebuild of basically
all cpp files downstream. I suppose this is limited to only file
creation or deletion, but this is still pretty crappy.

Two. This fails for #include "../foo.hpp", and similar stuff.

I presume that:
1- you actually haven't implemented this solution, or
2- it's not quite as incrementally correct as you think it is, and/or
it builds a fair bit more than what is required, and more than is
reasonably possible to achieve with a good incremental build system.

PS: Sorry for taking so long to reply. Stupid google groups is
apparently acting up again. I swear - I'm about to break down and get
a real newsreader and a free server / service whatever.
 
Ad

Advertisements

Joined
Feb 10, 2014
Messages
2
Reaction score
0
Apache OpenOffice seems to be providing (although not exactly "maintaining") dmake now. Check out the OpenOffice tools page (I'm not allowed to post links yet).

Also, have you looked at clearmake ? It's strictly commercial (part of IBM Rational ClearCase) and highly integrated with MVFS, but appears to be very close to what you want... Search for clearmake and choose the publib IBM HTML page for it...
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top