Incremental build systems, infamake

Discussion in 'C++' started by Joshua Maurice, Jul 25, 2011.

  1. I mentioned this little side project I had here quite a while ago in
    comp.lang.c++. The context was me asking for an incrementally correct
    build system.

    To explain, let me define some terms.

    A build system tool - is a distributed, customizable tool that let's
    you create a build system. Ex: GNU make.

    A build system - is a particular build system, usually built on top of
    some build tool. This is the in-fact implementation that does your
    build. GNU make is a build tool, which can be used to create a build
    system by writing a makefile.

    An incremental build - is a build on top of a pre-exsting build, where
    there have been some source code changes between the old build and
    now. Usually such builds do not rebuild everything and instead use
    some "intelligence" to build less than the full clean rebuild, but
    enough to get results /as if/ it did a full clean rebuild.

    A correct incremental build - is an incremental build which produces
    results "equivalent" to that of a hypothetical full clean rebuild.

    An incrementally correct build system - is a build system which can
    only produce incrementally correct builds. That is, there is no source
    code changes which a developer may do, inadvertently or purposefully,
    which can result in the build system doing an incrementally incorrect

    The key part to realize about this requirement is that the build
    system scripts, such as makefiles, are themselves source code under
    this model. At least, I claim that they ought to be considered as
    source code for the purposes of determining if your incremental build
    system is incrementally correct.

    Now, at this point, there is a blurring of lines between build system
    tool and build system. When the makefiles are both source code and
    build system, it becomes hard to distinguish between them. I'm merely
    trying to say that I want a tool in which it is exceptionally hard, if
    not impossible short of maliciousness, to "break" the build system
    such that it might produce an incrementally incorrect build.

    Now, why do I want this? I hate waiting for hours each day doing a
    build. Yes, I know that the source code base in question has other
    problems if it takes that long to build, but unfortunately I am not
    the king of the company, and so I have to make due. Besides, even
    spending 10 minutes on a full clean rebuild is 10 minutes
    (potentially) wasted, if another better solution exists. And even more
    importantly, 4 hours spent tracking down a bug which was simply the
    result of a broken incremental build is 4 hours wasted.

    I claim that it's not that hard. However, as far as I can tell, very
    few people have attempted such a thing. In the C++ world at least,
    most ended with something that was "good enough" that caught 95% of
    the cases. A good example AFAIK is the Recursive Make Considered
    Harmful solution, found here:
    It's good, but it still misses cases. Depending on the exact details
    of implementation, it won't do a rebuild if you change compiler or
    linker options, such as a new preprocessor define. It won't relink an
    executable if you delete one of its associated cpp source files. It
    won't recompile an object file if you introduce a new header file
    which hides a previously included header file on the include path.

    I'm not presumptuous enough to claim that what I'm about to link to is
    fully incrementally correct, but AFAIK it's much further along than
    anything else I've ever found, and it has the good potential to get

    Do note that you have to build trust on trust. To take a side trip,
    security works by trust. You have to have some trusted agent in order
    to have secure communication online. In a similar token, the
    incremental correctness of a build system cannot be guaranteed if
    someone "accidentally" modifies some of the make source code badly,
    and installs it locally. The incremental correctness of a build system
    is thus contingent on some facts, like on the correctness of some core
    build system executables. It would also be dependent on the
    correctness of the compiler and other such things.

    My aim is simply to catch as much as I can, focusing on common
    developer activities, such as changing preprocessor defines, adding
    files, removing files, and ultimately using source control to sync to
    a new revision of the code, which will include changes of which he is
    not aware, and the incremental build system ought to just "work".

    So, I'm experimenting around with ideas now. My initial plan was to do
    a make clone. What I have is somewhat close to a drop in replacement
    to GNU make 3.81. It's missing a lot of the bells and whistles - for
    example it doesn't have a job server, nor a bunch of built-in
    functions. I also hate the tab syntax, and I would be loathe to add
    that back in. However, it has user definable functions, the same basic
    scripting language, a bunch of the built-in functions, and simple and
    recursive variables.

    In fact, it has enough functionality in similar that I am prepared to
    even do some speed comparisons between the two, my make, infamake, and
    GNU make. On a standard Linux box, infamake parses makefiles about 20
    times faster (not 20%, but 20x). On a standard Linux box, infamake
    also does an "all from all" build about 8 times faster (not 8%, but
    8x) than GNU make. An "all from clean" build invoking g++ on basically
    empty cpp files runs about just as fast with my infamake and GNU make
    - stuck on IO I presume. I have included the generators of these
    tests, so that other people can look at them and confirm, or point out
    weaknesses in my testing approach.

    While I lack a lot of the functionality of GNU make, I believe that I
    have enough of it that the missing features would not noticeably
    change the results of these tests.

    With a little work, I could put the rule syntax with all of its evil
    tabs in, and make it an actual drop-in replacement for GNU make (as
    long as you don't do things like job control, vpath, etc.).

    (Documentation is sorely lacking, I know. It comes with a bunch of
    unit tests, which should make it clear if you spend a lot of time with
    it. I'll try to add some documentation sometime soon if anyone

    So, why did I rewrite it instead of fixing GNU make? A couple of
    reasons. First, it was a fun exercise. Second, I didn't want to deal
    with the existing C source code. Third, I felt that the original make
    way of doing incremental builds is fundamentally broken, so I wanted
    to diverge a bit from that legacy.

    Let me explain why on that third point, why I think the GNU make
    approach is fundamentally broken. GNU make, and in fact make in
    general, uses a directed acyclic graph of nodes to control which
    portions of the build are skipped on an incremental build. The nodes
    usually correspond to files. This works out well in a lot of cases,
    but it fails spectacularly on some common developer actions. As I
    mentioned before, example include adding files, deleting files,
    changing build commands. None of this is well captured by looking at a
    DAG where the nodes are existing files. There are some clever hacks to
    get around some of these issues, namely the .d file generation of
    Recursive Make Considered Harmful, but it doesn't catch all of them.
    (The src/tests2 folder includes a bunch of tests I run against my own
    infamake built in C++ rule to ensure that it produces correct

    Moreover, this problem is even worse for Java. Fortunately /
    unfortunately, my company also uses Java, and apart from Eclipse's
    compiler (which is almost fully incrementally correct for Java, but
    not quite, AFAIK), there isn't anything even resembling an good
    attempt an incremental Java builds. Part of my motivation is to create
    a build system which handles C++ and Java, and incrementally
    correctly, and with fast builds. None of this is uploaded right now,
    but I have a worked out implementation lying around that I just need
    to merge in.

    Currently, the build logic is done largely in C++. I did this for
    speed, and honestly because I'm still not quite sure if there's a good
    way to generalize it to something easy to use in make's scripting
    language. I wanted the full power of C++ when doing the incremental C+
    + build logic. Perhaps I can fix that later. I don't know.

    So where do I go form here? I don't know. Just thought I'd post this
    up, as I made some pretty large claims a while ago on comp.lang.c++,
    and felt I might as well back them up. Took a bit longer than expected
    for legal to get back to me to open source it.

    So, any comments?
    Joshua Maurice, Jul 25, 2011
    1. Advertisements

  2. Joshua Maurice

    Ian Collins Guest

    On 07/25/11 07:25 PM, Joshua Maurice wrote:

    Makefiles aren't source code and build system, the make utility is the
    build system.
    Wouldn't the time spent working on a build tool be better spent cleaning
    up the code and the existing system?
    Flat makefiles rule!
    Good old make will cover all of those cases. All you need is a make
    that maintains state.

    I work for myself, so wasted time is my money wasted. I've found the
    most effective way to minimise build times is to design the code and
    makefile(s) to build fast. That normally means more, smaller files and
    a single flat makefile. These combine to get the best performance out
    of a distributed or parallel make.

    If the build still isn't fast enough, I just throw more hardware at it.
    Ian Collins, Jul 25, 2011
    1. Advertisements

  3. Makefiles are source code insofaras incremental correctness is
    concerned. make, the executable, in a sense, doesn't distinguish
    between cpp source files and makefiles. They're both input to the
    executable. Both are regularly modified by developers. Both get
    checked into source control. In terms of build correctness, I see no
    reason to distinguish between them.
    Perhaps. If the cost is amortized across a bunch of people, then no.
    If someone can just take a tool, plug it in with minimal effort, and
    get a lot of bang for it, then it would be worth it.
    Heh. Are you saying only 1 makefile? 1 makefile vs a bunch of
    makefiles with GNU make include isn't that big of a difference, if at
    It will? I admit - if you hack the bejesus out of it as compared to
    traditional solutions you find online, such as the Recursive Make
    Considered Harmful solution, then yes. I would be greatly surprised if
    anyone but a remote fraction of actual build systems in use would
    actually do a recompile of an object file if you added a new header
    file which hid a previously included header file on the include path.
    So, do you always do a full clean rebuild on every change? Presumably
    no. Are you a linux guy, or windows guy, e.g. visual studios devenv?
    If you're a Linux guy, I presume you use a text editor of some kind,
    possibly with some cool integration with the build tools, so that you
    can trigger of some sort of incremental build after doing some
    changes. That is, you don't do a full clean rebuild every time. I bet
    further in fact that when working on a cpp file, you just recompile
    that cpp file until you get it working and done, and only then do you
    do the full build - maybe from clean, maybe an incremental. Here's the
    point where I want an incrementally correct build. I want to know if I
    broke the other team's code. Sure - it would be nice if I had well
    documented, well defined stable interfaces in order to minimize the
    impact of a change and hence the time of a full clean rebuild, but
    well I don't control the coding process of my company unfortunately.
    Joshua Maurice, Jul 25, 2011
  4. Joshua Maurice

    Ian Collins Guest

    Sorry I wasn't clear: makefiles are source, the make utility is the
    build system.
    But the someone could still spend their time fixing the current mess.
    So long as the make utility can schedule the building of all the file in
    the tree, then yes.
    Well Sun (d)make certainly does.

    If I have a rule

    CC -o x -I b -I a

    and a header in a, adding a header with the same name to b will trigger
    a rebuild.
    No, I trust dmake.
    Mainly Solaris.
    No, I run an incremental build after every change in order to run the
    unit tests (make for me is "make test"). That's the main reason I put
    so much effort into optimising my builds.
    Trust your tools. I use the same tools as the OS developers. If they
    are good enough for them (and more to the point, their QA!), they are
    good enough for me...
    Ian Collins, Jul 25, 2011
  5. Well, apparently I must do some more learning. I am surprised. I
    wonder exactly what sort of hackery is done here. This is kind of what
    I want. It would be nice if it had already been done. I will comment
    Joshua Maurice, Jul 25, 2011
  6. Joshua Maurice

    Ian Collins Guest

    It's very simple, dmake keeps a state file (if asked). In this case,
    the contents are:

    x.o: a/x.h /lib/ /lib/ /opt/studio/lib/
    CC -o x -I b -I a

    So it knows a/x.h was used in the last build and it knows the compiler
    options and libraries used.
    Ian Collins, Jul 25, 2011
  7. Well, indeed there it is.
    It seems to be a solution very integrated with the Sun C compiler from
    what my google-fu can uncover. Specifically, it does not seem
    portable. Hell, it's not even really specified what it does and does
    not do. I think the above quote is from the actual manual, and I'm not
    going to find better offhand.

    So, I feel a little better, as I may not have been wasting my time
    doing something that's already been done. Portability is a rather
    stringent requirement for a build system for my company. It's also
    woefully inadequate for my company as we do more than simple C
    compilation. We do C++, Java, some custom code generation, an Eclipse
    plugin build, and a few more odds and ends.

    Still, I'm pretty impressed that they pulled that. Glad to see I'm not
    the first who's wanted this.
    Joshua Maurice, Jul 25, 2011
  8. Joshua Maurice

    Ian Collins Guest

    Ian Collins, Jul 25, 2011
    Joshua Maurice, Jul 25, 2011
  10. Joshua Maurice

    Ian Collins Guest

    No. It almost was, but no Oracle is in charge, I doubt it ever will be.
    Ian Collins, Jul 25, 2011
  11. Joshua Maurice

    Nobody Guest

    "Source code" is the wrong term; the correct term is "prerequisite".
    Nothing prevents you from specifying makefiles as prerequisites.

    Ensuring correctness of incremental builds boils down to ensuring that the
    list of prerequisites for each target is "adequate". How far you take the
    concept of adequacy determines how easy or hard it is for an incremental
    build to be incorrect (i.e. differ from a full build).

    If you want to be pedantic, then the tools (compiler, linker, make, etc)
    need to be prerequisites of the files which they generate, as do the
    libraries which those tools use, and anything which influences their
    behaviour. And I mean *anything*; e.g. for every prerequisite, you would
    need to treat all of its parent directories as prerequisites (to deal with
    renaming, changes to mounted filesystems, etc).

    If you want to be really pedantic, you would have to treat the data
    returned from system calls as prerequisites. For an extreme example, any C
    source file which uses the __TIME__ macro will need to be rebuilt whenever
    the value returned from time() changes (i.e. always).
    Nobody, Jul 25, 2011
  12. I don't know if you read my entire post, and I don't know if you mean
    a literal "list of dependencies" or a figurative list and some more
    complex handling. If you merely keep track of a list of /file/
    dependencies, then this is completely insufficient. Changing command
    line options won't trigger a rebuild, removing a cpp file won't delete
    its corresponding object file and relink its corresponding lib or
    executable, adding a new header file which hides a previously included
    header file on the include path won't trigger a rebuild, etc. These
    are not some obscure errors. These are things that can happen from
    normal developer actions. As I said, I want to at least capture the
    common actions so that a developer feels safe doing a source control
    get latest, then an incremental build, and trust that things will just

    Apparently Sun dmake is awesome, and will do that for certain kinds of
    build steps which it natively knows. If you have a different kind of
    build step, you're out of luck, and if you need it on a different
    platform, you are again out of luck.

    I imagine that most GNU make solutions do not correctly handle all of
    the above. Please, however, continue to amaze me by correcting me, and
    perhaps even pointing me towards a portable solution which I can
    extend to handle custom build steps. Preferably open source. I would
    be in your debt.

    Oh, and finally, I forgot another motivation that made me want to do
    this. The make way of doing things is whenever it encounters an out of
    date file, it will rebuild that file, and it will rebuild all of its
    direct and transitive dependencies, unconditionally. This
    unconditional cascading rebuild may be fine for the C and C++
    compilation model, but it is completely inadequate for the Java
    compilation model, which I also want to handle in the same tool. Ex: uses directly uses directly does not use directly

    A change to requires a rebuild of, but it does not
    require a rebuild of (usually). You need something more
    interesting than an unconditional cascading rebuild to make this work
    out right while doing the incremental thing and trying to skip
    unnecessary parts of the build.
    Joshua Maurice, Jul 25, 2011
  13. Joshua Maurice

    Jorgen Grahn Guest

    ["Followup-To:" header set to]

    You describe the problem well, and I agree there is a problem. Since
    the last time we discussed this on Usenet I've been exposed to even
    worse failed build systems, evolved over the years into such a mess
    that I don't think it's possible to fix.

    But I fail to see the solution. Are you saying you've "only" come as
    far as creating a make clone, and now you'll use it as a base for
    further hacking? What is your design for fixing the problem? I
    couldn't see it mentioned above, but perhaps I missed it?

    Jorgen Grahn, Jul 26, 2011
  14. Joshua Maurice

    Jorgen Grahn Guest

    Jorgen Grahn, Jul 26, 2011
    Joshua Maurice, Jul 26, 2011
  16. In fact, I would be willing to bet that dmake does almost exactly what
    I want to do, except it has almost undocumented hacks that work with
    existing make rules, whereas I would expose a new interface, something
    like an Ant built-in macro. Currently infamake has this macro look
    like any other make built-in function to makefile code. Also, in fact,
    the implementation of this macro, what I've called $(cpp-to-exe ), is
    nothing special compared to the implementation of $(foreach ) and $
    (patsubst ) - it's just another make built-in function to the make
    parsing facilities.
    Joshua Maurice, Jul 26, 2011
  17. Oh, and the funny part is a day after I posted it, I realized my
    handling of the include path is broken. It works for most cases, but
    when I added a couple more test cases, it started breaking.

    The problem is that AFAIK the msvc c preprocessor, and the GNU C
    preprocessor, and most of the other common desktop C preprocessors,
    look up include files in the following way:

    #include "some file"
    The C preprocessor will try to find that file as a relative path,
    relative to the directory of the includer file - which is not
    necessarily the cpp file. (I feel silly for not knowing this before,
    but it's never come up.) If not found, then use the #include <some
    file> rules.

    #include <some file>
    The C preprocessor will try to find that file by consulting the -I
    command line options, and then consult the system include dir(s).

    The example which breaks it is:

    Suppose the command line includes "-I some_dir -I <root>/b" in that
    order. Suppose that a build with the GNU gcc option -MD shows that the
    compilation of a.cpp included headers c.hpp and d.hpp. The question is
    will a new file in some_dir/d.hpp hide <root>/b/d.hpp? If d.hpp was
    found using the #include "some file" lookup rule of being in a path
    relative to the directory of the includer, then no, such files cannot
    be hidden by another file on the search path. However, if d.hpp was
    found using the #include <some file> rules, specifically if it was
    found from the "-I <root>/b" directory, then it can be hidden and
    moreover some_dir/d.hpp will in fact hide it.

    The output from GNU gcc -MD is insufficient to distinguish between
    these cases. I can't take a "conservative" approach either. If I
    assume that it can't be hidden, then a new file may in fact hide it
    and the build will be incrementally incorrect. If I assume that it can
    be hidden, then the incremental build will always infer that it's out
    of date, and always do the build - the GNU gcc -MD result will be the
    same, so it will always rebuild, and never be up to date.

    The annoying part about this is that not only do I need to know the
    include paths, but I need to know how they were found, whether they
    were found as a path relative to the directory of the includer, or
    found from a command line include path option. In the above posted
    infamake code, I used a hack that I thought was sufficient, but it is
    not. To do this correctly, I need to see the include line as-is in the
    source, to see the exact path specified in the C++ source, and whether
    it was specified with "" or <>. That's going to be a relative pain to
    do. Shouldn't be too bad with the available options of the GNU C
    preprocessor. I wonder if the other preprocessors have similar
    options. Otherwise I may have to find ports of GNU cpp on all systems
    - that or write my own C preprocessor which sounds like loads of not-
    Joshua Maurice, Jul 26, 2011
  18. Joshua Maurice

    Nobody Guest

    As I said:

    You can handle the above issues with any "make" program; you just have to
    specify the correct prerequisites.

    E.g. any directory which will be searched by the compiler or linker needs
    to be specified as a prerequisite, as does all of its parent directories
    (and also /etc/mtab; mounting a directory won't update the modification
    time of the mount point). This will cause the target to be rebuilt if
    files are added or removed from those directories, or if renaming
    causes the directory's pathname to refer to a different directory).
    However, it will do so regardless of whether the change would actually
    affect the target (i.e. it's a sufficient condition rather than a
    necessary one).

    For data which isn't a file (this includes devices and pseudo-files such
    as /proc/* whose timestamps aren't meaningful), you can achieve the
    desired behavour with e.g.:

    echo $ENVVAR > envvar-value.tmp
    if cmp envvar-value envvar-value.tmp ; then \
    rm envvar-value.tmp ; \
    else \
    mv envvar-value.tmp envvar-value ; \

    A similar approach could be used for a more accurate version of the header
    search issue above, by using e.g. "gcc -E -MM ..." (list dependencies) to
    generate a dependency list, and using that as a prerequisite.

    then listing envvar-value as a prerequisite to any target which will
    be affected by changes to $ENVVAR.
    So you want make to automatically determine the prerequisites for a target?

    You could do that by ptrace()ing all of the commands used to build a
    target. Determining whether a target needs to be rebuilt would involve
    determining whether any of the system calls involved will return different
    data. But that would result in unnecessary work; just because a system
    call returns different data, it doesn't automatically follow that the
    target will actually change; the target may actually only use a small
    portion of the data returned (e.g. a single macro from a large header file).

    Also, even if the target would change, it doesn't follow that it should be
    rebuilt. E.g. any source file using __TIME__ will result in an object file
    which always changes, but that doesn't mean that you want to rebuild it.
    That would require a slight change to "make". Specifically, you'd need to
    check whether a target's timestamp was actually changed, and only cascade
    if it did. But note that this would reduce the potential for parallelism.

    If you did that, then you could deal with the issue by restoring the
    timestamp if the new file is "equivalent" to the old one.
    Nobody, Jul 27, 2011
  19. Let's walk this through.

    One. As soon as I add or delete a cpp file, that will change the cpp
    file's directory's last modified timestamp. There will be other cpp
    files in that directory, and the effective include search rules
    include directory of the includer, which means that under your naive
    rules, changing any cpp file will cause all of the object files of all
    cpp files in that directory to be considered out of date. Similar
    problems, but bigger in effect, happen whenever I add or remove a
    header file. Instead of causing just a full clean rebuild of that
    directory of cpp files, I've caused a full clean rebuild of basically
    all cpp files downstream. I suppose this is limited to only file
    creation or deletion, but this is still pretty crappy.

    Two. This fails for #include "../foo.hpp", and similar stuff.

    I presume that:
    1- you actually haven't implemented this solution, or
    2- it's not quite as incrementally correct as you think it is, and/or
    it builds a fair bit more than what is required, and more than is
    reasonably possible to achieve with a good incremental build system.

    PS: Sorry for taking so long to reply. Stupid google groups is
    apparently acting up again. I swear - I'm about to break down and get
    a real newsreader and a free server / service whatever.
    Joshua Maurice, Jul 29, 2011
  20. Joshua Maurice


    Feb 10, 2014
    Likes Received:
    Apache OpenOffice seems to be providing (although not exactly "maintaining") dmake now. Check out the OpenOffice tools page (I'm not allowed to post links yet).

    Also, have you looked at clearmake ? It's strictly commercial (part of IBM Rational ClearCase) and highly integrated with MVFS, but appears to be very close to what you want... Search for clearmake and choose the publib IBM HTML page for it...
    RandallFactor, Feb 10, 2014
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.