Meta-C question about header order

Discussion in 'C Programming' started by Jens Schweikhardt, Apr 12, 2014.

  1. hello, world\n

    consider a small project of 100 C source and 100 header files.
    The coding rules require that only C files include headers;
    headers are not allowed to include other headers (not sure if
    this is 100% the "IWYU - Include What You Use" paradigm).

    The headers contain only what headers should contain:
    prototypes, typedefs, declarations, macro definitions.

    The problem: given a set of headers, determine a sequence of
    #include directives that avoids syntax errors due to undeclared
    identifiers. I.e. if "foo.h" declares type foo_t and "bar.h" uses
    foo_t in a prototype, "foo.h" must be included before "bar.h".

    I'm not a computer scientist, but it sounds as if this requires a
    topological sort of all the '"foo.h" needs "bar.h"' relations. Now the
    interesting part is: how to automate this, i.e. how to determine "this
    header declares identifiers A, B, C and requires X, Y, Z"? I looked
    at IWYU by google, but it requires a clang source tree plus some more
    hoop jumping and that's way too elephantine, so I gave up not knowing
    whether it would provide a solution.

    Can you think of a lightweight way to solve this? Maybe using perl, the
    unix tool box, make, gmake, gcc, a C lexer?

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 12, 2014
    #1

  2. Jens Schweikhardt

    Stefan Ram Guest

    If you really deem this to be »interesting«, then go ahead
    and write it.
    The common solution is, IIRC: allow every header to include
    all the headers it needs, and use include guards.

    There is also a special tool: "idep" by "stolk", hard to
    find nowadays. Also check out the Linux »tsort« command
    for topological sorts.
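    For the tsort step, here is a minimal sketch; the header names are invented,
    and Python's standard graphlib module stands in for the command-line tsort:

```python
# Minimal sketch of the topological-sort step. The header names are
# hypothetical; each entry maps a header to the headers that must be
# included before it (the '"foo.h" needs "bar.h"' relation).
from graphlib import TopologicalSorter

needs = {
    "bar.h": {"foo.h"},             # bar.h uses foo_t from foo.h
    "baz.h": {"bar.h", "foo.h"},
}

# static_order() yields prerequisites before their dependents, i.e. a
# valid #include order, and raises CycleError on a circular dependency.
order = list(TopologicalSorter(needs).static_order())
print(order)
```

    The same pairs could just as well be piped to tsort(1); the point is only
    that the sort itself is the easy part once the "A needs B" pairs exist.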
     
    Stefan Ram, Apr 12, 2014
    #2

  3. That strikes me as a stupid^H^H^H^H^H^H suboptimal rule.

    You'll have headers that depend on other headers, but without that
    relationship being expressed in the source.
    So you need to build a set of tools that would be unnecessary if you
    were allowed to have #include directives in headers.

    A not quite serious suggestion: Cheat.

    Write your headers *sanely*, with headers #including any other
    headers they need. Make sure this version of the code compiles
    and executes correctly. Now all the dependencies are explicitly
    specified in the #include directives.

    Create a tool that analyzes the *.h and *.c files and generates *new*
    ..h and .c files, where the .h files have their #include directives
    removed, and the .c file have any required #include directives
    added in the correct order.

    This only works if everyone works only on the first version of the code;
    the version with #include directives in the .c files is useful only to
    satisfy the arbitrary rule.

    A more realistic method: Rather than having #include directives in
    headers, add comments that specify the dependencies, and write a tool
    that uses those comments.
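    A sketch of that comment-driven approach, assuming an invented
    "requires:" comment marker in each header:

```python
import re

# Hypothetical convention: each header carries a comment such as
#   /* requires: foo.h bar.h */
# naming the headers that must already be included. A tool can collect
# these pairs and feed them to a topological sort.
REQUIRES = re.compile(r'/\*\s*requires:\s*([^*]+?)\s*\*/')

def required_headers(header_text):
    deps = []
    for m in REQUIRES.finditer(header_text):
        deps.extend(m.group(1).split())
    return deps

print(required_headers('/* requires: foo.h bar.h */\nvoid f(foo_t x);'))
# ['foo.h', 'bar.h']
```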

    Better yet: Drop the rule and use #include directives as needed.
     
    Keith Thompson, Apr 12, 2014
    #3
  4. Jens Schweikhardt

    Ian Collins Guest

    I agree with the other comments on the folly of this rule. Don't do it!

    Just take a look at some of your system headers, or popular open source
    library headers, and consider whether you want all of the conditional
    includes and other unnecessary nonsense in all of your source files.
     
    Ian Collins, Apr 13, 2014
    #4
    The way I've "solved" this in my PostScript interpreter source
    is to use header guards and the preprocessor's #error directive
    to describe what header needs to be included.

    If header X needs Y to be previously included, then check for
    Y's header guard and #error if not defined.

    Y.h:

    #ifndef Y_H
    #define Y_H
    //contents of Y
    #endif


    X.h:

    #ifndef X_H
    #define X_H

    # ifndef Y_H
    # error must include Y.h before X.h
    # endif

    #endif
     
    luser- -droog, Apr 13, 2014
    #5
    How about a tool that, for each C file, analyzes the .h files,
    then copies them in the appropriate nested order into one big .h
    file for each .c file, and then modifies the .c file to include the
    combined .h file. Only slightly different from your suggestion,
    but maybe different enough.

    -- glen
     
    glen herrmannsfeldt, Apr 13, 2014
    #6
  7. Jens Schweikhardt

    Kaz Kylheku Guest

    Projects organized according to this principle don't actually need to solve
    this problem, because they start small, at which point it is easy to get the
    order right by hand. Then they grow incrementally, whereby it is easy to
    maintain the correct order. When a new source file is added, its section of
    #include directives can be copied from another file and tweaked.

    Here is a possible algorithm, at least for reasonably well behaved
    headers: iterate over the header files, and for each one compile a dummy
    translation unit which includes it, to discover the set of header files
    which can be included without any of the other files. These are the "root" or
    "stratum 0" headers in the dependency tree.

    Next, find the set of headers which can be individually included if all these
    "stratum 0" headers are already present. The set of these is "stratum 1".

    Then iterate through the remaining strata.
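    A sketch of this stratum-by-stratum loop. All names are illustrative;
    the compile probe (gcc -fsyntax-only on a dummy translation unit) is
    passed in as a predicate so the core loop stays toolchain-independent:

```python
import os
import subprocess
import tempfile

# One way the "can this header be included yet?" probe could look:
# compile a dummy translation unit with gcc -fsyntax-only. Assumes gcc
# is on PATH and the headers live in the current directory.
def compiles_after(placed, candidate, cc="gcc"):
    src = "".join(f'#include "{h}"\n' for h in list(placed) + [candidate])
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(src)
        path = f.name
    try:
        r = subprocess.run([cc, "-fsyntax-only", "-I.", path],
                           capture_output=True)
        return r.returncode == 0
    finally:
        os.unlink(path)

# The stratum loop itself: repeatedly peel off every header that can be
# included given what has been placed so far. If a pass places nothing,
# the remaining headers are circular or otherwise broken.
def strata(all_headers, can_follow):
    placed, remaining = [], set(all_headers)
    while remaining:
        stratum = [h for h in sorted(remaining) if can_follow(placed, h)]
        if not stratum:
            raise RuntimeError(f"unplaceable (circular?): {remaining}")
        placed += stratum
        remaining -= set(stratum)
    return placed
```

    With real headers one would call strata(headers, compiles_after); any
    other dependency oracle (e.g. a table built from comments) plugs into
    the same loop.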
     
    Kaz Kylheku, Apr 13, 2014
    #7
  8. Jens Schweikhardt

    Kaz Kylheku Guest

    I used to think so fresh out of school.

    But actually, this style is superior because it leads to much cleaner code
    organization and faster compilation. It keeps everything "tight".

    I chose this approach in the TXR project, and am very pleased with it.
    I understand now why some people recommend it.

    The compiler diagnostics are simpler. None of this:

    "Syntax error in line 32 of X
    included from line 42 of Y,
    included from line 15 of Z ..."

    Also, the dependencies are easy to understand. If you look at the
    list of #include directives at the top of a .c file, those are
    the files which, if they are touched, will trigger a re-compile
    of this file. And no others!

    It's easy to generate a dependency makefile. If we have a foo.c
    with these contents:

    #include <stdio.h>
    #include "a.h"
    #include "b.h"
    #include "foo.h"

    then the dependency rule is precisely this:

    foo.o: foo.c a.h b.h foo.h

    Done! At a glance we know all the dependencies. Our regular confrontation with
    these dependencies in every source file prevents us from screwing up the
    program with a spaghetti of creeping dependencies.
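    Extracting that rule mechanically is a few lines; a sketch (quoted
    local includes only, matching the foo.c example above; the helper
    name is invented):

```python
import re

# Sketch: derive the make dependency rule from a .c file's quoted
# #include lines. Angle-bracket system headers are deliberately left
# out, as in the foo.o example.
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def dep_rule(c_name, c_text):
    headers = INCLUDE.findall(c_text)
    obj = c_name[:-2] + ".o"
    return f"{obj}: {c_name} {' '.join(headers)}"

src = '#include <stdio.h>\n#include "a.h"\n#include "b.h"\n#include "foo.h"\n'
print(dep_rule("foo.c", src))  # foo.o: foo.c a.h b.h foo.h
```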
    Not any more than a hammer which doesn't dispense its own nails and wooden
    planks.

    Anyway, there is no encapsulation to speak of; we are dealing with a primitive
    text file inclusion mechanism which doesn't even come close to solving the
    modularity problem.

    The fact that you have a Makefile (or whatever) which has to list object files
    breaks "encapsulation".

    The order in which you have to set up global initialization calls breaks
    "encapsulation".

    Proper module support in a language solves everything. You can just say "This
    module uses that one", and the linking, global initialization, incremental
    recompilation and linking are all taken care of.

    Emulating one small aspect of this with #includes is pointless.
     
    Kaz Kylheku, Apr 13, 2014
    #8
  9. Jens Schweikhardt

    Ian Collins Guest

    The faster compilation argument is a non-starter these days.
    A decent make will do this for you.
    So would you rather include and maintain all of the headers (and
    accompanying platform specific conditional include spaghetti) a
    particular library uses in every source file, or just include the
    library's public header? All of that crud is the encapsulation referred
    to here.
    Not when bringing in a library.
     
    Ian Collins, Apr 13, 2014
    #9
  10. Jens Schweikhardt

    Kaz Kylheku Guest

    Personal preference.

    Even if the recompile is fast thanks to the hardware, I would still rather wait
    5 seconds for a recompile than 8 seconds.

    The faster the machines get, the less I tolerate response and turnaround time.
    And, I like it; I think it is beautiful to have an explicit view of the
    dependencies laid out in the code.

    Another benefit: no ugly #ifndef SYMBOL / #define SYMBOL ... #endif crap
    in all the header files! Just a comment block and the definitions!
    I don't know of any make that generates dependencies. Compilers do (e.g. gcc -MM).

    This is nicer.
    When we divide the program into libraries, we are adding a level to the
    organizational hierarchy. So it would be too pigheaded not to allow the
    permitted #include level to also increase by one level.

    The library does encapsulate; as a user of the library, I don't care
    about how it is divided into modules.

    Of course a library is a unit, and it should ideally provide one simple header
    (or one for each major feature area).

    If a library has some base definitions that are used by several features, then
    I'd probably want to have a header for those base definitions which must be
    included before the main features.

    But all these headers can, internally, include the detailed internal headers:
    all of the needed ones, in the correct order (which do not include other
    headers).
    Actually yes. If library X also needs library Y, which also needs library Z,
    you will have to break encapsulation and link in all of these, even though you
    only have #include "X.h".

    The documentation for X might say that you need to initialize Y first, etc.

    The simplicity of #include "X.h" only goes so far.
     
    Kaz Kylheku, Apr 13, 2014
    #10
  11. Jens Schweikhardt

    Ian Collins Guest

    My last point was that in practice where the includes are included makes
    no real difference to the build time.
    I must admit I do miss being able to do crosswords during builds :)
    Yes, but if there are platform or other outside dependencies which
    govern which headers a particular configuration requires, they have to
    be written out in each source file. If code were write-once,
    change-never, this wouldn't be a problem. But it isn't (except for perl,
    which we all know is a write-only language).
    OK, "the build system" will do this for you!
    No argument there then.
     
    Ian Collins, Apr 13, 2014
    #11
  12. Jens Schweikhardt

    BartC Guest

    Whose rule is that? Because it seems to be broken by most standard headers.

    (If I look at windows.h for example, it is a mess of conditional directives
    and includes for 25 or 30 other files; I don't fancy having all that in my
    own sources. Besides which, a different compiler will have a different
    windows.h. You can't get away from the problem.)

    The trouble is, any of those could rely on 'something' which is in another
    header. Yet the rest of the module does not directly use that 'something',
    and shouldn't have to care about all those indirect includes (I'm not
    allowed to use the word 'imports').

    Should module A, needing to include library B, also have to include 29 other
    include files that B uses?

    And if A also needs library C, which uses some of those 29 files plus some
    of its own, which order should these dozens of extra files appear in?
    Besides, a new update of B or C may have a different set of includes, but
    now you need to update all those extra includes in A.

    It's best if these things are as self-contained as possible; A.c:

    #include "B.h"
    #include "C.h"

    You don't even need to worry about whether B needs C, or C needs B, or even
    if they have mutual dependencies.
    I'm not sure it's even possible, because of the mutual or circular
    dependencies I mentioned.
     
    BartC, Apr 13, 2014
    #12
    It's a case of one rule for libraries, one rule for user code.

    Unless you work for Microsoft, you're unlikely to want to touch windows.h.
    So the priority is ease for the application programmer; he wants to just
    include "windows.h" and have everything work, including backwards
    compatibility. The MS programmer working on windows system files has a mess
    to work through.

    If you're not writing a library, however, then the main audience is maintaining
    programmers who come after you. It makes life easier for them if they have
    a list of all the files a module depends on, and if by a simple text search
    they can get a list of all modules that depend on it. Then it's easy to
    check manually whether a change will cause problems elsewhere.
     
    Malcolm McLean, Apr 13, 2014
    #13
  14. in <-berlin.de>:
    #>interesting part is: how to automate this, i.e. how to determine "this
    #>header declares identifiers A, B, C and requires X, Y, Z"? I looked
    #
    # If you really deem this to be »interesting«, then go ahead
    # and write it.
    #
    #>Can you think of a lightweight way to solve this? Maybe using perl, the
    #>unix tool box, make, gmake, gcc, a C lexer?
    #
    # The common solution is, IIRC: allow every header to include
    # all the headers it needs, and use include guards.

    The "common solution" is quick and dirty. It leads to include spaghetti,
    careless sprinkling of #include directives, useless #includes that are
    actually not needed, horrific dependency generation for "make", longer
    compile time, useless file system access churn, lint warnings about
    repeated headers, the need for include guards, and other ugliness that
    our rule avoids. There's also ample prior art in Unix, with sys/socket.h
    requiring sys/types.h and so on. (I understand why ISO C takes a different
    approach, but that's not the point; I'm strictly interested in the
    code for the project.)

    We have used doxygen to generate "A includes B" and "A is included by B"
    graphs. The result was a big mess, uglier than hell, as soon as each
    module needs access to just 10 other modules on average.

    Now the graphs are one root with N leaves. Beauty!
    Dependency generation is looking at the C file's includes. Beauty!
    No more lint warning about repeated header inclusion. Beauty!
    No need for include guards. Beauty!

    I can live with a topological sort to flatten the dependency graphs.
    It's a bit of rocket science to create it automatically, but hey,
    I *am* in the rocket science business.

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #14
  15. in <>:
    #> On 4/12/14, 3:37 PM, Jens Schweikhardt wrote:
    #>> hello, world\n
    #>>
    #>> consider a small project of 100 C source and 100 header files.
    #>> The coding rules require that only C files include headers,
    #>> headers are not allowed to include other headers (not sure if
    #>> this is 100% the "IWYU - Include What You Use" paradigm).
    #>>
    #>
    #> A rule that says that a header can NOT include other headers, even if it
    #> depends on the contents of that header is, in my mind, totally broken.
    #
    # I used to think so fresh out of school.

    We all did, at one time or another. The approach is attractive because
    it's so simple and the ugly consequences are hidden (mostly behind sheer
    compute power and fast disk access).

    # But actually, this style is superior because it leads to much cleaner code
    # organization and faster compilation. It keeps everything "tight".
    #
    # I chose this approach in the TXR project, and am very pleased with it.
    # I understand now why some people recommend it.
    #
    # The compiler diagnostics are simpler. None of this:
    #
    # "Syntax error in line 32 of X
    # included from line 42 of Y,
    # included from line 15 of Z ..."
    #
    # Also, the dependencies are easy to understand. If you look at the
    # list of #include directives at the top of a .c file, those are
    # the files which, if they are touched, will trigger a re-compile
    # of this file. And no others!
    #
    # It's easy to generate a dependency makefile. If we have a foo.c
    # with these contents:
    #
    # #include <stdio.h>
    # #include "a.h"
    # #include "b.h"
    # #include "foo.h"
    #
    # then the dependency rule is precisely this:
    #
    # foo.o: foo.c a.h b.h foo.h
    #
    # Done! At a glance we know all the dependencies. Our regular confrontation with
    # these dependencies in every source file prevents us from screwing up the
    # program with a spaghetti of creeping dependencies.

    Finally someone who has seen the light, too. I have nothing to
    add except that if you were a girl, I'd like to marry you :)

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #15
  16. in <>:
    #> hello, world\n
    #>
    #> consider a small project of 100 C source and 100 header files.
    #> The coding rules require that only C files include headers,
    #> headers are not allowed to include other headers (not sure if
    #> this is 100% the "IWYU - Include What You Use" paradigm).
    #
    # That strikes me as a stupid^H^H^H^H^H^H suboptimal rule.

    I thought the same some time ago, but was able to leave the
    dark side behind me and become a good jedi.

    # You'll have headers that depend on other headers, but without that
    # relationship being expressed in the source.

    This is exactly what I want to avoid. I *want* the dependencies to be as
    clear as possible. I want to look at the include directives and have
    them stare right at me; it helps me realize when there is too much
    interdependency emerging and rethink modularization and write smaller
    sexier interfaces.

    Sprinkling yet another include in yet another header is just piling mess
    upon mess.

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #16
  17. in <>:
    # Jens Schweikhardt wrote:
    #> hello, world\n
    #>
    #> consider a small project of 100 C source and 100 header files.
    #> The coding rules require that only C files include headers,
    #> headers are not allowed to include other headers (not sure if
    #> this is 100% the "IWYU - Include What You Use" paradigm).
    #
    # I agree with the other comments on the folly of this rule. Don't do it!
    #
    # Just take a look at some of your system headers, or popular open source
    # library headers, and consider whether you want all of the conditional
    # includes and other unnecessary nonsense in all of your source files.

    The "unnecessary nonsense" is all the #ifndef crap in third party
    headers. Looking at system headers makes me cringe. It's not a role
    model.

    There's a reason why the lint we use (Gimpel's FlexeLint) has an option
    to warn about repeated use of headers along arbitrary include chains. I
    realize beauty is in the eyes of the beholder, but I suspect that all
    the participants of this thread calling my approach stupid haven't
    really given much thought to it. There are many advantages to it (see my
    other posts in this thread).

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #17
  18. in <>:
    ....
    # The way I've "solved" this in my PostScript interpreter source
    # is to use header guards and the preprocessor's #error directive
    # to describe what header needs to be included.
    #
    # If header X needs Y to be previously included, then check for
    # Y's header guard and #error if not defined.
    #
    # Y.h:
    #
    # #ifndef Y_H
    # #define Y_H
    # //contents of Y
    # #endif
    #
    #
    # X.h:
    #
    # #ifndef X_H
    # #define X_H
    #
    # # ifndef Y_H
    # # error must include Y.h before X.h
    # # endif
    #
    # #endif

    That's an interesting approach. Thanks for sharing!

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #18
  19. in <fCs2v.73133$4>:
    # #
    #> consider a small project of 100 C source and 100 header files.
    #> The coding rules require that only C files include headers,
    #> headers are not allowed to include other headers
    #
    # Whose rule is that? Because it seems to be broken by most standard headers.

    Those aren't the problem. It's a rule strictly for our project headers.
    Third party libraries can do what they want.

    # (If I look at windows.h for example, it is a mess of conditional directives
    # and includes for 25 or 30 other files; I don't fancy having all that in my
    # own sources. Besides which, a different compiler will have a different
    # windows.h. You can't get away from the problem.)
    #
    # (not sure if
    #> this is 100% the "IWYU - Include What You Use" paradigm).
    #>
    #> The headers contain only what headers should contain:
    #> prototypes, typedefs, declarations, macro definitions.
    #
    # The trouble is, any of those could rely on 'something' which is in another
    # header. Yet the rest of the module does not directly use that 'something',
    # and shouldn't have to care about all those indirect includes (I'm not
    # allowed to use the word 'imports').
    #
    # Should module A, needing to include library B, also have to include 29 other
    # include files that B uses?

    If it needs to, yes. I believe that this would be no different
    with the "common include" rule. Eventually all the 29 headers
    would be included. Most likely even *many times over.* Under
    our rule each exactly once. Beauty!

    ....
    #> Can you think of a lightweight way to solve this? Maybe using perl, the
    #> unix tool box, make, gmake, gcc, a C lexer?
    #
    # I'm not sure it's even possible, because of the mutual or circular
    # dependencies I mentioned.

    You can't have circular include dependencies, no matter what rule
    you follow. In the end, all identifiers must be declared/defined
    before use (modulo some esoteric situations like tag names in
    prototype scope).
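    That cycle case is exactly what a topological sort reports; a sketch
    with Python's graphlib and invented header names:

```python
from graphlib import CycleError, TopologicalSorter

# A mutual dependency between two headers cannot be flattened into any
# #include order; instead of an order, the sorter reports the cycle.
needs = {"X.h": {"Y.h"}, "Y.h": {"X.h"}}

try:
    list(TopologicalSorter(needs).static_order())
    print("sortable")
except CycleError as e:
    print("circular include dependency:", e.args[1])
```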

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #19
  20. Jens Schweikhardt

    BartC Guest

    (I've been roundly castigated, and called a 'troll' to boot, for daring to
    talk about non-C language ideas in this group, so I'm taking my life in my
    hands here, but here goes...)

    I have converted a project (of maybe 16 or so modules) from C source using
    traditional headers, into a scheme that uses 'import' statements (you say
    what module you're importing and it takes care of the details).

    However, the C approach was much more straightforward! Instead of a single
    line such as:

    #include "header.h"

    which took care of everything, and was exactly the same in every module,
    each module now had from six to twelve import statements, different for each
    module. You need to be ultra-aware of interdependencies, module hierarchy
    etc, and to my mind it's a lot more work.

    Maybe that discipline is a good thing, and can help create modules with
    better-defined interfaces that can then be more easily used in other
    projects. But it also has its headaches!

    However, even such a scheme doesn't give a full list of dependencies: each
    module only lists the imports it directly needs.

    You've made a reasonably good case for declaring everything, but I'm
    not sure that's workable. The libraries, headers, or whatever resources
    a particular header might need for its workings should be a private
    (encapsulated) part of it. Otherwise, when you next compile an updated
    version, you might have a bunch of compilation errors to sort out!
     
    BartC, Apr 13, 2014
    #20
