Meta-C question about header order

Discussion in 'C Programming' started by Jens Schweikhardt, Apr 12, 2014.

  1. Jens Schweikhardt

    Tim Rentsch Guest

    I have been reading the many comments in this thread with some
    interest. Reading through your responses, I have come up with
    this summary of motivations for using this approach (these are
    my paraphrasings, often not quotes of the originals):

    + no lint warnings about repeated headers
    + no need for include guards
    + doxygen dependency graph much simpler
    + no cycles in include graph
    + removing unneeded includes is easier
    + simpler compiler diagnostics
    + easier to generate dependency makefile
    + improved identifiability of refactoring opportunities
    + ... and of interface accumulation [not sure what this means]
    + ... and of code collecting fat
    + constant reminders of all dependencies of each .c file

    Some questions:

    1. Is this an accurate summary?

    2. Has anything been left out (i.e., is there any other
    positive you would add to the list)?

    3. Would you mind listing these from most important
    to least important, and giving some indication of
    relative weight for each item?
    Yes, a topological sort. The topological sort is the easy
    part - the harder part is identifying what the first-level
    dependencies are.
    I may have some suggestions here, but first I would like to read
    through responses to the questions asked above, to make sure I'm
    going in a good direction.
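Identifying first-level dependencies starts with knowing what each header declares. Below is a minimal C sketch that covers only the two easiest cases: macro names and one-line typedef names. It assumes one declaration per line, with no semicolons inside trailing comments. `declared_name` is a hypothetical helper and is not part of any tool mentioned in the thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear in a C identifier. */
static int is_ident_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: copy the identifier declared by one header line
 * into out (capacity outsz). Only two easy cases are handled:
 *   #define NAME ...          (object- or function-like macro)
 *   typedef <anything> NAME;  (one-line typedef; function-pointer and
 *                              array typedefs are not recognized)
 * Returns 1 if a name was copied, 0 otherwise. */
int declared_name(const char *line, char *out, size_t outsz)
{
    const char *p = line;
    const char *start;
    const char *end;

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '#') {
        /* Preprocessor line: accept "#define" with optional spaces after '#'. */
        p++;
        while (isspace((unsigned char)*p))
            p++;
        if (strncmp(p, "define", 6) != 0 || !isspace((unsigned char)p[6]))
            return 0;
        p += 6;
        while (isspace((unsigned char)*p))
            p++;
        start = p;
        while (is_ident_char(*p))
            p++;
        end = p;  /* stops at '(' for function-like macros */
    } else if (strncmp(p, "typedef", 7) == 0 && isspace((unsigned char)p[7])) {
        /* The new type name is the last identifier before the final ';'. */
        const char *semi = strrchr(p, ';');
        if (semi == NULL)
            return 0;
        end = semi;
        while (end > p && isspace((unsigned char)end[-1]))
            end--;
        start = end;
        while (start > p && is_ident_char(start[-1]))
            start--;
    } else {
        return 0;
    }

    if (start == end || (size_t)(end - start) >= outsz)
        return 0;
    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 1;
}
```

Matching these names against the identifiers used in every other header would yield the first-level "needed by" pairs. This sketch does not attempt that matching step.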
     
    Tim Rentsch, Apr 19, 2014
    #61

  2. in <>:
    #
    #> consider a small project of 100 C source and 100 header files.
    #> The coding rules require that only C files include headers,
    #> headers are not allowed to include other headers (not sure if
    #> this is 100% the "IWYU - Include What You Use" paradigm).
    #>
    #> The headers contain only what headers should contain:
    #> prototypes, typedefs, declarations, macro definitions.
    #>
    #> The problem: given a set of headers, determine a sequence of
    #> #include directives that avoids syntax errors due to undeclared
    #> identifiers. I.e. if "foo.h" declares type foo_t and "bar.h" uses
    #> foo_t in a prototype, "foo.h" must be included before "bar.h".
    #
    # I have been reading the many comments in this thread with some
    # interest. Reading through your responses, I have come up with
    # this summary of motivations for using this approach (these are
    # my paraphrasings, often not quotes of the originals):
    #
    # + no lint warnings about repeated headers
    # + no need for include guards
    # + doxygen dependency graph much simpler
    # + no cycles in include graph
    # + removing unneeded includes is easier
    # + simpler compiler diagnostics
    # + easier to generate dependency makefile
    # + improved identifiability of refactoring opportunities
    # + ... and of interface accumulation [not sure what this means]
    # + ... and of code collecting fat
    # + constant reminders of all dependencies of each .c file

    Thanks Tim, for taking the time. To expand on interface accumulation:
    the process where interface A needs another, which then grows the need
    for yet another and eventually includes half the total number of
    interfaces of the project.

    Let's face it: programmers are lazy, and it's too easy in C to blow up an
    initially small interface design by writing another #include in the
    first header that looks like it's included by "most" of the files where
    it is needed, and including it directly where it is not. How many projects
    have you seen with project_types.h, misc.h, macros.h, and similar headers
    invented on the spot?


    # Some questions:
    #
    # 1. Is this an accurate summary?
    #
    # 2. Has anything been left out (i.e., is there any other
    # positive you would add to the list)?

    + Reduced processing time by all the tools that operate on
    C source. That's the compiler of course, but also lint,
    auto dependency generators, static checkers, doxygen, ...
    For each translation unit, the headers are tokenized and
    parsed at most once (not at all when in a disabled #ifdef).
    I observe a 20% reduction in our project.

    + Giving developers a hard and fast unambiguous rule in which file the
    include directives go. There is only one choice. If foo.c needs the
    bar_t declaration from bar.h, it gets included. Contrast this with
    "traditional wisdom", where possibly a large number of headers would be
    candidates for the new #include statement. A good design would make this
    choice obvious, a bright developer would know the state of the art, but
    it's a rare trait. "Indented six feet down and covered with dirt" is the
    reality out there. Yes, this requires selecting the proper *line*
    among the includes. But *any* compiler will tell you in no uncertain terms
    if that was the wrong line or you're missing another header. It's
    foolproof.

    # 3. Would you mind listing these from most important
    # to least important, and giving some indication of
    # relative weight for each item?

    + improved identifiability of refactoring opportunities
    $ grep -c '#include "foo.h"' */*.c
    Whoa! foo.h is included by 95% of files, why?
    Whoa! foo.h is included by one file only. Maybe incorporate it.
    Hmm. All includers of foo.h also need bar.h, baz.h and blurb.h. Could I
    encapsulate this better? Maybe merge some headers?
    (50 points)
    + ... and of interface accumulation
    $ grep -c '#include' */*.c
    Whoa! big.c includes everything and the kitchen sink. What's up?
    (30 points)
    + ... and of code collecting fat: optional debug code in an #ifdef maze
    that should be moved out to separate object files, linked in when needed.
    (20 points)
    + doxygen dependency graph much simpler. It's a document for
    the customer.
    (20 points)
    + removing unneeded includes is easier
    (20 points)
    + constant reminders of all dependencies of each .c file
    (10 points)
    + no cycles in include graph
    (10 points)
    + giving developers a fast rule in what file the include goes.
    (10 points)
    + reduced processing time by all the tools that operate on source
    (10 points)
    + no lint warnings about repeated headers
    (10 points)
    + easier to generate dependency makefile
    (7 points)
    + no need for include guards
    (5 points)
    + simpler compiler diagnostics
    (5 points)

    The overall goal is to make emerging complexity stand out the moment it
    emerges, opening developers' eyes. The reality in any random project is:
    not all developers are stellar C programmers (the set of participants in
    this newsgroup then and now looks like an accurate statistical sample.
    From Tanmoy to Bill...)

    Unfortunately, the C preprocessor is a deceptive tool (apologies to dmr,
    may his soul rest in peace, I know why it was needed in the time back
    then) and gets frequently abused. Taming it is probably what I'm after.
    The only reason cpp has survived is because of the include guard kluge.
    Making interfaces stand out, both in number and in size, should
    help, I hope.

    [...]
    #> Can you think of a lightweight way to solve this? Maybe using
    #> perl, the unix tool box, make, gmake, gcc, a C lexer?
    #
    # I may have some suggestions here, but first I would like to read
    # through responses to the questions asked above, to make sure I'm
    # going in a good direction.

    This is certainly incomplete:
    One would need to find the identifiers of macro definitions (easy)
    and typedefs (harder). In prototypes one must distinguish between
    types and optional parameter names.
    In other declarations one needs to determine the declared identifier.
    This is a little more involved for enums and aggregates.
    Build the "needed by" pairs and pipe to tsort(1). Voilà!
    Version 7 came with all the goodies built in, didn't it?
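The tsort(1) step can also be sketched in C. Below is a minimal version of Kahn's algorithm. It assumes the headers have already been numbered 0..n-1 and that each "needed by" pair is stored as (before, after). `struct dep`, `include_order` and `MAX_HEADERS` are hypothetical names. Unlike a plain sort, a topological sort must detect cycles; here a cycle is reported as -1.

```c
#include <stddef.h>

#define MAX_HEADERS 128  /* hypothetical project limit */

/* One "needed by" pair: header `before` declares something that header
 * `after` uses, so `before` must be included first. Both indices must
 * lie in 0..n-1. */
struct dep {
    int before;
    int after;
};

/* Kahn's algorithm: write a valid include order for headers 0..n-1 into
 * order[] and return 0, or return -1 if n is out of range or the pairs
 * contain a cycle. Among ready headers the lowest index is taken first,
 * so the result is deterministic. */
int include_order(int n, const struct dep *deps, size_t ndeps, int *order)
{
    int indegree[MAX_HEADERS] = {0};  /* unmet prerequisites per header */
    int done[MAX_HEADERS] = {0};
    int emitted = 0;

    if (n < 0 || n > MAX_HEADERS)
        return -1;
    for (size_t i = 0; i < ndeps; i++)
        indegree[deps[i].after]++;

    while (emitted < n) {
        int next = -1;
        /* Pick the lowest-numbered header whose prerequisites are all placed. */
        for (int h = 0; h < n; h++) {
            if (!done[h] && indegree[h] == 0) {
                next = h;
                break;
            }
        }
        if (next < 0)
            return -1;  /* every remaining header waits on another: a cycle */
        done[next] = 1;
        order[emitted++] = next;
        /* Placing `next` satisfies one prerequisite of each dependant. */
        for (size_t i = 0; i < ndeps; i++)
            if (deps[i].before == next)
                indegree[deps[i].after]--;
    }
    return 0;
}
```

Suppose foo.h is header 0 and bar.h, which uses foo_t, is header 1. The single pair {0, 1} then yields the order 0, 1. Taking the lowest-numbered ready header each round keeps the output stable between runs. That matters if the result is committed as a reference sequence.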


    Regards,

    Jens
     
    Jens Schweikhardt, Apr 23, 2014
    #62

  3. in <JLv4v.45915$1.easynews.com>:
    # On 4/19/14, 4:25 AM, Tim Rentsch wrote:
    #> I have been reading the many comments in this thread with some
    #> interest. Reading through your responses, I have come up with
    #> this summary of motivations for using this approach (these are
    #> my paraphrasings, often not quotes of the originals):
    #>
    #> + no lint warnings about repeated headers
    #
    #> + no need for include guards
    #
    #> + doxygen dependency graph much simpler
    # I am not sure I would call a "flat" graph simpler. It obscures the
    # difference between what you actually reference and what has been pulled
    # in due to implementation detail of something you reference.

    You can't see this in a graph of 20 nodes with 40 edges either.
    I understand the graph with a root and 20 leaves.

    #> + no cycles in include graph
    # Neither method will generate cycles. What becomes a "cycle" in nested
    # includes becomes a broken topological order (a.h must be before b.h,
    # but b.h also must be before a.h).

    The cycles I mean are closed loops of edges via one or more nodes.

    #> + removing unneeded includes is easier
    # I disagree here. With headers that include what they need, you only
    # need to look at one file to see what is needed, so if you make a change,
    # you have less to look at to see what is no longer needed. With all
    # dependencies, both direct and indirect, expressed in the .c file, to
    # identify that a header is no longer needed you need to look at every
    # header file included after it to confirm. Also, if you make a change in
    # a header, you need to know every client of that header, to know where
    # you need to make the changes.

    Well, I use a tool (Gimpel FlexeLint) to tell me which headers are
    not needed. That is at least simple for me. However, FlexeLint can
    tell this only for a C file, not for headers.

    #> + simpler compiler diagnostics
    # Yes, you get simpler diagnostics, but a lot more of them (or a lot more
    # work to avoid them). If headers include what they need, you don't need
    # to look at the include trace: if foo.h has an error because it is
    # missing a dependency, it doesn't matter how it got included; it needs to
    # be resolved once. In the ".c includes everything" case, every file that
    # included that header will need to have the missing dependency fixed.
    #
    #> + easier to generate dependency makefile
    # Yes, if you are manually generating makefiles.
    #
    #> + improved identifiability of refactoring opportunities
    # I disagree. Since you have lost all the real dependency information, you
    # have lost the hints that help you see the refactoring.

    See my answer to Tim Rentsch for details of what I expect to find and how.

    ....
    # Ultimately, it is the programmer putting in effort to make the
    # computer's job easier. The biggest advantage it gains is with a naive
    # compiler that doesn't recognize include guards and so rereads
    # multiply included files (looking for the #endif). This can be fixed in
    # the most used headers by making the include itself conditional, testing
    # the include guard.

    Which nobody does. Having the include guard *in* the included header
    instead of some intelligence in the preprocessor is the ugly kluge.
    Have you ever seen

    #ifndef PROJECT_TYPES_H
    #include "project_types.h"
    #endif
    ...repeat for N other headers...

    out in the wild? I haven't. So the common wisdom accepts endless
    rereading and retokenization of the same headers in the header jungle.
    I question the status quo in search of a paradigm shift. Big words :)

    # The main purpose we use computers is that they can do work much faster
    # than us, and their job is to take away the mechanical operations so we
    # can focus on the creative. This "rule" puts back on the programmer a lot
    # of mechanical operations that belong really to the computer.

    I believe this burden is quite lightweight. You get the header sequence
    right once, and that's basically it. I provide a reference (a dummy C file
    including all headers, with an empty main()). Look up where your header
    appears in the reference sequence: no more guessing or compiler errors.
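Checking a .c file against such a reference sequence is mechanical. Below is a minimal sketch. It assumes the file's include names have already been extracted in order. `includes_follow_reference` is a hypothetical helper. A file passes if its includes form a subsequence of the reference order.

```c
#include <stddef.h>
#include <string.h>

/* Position of `name` in the reference sequence, or -1 if it is absent. */
static int ref_index(const char *const *ref, size_t nref, const char *name)
{
    for (size_t i = 0; i < nref; i++)
        if (strcmp(ref[i], name) == 0)
            return (int)i;
    return -1;
}

/* Return 1 if every header in incs[] appears in ref[] and in the same
 * relative order (incs is a subsequence of ref), else 0. A file may
 * include any subset of the reference headers. */
int includes_follow_reference(const char *const *ref, size_t nref,
                              const char *const *incs, size_t ninc)
{
    int last = -1;
    for (size_t i = 0; i < ninc; i++) {
        int pos = ref_index(ref, nref, incs[i]);
        if (pos <= last)  /* unknown header (-1), duplicate, or out of order */
            return 0;
        last = pos;
    }
    return 1;
}
```

Take the reference order foo.h, bar.h, baz.h, blurb.h. A file that includes only foo.h and baz.h passes against it. The same two headers in the opposite order fail. So does any header missing from the reference, such as misc.h.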

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 23, 2014
    #63
  4. Jens Schweikhardt

    James Kuyper Guest

    On 04/23/2014 03:47 PM, Jens Schweikhardt wrote:
    ....
    test_header.c:
    #include "header.h"
    int dummy=0; // to silence some compiler warning messages.
     
    James Kuyper, Apr 23, 2014
    #64
  5. Jens Schweikhardt

    Tim Rentsch Guest

    I think you may have misunderstood my intentions there. I wasn't
    trying to agree with his points, just restate them to make sure
    I understood his position and didn't leave out anything. It was
    useful to see his reply, much more so I think than if I had started
    by arguing against the scheme proposed.
     
    Tim Rentsch, Jun 10, 2014
    #65
  6. Jens Schweikhardt

    Tim Rentsch Guest

    Thank you for the extended reply. Reading through it, I
    don't find any of your arguments convincing. In almost all
    cases they either mischaracterize one of the two positions
    or make use of a non-logical inference. It isn't necessary
    to use the scheme you propose to get the benefits you say
    are important, and it's significantly more work for developers,
    starting with having to build a tool that will produce the
    necessary include ordering. By contrast, following the more usual
    rule that include files will #include any other header directly
    necessary for themselves, I did a little scripting in my regular
    development environment to produce a list of include files used
    by each .c file. It took 10 or 15 minutes. The arguments you
    give just don't make your case.
     
    Tim Rentsch, Jun 10, 2014
    #66
