Meta-C question about header order

Discussion in 'C Programming' started by Jens Schweikhardt, Apr 12, 2014.

  1. Jens Schweikhardt

    BartC Guest

    Suppose you have two silly modules like a.c and b.c here:

    //--a.c-----------------------
    #include <stdio.h>
    #include "b.h"

    void function_a(float x) {
        printf("%f\n",x);
    }

    int main(void) {
        function_b(56);
    }
    //----------------------------

    //--b.c-----------------------
    #include <stdio.h>
    #include "a.h"

    void function_b(float x) {
        function_a(x);
    }
    //----------------------------

    with their respective header files:
    //--a.h-----------------------
    void function_a(float);
    //----------------------------

    //--b.h-----------------------
    void function_b(float);
    //----------------------------

    What is the module hierarchy here? Now a.c contains main(), so that might be
    considered the root, but then b.c also depends on a.c. You can't have one
    without the other. And without an obvious external entry point such as
    main(), a.c and b.c could have the same hierarchical status.
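The cycle here is between the .c files, not the headers: a.h and b.h contain only prototypes, and neither includes the other. A minimal sketch that collapses both modules into one translation unit makes this visible. As assumptions for testing, function_a() returns a value instead of printing it, and main() is omitted.

```c
/* What a.h and b.h provide: prototypes only; neither header includes the other. */
float function_a(float x);
float function_b(float x);

/* From b.c: function_b() needs only the prototype from a.h. */
float function_b(float x) {
    return function_a(x);
}

/* From a.c, whose main() needs only the prototype from b.h (main() omitted).
   Hypothetical change: return a value instead of calling printf(). */
float function_a(float x) {
    return x + 1.0f;
}
```

Declaration order between the two prototypes does not matter; what matters is that each definition sees the other function's prototype before it is called.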
     
    BartC, Apr 13, 2014
    #21

  2. The snag is this.

    image.h

    typedef struct { dum de dum } IMAGE;
    IMAGE *loadimage(char *fname);
    int getpixel(IMAGE *image, int x, int y);

    All very reasonable, agree?

    Now we have another file

    graphics.c

    #include "image.h"

    void antialiasedcircle(IMAGE *dest, double ox, double oy, double r)
    {
    dum de dum ...

    getpixel(dest, ox, oy + r);

    dum de dum ...
    }

    All very reasonable, agree?

    Now here's the snag. After developing our program, we decide to bundle all
    the images into one big resource file. So we need a new routine

    IMAGE *floadimage(FILE *fp)

    basically it's just the same code as loadimage(), but instead of passing a
    filename, we open the big file, fseek to the image, and read from there.

    But when we add floadimage to image.h, graphics.c will break, because FILE
    is undeclared there unless something else has already included stdio.h.
    Yet graphics.c doesn't really have a dependency on FILE *. The code is
    designed to work with any IMAGE structure. It's not interested in reading
    things to and from disk.

    We've fallen foul of the stickiness rule. A fix is often to #include
    stdio.h in image.h. Paradoxically, that's for the sake of functions which
    don't use stdio FILE *s. If they use a FILE * directly, they should #include
    it themselves. Which means stdio.h needs include guards.
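A sketch of image.h after that fix, with its own include guard. The IMAGE members and the getpixel() body are hypothetical stand-ins for the "dum de dum" parts, and the header text is pasted inline so the example stays a single file.

```c
/* ---- image.h (sketch) ---- */
#ifndef IMAGE_H
#define IMAGE_H

#include <stdio.h>   /* only floadimage() needs FILE * */

typedef struct {
    int width, height;
    unsigned char *pixels;   /* hypothetical layout: width * height bytes */
} IMAGE;

IMAGE *loadimage(char *fname);
IMAGE *floadimage(FILE *fp);
int getpixel(IMAGE *image, int x, int y);

#endif /* IMAGE_H */

/* Any later #ifndef IMAGE_H group, such as a second copy of image.h pulled
   in through another header, is now skipped, so IMAGE is never redefined. */
#ifndef IMAGE_H
#error "unreachable: IMAGE_H is already defined"
#endif

/* ---- image.c side (hypothetical body): uses IMAGE, never FILE * ---- */
int getpixel(IMAGE *image, int x, int y) {
    if (x < 0 || y < 0 || x >= image->width || y >= image->height)
        return -1;   /* hypothetical out-of-range value */
    return image->pixels[y * image->width + x];
}
```

graphics.c can now include "image.h" alone and still compile, at the price of seeing everything stdio.h declares.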
     
    Malcolm McLean, Apr 13, 2014
    #22

  3. Jens Schweikhardt

    Martin Shobe Guest

    The problem is the dependencies are still that big, messy, uglier than
    hell graph doxygen generated. That graph now has to be kept in the heads
    of each and every developer. Really, really, really, ..., ugly!

    Martin Shobe
     
    Martin Shobe, Apr 13, 2014
    #23
  4. in <SGv2v.315640$4>:
    ....
    # What is the module hierarchy here? Now a.c contains main(), so that might be
    # considered the root, but then b.c also depends on a.c. You can't have one
    # without the other. And without an obvious external entry point such as
    # main(), a.c and b.c could have the same hierarchical status.

    Such a concept of module hierarchy for C files is not something I
    particularly worry about (I'd rather use "unit" := all C files in one
    directory; but even that is pretty meaningless since not much follows
    from it).

    My thinking revolves around dependencies of headers, and these can be
    resolved in your example. With doxygen's "includes" graph, each C file
    is a root node and each of the included headers is a leaf. For the "is
    included by" graph, each header is a root node, and all C files
    including it are its direct leaves. Any tree has depth 2. It doesn't get
    any simpler than that.

    The trees of header dependencies are directed acyclic graphs.
    The order of #include directives of a C file is a topological
    sort of the required headers (usually a subset of all headers).
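The claim that a valid #include order is a topological sort of the header DAG can be made concrete. A minimal sketch using Kahn's algorithm over a small adjacency matrix; the four headers and their dependencies are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical project headers, used as array indices. */
enum { HDR_TYPES, HDR_LIST, HDR_IMAGE, HDR_GRAPHICS, NHEADERS };

/* needs[i][j] is true when header i uses something declared in header j,
   so j must be #included before i. Example data only. */
static const bool needs[NHEADERS][NHEADERS] = {
    [HDR_LIST]     = { [HDR_TYPES] = true },
    [HDR_IMAGE]    = { [HDR_TYPES] = true },
    [HDR_GRAPHICS] = { [HDR_TYPES] = true, [HDR_LIST] = true, [HDR_IMAGE] = true },
};

/* Kahn's algorithm: fills out[] with a valid #include order and returns how
   many headers were placed. A result below NHEADERS would mean a cycle. */
static size_t include_order(size_t out[NHEADERS]) {
    size_t pending[NHEADERS];          /* dependencies not yet placed */
    bool placed[NHEADERS] = { false };
    size_t count = 0;

    for (size_t i = 0; i < NHEADERS; i++) {
        pending[i] = 0;
        for (size_t j = 0; j < NHEADERS; j++)
            if (needs[i][j])
                pending[i]++;
    }
    while (count < NHEADERS) {
        size_t next = NHEADERS;
        for (size_t i = 0; i < NHEADERS; i++)
            if (!placed[i] && pending[i] == 0) {
                next = i;
                break;
            }
        if (next == NHEADERS)
            break;                     /* no header is ready: cycle */
        placed[next] = true;
        out[count++] = next;
        for (size_t i = 0; i < NHEADERS; i++)
            if (needs[i][next])
                pending[i]--;
    }
    return count;
}
```

For this data the order comes out as types, list, image, graphics; any other order that keeps types first and graphics last would be equally valid.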

    On another note, the concept of "module" is not precisely defined. Each
    developer, each project, each author probably has their own
    understanding of it. I avoid it when I can and rather think in terms of
    public and private interfaces. That's a concept much better defined ("if
    it's public I can call it from anywhere; if it's private I try to make
    it static and call it only in that one file.") You can explain this to any
    developer who knows only the basics of C, and they will understand it.
    Keep it simple! The rope C gives us shouldn't be used to create knots
    to strangle ourselves with.

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #24
  5. in <lie4jc$2r1$>:
    ....
    # The problem is the dependencies are still that big, messy, uglier than
    # hell graph doxygen generated.

    No, they aren't. The dependencies are directed *acyclic* graphs. Before,
    they contained cycles: with include guards, a.h can include b.h
    and b.h can include a.h without the preprocessor actually looping. But both
    directions must appear as edges in the graph, because each header might be
    included in isolation. Cycles are impossible by design under the
    "headers don't include headers" rule.

    # That graph now has to be kept in the heads
    # of each and every developer. Really, really, really, ..., ugly!

    Why in the heads? If you forget one, the compiler tells you
    "foo_t undeclared" and you include the header with typedef foo_t
    above it. Repeat if necessary. This Turing machine can be proven to halt. :)

    FlexeLint is even smart enough to tell us about "Header file frob.h not
    used in file bar.c", so we can cut down the headers to what's minimally
    needed.

    Can you imagine how much code is out there containing tons of unneeded
    #include statements? Who goes to the trouble of analyzing whether some
    header is actually needed? The common pattern is that include directives
    grow in number, never decrease, like entropy. Who would dare to remove
    an include directive from a *header* file? After all, it might be
    required somewhere up the include chain in one of those myriads of
    files. With the "headers don't include headers" rule this is dead simple to
    answer, even without FlexeLint: remove it and check if it compiles to
    the same object file as before. All you need to worry about is *one* C
    file. If you want to remove an include from a header, you must repeat this
    check for *all the C files including it directly or indirectly*. Nobody
    but the most determined developers dares to do this. THAT is ugly.

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 13, 2014
    #25
  6. Jens Schweikhardt

    Kaz Kylheku Guest

    Also, here is the thing. Suppose we have a module "g" which uses some data
    structure and functions defined by "c" which is built up using data types "a"
    and "b".

    Module "g" might depend on "a" and "b" not only through "c",
    but also directly.

    This is why it is nice in "g.c" to also include the "a.h" and "b.h"
    headers.

    It is quite bothersome when "g.c" relies on the fact that, for instance, sprintf
    is declared because it happens that "c.h" contains interfaces that have FILE *
    arguments, and includes <stdio.h> for that. Then "g.c" is relying on the side
    effect that "c.h" provides sprintf, which is brutally ugly, and shows that
    attempts at automatic modularity through textual preprocessing are a failure.

    In a language with real modularity, if we depended on a module C which depends
    on some types in a standard I/O library, we would not inherit that entire
    standard I/O library through C by accident: the module inheritance mechanism
    "uses interface C" would not leak through irrelevant things, because C's
    interface would in turn use only what it needs to, like the FILE * type.

    With the enlightened inclusion approach, g.c has a right to use sprintf because
    g.c contains, somewhere at the top, #include <stdio.h>. This satisfies a
    need in #include "c.h", and an *independent* need in the body of g.c.

    When I'm reading g.c, I'm not bothered by things like "hey this uses malloc,
    but it's not including <stdlib.h> anywhere!" (3 minutes later) Oh, it's
    picking it up via foo.h (which needs stdlib.h for ldiv_t) and foo.h is included
    by bar.h, which is included by xyzzy.h which g.c includes.
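A single-file sketch of that "enlightened" arrangement. c_dump() and g_label() are hypothetical, and snprintf stands in for sprintf. Standard headers such as <stdio.h> may be included any number of times without effect, so the second include costs nothing but documents g.c's own need.

```c
/* ---- c.h (sketch): needs <stdio.h> only for its FILE * parameter ---- */
#ifndef C_H
#define C_H
#include <stdio.h>
int c_dump(FILE *out, int value);   /* hypothetical interface */
#endif

/* ---- g.c: includes <stdio.h> itself because its own body uses snprintf,
   even though c.h already pulled it in. ---- */
#include <stdio.h>

int c_dump(FILE *out, int value) {   /* would normally live in c.c */
    return fprintf(out, "%d\n", value);
}

/* Formats a label into buf: a direct, independent use of <stdio.h>. */
int g_label(char *buf, size_t size, int id) {
    return snprintf(buf, size, "item-%d", id);
}
```

If c.h later stops needing FILE *, its #include <stdio.h> can be dropped without breaking g.c.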
     
    Kaz Kylheku, Apr 13, 2014
    #26
  7. Jens Schweikhardt

    Kaz Kylheku Guest

    Also, they are *simple* directed graphs: a simple graph has no multiple
    edges between the same two nodes.

    This is the case when the dependencies are flattened and included in order.
    No cycles, no redundancy.
    Well lookie who said he is not a computer scientist! Phwt!
     
    Kaz Kylheku, Apr 13, 2014
    #27
  8. Jens Schweikhardt

    Ian Collins Guest

    Have you ever written or had to maintain cross platform software?
    Probably quite a few of us have spent many decades developing and
    maintaining a variety of code bases. Sure there was a time when opening
    and reading headers took seconds rather than the microseconds it takes
    today. These days developer time is way more valuable than machine
    time. Having to update every source file that includes a header when a
    dependency changes or a new platform type is added is pure folly.
     
    Ian Collins, Apr 13, 2014
    #28
  9. Jens Schweikhardt

    Kaz Kylheku Guest

    I have.
    Everything has sped up. The *proportional* time wasted scanning headers
    redundantly is still there.
    Having to update every source file prevents the folly of making these kinds of
    changes on a regular basis.

    When programmers make changes that affect every translation unit, maybe they
    should be "punished" by having to make an edit to the root source file of every
    translation unit.

    Otherwise they have this sense that "gee, I'm just changing a few lines of code
    in just one file; there is hardly any impact".

    If you have to do this a lot, your program is badly designed.

    If adding a new platform type affects your header file inclusion in many files,
    that indicates that you're failing to isolate the platform-specific stuff
    as well as it could be. Perhaps your types are not opaque, and their
    non-opaque bits are polluted with platform-specific types. So, yes, it hurts
    to add a new platform because you have to confront the fact that it affects
    every translation unit.

    None of this stuff is a problem if you design for minimal dependencies.
    Just because some module A uses B, that doesn't always mean that the A
    interface has to use the B interface ("a.c" requires "b.h", but "a.h" doesn't
    have to require "b.h").
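That last point is the usual opaque-type idiom. In this sketch (the counter module is hypothetical), a.c needs <stdlib.h> for malloc() and free(), but a.h does not, because callers only ever see an incomplete struct type.

```c
/* ---- a.h (sketch): the interface pulls in no other headers ---- */
#ifndef A_H
#define A_H
typedef struct counter counter;      /* incomplete (opaque) type */
counter *counter_new(void);
void counter_add(counter *c, int n);
int counter_value(const counter *c);
void counter_free(counter *c);
#endif

/* ---- a.c: only the implementation needs <stdlib.h> ---- */
#include <stdlib.h>

struct counter {
    int total;
};

counter *counter_new(void) {
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->total = 0;
    return c;
}

void counter_add(counter *c, int n) { c->total += n; }
int counter_value(const counter *c) { return c->total; }
void counter_free(counter *c) { free(c); }
```

Changing struct counter, or swapping malloc() for some platform-specific allocator, touches a.c only; no file that includes a.h needs to change.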
     
    Kaz Kylheku, Apr 13, 2014
    #29
  10. Jens Schweikhardt

    Ian Collins Guest

    The time is microseconds. Unless you have millions of includes, you
    won't be able to measure it.
    It also makes simple design improvements a major arse ache.
    Why? If the change fixes a bug, or improves the design they should be
    congratulated, not punished.
    Well there isn't if the includes are managed sensibly...
    Or evolving. Most of my requirements are very vague, the client doesn't
    really know what they want until they start seeing the code in action.
     
    Ian Collins, Apr 13, 2014
    #30
  11. Jens Schweikhardt

    Martin Shobe Guest

    There is such a thing as a circular dependency. Your rule won't change
    that fact at all. Neither does it change anything about what information
    any part of the code needs to be compiled correctly. All you are
    changing is the one responsible for providing that information.
    Yes, and the developers need to remember where foo_t is declared.
    They'll have to know this even when they don't care about foo_t because
    it's only an implementation detail five levels deep on the dependency
    graph. By requiring each header to include all the information it needs,
    the developer doesn't even have to know foo_t exists, let alone where to
    find it. (Or, even worse, which one to use.)
    As for removing include directives from header files, I do it whenever I
    modify a header file in such a way that the include directive is no
    longer needed. Furthermore, it's not a question of who might need that
    header file up the food chain, it's only a question of is that header
    needed to provide what this header is supposed to provide. If something
    downstream no longer compiles, then I fix that there. It's also how most
    of the people I've worked with do it.

    Martin Shobe
     
    Martin Shobe, Apr 14, 2014
    #31
  12. Not common for headers. Either a or b must come first in the preprocessed
    result; unless you're doing something really weird with conditional
    compilation and multiple inclusion, you can't generate a circular
    dependency.

    But not uncommon for source. a can call a function in b and b a function in
    a. Generally a sign of poor design, but not always easy to eliminate.
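Where two headers' types genuinely refer to each other, the common C idiom is to break the cycle with forward declarations of the struct tags, so neither header has to include the other. A minimal sketch; the owner/pet types are hypothetical.

```c
/* ---- owner.h (sketch) ---- */
#ifndef OWNER_H
#define OWNER_H
struct pet;                   /* forward declaration instead of #include "pet.h" */
struct owner {
    const char *name;
    struct pet *pet;          /* a pointer to an incomplete type is fine */
};
#endif

/* ---- pet.h (sketch) ---- */
#ifndef PET_H
#define PET_H
struct owner;                 /* forward declaration instead of #include "owner.h" */
struct pet {
    const char *name;
    struct owner *owner;
};
#endif

/* Either header can come first; only code that dereferences
   both types needs both complete definitions. */
void adopt(struct owner *o, struct pet *p) {
    o->pet = p;
    p->owner = o;
}
```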
     
    Malcolm McLean, Apr 14, 2014
    #32
  13. Jens Schweikhardt

    BartC Guest

    You still need to decide which to declare first, if it can be either. So if
    there is an automatic tool to decide these things, it might get stuck.

    But also, if you want to use some automatic tools to derive all or part of
    the headers from their associated modules, then a circular dependency can
    render this impossible, if the processing of one module requires access to
    the header of the other, which doesn't yet exist.
    It can be tricky. Project modules A, B, C all use an interface module I
    which in turn calls some external (to the project) library. You want I to be
    independent from A, B and C.

    But 'I' makes use of a very handy function f() residing in C for example,
    which might also make use of some typedef, macro, named constant** etc which
    is also common to A, B and C. It's going to be a lot of work to separate out
    f(), and the result is going to be untidy.

    (** However the named const is implemented.)
     
    BartC, Apr 14, 2014
    #33
  14. Jens Schweikhardt

    Kaz Kylheku Guest

    The theme we are seeing from the naysayers is developers shouldn't have to know
    this, or shouldn't care about that, and in general should be able to focus on a
    small part of the program through a narrow peep-hole in order to make an
    intended change while learning as little as possible about the program.

    That may be very well and fine in most of the industry, but in certain programs
    that are developed to be polished gems of engineering by people who take pride,
    this idea of "know as little as possible" is a poor ideological fit.
    "Something downstream" could be every single source file. Suppose "b.h" does
    not actually need "a.h" itself, but everything implicitly depends on "b.h"
    bringing in "a.h". For instance, everything has printf in it, and "b.h" is what
    brings in <stdio.h>, though b.h doesn't actually use it. (Or just something
    like size_t, which could have been obtained minimally from <stddef.h>).

    These kinds of problems in the nested include system tend not to get fixed:
    nobody wants to.
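The <stddef.h> remark amounts to including the smallest standard header that declares what the interface needs. A sketch with a hypothetical helper: the header exposes only size_t, so a caller that wants printf() has to include <stdio.h> itself.

```c
/* ---- strutil.h (sketch): needs only size_t, so it includes <stddef.h>,
   not <stdio.h> or <stdlib.h> ---- */
#ifndef STRUTIL_H
#define STRUTIL_H
#include <stddef.h>
size_t count_char(const char *s, char c);   /* hypothetical helper */
#endif

/* ---- strutil.c ---- */
size_t count_char(const char *s, char c) {
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}
```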
     
    Kaz Kylheku, Apr 14, 2014
    #34
  15. The idea is that we have a set interface

    y = square_root(x);

    and we can play with the square_root() code to our heart's content. As long as we don't change the
    interface, we won't break anything.

    However, that only works to a limited extent. If we're not changing the behaviour of square_root(),
    why are we editing the code at all? In the bad old days, delays were sometimes implemented by
    loops calling square_root functions (so that compilers wouldn't optimise them to nothing). So
    even increasing the efficiency could break things. In fact, we might want to tweak our handling of
    negative x. Then we've got to go through all the calling code carefully to make sure that
    nothing depends on the previous handling of negatives. Quite likely we've specified that the behaviour
    is undefined / reserved for future expansion, so if callers have done their job correctly, we won't
    break anything. But you can't rely on that.
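A sketch of such a fixed interface. The Newton-iteration body is just one possible implementation that could be swapped out without touching callers; returning -1.0 for negative x is a hypothetical contract, and exactly the kind of detail callers may come to rely on.

```c
/* Interface: y = square_root(x).
   Hypothetical contract: for x >= 0, returns an approximation of sqrt(x);
   for negative x (or NaN), returns -1.0. */
double square_root(double x) {
    if (!(x >= 0.0))
        return -1.0;
    if (x == 0.0)
        return 0.0;
    /* Newton's method from a starting guess >= sqrt(x): the iterates fall
       toward the root, so stop as soon as one fails to decrease. */
    double g = x > 1.0 ? x : 1.0;
    for (;;) {
        double next = 0.5 * (g + x / g);
        if (!(next < g))
            return g;
        g = next;
    }
}
```

Replacing the loop with a call to the library sqrt() would keep the interface but change results in the last bit or two, which a caller comparing outputs exactly could still notice.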
     
    Malcolm McLean, Apr 14, 2014
    #35
  16. Jens Schweikhardt

    Stefan Ram Guest

    Reasons to edit might be:

    - to correct an error when the previous versions did not
    yet implement the interface specification correctly
    (this actually /is/ changing the behavior, but only
    to the behavior that is already assumed by the clients;
    this is called »debugging«),

    - to make the function more efficient
    (this is called »optimization«),

    - to make the source code more readable, testable,
    and/or maintainable (this is called »refactoring«), or

    - due to other demands (such as legal issues, when a
    previous implementation infringes rights of third parties).

    When one wants a new interface, the natural thing would be
    to write a new interface specification for a new function
    name and leave the old function as it is.

    In the general case, given a published library, one cannot
    go through all the calling code, because one cannot know
    who is using the library.
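A sketch of that approach, with hypothetical names: the published parse_count() stays exactly as it was, and the changed behaviour (reporting malformed input instead of silently returning a number) goes under a new name, so no existing caller can be broken by it.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Published interface, left exactly as it was. atoi() reports no errors:
   "abc" yields 0, indistinguishable from a genuine "0". */
int parse_count(const char *s) {
    return atoi(s);
}

/* New behaviour under a new name: on success stores the value and
   returns 0; returns -1 for malformed or out-of-range input. */
int parse_count_checked(const char *s, int *out) {
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```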
     
    Stefan Ram, Apr 14, 2014
    #36
  17. in <lifi4h$3fe$>:
    # On 4/13/2014 9:25 AM, Jens Schweikhardt wrote:
    #> in <lie4jc$2r1$>:
    #> ...
    #> # The problem is the dependencies are still that big, messy, uglier than
    #> # hell graph doxygen generated.
    #>
    #> No, they aren't. The dependencies are directed *acyclic* graphs. Before,
    #> they contained cycles since with the include guards, a.h can include b.h
    #> and b.h can include a.h without an actual cycle emerging. But both
    #> directions must appear as edges in the graph for each header might be
    #> included in isolation. Cycles are impossible by design under the
    #> "headers don't include headers" rule.
    #
    # There is such a thing as a circular dependency.

    No, not in any meaningful (for the C compiler) way. But maybe I'm
    misunderstanding your point. Could you give an example of a circular
    dependency?

    # Your rule won't change
    # that fact at all. Neither does it change anything about what information
    # any part of the code needs to be compiled correctly. All you are
    # changing the one responsible for providing that information.
    #
    #> # That graph now has to be kept in the heads
    #> # of each and every developer. Really, really, really, ..., ugly!
    #>
    #> Why in the heads? If you forget one, the compiler tells you
    #> "foo_t undeclared" and you include the header with typedef foo_t
    #> above it. Repeat if necessary. This Turing machine can be proven to halt. :)
    #
    # Yes, and the developers need to remember where foo_t is declared.

    Of course. Any SW engineer worth a bean uses ctags or whatever his
    IDE provides. That's not a burden.

    Regards,

    Jens
     
    Jens Schweikhardt, Apr 14, 2014
    #37
  18. Jens Schweikhardt

    Martin Shobe Guest

    Allowing people to focus their limited resources on the important issues
    instead of distracting them with irrelevant detail results, in my
    opinion, in a greater chance of those "polished gems of engineering by
    people who take pride" being produced.
    I suppose that could happen. In my opinion, those files downstream were
    already broken since they relied on an implementation detail of "b.h".
    Personally, I would remove "a.h" from "b.h" and add "a.h" to all those
    files downstream.

    Martin Shobe
     
    Martin Shobe, Apr 14, 2014
    #38
  19. Jens Schweikhardt

    Stefan Ram Guest

    Sometimes, people go out of their way to please their lint.
    When they have a lint that warns on multiple includes of the
    same file, they deem that to be »dirty«. If they used a
    lint that warned on the lack of include guards, they
    would deem that to be »dirty« instead.

    To please their lint, some - otherwise perfectly sane -
    people write

    if(( buf = malloc( bufsiz )))

    instead of

    if( buf = malloc( bufsiz ))

    and then call this »good style«.
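For reference, the two styles side by side in a minimal sketch (function names hypothetical). The extra parentheses do not change the meaning; they only tell the compiler or lint that the assignment is deliberate. GCC, for one, warns about the single-parenthesis form under -Wparentheses, which -Wall enables.

```c
#include <stdlib.h>

/* Style A: assignment used as the condition, wrapped in an extra pair of
   parentheses to signal that '=' (not '==') is intended. */
int try_alloc_paren(size_t bufsiz, char **out) {
    char *buf;
    if ((buf = malloc(bufsiz))) {
        *out = buf;
        return 1;
    }
    return 0;
}

/* Style B: separate assignment and an explicit comparison; no lint
   has anything to guess about. */
int try_alloc_explicit(size_t bufsiz, char **out) {
    char *buf = malloc(bufsiz);
    if (buf != NULL) {
        *out = buf;
        return 1;
    }
    return 0;
}
```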
     
    Stefan Ram, Apr 14, 2014
    #39
  20. in <>:
    # Jens Schweikhardt wrote:
    #> in <>:
    #> # Jens Schweikhardt wrote:
    #> #> hello, world\n
    #> #>
    #> #> consider a small project of 100 C source and 100 header files.
    #> #> The coding rules require that only C files include headers,
    #> #> headers are not allowed to include other headers (not sure if
    #> #> this is 100% the "IWYU - Include What You Use" paradigma).
    #> #
    #> # I agree with the other comments of the folly of this rule. Don't do it!
    #> #
    #> # Just take a look at some of your system headers, or popular open source
    #> # library headers and consider whether you want all of the conditional
    #> # includes and other and unnecessary nonsense in all of your source files?
    #>
    #> The "unnecessary nonsense" is all the #ifndef crap in third party
    #> headers. Looking at system headers makes me cringe. It's not a role
    #> model.
    #
    # Have you ever written or had to maintain cross platform software?

    While at Sun Microsystems, I maintained VirtualBox. Does that qualify?
    I also believe that the SW engineers that came up with the idea
    of automated IWYU are not exactly misguided. They argue their case
    in
    (with C++ examples, where the paradigm works even better). On
    http://www.eclipsecon.org/2013/category/tags/cdt-c-c-refactoring-include-includes-iwyu
    there's a Google engineer's presentation using it for CDT.
    Apparently, some people realize the potential.

    ....
    # Probably quite a few of us have spent many decades developing and
    # maintaining a variety of code bases. Sure there was a time when opening
    # and reading headers took seconds rather than the micro-seconds it takes
    # today.
    # These days developer time is way more valuable than machine
    # time.

    The whole idea is not about editor loading time. I'd say it's not even
    primarily about minimizing compile time, though that's a welcome
    and free side effect. For me, the biggest benefit is that refactoring
    opportunities become easier to identify: interfaces accumulating,
    code collecting fat.

    To all the people who think the "headers don't include headers" rule
    is a folly, honestly, did you actually try it for a non-trivial code
    base or are you possibly prejudiced? I encourage you to actually try
    it. You might be in for a surprise. Granted, converting an existing
    project is making a pig fly. But with enough thrust...
    To be absolutely clear: it's not about system or third party stuff, it's
    about your project's interfaces.


    Regards,

    Jens
     
    Jens Schweikhardt, Apr 14, 2014
    #40