Re: Who owns the variable in my header file ?

Discussion in 'C Programming' started by Eric Sosman, Oct 3, 2012.

  1. Eric Sosman

    Eric Sosman Guest

    On 10/3/2012 2:13 PM, lipska the kat wrote:
    > Hi
    >
    > I have the following program
    > distributed over 4 files
    >
    > /* foo.h */
    > int foo;
    > [...]
    >
    > /* main.c */
    > #include <stdio.h>
    > #include <foo.h>
    > [...]
    >
    > /* fooset.c */
    > #include <foo.h>
    > [...]
    >
    > /* fooget.c */
    > #include <foo.h>
    > [...]


    This is wrong, assuming all three modules are linked into
    one program. Each module provides its own definition of the
    variable `foo', and those three definitions collide. There
    must be one and only one `foo' in the program, not three.

    The way to accomplish this is to remove the *definition*
    of `foo' from foo.h and replace it with a *declaration*. The
    difference is not so hard to understand: It is the difference
    between "I am Lipska" and "I know someone named Lipska." The
    way you spell "I know someone named" in C is

    /* foo.h */
    extern int foo; // "I know an int named foo"
    [...]

    Each of the three modules thus gets an introduction to `foo'.
    In exactly one of these modules (it doesn't matter which; just
    pick one that makes the most sense) you also put a definition:

    /* wherever.c */
    #include <foo.h> // "I know an int named foo"
    int foo; // "Yes, and here I am!"
    [...]

    (The defining module doesn't actually *need* the declaration,
    since defining the variable also declares it. But it's a good
    idea to use the #include anyhow, because if the compiler sees
    both the declaration and the definition together it can alert
    you if they disagree -- like if you change one to a `long' and
    forget to change the other.)

    Incidentally, your use of #include <foo.h> is suspect. The
    <> form is for system-provided headers like <stdio.h>, while
    programmer-provided header files should use #include "foo.h"
    instead. Compilers search for <> and "" inclusions in different
    places, and even if the mixup is sometimes harmless it is also
    sometimes not so harmless.
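
    Putting those two points together, the whole arrangement might look
    something like this (a sketch only -- the function names and the
    initial value 10 are guesses, since the [...] parts were snipped):

    /* foo.h */
    extern int foo;         /* declaration: "I know an int named foo" */
    void fooset(int value); /* hypothetical prototypes, for illustration */
    int fooget(void);

    /* fooset.c */
    #include "foo.h"
    int foo = 10;           /* the one and only definition */
    void fooset(int value) { foo = value; }

    /* fooget.c */
    #include "foo.h"
    int fooget(void) { return foo; }

    /* main.c */
    #include <stdio.h>
    #include "foo.h"
    int main(void) {
        printf("foo is %d\n", fooget());
        fooset(foo + 1);
        printf("foo is now %d\n", fooget());
        return 0;
    }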

    > I run make on my makefile (I'm a beginner at make, Ant is more my thing)
    > and see a humungous great glob of bytes called foo.h.gch, looks like
    > foo.h has been compiled ... but I've no idea
    > why it's so huge.
    >
    > -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch


    This looks like a "precompiled header," generated as a time-
    saving step by the (wait for it...) C++ compiler you're using.
    (You may have thought you were writing C, but the available
    evidence suggests you've set up your build environment to use
    C++ instead. Might want to check your setup ...)

    > Anyway, the question is who 'owns' the foo declared in foo.h


    If you have exactly one definition (as C requires), you might
    say that `foo' is "owned" by the module where that definition
    appears. (Or you might not; once the modules are linked together,
    all global variables are on an equal footing and might as well be
    said to be "owned" by the entire program.)

    If you have three colliding definitions -- well, there's no
    useful way to answer questions about undefined behavior.

    > Storage is obviously set aside as when I run the program I get the
    > expected output
    >
    > foo is 10
    > foo is now 11


    As I hope you're beginning to learn, "It worked" does not
    imply "It's right." The possible manifestations of undefined
    behavior include "It did (or seemed to do) what I expected."

    > I guess this big old lump of bytes has something to do with it.


    Possibly, but probably not.

    --
    Eric Sosman
    Eric Sosman, Oct 3, 2012
    #1

  2. Alan Curry

    Alan Curry Guest

    In article <k4i2a7$uhj$>,
    Eric Sosman <> wrote:
    >On 10/3/2012 2:13 PM, lipska the kat wrote:
    >> I run make on my makefile (I'm a beginner at make, Ant is more my thing)
    >> and see a humungous great glob of bytes called foo.h.gch, looks like
    >> foo.h has been compiled ... but I've no idea
    >> why it's so huge.
    >>
    >> -rw-rw-r-- 1 lipska lipska 1339792 Oct 3 17:28 foo.h.gch

    >
    > This looks like a "precompiled header," generated as a time-
    >saving step by the (wait for it...) C++ compiler you're using.
    >(You may have thought you were writing C, but the available
    >evidence suggests you've set up your build environment to use
    >C++ instead. Might want to check your setup ...)


    gcc creates those files in C mode too, when you run gcc -c foo.h

    >
    >> Anyway, the question is who 'owns' the foo declared in foo.h

    >
    > If you have exactly one definition (as C requires), you might
    >say that `foo' is "owned" by the module where that definition
    >appears. (Or you might not; once the modules are linked together,
    >all global variables are on an equal footing and might as well be
    >said to be "owned" by the entire program.)
    >
    > If you have three colliding definitions -- well, there's no
    >useful way to answer questions about undefined behavior.


    Oh please. It's not unuseful to explain what actually happened. gcc made a
    "common" symbol in each object file, and the linker merged them. This
    behavior may not be standardized but it's not hard to explain, and after
    you've explained it you can add that there are ways to change it:

    compile with -fno-common and the common symbol will be changed to a normal
    symbol, and the linker will fail when it sees multiple normal symbols with
    the same name. This way your program won't link until you've fixed it to obey
    the "one owner" rule.

    Or, assuming the GNU linker is being used, link with -Wl,-no-common which
    will do the merging of common symbols but also warn you about them, allowing
    you to use the program while providing a reminder that you still have some
    work to do to make it portable.
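
    A stripped-down example of the situation (file names made up here):

    /* common.h -- the problematic header: a tentative definition, not
     * an extern declaration */
    int foo;

    /* a.c */
    #include "common.h"
    void set(void) { foo = 1; }

    /* b.c */
    #include "common.h"
    void set(void);
    int main(void) { set(); return foo - 1; }

    /* With gcc's traditional default (-fcommon), each object file holds a
     * "common" symbol for foo and the linker silently merges them into one.
     * Compile both files with -fno-common instead and the link stops with a
     * multiple-definition error, which is what pushes you toward the
     * extern-in-the-header fix described elsewhere in the thread. */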

    --
    Alan Curry
    Alan Curry, Oct 3, 2012
    #2

  3. Eric Sosman

    Eric Sosman Guest

    On 10/3/2012 3:17 PM, Alan Curry wrote:
    > In article <k4i2a7$uhj$>,
    > Eric Sosman <> wrote:
    >>[...]
    >> If you have three colliding definitions -- well, there's no
    >> useful way to answer questions about undefined behavior.

    >
    > Oh please. It's not unuseful to explain what actually happened. gcc made a
    > "common" symbol in each object file, and the linker merged them. This
    > behavior may not be standardized but it's not hard to explain, and after
    > you've explained it you can add that there are ways to change it:
    >[...]


    I'm sticking with "no useful way."

    --
    Eric Sosman
    Eric Sosman, Oct 3, 2012
    #3
  4. Eric Sosman

    Eric Sosman Guest

    On 10/3/2012 3:59 PM, lipska the kat wrote:
    > On 03/10/12 20:05, Eric Sosman wrote:
    >> On 10/3/2012 2:13 PM, lipska the kat wrote:
    >>> Hi
    >>>
    >>> I have the following program
    >>> distributed over 4 files

    >
    > [snip]
    >
    >> This looks like a "precompiled header," generated as a time-
    >> saving step by the (wait for it...) C++ compiler you're using.
    >> (You may have thought you were writing C, but the available
    >> evidence suggests you've set up your build environment to use
    >> C++ instead. Might want to check your setup ...)

    >
    > Yes ... of course, this implies that I know how to change it :)
    > I've done what I can with the hideously complex ... er I mean
    > feature rich gcc software to ensure that I am compiling as c


    From what other posters have written, it appears I guessed
    incorrectly about the C/C++ distinction. You may in fact be
    compiling your code as C -- but for some reason you're "compiling"
    the header file itself. That's probably not what you want to do.

    > > As I hope you're beginning to learn, "It worked" does not
    > > imply "It's right." The possible manifestations of undefined
    > > behavior include "It did (or seemed to do) what I expected."

    >
    > Well yes, but if I run a program 10 times with the same data and get the
    > same results each time I might start to think that something is 'right'.


    If you run it ten times with the same data, you're probably hitting
    the same fragile set of coincidences each time. Running with different
    data could be more illuminating -- although, as Dijkstra said, testing
    cannot demonstrate absence of errors, but only their presence.

    > If I design my test cases in the usual way (boundary cases and random
    > 'middle ground' cases at the very least) then run those tests with the
    > same data and get the same output each time then I get a feeling that I
    > may be on the right path.
    >
    > Is testing C code fundamentally different to testing code in other
    > languages ?


    No, not fundamentally. It seemed to me you'd distributed
    the middle of

    "Correct programs work."
    "This program is correct."
    "Therefore, this program works."

    to obtain

    "Correct programs work."
    "This program works."
    "Therefore, this program is correct."
    "BZZZZT! Thank you for playing."

    .... and you would certainly not have been the first to do so.

    --
    Eric Sosman
    Eric Sosman, Oct 3, 2012
    #4
  5. James Kuyper

    James Kuyper Guest

    On 10/03/2012 03:59 PM, lipska the kat wrote:
    > On 03/10/12 20:05, Eric Sosman wrote:
    >> On 10/3/2012 2:13 PM, lipska the kat wrote:

    ....
    > > As I hope you're beginning to learn, "It worked" does not
    > > imply "It's right." The possible manifestations of undefined
    > > behavior include "It did (or seemed to do) what I expected."

    >
    > Well yes, but if I run a program 10 times with the same data and get the
    > same results each time I might start to think that something is 'right'.


    That's a bad assumption. One of the most common ways in which code with
    undefined behavior actually behaves is to produce exactly the same
    result that you incorrectly assume that it's required to produce. That's
    because your assumptions happen to match decisions made by the
    implementors of the version of C that you're testing with. Other
    implementors of C are free to make different decisions, ones that are
    incompatible with your incorrect assumptions.

    > If I design my test cases in the usual way (boundary cases and random
    > 'middle ground' cases at the very least) then run those tests with the
    > same data and get the same output each time then I get a feeling that I
    > may be on the right path.
    >
    > Is testing C code fundamentally different to testing code in other
    > languages ?


    No, the inappropriateness of concluding that a program is correct, just
    because it appears to work, is common to all computer languages.
    --
    James Kuyper
    James Kuyper, Oct 4, 2012
    #5
  6. Kaz Kylheku

    Kaz Kylheku Guest

    On 2012-10-03, lipska the kat <> wrote:
    > Well yes, but if I run a program 10 times with the same data and get the
    > same results each time I might start to think that something is 'right'.


    If the program is essentially deterministic (no real-time inputs, no threads)
    then running the same test case (same data, same program, same platform)
    ten times is quite silly. It is one test case.

    (There may be some differences between the runs, like the OS randomizing the
    stack locations or some such thing.)

    But it is better to have ten different test cases in a suite and run through
    those.

    If you want to prove something with ten runs of one test case, perform the ten
    runs on ten different platforms and show that the results are the same.
    Kaz Kylheku, Oct 4, 2012
    #6
  7. lipska the kat <> writes:
    [...]
    > Of course, the point I was trying to make is that if my program is
    > behaving in an 'undefined' way then I might expect 10 runs with
    > identical data to provide different results. I'm in no way sufficiently
    > knowledgeable about C to assume otherwise. I suppose it depends on what
    > you mean by undefined.


    No, that's not what undefined means. The C standard's definition of
    *undefined behavior* is:

    behavior, upon use of a nonportable or erroneous program construct
    or of erroneous data, for which this International Standard imposes
    no requirements

    NOTE Possible undefined behavior ranges from ignoring the
    situation completely with unpredictable results, to behaving
    during translation or program execution in a documented manner
    characteristic of the environment (with or without the issuance
    of a diagnostic message), to terminating a translation or
    execution (with the issuance of a diagnostic message).

    > If I have a program that reverses it's input a line at a time (ex 1-19 K
    > and R second edition for example) and I try it with as many different
    > inputs as my feeble brain can devise and the results are what I expect
    > then what can I assume from this. In other languages I have used (10s of
    > KLOC running daily without error) I would assume that the program was
    > 'correct'.


    C, as the saying goes, gives you enough rope to shoot yourself in the
    foot. I'll show you a concrete example:

    #include <stdio.h>

    static void write_array(int *arr) {
        for (int i = 0; i <= 5; i ++) {
            arr[i] = i;
        }
    }

    static void read_array(int *arr) {
        for (int i = 0; i <= 5; i ++) {
            printf("%d", arr[i]);
            putchar(i == 5 ? '\n' : ' ');
        }
    }

    int main(void) {
        int x[5] = { 0 };
        int y[5] = { 0 };
        int z[5] = { 0 };

        write_array(y);
        read_array(y);

        return 0;
    }

    The array y is defined to have 5 elements, but the program attempts to
    store 6 int values in it, and then retrieve and print those 6 values.
    Accessing y[5] has undefined behavior, since it's outside the bounds of
    the array. But since y is surrounded in memory by two other arrays, x
    and z, it's likely that y[5] refers to an element of one of those other
    two arrays. (There's no guarantee that x, y, and z are allocated in any
    particular order, or even that they're adjacent, but it's likely that
    one of them immediately follows y in memory.)

    I can compile and run this program 100 times, and it's very likely to
    produce the same output every time:

    0 1 2 3 4 5

    That's just one of the infinitely many things that can happen when the
    language standard "imposes no requirements".

    (A sufficiently clever optimizing compiler might cause it to produce
    different output, or to crash, or even to be rejected at compile time.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 4, 2012
    #7
  8. Angel

    Angel Guest

    On 2012-10-04, Keith Thompson <> wrote:
    > lipska the kat <> writes:
    >
    >> If I have a program that reverses it's input a line at a time (ex 1-19 K
    >> and R second edition for example) and I try it with as many different
    >> inputs as my feeble brain can devise and the results are what I expect
    >> then what can I assume from this. In other languages I have used (10s of
    >> KLOC running daily without error) I would assume that the program was
    >> 'correct'.

    >
    > C, as the saying goes, gives you enough rope to shoot yourself in the
    > foot. I'll show you a concrete example:
    >
    > #include <stdio.h>
    >
    > static void write_array(int *arr) {
    >     for (int i = 0; i <= 5; i ++) {
    >         arr[i] = i;
    >     }
    > }
    >
    > static void read_array(int *arr) {
    >     for (int i = 0; i <= 5; i ++) {
    >         printf("%d", arr[i]);
    >         putchar(i == 5 ? '\n' : ' ');
    >     }
    > }
    >
    > int main(void) {
    >     int x[5] = { 0 };
    >     int y[5] = { 0 };
    >     int z[5] = { 0 };
    >
    >     write_array(y);
    >     read_array(y);
    >
    >     return 0;
    > }
    >
    > The array y is defined to have 5 elements, but the program attempts to
    > store 6 int values in it, and then retrieve and print those 6 values.
    > Accessing y[5] has undefined behavior, since it's outside the bounds of
    > the array. But since y is surrounded in memory by two other arrays, x
    > and z, it's likely that y[5] refers to an element of one of those other
    > two arrays. (There's no guarantee that x, y, and z are allocated in any
    > particular order, or even that they're adjacent, but it's likely that
    > one of them immediately follows y in memory.)
    >
    > I can compile and run this program 100 times, and it's very likely to
    > produce the same output every time:
    >
    > 0 1 2 3 4 5
    >
    > That's just one of the infinitely many things that can happen when the
    > language standard "imposes no requirements".
    >
    > (A sufficiently clever optimizing compiler might cause it to produce
    > different output, or to crash, or even to be rejected at compile time.)


    Just out of curiosity, I ran this little test through gcc. Without
    optimization, or at optimization level 1, gcc only warns about the
    unused variables x and z.

    At optimization level 2, gcc warns about a subscript out of bounds
    on line 5 (in the write_array function). At optimization level 3 it
    also gives this warning about line 11 (in the read_array function).

    The program does give the 0 1 2 3 4 5 output every time, though.


    --
    "C provides a programmer with more than enough rope to hang himself.
    C++ provides a firing squad, blindfold and last cigarette."
    - seen in comp.lang.c
    Angel, Oct 4, 2012
    #8
  9. James Kuyper

    James Kuyper Guest

    On 10/04/2012 03:50 AM, lipska the kat wrote:
    ....
    > Of course, the point I was trying to make is that if my program is
    > behaving in an 'undefined' way then I might expect 10 runs with
    > identical data to provide different results.


    That's a very bad expectation, unless your 10 runs were done using
    wildly different implementations of C, on 10 wildly different platforms.

    > ... I'm in no way sufficiently
    > knowledgeable about C to assume otherwise. I suppose it depends on what
    > you mean by undefined.


    "undefined behavior" has a very specific meaning in the C standard:
    "behavior, upon use of a nonportable or erroneous program construct or
    of erroneous data, for which this International Standard imposes no
    requirements" (3.4.3p1). A key phrase needs to be noted: "this
    International Standard". Behavior which is defined by something else
    (such as the POSIX standard, or an ABI standard for a given platform, or
    the documentation for a given compiler, or the fundamental laws of
    physics) would still be undefined behavior as far as the C standard is
    concerned. If there's anything other than the C standard which defines
    the behavior (and there usually is), it will be perfectly repeatable for
    as long as that other definition applies, and will fail to be repeatable
    as soon as you use it in a situation where the other definition no
    longer applies.

    For example, if the "undefined behavior" is defined by the POSIX
    standard, you can expect the results to be perfectly repeatable on every
    POSIX-conforming system, but you'll have no guarantees about non-POSIX
    systems. If it's defined by Intel for all CPUs in the same family, the
    undefined behavior will be perfectly repeatable as long as you execute
    only on that family of CPUs, but not necessarily if you port your code
    to an AMD system.

    Keep in mind that most of the code constructs that have repeatable
    undefined behavior will be repeatable for much less portable reasons
    than the examples I gave above. It may be "defined" (though not in any
    publicly available document) by a particular version of a particular
    compiler when used with a particular set of command line options, and
    may be defined differently if you change any of those options, or
    upgrade your compiler, or change your code in any way, such as
    reordering the variable declarations.
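
    A small illustration of my own (not something any standard promises):
    signed integer overflow is undefined behavior as far as the C standard
    is concerned, yet on a given compiler and CPU it usually "works" the
    same way every time.

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int n = INT_MAX;
        /* Undefined behavior per the C standard.  On a typical
         * two's-complement machine, compiled without aggressive
         * optimization, this reliably wraps to INT_MIN; gcc will even
         * document that behavior if you ask for it with -fwrapv.  An
         * optimizer that assumes overflow cannot happen is free to do
         * something else entirely. */
        n = n + 1;
        printf("%d\n", n);
        return 0;
    }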

    > If I have a program that reverses it's input a line at a time (ex 1-19 K
    > and R second edition for example) and I try it with as many different
    > inputs as my feeble brain can devise and the results are what I expect
    > then what can I assume from this.


    That it handled those test cases correctly, and might mishandle any case
    you didn't think of. It could even mishandle the cases you did test, if
    it contains a time-dependent defect (such as mis-handling a leap year).
    You can generalize your test results beyond those test cases only in
    proportion to how much you know about what the guaranteed behavior of
    your code is. I would judge that you know a fair amount about C, but
    your question is about a fairly fundamental point, which implies that
    there's still a lot of details you don't know yet.

    > ... In other languages I have used (10s of
    > KLOC running daily without error) I would assume that the program was
    > 'correct'.


    10s of KLOC isn't a lot, and your assumptions in those other languages
    would be just as unjustified as they are in C. The details of what I've
    said are specific to C, but the general principle is not.
    --
    James Kuyper
    James Kuyper, Oct 4, 2012
    #9
  10. James Kuyper

    James Kuyper Guest

    On 10/04/2012 04:02 AM, lipska the kat wrote:
    > On 04/10/12 05:34, James Kuyper wrote:

    ....
    >> That's a bad assumption. One of the most common ways in which code with
    >> undefined behavior actually behaves is to produce exactly the same
    >> result that you incorrectly assume that it's required to produce. That's
    >> because your assumptions happen to match decisions made by the
    >> implementors of the version of C that you're testing with. Other
    >> implementors of C are free to make different decisions, ones that are
    >> incompatible with your incorrect assumptions.

    >
    > Er ... wow, OK, that is a bit of a head****
    > Do you mean to say that even if I test my program to destruction and as
    > far as I can tell it's 'correct', that is it complies with requirements
    > and behaves as expected it could still be incorrect when compiled with
    > a different compiler ???


    Certainly. That's not just because of undefined behavior, either.
    There's also behavior that is merely unspecified: the standard provides
    (explicitly or, more commonly, implicitly) a list of possible behaviors,
    and each implementation gets to choose from that list - in some cases,
    it can even make a different choice each time a given piece of code is
    executed. Some unspecified behavior is "implementation-defined" which
    means that an implementation is required to document which choice it has
    made, but there's also a lot of cases where there's no such requirement.
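
    For instance (my own example, not text from the standard): the order
    in which the arguments of a function call are evaluated is unspecified.

    #include <stdio.h>

    static int counter = 0;

    static int next(void) { return ++counter; }

    int main(void) {
        /* Unspecified behavior: the two calls to next() may be evaluated
         * in either order, so a conforming implementation may print
         * "1 2" or "2 1" -- and it isn't even required to make the same
         * choice every time this line is executed. */
        printf("%d %d\n", next(), next());
        return 0;
    }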

    > Surely there is some 'base' implementation of C that is used to test
    > compilers ..


    No, there is not. Even if there were, the base implementation would have
    to make specific choices in every case where the C standard leaves the
    behavior unspecified or undefined, and other fully-conforming
    implementations of C would not be required to make the same choices,
    which greatly reduces the usefulness of having a base implementation.
    That may be one reason why there isn't one.

    > ... or is it a free for all ...


    It's not a free-for-all - the standard does impose a great many specific
    requirements. However, the things that it does not specify are what
    gives implementors sufficient freedom to create a conforming
    implementation of C on almost every platform. That is the reason why C
    is one of the most widely implemented of all computer languages.

    > ... to me this implies that there can
    > be more than one 'correct' implementation of the C language,


    Correct - the set of possible fully-conforming implementations of the C
    language is infinite. The set of actual fully-conforming implementations
    is much smaller, but still large enough that it's not feasible to test
    any given program on all of them. It's also sufficiently varied that
    testing on only a few dozen of them is insufficient to prove that your
    code will work on all of the untested ones.

    > ... or several
    > or many Cs in fact. Please remember I am a raw beginner at C although I
    > find this whole discussion fascinating.
    >
    > [snip]
    >
    > Given a program written in C, how does one determine that it is
    > 'correct' if complying with requirements and returning the same output
    > from the same input is not enough.


    That depends upon the requirements. Well-written requirements should
    identify a specific version of the C standard (C2011 just came out, so
    there aren't many implementations of it, and full implementations of C99
    are still rare - but C90 has been fully implemented just about
    everywhere). Those requirements should specify that your code must have
    no syntax errors or constraint violations according to that version.
    Then you read the standard and learn what constitutes a syntax error or
    a constraint violation.

    Well-written requirements should also limit the dependence of the code
    on unspecified or undefined behavior in some appropriate fashion. Useful
    programs seldom completely avoid undefined behavior, and almost never
    avoid unspecified behavior, but you can fill in those gaps by, for
    instance, requiring POSIX conformance.
    --
    James Kuyper
    James Kuyper, Oct 4, 2012
    #10
  11. lipska the kat <> writes:
    <snip>
    > Given a program written in C, how does one determine that it is
    > correct' if complying with requirements and returning the same output
    > from the same input is not enough.


    There are a few tools that can help. For example, there's valgrind (and
    other similar things) that can check all of your memory accesses as you
    run your tests. But there are many other things that can be wrong but
    which appear to work. One general tool is to get into the habit of
    reasoning about your programs.

    Testing is very helpful of course, but I'd venture to say that the
    balance between treating programming as a formal mathematical activity
    and treating it like engineering has tended, in recent years, to
    downplay the mathematical side, to the detriment of the field.

    --
    Ben.
    Ben Bacarisse, Oct 4, 2012
    #11
  12. lipska the kat <> writes:
    > On 04/10/12 09:30, Keith Thompson wrote:
    >> lipska the kat<> writes:
    >> [...]

    >
    > [snip]
    >
    >> C, as the saying goes, gives you enough rope to shoot yourself in the
    >> foot. I'll show you a concrete example:

    >
    > [snip]
    >
    > gcc example.c
    > example.c: In function ‘write_array’:
    > example.c:4:9: error: ‘for’ loop initial declarations are only allowed
    > in C99 mode
    > example.c:4:9: note: use option -std=c99 or -std=gnu99 to compile your code
    > example.c: In function ‘read_array’:
    > example.c:10:9: error: ‘for’ loop initial declarations are only allowed
    > in C99 mode
    > make: *** [example] Error 1
    >
    > gcc -ansi example.c
    > ditto above


    Right, I used a C99-specific feature, and gcc with no arguments, or with
    "-ansi", doesn't implement C99. You can avoid that by using "-std=c99",
    or by changing

    for (int i = 0; i <= 5; i ++) {
    /* ... */
    }

    to:

    int i;
    for (i = 0; i <= 5; i ++) {
        /* ... */
    }

    Note that the "int i;" declaration has to be at the top of the block,
    before any statements (a C90 restriction that C99 removed).


    > gcc -std=c99 -Wall example.c
    > example.c: In function ‘main’:
    > example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
    > example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
    >
    > gcc -std=c99 -O1 -Wall example.c
    > ditto above
    >
    > gcc -std=c99 -O2 -Wall example.c
    > example.c: In function ‘main’:
    > example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
    > example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
    > example.c:5:20: warning: array subscript is above array bounds
    > [-Warray-bounds]
    >
    > gcc -std=c99 -O3 -Wall example.c
    > example.c: In function ‘main’:
    > example.c:19:13: warning: unused variable ‘z’ [-Wunused-variable]
    > example.c:17:13: warning: unused variable ‘x’ [-Wunused-variable]
    > example.c:5:20: warning: array subscript is above array bounds
    > [-Warray-bounds]
    > example.c:11:19: warning: array subscript is above array bounds
    > [-Warray-bounds]


    Yes, all those warnings are valid. An "unused variable" warning
    doesn't mean that your program is wrong; it just means that you've
    probably made a logical error.

    The "array subscript is above array bounds" is more serious. As I
    said, I deliberately wrote a program whose behavior is undefined;
    this absolutely was *not* an example of what you should do.

    The program attempts to store values outside the bounds of an array.
    I added extra array declarations to try to give those accesses
    somewhere to go.

    > 0 1 2 3 4 5 every time
    >
    > Now I'm really confused


    The program's behavior is undefined. Printing 0 1 2 3 4 5 every
    time is therefore perfectly valid, since nothing in the standard
    says it *shouldn't* behave that way.

    If you go beyond what the standard actually says, there are reasons
    why it behaves the way it does. The arrays x, y, and z probably
    happen to be stored next to each other in memory. Writing past
    the end of y probably clobbers the beginning of either x or z.
    Since x, y, and z are all in memory that you "own", the program is
    able to do that with no apparent problem.

    A more stringent compiler might have caused the program to crash
    when it tried to store past the end of y. Most compilers don't
    do that because it requires explicit checking, which is expensive
    (it would catch incorrect programs, but slow down correct programs).
    A cynic might say that C compilers are designed to let you get your
    wrong answers as quickly as possible.

    > Maybe I should be reading the C99 spec %-}


    It's not easy reading, and there's not really anything in it that
    explains the way this program behaves. As far as the standard is
    concerned, running that program could make demons fly out of your
    nose (obviously that won't really happen, but nothing in the C
    standard forbids it).

    It's entirely possible that my example was more complex than it
    should have been. If you don't understand it, don't worry about it
    too much for now. Perhaps you should concentrate more on writing
    correct code than on understanding incorrect code.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 5, 2012
    #12
  13. On Oct 4, 1:30 pm, James Kuyper <> wrote:
    > On 10/04/2012 04:02 AM, lipska the kat wrote:
    > > On 04/10/12 05:34, James Kuyper wrote:



    > >> That's a bad assumption. One of the most common ways in which code with
    > >> undefined behavior actually behaves is to produce exactly the same
    > >> result that you incorrectly assume that it's required to produce. That's
    > >> because your assumptions happen to match decisions made by the
    > >> implementors of the version of C that you're testing with. Other
    > >> implementors of C are free to make different decisions, ones that are
    > >> incompatible with your incorrect assumptions.

    >
    > > Er ... wow, OK, that is a bit of a head****
    > > Do you mean to say that even if I test my program to destruction and as
    > > far as I can tell it's 'correct', that is it complies with requirements
    > > and behaves as expected it could still be incorrect when compiled with
    > > a different compiler ???

    >
    > Certainly. That's not just because of undefined behavior, either.
    > There's also behavior that is merely unspecified: the standard provides
    > (explicitly or, more commonly, implicitly) a list of possible behaviors,
    > and each implementation gets to choose from that list - in some cases,
    > it can even make a different choice each time a given piece of code is
    > executed. Some unspecified behavior is "implementation-defined" which
    > means that an implementation is required to document which choice it has
    > made, but there's also a lot of cases where there's no such requirement.
    >
    > > Surely there is some 'base' implementation of C that is used to test
    > > compilers ..

    >
    > No, there is not. Even if there were, the base implementation would have
    > to make specific choices in every case where the C standard leaves the
    > behavior unspecified or undefined, and other fully-conforming
    > implementations of C would not be required to make the same choices,
    > which greatly reduces the usefulness of having a base implementation.
    > That may be one reason why there isn't one.
    >
    > > ... or is it a free for all ...

    >
    > It's not a free-for-all - the standard does impose a great many specific
    > requirements. However, the things that it does not specify are what
    > gives implementors sufficient freedom to create a conforming
    > implementation of C on almost every platform. That is the reason why C
    > is one of the most widely implemented of all computer languages.
    >
    > > ... to me this implies that there can
    > > be more than one 'correct' implementation of the C language,

    >
    > Correct - the set of possible fully-conforming implementations of the C
    > language is infinite. The set of actual fully-conforming implementations
    > is much smaller, but still large enough that it's not feasible to test
    > any given program on all of them. It's also sufficiently varied that
    > testing on only a few dozen of them is insufficient to prove that your
    > code will work on all of the untested ones.


    <snip>

    As someone remarked, this business with "undefined behaviour" is true
    of pretty much all programming languages (I'm not convinced Godel has
    anything to contribute to this). To some extent C stresses it more,
    this is partly because C runs nearly everywhere and has huge numbers
    of implementations.

    Languages like Perl and Python have less trouble with this as there
    are actually very few implementations. Java side steps it by running
    on a virtual machine. In a sense java is utterly non-portable as it
    only runs on one platform (the JVM)! Java also nails down many things
    that C doesn't, such as order of evaluation of expressions and size of
    fundamental types. Some languages such as Ada had extensive test
    suites to validate compilers; but such things are very expensive to
    maintain.
    Nick Keighley, Oct 6, 2012
    #13
  14. Les Cargill

    Les Cargill Guest

    Nick Keighley wrote:
    > On Oct 4, 1:30 pm, James Kuyper <> wrote:
    >> On 10/04/2012 04:02 AM, lipska the kat wrote:
    >>> On 04/10/12 05:34, James Kuyper wrote:

    >
    >
    >>>> That's a bad assumption. One of the most common ways in which code with
    >>>> undefined behavior actually behaves is to produce exactly the same
    >>>> result that you incorrectly assume that it's required to produce. That's
    >>>> because your assumptions happen to match decisions made by the
    >>>> implementors of the version of C that you're testing with. Other
    >>>> implementors of C are free to make different decisions, ones that are
    >>>> incompatible with your incorrect assumptions.

    >>
    >>> Er ... wow, OK, that is a bit of a head****
    >>> Do you mean to say that even if I test my program to destruction and as
    >>> far as I can tell it's 'correct', that is it complies with requirements
    >>> and behaves as expected it could still be incorrect when compiled with
    >>> a different compiler ???

    >>
    >> Certainly. That's not just because of undefined behavior, either.
    >> There's also behavior that is merely unspecified: the standard provides
    >> (explicitly or, more commonly, implicitly) a list of possible behaviors,
    >> and each implementation gets to choose from that list - in some cases,
    >> it can even make a different choice each time a given piece of code is
    >> executed. Some unspecified behavior is "implementation-defined" which
    >> means that an implementation is required to document which choice it has
    >> made, but there's also a lot of cases where there's no such requirement.
    >>
    >>> Surely there is some 'base' implementation of C that is used to test
    >>> compilers ..

    >>
    >> No, there is not. Even if there were, the base implementation would have
    >> to make specific choices in every case where the C standard leaves the
    >> behavior unspecified or undefined, and other fully-conforming
    >> implementations of C would not be required to make the same choices,
    >> which greatly reduces the usefulness of having a base implementation.
    >> That may be one reason why there isn't one.
    >>
    >>> ... or is it a free for all ...

    >>
    >> It's not a free-for-all - the standard does impose a great many specific
    >> requirements. However, the things that it does not specify are what
    >> gives implementors sufficient freedom to create a conforming
    >> implementation of C on almost every platform. That is the reason why C
    >> is one of the most widely implemented of all computer languages.
    >>
    >>> ... to me this implies that there can
    >>> be more than one 'correct' implementation of the C language,

    >>
    >> Correct - the set of possible fully-conforming implementations of the C
    >> language is infinite. The set of actual fully-conforming implementations
    >> is much smaller, but still large enough that it's not feasible to test
    >> any given program on all of them. It's also sufficiently varied that
    >> testing on only a few dozen of them is insufficient to prove that your
    >> code will work on all of the untested ones.

    >
    > <snip>
    >
    > As someone remarked this business with "undefined behaviour" is true
    > of pretty much all programming languages (I'm not convinced Godel has
    > anything to contribute to this). To some extent C stresses it more,
    > this is partly because C runs nearly everywhere and has huge numbers
    > of implementations.
    >
    > Langauages like Perl and Python have less trouble with this as there
    > are actually very few implementations. Java side steps it by running
    > on a virtual machine.


    Perl and Python, being interpreted, also have a "virtual machine"
    each.

    > In a sense java is utterly non-portable as it
    > only runs on one platform (the JVM)! Java also nails down many things
    > that C doesn't such as order of expression of evaluation and size of
    > fundamental types. Some languages such as Ada had extensive test
    > suites to validate compilers; but such things are very expensive to
    > maintain.
    >


    --
    Les Cargill
    Les Cargill, Oct 6, 2012
    #14
  15. On 10/6/12 5:30 AM, Nick Keighley wrote:

    > As someone remarked this business with "undefined behaviour" is true
    > of pretty much all programming languages (I'm not convinced Godel has
    > anything to contribute to this). To some extent C stresses it more,
    > this is partly because C runs nearly everywhere and has huge numbers
    > of implementations.
    >
    > Langauages like Perl and Python have less trouble with this as there
    > are actually very few implementations. Java side steps it by running
    > on a virtual machine. In a sense java is utterly non-portable as it
    > only runs on one platform (the JVM)! Java also nails down many things
    > that C doesn't such as order of expression of evaluation and size of
    > fundamental types. Some languages such as Ada had extensive test
    > suites to validate compilers; but such things are very expensive to
    > maintain.
    >


    Undefined behavior is allowed in C to provide for (significantly)
    improved efficiency in some operations. For example, accessing an array
    beyond its bounds. If we removed pointers into arrays (and passing
    arrays with unspecified bounds), then the compiler could easily add code
    to check the subscripts to the array and trap on error conditions. If we
    want to support pointers into arrays, then these pointers could also be
    made "fatter" to include the bounds of the object they point to (and for
    multidimensional arrays, the bounds for each of the larger arrays the
    array is part of). This adds significant overhead to the pointer and the
    operations. Since the design goal of C was to favor creating efficient
    code, to make it a reasonable replacement for assembly code, the
    tradeoffs tend to be made in favor of efficiency over catching "bad"
    code. Many other languages have chosen to limit the realm of undefined
    behavior by defining what is supposed to happen, forcing the compiler
    to generate possibly less efficient (but more predictable) code.
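
    To make the cost concrete, a "fat" pointer for the one-dimensional case
    might be sketched like this (purely illustrative -- no C compiler
    actually gives you such a type):

    #include <stdio.h>
    #include <stdlib.h>

    /* The address plus the bounds of the array it points into.  A
     * bounds-checking compiler would create and maintain these behind
     * the scenes; here it is spelled out by hand. */
    struct fat_ptr {
        int *p;      /* current position */
        int *base;   /* first element of the underlying array */
        int *limit;  /* one past the last element */
    };

    static int checked_read(struct fat_ptr fp, size_t i) {
        /* compare an index against the remaining room, rather than
         * forming an out-of-range pointer */
        if (fp.p < fp.base || i >= (size_t)(fp.limit - fp.p)) {
            fprintf(stderr, "bounds violation\n");
            abort();   /* the "trap on error conditions" mentioned above */
        }
        return fp.p[i];
    }

    int main(void) {
        int a[5] = { 10, 20, 30, 40, 50 };
        struct fat_ptr fp = { a, a, a + 5 };

        printf("%d\n", checked_read(fp, 4)); /* fine: prints 50 */
        printf("%d\n", checked_read(fp, 5)); /* out of bounds: aborts */
        return 0;
    }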
    Richard Damon, Oct 7, 2012
    #15
  16. Ian Collins

    Ian Collins Guest

    On 10/07/12 14:19, Gordon Burditt wrote:
    > It is very easy to write a program in C that deliberately crashes


    Are you replying to someone or posting random musings?

    --
    Ian Collins
    Ian Collins, Oct 7, 2012
    #16
  17. BartC

    BartC Guest

    "Richard Damon" <> wrote in message
    news:k4qm0b$jr0$...
    > On 10/6/12 5:30 AM, Nick Keighley wrote:
    >
    >> As someone remarked this business with "undefined behaviour" is true
    >> of pretty much all programming languages (I'm not convinced Godel has
    >> anything to contribute to this). To some extent C stresses it more,
    >> this is partly because C runs nearly everywhere and has huge numbers
    >> of implementations.


    > If we removed pointers into arrays (and passing
    > arrays with unspecified bounds), then the compiler could easily add code
    > to check the subscripts to the array and trap on error conditions. If we
    > want to support pointers into arrays, then these pointers could also be
    > made "fatter" to include the bounds of the object they point to (and for
    > multidimensional arrays, the bounds for each of the larger arrays the
    > array is part of).


    Arrays can have any number of dimensions, so it would be highly impractical
    for each of a thousand possible pointers into an array to duplicate its
    half-dozen or dozen dimensions. You would likely also need different
    pointers for each of the sub-dimensions.

    And for an array whose dimensions are not realised until runtime, or for
    'ragged' arrays where the bounds vary through the array, how would
    such a pointer be initialised? Other languages would tend to build the
    bounds into the arrays themselves.

    In any case, C allows pointers into all sorts of objects, including
    non-arrays or a single element of that multi-dimensional array, and it lets
    you cast one type of pointer into another; you wouldn't then be able to step
    or do arithmetic on such a pointer without bypassing the bounds checking.

    So 'undefined behaviour', if it's as simple as having the wrong value in a
    pointer, is built-in to the language!

    (For single-dimensional arrays, a 'fat' pointer containing exactly one
    bound, could work, provided they are a new explicit type in addition to
    regular pointers. Then an array allocator could return such a pointer, which
    can be passed to functions and would carry its length for use by programs,
    and could optionally be used for bounds checking by internal code. But for
    multi-dimensions, it gets complicated...)

    > This add significant overhead to the pointer and the
    > operations.


    Not if the alternative is to have to always pass the length of the array
    together with a pointer to the array. Having bounds-checking code inserted
    would be an extra overhead, but that can be optional.

    --
    Bartc
    BartC, Oct 7, 2012
    #17
  18. James Kuyper

    James Kuyper Guest

    On 10/07/2012 06:40 AM, BartC wrote:
    >
    >
    > "Richard Damon" <> wrote in message
    > news:k4qm0b$jr0$...

    ....
    >> If we removed pointers into arrays (and passing
    >> arrays with unspecified bounds), then the compiler could easily add code
    >> to check the subscripts to the array and trap on error conditions. If we
    >> want to support pointers into arrays, then these pointers could also be
    >> made "fatter" to include the bounds of the object they point to (and for
    >> multidimensional arrays, the bounds for each of the larger arrays the
    >> array is part of).

    >
    > Arrays can have any numbers of dimensions, so would be highly impractical
    > for any of a thousand possible pointers into an array for each to duplicate
    > it's half-dozen or dozen dimensions. You would likely also need different
    > pointers for each of the sub-dimensions.


    None of that matters; only one range is needed at any given time - it
    can be modified whenever changing levels in the multidimensional array.
    Whenever an lvalue of array type gets converted to a pointer to its
    element type, that pointer can be given a range corresponding to the
    beginning and ending of the array. It doesn't matter whether the element
    type is itself an array type - that only comes into play when an lvalue
    of that element type is in turn converted to a pointer to its first
    element, at which point the same rule applies, giving the pointer a
    different range.

    > And for an array whose dimensions are not realised until runtime, or for
    > 'ragged' arrays where the bounds vary through the array, how would
    > such a pointer be initialised?


    In C, ragged arrays can only be implemented by allocating each row from
    a larger memory space. If the allocation is handled by malloc(), then
    the bounds can be inserted at the time malloc() is called. If the user
    code allocates one large array, and then fills in an array of pointers
    to irregularly-sized pieces of that array, there's no way for the C
    compiler to know what the bounds are; it will necessarily use only the
    bounds of the big array.
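
    For reference, the usual per-row-malloc() form of a ragged array looks
    like this (the sizes here are arbitrary, for illustration only):

    #include <stdlib.h>

    int main(void) {
        size_t rows = 3;
        size_t len[] = { 2, 5, 3 };   /* a different length for each row */

        int **ragged = malloc(rows * sizeof *ragged);
        if (ragged == NULL)
            return 1;
        for (size_t r = 0; r < rows; r++)
            ragged[r] = malloc(len[r] * sizeof **ragged);
            /* each row is its own allocation, so malloc() knows its bounds */

        /* ... use ragged[r][i], for i < len[r], after checking for NULL ... */

        for (size_t r = 0; r < rows; r++)
            free(ragged[r]);
        free(ragged);
        return 0;
    }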

    > Other languages would tend to build the
    > bounds into the arrays themselves.
    >
    > In any case, C allows pointers into all sorts of objects, including
    > non-arrays,


    That poses no problems - the C standard specifies that a pointer to a
    non-array object can be treated as a pointer to the first and only
    element of a 1-element array of the object's type.

    > ... or a single element of that multi-dimensional array,


    That poses no problem, either; the bounds for the pointer to the single
    element are the bounds for the array from which it was selected. If the
    programmer wants to restrict the permitted range more tightly than that,
    the C language currently provides no mechanism for doing so; though
    *(element_type (*)[n])element_pointer seems a plausible mechanism that
    could be used to tell the compiler to treat it as though it came from an
    n-element array (I do NOT claim that the current standard endorses any
    such use of this construct).

    This construct could also be used to tell the compiler what bounds to
    use when filling in a ragged array from a single large array.
    --
    James Kuyper
    James Kuyper, Oct 7, 2012
    #18
  19. On 05/10/2012 08:44, lipska the kat wrote:
    >
    > I understand it perfectly well, I just think if someone makes the effort
    > to reply to my question I should make the effort to respond.
    > One thing I learned from this and other posts is that C99 is probably a
    > better choice for me that earlier standards. I added the option you
    > suggested to my gcc commands along with -Wall ran make on all my current
    > code and waited for the explosion (of demons perhaps:) ... but nothing
    > really of note appeared, There was one warning but that was about it.
    > Most gratifying.
    >
    > Thanks for taking the time to reply
    >
    > lipska
    >


    When compiling with GCC you probably want to add the -Wextra and
    -pedantic flags as well to your compilation command.
    Chicken McNuggets, Oct 7, 2012
    #19
  20. James Kuyper

    James Kuyper Guest

    On 10/07/2012 01:22 PM, lipska the kat wrote:
    > On 07/10/12 02:19, Gordon Burditt wrote:
    >> It is very easy to write a program in C that deliberately crashes
    >> (here this means: calls abort()) under conditions which you

    >
    > [snip]
    >
    >> - Crashes only when calling asctime() and the year is greater
    >> than 9,999 (Y10K bug in the *definition* of asctime()).

    >
    > Well if I have any code running > 9999 then I'll consider it a bit of a


    Well, the issue is also relevant to code that computes future times. I
    admit, the need to determine calendar dates that far in the future is
    quite small - but it's not non-existent.

    The problem with asctime() is that it's the only C standard library
    function whose behavior is defined entirely by example code (7.27.3.1p2)
    showing how it could be implemented. asctime() provides a prime example
    of why that's a bad idea. It can be deduced from that example code that
    asctime() has undefined behavior if:

    timeptr->tm_wday < 0 || timeptr->tm_wday > 6 ||
    timeptr->tm_mon < 0 || timeptr->tm_mon > 11 ||
    timeptr->tm_year < -2899 || timeptr->tm_year > 8099

    The limits on tm_wday and tm_mon are due to their use as array indices;
    the limit on tm_year is imposed by the fact that the call to sprintf()
    will overflow the provided buffer. Even assuming that the date being
    represented is between year 1000 and year 9999, you'll still get a
    buffer overflow if

    timeptr->tm_mday < -9 || timeptr->tm_mday > 99 ||
    timeptr->tm_hour < -9 || timeptr->tm_hour > 99 ||
    timeptr->tm_sec < -9 || timeptr->tm_sec > 99

    However, until C2011, it was nowhere explicitly stated that this is the
    case. In C2011, 7.27.3.1p3 was added, which says that the behavior is
    undefined if (in effect) timeptr->tm_year < -900 || timeptr->tm_year >
    8099, or any of the other fields are outside their normal range, as
    defined in 7.27.1p4; this is more restrictive than the constraints I
    deduced above.

    asctime() doesn't have to be unsafe - the example code is only an
    example. Undefined behavior allows, as one possibility, that asctime()
    is implemented more safely than in the example code. It could return a
    null pointer when tm_wday or tm_mon are out of range, or it could choose
    a special month/day name (such as "INV"). It could also return a null
    pointer instead of producing a buffer overflow, or it could use a
    buffer large enough to avoid any possibility of overflow.
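
    For code that just needs a bounded, printable timestamp, strftime() is
    the standard library's safer alternative; a minimal sketch (the format
    string here merely imitates asctime()'s layout):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        time_t now = time(NULL);
        struct tm *tmp = localtime(&now);
        char buf[64];

        /* strftime() is told the size of the buffer, so unlike the
         * example implementation of asctime() it cannot overflow it;
         * it returns 0 if the converted string would not fit. */
        if (tmp != NULL &&
            strftime(buf, sizeof buf, "%a %b %e %H:%M:%S %Y", tmp) > 0)
            printf("%s\n", buf);
        return 0;
    }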
    --
    James Kuyper
    James Kuyper, Oct 7, 2012
    #20
