Black magic, or insanity?

Discussion in 'C Programming' started by Robbie Brown, Jan 21, 2014.

  1. Robbie Brown

    Robbie Brown Guest

    I've been reviewing what I've learned about pointers.

    I thought I'd do a few tests just to consolidate what I thought I'd
    learned and frankly .. I'm dumfounded.

    int main(int argc, char *argv[]){

    //declare a pointer to int
    int *ip;

    //print ... what exactly, prints 'nil'
    printf("ip is %p\n", ip);
    //dereference the pointer, seg fault
    printf("*ip is %d\n", *ip);

    }

    the output is what I expected

    ip is (nil)
    Segmentation fault (core dumped)

    I then add the following statement after the last printf

    int *ip2;

    compile and exec and get the same output

    ip is (nil)
    Segmentation fault (core dumped)

    Now then, the next bit is a total head****

    If I modify the last statement so that it reads

    int *ip2 = NULL;

    so the code is now

    int main(int argc, char *argv[]){

    //declare a pointer to int
    int *ip;

    //print ... what exactly, prints 'nil'
    printf("ip is %p\n", ip);
    //dereference the pointer, seg fault
    printf("*ip is %d\n", *ip);

    int *ip2 = NULL;

    }

    then compile and exec I get the following

    ip is 0x7fff0dfeb230
    *ip is 1

    WTF!!! ... how does initializing ip2 to NULL cause the
    previous code to now display ... something.

    Is this for real?
    I mean seriously, this is just ... what

    I have no idea

    Dazed and confused.
     
    Robbie Brown, Jan 21, 2014
    #1

  2. Robbie Brown

    Zoltan Kocsi Guest

    Your expectation is completely wrong. The fact that ip is nil is due to
    luck. You do not initialise it. Automatic variables (i.e. ones defined
    inside a function without the 'static' keyword) are *not* initialised
    by the compiler. Whatever junk is on the stack, that's the initial
    value. If your compiler does any optimisation, then it's not even
    the stack. Most likely ip was allocated in a register, which the start
    code (which executes before your main() enters) happened to set to 0.
    Chances are, ip was now allocated in a different register, due to the
    need of allocating space for ip2. The new register contained a valid
    address.

    Since you have not initialised the pointers and they were not in the
    BSS, you could expect nothing, absolutely nothing about their values.

    Any decent compiler should have given you a warning about the
    uninitialised nature of ip. Also note that even zeroing the BSS is a
    hosted environment thing, many embedded systems do not initialise the
    memory before starting main() at all.

    Zoltan
     
    Zoltan Kocsi, Jan 21, 2014
    #2

  3. You just need to re-adjust your expectations. All of your examples have
    what C calls undefined behaviour. The language standard does not say
    what should happen, so compilers can do pretty much what they like.
    Having any expectation at all is going to lead to puzzlement.

    If, on the other hand, you want to know what is actually going on, then
    just look at the generated code, but keep in mind that this will tell
    you about one version of one compiler with one set of command-line flags
    on one system at some particular time. You probably won't learn much of
    use.

    <snip>
     
    Ben Bacarisse, Jan 21, 2014
    #3
  4. Robbie Brown

    Robbie Brown Guest

    Hmm, I'm using gcc version 4.6.3 ... is this a 'decent compiler'

    gcc -std=gnu99 -Wall pointers.c -g -o pointers
    gives no warnings about uninitialised anything.

    I hear what you are saying though and have taken it on board.

    Thanks for your time
     
    Robbie Brown, Jan 21, 2014
    #4
  5. Robbie Brown

    Robbie Brown Guest

    I'm discovering this, fascinating stuff.

    Thanks
     
    Robbie Brown, Jan 21, 2014
    #5
  6. Robbie Brown

    Eric Sosman Guest

    Strange. Even a much older (4.4.1) gcc gives me

    foo.c: In function 'main':
    foo.c:7: warning: implicit declaration of function 'printf'
    foo.c:7: warning: incompatible implicit declaration of built-in
    function 'printf'
    foo.c:7: warning: 'ip' is used uninitialized in this function

    A truly ancient (3.4.4) version emits only the `printf' warning,
    but if invoked with optimization at -O1 or higher it also squawks
    "warning: 'ip' might be used uninitialized in this function" (note
    "might be" rather than "is"; this could be a different warning).

    Wild guess: The detection of uninitialized uses depends on data
    developed while optimizing, and the default optimization level when
    no -Ox is specified varies from one gcc version to another. Try
    adding -O1 or -O2 (or even -O3) to your command line, to see if
    the compiler will offer more commentary.
     
    Eric Sosman, Jan 21, 2014
    #6
  7. Robbie Brown

    Kaz Kylheku Guest

    Since this is a non-static local variable that is uninitialized, it contains
    data which is traditionally called "garbage" in programmer lingo.

    In C standard formal terms, its value is "indeterminate": which means that
    it is an unspecified value which may be a trap representation.

    By dumb luck, this indeterminate value could look like a valid pointer,
    and dereference successfully.

    The indeterminate garbage inside ip could be different upon different
    executions of the program, and could be influenced by changes to seemingly
    irrelevant parts of the program.
    This is undefined behavior already: you're accessing the value of the
    indeterminately-valued object ip.
    We have no basis for expecting a "seg fault" here. The behavior here is
    also undefined for the same reason. Undefined means not defined by the ISO
    standard document which describes the C language. (If there were a requirement
    to produce a segmentation fault, that would be a definition of behavior; it
    would not be "undefined".)

    In the case of some undefined behaviors, we do have a basis for expecting
    some particular behavior on a particular platform. That happens when the
    language implementors give us a definition, or else we can otherwise deduce
    the behavior from the structure of the platform, or from knowing something
    about the compiler behavior, etc.
     
    Kaz Kylheku, Jan 21, 2014
    #7
  8. [...]

    This is not directly relevant to your question, but the "%p" printf
    format expects an argument of type void*. You're giving it an argument
    of type int*, which strictly speaking causes undefined behavior.

    It's very very likely to work correctly on any system where void* and
    int* have the same representation (which is the vast majority of
    existing systems), but for maximum portability you should cast the
    pointer value to void*:

    printf("ip is %p\n", (void*)ip);

    This is one of the few cases where casting, particularly pointer
    casting, is a good habit.
     
    Keith Thompson, Jan 21, 2014
    #8
  9. check out my mail signature. it will also answer your question.

    --
    Helmut K. C. Tessarek

    /*
    Thou shalt not follow the NULL pointer for chaos and madness
    await thee at its end.
    */
     
    Helmut Tessarek, Jan 21, 2014
    #9
  10. Robbie Brown

    Robbie Brown Guest

    Heh, that's about right.
     
    Robbie Brown, Jan 21, 2014
    #10
  11. Good advice, but not actually relevant in this case.

    The OP *expected* a segmentation fault on dereferencing a null pointer.
    The problem was that the pointer object in question was uninitialized,
    and therefore might or might not contain a null pointer value.
     
    Keith Thompson, Jan 21, 2014
    #11
  12. A lot of people already gave extensive explanations and I think the main point
    is that anything can and will happen.

    So I think 'chaos and madness' is quite relevant, if you mess with null
    pointers (or pointers that are potentially null pointers). ;-)
    Yep, for me a(n) (uninitialized) pointer that is a potential null pointer
    still falls in the category not to mess with.

    Cheerio!

    --
    Helmut K. C. Tessarek

    /*
    Thou shalt not follow the NULL pointer for chaos and madness
    await thee at its end.
    */
     
    Helmut Tessarek, Jan 21, 2014
    #12
  13. Robbie Brown

    Robbie Brown Guest

    Yes, I'm starting to get the impression that, unlike other languages I
    have used, C (or rather the C compiler perhaps) doesn't stop you from
    doing all manner of exceptionally stupid things.

    For example, for no other reason than experimentation I tried to get my
    head around pointers to pointers and came up with the following.
    Trying hard not to make assumptions, just observations.

    [Linux 3.2.0-23-generic x86_64 GNU/Linux]

    int **arpi = (int**) malloc(sizeof(int*) * 5);
    *(arpi + 4) = malloc(sizeof(int));
    *(*(arpi + 4)) = 14;

    If I run this through gdb I can see what I expected to see (there's that
    word again, what other word can I use?).

    arpi is a pointer to the first of 5 64 bit addresses.
    the first 4 addresses contain 0x0000000000000000 I hope I understand
    that these are uninitialized addresses ... or maybe they have been
    initialized to 0 by some voodoo priest :) anyway
    the fifth address contains the 64 bit address 0x0000000000602010
    this seems reasonable as I malloc'd enough space for a pointer to int.
    if I inspect the contents of 0x602010 I see 0x0e which is (I hope) what
    I was expecting

    Then it got all strange again

    I changed the first line to
    int **arpi = (int**) malloc(sizeof(int) * 5);

    now I malloc int instead of int*
    Compile, run, inspect, same old results
    I think this works because an int is probably 64 bits same as an address
    (gross assumption)

    Then it gets weirder
    int **arpi = (int**) malloc(0);
    Now realistically what should I 'expect' to happen

    I sort of expected it not to compile ... wrong, it compiled
    I sort of expected it to blow up ... wrong, ran and exited normally
    I even found 0x0e lurking about almost where I hoped it would be.

    gdb exposed the memory and it was obviously not right but it still ran.

    This *is* fun isn't it?

    Ah well, onwards and upwards.
     
    Robbie Brown, Jan 22, 2014
    #13
  14. Robbie Brown

    James Kuyper Guest

    On 01/22/2014 07:06 AM, Robbie Brown wrote:
    ....
    The C standard requires that malloc(0) return either
    a) a null pointer
    b) a pointer suitably aligned for any type, but which points at memory
    that cannot be safely written to.
     
    James Kuyper, Jan 22, 2014
    #14
  15. All you really need to understand is that C allows you to write to "raw"
    addresses. Often the bits in the pointer are the actual bits which go on the
    address bus to fetch data to and from RAM. Other times there's a very low-level
    layer of indirection which prevents programs from corrupting each other and,
    possibly, damaging hardware.
    Now if you write to a random address, it's very hard to say what will happen.
    You might hit another variable, you might destroy your call stack, you might
    send a byte to a memory-mapped port or put up a pixel on a memory-mapped
    screen. The system might detect that what you are doing is illegal and issue
    a segfault (this is the best, most desirable result from the point of view
    of someone trying to write a useful program). You might even hit the pointer
    itself.

    That's all there really is to it. Some systems also put in protections against
    reading from random addresses.
     
    Malcolm McLean, Jan 22, 2014
    #15
  16. Robbie Brown

    James Kuyper Guest

    I should have mentioned that malloc(0) returns any non-null pointer
    value, that value must be the result of malloc() having behaved exactly
    the same as if it had been asked to allocate some non-zero amount of
    memory. This implies that each non-null value returned by malloc(0) will
    be unique, in the sense that it will not compare equal to any other valid
    pointer to an object.
     
    James Kuyper, Jan 22, 2014
    #16
  17. Robbie Brown

    Robbie Brown Guest

    Now to me, that just seems perverse. By what strange incantation of
    inverse logic was the decision made to use a request for 0 bytes of
    memory as meaning 'give me anything but 0 bytes'.

    I would have thought NULL was the perfect value to return in this case.
    I suppose there is a good reason for it but I can't for the life of me
    think what it could be. It's almost as if it were *designed* to confuse
    and befuddle the unwary neophyte ........ no, surely not?
     
    Robbie Brown, Jan 22, 2014
    #17
  18. Do you ask for a bag of no beans or no bag of beans?
    Some took the former view, some the latter. It's a difficult problem how to
    handle the empty case, you tend to want programs that treat it as part of
    normal control flow, because that's likely to be more robust and correct.
    But often treating it specially is more efficient and easier to think through.
     
    Malcolm McLean, Jan 22, 2014
    #18
  19. Both usages were already extant by the time standardization came around,
    so we're stuck with them. The logic by which the not-returning-null
    approach came about was the idea that a valid return value should not be
    the same as an error return. I don't see that as completely silly.
     
    Lowell Gilbert, Jan 22, 2014
    #19
  20. Robbie Brown

    James Kuyper Guest

    Missing word in my previous message: it should have read "that if
    malloc(0) returns any non-null pointer value".
    For some purposes, it's convenient to create objects of varying sizes,
    without having to do special case handling for objects with a size of 0.
    It's sometimes important that each such object be distinguishable.
    Objects allocated by using malloc(0), if it returns a non-null value,
    are distinguishable by their addresses. The cost of making that possible
    is that those addresses cannot be used for any other purpose, which is
    pretty much the same effect as if those addresses had been used to store
    something. Portable code cannot rely upon this behavior, but unportable
    code exists that relies upon the fact that malloc(0) has this behavior
    on a particular implementation of C.
    No, the standard was designed to accommodate the wide variety of
    existing implementations of C. This often results in confusion and
    befuddlement, but that wasn't the purpose. There are arguments for
    either way of implementing malloc(0), but I don't think anyone would
    have chosen to allow both if they'd been free to ignore existing
    implementations.
     
    James Kuyper, Jan 22, 2014
    #20