Black magic, or insanity?

Discussion in 'C Programming' started by Robbie Brown, Jan 21, 2014.

  1. Yes. Another interesting, um, feature of C is that the syntax is what I
    think of as "dense". What that means is that a single-character typo in
    an otherwise correct C program can easily produce something that's
    perfectly correct as far as the compiler is concerned, but has
    completely different behavior.
    A good idiom for malloc that mostly avoids type mismatches is:

    int **arpi = malloc(5 * sizeof *arpi);

    Casting the result of malloc is unnecessary and can mask errors in some
    cases. Applying sizeof to *arpi (more generally, to what the LHS points
    to) ensures that you have the correct size and type without having to
    repeat the type name.
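To illustrate the idiom on something other than int **, here is a minimal sketch. The names struct point and make_points are made up for this example; nothing in the thread uses them.

```c
#include <stdlib.h>

/* Hypothetical type, used only to show the idiom on a non-trivial element. */
struct point {
    double x;
    double y;
};

/* Allocate `count` points and zero their fields.
   sizeof *pts follows the declared type of pts, so if that declaration
   changes later, the malloc call stays correct without being edited. */
struct point *make_points(size_t count)
{
    struct point *pts = malloc(count * sizeof *pts);  /* no cast needed in C */
    if (pts == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        pts[i].x = 0.0;
        pts[i].y = 0.0;
    }
    return pts;
}
```

Note that the sketch does not guard against count * sizeof *pts overflowing; real code that takes count from outside should check that first.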
    Probably better written as:

    arpi[4] = malloc(sizeof *(arpi[4]));
    *(arpi[4]) = 14;
    malloc returns a pointer to uninitialized memory. The contents might
    happen to be all bits zero, but that's not guaranteed, and you shouldn't
    rely on it. And the null pointer is very commonly represented as
    all-bits-zero, but that's not guaranteed either.
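A rough sketch of what that means in practice: if you want a table of null pointers, assign NULL to each slot yourself rather than relying on what malloc happens to hand back (or on all-bits-zero being a null pointer). The function name alloc_null_table is invented for illustration.

```c
#include <stdlib.h>

/* Allocate `count` pointers-to-int and set each one to a null pointer.
   Assigning NULL is portable; zero-filling the bytes is not guaranteed
   to produce null pointers on every implementation. */
int **alloc_null_table(size_t count)
{
    int **table = malloc(count * sizeof *table);
    if (table == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++)
        table[i] = NULL;

    return table;
}
```

The caller releases the table with free(). For count == 0 the result may be NULL or not, for the malloc(0) reasons discussed in this thread.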
    Yes, that's the kind of type mismatch that can be avoided by the idiom I
    suggested above.
    I'm not sure why you'd expect it not to compile. malloc is a library
    function, not a built-in language feature. It takes an integer argument
    (specifically an argument of the unsigned integer type size_t), and
    you've called it with an integer value. Even if 0 were not a valid
    argument value, it's of the right type (or rather, is implicitly
    convertible to the right type), so there's nothing for the compiler to
    complain about. The run time behavior may be another matter; as James
    Kuyper already explained, the behavior of malloc(0) is
    implementation-defined.
    It's likely that malloc(0) allocated some small amount of memory from
    the heap (it could have returned a null pointer, but then your program
    probably would have crashed). The actual amount of memory allocated for
    malloc(N) is likely to be a bit bigger than N, but you can only safely
    access the first N bytes (and only if malloc(N) actually succeeded).
    But if you try to access memory beyond those first N bytes, you're
    *probably* still accessing memory within your program's memory space.
    The behavior is undefined, but that doesn't mean it's going to crash;
    if you're *unlucky*, it will appear to "work".
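Since the language won't stop you from stepping past the first N bytes, one option is to carry the length around with the pointer and check it yourself. The following is only a sketch; struct buffer and the buffer_* functions are hypothetical names, not part of any library.

```c
#include <stdbool.h>
#include <stdlib.h>

/* A pointer paired with the number of bytes we are allowed to touch. */
struct buffer {
    unsigned char *data;
    size_t len;
};

/* Returns true on success. A zero-length request counts as success
   whether malloc(0) returned a null pointer or not. */
bool buffer_init(struct buffer *b, size_t len)
{
    b->data = malloc(len);
    b->len = (b->data != NULL) ? len : 0;
    return b->data != NULL || len == 0;
}

/* Store `value` at index i only if i is inside the requested size. */
bool buffer_set(struct buffer *b, size_t i, unsigned char value)
{
    if (i >= b->len)
        return false;   /* would be undefined behavior; refuse instead */
    b->data[i] = value;
    return true;
}

void buffer_free(struct buffer *b)
{
    free(b->data);      /* free(NULL) is a no-op */
    b->data = NULL;
    b->len = 0;
}
```

With this, a buffer created with length 0 rejects every write, which matches the rule that a malloc(0) result must not be used to access an object.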
     
    Keith Thompson, Jan 22, 2014
    #21

  2. Robbie Brown

    Kaz Kylheku Guest

    The actual logic is "return either null, or return a unique pointer".
    The issue is not the number of bytes, but rather the important expectation that
    malloc doesn't return the same pointer two or more times (when nothing is freed
    in between), unless perhaps it is the null pointer.

    Since pointers are basically addresses, the requirement for returning unique
    pointers requires a non-zero amount of allocation.

    Note that the blocks returned by malloc are often larger than what is
    requested, though there isn't any portable way to find out how much larger.
    This is done for the sake of alignment of the meta-data structures that
    lie between the allocated blocks.

    If there is a free-space block after the block you've just allocated, a common
    strategy is to put a header structure into that free space, which places it
    into a list of other such free space blocks. On many architectures, such a
    structure has to be properly aligned since it contains word-sized quantities
    like pointers.

    Also, some malloc implementations simply have "buckets" of fixed-size
    blocks. For instance there might be a bucket for, say, 32 byte objects,
    one for 48 byte ones, then 64, 96, 128, ...

    If you allocate a 49 byte object, you may actually get 64 bytes; you just don't
    know.
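The bucket lookup might look roughly like the sketch below. The size classes and the name bucket_size_for are assumptions chosen for illustration; real allocators pick their own classes and you cannot query them portably.

```c
#include <stddef.h>

/* Illustrative size classes only; not taken from any real allocator. */
static const size_t size_classes[] = { 32, 48, 64, 96, 128 };

/* Return the smallest size class that can hold `request` bytes,
   or 0 if the request is larger than every class in the table. */
size_t bucket_size_for(size_t request)
{
    size_t count = sizeof size_classes / sizeof size_classes[0];

    for (size_t i = 0; i < count; i++) {
        if (request <= size_classes[i])
            return size_classes[i];
    }
    return 0;
}
```

With these classes a 49 byte request lands in the 64 byte bucket, and a 0 byte request lands in the smallest bucket, so it still consumes a real block.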

    It is not reasonable to get a 16 byte object when you asked for zero.
    Yes, and so did some traditional C library implementors. So when it came time
    to standardize the language, it was found that some libraries produced null,
    whereas others returned something new.

    This was simply captured in the standard: that programs being ported
    among implementations could expect either behavior.
     
    Kaz Kylheku, Jan 22, 2014
    #22

  3. Robbie Brown

    Joe Pfeiffer Guest

    Years and years ago I came across ways to shoot yourself in the foot in
    various programming languages (in assembly code, you started by building
    a gun. In Pascal, you changed your mind and shot yourself in the head
    when you realized you couldn't actually accomplish anything useful in
    the language. And so forth.). For C, it simply stated "you shoot
    yourself in the foot".

    For me, that's always been simultaneously C's strongest and weakest
    point: it will let you do what you say you want to do without arguing
    with you about it.
     
    Joe Pfeiffer, Jan 22, 2014
    #23
  4. Robbie Brown

    Ken Brody Guest

    I assume there is a missing "if"? ("... if malloc(0) returns ...")
    Consider the fact that, for non-zero lengths, a return of NULL means
    failure. If malloc(0) returns NULL, did it really fail? (Valid arguments
    can be made for both sides.)

    I'm sure that, at the time the Standard was written, there were
    implementations on both sides of the argument, and there was no compelling
    reason to require one over the other. If there was any change to existing
    implementations, it would have been to add the requirement that non-NULL
    returns from malloc(0) must be different than any previous non-free()ed
    return from malloc(), just as would be the case of non-zero malloc()s.

    In short, you can think of "malloc(len)" where len==0 to be no different
    than any other malloc(len) call -- if it succeeds, it returns a buffer of
    the requested length.
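If you take the view that a null return from malloc(0) is not a failure, the check can be written down in one place. This is a sketch of that reading only; alloc_failed is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Treat a null result as failure only when a non-zero size was requested.
   For len == 0 both a null and a non-null result count as success. */
bool alloc_failed(const void *p, size_t len)
{
    return p == NULL && len != 0;
}
```

Typical use would be p = malloc(len); followed by if (alloc_failed(p, len)) and the error path. Either way, the result can be passed to free().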
     
    Ken Brody, Jan 22, 2014
    #24
  5. Robbie Brown

    James Kuyper Guest

    "... of at least the requested length.". malloc(n) is always permitted
    to allocate more than n bytes. In the case of malloc(0), a non-null
    return value is not only allowed to point at a larger allocation, it is
    required to do so.
     
    James Kuyper, Jan 22, 2014
    #25
  6. But even if malloc(0) returns a non-null value, it's not necessarily
    *quite* the same as a value returned by malloc() with some non-zero
    argument:

    If the size of the space requested is zero, the behavior is
    implementation-defined: either a null pointer is returned, or the
    behavior is as if the size were some nonzero value, except that the
    returned pointer shall not be used to access an object.

    So this:

    char *p1 = malloc(1);
    if (p1 != NULL) *p1 = 'x';

    is well behaved, but this:

    char *p0 = malloc(0);
    if (p0 != NULL) *p0 = 'x';

    has undefined behavior.

    A reasonable implementation would probably either return NULL for
    malloc(0), or treat malloc(0) as equivalent to malloc(1), but other
    behaviors are permitted.
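If you'd rather not depend on which choice the implementation made, a thin wrapper can pin the behavior down. A minimal sketch, assuming you are happy to spend one byte on zero-sized requests; malloc_nonzero is a made-up name.

```c
#include <stdlib.h>

/* Never pass 0 to malloc: a zero-sized request is bumped to 1 byte.
   A null return then always means the allocation failed, and a
   non-null return always points to at least one usable byte. */
void *malloc_nonzero(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```

The pointer is released with free() as usual.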
     
    Keith Thompson, Jan 22, 2014
    #26
  7. Robbie Brown

    James Kuyper Guest

    Yes, it's permitted to behave like malloc(n) where n is an arbitrary
    positive number which could even, in principle, differ between one call
    to malloc(0) and another. But every permitted variation for malloc(0)
    that involves returning a non-null pointer is correctly described by the
    phrase "allocates more than 0 bytes". The as-if rule provides a limited
    amount of protection - the memory need not actually be allocated, since
    the pointer cannot be safely used to access that memory. However, the
    address returned must not point to memory allocated for any other
    purpose that is visible from the user code, which is almost the same thing.
     
    James Kuyper, Jan 22, 2014
    #27
  8. Robbie Brown

    Eric Sosman Guest

    True, but that's just a special case of

    size_t n = ...;
    char *pn = malloc(n);
    if (pn != NULL) pn[n] = 'x';

    ... having undefined behavior.
     
    Eric Sosman, Jan 22, 2014
    #28
  9. Robbie Brown

    Paul N Guest

    C is derived from BCPL, of which a book co-written by the author of the
    language (Martin Richards) says "The philosophy of BCPL is not one of the
    tyrant who thinks he knows best and lays down the law on what is and what
    is not allowed; rather, BCPL acts more as a servant offering his services
    to the best of his ability without complaint, even when confronted with
    apparent nonsense. The programmer is always assumed to know what he is
    doing and is not hemmed in by petty restrictions."
     
    Paul N, Jan 22, 2014
    #29
  10. Robbie Brown

    Kaz Kylheku Guest

    BCPL is completely "typeless"; everything is a word. If you use a word as
    a pointer, then it's a pointer. If you use it as a number, it's a number.

    C has a comparatively "rich" type system, and its declarations and type
    checking are the tyranny the above alludes to.
     
    Kaz Kylheku, Jan 22, 2014
    #30
  11. Robbie Brown

    James Kuyper Guest

    The standard requires that "If the size of the space requested is zero,
    the behavior is implementation-defined: either a null pointer is
    returned, or the behavior is as if the size were some nonzero value,
    ...". This means that, since returning a non-null pointer to a block of
    memory 0 bytes long is not permissible behavior for malloc(n) when n is
    non-zero, it is therefore not permissible behavior for malloc(0).
    However, since you can't access the memory allocated, the as-if rule
    probably covers that.

    Some mallocs use other methods of memory management, such as rounding
    all allocations up to the next power of 2, and reserving distinct blocks
    of memory for each power of two. They can then figure out the size of
    each allocation by determining which block it was allocated from, and
    therefore don't need to store the allocation's size in a header. Such an
    implementation cannot allocate 0 bytes when malloc(0) is called, because
    then it could return the same pointer for multiple calls to malloc(0).
    That would not be covered by the as-if rule: different allocations of
    non-zero amounts of memory cannot have the same starting address,
    therefore different calls to malloc(0) are also not allowed to return
    equivalent pointer values.
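The rounding step in such a scheme could be sketched as follows. The name round_up_pow2 is invented here, and mapping a request of 0 to 1 is one way to keep zero-sized allocations distinct, not something the standard spells out.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two.
   A request of 0 is rounded up to 1, so even malloc(0) would consume a
   distinct block. Returns 0 if the result would not fit in size_t. */
size_t round_up_pow2(size_t n)
{
    size_t p = 1;

    while (p < n) {
        if (p > SIZE_MAX / 2)
            return 0;       /* doubling would overflow */
        p *= 2;
    }
    return p;
}
```

Because every result is at least 1, two live allocations from the same pool can never share a starting address, which is the property the argument above depends on.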
     
    James Kuyper, Jan 23, 2014
    #31
  12. I think the "..." in your quotation hides something critical.

    The full sentence is:

    If the size of the space requested is zero, the behavior is
    implementation-defined: either a null pointer is returned, or
    the behavior is as if the size were some nonzero value, except
    that the returned pointer shall not be used to access an object.

    So the behavior of malloc(0), if it returns a non-null pointer, is *not*
    necessarily the same as malloc(n) for some positive n. It can be, but
    it can behave differently.

    For example, the implementation could maintain a pool of addresses that
    point outside the actual memory space, and dole them out only for
    malloc(0) calls. As long as they're non-null, unique, and comparable
    for equality to other addresses, the implementation is still conforming
    (which would not be the case if the "except that" clause weren't there).

    Certainly malloc(0) *can* behave exactly like malloc(1), but it doesn't
    have to.
     
    Keith Thompson, Jan 23, 2014
    #32
  13. Robbie Brown

    James Kuyper Guest

    That's certainly acceptable, so long as they are doled out with a
    spacing of at least 1 byte; which is essentially an allocation of 1
    byte, even if the byte itself is never used. However, they can't be
    doled out with a spacing of 0 bytes, which is the possibility I was
    concerned about.
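A toy version of such a pool, to make the "spacing of at least 1 byte" point concrete. Portable C has no way to produce addresses outside the program's memory, so this sketch substitutes a static char array that is never read or written; the capacity, the names, and the lack of any way to give tokens back are all simplifying assumptions.

```c
#include <stddef.h>

#define ZERO_POOL_SLOTS 64   /* arbitrary capacity for this sketch */

/* Stand-in for a reserved address range. The bytes are never accessed;
   only their addresses are handed out. */
static char zero_pool[ZERO_POOL_SLOTS];
static size_t zero_pool_next = 0;

/* Return a distinct non-null address on each call, spaced 1 byte apart.
   Once the pool is exhausted, return a null pointer instead. */
void *zero_size_token(void)
{
    if (zero_pool_next >= ZERO_POOL_SLOTS)
        return NULL;
    return &zero_pool[zero_pool_next++];
}
```

Each token costs one byte of address space even though no byte is ever used, which is why a spacing of 0 cannot work: two calls would return equal pointers.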
     
    James Kuyper, Jan 23, 2014
    #33
