overwriting memory

Discussion in 'C Programming' started by Robbie Brown, Jan 26, 2014.

  1. Robbie Brown

    Robbie Brown Guest

    I'm trying to understand the issues surrounding overwriting memory.
    To this end I have the following (truncated) gdb session.

    The main question is at the end and probably appears naive in the
    extreme. I'm just checking my understanding.

    First a deliberate mistake
    I apparently declare an array of pointers to int
    but only allocate enough space for int

    int **pr4 = malloc(sizeof(int) * 5);

    gdb print pr4
    (int **) 0x602040

    gdb x/1xg 0x602040
    0x602040: 0x0000000000000000
    0x602048: 0x0000000000000000
    0x602050: 0x00000000 <- last int
    54: 00000000 <- padding?
    0x602058: 0x0000000000020fb1

    I have actually allocated enough space to store
    5*4 byte integers, I think the last 4 bytes at 54 is (64 bit)word align.

    ***
    Is this correct?
    ***

    I then declare and init an int and assign its address
    to the 3rd slot of the array. This effectively overwrites the padding bytes.


    int i2 = 14;
    pr4[2] = &i2; //address should overwrite the padding

    gdb print pr4
    (int **) 0x602040

    gdb x/1xg 0x602040
    0x602040: 0x0000000000000000
    0x602048: 0x0000000000000000
    0x602050: 0x00007fffffffe5d8 <- overwrites unallocated 4 bytes
    0x602058: 0x0000000000020fb1

    I can view the value and print it out

    gdb x/1xw 0x00007fffffffe5d8
    0x7fffffffe5d8: 0x0000000e <- pr4[2] i2 (14)

    and I can keep going

    ....

    int i4 = 16;
    pr4[4] = &i4;

    gdb print pr4
    (int **) 0x602040

    gdb x/1xg 0x602040
    0x602040: 0x0000000000000000
    0x602048: 0x0000000000000000
    0x602050: 0x00007fffffffe5d0 <- alloc'd mem stops at 602054
    0x602058: 0x00007fffffffe5d4
    0x602060: 0x00007fffffffe5d8

    gdb x/1xw 0x00007fffffffe5d0
    0x7fffffffe5d0: 0x0000000e <- pr4[2] i2 (14)
    0x7fffffffe5d4: 0x0000000d <- pr4[3] i3 (13)
    0x7fffffffe5d8: 0x00000010 <- pr4[4] i4 (16)

    I have now allocated 20 bytes more than I declared for.
    I can access this memory, dereference the pointer and print
    out the stored value

    printf("%d\n", *pr4[4]);

    If you have got this far, kudos :)

    The question is this

    It appears that I can go well beyond the allocated space and still
    access the memory without problem, it doesn't appear to be an issue

    What *does* appear to be the issue however is that the additional memory
    I have 'stolen' may be in use by another part of the program. I have
    overwritten this memory despite not asking for it and that may cause
    problems elsewhere. *This* appears to be the issue.

    ***
    Is this correct
    ***

    Thank you for your indulgence.
     
    Robbie Brown, Jan 26, 2014
    #1

  2. That depends on what something being the issue means. Indexing beyond
    the bounds of an array (or, as here, allocated storage) is undefined
    behaviour (and that applies to reads as much as to writes) and avoiding
    undefined constructs is a big issue for me.

    Something particularly bad might have happened only when you stamped on
    some particular location, but that's more an accident than anything
    else. It's not what I'd call the issue.

    By the way, you can reduce allocation size mistakes by using the
    pattern:

    var = malloc(sizeof *var * number_of_elements);
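A minimal sketch of that pattern applied to the array of pointers from the first post. The helper name alloc_int_ptr_slots is made up for illustration; the point is only that sizeof *slots is the size of one element (an int *), so the request no longer depends on remembering the element type by hand.

```c
#include <stdlib.h>   /* malloc, size_t */

/* Hypothetical helper: allocate room for `count` pointers to int.
   sizeof *slots is the size of one int *, whatever that is on the
   platform, so the element type is never spelled out by hand. */
int **alloc_int_ptr_slots(size_t count)
{
    int **slots = malloc(sizeof *slots * count);
    return slots;   /* NULL if the allocation failed */
}
```

With this, storing an address in slot 4 of a 5-slot block stays inside the allocation. The caller still has to check for NULL and free the block.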
     
    Ben Bacarisse, Jan 26, 2014
    #2

  3. Robbie Brown

    Robbie Brown Guest

    The *issue* is 'overwriting memory is bad' and 'it's bad because you
    might overwrite something that another part of the program needs'

    I'm trying to understand exactly why this is, what are the possible
    ramifications and how to avoid it.

    Like I say, it's a raw beginners question.

    Thanks for the response.
     
    Robbie Brown, Jan 26, 2014
    #3
  4. James Kuyper

    James Kuyper Guest

    All that the standard says is that the behavior is undefined. That
    doesn't mean that there's any particular thing that must go wrong; it
    just means that you can't rely upon the code to be safe. As a practical
    matter, in general reading outside the allocated memory is safer than
    writing outside of it, but even reading can be dangerous (on some
    systems, the memory you're trying to read from may be protected, in
    which case your program might be aborted).

    On the flip side, writing to memory you shouldn't write to can sometimes
    be harmless, for several reasons.

    First of all, your allocation request is quite likely to be rounded up
    to a multiple of a block size - what the block size is depends upon the
    implementation, and might depend upon the size of your request. For
    instance, rounding requests up to the next power of 2 is one common
    approach. The minimum block size is _Alignof(max_align_t), since the
    pointer returned by malloc() must always be correctly aligned for any
    type, even if the amount of memory allocated is too small to hold an
    object of that type.

    Secondly, any memory you write to might never be read again before the
    end of the program, with the net result that nothing appears to go
    wrong. Even if it is read, the value you write might happen to be one
    that doesn't cause any problems.
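The alignment point can be checked with a small sketch, assuming a C11 compiler. The function name is invented for illustration, and converting a pointer to uintptr_t is implementation-defined, so treat the result as a check for mainstream flat-memory platforms rather than a portable guarantee.

```c
#include <stddef.h>   /* max_align_t, size_t (C11) */
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical check: returns 1 if malloc(size) came back aligned for
   max_align_t, 0 if it did not, and -1 if the allocation failed. */
int malloc_result_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return -1;
    int aligned = ((uintptr_t)p % _Alignof(max_align_t)) == 0;
    free(p);
    return aligned;
}
```

A 20-byte request like the one in the first post is not a multiple of a typical 16-byte alignment, which is one reason the block actually handed out is often larger than what was asked for. That slack is an implementation detail and must not be relied on.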
     
    James Kuyper, Jan 26, 2014
    #4
  5. I was maybe not clear. The issue *should* always be accessing beyond an
    array. Even read access[1]. If you think this is not the big issue,
    you've got something wrong about how you think about your programs.
    I may have got the tone wrong, for which I apologise. I really do want
    to make an important point. Writing undefined constructs is a big issue
    and it has the disadvantage in C that you can't always tell you've done
    it. Overwriting something else is not the issue, it's just the happy
    accident that can sometimes reveal a previous error.

    If you have access to a program called valgrind, get it immediately. It
    is hugely useful for C programming, especially when you are starting out.
    [1] It's worse than that (even constructing an invalid pointer is an
    error) but that is maybe a discussion for another day.
     
    Ben Bacarisse, Jan 26, 2014
    #5
  6. Eric Sosman

    Eric Sosman Guest

    Let's say your program is calculating how much money you owe to
    your loan shark, Vinnie "The Hacksaw" Goombatz. There's a variable
    named `totalAmount' sitting in your program's memory, and as the
    program tots up the usurious interest it accumulates a running total
    in that variable. Unfortunately, due to an error elsewhere in the
    program you overwrite the memory where `totalAmount' resides and
    store a zero there. When the calculation finishes, the value of
    `totalAmount' is therefore (incorrectly) zero instead of $1200.00,
    so your program decides not to send Vinnie any money. And when
    "The Hacksaw" becomes convinced you're holding out on him -- well,
    I'll leave the ramifications to your imagination.

    Avoiding such things is not always easy, because the C language
    itself gives you very little help. If you allocate space for forty
    bytes and instruct C to store fifty there, C will try to follow your
    instructions -- "Trust the programmer" is the watchword. What happens
    when you store the extra bytes? C itself doesn't say (it's "undefined
    behavior"), but quite often you'll wind up scribbling on memory where
    something else is stored. If you're lucky -- yes, "lucky" -- your
    program may crash in the attempt to scribble, but that's by no means
    a guaranteed outcome. The scribbling could even turn out to be
    harmless (or apparently so), if what's in the scribbled-on region
    isn't important -- for example, if you have a ten-character string at
    the beginning of a hundred-character array and you deface the array's
    second half, you may well get away with it.

    For prevention, C itself offers little more than "Be Careful."
    It's up to you to know how much memory you've allocated and color
    inside the lines, it's up to you to avoid indexing that hundred-
    character array at [100] or [733] or [-42]. Some C implementations
    have tools to help track down such errors once they're made --
    Valgrind is highly recommended -- but as always, prevention is
    better than cure. Scrupulous care and never-sleeping vigilance
    are prerequisites; following established patterns (like the one
    Ben Bacarisse recommended) can help keep you on the rails.
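One way to "color inside the lines" is to route stores through a function that knows the length. This is only a sketch; store_checked is a hypothetical helper, not anything the language or the standard library provides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store `value` at buf[index] only when the index
   is inside the `len` elements the caller says it owns. */
bool store_checked(int *buf, size_t len, size_t index, int value)
{
    if (buf == NULL || index >= len)
        return false;          /* refuse instead of scribbling */
    buf[index] = value;
    return true;
}
```

Because the index is a size_t, a negative index such as -42 converts to a huge unsigned value and is rejected by the same comparison. The check is only as good as the len the caller passes in.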
     
    Eric Sosman, Jan 26, 2014
    #6
  7. JohnF

    JohnF Guest

    I'd usually use calloc here, in case extra bytes are needed
    for proper alignment of each element, etc (though I'm not
    specifically aware of other "etc"s here). You've got reasons
    for malloc in this case, or other elaboration of the issue?
     
    JohnF, Jan 26, 2014
    #7
  8. Robbie Brown

    Robbie Brown Guest

    You're not kidding :)
    Not at all, I appreciate the time.
    Got it, compiling it ... gotta love FOSS.
     
    Robbie Brown, Jan 26, 2014
    #8
  9. Eric Sosman

    Eric Sosman Guest

    `calloc(number_of_elements, sizeof *var)' will allocate the
    same amount of memory with the same alignment as the malloc()
    call will. The only difference is that if the multiplication
    in the malloc() call overflows, malloc() might succeed (having
    allocated too little space) where calloc() will detect the
    problem and fail. And, of course, a successful calloc() will
    zero the allocated memory before returning; in my experience
    that's only occasionally useful.

    If a `*var' needs special alignment, any padding or whatnot
    to achieve that alignment is included in `sizeof *var'.
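If you prefer malloc() but want the overflow protection described above, the multiplication can be checked by hand. A minimal sketch; alloc_array is a hypothetical wrapper name.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc, size_t */

/* Hypothetical wrapper: like malloc(count * elem_size), but fails
   instead of silently allocating too little when the product would
   not fit in a size_t. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;                    /* count * elem_size would wrap */
    return malloc(count * elem_size);
}
```

Called as alloc_array(number_of_elements, sizeof *var), it keeps the sizeof *var pattern while refusing requests whose size would wrap around.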
     
    Eric Sosman, Jan 26, 2014
    #9
  10. A program is a delicate little device.

    Implementations vary a bit, but typically you'll have a register dedicated to
    the "stack pointer". This is incremented by a block with each function call,
    and the local variables are placed in that block. When the function returns,
    the stack pointer is put back to its original position.
    So how does the system know how much space to allocate for local variables?
    You might have a special location called "block size", which is always the
    one immediately before stack top. So when a return is executed, the system
    reads that variable, and subtracts it from the stack pointer.
    Now what happens if you write one past an array which happened to be last in
    your local variable list? You'll corrupt the block size variable. So the wrong
    value will be subtracted from the stack, and all the variables in the calling
    function will be declared to be in the wrong place.
    This is so confusing that big systems like PCs usually implement stack top
    protection, to shut the system down with an error message if this happens.
    But on a lot of smaller systems, there is no such mechanism.

    There are lots of similar things which can happen. Once you damage a program
    by setting essentially a random memory location to essentially a random value,
    you can't predict what the result will be. Programs are not robust, they're not
    like physical objects which get steadily worse until they finally break.
     
    Malcolm McLean, Jan 26, 2014
    #10
  11. JohnF

    JohnF Guest

    Okay, thanks Eric. But suppose you're allocating memory for
    an array of doubles, that needs to be aligned on a doubleword
    boundary. Then our alternatives are
    double *array = (double *)calloc(number_of_elements, sizeof double);
    double *array = (double *)malloc(number_of_elements * sizeof double);
    calloc might or might not align array on a doubleword boundary,
    but at least it has the info available to do that if it wants to.
    malloc is obviously out of luck, want to or not.
    By the way, if you're implicitly telling me calloc won't align
    array on a doubleword, then how would I get that done?
     
    JohnF, Jan 26, 2014
    #11
  12. I prefer:

    double *array = calloc(number_of_elements, sizeof *array);
    double *array = malloc(number_of_elements * sizeof *array);
    malloc must align its storage so that it is suitable for all uses. So
    must calloc. If you need extra alignment, that will usually depend on
    more than just the size of the unit of allocation, so calloc will rarely
    have any more to go on.
    Both will align the storage in some implementation-dependent way that is
    considered "suitable". It has to be a bit vague, because two
    implementations may choose different speed/space trade-offs and both be
    correct as far as the language specification is concerned.

    If you need more, you will often find that it's provided. For example
    a vector math package might provide a special vm_alloc function to give
    you whatever extra alignment the vector processor needs. If not, you will
    have to fall back on doing the address arithmetic yourself.
     
    Ben Bacarisse, Jan 26, 2014
    #12
  13. Robbie Brown

    Robbie Brown Guest

    I have valgrind working and it certainly indicates a problem ... but I
    don't understand the above (duh!). I think someone else mentioned
    something like this before and I didn't get it then either (duh! duh!)

    var[1] = malloc(sizeof *var[2] * number_of_elements);

    1. I thought malloc returned a pointer
    2. If I want an int (32 bits on this machine) why ask for a pointer to
    int (64 bits)

    so

    int *i = malloc(sizeof(int));
    should give me a 64 bit pointer to a 32 bit int

    Sorry for being so thick, can you elucidate please.
     
    Robbie Brown, Jan 26, 2014
    #13
  14. Eric Sosman

    Eric Sosman Guest

    Not out of luck, not at all. The pointer returned by a
    successful malloc() or calloc() or realloc() "is suitably aligned
    so that it may be assigned to a pointer to any type of object with
    a fundamental alignment requirement and then used to access such an
    object or an array of such objects in the space allocated" (from
    section 7.22.3 of the C Language Standard).
    As the quote above says, calloc() et al. will always align
    for the most restrictive alignment any type needs. That may or
    may not be "a doubleword." If you actually need "doubleword"
    alignment and your C implementation supports the latest ("C11")
    version of the Standard, you can use the aligned_alloc() function.
    You never need aligned_alloc() just to meet the requirements of
    the types you'll store in the allocated area, but you may need it
    to get memory that's more strictly aligned than necessary --
    "page-aligned" memory, for example.
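A rough sketch of asking for stricter alignment with C11's aligned_alloc(). The 64-byte figure and the function name are assumptions chosen for illustration. C11 requires the size to be a multiple of the alignment, so the sketch rounds the request up, and it does not guard against overflow in the size calculation.

```c
#include <stdlib.h>   /* aligned_alloc (C11), size_t */

#define ASSUMED_ALIGNMENT 64   /* illustrative value, not from the thread */

/* Hypothetical helper: room for `count` doubles, starting on an
   ASSUMED_ALIGNMENT-byte boundary. Release the result with free(). */
double *alloc_aligned_doubles(size_t count)
{
    size_t bytes = count * sizeof(double);
    /* Round up to a multiple of the alignment, as C11 requires. */
    size_t rounded = (bytes + ASSUMED_ALIGNMENT - 1)
                     / ASSUMED_ALIGNMENT * ASSUMED_ALIGNMENT;
    if (rounded == 0)
        rounded = ASSUMED_ALIGNMENT;
    return aligned_alloc(ASSUMED_ALIGNMENT, rounded);
}
```

For ordinary arrays of double none of this is needed: plain malloc() is already aligned well enough.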
     
    Eric Sosman, Jan 26, 2014
    #14
  15. Kaz Kylheku

    Kaz Kylheku Guest

    This mistake won't even show up as a problem if sizeof (int *) happens to be
    equal to (or even smaller than) sizeof (int). It will bite you later when you
    port to another system. (E.g. 64 bit pointers, 32 bit ints.)

    One way to reduce a mistake of this type is to reduce manual repetition:

    int **pr4 = malloc(sizeof *pr4 * 5); /* or: sizeof pr4[0] */
    According to ISO C, it is well-defined to displace a pointer one increment
    past the end of an array-like object. Accessing that location or storing
    to it is undefined behavior, and so is any other out-of-bounds activity,
    including merely incrementing the pointer beyond the one-past location.
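The one-past-the-end rule is what makes the usual pointer loop legal. A small sketch with a hypothetical function name:

```c
#include <stddef.h>

/* Sums `count` ints starting at `first`. The pointer `end` points one
   past the last element: it may be formed and compared against, but it
   is never dereferenced. */
int sum_ints(const int *first, size_t count)
{
    const int *end = first + count;
    int total = 0;
    for (const int *p = first; p != end; ++p)
        total += *p;     /* p != end here, so p points at a real element */
    return total;
}
```

Forming first + count + 1, by contrast, would already be undefined, even if the result were never used.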

    This undefined-ness permits C implementations (or, if not the implementations
    as such, then their debugging tools) to diagnose such problems very early, at
    the level of pointer arithmetic that is being used to produce an out-of-bounds
    pointer, even if that pointer is never used.

    In practice, you can often read memory beyond objects without any ill
    effects.

    However:

    * You may hit some kind of memory access fault if you go too far.
    For instance, you may hit an unmapped page in a virtual memory system.
    Or, in a small embedded system, perhaps a memory location to which
    no hardware is attached, which triggers a timeout and bus error.
    If the object that you've gone beyond is located tightly against such
    a region, then accessing just one byte past it can trigger this problem.

    * You also have a problem if your program depends on the values pulled
    from out of bounds. (This is invariably the case, unless the out-of-bounds
    access is part of a loop optimization whose logic ensures that the
    data isn't used.) Even if these values aren't some kind of "trap
    representation" that bombs the program, their use is a bug.
    No. After you have identified *an* issue, you have to think of as many more
    as possible.
     
    Kaz Kylheku, Jan 26, 2014
    #15
  16. JohnF

    JohnF Guest

    Okay, thanks again. So I take it I can safely and portably say
    double *array = (double *)malloc(number_of_elements * sizeof double);
    for ( i=0; i<number_of_elements; i++ ) array[i] = (double)(i*i);
    or some such. That is, complying malloc (and calloc) has to return
    ptrs so that all such float/double operations will work okay
    on array addressed in that typical way.
     
    JohnF, Jan 26, 2014
    #16
  17. JohnF

    JohnF Guest

    Okay, so you're saying regardless of type (int, double, char, whatever)
    ANY_TYPE *array = malloc(number_of_elements * sizeof *array);
    will work, i.e., sizeof *array evaluates as sizeof ANY_TYPE ?
    Yeah, I guess that looks right, in principle, though I did have
    to look twice (and wouldn't have been surprised if you'd told me
    it didn't always work in practice).
    Thanks, haven't yet come across situations where I've needed more
    than the default alignment guaranteed by m/calloc as you and Eric
    have described it. But will keep that additional stuff in mind for
    future reference.
     
    JohnF, Jan 26, 2014
    #17
  18. The idea is whatever is on the left is written with * in front on the
    right, so I'd have put sizeof *var[1] there, myself.
    Yes. presumably in my example var is a pointer object. In yours,
    var[1] must be a pointer object. sizeof *var[1] is the size of the type
    of object pointed to by var[1].
    I'm baffled! sizeof can be applied to an expression. It does not evaluate the
    expression, it just looks to see what type it is and uses that type to
    determine the size. See below...
    But int *i = malloc(sizeof *i); also works. i is of type 'int *' so *i
    is of type 'int'.
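A short sketch of the "does not evaluate" point; both function names are made up for illustration. (Variable-length arrays are the one case where a sizeof operand is evaluated; neither function uses one.)

```c
#include <stddef.h>   /* size_t, NULL */

/* sizeof *p only looks at the type of *p (int). The null pointer is
   never dereferenced at run time. */
size_t size_via_null_pointer(void)
{
    int *p = NULL;
    return sizeof *p;
}

/* The operand of sizeof is not executed, so i is still 0 afterwards.
   Some compilers warn about the unused side effect; that is expected. */
int sizeof_leaves_operand_alone(void)
{
    int i = 0;
    size_t n = sizeof(i++);
    (void)n;
    return i;
}
```

This is why int *i = malloc(sizeof *i); is fine even though i has no value yet when sizeof *i is written.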
    Not knowing stuff is not at all the same as being thick. We all started
    off knowing no C at all.
     
    Ben Bacarisse, Jan 26, 2014
    #18
  19. Eric Sosman

    Eric Sosman Guest

    `sizeof(double)' -- the parentheses are mandatory when you're
    applying `sizeof' to a type name rather than to an expression.


    The cast is either unnecessary or misplaced (`(double)i * i'
    might be what you meant, if `i' could be large).


    Right: The pointer returned by a successful call is suitably
    aligned for every C data type. The pointer returned by an
    *unsuccessful* call, though, is NULL -- so for "safely and
    portably" you should check before plowing ahead and using it.
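Putting those three remarks together, the loop might be sketched like this. The name make_squares is hypothetical, and the index is converted to double before multiplying so a large i cannot overflow an integer type.

```c
#include <stdlib.h>   /* malloc, size_t */

/* Hypothetical helper: returns an array holding 0*0, 1*1, ...,
   (n-1)*(n-1), or NULL if the allocation failed. Caller frees it. */
double *make_squares(size_t n)
{
    double *array = malloc(n * sizeof *array);
    if (array == NULL)
        return NULL;               /* check before plowing ahead */
    for (size_t i = 0; i < n; i++)
        array[i] = (double)i * (double)i;
    return array;
}
```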

    Also, take another look at the pattern Ben Bacarisse showed.
    You've written (paraphrased)

    Type *array = (Type*) malloc(N * sizeof(Type));
    while he wrote
    Type *array = malloc(N * sizeof *array);

    There are two differences, one fairly minor and one of moderate
    importance:

    - It's unnecessary to cast the value returned by malloc()
    (or calloc(), etc.). That value has the type `void*',
    which will convert to any other data pointer type without
    need for a cast.

    - Writing `sizeof *array' instead of `sizeof(Type)' means
    it's impossible to get the type wrong and ask for `int'
    elements when you really meant `int*'. Such slip-ups
    occur (in my experience) either from writing the wrong
    number of `*'s or when dealing with a lot of similarly-
    named but distinct types, as in

    MessageHeader *hdr = malloc(sizeof(MessageHeader));
    MessagePayload *msg = malloc(sizeof(MessageHeader)); // ?

    Even the second of these is only of moderate importance, but
    recall what I wrote earlier: The C language gives you very little
    assistance in avoiding errors of this sort, so the burden is almost
    entirely on you. Don't make your job any harder than it already is.
     
    Eric Sosman, Jan 26, 2014
    #19
  20. Yes. Although it is even more general than this. For example, to
    allocate a triangular array:

    double **triangle = malloc(rows * sizeof *triangle);
    for (int r = 0; r < rows; r++)
    triangle[r] = malloc((r + 1) * sizeof *triangle[r]);

    You use * of whatever is being assigned to get the right size.
    There might be a corner case or two. I can't think of one off hand.
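A fuller sketch of the triangular allocation, with the failure handling and cleanup the fragment above leaves out. The function names are invented for illustration. Note that a request for zero rows may legitimately come back as NULL, because malloc(0) is allowed to return NULL.

```c
#include <stdlib.h>   /* malloc, free, size_t */

/* Frees the first `rows` rows and then the array of row pointers. */
void free_triangle(double **triangle, size_t rows)
{
    if (triangle == NULL)
        return;
    for (size_t r = 0; r < rows; r++)
        free(triangle[r]);
    free(triangle);
}

/* Row r gets r + 1 elements, all set to 0.0. Returns NULL on failure,
   after releasing whatever had already been allocated. */
double **make_triangle(size_t rows)
{
    double **triangle = malloc(rows * sizeof *triangle);
    if (triangle == NULL)
        return NULL;
    for (size_t r = 0; r < rows; r++) {
        triangle[r] = malloc((r + 1) * sizeof *triangle[r]);
        if (triangle[r] == NULL) {
            free_triangle(triangle, r);   /* rows 0..r-1 exist so far */
            return NULL;
        }
        for (size_t c = 0; c <= r; c++)
            triangle[r][c] = 0.0;
    }
    return triangle;
}
```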

    How about something more complex? You use arrays of three numbers to
    represent some important data about things (points in space, grades in
    final exams, whatever). You don't know how many things, so you need
    malloc. Start with the declaration:

    double (*data)[3]; // A pointer to arrays of 3 doubles

    Once you know how many:

    data = malloc(how_many * sizeof *data);

    If you decide the data needs to be float, or you need 4 pieces of data per
    thing, or it should really be a structure for each thing, you just
    change the declaration.
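The same idea as a compilable sketch. The typedef name Triple and the function make_triples are hypothetical, introduced only so the function's return type is readable; the allocation line is the one shown above.

```c
#include <stdlib.h>   /* malloc, size_t */

typedef double Triple[3];   /* hypothetical name: an array of 3 doubles */

/* Allocates `how_many` zeroed triples, or returns NULL on failure.
   sizeof *data is the size of one whole array of 3 doubles. */
Triple *make_triples(size_t how_many)
{
    double (*data)[3] = malloc(how_many * sizeof *data);
    if (data == NULL)
        return NULL;
    for (size_t i = 0; i < how_many; i++)
        for (size_t j = 0; j < 3; j++)
            data[i][j] = 0.0;
    return data;
}
```

The result is indexed as data[thing][component] and released with a single free().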

    <snip>
     
    Ben Bacarisse, Jan 26, 2014
    #20
