When to check the return value of malloc


Malcolm McLean

Eric Sosman said:
I'll have to explain this phenomenon to my bank: If I write
a big check and they honor it, they are then obliged to honor a
subsequent small check, even if the first one emptied my account.
Sweet!
I don't know how much I've got in my account, but I know it's several
thousand pounds, and I know the amount, which I won't reveal, to the nearest
thousand. So I write a cheque in round thousands. It doesn't bounce - by
luck, because I didn't employ any particular rounding strategy up or down.
Now I write another cheque for one pound. Given that cheque one didn't
bounce, calculate the probability that cheque two will bounce.

This is a perfectly legitimate old O-level question. There is a hidden
assumption, which is that the amount in the account, taken modulo one
thousand, is equally likely to be any value from 0 to 999.
 

user923005

"I do not think that word means what you think it means."

Any PBR gets +1000 points in my book.
That's where your imagination fails you.  *Any* call to malloc() can
fail.

What if the allocation *was* deliberately tailored to the amount of
memory in the machine?  What if the allocation is tailored to the size
of an external file, and an attacker constructs a file for the purpose
of crashing your application, knowing how sloppy you are about
checking the result of malloc()?  What if the underlying memory
allocator uses different pools for different allocation sizes, and the
pool for small allocations has just run out?  What if 100MB just
happens, by coincidence, to be within 100 bytes of the amount of
memory available? (Not likely in any one run of the program, but it
*will* happen eventually.)


Suppose an interactive application has a routine to open and load a
file of a particular type.  It needs to allocate space for the
contents of the file itself, for the name of the file, and for some
other associated information.  Let's say it calls malloc() 3 times,
once for the 100MB needed for the file, once for the 50 bytes needed
to store a copy of the file name, and once for the 27-kilobyte index
it constructs so it can access the information more efficiently.  If
any one of those allocations fails, it can free() everything it's
allocated, present an error message to the user, and return to the
caller, allowing the user to continue working on other files.

What an amazing recovery strategy!
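The recovery strategy described in the quoted post can be sketched roughly as follows; the struct, function name, and sizes are illustrative, not actual code from the thread:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch: allocate three blocks for one logical operation;
   if any allocation fails, free everything obtained so far and report
   the failure to the caller, who can show an error message and let the
   user keep working on other files. */
struct loaded_file {
    char *contents;   /* the file data itself (e.g. 100MB) */
    char *name;       /* a copy of the file name (e.g. 50 bytes) */
    char *index;      /* an index for efficient access (e.g. 27KB) */
};

int load_file(struct loaded_file *lf, size_t contents_size,
              const char *name, size_t index_size)
{
    lf->contents = malloc(contents_size);
    lf->name = malloc(strlen(name) + 1);
    lf->index = malloc(index_size);
    if (lf->contents == NULL || lf->name == NULL || lf->index == NULL) {
        free(lf->contents);   /* free(NULL) is a harmless no-op */
        free(lf->name);
        free(lf->index);
        lf->contents = lf->name = lf->index = NULL;
        return -1;            /* caller recovers; program keeps running */
    }
    strcpy(lf->name, name);
    return 0;
}
```

Checking all three pointers after the fact works here because free(NULL) is defined to do nothing.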

Let's consider (for a moment) small allocations. We have one initial
question to answer:
"Why was malloc(50) called instead of just char foo[50]?"
Normally, when we call malloc() rather than just creating an auto, it
is because we are building a data structure. But more than that, we
are building a data structure of arbitrary size. Maybe it's a hash
table. Maybe it's a skiplist. Maybe it's an AVL tree. But at any
rate, a major reason for using malloc() is because we don't know the
final size of the thing. These little allocations tend to come in
clumps, and sometimes there are millions of them. I think that trying
to imagine that small allocations are safe and large allocations are
dangerous is *itself* a very dangerous assumption. It makes the
assumption that you fully understand the usage pattern of someone
else's tools that are utilizing your tools.

If we are talking about a single allocation, then it is true that:
foo = malloc(10000000);
is more likely to fail than:
foo = malloc(10);

However, we are far more likely to see:
foo = malloc(10);
in a doubly or even triply nested loop.

There is a further assumption that the machine is not being run at
near total capacity. That may or may not be valid.
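The nested-loop point can be illustrated with a sketch: a linked list built from many small, fixed-size allocations, where each individual malloc() call is checked and a failure unwinds everything built so far (the names and sizes are invented for illustration):

```c
#include <stdlib.h>

/* Each node is roughly a malloc(10)-sized small allocation; there may
   be millions of them, and any single one can fail. */
struct node {
    struct node *next;
    char payload[10];
};

void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

struct node *build_list(size_t count)
{
    struct node *head = NULL;
    for (size_t i = 0; i < count; i++) {
        struct node *n = malloc(sizeof *n);   /* a "small" allocation */
        if (n == NULL) {                      /* this one call failed */
            free_list(head);                  /* unwind what was built */
            return NULL;
        }
        n->next = head;
        head = n;
    }
    return head;
}
```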
 

user923005

[snips]

Let's consider (for a moment) small allocations.  We have one initial
question to answer:
"Why was malloc(50) called instead of just char foo[50]?"

Possibility: the app uses something akin to a linked list to store the
actual file names loaded, presumably so it can later save them.  The file
select and open mechanism can thus open the file _and_ add the name of
the file to the list.  At least as applies to the example Keith suggested.

Now, it is not likely that we will be storing millions of filenames,
but a linked list is another structure which holds an arbitrary list
of data. It's probably a very poor choice for something that holds
millions of items though, unless it is a forward-only cursor through a
data set.
 

Malcolm McLean

Kelsey Bjarnason said:
So, in short, you've given another example of precisely why one should
*not* be using assert.
There's certainly a case against assert.
In xmalloc(), however, the result of defining away the assert will be to
pass a large positive request to malloc(). This will probably fail,
triggering the error-handler, with the negative size passed. A savvy user
will then terminate at that point, because once a bug like that has occurred
there's no saying what might have been corrupted. A less technical user will
scratch his head in puzzlement that a negative number of bytes is being
demanded from his other applications.
There might or might not be a facility to save, and it might or might not be
a good idea. Only you can tell me whether corrupt data is better or worse
than no data.
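For reference, a hypothetical reconstruction of the kind of xmalloc() being discussed; this is a sketch of the pattern, not Malcolm's actual source: a signed size parameter, an assert against negative requests, and a central failure handler instead of a NULL return.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical xmalloc(): if NDEBUG is defined, the assert vanishes
   and a negative size reaches malloc() after conversion to size_t. */
void *xmalloc(int size)
{
    void *p;

    assert(size >= 0);          /* compiled out under NDEBUG */
    p = malloc((size_t)size);   /* a negative size wraps to a huge value here */
    if (p == NULL) {
        /* central failure handler; a real toolkit might prompt the
           user to free memory, or invoke a registered callback */
        fprintf(stderr, "xmalloc: failed to allocate %d bytes\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```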
 

Malcolm McLean

user923005 said:
Let's consider (for a moment) small allocations. We have one initial
question to answer:
"Why was malloc(50) called instead of just char foo[50]?"
Normally, when we call malloc() rather than just creating an auto, it
is because we are building a data structure. But more than that, we
are building a data structure of arbitrary size. Maybe it's a hash
table. Maybe it's a skiplist. Maybe it's an AVL tree. But at any
rate, a major reason for using malloc() is because we don't know the
final size of the thing. These little allocations tend to come in
clumps, and sometimes there are millions of them. I think that trying
to imagine that small allocations are safe and large allocations are
dangerous is *itself* a very dangerous assumption. It makes the
assumption that you fully understand the usage pattern of someone
else's tools that are utilizing your tools.
If we are talking about a single allocation, then it is true that:
foo = malloc(10000000);
is more likely to fail than:
foo = malloc(10);

However, we are far more likely to see:
foo = malloc(10);
in a doubly or even triply nested loop.

There is a further assumption that the machine is not being run at
near total capacity. That may or may not be valid.
Now you've got a valid point.
A massive structure can be made of lots of little entities. It depends what
your structure is. Baby X, for example, will manage a GUI. There will only
be a thousand windows or so in any legitimate app. If the user is creating
windows and not destroying them, ultimately the app has to fail anyway.

The worst feature of xmalloc() is that if you are making small allocations
in a loop, it may ask the user for a few hundred bytes of memory. So you
close down your open PDF file. A few seconds later it comes back with a
request for another few hundred bytes. So you close something else. There's
no way of knowing whether you are onto a loser here or not.

The worst possible scenario is something like a menu. It is intended to hold
ten or twenty items. So no problem for xmalloc(). But what if someone calls
it with tens of thousands of entries and these are somehow legitimate?
The problem is the only way to build more robust memory handling into BabyX
is to require virtually every user function to both accept and return an out
of memory error condition. Since the whole point of the toolkit is that it
is easy to use, this just blows a hole in the library rationale. Some
form of central allocation failure control is essential.
 

CBFalconer

Malcolm said:
There's certainly a case against assert. In xmalloc(), however,
the result of defining away the assert will be to pass a large
positive request to malloc(). This will probably fail, triggering
the error-handler, with the negative size passed. A savvy user

No. The first thing that happens is that the negative is passed to
the real live malloc, which receives a size_t. So that negative
value is converted into a nice large value (see the rules for
conversion to unsigned). Who knows what happens. malloc may fail,
or it may successfully get a fairly large number of bytes, which
will not be used.
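The conversion CBFalconer describes follows the usual rules for converting a negative signed value to an unsigned type: the value is brought into range by adding SIZE_MAX + 1, yielding a huge request. A minimal sketch (the helper names are invented):

```c
#include <stdlib.h>

/* A negative int passed where a size_t is expected is converted by
   the unsigned-conversion rules. */
size_t as_size(int n)
{
    return (size_t)n;   /* -3 becomes SIZE_MAX - 2 */
}

/* malloc() then sees the huge value; it may fail, or it may succeed
   and reserve far more memory than was ever intended. */
void *alloc_signed(int n)
{
    return malloc(as_size(n));
}
```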
 

Herbert Rosenau

Wait, you didn't understand its scope I'm afraid: if the only reasonable
answer to OOM is to terminate (and only then!), xmalloc() and friends are
the right choice.

There is no case where a simple exit() will ever be the reasonable
answer to ENOMEM in production code.
If you need to, you can still implement cleanup using
atexit() handlers to some extent or, on implementations like Malcolm's, use a
callback before terminating. Otherwise, you need brain 1.0 or higher and
decide that xmalloc() simply does not do what you need done and pass the
error up the whole callchain. It is a tool with limited applicability,
competent programmers can choose it when it is useful.

Please tell me how to reach variables in auto storage class from an
atexit handler! Tell me how to get access to variables defined only
static in a function or simply at block level. O.k., then and only then
will you be able to write an atexit() handler that can recover from an
out of memory error in all cases regardless of the number of threads
active, which thread owns which variable and in which state whatever may be.

Oh, I see, you are trained to have each and every thing in "global"
storage because you need access to each and any of them from any point in
any source module, making the program completely unmaintainable.
You cannot portably handle signed overflow, because signed overflow causes
undefined behaviour. Therefore, compilers are free to assume that you (the
programmer) already made sure it doesn't happen and assume that an
operation like 'n*sizeof x' will never yield anything below zero and
optimise out any check if the result is below zero! So, you need to check
against overflow before it happens.

Right, you can't handle signed overflow thereafter - but you can
detect it before it occurs when you need to. It is not as simple
as detecting it after it occurs on unsigned - but it's possible.
Sometimes, however rarely, it is needed when the system gives you no
wider type you can use.
I'm not sure if that was what you meant, I'd just like to point out that
after-the-fact checks for signed overflow are broken by design.
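The check-before-it-happens advice can be sketched in unsigned arithmetic: guard the multiplication `n * elem_size` before performing it, so no wraparound ever occurs (the function name is invented):

```c
#include <stdlib.h>

/* Reject a request for n elements of elem_size bytes if the product
   would wrap around size_t; the check runs *before* the multiply. */
void *alloc_array(size_t n, size_t elem_size)
{
    if (elem_size != 0 && n > (size_t)-1 / elem_size)
        return NULL;              /* n * elem_size would wrap */
    return malloc(n * elem_size);
}
```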

...as should be people who unnecessarily insult others, because a lack of
social skills makes it impossible to integrate them in a team. Sorry, while
I agree with you technically, I find your statement unnecessarily rude.

In German we have a proverb for what you mean: "jeder zieht den Schuh
an, der ihm paßt" (everyone puts on the shoe that fits him).

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2R Deutsch ist da!
 

Flash Gordon

Malcolm McLean wrote, On 29/01/08 22:04:
There's certainly a case against assert.
In xmalloc(), however, the result of defining away the assert will be to
pass a large positive request to malloc(). This will probably fail,
triggering the error-handler, with the negative size passed. A savvy
user will then terminate at that point, because once a bug like that has
occurred there's no saying what might have been corrupted. A less
technical user will scratch his head in puzzlement that a negative
number of bytes is being demanded from his other applications.

Especially puzzled as it was a large positive number that was passed.
Although that puzzle would occur whether or not the assert was there.
There might or might not be a facility to save, and it might or might
not be a good idea. Only you can tell me whether corrupt data is better
or worse than no data.

Alternatively it could be a perfectly valid request. This is why you
should not be applying such an arbitrary limit. Of course, since as you say
your xmalloc is only suitable for small requests, the problem could be
solved by changing the parameter to an unsigned char.
 

Kelsey Bjarnason

[snips]

Let's consider (for a moment) small allocations.  We have one initial
question to answer:
"Why was malloc(50) called instead of just char foo[50]?"

Possibility: the app uses something akin to a linked list to store the
actual file names loaded, presumably so it can later save them.  The
file select and open mechanism can thus open the file _and_ add the
name of the file to the list.  At least as applies to the example Keith
suggested.

Now, it is not likely that we will be storing millions of filenames,

No, but it was an applicable answer to the question asked.

but
a linked list is another structure which holds an arbitrary list of
data. It's probably a very poor choice for something that holds
millions of items though, unless it is a forward-only cursor through a
data set.

Indeed. However, it's not the only sort. A tree, for example, might be
useful in holding a large number of small objects sorted or otherwise
organized by some means.
 

Ulrich Eckhardt

Herbert said:
There is no case where a simple exit() will ever be the reasonable
answer to ENOMEM in production code.

Sorry, but I disagree. Imagine any kind of tool that makes a single pass over
some data, like a compiler. Imagine it runs out of memory, what should it
do? Prompting the user to free memory is out, it can't rely on even running
interactively. Yes, it could trigger an internal garbage collection, but
adding one is a significant amount of work and not even that is guaranteed
to help. Maybe it even causes OOM earlier, due to its own overhead. That's
why a compiler, given a too-large source file, will simply exit with an
appropriate error. It is not that it's impossible to fix the problem but
that it requires unreasonable amounts of work to do it.
Please tell me how to reach variables in auto storage class from an
atexit handler! Tell me how to get access to variables defined only
static in a function or simply at block level.

That is easy, just store them in a global. Did you perhaps miss the 'to some
extent' above? If you are writing to a temporary file, you simply declare
the FILE* global and if it is non-null you clean it up with an atexit()
handler. Simple and efficient.
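The idiom described here, a file-scope FILE pointer cleaned up by an atexit() handler, might look like this sketch (the names and file name are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>

/* The temporary file's handle lives at file scope so the atexit()
   handler can reach it. */
static FILE *tmp_fp = NULL;
static const char *tmp_name = "scratch.tmp";

/* Runs on any exit(): close and delete the temporary file, if open */
static void cleanup(void)
{
    if (tmp_fp != NULL) {
        fclose(tmp_fp);
        remove(tmp_name);
        tmp_fp = NULL;
    }
}

/* Open the scratch file and register the cleanup handler */
static int open_scratch(void)
{
    if (atexit(cleanup) != 0)
        return -1;
    tmp_fp = fopen(tmp_name, "w");
    return tmp_fp != NULL ? 0 : -1;
}
```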
O.k., then and only then will you be able to write an atexit() handler
that can recover from an out of memory error in all cases regardless of
the number of threads active, which thread owns which variable and in
which state whatever may be.

Seriously, you are missing the point _COMPLETELY_. xmalloc() is not the holy
grail, it is not the remedy for any code cancer but it is a tool which
competent programmers can decide to use or not use. If it doesn't work for
your code then so be it, but nobody claimed that it should do that. It is
surely not the right tool for a multithreaded server application because
one too large client request would cause DoS for all clients. It would not
be correct to use it in a library, because there it would force the error
handling on the user of the library and thus strongly reduce its
applicability. Nobody claimed that xmalloc() was the solution there.
Oh, I see, you are trained to have each and every thing in "global"
storage because you need access to each and any of them from any point in
any source module, making the program completely unmaintainable.

C'mon, that is just bullshitting here. If you understand why globals can be
dangerous, you would also understand when they can be used without danger
and even to an advantage. Don't construct any arguments on me that I didn't
make.

Uli
 

Richard Bos

Malcolm McLean said:
I don't know how much I've got in my account, but I know it's several
thousand pounds, and I know the amount, which I won't reveal, to the nearest
thousand. So I write a cheque in round thousands. It doesn't bounce - by
luck, because I didn't employ any particular rounding strategy up or down.
Now I write another cheque for one pound. Given that cheque one didn't
bounce, calculate the probability that cheque two will bounce.

Exactly one in a thousand: the second cheque bounces only if the balance
was an exact multiple of a thousand pounds, which is one residue out of a
thousand equally likely ones.

Programs which do more than a thousand small allocations are a dime a
dozen.

Richard
 

Kelsey Bjarnason

No. The first thing that happens is that the negative is passed to the
real live malloc,

No, that's the second thing.

The very first thing that happens is that xmalloc ceases to work as
designed, failing to detect the negative value, which (based on how the
code is presented, at least) is one of its fundamental guarantees and
design goals: negative allocations should be detected and rejected.

Define NDEBUG in the build process, and that no longer holds true.

Having failed to meet its design goals and/or guarantees, it then
proceeds to hand the negative to malloc.
which receives a size_t. So that negative value is
converted into a nice large value (see the rules for conversion to
unsigned). Who knows what happens. malloc may fail, or it may
successfully get a fairly large number of bytes, which will not be used.

Hmm, I accidentally requested -3 bytes and all of a sudden every app in
my system is swapping to disk and the entire system is so
unresponsive I can't use it... but hey, I'll get that -3 bytes, or the
app will crash. :)
 

Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 29/01/08 22:04:
Alternatively it could be a perfectly valid request. This is why you
should not be applying such an arbitrary limit. Of course, as you say your
xmalloc is only suitable for small requests the problem could be solved by
changing the parameter to an unsigned char.
32767 is the minimum guaranteed value of INT_MAX. So xmalloc() should not
be called from portable code if there is any possibility that the request
will exceed this size.
32K is a pretty big structure, but not such a big array, in this day and
age. It is however a string that is too long for the standard library
functions to work well with.
So xmalloc() for allocating individual structures and strings, malloc() for
arrays.
 

Malcolm McLean

Richard Bos said:
Exactly one in a thousand.
Correct.

Programs which do more than a thousand small allocations are a dime a
dozen.
That logic would hold if every pound in my bank account was a byte of
memory, or if the machine had a few K installed and a previous function had
taken up almost all the memory. Which lots do. However on that sort of
system you generally can't recover from memory failure anyway, so you've
got to work out that malloc() can never fail for any input. Often there
won't even be a malloc().

Now calculate the same probability for Kelsey's example.
A request for 100MB succeeds, what's the probability that a request for 100
bytes fails? (You know that the program runs on a PC, so you need to make an
assumption here, or apply some very sophisticated stats).
 

pete

CBFalconer said:
You miss the point, which is that getc is encouraged to be a macro,
so that use can avoid any system function calls. When properly
implemented it effectively puts the system buffers in the program
coding.

I doubt it.
When getc is also implemented as a macro,
the simplest way to define the functions getc and fgetc
is to have the function bodies consist
solely of a call to the macro, like this:

int fgetc(FILE *stream)
{
    return getc(stream);
}

int (getc)(FILE *stream)
{
    return getc(stream);
}
 

Malcolm McLean

CBFalconer said:
No. The first thing that happens is that the negative is passed to
the real live malloc, which receives a size_t. So that negative
value is converted into a nice large value (see the rules for
conversion to unsigned). Who knows what happens. malloc may fail,
or it may successfully get a fairly large number of bytes, which
will not be used.
That's what I said. The emergency handler gets the true negative size,
malloc() a large positive value that will probably fail.
 

dj3vande

You miss the point, which is that getc is encouraged to be a macro,
so that its use can avoid any system function calls.

If the problem is the function call overhead, then that will probably
be a useful solution, as I noted.
When properly
implemented it effectively puts the system buffers in the program
coding.

It's highly unlikely that the "system buffer" that fgetc uses is any
different from the buffer that getc uses. A macro implementation of
getc will still have to do a system call to re-fill the buffer when
it's empty.


dave
 

CBFalconer

If the problem is the function call overhead, then that will
probably be a useful solution, as I noted.


It's highly unlikely that the "system buffer" that fgetc uses is
any different from the buffer that getc uses. A macro
implementation of getc will still have to do a system call to
re-fill the buffer when it's empty.

True, but by then the frequency is reduced by something like 80 to
1000 times.
 

Keith Thompson

CBFalconer said:
True, but by then the frequency is reduced by something like 80 to
1000 times.

The buffer behavior of getc should normally be identical to the buffer
behavior of fgetc. The only likely difference is the overhead of a
single function call for fgetc. If getc performs a system call to
refill the buffer every 1000 times, then fgetc performs a system call
to refill the buffer every 1000 times. (It's not clear whether their
behavior with respect to refilling the buffer is required to be
identical, but there's no reason for them to be different.)
 

dj3vande

Keith Thompson said:
(It's not clear whether their
behavior with respect to refilling the buffer is required to be
identical, but there's no reason for them to be different.)

getc is defined to be equivalent to fgetc except that it's allowed to
expose macro-ness by evaluating its argument more than once.

So if buffer refills can be exposed to strictly conforming programs
(it's not obvious to me whether or how that could be done), they are in
fact required to have the same behavior wrt buffering.
The fact that it's defined as equivalent except for an allowance not
relevant to buffering behavior is at the very least a pretty strong
hint.


dave
 
