bad alloc


Adam Skutt

Why would bad_alloc be thrown while writing to disk? Guess: because
the write is implemented in such a way as to modify program state.
Doesn't seem all that logical.

It's perfectly logical and quite reasonable. In fact, the standard
library can (and will, on many implementations) do it for you, even
if your code doesn't call new itself. Iostreams implementations are
free to allocate memory for buffers in order to perform I/O and will
do so. As a result, they can trigger std::bad_alloc if you're close
to an out of memory condition[1].
1. walking down the stack upon OOM (or other exception) normally frees
resources.

Yes, but not in a way that typically enables retrying the failing
operation (if that were even possible), which is why you suggested one
should walk down the stack!
2. a top-level error handler is not a place to do anything
resource-sensitive, exactly due to the OOM possibility.

Catch blocks are almost never the place to do anything 'resource-
sensitive' (whatever that even means) because exceptions typically
lack the detail necessary to safely do any sort of complicated
processing. Where the exception is handled in the call stack has
nothing to do with this.
It's not as complicated as you make it out to be.

I fail to see how it can get any less complicated than, "Just do what
happens automatically", which is what I've said programmers should do.
Going "nice" in case
of OOM might not be worth it in all cases, but is not an and-all
response to all concerns.

Huh? This is literal nonsense, even assuming you meant to write "end-
all".

Adam

[1] Whether you actually get std::bad_alloc is dependent on the
iostream implementation, however. Regardless, it's perfectly
reasonable to expect iostream code to allocate memory, and for it to
fail if it cannot.
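
Here is a minimal sketch of the situation described above (the stream
and file name are made up; whether the allocation failure surfaces as
std::bad_alloc or as std::ios_base::failure depends on the library, as
the footnote says):

#include <fstream>
#include <iostream>
#include <new>
#include <string>

void write_report(const std::string& text)
{
    std::ofstream out("report.txt");
    out.exceptions(std::ofstream::badbit);  // turn stream errors into throws
    try {
        out << text;  // the library may allocate its buffer here
    } catch (const std::bad_alloc&) {
        std::cerr << "ran out of memory while writing\n";
    } catch (const std::ios_base::failure&) {
        std::cerr << "write failed\n";
    }
}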
 

Adam Skutt

I absolutely agree with Goran and disagree that terminating on OOM is
*always* the best approach.  There may be programs where it is the
best approach but it is far from always the case.

A concrete example:

Network server using a standard pattern of one listener/producer and
multiple worker/consumer threads.  The listener receives a job request
and hands off the processing of the job to one of the worker
threads.

It is very much possible that processing one particular job might
actually require too much memory for the system.  The correct thing to
do in that case is to stop processing this one oversized job, release
all the resources acquired to process this job, mark it as an error,
and continue processing further jobs.
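
A rough sketch of that per-job containment (Job, JobQueue, process_job
and mark_failed are hypothetical names, not from any real codebase):

void worker_loop(JobQueue& queue)
{
    for (;;) {
        Job job = queue.pop();
        try {
            process_job(job);          // may throw std::bad_alloc
        } catch (const std::bad_alloc&) {
            job.release_resources();   // drop everything acquired for this job
            mark_failed(job);          // must itself avoid allocating
        }
    }
}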

Since this is a persistent server that needs to be up and alive 24/7,
it would be totally inappropriate to permanently terminate the server.
Even if there were an additional monitoring process that restarts the
server if it dies, this would not be a good thing, because you would
lose the current progress in the running worker threads and would also
kill current external client connections.

Yan
 

Paul

Hi Paul

Please, please, please, please, please, please
delete some of all that unneeded quotation; it is such a
waste of time, all the scrolling...

P.S. You are of course not the only one.
OK, sorry. The new interface was hiding it from me.
 

Goran

Google "Virtual Memory".  The two of you are talking about totally
unrelated things.  And given that all modern OSes use virtual memory,
the described situation will never happen as such.  But once the app
starts swapping, things get very slow.

Careful, though: memory fragmentation might turn into a real issue,
and virtual memory doesn't help. The above is perhaps a poor
explanation, though.

Say you have a 32-bit system. Your pointers are 32 bits long. This
limits your __address space__ to 4GB. That is, all pointers you can
ever have can point only to a location inside these 4GB. It does not
matter that you have 64GB of virtual memory, you can't __address__ it,
mathematically. Now, consider:

1. you allocate 1.9 GB (addresses from 0 to 1.9GB)
2. you allocate 0.2 GB (addresses from 1.9 to 2.1 GB)
3. you allocate 1.9 GB.
4. you deallocate 1 and 3
5. you have 3.8 GB free, and yet, you can't allocate a contiguous
block of more than 1.9 GB (see the sketch below). :-(
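
A sketch of that scenario (only meaningful on a 32-bit build, and since
real systems usually cap the user address space below 4GB, the first
allocations may already fail):

#include <cstdio>
#include <cstdlib>

int main()
{
    const std::size_t GB = 1024u * 1024u * 1024u;
    void* a = std::malloc((std::size_t)(1.9 * GB));  // step 1
    void* b = std::malloc((std::size_t)(0.2 * GB));  // step 2
    void* c = std::malloc((std::size_t)(1.9 * GB));  // step 3
    std::free(a);                                    // step 4
    std::free(c);
    // Roughly 3.8 GB of address space is free again, but it is split
    // around b, so one large contiguous request can still fail:
    void* d = std::malloc((std::size_t)(2.0 * GB));
    std::printf("big block after fragmentation: %s\n", d ? "ok" : "failed");
    std::free(d);
    std::free(b);
    return 0;
}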

I see a number of people confused by this. Possibly, we should be
explicit and always say "memory ADDRESS SPACE fragmentation".

Goran.
 

Paul

Not necessarily. For example:

// Adds stuff to v, return false if no memory
bool foo(vector<string>& v)
{
    try { v.reserve(whatever); }
    catch(const bad_alloc&) { return false; }
    // Yay, we can continue!
    v.push_back("whatever");
    return true;
}

The above is a hallmark example of wrong C++. Suppose that reserve
worked, but OOM happened when constructing/copying a string object:
foo does not return false, but throws. IOW, foo lies about what it
does.
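
A sketch of the repair being hinted at: every call that can allocate
has to sit inside the try block before foo can honestly promise "false
if no memory" ("whatever" stays a placeholder from the original
snippet):

bool foo(vector<string>& v)
{
    try {
        v.reserve(whatever);      // may throw bad_alloc
        v.push_back("whatever");  // constructing the string may throw too
        return true;
    }
    catch (const bad_alloc&) {
        return false;
    }
}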

The problem? The programmer set out to guess all failure modes and
failed. My contention is: the programmer will, by and large, fail to
guess all possible failure modes. Therefore, the programmer is best
off not doing that, but thinking about handling code/data state in the
face of unexpected failures (that is, applying exception safety
stuff).

BTW, given that programmer will fail to guess all failure modes,
programmer could wrap each function into a try/catch. That, however:

1. is a massive PITA
2. will fail to propagate good information about what went wrong, and
that is almost as bad as not reporting an error at all.
Yes, but programmers have to take some responsibility for error
checking; that is their job, after all.

The reason I initially mentioned reserve was to give it purpose.
 

Adam Skutt

I absolutely agree with Goran and disagree that terminating on OOM is
*always* the best approach.  There may be programs where it is the
best approach but it is far from always the case.

A concrete example:

Network server using a standard pattern of one listener/producer and
multiple worker/consumer threads.  The listener receives a job request
and hands off the processing of the job to one of the worker
threads.

It is very much possible that processing one particular job might
actually require too much memory for the system.  The correct thing to
do in that case is to stop processing this one oversized job, release
all the resources acquired to process this job, mark it as an error,
and continue processing further jobs.

If it's possible to stop processing a job safely and recover, sure.
Marking the job as an error might require allocating memory, so you
have to avoid that. This is more difficult than it sounds, and
requires code that is explicitly written to ensure that it is
possible. Screw that code up, and you'll almost certainly end up in a
deadlock situation or with a zombie worker thread, at which point you
will be restarting the process anyway!

Anyway, while you're in the process of attempting to cancel the
oversized job, many other jobs will fail (possibly all of them) since
they can no longer allocate memory either. In the meantime, all your
I/O connections will be frozen since there's no more memory for copying
data from your clients. Best case: a transient slow down. Worst
case: internal state becomes corrupted, forcing you to close some
(again, maybe all) of the connections anyway, or the clients
disconnect for you because you took too long to respond. All of your
I/O code must be able to properly handle an OOM condition as well.
This, too, requires careful design and code you must write yourself.

So no, it's not a given that attempting to fix the problem is any
better. Heap fragmentation may mean that you're prolonging the
inevitable and that you'll just endlessly spin canceling jobs and
trying to free memory. In a multithreaded application, by the time
you've actually freed the memory, the damage may already have been
done.
Since this is a persistent server that needs to be up and alive 24/7,
it would be totally inappropriate to permanently terminate the server.

No, it wouldn't. If you need high reliability, then you need
redundancy. If you have redundancy, then loss of an individual unit
is normally not a big deal. Cleaning staff have the disturbing habit
of knocking out power cables to servers, after all.
Even if there were an additional monitoring process that restarts the
server if it dies, this would not be a good thing, because you would
lose the current progress in the running worker threads and would also
kill current external client connections.

This can happen anyway, so your solution must simply be prepared to
deal with this eventuality. Making your code handle memory allocation
failure gracefully does not save you from the cleaning staff. If your
system can handle the cleaning staff, then it can handle memory
allocation failure terminating a process, too.

I don't know why people think it's interesting to talk about super
reliable software but neglect super reliable hardware too. It's
impossible to make hardware that never fails (pesky physics) so why
would I ever bother writing software that never fails? Software that
never crashes is useless if the cleaning staff kicks out the power
cable every night.

Adam
 

Adam Skutt

I disagree (obviously). Here's the way I see it: it all depends on the
number of functions a program executes. A simple program that only does
one thing (e.g. a "main" function with no "until something external
says stop" in the call chain) in C++ benefits slightly from the "die on
OOM" approach (in C, or something else without exceptions, the benefit
is greater because error checking there is very labor-intensive). In
fact, it benefits from a "die on any problem" approach.

Programs that do more than one thing are at a net loss with the "die
on OOM" approach, and the loss is bigger the more functions there are
(and the more important they are). Imagine an image processing program.
You apply a transformation, and that OOMs. You die, and your user loses
his latest changes that worked. But if you go back up the stack, clean
up all the resources the transformation needed and say "sorry, OOM", he
could have saved (heck, you could have done it for the user, given
that we hit OOM). And... dig this: trying to do the same at the spot
you hit OOM is a __mighty__ bad idea. Why? Because memory, and other
resources, are likely already scarce, and an attempt to do anything
might fail due to that.

Trying to do it at any point is a mighty bad idea. Rolling back the
stack and deallocating memory _does not ensure_ future allocations
will succeed. Once you reach OOM, you're not assured the user will be
able to save; you're not assured you can do anything at all. Trying
may very well be pointless.

It's considerably smarter, and absurdly easier, to save off the file
before attempting the operation in the first place. This way, your
code can die in a clean fashion, and the user has a recovery file with
the changes up to the failed operation. Trying to create the recovery
file under or after an OOM condition is very difficult. However,
trying to create the recovery file before the OOM condition is
trivial.
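
A minimal sketch of that ordering (Document, save_recovery_file and
apply_transformation are hypothetical names):

void run_transformation(Document& doc)
{
    save_recovery_file(doc);    // cheap, done while memory is still plentiful
    apply_transformation(doc);  // may throw bad_alloc; if the process dies
                                // here, the recovery file already exists
}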

In all of the counter-examples everyone's provided so far, the correct
way to ensure reliability is not to attempt to avoid crashing on OOM.
The correct way is to write your code so that crashing on OOM becomes
irrelevant. This has the added bonus that your code will still be
reliable even if your process is never told of the OOM condition. As
I mentioned before, in some languages you're not promised to be told
about allocation failure, and others have mentioned that on some
operating systems your process may simply die instead of being told
about the failure.
Or imagine an HTTP server. One request OOMs, you die. You terminate
and restart, and you cut off all other concurrent request processing;
not nice, nor necessary. And so on.

That may happen anyway, as I explained to Yan. The situation is much
harder to handle when there is concurrent processing occurring, not
easier.
That is true, but only if peak memory use is actually used to
hold program state (heap fragmentation plays its part, too). My
contention is that this is the case much less often than you make it
out to be.

No, I don't believe most applications unnecessarily cache data and
would benefit from attempting to free those caches under OOM
conditions. Even if I did, the code has to be written to make that
possible, which is very difficult.
I disagree with that, too.

Then I shudder to think about what you actually find difficult. Have
you ever tried to do this? Your commentary strongly suggests not only
have you not, but you don't have any experience in building reliable
systems anyway.
First off, when you actually hit the top-
level exception handler, chances are, you will have freed some memory.

Which doesn't mean you can make use of it. When you get
std::bad_alloc, you must assume there's 0 bytes available. No new, or
malloc, etc., anywhere.
Second, OOM-handling facilities are already made not to allocate
anything. E.g. bad_alloc will not try to do it in any implementation
I've seen.

It's not the behavior of std::bad_alloc that's problematic.
I've also seen OOM exception objects pre-allocated
statically in non-C++ environments, too (what else?).

Figuring out what you need to handle an OOM condition is not a trivial
task. Quick, I want to write out a file during OOM (using iostreams)
and have it not fail due to lack of memory. Tell me everything I must
do to ensure this.
There is difficulty, I agree with that, but it's actually trivial: keep in mind
that, once you hit that OOM handler (most likely, some top-level
exception handler not necessarily tied to OOM), you have all you might
need prepared upfront.

Which is exceptionally hard. Let's say I'm going to attempt your file
writing idea. I have to have the stream itself already open and
ready. I have to make sure the internal streambuf has enough memory
to buffer my I/O (or is unbuffered). I have to make sure my data
structures are constructed in such a fashion so that no memory
allocation occurs when I access the data, and that the data is
accessible when the handler runs. This means no copying of anything,
which is not impossible to ensure but difficult.
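
For what it's worth, a sketch of that preparation, done up front while
memory is still available (the names are made up, and whether
pubsetbuf(nullptr, 0) really gives you an unbuffered file stream is
implementation-defined):

#include <fstream>

std::ofstream g_crash_log;

void prepare_oom_reporting()
{
    g_crash_log.rdbuf()->pubsetbuf(nullptr, 0);  // request unbuffered output
    g_crash_log.open("oom.log");                 // open before the OOM ever hits
}

void report_oom()
{
    // Touch only pre-existing objects and string literals, so that
    // (hopefully) nothing here needs to allocate.
    g_crash_log << "out of memory\n";
    g_crash_log.flush();
}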

You gloss over the difficulty, especially when you don't know what
code will allocate memory, nor when it will do it. Memory allocation
can and does occur in places you don't expect, so ensuring memory
allocation doesn't happen means being intimately familiar with the
implementation of every type involved in your OOM handler.

Personally, I have better things to do than familiarize myself with
the inner guts of my C++ runtime implementation.
Yeah, I agree that one cannot sensibly "handle" bad_alloc. It can
sensibly __report__ it though.

Reporting is handling and you're kidding yourself by trying to
distinguish them. Even reporting may not be possible. Reporting
requires I/O, I/O requires memory and resources. Most exceptions
occur because: 1) you're a bad programmer 2) you're out of some
resource (that you probably ran out of doing I/O anyway).
The thing is though, a vaaaast majority
of exceptions, code can't "handle". It can only report them, and in
rare cases, retry upon some sort of operator's reaction (like, check
the network and retry saving a file on a share). That makes OOM much
less special than any other exception, and less of a reason to
terminate.

No, it means the right thing to do for most exceptions is to
terminate. Logging or reporting is great when it's possible, but
hanging your hat on it being possible is simply absurd.

Adam
 

Paul

On Aug 31, 5:23 am, yatremblay@bel1lin202.(none) (Yannick Tremblay)
wrote:
I don't know why people think it's interesting to talk about super
reliable software but neglect super reliable hardware too.  It's
impossible to make hardware that never fails (pesky physics) so why
would I ever bother writing software that never fails?  Software that
never crashes is useless if the cleaning staff kicks out the power
cable every night.
So attempting to deliver robust software is a waste of time because
some cleaner may switch off the machine at the wall?
The amazing thing about this post is that you're serious.
 

Goran

Trying to do it at any point is a mighty bad idea.  Rolling back the
stack and deallocating memory _does not ensure_ future allocations
will succeed.  Once you reach OOM, you're not assured the user will be
able to save; you're not assured you can do anything at all.  Trying
may very well be pointless.

It may be, but you are less helpless than you make it out to be.

Say that you want to save. If your file stream is already there, and
given that saving is logically a "const" operation, there's little
reason for things to go wrong (it's possible, but not likely). Or say
that you want to log the error. Logging is something that must be a
no-throw operation; therefore, logging facilities are already there
and ready.
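
Something along these lines, say (log_stream() is a made-up accessor
for a stream that was opened at startup):

void log_error(const char* msg) noexcept
{
    try {
        log_stream() << msg << '\n';
        log_stream().flush();
    } catch (...) {
        // Nothing sensible left to do; reporting itself must not throw.
    }
}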

I tried, a long time ago, to eat all my memory and then proceed to
disk I/O. This works on e.g. Unix and Windows. Why wouldn't it?
It's considerably smarter, and absurdly easier, to save off the file
before attempting the operation in the first place.  This way, your
code can die in a clean fashion, and the user has a recovery file with
the changes up to the failed operation.  Trying to create the recovery
file under or after an OOM condition is very difficult.  However,
trying to create the recovery file before the OOM condition is
trivial.

You can't be serious with this. I would really like to see a codebase
that saves state to disk prior to any allocation (any failure
condition, really).

Actually, what could be attempted is saving after any change. But that
won't work well for many an editor either. The best you can reasonably
do is to save a recovery file from time to time.
In all of the counter-examples everyone's provided so far, the correct
way to ensure reliability is not to attempt to avoid crashing on OOM.
The correct way is to write your code so that crashing on OOM becomes
irrelevant.  This has the added bonus that your code will still be
reliable even if your process is never told of the OOM condition.  As
I mentioned before, in some languages you're not promised to be told
about allocation failure, and other have mentioned that on some
operating systems your process may simply die instead of being told
about the failure.


That may happen anyway, as I explained to Yan.  The situation is much
harder to handle when there is concurrent processing occurring, not
easier.





No, I don't believe most applications unnecessarily cache data and
would benefit from attempting to free those caches under OOM
conditions.  Even if I did, the code has to be written to make that
possible, which is very difficult.

It's not unnecessary caching, it's transient peaks in memory usage
during some work. You often don't know how much memory a given system
has, nor do you know what e.g. other processes are doing wrt memory
at the time you need more memory.
Then I shudder to think about what you actually find difficult.  Have
you ever tried to do this?  Your commentary strongly suggests not only
have you not, but you don't have any experience in building reliable
systems anyway.

But I have. I have been intentionally driving code up the wall with
memory usage and looking at what happens. If you have your resources
prepared up front, it's not hard to do something meaningful in that
handler (it also depends on what one considers reasonable).
Which doesn't mean you can make use of it.  When you get
std::bad_alloc, you must assume there's 0 bytes available.  No new, or
malloc, etc., anywhere.


It's not the behavior of std::bad_alloc that's problematic.


Figuring out what you need to handle an OOM condition is not a trivial
task.  Quick, I want to write out a file during OOM (using iostreams)
and have it not fail due to lack of memory.  Tell me everything I must
do to ensure this.

Quick? Why? Because the way to write a critical piece of code is off
the top of one's head? That's not serious.
Which is exceptionally hard.  Let's say I'm going to attempt your file
writing idea.  I have to have the stream itself already open and
ready.  I have to make sure the internal streambuf has enough memory
to buffer my I/O (or is unbuffered).  I have to make sure my data
structures are constructed in such a fashion so that no memory
allocation occurs when I access the data, and that the data is
accessible when the handler runs.  This means no copying of anything,
which is not impossible to ensure but difficult.

Meh. You are trying to construct a case of trying to do a lot in case
of a resource shortage in order to prove that __nothing__ can be done
in case of resource shortage. I find this dishonest.

Realistically, here's what I'd do for the save case:

try
{
    // throw zone, lotsa work
}
catch(const whatever& e)
{
    inform_operator(e, ...); // nothrow zone
    try { save(); } // throw zone
    catch(const whatever& e)
    {
        inform_operator(e, ...); // nothrow zone
    }
}

Then, I would specifically test inform_operator under load and try to
make it reasonably resilient to it. But whatever happens, I would not
allow an exception to escape out of it. That's all there is to it
conceptually; practically, there's always room for later improvement,
but __without__ changing the conceptual model.
You gloss over the difficulty, especially when you don't know what
code will allocate memory, nor when it will do it.  Memory allocation
can and does occur in places you don't expect, so ensuring memory
allocation doesn't happen means being intimately familiar with the
implementation of every type involved in your OOM handler.

Personally, I have better things to do than familiarize myself with
the inner guts of my C++ runtime implementation.


Reporting is handling and you're kidding yourself by trying to
distinguish them.  

I disagree. For me, there's actually no such thing as "error
handling". There's error reporting and there's __program state
handling__ (in the face of errors). This is IMO a very important
distinction.
Even reporting may not be possible.  Reporting
requires I/O, I/O requires memory and resources.
 Most exceptions
occur because: 1) you're a bad programmer 2) you're out of some
resource (that you probably ran out of doing I/O anyway).

Hmmm... We are most likely in disagreement about what exceptions are
used for.

Goran.
 

Noah Roberts

My argument was that bad_alloc exceptions should be handled. I don't
see a reason to ignore them when all the mechanics are in place to
catch such exceptions and handle them in some appropriate way.

That was surely your thesis, yes, but you repeatedly claimed that
those who disagree were writing horrible code that would crash all the
time.
 

Adam Skutt

It may be, but you are less helpless than you make it out to be.

Say that you want to save. If your file stream is already there, and
given that saving is logically a "const" operation,

It is not logically a "const" operation and never will be one. The
very notion is absurd. How can I/O be const?
reason for things to go wrong (it's possible, but not likely). Or say
that you want to log the error. Logging is something that must be a
no-throw operation; therefore, logging facilities are already there and
ready.

Providing logging as a no-throw operation is a logical impossibility
unless it is swallowing the errors for you. I/O can always fail,
period. Even when you reserve the descriptor and the buffer.
Moreover, it's generally impossible to detect failure without actually
performing the operation!
I tried, a long time ago, to eat all my memory and then proceed to
disk I/O. This works on e.g. Unix and Windows. Why wouldn't it?

Sure, if you're using read(2) and write(2) (or equivalents) and have
already allocated your buffers, then being out of memory won't require
any additional allocations on the part of your process. Of course,
performing I/O requires more effort than just the read and write
calls, and many (most?) people don't write code that uses such low-
level interfaces.  Those interfaces (e.g., C++ iostreams) frequently
do not make it easy or even possible to ensure that any given I/O
operation will not cause memory allocation to occur.

Never mind that data is often stored in memory in a different format
from how it is stored on disk; converting between these formats often
requires allocating memory.  If you truly believe the fact that
read(2) and write(2) do no allocations is somehow relevant in this
discussion, then you are truly clueless. There is more to doing I/O
than just the actual system calls that transfer data from your process
to the kernel or I/O device.
You can't be serious with this. I would really like to see a codebase
that saves state to disk prior to any allocation (any failure
condition, really).

You don't. You save it before performing the complicated image
processing operation that might fail, instead of trying to save the
file after it failed. Plenty of codebases expect you to do this, and
plenty of smart users do this automatically and out of habit, even if
the application does it for them.
Actually, what could be attempted is saving after any change. But that
won't work well for many an editor either. The best you can reasonably
do is to save a recovery file from time to time.

If that's the best I can do, then why the hell are you telling me to
handle OOM at all? You came up with the suggestion, and now you're
telling me what you originally suggested is not possible. So which is
it?
It's not unnecessary caching, it's transient peaks in memory usage
during some work.

What transient peaks? If the amount of memory allocated to my process
is less than what I actually need to perform my processing, it means
some sort of caching (e.g., pool or block allocator) must be
occurring. Writing those caches such that they support giving memory
back to the operating system may be difficult and not worth the effort
involved. In some cases, I may not even know they're occurring or be
able to influence them.
You often don't know how much memory a given system
has, nor do you know what e.g. other processes are doing wrt memory
at the time you need more memory.

If the operating system's virtual memory allows for memory allocation
by other processes to cause allocation failure in my own, then
ultimately I may be forced to crash anyway. Many operating systems
kernel panic (i.e., stop completely) if they reach their commit limit
and have no way of raising the limit (e.g., adding swap automatically
or expanding an existing file). Talking about other processes when
all mainstream systems provide robust virtual memory systems is
tomfoolery.

But I have. I have been intentionally driving code up the wall with
memory usage and looking at what happens. If you have your resources
prepared up front, it's not hard to do something meaningful in that
handler (it also depends on what one considers reasonable).

Your definition of reasonable is asinine, since it requires
programmers to write code that relies on low-level operating system
behaviors and system calls. Moreover, it assumes that doing such
things is possible without any further exceptions occurring!  Finally, it
assumes that the behavior of the system calls themselves is somehow
the only relevant thing!
Quick? Why? Because the way to write a critical piece of code is off
the top of one's head? That's not serious.

Meh. You are trying to construct a case of trying to do a lot in case
of a resource shortage in order to prove that __nothing__ can be done
in case of resource shortage. I find this dishonest.

You're the one who suggested that we write state out to a file when we
reach an out of memory condition, not I!  I'm not suggesting that any
more be done than what is necessary to have a reasonable chance of
the operation succeeding, and I didn't even suggest everything
strictly necessary since it is application dependent.
Realistically, here's what I'd do for the save case:

try
{
    // throw zone, lotsa work
}
catch(const whatever& e)
{
    inform_operator(e, ...); // nothrow zone
    try { save(); } // throw zone
    catch(const whatever& e)
    {
        inform_operator(e, ...); // nothrow zone
    }
}
It is not possible to write 'inform_operator' generically in such a
way that it's nothrow unless it actively swallows exceptions. All of
the stuff that you said was 'Meh' is required to notify the operator!
Of course, assuming there is an operator is just icing on the cake.
Then, I would specifically test inform_operator under load and try to
make it reasonably resilient to it. But whatever happens, I would not
allow an exception to escape out of it.

Doing this doesn't buy you a thing. It doesn't ensure the operator
(who doesn't exist) will see the message, it doesn't ensure you can
safely save.  Ensuring these things requires doing what I suggest, at
a minimum, if it's even possible to ensure notifications and saving,
which it is not.
I disagree. For me, there's actually no such thing as "error
handling". There's error reporting and there's __program state
handling__ (in the face of errors). This is IMO a very important
distinction.

Not when discussing out of memory conditions (and most exceptions),
there's not. It has no bearing on the relevant questions: will the
program terminate and how will it do it?
Hmmm... We are most likely in disagreement about what exceptions are
used for.

Clearly, but your disagreement isn't really with me but with language
designers and implementers the world over.

Adam
 

Joshua Maurice

Trying to do it at any point is a mighty bad idea.  Rolling back the
stack and deallocating memory _does not ensure_ future allocations
will succeed.  Once you reach OOM, you're not assured the user will be
able to save; you're not assured you can do anything at all.  Trying
may very well be pointless.

It's considerably smarter, and absurdly easier, to save off the file
before attempting the operation in the first place.  This way, your
code can die in a clean fashion, and the user has a recovery file with
the changes up to the failed operation.  Trying to create the recovery
file under or after an OOM condition is very difficult.  However,
trying to create the recovery file before the OOM condition is
trivial.

This is disingenuous reasoning. I agree that once you hit OOM you're
not guaranteed that you can save. However, even before you hit OOM
you're not guaranteed that you can save. By this fallacious pedantic
reasoning, "you're never guaranteed of anything, so why try?".

I do agree that the way you achieve reliability in practice,
specifically the way you achieve fault tolerance, is through fault
isolation. For C++ on modern desktops, the first good layer of fault
isolation is the process. After that is separate hardware. Then
there are other points of failure like shared hardware a la networks,
power supplies, and so on.

I would hazard a guess that a fair share of programs could reasonably
attempt to recover from an OOM because they will free memory when they
start freeing the stack. I also think that any reasonable program
written to recover from OOM would be able to report it to the user
without allocating memory. You'd have to use some non-portable OS
functions, possibly another process to send the error through a pipe,
but it's relatively easy to do, if annoying.

You are making a common mistake IMHO. You are automatically assuming
that the unix / Windows process is the default, best, and perhaps only
level of fault tolerance in a system. You seem to think it's ok not to
report an error when a process dies because it hits OOM. I don't
follow this line of reasoning at all. Does that mean that whenever I
hit any error I should never log a report? That is absurd. So, you're
going to assert that I should only log an error when X happens but not
Y? That's incredibly silly. I like having as much logging as possible
to debug issues when they happen.
In all of the counter-examples everyone's provided so far, the correct
way to ensure reliability is not to attempt to avoid crashing on OOM.
The correct way is to write your code so that crashing on OOM becomes
irrelevant.  This has the added bonus that your code will still be
reliable even if your process is never told of the OOM condition.  As
I mentioned before, in some languages you're not promised to be told
about allocation failure, and other have mentioned that on some
operating systems your process may simply die instead of being told
about the failure.

This is a C++ newsgroup. Also, to be pedantic, those OSs have an
option to turn off overcommit. It's an unfortunate reality due to the
fork-exec bug in POSIX (fork as the only process-creation mechanism is
the bug), but we deal with what we have.
but we deal with what we have.

[...]
Which doesn't mean you can make use of it.  When you get
std::bad_alloc, you must assume there's 0 bytes available.  No new, or
malloc, etc., anywhere.

Why? Again, see the earlier reference to fallacious reasoning. You
must always assume that there's 0 bytes available, both before and
after any std::bad_alloc errors.
Which is exceptionally hard.  Let's say I'm going to attempt your file
writing idea.  I have to have the stream itself already open and
ready.  I have to make sure the internal streambuf has enough memory
to buffer my I/O (or is unbuffered).  I have to make sure my data
structures are constructed in such a fashion so that no memory
allocation occurs when I access the data, and that the data is
accessible when the handler runs.  This means no copying of anything,
which is not impossible to ensure but difficult.

You gloss over the difficulty, especially when you don't know what
code will allocate memory, nor when it will do it.  Memory allocation
can and does occur in places you don't expect, so ensuring memory
allocation doesn't happen means being intimately familiar with the
implementation of every type involved in your OOM handler.

Personally, I have better things to do than familiarize myself with
the inner guts of my C++ runtime implementation.

Agreed. I don't mean to gloss over the difficulty. You have to abandon
basically all portable code at this point and use the system APIs,
like POSIX write. Annoying, but doable.
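
For example, something like this (POSIX-only, so not portable C++):

#include <unistd.h>

// Called from the OOM path; the message is a static buffer, so emitting
// it needs no further allocation.
void report_oom_posix()
{
    static const char msg[] = "fatal: out of memory\n";
    (void)write(STDERR_FILENO, msg, sizeof msg - 1);
}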
 

Adam Skutt

This is disingenuous reasoning. I agree that once you hit OOM you're
not guaranteed that you can save. However, even before you hit OOM
you're not guaranteed that you can save. By this fallacious pedantic
reasoning, "you're never guaranteed of anything, so why try?".

Except that isn't my reasoning. Try reading what I said again, taking
note of the fact I said, 'one is easy' and 'one is hard'.
I would hazard a guess that a fair share of programs could reasonably
attempt to recover from an OOM because they will free memory when they
start freeing the stack.

No, that doesn't guarantee you a thing, especially if your goal is to
retry the operation you failed in the first place. You'll simply
reallocate the same amount of memory, again, and fail in the same
place, again. The OOM handler must go out of its way to ensure
additional memory gets freed if a retry is desirable (generally, if
anything other than termination is desirable).
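
A sketch of what "go out of its way" would have to mean (shrink_caches
and process are hypothetical; retrying only makes sense if something
actually gets released):

bool run_with_retry(Job& job)
{
    for (int attempt = 0; attempt < 2; ++attempt) {
        try {
            process(job);
            return true;
        } catch (const std::bad_alloc&) {
            if (!shrink_caches())  // must genuinely free memory,
                break;             // otherwise retrying is pointless
        }
    }
    return false;
}
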
I also think that any reasonable program
written to recover from OOM would be able to report it to the user
without allocating memory.

The key part of your statement is "written to recover from OOM". The
entire argument I'm making is that it is impressively difficult and
rarely worth the effort. It is much, much harder than anyone here
seems to believe it is. It also gains you very little. Users tend to
notice crashed programs, after all.
You'd have to use some non-portable OS
functions, possibly another process to send the error through a pipe,
but it's relatively easy to do, if annoying.

Non-portable OS functions are a non-starter for many codebases, and
it's not easy to do when compared to using an I/O library. Reliably
ensuring that process is around is not easy to do. That's far more
complicated than keeping a file and buffers open!  If you're going for
easy compared to other suggestions, you're going in the wrong
direction.
You are making a common mistake IMHO. You are automatically assuming
that the unix / Windows process is the default, best, and perhaps only
level of fault tolerance in a system.

I've not done anything of the sort. You are assuming a relatively
UNIX / Windows centric existence by suggesting the use of pipes as a
way to log an OOM condition, though.
You seem to think it's ok not to
report an error when a process dies because it hits OOM.

Yes, because the effort involved in doing so is quite substantial.
I don't
follow this line of reasoning at all. Does that mean that whenever I
hit any error I should never log a report?

Quite possibly. It depends on what you're doing, but even if I
attempted to log something whenever my program is about to exit, I
certainly wouldn't normally bother ensuring that logging works
under an OOM condition. Nor would I worry too terribly much about
what happens if the logging fails due to an I/O problem. Neither is
easy to fix, especially in the situation in question.
That is absurd. So, you're
going to assert that I should only log an error when X happens but not
Y? That's incredibly silly.

Yes, in general, you should only log things that are useful to the
user / administrator / whomever. Whether "I ran out of memory" is
helpful or not depends on the situation. Even when it is helpful, the
effort involved in ensuring such a message is guaranteed to be noted
is rarely worth it.
This is a C++ newsgroup.

And? Good programming advice is often language independent. Many
languages have exception semantics very similar to those of C++, so
good exception handling behavior should be cross-language. Moreover,
the larger point is this: some very popular languages by design make
what you want to do impossible. As a result, it stands to reason
you're grossly overstating the importance of trying to handle OOM
conditions in any way but termination.
Also, to be pedantic, those OSs have an
option to turn off overcommit.

Turning off overcommit doesn't ensure your process stays alive. Even
when it does, you may have traded the survival of your process for the
loss of the whole system.
It's an unfortunate reality due to the
fork-exec bug in POSIX (fork as the only process-creation mechanism is
the bug), but we deal with what we have.
but we deal with what we have.

fork()/exec() is not the primary reason for overcommit support in
Linux and other operating systems. Overcommit comes about because
many applications use allocation strategies that ask for more memory
from the operating system than they ever use. fork()/exec() is really
a corner case.
Why? Again, see the earlier reference to fallacious reasoning. You
must always assume that there's 0 bytes available, both before and
after any std::bad_alloc errors.

No, that does not follow at all. You need to consider the goal of the
exception handler: succeed no matter what state the application is in
vs. the goals of the rest of the program: succeed only if possible.
The latter means no memory allocation.
Agreed. I don't mean to gloss over the difficulty. You have to abandon
basically all portable code at this point and use the system APIs,
like POSIX write. Annoying, but doable.

And a non-starter, especially for a C++ newsgroup!

Adam
 

Waldek M.

Shut down eyes and ears and decrease walking speed?
Just crash and allow robot to fall over in a heap?

If you worked on an aircraft control system, would you think it's only
sane to allow a program to crash with an alloc error?
[...]

See my other reply and then get a clue.

As someone wrote: fighting with trolls is like wrestling with
pigs; you get all muddy, and they like the mud anyway.

Br.
Waldek
 

Goran

It is not logically a "const" operation and never will be one.  The
very notion is absurd.  How can I/O be const?

Please note the word "logically". If you already have a stream and
data to save, saving itself __does not__ change program state.
Successful save might e.g. change the state of the "dirty" bit, if
there's one.
Providing logging as a no-throw operation is a logical impossibility
unless it is swallowing the errors for you.  I/O can always fail,
period.  Even when you reserve the descriptor and the buffer.
Moreover, it's generally impossible to detect failure without actually
performing the operation!

I know. Functionality that informs the operator about failures, like
logging functions, in a well-designed system, __are__ no-throw
operations. That might mean try{}catch(...){swallow;}. What else!?
That's why I said that I'd test inform_operator under load and __try__
to make it as resilient as possible.

Now, if functionality fails, and logging fails, tough. You think you
can do better? I think you can't. I think that terminating/restarting
the process might be a way out, but __only__ for some software and
some situations, and that the percentage of those is smaller than you
make it out to be.
Sure, if you're using read(2) and write(2) (or equivalents) and have
already allocated your buffers, then being out of memory won't require
any additional allocations on the part of your process.  Of course,
performing I/O requires more effort than just the read and write
calls, and many (most?) people don't write code that uses such low-
level interfaces.  Those interfaces frequently do not (e.g., C++
iostreams) make it easy or even possible to ensure that any given I/O
operation will not cause memory allocation to occur.

Nevermind that data is often stored in memory in a different format
from how it is stored on disk, converting between these formats often
requires allocating memory.

It might, but given the nature of e.g. logging, it is of course my
task to ensure these conversions are no-throw (or better yet, actually
no-fail) operations. I believe that
1. conversions are rare
2. if they exist, it is the task of the logging code to make them
no-throw (or better yet, actually no-fail) operations. See how
exception::what is "const"? That's not by accident.
You don't.  You save it before performing the complicated image
processing operation that might fail, instead of trying to save the
file after it failed. Plenty of codebases expect you to do this, and
plenty of smart users do this automatically and out of habit, even if
the application does it for them.

That requires that code knows what is likely to fail. That's a pipe
dream at any given moment, and also not future-proof (random change in
the future, and another operation falls under "complicated image
processing that might fail"). Also, one of the very tenets of e.g.
exceptions is to ease writing code when you don't know what will fail
(and, get this, you don't want to think about it).
If that's the best I can do, then why the hell are you telling me to
handle OOM at all?  You came up with the suggestion, and now you're
telling me what you originally suggested is not possible.  So which is
it?

I was merely pointing out that your line of reasoning is bad; it
wasn't my intention to bring OOM handling into the picture. Therefore,
I will abstain from any further comment here.
What transient peaks?  If the amount of memory allocated to my process
is less than what I actually need to perform my processing, it means
some sort of caching (e.g., pool or block allocator) must be
occurring.  Writing those caches such that they support giving memory
back to the operating system may be difficult and not worth the effort
involved.  In some cases, I may not even know they're occurring or be
able to influence them.

Erm...

web_response << "Welcome " some_person.diplay_name() << "!";

class person
{
string display_name() { if (culture == "en") return first_name +
last_name; ... };
};

I don't know how __you__ write your code, but I have transient peaks
all the time.
If the operating system's virtual memory allows for memory allocation
by other processes to cause allocation failure in my own, then
ultimately I may be forced to crash anyway. Many operating systems
kernel panic (i.e., stop completely) if they reach their commit limit
and have no way of raising the limit (e.g., adding swap automatically
or expanding an existing file).  Talking about other processes when
all mainstream systems provide robust virtual memory systems is
tomfoolery.


Your definition of reasonable is asinine, since it requires
programmers to write code that relies on low-level operating system
behaviors and system calls.  Moreover, it assumes that doing such
things is possible without any further exceptions occurring!

Erm... Given that the system has a plain C interface (no exceptions),
and given that writing is logically const (which translates to "no
exceptions" in a sane codebase), this is a reasonable assumption.
 Finally, it
assumes that the behavior of the system calls themselves is somehow
the only relevant thing!

Ultimately yes, it __is__ the only relevant thing, because this is
where the control of the code stops. What are you even trying to say
here? That code should react to e.g. a failure it knows nothing about?
It is not possible to write 'inform_operator' generically in such a
way that it's nothrow unless it actively swallows exceptions.  All of
the stuff that you said was 'Meh' is required to notify the operator!
Of course, assuming there is an operator is just icing on the cake.


Doing this doesn't buy you a thing. It doesn't ensure the operator
(who doesn't exist) will see the message, it doesn't ensure you can
safely save.  Ensuring these things requires doing what I suggest, at
a minimum, if it's even possible to ensure notifications and saving,
which it is not.

If we're talking about an image processing program and saving, then
an operator is kinda presumed. If we're talking about some daemon, then
the operator will see what has happened in the log when he gets wind
of problems.
Not when discussing out of memory conditions (and most exceptions),
there's not.  It has no bearing on the relevant questions: will the
program terminate and how will it do it?



Clearly, but your disagreement isn't really with me but with language
designers and implementers the world over.

I don't think so. Your thinking is pretty much that any exception
should lead to a termination. If that was the thinking of language
designers, then a catch would be a no-return block, and if it weren't,
implementers would push designers to make it so. None of that is true,
nor is it happening.

Goran.
 

Nick Keighley

I think that pre-STL it was pretty much standard practice to check
for memory allocation failures, for example:

float* m1 = new float[16];
if(!m1){
    // output an exit msg.
    exit();
}

which is exactly what not handling a bad_alloc exception would do...
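
For reference, the modern spelling of that old check asks for the
non-throwing form explicitly, since plain new throws std::bad_alloc
instead of returning null:

#include <new>
#include <cstdlib>

int main()
{
    float* m1 = new (std::nothrow) float[16];
    if (!m1) {
        // output an exit msg and bail, as in the fragment above
        std::exit(EXIT_FAILURE);
    }
    delete[] m1;
    return 0;
}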

Again your lack of experience is showing.

How does this address the question...What is the point of a throw if
[it's] not being caught?

to invoke destructors. Go and look up RAII.
 

Paul

Does a crashing Windows app crash Windows?  No.  Why would a phone be
any different?  Try engaging your brain.
Why does any app have to crash?
It's only crashing because of an incompetent programmer somewhere.

There is no point in having robust code when there is code running
alongside it that will crash the system at the slightest glimmer of a
low memory situation.
 

Nick Keighley

I don't know why people think it's interesting to talk about super
reliable software but neglect super reliable hardware too.
http://en.wikipedia.org/wiki/NonStop

 It's
impossible to make hardware that never fails (pesky physics) so why
would I ever bother writing software that never fails?  Software that
never crashes is useless if the cleaning staff kicks out the power
cable every night.

so protect the power socket. Are server farms really this vulnerable?
Life support equipment? Fly-by-wire? Telephone exchanges?
The world is not a desktop.
 

Adam Skutt


That does not neglect the hardware! Plus, the Wikipedia page says you
can write programs for NonStop OS that terminate. Please read your
references before citing them. NonStop just provides a defined
mechanism, as part of the operating system, for providing the sorts of
redundancy I mentioned you need.
so protect the power socket. Are server farms really this vulnerable?

The recent issues with Amazon EC2 would suggest yes.
Life support equipment? Fly-by-wire? Telephone exchanges?
The world is not a desktop.

All support software that crashes. In life/safety-critical systems,
it's often preferable to crash immediately (fail-fast) whenever /any/
abnormal situation is encountered, because you can restart to a known
state much faster than you can fix the program, and you get a
deterministic response to all crashes.

Adam
 

Adam Skutt

Please note the word "logically". If you already have a stream and
data to save, saving itself __does not__ change program state.
Successful save might e.g. change the state of the "dirty" bit, if
there's one.

So you're resolved to contradict yourself in the same paragraph now?
You're simply factually wrong: doing I/O always involves a change in
the state of the process, period. If it doesn't, then the Haskell
folk have a lot of explaining to do.
I know. Functionality that informs the operator about failures, like
logging functions, in a well-designed system, __are__ no-throw
operations.

Funny, the people who've actually written C++ logging APIs sure as
hell don't seem to agree with you.
Now, if functionality fails, and logging fails, tough. You think you
can do better? I think you can't. I think that terminating/restarting
the process might be a way out, but __only__ for some software and
some situations, and that the percentage of those is smaller than you
make it out to be.

I think it's a way out for all applications, and as your reliability
needs go up, your odds of terminating at any sort of failure go up
substantially as well. The safest way to proceed after an abnormal
condition is to restart from a known state, and the safest way to do
that is to restart the process, possibly even the whole computer.
This is what many life-critical and safety-critical systems do.
It might, but given the nature of e.g. logging, it is of course my
task to ensure these conversions are no-throw (or better yet, actually
no-fail) operations. I believe that
1. conversions are rare

Then you're hopelessly ignorant.
2. if they exist, it is the task of the logging code to make them
no-throw (or better yet, actually no-fail) operations. See how
exception::what is "const"? That's not by accident.

I'd love to see a generalized algorithm for removing freestore
allocations from programs, but I'm quite confident you will not be
providing it.
That requires that code knows what is likely to fail.

No it doesn't, it only requires it to know what is an expensive
operation that is undesirable to repeat. That's reasonably easy most
of the time.
Erm...

web_response << "Welcome " some_person.diplay_name() << "!";

class person
{
  string display_name() { if (culture == "en") return first_name +
last_name; ... };

};

I don't know how __you__ write your code, but I have transient peaks
all the time.

What "transient peaks"? That's not even C++ and there's not enough
context to see what the hell you're talking about. I don't see any
memory there that could be freed by an OOM handler to enable the
operation to be retried. You have to show the presence of allocated
memory that is not necessary to complete whatever computation you're
attempting.
Erm... Given that the system has a plain C interface (no exceptions),
and given that writing is logically const (which translates to "no
exceptions" in a sane codebase), this is a reasonable assumption.

As I said before, your second assumption is wrong. Your first is
wrong too, actually (both in the C++ and the lay sense of the word
"exception").
If we're talking about an image processing program and saving, then
an operator is kinda presumed.

No it isn't, but thanks for playing.
If we're talking about some daemon, then
the operator will see what has happened in the log when he gets wind
of problems.

If they bother to look at the log. You'd be surprised how many can't
manage to do that.
I don't think so. Your thinking is pretty much that any exception
should lead to a termination.

Given that's the default behavior, it's a damn good starting
assumption!
If that was the thinking of language
designers, then a catch would be a no-return block, and if it weren't,
implementers would push designers to make it so. None of that is true,
nor is it happening.

If it weren't the thinking of language developers, they'd follow the
exception handling semantics of Eiffel and similar languages.

You know, I never much liked Don Quixote when I read it in school, so
I think I have no interest in conversing with you further, Mr.
Quixote.

Adam
 
