Experiment: functional concepts in C

  • Thread starter Ertugrul Söylemez
  • Start date

Andrew Poelstra

A general purpose operating system doesn't need to reclaim the
individually allocated blocks, since they were allocated from pages
allocated by the operating system. It only needs to reclaim those
pages. What's more, it will do this even if you free the individually
allocated blocks in your program.

To yourself and Kaz, I was wrong on this point. Thanks for
explaining it. Unfortunately I haven't taken any operating
systems courses yet, so my experience with memory managers
has been limited to brief overviews in Knuth and assembler
manuals.

Having said that, I still don't think it's good to rely on
this operating system behavior. It seems like unnecessary
unportability. Plus, it makes it so much easier to use valgrind.
 

ImpalerCore

I certainly do.


It depends a lot on circumstances.  Generally, at the least, I'd expect to
know which pieces of storage I'm allocating that I expect to still be there
at the end of execution.

Then why did I get counted off for forgetting to free memory in my
data structures class in college? I wish I knew this argument back
then :)
That also depends quite a bit on circumstances.  I usually close files, but
there have been exceptions.

As an example, I have a hunk of code I maintain right now, in which a few
files are opened, and a few resources allocated, which are not released on
exit -- because this hunk of code is intercepting the actions of other
programs, and does not necessarily know about their exits.  More importantly,
it is fairly trivial to prove that even if I tried to intercept their exits,
I could not do so safely.  Either I would deallocate my resources at a time
when the other program could then perform actions which still required them,
or I would have to leave them allocated until a point at which nothing was
going to transfer control back to me.  Can't win.


I have.  I once had a program which took over five seconds to exit on what
was, at the time, a very fast machine.  In that case, I swapped a key data
structure from an array of pointers to an array of objects, and it got faster
by, well, about a factor of eighty, which was good enough.

I don't have a problem with the general idea of letting the OS reclaim
memory, but I don't have the experience to know how and when to make
the decision. If it takes 1, 5, or 30 seconds to end the program,
should that be a sign to just throw the hands up in the air to give it
to the OS to handle? Is there some other process really needing that
extra 5 seconds? Are users complaining about 5 seconds to close the
program? Is it that it costs more to develop the code to free
everything as efficiently, so it's cheaper to just let the OS handle
it?
 

Willem

Richard Heathfield wrote:
) In practice, I do see something wrong with relying on that behaviour,
) and my opinion is based on experience of a (fairly large - several
) hundred thousand lines) program that did rely on that behaviour. I
) didn't write it, but I did have to help fix it.
)
) In bare outline, it worked like this:
)
) main()
) {
)     lots_of_functions_that_do_not_bother_to_clean_up();
) }
)
) And then, one day, it got maintained, as programs do:
)
) func()
) {
)     lots_of_functions_that_do_not_bother_to_clean_up();
) }
)
) main()
) {
)     for (big_old_loop)
)     {
)         func();
)     }
) }
)
) Suddenly we had a huge maintenance problem on our hands because the
) original crew were too lazy to manage memory properly. This cost a
) significant amount of time (and therefore money) to fix, because the
) knowledge of the right points at which to free up the many and varied
) allocations had been discarded, and now had to be reacquired.

- What was the probability, at the time of writing, that such a change
was ever going to happen?
(As a rough estimate: how many such programs have been written, other than
this one, which never had to be changed?)
(Let's say one in M)
- What was the ratio between the time it would have cost the original
programmers to free all the resources correctly, and the time it did cost
to go back and then free all the resources correctly afterward?
(Let's say one in N)

I find it likely that M is significantly larger than N, in which case
the original decision was correct, because it actually saved time,
even in the long run.

Or, to put it in some randomly picked numbers:

Suppose that program was one in a hundred programs that were written
without regard for free()ing every resource. Then, afterward, you had
to go back and fix everything, which took ten times as long as it would
have taken the original programmers. For that one program. In total,
doing it properly up front in all those programs combined would have
taken the original programmers ten times as long as the one fix-up
took now.


And, of course, there may have been more pragmatic solutions available,
such as the extra layer of memory management mentioned elsethread, or
some kind of loop that fork()s, runs the process in the child fork,
and wait()s in the parent fork for the child to finish before starting
the next iteration.
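That fork()-per-iteration idea can be sketched in a few lines of POSIX C. The names run_isolated() and leaky_work() below are illustrative, not from the program under discussion; the point is that the child may leak as much as it likes, because its pages die with it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-iteration workload that allocates and never frees. */
static void leaky_work(void)
{
    for (int i = 0; i < 1000; i++) {
        void *p = malloc(64);
        (void)p;                    /* deliberately leaked */
    }
}

/* Run work() in a forked child; when the child exits, the OS reclaims
   every page it allocated, leak or no leak.  Returns 0 if the child
   exited cleanly, -1 otherwise. */
int run_isolated(void (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: do the work, then die */
        work();
        _exit(EXIT_SUCCESS);        /* exit without any cleanup */
    }
    int status;                     /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A driver loop would then call run_isolated(leaky_work) once per iteration, getting a fresh address space each time.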


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

santosh

Andrew said:
But you would need to have found a gigabyte of contiguous memory
dedicated to your program for this to be simple - otherwise you
would be in memory space also used by other applications, and
bookkeeping does matter.

Not necessarily. Most modern operating systems implement some
form of paging and virtual memory. Unused pages are simply paged out
to disk and mapped back into RAM only when accessed.

Under such systems, deallocating memory obtained from malloc makes
little or no difference to the memory management of the OS. OTOH the
standard library allocator can hand freed blocks out to future
requests, so it's usually a good idea to free() memory when you no
longer need it.

But from the OS's point of view, unused or stale pages are unmapped
from real addresses and written to disk; the reclaimed space can then
be mapped to other pages of other processes, or of the current one. And
in systems like Linux, virtual memory allocations even greater than
available free physical memory are granted, with the system mapping
pages to physical RAM only on first use, or killing a process
if it cannot.

I guess this is one reason why large programs like Firefox allocate
massive amounts of virtual memory when starting up, confident that a
majority of it will not actually reside in RAM. The concept of virtual
memory has made life easier for multitasking systems, but it does mean
that programs have gotten into the habit of allocating far more memory
than necessary, or in not deallocating when done.

Finally, when the process terminates, normally or abnormally, with or
without explicitly calling free() for each malloc(), all pages for the
process are simply reclaimed. Privileged processes can of course, play
with the system in ways normal processes can't.

But it's still a good idea to deallocate resources when you're
finished with them, unless doing so would not be worth it. And to know
whether it'd be worth it, we need to know about other considerations. But
IMHO, manually deallocating memory before exit should be the rule
rather than the exception.
 

Nick Keighley

to the OS it isn't millions of objects but a single lump of contiguous
memory. Underneath malloc() there is likely a call to the OS to
allocate more memory for the process (effectively the currently running
program). Unix traditionally called this sbrk(). Often this "heap"
area was a contiguous lump of memory that could simply be returned to
the OS's free memory pool at the end. This is all slightly more
complicated with virtual memory, paging etc.
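As a rough sketch of what sits underneath malloc() on a traditional Unix: the program break can be moved directly with sbrk(). This is a Linux/glibc-only illustration (sbrk() has since been removed from POSIX, and real allocators mostly use mmap() nowadays); grow_break() is a made-up helper name.

```c
#define _DEFAULT_SOURCE             /* expose sbrk() in glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Grow the process's data segment directly, the way a traditional
   malloc implementation would underneath.  Returns the number of
   bytes the break actually moved, or -1 on failure. */
long grow_break(size_t bytes)
{
    void *before = sbrk(0);                  /* current program break */
    if (before == (void *)-1)
        return -1;
    if (sbrk((intptr_t)bytes) == (void *)-1) /* ask the OS for more */
        return -1;
    void *after = sbrk(0);
    return (long)((char *)after - (char *)before);
}
```

When the process exits, the whole region below the break is handed back to the OS in one go, which is exactly the "single lump" point above.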

You might have a point with things that aren't just memory such as
Windows's GDI objects or Pens. Older versions of Windows were
notorious for leaking such resources.

To yourself and Kaz, I was wrong on this point. Thanks for
explaining it. Unfortunately I haven't taken any operating
systems courses yet, so my experience with memory managers
has been limited to brief overviews in Knuth and assembler
manuals.

the last chapter of K&R discusses how C might be implemented on Unix.
That might be interesting.

Having said that, I still don't think it's good to rely on
this operating system behavior. It seems like unnecessary
unportability. Plus, it makes it so much easier to use valgrind.

freeing millions of objects really can be expensive. I think a modern
OS that didn't free memory properly when a process terminated would be
pretty broken!

on the other hand, it gives a slightly queasy feeling not to match
every malloc() with a free()!
 

Nick Keighley

Andrew Poelstra wrote:

I think Mr Poelstra and Mr Navia are in different programming
universes. Jacob is using desktop systems with decent modern
operating systems (or windows :) ) and lots and lots of memory,
whilst Andrew is on semi-embedded systems with runtime systems that
barely qualify as an OS. It's slightly worrying that he's doing this
bare-metal stuff without knowing much about OSs...

the code is not buggy.

if my set top box rebooted every 15 minutes due to a memory leak I'd
call that buggy.

OK. In your pet os (with lowercase) a program that has
a memory leak will bring the whole system down.

GREAT SYSTEM!



you have no idea how modern OSes work apparently


I'm slightly surprised that an embedded system with bare OS support
does any mallocing at all!
 

Nick Keighley

In times where you measure processor cycles in GHz and available memory
in GiB, and where the complexity of applications gets huge, it's very
dangerous to optimize for program-exit (!) performance. In general you
will prefer correctness and predictability over a slightly faster
process exit.

so that's why it takes so bloody long to shutdown my machine!
 

Ersek, Laszlo

Programmers who get paid for a living want their programs to be portable in a
way that is /economically/ relevant, balanced with other requirements.

Agree completely.

Maximal portability, the kind where we pretend we write a single body of
source code without conditional compilation while pretending we are
programming for an incapable, broken platform, is only an obsession of a
few dull minds.

Please don't. This is also an economically feasible approach, only with
very different economic factors.

That should come as no surprise: it requires a lower level reasoning than
doing a crossword puzzle from the newspaper.

Ignoring for a moment that this is a factually incorrect, deliberate
insult: isn't it the general understanding that reasoning and code and
development processes that require less mental strain are easier to keep
bug-free?

Cheers,
lacos
 

Nick Keighley

Then why did I get counted off for forgetting to free memory in my
data structures class in college?  I wish I knew this argument back
then :)

you have to play by their rules. As you might have noted, there are
strong opinions on both sides! Proper cleanup makes it easier to spot
unintentional memory leaks; leaving the OS to clean up is quicker.
Perhaps you could have a fast-shutdown version and a clean-everything-up
version. Or mark items that really should still be alive at program
exit time.


long shutdown times irritate me.


I dream of 5s shutdown times... 5s is practically instantaneous!

I don't have a problem with the general idea of letting the OS reclaim
memory, but I don't have the experience to know how and when to make
the decision.  If it takes 1, 5, or 30 seconds to end the program,
should that be a sign to just throw the hands up in the air to give it
to the OS to handle?  Is there some other process really needing that
extra 5 seconds?  Are users complaining about 5 seconds to close the
program?  Is it that it costs more to develop the code to free
everything as efficiently, so it's cheaper to just let the OS handle
it?

there is no simple answer. It depends on the environment, the
application, the user community. If the program is supposed to "run
for ever", the fact that it takes 30-40s to shut down may not matter. If
it's a Unix filter, I'd be unimpressed with anything other than
"instant" shutdown.
 

Nick Keighley

if you don't notice them they don't matter!
I actually have a memory debugger I wrote to find them.  As you might expect,
it couldn't solve the actual problem.

I too have spent many a happy hour hunting memory leaks.
Because, technically, it wasn't a leak -- when I went through freeing
everything at the end of program execution, all the allocated space got
freed.  However.  During execution, it was possible for a particular object
to end up with a linked list of unbounded size of allocated things that it
maintained as internal state, which were neither used nor exposed to any
other interface, making it very hard to find them -- and since it did free
them correctly on exit, there was no memory leak.

mine actually were leaks. The memory concerned no longer had any
references to it and wouldn't be freed on shutdown.

Every time a timer was started a tiny little bit of memory went
missing... Over a period of weeks this begins to matter!
 

Nick Keighley

Kaz said:
["Followup-To:" header set to comp.lang.c.]
Suddenly we had a huge maintenance problem on our hands because the
original crew were too lazy to manage memory properly. This cost a
significant amount of time (and therefore money) to fix, because the
knowledge of the right points at which to free up the many and varied
allocations had been discarded, and now had to be reacquired.
You wasted your time and money only because you didn't think of redirecting
the functions to use an alternate malloc-like API which keeps track of the
allocations, such that they can be freed with an additional call when func()
exits:

It's true that we didn't think of such a scheme. Even if we had, though,
either we'd have had to visit every malloc and free to replace them with
the meta-memory-management function calls,

I've done this with a perl script.

or we'd have had to redefine
malloc(). The latter would have been unacceptable, I think. The former
may have saved us a little time, though, if we had thought of it.

But the time saving would have been optimum if the original programmer
had done his job correctly.
http://en.wikipedia.org/wiki/You_ain't_gonna_need_it



I am reminded of the student who, when asked to write a program to add
two numbers together and print the result, asked the lecturer "which two
numbers should I add?"


:)
 

Michael Foukarakis

What basis do you have for this absurd claim...

Hardly absurd. Most OSs will reclaim all the user-space memory
previously assigned to a process, when it exits. One exception I can
think of is DOS with its TSR syscall. I also imagine embedded OSs that
don't implement virtual memory *might* also suffer from this, but that
doesn't make the claim absurd at all. It's actually very logical for
your garden variety current desktop OS (think Linux, Windows, etc).
...or this one? The OS, if it frees your memory, needs just
as much CPU time as your code would, had you chosen to make
it portable to machines who are not your mother.

The OS won't call free(). It doesn't do the same bookkeeping as a
process. Are you trying to be intentionally absurd?
Allocators allocate. To ascribe anything more to them is
presumptuous and unjustified.

What about bookkeeping? Error checking? What about memory
rearrangement to alleviate fragmentation? Are those unjustified? Or
does your generic term for "allocate" imply all of these?
 

jacob navia

Branimir Maksimovic wrote:
Well, let me explain a real-world example.

You have a PHP script that prepares data for some other process, taking
20 hours or more. That's slow; a similar optimized program in C wouldn't
take more than an estimated 20-30 minutes. But since C is not a safe
language, and actually no one knows it because it is obsolete, let's use
Hadoop to distribute the script to a cluster of servers and make it
faster. That is how things are done in the *real world* today!

Please do not speak about "the real world" here.

People here are living in systems that do not even cleanup
when a process stops, and insist that all programs in C should
be written ready to be ported to such "advanced" systems.

They want to perpetuate the notion that C is obsolete. It is only
THEY that are obsolete; but... never mind.
 

Ben Bacarisse

ImpalerCore said:
Then why did I get counted off for forgetting to free memory in my
data structures class in college? I wish I knew this argument back
then :)

OK, smiley noted, but that is not a comparable situation. Freeing up
the memory used by a data structure is a proper part of its
implementation and testing that such clean-up functions work is a
proper part of the exercise.

In practice, I'd write the memory-freeing code (unless, for some
reason, it really was very complex to write) and simply exclude the
freeing-up code (#ifdef TIDY_UP) prior to exit if I found it was
taking too long.
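A compile-time switch along those lines might look like this sketch; the list type and the TIDY_UP macro name are illustrative, following the idea above rather than any real codebase.

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Free every node of the list -- but only in builds that define
   TIDY_UP; otherwise leave it all for the OS to reclaim at exit.
   Returns the number of nodes actually freed. */
int list_cleanup(struct node *head)
{
#ifdef TIDY_UP
    int freed = 0;
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        freed++;
    }
    return freed;
#else
    (void)head;                 /* deliberately left for the OS */
    return 0;
#endif
}
```

Compiling with -DTIDY_UP gives the slow, valgrind-clean shutdown; compiling without it gives the fast one.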

PS, yes I've used an OS that does not free memory on program exit:
TRIPOS (and yes, there was a good reason for the OS to be designed
that way).
 

ImpalerCore

It's exactly what I've said - it was an application that
was taking forever to exit because on exit it was cleaning
up all of its memory. We changed it not to do that, and
then it exited quickly.

What I meant is what is the threshold that made you decide to rely on
the OS to clean up (how long was your "forever")? 1, 5, 10, 60 second
shutdown?
 

Ersek, Laszlo

You wasted your time and money only because you didn't think of redirecting
the functions to use an alternate malloc-like API which keeps track of the
allocations, such that they can be freed with an additional call when func()
exits:

main()
{
    for (big_old_loop)
    {
        heap *h = heap_create();
        func(h);
        heap_dispose(h);
    }
}

calls like

obj *p = (obj *) malloc(sizeof *p);

are replaced with:

obj *p = (obj *) heap_alloc(my_heap, sizeof *p);

or through a wrapper which looks like malloc.

This is not a ``huge maintenance problem''.
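For concreteness, a minimal tracking allocator with that heap_create()/heap_alloc()/heap_dispose() interface might be sketched as a linked list of allocation headers. This is my illustration of the scheme, not code from the thread, and the alignment handling is deliberately crude.

```c
#include <stdlib.h>

/* Each allocation carries a header linking it into its heap's list,
   so heap_dispose() can free every recorded block in one pass. */
typedef union heap_node {
    union heap_node *next;
    long double align;          /* crude worst-case alignment */
} heap_node;

typedef struct heap {
    heap_node *head;
} heap;

heap *heap_create(void)
{
    heap *h = malloc(sizeof *h);
    if (h != NULL)
        h->head = NULL;
    return h;
}

void *heap_alloc(heap *h, size_t size)
{
    heap_node *n = malloc(sizeof *n + size);
    if (n == NULL)
        return NULL;
    n->next = h->head;          /* push onto this heap's list */
    h->head = n;
    return n + 1;               /* user memory starts past the header */
}

void heap_dispose(heap *h)
{
    heap_node *n = h->head;
    while (n != NULL) {
        heap_node *next = n->next;
        free(n);
        n = next;
    }
    free(h);
}
```

A realloc()-style counterpart would have to unlink and relink headers, which is where the real complexity starts.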


I think it can be more complex than that. The malloc() family consists
(under SUS) of at least malloc(), calloc(), realloc(), free(),
posix_memalign(). Add the historical functions valloc() and memalign(), then
perhaps anonymous (and even file-backed) mmap()'s and whatever else.
Several other functions allocate memory or other resources, like
catopen(), iconv_open() and regcomp().

exit() does a lot of things in kernel space, "standard library space"
and "user program space".

http://www.opengroup.org/onlinepubs/000095399/functions/exit.html

When the body of main() is moved into a loop, free()'ing what was
malloc()'d is only one thing that must be "emulated" manually before the
next iteration commences, if the original programmer didn't deal with it
him- or herself.

Setup functionality must be extracted anyway, but if the programmer was
liberal in the "meat" of the code with releasing resources, it may be
very hard to register all the corresponding allocations in a bottom-up
way -- there are many types of allocations, also implicit allocations
like strdup(). (Perhaps strdup() should be considered a member of the
malloc() family.)

I'd guess in an "elegant" C++ program, the root(s) of the object graph
are auto objects in main() and static objects scattered all over the
place. Their destructors will be called in the end, and that should
result in an avalanche of destructors. Of course one can call _exit() by
hand, but I still reckon this final cascade is common in C++ programs,
and that it doesn't cause many problems. Nor should its C equivalent.

Cheers,
lacos
 

Richard Tobin

Kaz Kylheku said:
Only a complete fool argues against a statement about ``one action'',
without any agreed-upon definition of ``action'' anywhere in sight.

My aim was to enlighten the poster, not to win an argument by
pedantry. If that's being a complete fool, I can put up with it.

-- Richard
 

Pascal J. Bourguignon

Hyman Rosen said:
Experience. Logic.


No. A program which insists on freeing its allocated memory
does so by going through each allocated object individually
freeing it. This can involve large amounts of time when the
program has allocated millions of objects in its data
structures. Upon program exit, the operating system reclaims
all memory allocated by the program in one action, and that
takes essentially no time at all.

This is not even to mention that when a program frees allocated
memory through 'free', it is usually doing nothing but placing
that memory back onto an internal data structure so that it may
be allocated again. It is seldom returned to the operating system
because it was not allocated that way; allocators request large
blocks and break them up internally to the program.


What is worse, it makes the program touch all the memory blocks,
therefore fetching all of them from the swap file!

So instead of just exiting, with one swap-block write (to free the
swap used by that process), you now have to swap 4 GB out to free
memory for loading 4 GB of swap back in, just to "free" that memory
before exiting; and then, of course, the 4 GB that was swapped out
must be read back in again...

Hence my use of kill -9 for "applications" that don't quit when I tell
them to quit...
 

Keith Thompson

Hyman Rosen said:
Actually, it's the other way around - you have to exercise a little
care not to get the cleanup, because these days memory allocations
are wrapped in objects with destructors.

Not in this newsgroup. (I'm posting from comp.lang.c.)
 

Keith Thompson

Hyman Rosen said:
Unless you're very careful,

So be very careful.
this is an anti-pattern,
and will cause difficult to diagnose errors for this
common singleton pattern:

char *getGlobalFoozleBuffer()
{
    static char *buf = calloc(147, 13);
    return buf;
}

Why is buf static?

I think what Kaz had in mind is that any memory that needs to be
deallocated on each iteration of the loop is allocated, not via
malloc and friends, but by some other routines that use the "heap"
object created by heap_create(). Since getGlobalFoozleBuffer
calls calloc() directly, the allocated memory won't be affected
by heap_dispose().

The tricky part is knowing which allocations need to survive
throughout the program execution, and which need to be cleaned up
on each iteration of the outer loop. And this can be extended to
multiple levels; the above main() might eventually become a function
in a larger program.
This is also another example of a sort of memory allocation
which is not, and need not be, freed by the program before
it exits.

Maybe. It depends on how global the FoozleBuffer really is.
 
