Experiment: functional concepts in C

  • Thread starter Ertugrul Söylemez

Ertugrul Söylemez

Seebs said:
In general, I agree. In practice, though, I don't see anything
intrinsically wrong with a system guaranteeing that certain classes of
resources are automatically deallocated on exit, and relying on that
behavior.

It is good that the operating system does that, but it's wrong to rely
on that feature.

How many people have you seen close stdout, stderr, and stdin at the
end of a program? Why don't they do that? You'd normally close any
file that was previously open, right?

You're not supposed to close the std-handles, because you have not
opened them. You are responsible for closing/freeing resources you have
opened/allocated yourself. For many resources there is even a stack of
allocations, which you should follow. The pattern looks like this:

<file1>
<file2>
<someCode />
<file3>
<someCode />
</file3>
<someCode />
</file2>
</file1>

As your main function is entered, there are already three of those file
tags open, one for each of stdin, stdout and stderr and possibly some
others. Closing one of the handles is the same as putting a closing tag
too early. Because of system peculiarities you may be forced to do
that, but it's better to avoid for the sake of consistency. Closing a
resource prematurely is just as wrong as closing it too late (or not
closing it at all).
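
The nesting pattern above can be sketched in C roughly as follows. This is only an illustration: the function name copy_first_line, the buffer size and the error convention are assumptions made for the example, not something from the posts. The point is that each resource is released in the reverse order of its acquisition, and only if it was actually acquired.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: copies the first line of `src` to `dst`.
   Returns 0 on success, -1 on failure. Resources are released in the
   reverse order of their acquisition, like properly nested tags. */
int copy_first_line(const char *src, const char *dst)
{
    int rc = -1;
    FILE *in = NULL;
    FILE *out = NULL;
    char *buf = NULL;

    in = fopen(src, "r");                 /* <file1> */
    if (in == NULL)
        goto done;
    out = fopen(dst, "w");                /* <file2> */
    if (out == NULL)
        goto close_in;
    buf = malloc(256);                    /* <buffer> */
    if (buf == NULL)
        goto close_out;

    if (fgets(buf, 256, in) != NULL && fputs(buf, out) != EOF)
        rc = 0;

    free(buf);                            /* </buffer> */
close_out:
    if (fclose(out) != 0)                 /* </file2>: may report a failed flush */
        rc = -1;
close_in:
    fclose(in);                           /* </file1> */
done:
    return rc;
}
```

Note that stdin, stdout and stderr never appear here: they were opened by the environment, so their closing tags belong to the environment as well.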

That's also the reason I suggest freeing all of the allocated memory
before exiting. If you don't, there is a closing tag missing in your
program pattern. When you restructure your program, you might
accidentally forget that missing closing tag and introduce memory leaks.
You know how difficult it can be to find memory leaks, if you ever
notice them at all.

The CPS-based resource management function I proposed solves this
problem to some extent, because every resource has a clear scope, and
without abusing global/static variables, it can't even escape that
scope. In other words, the closing tag is enforced in the pattern.
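
The function proposed earlier in the thread is not reproduced in this part of it, so the following is only a guess at the general shape of such a continuation-passing helper. The names with_file, file_user and write_greeting are hypothetical. The caller never sees an open FILE * outside the callback, so the closing step cannot be forgotten.

```c
#include <stdio.h>

/* Hypothetical callback type: receives the open stream plus caller data. */
typedef int (*file_user)(FILE *fp, void *ctx);

/* Opens `path`, passes the stream to `use`, and always closes it
   afterwards. Returns -1 if the file cannot be opened or closed cleanly,
   otherwise whatever `use` returned. */
int with_file(const char *path, const char *mode, file_user use, void *ctx)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int result = use(fp, ctx);
    if (fclose(fp) != 0)
        return -1;
    return result;
}

/* Example continuation: writes the string passed through ctx. */
int write_greeting(FILE *fp, void *ctx)
{
    return fputs((const char *)ctx, fp) == EOF ? -1 : 0;
}
```

A call would look like with_file("out.txt", "w", write_greeting, "hello\n"). The stream can still escape if the callback stores it in a global or static variable, which is the caveat mentioned above.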


Greets
Ertugrul
 

Ersek, Laszlo

How many people have you seen close stdout, stderr, and stdin at the end
of a program? Why don't they do that?

Because they didn't open them in the first place. *Flushing* stdout is
another thing, of course. Stderr is generally not flushed because it is
never fully buffered per default, and sane programs write line-oriented
diagnostics anyway. (Progress indicators more advanced than
line-oriented diagnostics but still not quite on the curses level should
go through /dev/tty.)
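
As a small sketch of the flushing point (the function name finish_stdout is made up for this example), a program that has written to stdout can flush it explicitly before exiting and turn a write failure into a failing exit status, without ever closing the stream:

```c
#include <stdio.h>
#include <stdlib.h>

/* Flushes stdout and reports whether the buffered output could be
   written out. Returns EXIT_SUCCESS or EXIT_FAILURE. */
int finish_stdout(void)
{
    if (fflush(stdout) == EOF || ferror(stdout)) {
        fputs("error writing standard output\n", stderr);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

Ending main with return finish_stdout(); is one way to use it.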

I did close stdin / stdout in some of my utilities. I'm not sure if this
is allowed, even looking at fclose() / freopen() -- at least on OpenVMS
this triggered elaborate crashes, and generally, I guess even external
utilities accessing processes through /proc expect fd's 0, 1 and 2 to
have their original meanings. Daemons don't close their standard I/O
streams or the file descriptors underneath them, they redirect them from
(to) /dev/null.

You'd normally close any file that
was previously open, right?

If I was the one to open it, then yes.

Cheers,
lacos
 

Ertugrul Söylemez

Seebs said:
One in which we expect the operating system to manage resources.

Which is actually pretty much the point of calling something an
"operating system".

It is a bug, but some people tolerate it, some don't. What the
operating system does here is not called resource management, but fault
tolerance. It should be designed not to fail on programming errors.

It's wrong in a similar way to keep helper threads running, but the
operating system is not fault tolerant in this case, so it would in fact
lead to a system crash sooner or later.

If the OS doesn't know which programs have which resources, it's
already failed.

If a program launches several worker threads and then daemonizes, the
operating system generally loses track of the relations.

It turns out, though, that marking the few megabytes a process used to
have as "no longer in use" is much, much, cheaper than having the
process try to walk through a ton of complicated data structures.

In times, where you measure processor cycles in GHz and available memory
in GiB and where the complexity of applications gets huge, it's very
dangerous to go with program exiting (!) performance. In general you
will prefer correctness and predictability over a slightly faster
process exit.

Remember we're not talking about in-program performance here. We are
talking about program shutdown. Also even walking a data structure with
millions of branches takes less than a second on today's machines.


Greets
Ertugrul
 

Ertugrul Söylemez

Hyman Rosen said:
What a bizarre notion. What happens to the resources of a program when
the program exits is defined by the operating system. Relying on
OS-defined behavior in your program is not an error. Performing
time-wasting activity on abstract grounds of illusory correctness, on
the other hand, is an error.

You're showing that you are an irresponsible programmer. C and Cobol
are about the only languages in wide-spread use, which even allow that
kind of carelessness. Modern languages are even designed to prevent you
from being stupid. But obviously you want to be stupid.

Relying on operating system fault tolerance is wrong and most good
programmers will agree, at the very least because they want their
programs to be portable, well structured and modular. Even basic
constructs of modern programming languages like OOP disagree with you.

And as said, your irresponsible attitude will bite you as soon as you
need to rewrite parts of your program. Your coding style (relying on
environmental properties) is highly nonmodular. Modularity is the key
to maintainability, and if you disagree here, you're just ridiculous and
nobody will take you seriously anyway.


Greets
Ertugrul
 

ImpalerCore

What a bizarre notion. What happens to the resources of a
program when the program exits is defined by the operating
system. Relying on OS-defined behavior in your program is
not an error. Performing time-wasting activity on abstract
grounds of illusory correctness, on the other hand, is an
error.

Do you differentiate between detecting memory that is considered a
leak versus memory that you allow the OS to cleanup? I would think it
would be a benefit to know if the application *can* shutdown cleanly
so that memory leaks could be determined. Whether you consider that
time-wasting is up to you. Does that attitude also extend to library
development?

What's your opinion on other I/O sources? Do you bother to close
files, sockets, database connections, or do you also depend on the
underlying architecture to "do the right thing"?

I'm biased this way since I have not encountered a situation where the
shutdown time/complexity is an issue, but maybe you have.

Best regards,
John D.
 

ImpalerCore

That's what I said originally - the program should act
the way a garbage collector does. If it's leaking memory
by losing track of it, then the program is likely broken.
If it's simply building up very large data structures,
but they're all reachable, then there's no need to clean
up that memory before exiting.

 > Does that attitude also extend to library development?

Which attitude? If a library offers ways to build up
structures, it should offer a way to tear them down.
Whether those tear downs should be invoked by the program
before exiting will depend on what they do.


That depends. Files are usually encapsulated by data
structures that cache writes before committing to the
operating system, so a finalization step is needed to
make sure that the caches are flushed. I think, but I
am not sure, that it is harmless to let a database
connection die with the program, if the program is
accessing the database in small self-contained actions
(e.g., not holding a transaction open for the duration
of the session).

But other resources are seldom held in the numbers of
which allocated objects are held. It does not take long
to close a handful of files or connections. It can take
very long to free millions of pieces of memory. And doing
so accomplishes nothing.


I have.

Care to elaborate?
 

Kaz Kylheku

Again, in embedded systems many OS's choose not to support

I doubt you could name one, and if you could, that you could
get someone to pay you a dime to develop something for it.

bad programming. There are no users downloading garbage

A program implements some sort of specification. In commercial development,
the specification is usually a concrete document.

Your specification is that of a hobbyist; you have in mind the idea of a
maximally portable program which tidies up after itself.

But a program which doesn't meet your academic ideal of what
constitutes a good program is not a bad program.

A ``bad'' program is one which, firstly, doesn't meet its specification,
whatever that is, and secondly, one which implements a faulty specification.

A specification which says that the program need not clean up after itself is
rationally justifiable. The functionality of process cleanup is already
implemented in the intended target platforms: it is found in one place which
has been well debugged. For you to contravene the specification and start
adding superfluous code to the program would be bad programming.

Code carries a risk. Unnecessary code carries unnecessary risk.
Unnecessary risk is never acceptable.

Many platforms which clean up after a program can do so even if the program
has misused dynamic memory (e.g. overran or underran a buffer). They can do
it even if the program has terminated abnormally and abruptly, possibly even
if that happened while the program was executing the memory allocator itself.

It is a bad strategy for a portable program to be written such that it treats
reliable target platforms as if they were unreliable. So the ``one size fits
all'' idea of a maximally portable program without any conditional
compilation is actually bad design. One design for every situation is
poor engineering.

But in "most cases" there is so little memory allocated
that the equivalent free() calls will take practically
zero time anyway. In the cases that a ton of memory is
used, then the OS has its work set out for it whether or
not the code is explicitly calling free().

You persist in being technically wrong here, demonstrating a gaping lack of
knowledge about mainstream platforms and their middleware.

On most platforms, calling free on every outstanding object
prior to termination will not subtract from any of the execution
time that is subsequently required to clean up that process.
 

Kaz Kylheku

What a bizarre notion. What happens to the resources of a
program when the program exits is defined by the operating
system. Relying on OS-defined behavior in your program is
not an error. Performing time-wasting activity on abstract
grounds of illusory correctness, on the other hand, is an
error.

Not only is it compute-time wasting, it is developer-time wasting
to develop it.

Furthermore, the code carries a risk. Since the code is unnecessary,
the risk is unnecessary.

It's difficult to justify unnecessary risk.
 

Kaz Kylheku

You're showing that you are an irresponsible programmer. C and Cobol
are about the only languages in wide-spread use, which even allow that
kind of carelessness. Modern languages are even designed to prevent you
from being stupid. But obviously you want to be stupid.

What modern languages would those be? I suspect you could not name one
such that you are not wrong.

In any language that can be called modern, you allocate objects without
concern for how they will be freed.

Relying on operating system fault tolerance is wrong and most good

It's not fault tolerance but resource management. You are pitifully
wrong in your other posting.

Relying on this is no more wrong than relying on a TCP/IP stack,
graphics display, or DMA transfers.

It's no

programmers will agree, at the very least because they want their
programs to be portable

Programmers who get paid for a living want their programs to be portable in a
way that is /economically/ relevant, balanced with other requirements.

Maximal portability, the kind where we pretend we write a single body of
source code without conditional compilation while pretending we are programming
for an incapable, broken platform, is only an obsession of a few dull minds.

That should come as no surprise: it requires a lower level reasoning than
doing a crossword puzzle from the newspaper.

Modularity is the key
to maintainability, and if you disagree here, you're just ridiculous and
nobody will take you seriously anyway.

Without automatic storage management, there is no real modularity. Modules
are coupled together by the distributed responsibility of memory management.
It creeps into all the interfaces: who allocates what and who will free it.

It's laughable to advocate explicit storage deallocation and modularity in
the same breath.
 

Kaz Kylheku

I am going to disagree with you on this point: on my system, if a program
cleans up its own memory, I see a delay between clicking the X (or Ctrl+C,
or whatever) and the program actually dying. During this time it does its
cleanup, plays with the hard drive and whatever. Meanwhile, I am free to
do other things.

However, if the program just dies, I see a quick and painless exit. And
then the kernel takes over to do cleanup, which has a higher priority than
the dead program. And while it's cleaning up, the rest of my system is
sluggish, even if only for a second.

ROFL.

I'd like a word with the prof who taught you Operating Systems 301.
 

jacob navia

Hyman Rosen wrote:
That's what I said originally - the program should act
the way a garbage collector does. If it's leaking memory
by losing track of it, then the program is likely broken.
If it's simply building up very large data structures,
but they're all reachable, then there's no need to clean
up that memory before exiting.


Which attitude? If a library offers ways to build up
structures, it should offer a way to tear them down.
Whether those tear downs should be invoked by the program
before exiting will depend on what they do.


That depends. Files are usually encapsulated by data
structures that cache writes before committing to the
operating system, so a finalization step is needed to
make sure that the caches are flushed. I think, but I
am not sure, that it is harmless to let a database
connection die with the program, if the program is
accessing the database in small self-contained actions
(e.g., not holding a transaction open for the duration
of the session).

But other resources are seldom held in the numbers of
which allocated objects are held. It does not take long
to close a handful of files or connections. It can take
very long to free millions of pieces of memory. And doing
so accomplishes nothing.


I have.

Me too.

The lcc-win compiler never calls free(). Since Windows will
free the memory used by the compiler, it is pointless to spend
time freeing everything. The same holds for lcc-win under Linux
and Mac OS X.

In a compiler, the complicated structures needed by the building
of the compilation context are needed to the very end of the
program.

Symbol tables must be present so that the assembler knows which
globals must be emitted, after all code generation is done.

Each scope is allocated and then released, of course, to avoid
using up too much memory, but this is not done with free().
Each function allocates all its transient data structures
(entries for local variables, local declarations, etc) in
a single heap that is released with a single pointer assignment,
like a stack.
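
The allocator used by lcc-win is not shown in the thread, so the following is only a minimal sketch of the general technique described: a bump allocator over one block, with a mark taken on scope entry and a release that is a single assignment. The names arena_alloc, arena_mark and arena_release, and the fixed 64 KiB size, are assumptions for the example.

```c
#include <stddef.h>

/* Hypothetical fixed-size arena; a real compiler would grow it on demand. */
#define ARENA_SIZE (64 * 1024)

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_top = 0;   /* offset of the first free byte */

/* Bump-allocates n bytes, rounded up to the maximum alignment.
   Returns NULL when the arena is exhausted. */
void *arena_alloc(size_t n)
{
    size_t align = _Alignof(max_align_t);
    if (n > ARENA_SIZE)                    /* also avoids overflow below */
        return NULL;
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_top)
        return NULL;
    void *p = &arena[arena_top];
    arena_top += rounded;
    return p;
}

/* Remembers the current top so that a scope can be released later. */
size_t arena_mark(void)
{
    return arena_top;
}

/* Releases everything allocated since `mark` with one assignment. */
void arena_release(size_t mark)
{
    arena_top = mark;
}
```

A function would call arena_mark() on entry, allocate its locals with arena_alloc(), and call arena_release() with the saved mark on exit, with no per-object free().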

In fact, each statement allocates data structures that are released
in the same fashion (stack-like). But all the data allocated
with malloc that is global is retained, and never released,
assuming the OS will do that. This has been working reliably
since at least 2002-2003, when I eliminated all calls to
free() and eliminated all code that was doing the cleanup before
exiting.

Note that the order of freeing is very strict. You can never free an
object until all the objects that it contains are freed. Freeing
all the interconnected data structures can be dangerous since it
could provoke a trap. Much simpler is just to eliminate that
code.

The same is true for the linker. There, the situation is even more
drastic since you can't free anything until the very end. Why
should the linker do that? Just exiting to the OS will free
everything anyway.

Both the compiler and the linker are very fast. And that is a
feature users appreciate, more than some hypothetical
"correctness".
 

Seebs

It is good that the operating system does that, but it's wrong to rely
on that feature.

I don't agree that it is necessarily wrong to rely on OS features.

You're not supposed to close the std-handles, because you have not
opened them.

What if I have? ("freopen()")

Am I obliged to try to fclose() stderr if I have opened it with freopen()?

As your main function is entered, there are already three of those file
tags open, one for each of stdin, stdout and stderr and possibly some
others. Closing one of the handles is the same as putting a closing tag
too early. Because of system peculiarities you may be forced to do
that, but it's better to avoid for the sake of consistency. Closing a
resource prematurely is just as wrong as closing it too late (or not
closing it at all).

Okay, that's a good argument. In light of that, I'd guess that your answer
to the freopen() question would be, no, you don't have to close that, because
the environment is still expected to close stderr if it needs to, etc.

That's also the reason I suggest freeing all of the allocated memory
before exiting. If you don't, there is a closing tag missing in your
program pattern. When you restructure your program, you might
accidentally forget that missing closing tag and introduce memory leaks.
You know how difficult it can be to find memory leaks, if you ever
notice them at all.

I actually have a memory debugger I wrote to find them. As you might expect,
it couldn't solve the actual problem.

Because, technically, it wasn't a leak -- when I went through freeing
everything at the end of program execution, all the allocated space got
freed. However. During execution, it was possible for a particular object
to end up with a linked list of unbounded size of allocated things that it
maintained as internal state, which were neither used nor exposed to any
other interface, making it very hard to find them -- and since it did free
them correctly on exit, there was no memory leak.
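
The memory debugger mentioned here is not shown. As a rough sketch of why an exit-time check misses this kind of growth, the hypothetical wrappers below (counted_malloc and counted_free are made-up names) track both the number of live blocks and the highest number ever live. A program like the one described would report zero live blocks at exit but an ever-growing peak.

```c
#include <stdlib.h>

static size_t live_blocks = 0;   /* blocks currently allocated via the wrappers */
static size_t peak_blocks = 0;   /* highest value live_blocks ever reached */

/* malloc wrapper that counts successful allocations. */
void *counted_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL && ++live_blocks > peak_blocks)
        peak_blocks = live_blocks;
    return p;
}

/* free wrapper that counts releases; NULL is ignored, as with free(). */
void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

size_t counted_live(void) { return live_blocks; }
size_t counted_peak(void) { return peak_blocks; }
```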

-s
 

Seebs

It is a bug, but some people tolerate it, some don't. What the
operating system does here is not called resource management, but fault
tolerance. It should be designed not to fail on programming errors.

But it's not necessarily a programming error to let the operating system
deallocate resources. It'd be a programming error if that were contrary to
the documented behavior of the OS. Otherwise, it may well be intentional
and well-considered.

If a program launches several worker threads and then daemonizes, the
operating system generally loses track of the relations.

On unix-like stuff, as long as they are actually *threads*, it usually
doesn't.

In times, where you measure processor cycles in GHz and available memory
in GiB and where the complexity of applications gets huge, it's very
dangerous to go with program exiting (!) performance. In general you
will prefer correctness and predictability over a slightly faster
process exit.

I'm not convinced. There are a lot of programs which are, by design, run
EXTREMELY often.

Remember we're not talking about in-program performance here. We are
talking about program shutdown. Also even walking a data structure with
millions of branches takes less than a second on today's machines.

I do a lot of work on a build system. In the course of a single run of the
build system, there are programs that get run upwards of a hundred thousand
times. (Say, the shell.) In that case, I think it makes a great deal of
sense to think very carefully about performance of startup and exit.

Which is why it's useful to know what the operating system's designed
semantics are. One could easily imagine an operating system where a
particular device could only accept full-block writes of, say, 512 bytes.
However, we don't then say that it is a "programming error" to ever write
to any device except in 512-byte blocks...

-s
 

Seebs

You're showing that you are an irresponsible programmer. C and Cobol
are about the only languages in wide-spread use, which even allow that
kind of carelessness. Modern languages are even designed to prevent you
from being stupid. But obviously you want to be stupid. [...]
and if you disagree here, you're just ridiculous and
nobody will take you seriously anyway.

I'm curious, do you come from a culture in which there is any conceivable
way that insulting people like this leads to positive outcomes?

-s
 

Seebs

Do you differentiate between detecting memory that is considered a
leak versus memory that you allow the OS to cleanup?

I certainly do.

I would think it
would be a benefit to know if the application *can* shutdown cleanly
so that memory leaks could be determined. Whether you consider that
time-wasting is up to you. Does that attitude also extend to library
development?

It depends a lot on circumstances. Generally, at the least, I'd expect to
know which pieces of storage I'm allocating that I expect to still be there
at the end of execution.

What's your opinion on other I/O sources? Do you bother to close
files, sockets, database connections, or do you also depend on the
underlying architecture to "do the right thing"?

That also depends quite a bit on circumstances. I usually close files, but
there have been exceptions.

As an example, I have a hunk of code I maintain right now, in which a few
files are opened, and a few resources allocated, which are not released on
exit -- because this hunk of code is intercepting the actions of other
programs, and does not necessarily know about their exits. More importantly,
it is fairly trivial to prove that even if I tried to intercept their exits,
I could not do so safely. Either I would deallocate my resources at a time
when the other program could then perform actions which still required them,
or I would have to leave them allocated until a point at which nothing was
going to transfer control back to me. Can't win.

I'm biased this way since I have not encountered a situation where the
shutdown time/complexity is an issue, but maybe you have.

I have. I once had a program which took over five seconds to exit on what
was, at the time, a very fast machine. In that case, I swapped a key data
structure from an array of pointers to an array of objects, and it got faster
by, well, about a factor of eighty, which was good enough.
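
The program in question is not shown, so this is only an illustration of the two layouts being compared, with a made-up struct item. With an array of pointers, teardown needs one free() per element plus one for the array. With an array of objects, teardown is a single free(). The factor of eighty is the figure reported above, not something this sketch measures.

```c
#include <stdlib.h>

struct item { int key; double value; };   /* hypothetical element type */

/* Layout A: n separate allocations plus the pointer array. */
struct item **make_pointer_array(size_t n)
{
    struct item **v = malloc(n * sizeof *v);
    if (v == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        v[i] = malloc(sizeof *v[i]);
        if (v[i] == NULL) {               /* undo the partial construction */
            while (i > 0)
                free(v[--i]);
            free(v);
            return NULL;
        }
        v[i]->key = (int)i;
        v[i]->value = 0.0;
    }
    return v;
}

/* Tearing layout A down costs n + 1 calls to free(). */
void free_pointer_array(struct item **v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(v[i]);
    free(v);
}

/* Layout B: one contiguous block; teardown is a single free(v). */
struct item *make_object_array(size_t n)
{
    struct item *v = malloc(n * sizeof *v);
    if (v == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        v[i].key = (int)i;
        v[i].value = 0.0;
    }
    return v;
}
```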

-s
 

Kaz Kylheku

["Followup-To:" header set to comp.lang.c.]

In practice, I do see something wrong with relying on that behaviour,
and my opinion is based on experience of a (fairly large - several
hundred thousand lines) program that did rely on that behaviour. I
didn't write it, but I did have to help fix it.

In bare outline, it worked like this:

main()
{
    lots_of_functions_that_do_not_bother_to_clean_up();
}

And then, one day, it got maintained, as programs do:

func()
{
    lots_of_functions_that_do_not_bother_to_clean_up();
}

main()
{
    for(big_old_loop)
    {
        func();
    }
}

Suddenly we had a huge maintenance problem on our hands because the
original crew were too lazy to manage memory properly. This cost a
significant amount of time (and therefore money) to fix, because the
knowledge of the right points at which to free up the many and varied
allocations had been discarded, and now had to be reacquired.

You wasted your time and money only because you didn't think of redirecting
the functions to use an alternate malloc-like API which keeps track of the
allocations, such that they can be freed with an additional call when func()
exits:

main()
{
    for(big_old_loop)
    {
        heap *h = heap_create();
        func(h);
        heap_dispose(h);
    }
}

func stores h in a global variable somewhere:

extern heap *my_heap;

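void func(heap *h)
{
    my_heap = h;    /* make the heap visible to the allocation sites */
    lots_of_functions_that_do_not_bother_to_clean_up();
}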

calls like

obj *p = (obj *) malloc(sizeof *p);

are replaced with:

obj *p = (obj *) heap_alloc(my_heap, sizeof *p);

or through a wrapper which looks like malloc.

This is not a ``huge maintenance problem''.

The original programmer did the right thing by not wasting time on this
problem. The new use case was not a functional requirement for that program.
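
The post names heap_create, heap_alloc and heap_dispose but does not show them. One possible minimal implementation, offered only as a sketch under that assumption, links every block into a list owned by the heap so that heap_dispose can free them all in one pass. The block header type is invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every payload; the max_align_t member keeps
   the payload that follows it suitably aligned for any object type. */
typedef union block {
    union block *next;
    max_align_t align;
} block;

typedef struct heap {
    block *blocks;      /* singly linked list of everything allocated */
} heap;

heap *heap_create(void)
{
    heap *h = malloc(sizeof *h);
    if (h != NULL)
        h->blocks = NULL;
    return h;
}

/* malloc-like allocation that also records the block in the heap. */
void *heap_alloc(heap *h, size_t size)
{
    if (size > SIZE_MAX - sizeof(block))
        return NULL;
    block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = h->blocks;
    h->blocks = b;
    return b + 1;       /* payload starts right after the header */
}

/* Frees every block allocated from the heap, then the heap itself. */
void heap_dispose(heap *h)
{
    if (h == NULL)
        return;
    block *b = h->blocks;
    while (b != NULL) {
        block *next = b->next;
        free(b);
        b = next;
    }
    free(h);
}
```

This version has no way to free an individual block early, which matches the use case above: nothing is released until func() returns.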
 

Seebs

main()
{
    lots_of_functions_that_do_not_bother_to_clean_up();
}

And then, one day, it got maintained, as programs do:

func()
{
    lots_of_functions_that_do_not_bother_to_clean_up();
}

Ahh, I see.

I don't do it that way. The only time I abandon cleanup is when there
is no reason to allocate something again once it's allocated -- if I
thought a new pass might allocate the same resources, then I would indeed
free them up.

-s
 

Richard Tobin

Perhaps you should write memory managers, if you can really
reclaim millions of individually allocated blocks in a single
action.

A general purpose operating system doesn't need to reclaim the
individually allocated blocks, since they were allocated from pages
allocated by the operating system. It only needs to reclaim those
pages. What's more, it will do this even if you free the individually
allocated blocks in your program.

-- Richard
 

Richard Tobin

Richard Heathfield said:
In practice, I do see something wrong with relying on that behaviour,
and my opinion is based on experience of a (fairly large - several
hundred thousand lines) program that did rely on that behaviour. I
didn't write it, but I did have to help fix it. [...]
main()
{
    for(big_old_loop)
    {
        func();
    }
}

I have written programs which had this problem. On the other hand, I
have written many more which were never modified in that way.

Overall, would I have saved or wasted time by planning ahead for
something which usually didn't happen? I can't be certain. But I'd
much rather do the work when it's needed than when it's not.

And of course, you often have a pretty good idea whether a particular
bit of code is likely to be reused in that way.

-- Richard
 

Kaz Kylheku

A general purpose operating system doesn't need to reclaim the
individually allocated blocks, since they were allocated from pages
allocated by the operating system. It only needs to reclaim those
pages. What's more, it will do this even if you free the individually
allocated blocks in your program.

Only a complete fool argues against a statement about ``one action'',
without any agreed-upon definition of ``action'' anywhere in sight.

What is an action? One machine instruction? One system call?

An atomic transaction across fifteen servers can be called ``action''
according to some view of a system.
 
