Strange C developments

Jase Schick

Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection? Convenient memory management would
be a much greater enhancement than replicating a perfectly good existing
library it seems to me.

Jase
 
Les Cargill

Jase said:
Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection?

GC in 'C' is a non sequitur.
Convenient memory management would
be a much greater enhancement than replicating a perfectly good existing
library it seems to me.

Jase

I think you want Java, then.
 
Stefan Ram

christian.bau said:
And sorry to say, but C wasn't designed with garbage collection in
mind.

I believe that a GC is both important and nice, a great productivity
enhancer, but for /higher-level languages/. C, on the other hand,
is /intended to be/ a low-level language. A language that adds just
a thin layer over the machine language. It is intended to be
a language to possibly /implement a garbage collector in/, when
it should be needed for a higher level language.

This is about layering: One does not complain that features of a
higher layer are not present in a lower layer, because this would
break the layering. When one wants to use Perl, Java, or LISP
instead of C, one is always free to do so.

There also is the Boehm-Demers-Weiser conservative garbage collector.
 
Rui Maciel

Jase said:
Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection? Convenient memory management would
be a much greater enhancement than replicating a perfectly good existing
library it seems to me.

Why do you believe that garbage collection should be added to the C
standard?


Rui Maciel
 
BGB

Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection? Convenient memory management would
be a much greater enhancement than replicating a perfectly good existing
library it seems to me.

more people can agree on the behavior of a threading library than they
can on the behavior of a GC library?...

a few possible issues with a GC:
precise or conservative GC?
how does it represent references?
how does it interact with stack variables, global variables, or malloc?
does it simply behave like malloc, or does it also preserve type
information?
....

some of these issues would need to be addressed, and as-is, people get
by ok either using or implementing garbage-collection via libraries.
 
Nobody

Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection?

GC requires a "walled garden", which C isn't.

You can't just "add support" for GC, you have to design the language
around the ability to enumerate references.
 
jacob navia

On 20/07/12 23:14, Jase Schick wrote:
Hi, can anyone explain why C has added support for pthreads, while NOT
adding support for garbage collection? Convenient memory management would
be a much greater enhancement than replicating a perfectly good existing
library it seems to me.

Jase
The lcc-win compiler offers a garbage collector (Boehm's) in its
standard distribution. It is a very useful feature, used for instance in
the debugger of lcc-win, in the IDE and several other applications. Of
course it is used by many of the people that have downloaded lcc-win
(more than 1 million).
 
Quentin Pope

On 20/07/12 23:14, Jase Schick wrote:
The lcc-win compiler offers a garbage collector (Boehm's) in its
standard distribution. It is a very useful feature, used for instance in
the debugger of lcc-win, in the IDE and several other applications. Of
course it is used by many of the people that have downloaded lcc-win
(more than 1 million).

Do you never get tired of spamming this group with advertising for your
compiler?

Adding garbage collection would break a large amount of existing code.

Often the bottom couple of bits of pointers to memory with known
alignment properties will be used to store information (the pointer then
being and'ed with ~0x3ul or similar prior to dereferencing).
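A minimal sketch of the low-bit tagging trick described above, assuming the tagged objects are at least 4-byte aligned. The helper names `tag_ptr`, `ptr_tag` and `untag_ptr` are hypothetical. C does not allow `|` or `&` on pointer types, so the sketch goes through `uintptr_t`; how pointers map to integers is implementation-defined, although this works on mainstream flat-address platforms. While a tag is set, the stored word is not the object's start address, which is why a collector that only recognizes pointers to the start of an object could miss it.

```c
#include <stdint.h>

/* Hypothetical helpers: pack a 2-bit tag into the low bits of a pointer
   to an object that is assumed to be at least 4-byte aligned. */
#define TAG_MASK ((uintptr_t)0x3)

static void *tag_ptr(void *p, unsigned tag)
{
    /* Assumes the two low bits of (uintptr_t)p are clear. */
    return (void *)((uintptr_t)p | ((uintptr_t)tag & TAG_MASK));
}

static unsigned ptr_tag(const void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

static void *untag_ptr(void *p)
{
    /* The "and with ~0x3ul" step, done on an integer rather than a pointer. */
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```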

Many code protection methods rely on storing pointers xor'd with an
obfuscating mask. GCs are not sophisticated enough to track such pointers.
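For comparison, a sketch of the XOR disguise, assuming an arbitrary fixed mask (`PTR_MASK`, `hide_ptr` and `reveal_ptr` are hypothetical names; real protection schemes typically choose the mask at run time). Once the plain pointer has been overwritten, the only record of the object is a word that does not look like its address, so a scanning collector has nothing to find.

```c
#include <stdint.h>

/* Hypothetical fixed mask; an assumption made only for illustration. */
static const uintptr_t PTR_MASK = (uintptr_t)0x5A5A5A5Au;

/* The stored value is the address XOR'ed with the mask, so it no longer
   equals the object's address and a scanning GC will not recognize it. */
static uintptr_t hide_ptr(void *p)
{
    return (uintptr_t)p ^ PTR_MASK;
}

/* XOR with the same mask again recovers the original address. */
static void *reveal_ptr(uintptr_t h)
{
    return (void *)(h ^ PTR_MASK);
}
```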

And what is the gain? With careful programming, there is no need
whatsoever for this stupid overhead. Leave it for the kiddies programming
JAVA.

//QP
 
jacob navia

On 21/07/12 12:23, Quentin Pope wrote:
Do you never get tired of spamming this group with advertising for your
compiler?

Adding garbage collection would break a large amount of existing code.

To port existing code to a GC environment you do not need to change a
single line. Just define malloc as gc_malloc and define free as a noop.

Often the bottom couple of bits of pointers to memory with known
alignment properties will be used to store information (the pointer then
being and'ed with ~0x3ul or similar prior to dereferencing).

??? That is not the case with the GC used by lcc-win.

Many code protection methods rely on storing pointers xor'd with an
obfuscating mask. GCs are not sophisticated enough to track such pointers.

Yes, that kind of code shouldn't be used with a GC.

And what is the gain?

The gain is that instead of losing endless hours tracking that dangling
pointer in the debugger you can concentrate on your application instead.

With careful programming, there is no need
whatsoever for this stupid overhead.

You fail to mention "With careful programming and not making any
mistake. NEVER." A single moment of inattention and you are screwed.


Leave it for the kiddies programming

JAVA, Lisp, C++, C, all the languages that can be used with a collector.
 
Malcolm McLean

On Saturday, 21 July 2012 11:23:52 UTC+1, Quentin Pope wrote:
Often the bottom couple of bits of pointers to memory with known
alignment properties will be used to store information (the pointer then
being and'ed with ~0x3ul or similar prior to dereferencing).

Many code protection methods rely on storing pointers xored with an
obfuscating mask. GCs are not sophisticated enough to track such pointers.

Rarely do you need to do this sort of thing, particularly in a hosted environment.

The gain is that far too much C code is concerned with handling memory allocation failures and clean-up that can't happen. For instance, the user provides a list of filenames in a configuration file. I need to read them in and return them as a list of strings. If a memory allocation failure occurs halfway through building the list, I've got to deallocate a half-built list and return an error condition, probably a null pointer. The code to handle this will probably be about half the function, even though, if I've 4GB of memory installed and the total allocation is 1000 bytes, the computer is more likely to suffer an electrical failure than it is to run out of memory.

Then you've got to write a little function just to deallocate the list of strings in normal use.
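A sketch of the kind of function Malcolm describes, with an array of names standing in for the configuration file (the names `copy_string_list` and `free_string_list` are hypothetical). A large share of the code exists only to undo a partial build when an allocation fails, plus the separate deallocation function needed in normal use.

```c
#include <stdlib.h>
#include <string.h>

/* Frees a NULL-terminated list of strings: the "little function"
   needed in normal use. */
static void free_string_list(char **list)
{
    if (list == NULL)
        return;
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/* Copies n strings into a new NULL-terminated list. Returns NULL if any
   allocation fails, after releasing the half-built list. */
static char **copy_string_list(const char *const *src, size_t n)
{
    char **list = malloc((n + 1) * sizeof *list);
    if (list == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        list[i] = malloc(len);
        if (list[i] == NULL) {
            /* list[i] is NULL, so it already terminates the partial list;
               free everything built so far and report failure. */
            free_string_list(list);
            return NULL;
        }
        memcpy(list[i], src[i], len);
    }
    list[n] = NULL;
    return list;
}
```

With a collector, the failure branch would simply return NULL, and `free_string_list` would not be needed at all.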

The reason I don't use garbage collection is a) it's non-standard and b) most garbage collectors are unacceptably inefficient for high performance routines. But it's a blessing from the coding angle.
 
BGB

GC requires a "walled garden", which C isn't.

You can't just "add support" for GC, you have to design the language
around the ability to enumerate references.

actually, it can be done, but mostly in the form of conservative GCs,
such as Boehm.

I also use a GC, which is functionally similar to the Boehm GC as well,
but differs mostly in the use of type-tagging (and some ability to
assign special treatment to type-tags), so it is a little different.

I ended up primarily using it with a hybrid strategy, where manual
memory-management is used for the most part, and GC is used mostly to
clean up for the case of memory leaks.
 
BGB

Jacob, what rules would have to be added for application writers
to use the GC in lcc-win besides the C standard and "don't invoke
undefined behavior"? Hint: the answer is not "none". And that's
not intended as a put-down of lcc's GC or GC in general.

Or, to put it another way which I'm sure you will still think of
as a personal attack, list the things an application programmer
could do to break GC that no sane programmer would do (someone will
come up with a sane-sounding reason for doing it anyway) but the C
standard still allows. If the C standard calls it undefined behavior
on the part of the application, GC is off the hook.

I'll define "break GC" as any one or more of the following:
- Collecting non-garbage
- Aborting the program with things like segfaults.
- A 1,000,000% slowdown
- A GC so conservative it never collects anything.

in my case, the policy is basically just to give code the choice.
if it wants to use malloc, it can use malloc;
if it wants to use the GC, it can use the GC.

the GC won't generally trace through memory allocated via malloc, though.
in my case, there is a "gcmalloc()" call, which is behaviorally similar
to malloc (it won't automatically release the memory), except that the GC
will trace through it.

putting pointers to malloc-managed memory in GC-managed objects works,
just the GC will ignore them.

it is possible to register behavior hooks with the GC to allow
custom-managed memory regions.


usually, the GC won't collect anything until after a certain amount of
memory is used (the limit can be set by the program), so if the app keeps
its memory use below this limit, the GC won't run. this currently leads
to setting the limit at something like 1GB, which makes it fairly
unlikely that the GC will run at all, except in cases where the app
"springs a leak".
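A toy model of the threshold policy described above, under the simplifying assumption that the GC counts bytes allocated since the last collection and requests a collection once a program-set limit is crossed. `struct gc_policy`, `gc_note_alloc` and `gc_note_collected` are hypothetical; they are not BGB's or Boehm's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a "collect only past a limit" policy. */
struct gc_policy {
    size_t bytes_since_gc;  /* bytes allocated since the last collection */
    size_t threshold;       /* program-settable limit, e.g. 1 GB */
};

/* Records an allocation; returns true when a collection should run. */
static bool gc_note_alloc(struct gc_policy *p, size_t nbytes)
{
    p->bytes_since_gc += nbytes;
    return p->bytes_since_gc >= p->threshold;
}

/* Resets the counter after a collection has completed. */
static void gc_note_collected(struct gc_policy *p)
{
    p->bytes_since_gc = 0;
}
```

A program that stays below `threshold` never triggers a collection, which matches the "GC won't run unless the app leaks" behavior described in the post.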

I'm pretty sure that writing pointers to a (potentially terabyte)
temporary file, erasing the pointers from memory, and later reading
the pointers back from the file (all done by the same run of the
same program) will either confuse or horribly slow down GC. And I
think that's perfectly OK with the C standard.

actually, more likely, it will become timing dependent:
if the GC runs during the time the pointers no longer exist, the objects
may be freed;
if it does not, nothing happens (the pointers will just come back in
pointing at the same objects).

in the case where the pointers are read back in for objects which have
been freed, then essentially they are just dangling pointers (and the GC
will either ignore them, or treat them as if they point to whatever new
object was allocated at that address).


this case only really matters if the GC is being used to replace malloc
or similar.

otherwise, a person could declare that the case of temporarily hiding
and then restoring pointers is itself undefined-behavior.

I take that to mean: often an application program will abuse the bottom
couple of bits of pointers to memory with known alignment properties to
store information, and this will break the GC. I think doing that to
pointers invokes undefined behavior, at least if you do stuff like
void *ptr;
ptr |= 3;   /* or: ptr &= ~3; */

so the GC has a perfectly good excuse for breaking.

my GC will handle this case just fine (my GC is more conservative, and
so will by-default treat a pointer to anywhere in the object as if it
were a GC reference). likewise, most GC operations will accept these
pointers as well, and there is also a "gcGetBase()" operation mostly
for this case: it allows taking a pointer into an object and
getting the starting address of the object.
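A minimal sketch of what an operation like `gcGetBase()` has to do in a conservative collector that accepts interior pointers: find the allocated object whose address range contains a given address. The sorted object table, `struct gc_obj` and `find_base` are hypothetical, since BGB's implementation is not shown in the thread. Addresses are handled as plain integers, which also lets the usage below rely on made-up values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kept by the GC for each allocated object. */
struct gc_obj {
    uintptr_t base;  /* start address of the object */
    size_t size;     /* size in bytes */
};

/* Returns the start of the object containing addr, or 0 if addr is not
   inside any object. objs must be sorted by base and non-overlapping. */
static uintptr_t find_base(const struct gc_obj *objs, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;

    /* Binary search: afterwards lo is the number of objects with base <= addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (objs[mid].base <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;  /* addr lies below every object */

    const struct gc_obj *o = &objs[lo - 1];
    return (addr - o->base < o->size) ? o->base : 0;
}
```

For example, with objects at 100 (16 bytes) and 200 (32 bytes), `find_base` maps 115 to 100 and 231 to 200, but returns 0 for 116 because it falls in the gap.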


I don't know how Boehm handles it exactly, but IIRC there are script VMs
which use Boehm and use similar tagging schemes.


I don't personally use this sort of tagging in my VM, instead having the
GC keep track of the type, so whenever the VM needs to do something, it
will fetch the type-name for an object, or maybe fetch an associated
type-vtable (the VM has tables of function-pointers used to represent
common operations on objects of various types).

note that there is also a Class/Instance OO system, but this uses its
own independent vtables, and these objects exist as a single type from
the POV of the GC.

Code protection methods are designed to break the application, but
the C standard doesn't forbid that.

simple: label the case of XOR'ed pointers + GC'ed objects as undefined.

I'm not sure the C standard allows that specific method of disguising
pointers, but I think you are allowed to format a pointer with
sprintf() and %p, erase the original pointer, then later feed the
buffer sprintf() wrote into sscanf() and %p, and get back the same
pointer. While it's in text form, you could do all sorts of things
to it (encrypt, store it in a file, etc.) , as long as it eventually
gets back to the original text.
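A sketch of the round trip Gordon describes, using only standard library calls: `snprintf` with `%p`, then `sscanf` with `%p`. The C standard guarantees that a pointer read back this way compares equal to the original when the text was produced earlier in the same program execution. The function name `roundtrip_ptr_text` and the 64-byte buffer size are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Formats p as text with %p, then parses the text back with %p.
   Returns false if formatting or parsing fails. */
static bool roundtrip_ptr_text(void *p, void **out)
{
    char buf[64];  /* assumed large enough for %p output */
    int len = snprintf(buf, sizeof buf, "%p", p);
    if (len < 0 || (size_t)len >= sizeof buf)
        return false;

    /* In Gordon's scenario the program would now erase every binary copy
       of the pointer, leaving only this text, which a scanning GC does
       not recognize as a reference. */
    return sscanf(buf, "%p", out) == 1;
}
```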

possible, but this case can also be labeled as undefined with GC
objects, or maybe, it can be restricted to certain cases, such as it is
only allowed if the objects are pinned/locked beforehand, or if they
were allocated either with malloc or a malloc-like call (and so will not
be implicitly freed).

People want to know how to recognize "that kind of code" and not
try to use GC with it, rather than trying it and figuring it out
the hard way. And I don't really see a "don't disguise pointers"
rule as being too onerous if you can clearly define what *ISN'T*
disguised. Probably 99.9% of programs would need no change.

in my case, it is more like:
pointers stored in global variables are visible (1);
pointers stored in stack variables are visible (2);
pointers stored in other GC-managed objects are visible.


1: on both Windows and Linux, the GC will walk the list of loaded
modules and scan the ".data" and ".bss" sections and similar. this also
includes any "static" variables within functions.

2: this can be a little harder. in my case, the GC supplies its own
thread-creation functions, but it is possible to do so without doing
this (for example, Boehm walks the OS thread list AFAIK). there is a
difficulty with knowing the exact stack address on WoW64 targets (an
issue for Boehm AFAIK), but my GC uses a different strategy: it just
scans the stack until either the known limit is encountered, or Windows
throws a SEH exception (when the scan function tries to scan into an
uninitialized guard page), which the function catches and handles.
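A minimal sketch of the conservative scanning step behind (1) and (2), assuming a single contiguous heap range: every word-sized chunk of a region (a data section, or a thread's stack) is read as an integer, and any value that falls inside the heap counts as a possible reference. `scan_region` is a hypothetical name, and a real collector would mark the containing object rather than count hits. Because any bit pattern that happens to look like a heap address is treated as a reference, such a scan can keep garbage alive, which is what makes it conservative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical conservative scan: read [region, region + len) one word
   at a time and count the words whose value lies in [heap_lo, heap_hi). */
static size_t scan_region(const unsigned char *region, size_t len,
                          uintptr_t heap_lo, uintptr_t heap_hi)
{
    size_t hits = 0;

    for (size_t off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, region + off, sizeof word);  /* safe for unaligned regions */
        if (word >= heap_lo && word < heap_hi)
            hits++;  /* a real GC would mark the object containing `word` */
    }
    return hits;
}
```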

Oh, yes, another rule for a working GC is that the GC has to
occasionally actually collect some garbage if there's any around
to collect. It doesn't have to catch everything immediately, but
code like:

#define malloc(x) gc_malloc(x)
int main(void)
{
    while (malloc(6)) { /* loop */ }
}
shouldn't terminate the loop because malloc() eventually returns NULL
due to insufficient collection of garbage.

usually what happens in this case is that the caller thread will block
until the GC thread can run and finish (running out of memory triggers
an immediate GC).
 
B

Ben Bacarisse

BGB said:
actually, it can be done, but mostly in the form of conservative GCs,
such as Boehm.

People are, I think, talking at cross purposes. Those who say "it can
be done" mean it can be done in a way that works for some (possibly
large) set of programs. Those that say it can't be done mean that there
are correct C programs which will break when linked, unchanged, against a
collecting implementation of malloc.

The discussion would involve much less pointless back-and-forth if
everyone started out by agreeing that it can work for a lot of programs
but it can't work for all programs. The debate could then be about the
kinds of program for which GC might or might not be useful.

<snip>
 
jacob navia

On 21/07/12 15:57, Gordon Burditt wrote:
Jacob, what rules would have to be added for application writers
to use the GC in lcc-win besides the C standard and "don't invoke
undefined behavior"? Hint: the answer is not "none". And that's
not intended as a put-down of lcc's GC or GC in general.

1) Do not hide pointers to the collector, i.e. the collector will NOT
search the window extra bytes (for instance) for pointers, or the hard
disk. If you need to XOR pointers to hide them do not use the collector
either.

2) In modern machines a collector slows down the program for at most
a millisecond in normal situations. That could be bigger, but not much
bigger, since the collector tries to spread out the GC time in each
allocation.

That's all.

jacob
 
jacob navia

On 21/07/12 15:57, Gordon Burditt wrote:
- Collecting non-garbage
- Aborting the program with things like segfaults.
- A 1,000,000% slowdown
- A GC so conservative it never collects anything.

You can collect non garbage when you hide the pointers
from the collector using XOR. Then the collector will
assume that all the pointers not referenced are free
and havoc will ensue.

Also, if you always keep pointers to everything, never setting them to
NULL, and all pointers are global (or the roots are in the main()
function), the collector will never collect anything.

For instance a global list will never be collected if its root is a
global pointer. To collect that you have to set that pointer to NULL.
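A toy mark phase illustrating Jacob's point about roots, assuming a heap small enough to keep as an array of list nodes (`struct node`, `list_root`, `mark_from` and `sweep_count` are hypothetical). As long as the root points at the head of the list, every node is reachable and nothing is garbage; once the root is set to NULL, the whole list becomes collectable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node in a toy heap; `marked` is the GC mark bit. */
struct node {
    struct node *next;
    bool marked;
};

/* Stands in for the global root pointer of the list. */
static struct node *list_root = NULL;

/* Marks every node reachable from root; stops at already-marked nodes,
   so cycles are handled. */
static void mark_from(struct node *root)
{
    for (struct node *n = root; n != NULL && !n->marked; n = n->next)
        n->marked = true;
}

/* Counts unmarked nodes (what a sweep would free) and clears all marks
   for the next cycle. */
static size_t sweep_count(struct node *heap, size_t n)
{
    size_t garbage = 0;
    for (size_t i = 0; i < n; i++) {
        if (!heap[i].marked)
            garbage++;
        heap[i].marked = false;
    }
    return garbage;
}
```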
 
BGB

People are, I think, talking at cross purposes. Those who say "it can
be done" mean it can be done in a way that works for some (possibly
large) set of programs. Those that say it can't be done mean that there
are correct C programs which will break when linked, unchanged, against a
collecting implementation of malloc.

and, I wasn't thinking about replacing "malloc()" in the first place,
rather my GC works via its own calls ("gcalloc()", "gctalloc()",
"gcmalloc()", "gcfree()", ...).

in this case, linking most C programs unchanged against the GC wouldn't
change or break anything, but they wouldn't be using the GC in this case
either...

The discussion would involve much less pointless back-and-forth if
everyone started out by agreeing that it can work for a lot of programs
but it can't work for all programs. The debate could then be about the
kinds of program for which GC might or might not be useful.

or, possibly, what exact form the GC would take in the first place...
 
Malcolm McLean

On Saturday, 21 July 2012 17:38:12 UTC+1, Ben Bacarisse wrote:
BGB writes:

People are, I think, talking at cross purposes. Those who say 'it can
be done' mean it can be done in a way that works for some (possibly
large) set of programs. Those that say it can't be done mean that there
are correct C programs which will break when linked, unchanged, against a
collecting implementation of malloc.
In C a pointer is valid if saved to a file in binary form then read back in again. It's invalid if read back into a second instance of the program. So the standard needs a tweak to make the first situation also invalid.
I don't see that as a problem, except for very specialised memory paging software that lives deep down in the bowels of the operating system. That's generally written in a non-standard version of C anyway.
 
Ben Bacarisse

Malcolm McLean said:
On Saturday, 21 July 2012 17:38:12 UTC+1, Ben Bacarisse wrote:
In C a pointer is valid if saved to a file in binary form then read
back in again. It's invalid if read back into a second instance of
the program. So the standard needs a tweak to make the first situation
also invalid.

Why? What would be the benefit?

<snip>
 
Nobody

Why? What would be the benefit?

What would be the benefit in writing a pointer to a file, or to making
that invalid?

Writing a pointer to a file may be useful for virtual memory systems such
as that used by the Win16 API.

Forbidding writing pointers to files would eliminate one possible
mechanism whereby "transparent" GC would fail.
 
Tim Rentsch

christian.bau said:
Interestingly, Apple just killed garbage collection in their Objective-
C compilers and never moved support for GC to the iPhone.
So someone there thinks that garbage collection isn't _that_ useful.

I believe that is consequent to a confluence of (a) wanting the
environments on the iPhone and MacOS to be the same, (b) needing a
certain level of real-time response on the iPhone, and (c) adopting a
different approach to resource management that allows reference
counting to be used rather than general GC.
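A minimal sketch of plain reference counting in C, to illustrate the alternative to tracing GC mentioned in (c). The `rc_` names are hypothetical, this is not Apple's mechanism, and the counter is not thread-safe. Objects are freed as soon as the last reference is released, so there is no separate collection pass to schedule; the known cost is that reference cycles are never freed.

```c
#include <stdlib.h>

/* Hypothetical reference-counted object. */
struct rc_obj {
    size_t refcount;
    int payload;
};

/* Creates an object owned by the caller (refcount starts at 1). */
static struct rc_obj *rc_new(int payload)
{
    struct rc_obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->payload = payload;
    }
    return o;
}

/* Adds a reference and returns the same object for convenience. */
static struct rc_obj *rc_retain(struct rc_obj *o)
{
    o->refcount++;
    return o;
}

/* Drops a reference; frees the object when the count reaches zero.
   Returns 1 if this call freed the object, 0 otherwise. */
static int rc_release(struct rc_obj *o)
{
    if (--o->refcount == 0) {
        free(o);
        return 1;
    }
    return 0;
}
```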

It isn't hard to write a high-performance, general-purpose GC. It's
much harder to write a high-performance GC that observes severe
real-time constraints.
And sorry to say, but C wasn't designed with garbage collection in
mind.

But that doesn't preclude GC being either feasible or practical in
a C environment.
 
