Smart Pointers: Is there something similar to smart pointers in C?

Keith Thompson · Sep 13, 2006

Rod Pemberton said:
No. That idea is completely incorrect.

You're saying that the distinction between C and lcc-win32 is *not* an
important one? Fascinating.

You'll never find a post
from Doug Gwyn (comp.std.c, ANSI C X3J11 standard, developer of the
Army's BRL-UNIX in ANSI C) where his explanation doesn't take the
underlying assembly, physical hardware such as the cpu and memory
into account.
Nonsense.

[snip]

C is built upon assembly. The abstraction of C from assembly (which you're
promoting) is the problem. It's why almost no one here understands what is
and isn't actually pointer in assembly... You need to learn assembly first
to truly understand C. I honestly doubt that any C programmer could write a
C compiler, even when given the additional advantage of using an existing C
compiler, without understanding assembly.

More nonsense.

C, as defined by the ISO C standard, works on an "abstract machine".
It is not necessary to understand the underlying physical CPU to
understand how a portable C program works. Which is very fortunate,
because different hardware works differently; if C programmers
actually had to understand all possible hardware, there would be very
few C programmers.

William Hughes · Sep 13, 2006

Rod said:
No. That idea is completely incorrect.

You seem to be saying that C and C compilers are the same thing,
or at least you can't really understand C without understanding
a C compiler. Many people would disagree.

You'll never find a post from Doug
Gwyn (comp.std.c, ANSI C X3J11 standard, developer of the Army's BRL-UNIX
in ANSI C) where his explanation doesn't take the underlying assembly,
physical hardware such as the cpu and memory into account.

Untrue, and where it is true, irrelevant. There are many aspects of
defining the C standard where it is crucial to consider the underlying
hardware.

This is what Larry Rosler (another contributor to ANSI C X3J11) says about
C:
"I taught the first course on C at Bell Labs, using a draft of K&R, which
helped vet the exercises. The students were hardware engineers who were
being induced to learn programming. They found C (which is 'portable
assembly language') much to their liking. Essentials such as pointers are
very clear if you have a machine model in mind."

Well, I don't know anyone who disagrees with this (knowing a machine
model helps in some ways to understand pointers), but Larry Rosler's
comment does not imply that you need to understand a machine model
to understand pointers. It does not deal with the question of whether
there are disadvantages as well as advantages to knowing a machine model
(the most obvious putative disadvantage is the tendency to believe
that pointers must behave exactly like addresses on machine X).

C is built upon assembly. The abstraction of C from assembly (which you're
promoting) is the problem. It's why almost no one here understands what is
and isn't actually pointer in assembly...

There is no such thing as "assembly"; there are many assembly languages.
There is no way to know "what is and isn't actually pointer in assembly".
On the other hand, it is possible to know "what is and isn't actually
pointer in C".

You need to learn assembly first
to truly understand C. I honestly doubt that any C programmer could write a
C compiler, even when given the additional advantage of using an existing C
compiler, without understanding assembly.

[Trivially untrue: just copy the existing C compiler.] So what? Most
people do not write compilers. The question is, "Can a C programmer
write a good C program without understanding assembly?". I, and many
others, would answer yes to that question.

-William Hughes

Mark McIntyre · Sep 13, 2006

The next time you run out of fuel and stall on a railroad crossing,
and want to walk the car out of the way of the train with the
starter motor, please describe how you do that when equipped with a
slush box.

I put mine into 4wd and use the starting handle...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Mark McIntyre · Sep 13, 2006

slush box?

fluid flywheel, the device that automatics rely on to let you change
gear without scrunching all the little cog thingys.


Mark McIntyre · Sep 13, 2006

Having driven for a great many years, I am pleased to report that I have
never, ever stalled on a level crossing. This is not mere luck. It is the
result of forethought.

Yeah, Dr Beeching's....

oof.

websnarf · Sep 14, 2006

Ancient_Hacker said:
IMHO you can't do that in C. C gives you complete freedom to make
copies of pointers, do pointer arithmetic, pass the address of a
pointer, call arbitrary functions written in bizarre languages--- all
things that will screw up smart pointers and garbage collection to a
fare-thee-well, or at least a seg fault.

What I do is write a logging malloc() and free() so at the end of the
program it can print out "37122 unfreed blocks using 293455128 bytes".
And then a list of file names and lines where those blocks were
malloc'ed.

That's a good approach, which I've used myself. But you can do more --
detect double frees on the fly, as well as frees/reallocs of defective
pointers, and take note of frees/reallocs of NULL.

In place of declarations and clearings you can create macros that
auto-initialize and set to NULL after free. For example, in Bstrlib I
give the following macros:

#define bstrDeclare(b) bstring (b) = NULL;
#define bstrFree(b) do {if ((b) != NULL && (b)->slen >= 0 && \
(b)->mlen >= (b)->slen) { bdestroy (b); (b) = NULL; }} while (0)

So in this way you can *envelope* a pointer so that under ordinary
circumstances its contents always reflect either legal or NULL
contents. It's a bit of a pain to do this for every data type, but
bstrings are a particularly good target since I use them as freely as
any other primitive data type, yet they still carry with them all the
difficulties that come with pointers.

It's not quite "smart pointers", but in reality you end up close enough
that for practical purposes the differences don't much matter.

websnarf · Sep 14, 2006

Spiro said:
Ancient_Hacker said:

Please explain exactly how garbage collection can ever work with C in
these cases: [...]
Now please explain how the memory allocated for p1 gets preserved, or
collected, at the proper times, as the case may be. Hint: velly velly
difficult to impossible.


I think this will work. lcc-win32 uses Boehm's garbage collector, cf.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/
[...]

The most interesting part: this GC is not perfect (contrary to what Jacob
wants us to believe), so chances are that you will end up with some
long-term memory leaks.

First of all, that's not what it says. It says there are short-term
leaks, or finite spillage in the long term. Neither of these is a serious
issue in real-world environments, where leaks only threaten long-running
applications with an accumulating leak problem.

Second of all, Boehm's GC mechanism is *portable* (in the sense that it
has been ported, like Berkeley DB) and solves the problem without
compile-time assistance. It is *possible* through Jacob's approach
(since he is maintaining both a compiler and a library) to use a lot of
compile-time assistance to greatly enhance the performance and accuracy
of a GC for C.

For example, at compile time you can determine that some locally scoped
pointers of a function never escape into returned values or static
memory; if so, the compiler can simply always use alloca() (allocate
off the stack) in place of malloc() when storing to those pointers.
Also, if the compiler can determine that the programmer within a
function has allocated then properly freed a pointer with no
possibility of leaking, it can switch to using a non-GC malloc (i.e.,
use memory that the GC will not try to analyze.)

This website of Boehm is very interesting, and you can see the
advantages and the disadvantages of this GC; thus, you can decide if you
want to use it or not.

Anyway, as it clearly is not part of any "official" version of C
(contrary to what Jacob states), I consider it off-topic. Thus, I will not
post on this topic here any more.

Where do you suppose the *implementation* of the C language should be
discussed? Keep in mind that comp.compilers couldn't care less about
standard library implementations.

Rod Pemberton · Sep 14, 2006

Keith Thompson said:
You're saying that the distinction between C and lcc-win32 is *not* an
important one? Fascinating.

Really? You disagree with me here and then turn around and (indirectly)
agree with what I just said:
KT> C, as defined by the ISO C standard, works on an "abstract machine".

But, of course, your IQ is high enough and your experience is deep enough
that you understood that you can't separate one from the other, didn't you?

Rod Pemberton

Rod Pemberton · Sep 14, 2006

Richard Heathfield said:
Rod Pemberton said:

Yes, I know who Doug Gwyn is, and he is very much a respected, albeit
occasional, contributor to this newsgroup. I am very familiar with his
writing style. And I can say without a shadow of a doubt that you are
absolutely and utterly mistaken. Doug Gwyn has written a great many
articles that don't even mention the underlying assembly, physical hardware
such as the cpu, or memory (in the stuff-you-can-kick sense), let alone
"take them into account", so your claim that he never does so is quite
wrong.

You've misinterpreted this statement: "his explanation doesn't take the
underlying...into account" to mean that he _explicitly_ "mention(s) the
underlying assembly, physical hardware". That isn't even close to what I
said. He doesn't always explicitly mention them. But, his answers are
always worded to work properly with them. I've seen numerous laughable
answers from Plauger where he doesn't make sure that his answers comply with
assembly or hardware implementations.

What this also tells me is:
either A) you have little, if any, assembly experience
or B) if you do, you've failed to fully comprehend what you read
(a problem I think I've complained to you about numerous times before, haven't I?)

Furthermore, I am quite sure Doug Gwyn would agree fully with me that the
distinction between "C" and "lcc-win32" is an important one. Why don't you
ask him?

Read my reply to Keith on this same issue... You do agree with Keith, don't
you?

Rod Pemberton

Richard Heathfield · Sep 14, 2006

Rod Pemberton said:

Really? You disagree with me here and then turn around and (indirectly)
agree with what I just said:
KT> C, as defined by the ISO C standard, works on an "abstract machine".

strcmp("abstract machine", "lcc-win32") is not 0. Keith is not agreeing with
you either directly or indirectly.

<snip>

Richard Heathfield · Sep 14, 2006

Rod Pemberton said:

You've misinterpreted this statement: "his explanation doesn't take the
underlying...into account" to mean that he _explicitly_ "mention(s) the
underlying assembly, physical hardware". That isn't even close to what I
said. He doesn't always explicitly mention them. But, his answers are
always worded to work properly with them.

That's because he answers questions about C with reference to the notional
abstract machine, rather than to specific implementations thereof. That is
precisely the point.

I've seen numerous laughable
answers from Plauger where he doesn't make sure that his answers comply
with assembly or hardware implementations.

If they comply with the abstract machine, then that is sufficient. It is the
C implementation's responsibility to ensure that correct programs are
translated in such a way as to work correctly on the target platform.

What this also tells me is:
either A) you have little, if any, assembly experience
or B) if you do, you've failed to fully comprehend what you read
(a problem I think I've complained to you about numerous times before, haven't I?)

Your inability to comprehend what I write does not imply my inability to
comprehend what you write.

Read my reply to Keith on this same issue... You do agree with Keith,
don't you?

Yes, and neither of us agrees with you on this occasion.

Ancient_Hacker · Sep 14, 2006

Spiro said:
I think this will work. lcc-win32 uses Boehm's garbage collector, cf.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/

Wow! I looked over the explanation, and it's very clever! It will
"kinda" work. Here's what it does:

The garbage collector scans the heap, stack and registers for things
that look like pointers. Anything that looks like a valid address
doesn't get collected. Heap blocks that nothing in memory seems to point
to are candidates for collection. It will kinda work, perhaps a
usable amount of the time, except:

(1) If you're using more than 1/65536th of the potential address
space, addresses will no longer be very distinctive, i.e. things like
zero-terminated strings will start looking like valid addresses. Once
you start using more than 1/256th of the address space, then even
string bodies and floats will start looking like addresses, which will
make the GC's job a lot harder (like, nearly impossible).

(2) If you pass a pointer to a system API, or pass a pointer in a
struct to a system API, the GC probably won't see the struct address,
or any addresses embedded in the struct. So things like callback
addresses, semaphore addresses, indirect block references and async-I/O
control blocks are all likely to be prematurely collected, leading to
major blammos. I know, "non-standard", you deserve anything that
happens. This applies to the run-time library also, so the GC has to
have hooks, either binary or source-code, into a good deal of the RTL.

(3) There's a potential "buffer-bashing" security hole: there are
internet worms out there that know about the four most common RTL heap
allocation schemes, and they plop down plausible-looking heap structures
in your web server input buffers. If the GC gets fooled by these,
almost anything can happen.

Richard Bos · Sep 14, 2006

Rod Pemberton said:
No. That idea is completely incorrect. You'll never find a post from Doug
Gwyn (comp.std.c, ANSI C X3J11 standard, developer of the Army's BRL-UNIX
in ANSI C) where his explanation doesn't take the underlying assembly,
physical hardware such as the cpu and memory into account.

You mean messages such as <[email protected]>;
<[email protected]>; <[email protected]>;

C is built upon assembly.

And assembly is built upon the movement of electrons in semi-conductors;
but that does not make Feynman diagrams on-topic in either an assembly
newsgroup, or in comp.lang.c.

Richard

William Hughes · Sep 14, 2006

Rod said:
Really? You disagree with me here and then turn around and (indirectly)
agree with what I just said:
KT> C, as defined by the ISO C standard, works on an "abstract machine".

But, of course, your IQ is high enough and your experience is deep enough
that you understood that you can't separate one from the other, didn't you?

Unfortunately your IQ is not high enough to separate the abstract from
the particular. (Note that there is no assembler for the abstract machine;
it is defined in terms of results, not methods.)

-William Hughes

Keith Thompson · Sep 14, 2006

Rod Pemberton said:
Really? You disagree with me here and then turn around and (indirectly)
agree with what I just said:
KT> C, as defined by the ISO C standard, works on an "abstract machine".

I have no idea how you interpret that to mean that I agree with you.
For the record, I do not agree with you.

See above for the specific statement of yours with which I disagreed.
Richard said that the distinction between "C" and "lcc-win32" is an
important one. You replied, "No. That idea is completely incorrect."
The only reasonable interpretation of that is that you don't believe
that the distinction between "C" and "lcc-win32" is an important one.

If that's not what you meant, please clarify.

But, of course, your IQ is high enough and your experience is deep
enough that you understood that you can't separate one from the
other, didn't you?

Nonsense. It's entirely possible to understand C without knowing
anything at all about lcc-win32. C existed long before lcc-win32 did;
I'm sure there were plenty of people who understood C at a time when
lcc-win32 didn't even exist.

Again, if you're making a claim other than one that there's no
important distinction between "C" and "lcc-win32", please say so.
(That's a surprising claim, since I thought your position was that a
knowledge of assembly language was critical to an understanding of C.)

In any case, you seem not to understand what an "abstraction" is. C
is defined on an abstract level, with little dependency on the
specifics of, for example, the underlying hardware. If I gathered a
group of people in a room with pencils and paper, and had them follow
the semantics of the C standard without using any computer hardware
(programs are submitted, and output is returned, as written text on
paper), the result, if done properly, would be a conforming C
implementation.

It's certainly true that the C abstract machine is *designed* to be
implementable on real-world hardware, and an understanding of one or
more assembly languages can certainly be helpful in understanding why
C is the way it is. But it's entirely possible to have a good
understanding of C based only on the abstract description in the
standard.

When I write C code, I don't think much about what happens within the
CPU when my program is executed. The CPU's job is to execute my code
and produce the required results. I don't need to know how it does
it.

Keith Thompson · Sep 14, 2006

Ancient_Hacker said:
Wow! I looked over the explanation, and it's very clever! It will
"kinda" work. Here's what it does:

The garbage collector scans the heap, stack and registers for things
that look like pointers. Anything that looks like a valid address
doesn't get collected. Heap blocks that nothing in memory seems to point
to are candidates for collection. It will kinda work, perhaps a
usable amount of the time, except:

(1) If you're using more than 1/65536th of the potential address
space, addresses will no longer be very distinctive, i.e. things like
zero-terminated strings will start looking like valid addresses. Once
you start using more than 1/256th of the address space, then even
string bodies and floats will start looking like addresses, which will
make the GC's job a lot harder (like, nearly impossible).

Disclaimer: I haven't read the web page, but (I think) I have a
general idea of how the GC works.

I don't *think* what you describe is going to be much of a problem in
practice, though it's certainly a theoretical problem. In practice, I
suspect that most valid addresses tend to have different bit patterns
than most other valid data. There will be a *few* arbitrary bit
patterns that happen to look like pointers, but not many. This just
means that garbage collection won't be 100% efficient, but it
shouldn't be far from it. There may be applications where anything
short of 100% efficiency is unacceptable; such applications either
can't use GC, or need to use some more intrusive form of it.

(2) If you pass a pointer to a system API, or pass a pointer in a
struct to a system API, the GC probably won't see the struct address,
or any addresses embedded in the struct. So things like callback
addresses, semaphore addresses, indirect block references and async-I/O
control blocks are all likely to be prematurely collected, leading to
major blammos. I know, "non-standard", you deserve anything that
happens. This applies to the run-time library also, so the GC has to
have hooks, either binary or source-code, into a good deal of the RTL.

That's an interesting point. Any time a pointer value is stashed away
where the GC can't see it, you have the potential of blocks being
collected prematurely. I've been thinking in terms of storing the
value in an external file, or breaking it down into bits or bytes
(e.g., by encrypting or compressing some chunk of data containing
pointers), but <OT>copying it into the kernel's memory space where GC
code, which runs in user mode, can't see it, is also likely to be an
issue</OT>. But I don't think it's likely that a program would pass a
pointer value to a system API *and forget it*. The application itself
would probably keep a copy of the pointer value in its own memory
space. And if the application is intended to work with GC, it must do
so.

[snip]

Tak-Shing Chan · Sep 15, 2006

If I gathered a
group of people in a room with pencils and paper, and had them follow
the semantics of the C standard without using any computer hardware
(programs are submitted, and output is returned, as written text on
paper), the result, if done properly, would be a conforming C
implementation.

To do this properly would require a team of infallible human
beings. But there is only one Dan Pop.

Tak-Shing

Ancient_Hacker · Sep 15, 2006

Keith said:
Disclaimer: I haven't read the web page, but (I think) I have a
general idea of how the GC works.

I don't *think* what you describe is going to be much of a problem in
practice, though it's certainly a theoretical problem. In practice, I
suspect that most valid addresses tend to have different bit patterns
than most other valid data.

Yep, many systems start handing out addresses that have a few of the
high address bytes as zeroes, so addresses tend to be discernible if
you don't push the address range very far. And some addresses tend to
be 2/4/8/16-byte aligned when first handed out, so that thins the valid
address range somewhat, at least until the program starts indexing into
arrays.

But as soon as you ask for more than 16 bits' worth of memory (64 KB),
the high byte on a 32-bit architecture will be zero, followed or
preceded by three non-zero bytes (depending on byte order), and that
will be mimicked by zero-terminated strings.

Worse yet, as soon as you ask for more than 24 bits' worth of memory
(16 MB), the high byte on a 32-bit architecture will be non-zero,
making many string bodies mimic addresses. Very bad.

I guess the moral is, when addresses start looking like data, switch to
64-bit compilers!

But I don't think it's likely that a program would pass a
pointer value to a system API *and forget it*.

Well, yes, that's usually true, but there are two or three thorny canonical problems:

(1) The app may keep the pointer, so the GC won't toss out the block,
but there are many OS's where you can pass in arbitrary arrays or
structs or even linked lists. For example a database app might pass an
array to the OS meaning "gather up these 3,200 pairs of random disk
blocks and put them in this other block of pointers to addresses in my
memory space". The GC would have (usually) no way to intercept that OS
call, and no intrinsic knowledge that the block passed has addresses in
it. Worse yet, the app need NOT keep a pointer to this request
structure, as in most cases the OS will asynchronously call back to the
user program, passing back the request block address, where the OS is
returning result codes. This is very common in Windows NT/XP. So
yipes, the app for a while may not have any trace of these addresses.

(2) There are OS calls to request the OS to allocate memory and return
the virtual address where it's given the app memory. The GC may not
have any way to hook this call and learn about those addresses.

So yes, this clever GC may be able to root around and figure out where
most blocks are, as long as addresses don't get too large, and apps
don't make any fancy OS calls. Whether this is a tenable situation
probably varies a lot from case to case.

I'd love to have a reliable GC for C. Last week I had what I thought
was a pretty clean C program, but when I used my malloc_watcher, at the
end it said "84,132 blocks using 68,321,144 bytes left dangling at
exit(0) time". I had forgotten to free() some large linked lists.
Sigh.

Frederick Gotham · Sep 15, 2006

Keith Thompson posted:

It's certainly true that the C abstract machine is *designed* to be
implementable on real-world hardware, and an understanding of one or
more assembly languages can certainly be helpful in understanding why
C is the way it is. But it's entirely possible to have a good
understanding of C based only on the abstract description in the
standard.

I myself haven't really got much of a clue about assembly language, the
call stack and so forth... but I can write some decent code.

Keith Thompson · Sep 15, 2006

Ancient_Hacker said:
Yep, many systems start handing out addresses that have a few of the
high address bytes as zeroes, so addresses tend to be discernible if
you don't push the address range very far. And some addresses tend to
be 2/4/8/16-byte aligned when first handed out, so that thins the valid
address range somewhat, at least until the program starts indexing into
arrays.

But as soon as you ask for more than 16 bits' worth of memory (64 KB),
the high byte on a 32-bit architecture will be zero, followed or
preceded by three non-zero bytes (depending on byte order), and that
will be mimicked by zero-terminated strings.

Worse yet, as soon as you ask for more than 24 bits' worth of memory
(16 MB), the high byte on a 32-bit architecture will be non-zero,
making many string bodies mimic addresses. Very bad.

You're making some assumptions about how addresses are allocated. In
my experience, they're not typically allocated starting at 0; the
address space for a given program tends to be sparse.

On one system, the following program:

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    static int Static;
    int Auto;
    int *Allocated = malloc(sizeof *Allocated);
    printf("&Static = %p\n", (void*)&Static);
    printf("&Auto = %p\n", (void*)&Auto);
    printf("Allocated = %p\n", (void*)Allocated);
    return 0;
}

produces the following output:

&Static = 0x804962c
&Auto = 0xbffc2804
Allocated = 0x860c008

And my vague intuition tells me that there are still enough
differences between what addresses tend to "smell like" vs. other
kinds of data that accidental matches are unlikely. For example, all
of the above addresses contain bytes with the high-order bit set; in a
program dealing mainly with ASCII characters, these byte values are
unlikely to appear in strings.

The only way to be sure of this, one way or the other, is to measure
it. I'm guessing someone has already done so.
