differentiating between pointers - "primary"?

M

mathog

Ideally in any program the compiler should do as much of the heavy
lifting as possible. When it comes to pointers to memory though, the C
compiler is not much help. After:

int *ibuf;
ibuf = (int *)malloc(sizeof(int)*128);

a program could do lots things with ibuf that could lead to
invalid program operation and the compiler wouldn't give any help.
For instance, later in this same module:

somefunction(ibuf); /* may do a free(ibuf) */

*ibuf=1; /* if ibuf was released, this would be bad */

To avoid these issues I wonder how far one could get with one extra
keyword (here "primary") and a small number of additional compiler rules
and standard functions. The associated memory functions would be like:

void primary *pmalloc(); /* also pcalloc, prealloc */
void pfree(void primary **ptr); /* frees and sets pointer to NULL */

The first set would allocate memory and return this special type of
pointer to it, while pfree would release this memory. The idea being that:

int primary *ibuf = (int primary *)pmalloc(sizeof(int)*128);
int *ibuf2=ibuf; /* compiler OK, can make non primary copies */
...
free(ibuf); /* compiler error (cannot free(primary) ) */
pfree(&ibuf2); /* compiler error (cannot pfree(non primary)) */
...
*ibuf2=1; /* no problem */
...
pfree(&ibuf); /* compiler OK with this*/
...
*ibuf=1; /* compiler error (assignment to deallocated memory) */
*ibuf2=1; /* compiler error (assignment to deallocated memory) */

or change from pfree() on down to:

/* pass primary out. The function may deallocate it,
so primary memory is no longer valid here */
somefunction(ibuf);
...
*ibuf=1; /* compiler warning (assignment to iffy memory) */
*ibuf2=1; /* compiler error (assignment to iffy memory) */

or change from pfree() on down to:

return{ibuf); /* compiler is OK with returning a primary */
}

but

return(1); /* compiler warning (missing pfree(primary) ) */
}

Naturally this becomes complicated once structures are involved because
the two pointer types can become mixed. Not clear to me if the compiler
could keep up with the complications or not. If
"astruct" contains:

int primary *ibuf;

and in a function

astruct->ibuf = (int primary *)pmalloc(sizeof(int)*128);
int *ibuf2=ibuf;
...
somefunction(astruct); /* This could pfree astruct->ibuf */
*(astruct->ibuf)=1; // compiler warning (assignment to iffy memory)
*ibuf2=1; //compiler warning (assignment to iffy memory)

This is where things might fall apart. If calling a function with a
struct containing primary memory references marks them all as "iffy" it
is going to be pretty much impossible to leave the "iffy" warning
enabled. However, since the only way to free a primary is through
pfree(), and that always resets the pointer to NULL, one could have:

somefunction(astruct);
if(astruct->ibuf){ // post test compiler knows primary is OK
*(astruct->ibuf)=1; // compiler OK
*ibuf2=1; //compiler OK
}

Would the requirement that pfree() set the primary pointer to NULL be
sufficient for the compiler to be able to catch the "iffy" errors,
and disable those warnings subsequent to a test? It seems to me that
at least on extra rule would be needed:

int primary *ibuf = (int primary *)pmalloc(sizeof(int)*128);
int *ibuf2=ibuf; //no problem
int primary *ibufp=ibuf; //compiler error (duplicates primary)

to get around that the compiler would need some special function or
other method to handle this case:

/* returns the pointer to primary memory and sets src to NULL */
void primary *pxfer(void primary **src);

so that after

astruct->ibufp= (int primary *)pxfer(&ibuf);

the compiler knows that ibufp contains the original ptr from ibuf, that
ibuf has been set to NULL, and that secondary pointers for ibuf are now
secondary pointers for ibufp.

Regards,

David Mathog
 
K

Kaz Kylheku

Ideally in any program the compiler should do as much of the heavy
lifting as possible. When it comes to pointers to memory though, the C
compiler is not much help. After:

int *ibuf;
ibuf = (int *)malloc(sizeof(int)*128);

a program could do lots things with ibuf that could lead to
invalid program operation and the compiler wouldn't give any help.
For instance, later in this same module:

somefunction(ibuf); /* may do a free(ibuf) */

*ibuf=1; /* if ibuf was released, this would be bad */

To avoid these issues I wonder how far one could get with one extra
keyword (here "primary") and a small number of additional compiler rules
and standard functions.

This idea is of little value because programs where one module "owns" certain
objects are trivially easy to get right without this kind of help (right,
that is, as far as freeing those objects only in that module).

You're not attacking any difficult memory management problem here.
int primary *ibuf = (int primary *)pmalloc(sizeof(int)*128);

This qualifier should really be on the pointer, and not what it points to:

int *primary buf = ...

Just like C99 restrict.
int *ibuf2=ibuf; /* compiler OK, can make non primary copies */

Now here is a problem:

some_function(ibuf2); /* ibuf retains the pointer */

free(ibuf); /* allowed */

some_other_function(); /* uses the previously retained ibuf pointer, oops! */


The tough problem is the correct determination of object lifetimes.

The primary qualifier does nothing to address that problem, and in fact its use
is fundamentally incompatible with programs that require complicated memory
management.

It is suitable only for use for object that can be managed with simple
ownership rules, which is an easy situation that can be managed trivially
without this sort of thing.

As you can see from the above, even in these kinds of programs, you can still
make mistakes which are not caught by the use of this pointer qualifier.

Another obvious thing you are overlooking is that when you have a module that
manages objects of a certain type (and also provides alloc/initialize
and deinitialize/free routines), it does not necessarily "own" the objects.

Those routines are just wrappers that other modules use.

So, suppose we have:

void widget_destroy(widget_t *w)
{
/* what are you going to do here? */
}

Of course, inside widget_destroy, you're going to have to call pfree, right?
So you're going to have to cast the pointer to (widget_t *primary).

If you don't want to do that, then you have to move the qualifier into the
parameter declaration:

void widget_destroy(widget_t * primary w);

but now, nobody can call widget_destroy unless they have a primary pointer.

So this means that either everyone does the cast (which destroys the primary
pointer scheme), or else the widget_create has to hand out primary pointers
(which also destroys the primary pointer scheme, since the idea is that
not everyone has the primary pointer!)

The problem is that although the widget module encapsulates the creation,
manipulation and finalization of these objects, it does not own them.

The primary scheme will basically only apply in situations where memory
is dynamically allocated, used and released in a single lexical scope.
pfree(&ibuf); /* compiler OK with this*/
...
*ibuf=1; /* compiler error (assignment to deallocated memory) */
*ibuf2=1; /* compiler error (assignment to deallocated memory) */

This is a tricky thing which requires the compiler to solve instances
of the halting problem in the general case.

You can only implement this diagnostic in some obvious situations, like here:
there is only one way to reach the *ibuf=1 code, and that control passes
through a block where pfree is applied to ibuf.

It's possible to code up situations in which *ibuf=1 occurs in code that
cannot be proved either way: whether it is reached through a path where
ibuf is freed.

This creates a language specification problem. Do you require a diagnostic
for situations in which this cannot be decided?

But what does that mean? What if it *can* be decided, but it takes three hours
of computation over some instance of a NP-complete problem?

You could implement something naive, such as if ibuf is used anywhere in the
function body where such use is in the lexical scope of a pfree(&ibuf),
then a diagnostic is required. So for instance this would still
require a diagnostic:

pfree(&ibuf)
goto skip_bad;
*ibuf = 1;
skip_bad:
;

The false positives from such a diagnostic will turn it into a nuisance.
 
B

Ben Pfaff

mathog said:
To avoid these issues I wonder how far one could get with one extra
keyword (here "primary") and a small number of additional compiler
rules and standard functions. The associated memory functions would
be like:

void primary *pmalloc(); /* also pcalloc, prealloc */
void pfree(void primary **ptr); /* frees and sets pointer to NULL */

The first set would allocate memory and return this special type of
pointer to it, while pfree would release this memory. The idea being
that:

int primary *ibuf = (int primary *)pmalloc(sizeof(int)*128);
int *ibuf2=ibuf; /* compiler OK, can make non primary copies */
...
free(ibuf); /* compiler error (cannot free(primary) ) */
pfree(&ibuf2); /* compiler error (cannot pfree(non primary)) */

You could do something like this with a few source code
annotations (probably via preprocessor macros) plus a C processor
like "sparse" which is very useful for verifying non-standard
type properties.
 
E

Eric Sosman

You could do something like this with a few source code
annotations (probably via preprocessor macros) plus a C processor
like "sparse" which is very useful for verifying non-standard
type properties.

Or just wrap the "primary" pointer in a struct:

struct primary {
void *ptr;
};
int pmalloc(struct primary *, size_t);
void pfree(struct primary *);

Fans of obfuscation could go even further, with something like:

struct primary {
unsigned char opaque_data[PRIMARY_DATA_SIZE];
};
// pmalloc, pfree as before
void *primary2plain(const struct primary *);

.... and swizzle the pointer bytes by XOR'ing them with the phase
of the Moon to discourage people from peeking into opacity.

But I'm not persuaded the outcome would be all that helpful.
Nothing in any scheme (so far) prevents deriving a plain pointer
from a primary and passing it to free() or realloc(). Nothing
prevents `ibuf2[128] = 42;'. The O.P. expressed a hope that the
compiler could somehow keep track of all the pointers derived from
a given primary, and know when they became (or might have become)
invalid, but I doubt there's much reason to hope, especially when
pointers are passed to and returned from external functions the
compiler cannot see.

Even the notion of "derived" is a little shaky. Here's a
function that removes a node's successor from a linked list and
returns a pointer to the removed node:

struct node *removeNext(struct node *predecessor) {
struct node *victim = predecessor->next;
if (victim != NULL) {
predecessor->next = victim->next;
victim->next = NULL; // prophylaxis
}
return victim;
}

Now, here's the question: If all the nodes are allocated via
primary pointers, and if the caller does pfree(&victim), is the
new value of predecessor->next invalidated? We only found it by
chasing a primary pointer, and that pointer has become invalid,
so ...? Compare and contrast with applying strchr() to a primary
char* and using the returned value: You'd clearly want to invalidate
that returned value after freeing the primary string, but you
should not invalidate predecessor->next, which was obtained in a
rather similar way. What rule would allow the compiler to tell
such cases apart?
 
B

Ben Pfaff

Eric Sosman said:
But I'm not persuaded the outcome would be all that helpful.
Nothing in any scheme (so far) prevents deriving a plain pointer
from a primary and passing it to free() or realloc(). Nothing
prevents `ibuf2[128] = 42;'. The O.P. expressed a hope that the
compiler could somehow keep track of all the pointers derived from
a given primary, and know when they became (or might have become)
invalid, but I doubt there's much reason to hope, especially when
pointers are passed to and returned from external functions the
compiler cannot see.

I agree that there is little or no hope in the general case. I
personally assumed the goal was to get better compiler warnings
in simple cases, since that seems like a goal that can be
accomplished.
 
M

mathog

Eric said:
struct node *removeNext(struct node *predecessor) {
struct node *victim = predecessor->next;
if (victim != NULL) {
predecessor->next = victim->next;
victim->next = NULL; // prophylaxis
}
return victim;
}

Now, here's the question: If all the nodes are allocated via
primary pointers, and if the caller does pfree(&victim), is the
new value of predecessor->next invalidated?

I don't see why it would be. Before the call there are
three primaries: A,B,C

A is predecessor
B is victim (also predecessor->next)
C is victim->next

at the end of the function call predecessor->next points to C, and C is
still valid even after pfree(&victim).

If we had not set predecessor->next to point to C, and just called
pfree(&victim), then in that case predecessor->next would be invalid, as
it should be, since it would then point to deallocated memory. That is
the whole point of the exercise, to have the compiler catch things like
this when it can.
Compare and contrast with applying strchr() to a primary
char* and using the returned value: You'd clearly want to invalidate
that returned value after freeing the primary string,
right

but you
should not invalidate predecessor->next, which was obtained in a
rather similar way.

Not so much. The pointer returned by strchr is still attached to the
original primary, whereas predecessor->next has been changed to
reference a different primary, so that deallocating the original will
not affect it.

Regards,

David Mathog
 
M

mathog

Kaz said:
This qualifier should really be on the pointer, and not what it points to:

int *primary buf = ...

Just like C99 restrict.
Right.


Now here is a problem:

some_function(ibuf2); /* ibuf retains the pointer */

free(ibuf); /* allowed */

Actually free on a primary would not be allowed. Only

pfree(&ibuf)
some_other_function(); /* uses the previously retained ibuf pointer, oops! */

You mean this situation I think:

int *hold_ibuf=NULL; /* global */

void somefunction(int *buf){
hold_ibuf=buf;
}

void some_other_function(void){
(void) fprintf(stdout,"hold pointer is %p\n",hold_ibuf);
}

/* later in main */
int *primary ibuf = (int *primary) malloc(sizeof(int)*128);
int *ibuf2=ibuf; //POINT1

/* access to hold_ibuf before here is not OK */

somefunction(ibuf2); //POINT2

/* access to hold_ibuf here is OK */

pfree(&ibuf); //POINT3

/* access to hold_ibuf after here is not OK */

/* and later still */

some_other_function(); //POINT4

If this is all in one compilation unit then the compiler has a shot of
warning about it. Because:

At POINT1: compiler marks that ibuf2 references primary ibuf
At POINT2: compiler marks that (on some code paths)
hold_ibuf references primary ibuf
At POINT3: compiler notes that primary ibuf is removed
At POINT4: compiler encounters a reference to hold_ibuf without any
intervening code to clear its reference to the now deleted
primary. So it throws an error.

versus

/* and later still */

hold_ibuf=NULL;
some_other_function(); //POINT4

Where now there is no problem when hold_ibuf is accessed since the
reference to the primary has been cleared.

Clearly if the various functions were compiled separately then the
compiler could not perform this sort of checking. Well, at least not
without some method of keeping track of the memory references.
The tough problem is the correct determination of object lifetimes.

It isn't tough at run time, just ugly, and can be done there without
modifying the language, along the lines of:

typedef struct {
void *data;
}MEMHANDLE;

typedef struct {
MEMHANDLE *primary:
int *ibuf; /* pointer to data within primary */
}INTPTRDESCRIPTOR;

If all memory allocations/free act on "data" in a MEMHANDLE, and we only
pass around objects with types like INTPTRDESCRIPTOR, then code
can always do:

void function(INTPTRDESCRIPTOR *object){
if(!object->primary)fatal_err("Memory oops");
/* normal acess to object->ibuf */
}

It is pretty wasteful to run this way all the time though, and on any
moderately complicated code it would have a huge number of tests on
primaries.
So, suppose we have:

void widget_destroy(widget_t *w)
{
/* what are you going to do here? */
}

Of course, inside widget_destroy, you're going to have to call pfree, right?

Simplest case would be yes. (Let's ignore situations where you can
stack up widgets inside an array created with one memory allocation.)
So you're going to have to cast the pointer to (widget_t *primary).
Right


If you don't want to do that, then you have to move the qualifier into the
parameter declaration:

void widget_destroy(widget_t * primary w);

but now, nobody can call widget_destroy unless they have a primary pointer.

Right. That's sort of the point.
So this means that either everyone does the cast (which destroys the primary
pointer scheme), or else the widget_create has to hand out primary pointers
(which also destroys the primary pointer scheme, since the idea is that
not everyone has the primary pointer!)

You lost me. Simplest case is that widget_create would return a primary
pointer, and widget_destroy would accept one (and deallocate its
memory.)
The problem is that although the widget module encapsulates the creation,
manipulation and finalization of these objects, it does not own them.

The primary scheme will basically only apply in situations where memory
is dynamically allocated, used and released in a single lexical scope.


This is a tricky thing which requires the compiler to solve instances
of the halting problem in the general case.

You can only implement this diagnostic in some obvious situations, like here:
there is only one way to reach the *ibuf=1 code, and that control passes
through a block where pfree is applied to ibuf.

It's possible to code up situations in which *ibuf=1 occurs in code that
cannot be proved either way: whether it is reached through a path where
ibuf is freed.

If there is any path that gets to *ibuf=1 after ibuf is deallocated it
should warn. It doesn't seem like a terribly complicated detection. Let
pfree(&A) set "primary A gone" to 1. Every place the paths come together
OR that value. Anything referencing that primary when that logical
value is true will cause a warning.
This creates a language specification problem. Do you require a diagnostic
for situations in which this cannot be decided?

Do such paths exist within a single compilation?
But what does that mean? What if it *can* be decided, but it takes three hours
of computation over some instance of a NP-complete problem?

I don't see how the logic described above with ORs is going to have that
kind of math. Either there is a pfree() on a path or not.
You could implement something naive, such as if ibuf is used anywhere in the
function body where such use is in the lexical scope of a pfree(&ibuf),
then a diagnostic is required. So for instance this would still
require a diagnostic:

pfree(&ibuf)
goto skip_bad;
*ibuf = 1;
skip_bad:
;

That code would already warn - there is no path to "*ibuf = 1". For the
purposes of this discussion it would not warn about the primary - the
compiler would never see a situation where *ibuf is used after it is free'd.

Regards,

David Mathog
 
E

Eric Sosman

I don't see why it would be. Before the call there are
three primaries: A,B,C

A is predecessor
B is victim (also predecessor->next)
C is victim->next

at the end of the function call predecessor->next points to C, and C is
still valid even after pfree(&victim).

As I envision it, before the call there is *one* primary, A.
The function follows this to discover B and then C, sets A's `next'
to C, and returns B. (Observe that C was discovered only by way
of B.) Now B gets invalidated; why does C remain valid, since it
depends on the now-invalid B?
If we had not set predecessor->next to point to C, and just called
pfree(&victim), then in that case predecessor->next would be invalid, as
it should be, since it would then point to deallocated memory. That is
the whole point of the exercise, to have the compiler catch things like
this when it can.

On the assumption that the compiler cannot see what goes on
inside the separately-compiled function, I have doubts about the
practicality of the idea.
Not so much. The pointer returned by strchr is still attached to the
original primary, whereas predecessor->next has been changed to
reference a different primary, so that deallocating the original will
not affect it.

So we now have "provenance" in all pointers? IIRC there was/is
a memory-checking system built on gcc that did/does this: It rewrote
pointers into keys that led to [start,limit) pairs in an internal
data structure, and inserted validation code on all pointer arithmetic
(so you could find out whether `p[42]' did or did not point within the
range of memory allocated to `p'). This must have been, oh, fifteen
or twenty years ago, but I bet you could still find it.

Note that it was purely a run-time thing: The compiler did not
get involved in trying to distinguish "primary" and "derived" and
"attached" and "related by marriage" and so on.

I'm also not sure how well it coped with things like

struct {
int ident;
char firstname[30];
char lastname[30];
} *person = malloc(sizeof *person);

It could, clearly, see that `person' referred to a 64-byte (probably)
region starting at 0xMUMBLE, but would it detect a problem with
`person->firstname[42]', inside the `person' region but outside the
valid range of `person->firstname'? I don't know; I think it would
have needed quite a lot of compiler help to sort such things out.

And I still don't see how the value of strchr() is "attached"
to its primary while the value C is not "attached" to B. How is the
compiler supposed to unravel these degrees of attachment?
 
S

Stefan Ram

mathog said:
int *ibuf2=ibuf; /* compiler OK, can make non primary copies */

One can hide the pointer using the PIMPL idiom and then only
publish functions to read from it or write to it, not allowing
to copy or free it.

(Some implementations might also emit warnings in some cases
of possible aliases when the keyword »restrict« is used.)
 
M

mathog

Eric said:
As I envision it, before the call there is *one* primary, A.
The function follows this to discover B and then C, sets A's `next'
to C, and returns B. (Observe that C was discovered only by way
of B.) Now B gets invalidated; why does C remain valid, since it
depends on the now-invalid B?

No, three primaries. Each of the nodes would have been allocated as a
separate primary.
On the assumption that the compiler cannot see what goes on
inside the separately-compiled function, I have doubts about the
practicality of the idea.

Yes, that is definitely the hard part. To have any shot at it there
would need to be some way to pass whatever information is known about
pointers and their corresponding primaries. For instance, if a module
was just:

somefunc(int *ptr){
if(whatever_condition)free(ptr);
}
someother(int *primary ptr){
if(whatever_condition)pfree(&ptr);
}

The compiler would have to be able to produce some sort of
characterization file that says something like:

somefunc: may call free on ptr
someother: may call pgree on ptr

Then when another module was compiled which did:

int *primary ptr=(int *primary) pmalloc(128);
int *ibuf=ptr;
somefunc(ibuf);
*ibuf=1;
someother(ptr);
*ibuf=2;

it would be able to determine that warnings were required at each of the
assignments. If a function "somelist" was passed a pointer to a
structure containing a linked list of primary memory references, and one
of these may be deleted, it gets really ugly. The compiler cannot know
in that case what will be on the list, so it cannot keep track of a
specific primary. What it could do though is detect that any primary on
the linked list may be deleted, and mark that as:

somelist: may call pfree on indirect primary

which might or might not provide enough information for another
compilation that calls somelist to detect possible error conditions.

Also it occurs to me that "primary" memory pointers should in effect by
const. Once created they may be destroyed or transfered, but not
modified. That would allow warnings on any function that tried to
modify a primary pointer, which by definition, would always be an error.
(The point of this is to force a clean break between pointers used to
access data for processing, and the primary pointer used to keep track
of memory allocations. We have all seen code where using the same
pointer for both purposes has gone wrong.)
And I still don't see how the value of strchr() is "attached"
to its primary while the value C is not "attached" to B. How is the
compiler supposed to unravel these degrees of attachment?

strchr returns a regular pointer, regular pointers are "attached" to
their corresponding primary. A primary pointer can have multiple
regular pointers attached to it. When allocated A, B, and C are all
primary pointers. If the structure is going to carry around the primary
pointer then B moves to A->next and C moves to B->next with pxfer.

A pointer is attached to a primary whenever it is set directly or
indirectly by reference to that primary:

regular1 = primary1; // associates regular1 with primary1
regular2 = regular1; // associates regular2 with primary1 (indirectly)
regular1 = NULL; // disassociates regular1 with primary1

Operationally, while the compiler works every regular pointer would have
a slot associated with it for the primary (or NULL). So every time it
sees a pointer used the compiler can look at that slot to see if there
might be a problem (for instance: slot is NULL and *ptr needed) or if
the value of the slot needs to be changed (ptr1 = ptr2;).

Regards,

David Mathog
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,754
Messages
2,569,521
Members
44,995
Latest member
PinupduzSap

Latest Threads

Top