Portable pointer comparison

D

Danilo Kempf

Folks,

maybe one of you could be of help with this question: I've got a relatively
portable application which I'm extending with a plugin interface. While
portability (from a C perspective) is going to hell just by using
dlopen()/LoadLibrary() respectively, I'm still trying to get it as clean as
possible.

I have a number of different quantums of data and a number of plugins. Since
any plugin can (and possibly will) touch any quantum of data, I have a
many-to-many relationship here.

Now, every plugin can store some "private" data associated to every quantum
it has done some work on.

I thought, I'd realize this with some kind of dictionary. Every plugin gets
a unique key and can (using this key) store an arbitrary value. The
dictionary is implemented as a sorted array - so using the key, the stored
value can be retrieved by a simple binary search.

I thought I could use a "private" pointer for the key. Say, every plugin
uses a pointer to a global variable from the plugin library or a pointer to
some space it has just allocated as the key for the dictionary.

Now my question is, whether this is really portable C. Can pointer values
(coming from different allocation units, pointing to global storage or even
function pointers that were cast' to void *) be compared with each other,
so that I get my sorted array right?

Im sure they can be compared for (in)equality, but for sorting the
dictionary I need a (deterministic) is-less-than comparison at least. Would
casting those pointers to intptr_t help? Does intptr_t guarantee that two
pointers pointing to different things get different intptr_t values?

Sure, I can implement some entity that hands out unique integers or so as
keys, but if I don't have to do this, I'd rather prefer not doing so.

Im furiously searching through the (draft) standard, but I haven't found a
definitive answer yet.

Any thoughts on this are very much appreciated.

Best regards,

Danilo
 
I

Ian Pilcher

Danilo said:
Now my question is, whether this is really portable C. Can pointer values
(coming from different allocation units, pointing to global storage or even
function pointers that were cast' to void *) be compared with each other,
so that I get my sorted array right?

As you suspected, this isn't portable. Pointers to different objects
can't be compared for anything except equality/inequality. Trying to
fudge things with integer types (even uintptr_t) doesn't help either.
There's no guarantee that multiple different bit patterns don't point
to the same object, nor is there any guarantee that doing something with
a pointer won't change its bit pattern.

If you read the parts of the standard that discuss this, keep in mind
that the "value" of a pointer is the object it points to, not its bit
pattern.

HTH
 
D

Dave Vandervies

Folks,

maybe one of you could be of help with this question: I've got a relatively
portable application which I'm extending with a plugin interface. While
portability (from a C perspective) is going to hell just by using
dlopen()/LoadLibrary() respectively, I'm still trying to get it as clean as
possible.

[much snippage]
Now my question is, whether this is really portable C. Can pointer values
(coming from different allocation units, pointing to global storage or even
function pointers that were cast' to void *) be compared with each other,
so that I get my sorted array right?

It isn't "really portable C", but it's probably not unreasonable to
expect it to be portable across any systems on which you can dynamically
load plugins; by the time you're introducing dlopen() or some reasonable
facsimile thereof, you've probably already (indirectly) tied yourself
down to systems that have a flat N-bit address space (where N=32 will be
most common now and N=64 will be increasingly common as time goes on),
and if you can assume that, it will follow that pointers can be compared
for ordering with sensible and consistent results.

Just be sure to document any assumptions you're making and localize them
as much as possible, so that if the code needs to be ported to a system
where they don't hold whoever has to do the work (possibly you!) won't
be cursing you too much.

The best place to sanity-check that your assumptions will in fact hold
is probably the same place you'd go to ask questions about the dynamic
code loading (comp.unix.programmer and comp.os.ms-windows.programmer,
if I'm not mistaken)


dave

--
Dave Vandervies (e-mail address removed)
Oh I see. Thanks... (please excuse me while I feel stupid for a bit)
We all get used to it around here.
--Christopher Benson-Manica and Keith Thompson in comp.lang.c
 
F

Friedhelm Usenet Waitzmann

Danilo Kempf said:
Now, every plugin can store some "private" data associated to
every quantum it has done some work on.
I thought, I'd realize this with some kind of dictionary. Every
plugin gets a unique key and can (using this key) store an
arbitrary value. The dictionary is implemented as a sorted array
- so using the key, the stored value can be retrieved by a
simple binary search.
I thought I could use a "private" pointer for the key. Say,
every plugin uses a pointer to a global variable from the plugin
library or a pointer to some space it has just allocated as the
key for the dictionary.
function pointers that were cast' to void *

Don't cast a function pointer to a pointer to an object. It is
undefined.
Can pointer values be compared with each other, so that I get my
sorted array right?
Im sure they can be compared for (in)equality, but for sorting
the dictionary I need a (deterministic) is-less-than comparison
at least.

I think, it is possible:

First, if given two variables

T a;
T b;

and two pointer variables

T *ap = &a;
T *bp = &b;

then the pointers stored in the variables ap and bp must have
different bit patterns, otherwise the program could not use them
at places in the code, where it does not know that they point to
different lvalues.

Second, the problem is, when given another variable

T *ap2 = &a;

the standard does not garantee that the pointers stored in ap and
ap2 share the same bit pattern representation though they are
garanteed to compare equal. The question is: How can we work
around the possibility, that there exist two unequal bit pattern
representations that point to the same lvalue?

The solution is: If you want to use the pointer values &a and &b
as key values, store them only once into pointer variables ap and
bp as above and do not copy them any further, but treat their
type as opaque. Instead of passing on these key values &a and &b,
pass on pointers to the variables ap and bp containing these key
values. (Therefore, the lifetimes of the variables ap and bp (and
of course a and b) must extend to all places in the program where
these key values are to be used.)

So, given the declarations (you may choose any other type for
char but char wastes less space) and definitions

typedef char *keyType;

static char plugin_a_my_address_is_the_key;

extern const keyType plugin_a_key
= &plugin_a_my_address_is_the_key;

static char plugin_b_my_address_is_the_key;

extern const keyType plugin_b_key
= &plugin_b_my_address_is_the_key;

the function key_compare(), used to compare (opaque) keys
accessed through pointers

#include <string.h> /* memcmp() */

int key_compare(const keyType *ap, const keyType *bp)
{
/* compare keys pointed to by ap resp. bp */
return memcmp(ap, bp, sizeof *ap);
}

the types and function declarations for the dictionary

struct dict_entry
{
const keyType *keyptr;
data_type data;
}

struct dictionary
{
size_t nentries;
struct dict_entry *entries;
};

extern data_type *dict_locate(
struct dictionary *dictptr, const keyType *keyptr);

/* returns a pointer to the data associated with *keyptr
*/

extern data_type *dict_insert(
struct dictionary *dictptr, const keyType *keyptr);

/* dict_insert() first tries to locate the entry identified by
* *keyptr, if it exists. If not, a new entry is created.
* Returns the pointer to the data of the entry located or
* created or NULL, if the entry could not be created.
*/

and the dictionary itself

struct dictionary myDictionary = {0, NULL};

you can retrieve a pointer to the data for plugin a:

data_type *dataptr = dict_locate(&myDictionary,&plugin_a_key);

dict_locate() uses the function compare_key_to_dict_entry()

#include <stdlib.h> /* bsearch(), NULL, qsort() */

static int compare_key_to_dict_entry(const void *keyptr,
const void *dictentryptr)
{
const keyType *kp = keyptr;
const struct dict_entry *dep = dictentryptr;

return key_compare(kp,dep->keyptr);
}

data_type *dict_locate(struct dictionary *dictptr,
const keyType *keyptr)
{
struct dict_entry *deptr =
bsearch(keyptr,dictptr->entries,dictptr->nentries,
sizeof dictptr->entries[0],
compare_key_to_dict_entry
);
return deptr != NULL? &deptr->data : NULL;
}

static int compare_dict_entries(const void *dictentryptrA,
const void *dictentryptrB)
{
const struct dict_entry
*depA = dictentryptrA,
*depB = dictentryptrB;

return key_compare(depA->keyptr,depB->keyptr);
}

data_type *dict_insert(struct dictionary *dictptr,
const keyType *keyptr)
{
data_type *dataptr;

if ((dataptr = dict_locate(dictptr,keyptr)) == NULL)
{
/* entry not found, extend dictionary */
size_t newnentries = dictptr->nentries + 1;
struct dict_entry *newarray =
realloc(dictptr->entries,
newnentries * sizeof newarray[0]);

if (newarray == NULL)
{
/* sorry, could not extend dictionary */
dataptr = NULL;
}
else
{
/* store new address of entry array */
dictptr->entries = newarray;
/* update number of entries */
dictptr->nentries = newnentries;
/* store the pointer to the key into the new entry
*/
newarray[newnentries - 1].keyptr = keyptr;

/* sort the array */
qsort(newarray,newnentries,sizeof newarray[0],
compare_dict_entries);

/* locate the new entry */
dataptr = dict_locate(dictptr,keyptr);
}
}
return dataptr;
}
Would casting those pointers to intptr_t help?
I think, it would not.
Does intptr_t guarantee that two pointers pointing to different
things get different intptr_t values?

Maybe. But another problem is that it is possible that two
pointers pointing to the same thing get different intptr_t
values.
Any thoughts on this are very much appreciated.

HTH
 
W

Walter Roberson

: T a;
: T b;

:and two pointer variables

: T *ap = &a;
: T *bp = &b;

:then the pointers stored in the variables ap and bp must have
:different bit patterns, otherwise the program could not use them
:at places in the code, where it does not know that they point to
:different lvalues.

To expand slightly on that theme:

If the compiler -did- know at some point that ap and bp point
to different lvalues, then it is hypothetically possible to use
the same bit patterns for both.

Imagine, for example, that the compiler did "variable lifetime
analysis" and established that the lifetime of the objects did not
overlap. In such a case, the compiler could use segment + offset
or register + offset representations, storing just the offset,
with the distinction between the two lvalues contextually implied
by the different segment descriptor or different base register
value.


Also, in cases where the variable lifetimes did overlap but in a
situation that was not too difficult for the compiler to track,
then the compiler could use the same bitpatterns for the two
pointers as stored, supplying an appropriate descriptor or base
address or base register context as required.

For example, if you had a very simple situation such as

{
char a[15];
char b[15] = "hello there!";
char *ap = &a;
char *bp = &b;
offset_t i;
for (i = 0; i < 15; i++ ) {
*ap++ = *bp++
}
printf( "%s\n", a );
}

then in theory ap and bp could be stored as (for example) just
the current index into the base object, such as 3 if (ap - a == 3)
with the compiler dropping in address a or b as the base address...

move #0, D0
move #0, D1
move #0, D2
P: cmp #15, D2
bge Q
move A0@D0, A1@D1
add #1, D0
add #1, D1
bra P
Q:

to invent an assembler notation.

In this hypothetical example, D0 -is- the stored pointer value ap,
and D1 -is- the stored pointer value bp: the bit patterns
for the two can be equal as long as the compiler has enough
context to know how to disambiguate the two.


There is no general requirement in C that pointers to different
objects may not have the same bitpattern, as long as you can
meet the constraints about testing for the null pointer,
that the == operator compares as different for pointers at different
offset within the same object, and that the == operator compares as
the same for pointers to the same offset within the same object,
and that == compares as different for pointers the different objects.
[I am not sure whether the last is well defined, but a lot of C programs
would break if it wasn't true, I suspect.]


Of course in complex situations where the compiler cannot keep track
of whether pointers are to different objects, it is going to need to
use different bitpatterns for the pointers. That simply doesn't
preclude it from using the same bitpattern for different pointers
in situations where it -can- keep track.
 
F

Friedhelm Usenet Waitzmann

Walter Roberson said:
Also, in cases where the variable lifetimes did overlap but in a
situation that was not too difficult for the compiler to track,
then the compiler could use the same bitpatterns for the two
pointers as stored, supplying an appropriate descriptor or base
address or base register context as required.
For example, if you had a very simple situation such as
{
char a[15];
char b[15] = "hello there!";
char *ap = &a;
char *bp = &b;
offset_t i;
for (i = 0; i < 15; i++ ) {
*ap++ = *bp++
}
printf( "%s\n", a );
}
then in theory ap and bp could be stored as (for example) just
the current index into the base object, such as 3 if (ap - a == 3)
with the compiler dropping in address a or b as the base address...
move #0, D0
move #0, D1
move #0, D2
P: cmp #15, D2
bge Q
move A0@D0, A1@D1
add #1, D0
add #1, D1
bra P
Q:
to invent an assembler notation.
In this hypothetical example, D0 -is- the stored pointer value ap,
and D1 -is- the stored pointer value bp: the bit patterns
for the two can be equal as long as the compiler has enough
context to know how to disambiguate the two.

So, may

{
char a[15];
char b[15] = "hello there!";
char *ap = &a, *bp = &b, *cp;

cp = ap; /* From now on, use ap's base address with cp, too. */
cp[0] = 1; /* use cp */
memcpy(&cp,&bp,sizeof cp);
/* Does the compiler know, that from now on it should use bp's
* base address with cp?
*/
cp[0] = 0; /* use cp */
}

cause undefined behaviour, because the compiler does not realize
that after memcpy() the base address used with cp should be the
same that is used with bp? In other words: Pointer variables may
not be read or written to by engaging functions that read from or
write to untyped buffers via generic pointers (i.e. void *)?

Or, the other way round: Whenever the address of a pointer
variable is taken, must it be made sure, that the pointer
variable contains the `full' pointer, i.e. the base address
included?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,077
Latest member
SangMoor21

Latest Threads

Top