Casting pointer-to-void

T

T Koster

Hi lads,

In implementing a hash table, I offer the option of storing
arbitrarily-typed objects using a pointer-to-void.

My hash function is as follows:

unsigned int hashfn(void * key, unsigned int tablesize)
{
return *(unsigned int *)(key) % tablesize;
}

The question is not whether this is a good hash function (in terms of
hash collisions and whatnot), but rather if taking the first
width-of-an-unsigned-int bits from the object pointed to by 'key', and
returning their value as if they were representing an unsigned int,
through the use of a cast, is well-defined.

In general, I can see that if whatever 'key' were pointing to occupied
less storage space than an int, the behaviour would likely be
undefined, but in my case I can safely assume the object pointed to by
'key' is equal or greater in size than an int.

Thanks,
Thomas
 
I

infobahn

T said:
Hi lads,

In implementing a hash table, I offer the option of storing
arbitrarily-typed objects using a pointer-to-void.

My hash function is as follows:

unsigned int hashfn(void * key, unsigned int tablesize)
{
return *(unsigned int *)(key) % tablesize;
}

The question is not whether this is a good hash function (in terms of
hash collisions and whatnot), but rather if taking the first
width-of-an-unsigned-int bits from the object pointed to by 'key', and
returning their value as if they were representing an unsigned int,
through the use of a cast, is well-defined.

No, it isn't. Unless key is aligned in a way compatible with
unsigned int, the behaviour is undefined.
 
M

Michael Mair

T said:
Hi lads,

In implementing a hash table, I offer the option of storing
arbitrarily-typed objects using a pointer-to-void.

My hash function is as follows:

unsigned int hashfn(void * key, unsigned int tablesize)
{
return *(unsigned int *)(key) % tablesize;
}

The question is not whether this is a good hash function (in terms of
hash collisions and whatnot), but rather if taking the first
width-of-an-unsigned-int bits from the object pointed to by 'key', and
returning their value as if they were representing an unsigned int,
through the use of a cast, is well-defined.

In general, I can see that if whatever 'key' were pointing to occupied
less storage space than an int, the behaviour would likely be undefined,
but in my case I can safely assume the object pointed to by 'key' is
equal or greater in size than an int.

The problem is that there may be padding bits in the representation
of an unsigned int which -- if set in the "right" way -- may lead to
a trap representation. Then everything probably will go downhill.
The only safe way to access the representation of an object is treating
the bytes as unsigned char (which does not have padding bits or trap
representations).
From the "value" of the bytes accessed by (unsigned char *), you
can then construct a value which lies in the range from 0 to UINT_MAX.


Cheers
Michael
 
K

Keith Thompson

infobahn said:
No, it isn't. Unless key is aligned in a way compatible with
unsigned int, the behaviour is undefined.

Sure, but if all the objects are allocated by malloc(), the alignment
is guaranteed.
 
R

Richard Bos

Keith Thompson said:
Sure, but if all the objects are allocated by malloc(), the alignment
is guaranteed.

It still leaves the problem of trap representations, though. One can get
around that either by assuming that there are no trap representations in
unsigned integer types (not completely portable, but a good assumption
on modern desktop machines), or using a combination of unsigned chars
instead.

Richard
 
K

Keith Thompson

It still leaves the problem of trap representations, though. One can get
around that either by assuming that there are no trap representations in
unsigned integer types (not completely portable, but a good assumption
on modern desktop machines), or using a combination of unsigned chars
instead.

You can probably write a test that checks whether unsigned int has any
padding bits (by checking whether UINT_MAX == 2**(sizeof unsigned int)-1).
If the test fails, the program can immediately abort with an error
message; if it passes, you can safely assume that there are no padding
bits.
 
M

Michael Mair

Keith said:
You can probably write a test that checks whether unsigned int has any
padding bits (by checking whether UINT_MAX == 2**(sizeof unsigned int)-1).

You meant 2**(sizeof(unsigned int)*CHAR_BIT)-1 :)
 
I

infobahn

Keith said:
Sure, but if all the objects are allocated by malloc(), the alignment
is guaranteed.

Sure, but I could see nothing in the OP's article that showed he
was passing in pointers returned by malloc (and co).
 
B

Ben Pfaff

Michael Mair said:
The problem is that there may be padding bits in the representation
of an unsigned int which -- if set in the "right" way -- may lead to
a trap representation. Then everything probably will go downhill.

Even if padding bits are not used for a trap representation, they
may still interfere with the goal of obtaining a hash function,
because changing their values will change the hash value without
changing the integer value.
 
K

Keith Thompson

Michael Mair said:
Keith Thompson wrote: [...]
You can probably write a test that checks whether unsigned int has any
padding bits (by checking whether UINT_MAX == 2**(sizeof unsigned int)-1).

You meant 2**(sizeof(unsigned int)*CHAR_BIT)-1 :)

Yes, thank you. I had it right in my head; somehow it failed to
propagate to my fingers.
 
M

Michael Mair

Ben said:
Even if padding bits are not used for a trap representation, they
may still interfere with the goal of obtaining a hash function,
because changing their values will change the hash value without
changing the integer value.

Thanks for pointing this out.
It was so obvious that I need unsigned char that I forgot this
reason... :)


Cheers
Michael
 
J

Jack Klein

Sure, but if all the objects are allocated by malloc(), the alignment
is guaranteed.

Alignment is not the issue, or at least not the only issue. You are
missing the issue of effective type.

========
6.5 p6

The effective type of an object for an access to its stored value is
the declared type of the object, if any. If a value is stored into an
object having no declared type through an lvalue having a type that is
not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent
accesses that do not modify the stored value.
========

A footnote explains that allocated objects have no declared type.

Now we move along to:

========
6.5 p7

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of
the object,

— a type that is the signed or unsigned type corresponding to the
effective type of the object,

— a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or

— a character type.
========

So if the declared type of the first sizeof(int) bytes of object, or
the effective type if it was dynamically allocated, is not compatible
with unsigned int, the behavior is undefined regardless of alignment
or trap representation.
 
M

Michael Wojcik

It still leaves the problem of trap representations, though. One can get
around that either by assuming that there are no trap representations in
unsigned integer types (not completely portable, but a good assumption
on modern desktop machines), or using a combination of unsigned chars
instead.

Or with memcpy, which is the moral equivalent of accessing via
unsigned char, though it may not be implemented that way:

unsigned int hashfn(void * key, unsigned int tablesize)
{
unsigned int keyval;
memcpy(&keyval, key, sizeof keyval);
return keyval % tablesize;
}

(Koster guaranteed that key points to an object of at least
sizeof(unsigned int) bytes, so accessing past the end of an object
with the memcpy isn't a problem.)

Obviously this has the same potential (though unlikely) problem with
trap representations, but avoids the "effective type" UB issue (as
does the combination-of-unsigned-chars solution).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,596
Members
45,143
Latest member
SterlingLa
Top