accessing char as int through union


Hallvard B Furuseth

Is the code below valid? Generally a value must be accessed
through the same type it was stored as, but there is an exception
for data stored through a character type. I'm not sure if that
applies in this case though:

#include <limits.h>

unsigned foo(void) {
    static const union {
        unsigned char str[4];
        unsigned i;
    } val = { { 1, 2, 3, 4 } };
    if (sizeof(int) == 4 && CHAR_BIT == 8 && UINT_MAX == 0xFFFFFFFFUL)
        return val.i;
    return 0;
}

As far as I can tell, the relevant standardese is in C99 6.5 p6-7:

6.5 Expressions

6. The effective type of an object for an access to its stored value is
the declared type of the object, if any.(72) If a value is stored into
an object having no declared type through an lvalue having a type
that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent
accesses that do not modify the stored value. If a value is copied
into an object having no declared type using memcpy or memmove, or is
copied as an array of character type, then the effective type of the
modified object for that access and for subsequent accesses that do
not modify the value is the effective type of the object from which
the value is copied, if it has one. For all other accesses to an
object having no declared type, the effective type of the object is
simply the type of the lvalue used for the access.

7. An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:(73)
- (...)
- an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
- a character type.

Footnotes:
72) Allocated objects have no declared type.
73) The intent of this list is to specify those circumstances in which
an object may or may not be aliased.
 

Cong Wang

Hallvard said:
Is the code below valid? (...)

Yeah, that code is valid. In fact, your code is equivalent to the
following on IA-32 machines:
unsigned j = '\x04\x03\x02\x01',
 

Cong Wang

Cong said:
Hallvard said:
Is the code below valid? (...)

Yeah, that code is valid. In fact, your code is equivalent to the
following on IA-32 machines:
unsigned j = '\x04\x03\x02\x01',


Sorry, a semicolon, not a comma. I mistyped that. ;-(
 

Hallvard B Furuseth

Cong said:
Hallvard said:
Is the code below valid? (...)
unsigned foo(void) {
    static const union {
        unsigned char str[4];
        unsigned i;
    } val = { { 1, 2, 3, 4 } };
    if (sizeof(int) == 4 && CHAR_BIT == 8 && UINT_MAX == 0xFFFFFFFFUL)
        return val.i;
    return 0;
}

As far as I can tell, the relevant standardese is in C99 6.5 p6-7:
(...)

Yeah, that code is valid.
Good.

In fact, your code is equivalent to the following on IA-32 machines:
unsigned j = '\x04\x03\x02\x01';
But that is arch-dependent.

Yeah, but then I might as well write 0x01020304.

I was looking for a conformant, compile-time way to set the endianness
of an integer. The union hack above doesn't necessarily make a
compile-time _constant_ which the compiler will optimize away,
but at least it doesn't require run-time initialization.
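
For concreteness, a minimal sketch of the kind of thing I mean
(assuming CHAR_BIT == 8 and a 4-byte unsigned; the function name is
just illustrative):

/* The bytes {1,2,3,4} read back as 0x04030201 on a little-endian
 * machine and 0x01020304 on a big-endian one. */
static const union {
    unsigned char str[4];
    unsigned i;
} probe = { { 1, 2, 3, 4 } };

int host_is_little_endian(void) {
    return probe.i == 0x04030201u;
}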
 
Nils O. Selåsdal

Hallvard said:
I was looking for a conformant, compile-time way to set the endianness
of an integer. (...)

Represent it in the host endian, operate on it as usual,
and assemble it to the required endian for the external media
when you need to (such as when writing it to a file, sending it
over a network, etc.).
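
For example, a minimal sketch of writing a 32-bit value out big-endian
regardless of host order (the function name is illustrative):

#include <stdio.h>

/* Serialize v in big-endian byte order, whatever the host order is. */
void put_be32(unsigned long v, FILE *fp) {
    putc((int)((v >> 24) & 0xFF), fp);
    putc((int)((v >> 16) & 0xFF), fp);
    putc((int)((v >>  8) & 0xFF), fp);
    putc((int)( v        & 0xFF), fp);
}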
 

Hallvard B Furuseth

Nils said:
Represent it in the host endian, operate on it as usual, and assemble
it to the required endian for the external media when you need to
(such as when writing it to a file, sending it over a network, etc.).

What external media? It's for a hash function (and the companion
compare function) for character data.

Hm. I might as well ask about that here too, since I'm writing
anyway...

A text processing program spends a lot of its time in a hash
function, even with the simple hash:
for (h = 0; *str; str++) h = h*33 ^ *(unsigned char *)str;
Improving that function doesn't seem to reduce the time in the
hash lookup function enough to matter.

However if the strings are int-aligned and zero-padded to
next int-boundary, this hash function seems much faster:

static const unsigned Mask = <representation of {0, ..., 0, 0xFF}>;

unsigned hash(const char *str) {
    const unsigned *ptr = (const unsigned *)str;
    unsigned h, chunk = *ptr;
    for (h = Mumble1(chunk); chunk & Mask; h = Mumble2(h, chunk))
        chunk = *++ptr;
    return h;
}

...provided Mumble1() and Mumble2() are fast enough. Does anyone know
any fast hash functions which assume character data stored in integers?
The hash functions I've seen that munge integers instead of bytes are
very thorough; I don't want to trade that much speed for quality. I
guess I might have to have different branches for endianness and
integer size, but that gets ugly pretty fast.
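
One possible way to fill in the placeholders, as a sketch only
(assuming little-endian byte order, CHAR_BIT == 8 and a 4-byte
unsigned, with the same *33 mixing as the byte-wise hash):

/* On a little-endian machine the last byte of each chunk sits in the
 * most significant bits, so bytes {0,0,0,0xFF} give Mask 0xFF000000. */
#define Mask              0xFF000000u
#define Mumble1(chunk)    ((chunk) * 33u)
#define Mumble2(h, chunk) (((h) * 33u) ^ (chunk))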

The keys are words in a dictionary or text file, average size just 5-6
characters + the padding zeroes. (And most searches find nothing.)
 
Nils O. Selåsdal

Hallvard said:
What external media? It's for a hash function (and the companion
compare function) for character data.

Then why do you need to care about the endianness of anything?
Do you read data that comes in a certain endian?

Hm. I might as well ask about that here too, since I'm writing
anyway...

A text processing program spends a lot of its time in a hash
function, even with the simple hash:
for (h = 0; *str; str++) h = h*33 ^ *(unsigned char *)str;
Improving that function doesn't seem to reduce the time in the
hash lookup function enough to matter.

However if the strings are int-aligned and zero-padded to
next int-boundary, this hash function seems much faster:
At least profile this so you know how much faster it is,
and whether any gain matters.
 

Ancient_Hacker

Using 33 for the hash multiplier might be faster if you did it instead
with (h << 5) + h
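
That is, a sketch of one step (assuming the h*33 hash quoted above):

/* Strength reduction: 33*h = 32*h + h, so a shift and an add can
 * replace the multiply (compilers usually do this themselves). */
unsigned step(unsigned h, unsigned char c) {
    return ((h << 5) + h) ^ c;   /* same value as h*33 ^ c */
}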
 

Eric Sosman

Hallvard said:
A text processing program spends a lot of its time in a hash
function, even with the simple hash:
[... go faster by munching multiple characters at a time ...]
The keys are words in a dictionary or text file, average size just 5-6
characters + the padding zeroes. (And most searches find nothing.)

If the strings are so short, making the hash faster isn't
likely to save a lot of time. Suppose you manage to inhale four
characters per iteration instead of one: Great, you make two
iterations instead of five or six, for something like a threefold
speedup. But those five or six iterations probably didn't take
very much time in absolute terms, so the savings are likely to
be small. Note, too, that whatever cache-miss penalties occur
are likely to afflict both schemes equally, on the first iteration.

How confident are you that the program truly spends "a lot of
its time" computing the hash function? Alternatively, how sure
are you of the "5-6 characters" estimate? Taken together they
seem implausible or at least surprising, and I'd suggest verifying
the measurements before expending a lot of effort.
 

Hallvard B Furuseth

Eric said:
Hallvard said:
A text processing program spends a lot of its time in a hash
function, even with the simple hash:
[... go faster by munching multiple characters at at time ...]
The keys are words in a dictionary or text file, average size just 5-6
characters + the padding zeroes. (And most searches find nothing.)

If the strings are so short, making the hash faster isn't
likely to save a lot of time. Suppose you manage to inhale four
characters per iteration instead of one: Great, you make two
iterations instead of five or six, for something like a threefold
speedup.

Yup, and the program is one that does a _lot_ of hash table lookups.

I think I can halve the number of lookups with an average file,
but that may require fairly extensive code duplication to take
care of the 'easy' files. So at the moment I'm hoping to collect
the ugliness in one place. :)
(...) Note, too, that whatever cache-miss penalties occur are
likely to afflict both schemes equally, on the first iteration.

I know. Will need to look at that.
How confident are you that the program truly spends "a lot of
its time" computing the hash function?

The profiler says so. External timing as well as profile data
change when I play with the hash function. Around 20% of the
time spent in the hash function and the lookup function, IIRC.
(Haven't got the program where I'm sitting.)
Alternatively, how sure
are you of the "5-6 characters" estimate?

Wrote a throwaway script to count. But it varies depending on
input files and language, of course.
Taken together they seem implausible or at least surprising, and
I'd suggest verifying the measurements before expending a lot of
effort.

OK. But what's the surprise?
 

Hallvard B Furuseth

Ancient_Hacker said:
Using 33 for the hash multiplier might be faster if you did
it instead with (h << 5) + h

Well, I do, but gcc figures out *33 anyway.
 

Eric Sosman

Hallvard said:
The profiler says so. External timing as well as profile data
change when I play with the hash function. Around 20% of the
time spent in the hash function and the lookup function, IIRC.
(Haven't got the program where I'm sitting.)

Are you able to separate the "around 20%" into time spent
in the hash computation and time spent in the search? They might
be too tightly integrated (especially since this seems to be an
area of the code you've been paying special attention to), but
maybe you could extract the hash computation into a test harness
of its own and get some absolute timings of it in isolation, then
combine those with absolute timings of hash-plus-search to get an
idea of how much each component uses. (It won't be perfectly
accurate, but it's better than nothing.)
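
Something along these lines, purely as a sketch (hash() is your
function from upthread; the key list and iteration count are
placeholders):

#include <stdio.h>
#include <time.h>

extern unsigned hash(const char *str);

/* Time the hash alone over many iterations of a fixed key set. */
double time_hash(const char *const *keys, size_t nkeys,
                 unsigned long iters)
{
    volatile unsigned sink = 0;  /* keep calls from being optimized out */
    clock_t t0 = clock();
    for (unsigned long i = 0; i < iters; i++)
        for (size_t k = 0; k < nkeys; k++)
            sink ^= hash(keys[k]);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}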

You mentioned that you'd rather have a fast hash computation
than a high-quality one, and it seems to me you need some idea of
the time taken in the hash and the search, separately, to be able
to assess the trade-off. If it takes three milliquivers longer
to compute a better hash that then saves you 0.7 probes per search,
it might be time well spent.

Is it feasible to post actual code?
 

Ancient_Hacker

Hallvard B Furuseth wrote:

The profiler says so. External timing as well as profile data
change when I play with the hash function. Around 20% of the
time spent in the hash function and the lookup function, IIRC.

Okay, so if we were really smart and came up with an instantaneous hash
function, the program is only going to speed up 20%. Doesn't sound too
worthwhile. And if this is the biggest chunk, count yourself
fortunate -- the program sounds like it's pretty "flat", with no huge
time wasters.

The only way to improve a program that has gotten to this state is to
see if you can avoid doing the whole magilla at all, maybe by caching
previous results?
 

CBFalconer

Eric said:
Hallvard B Furuseth wrote:
.... snip ...

Are you able to separate the "around 20%" into time spent
in the hash computation and time spent in the search? They might
be too tightly integrated (especially since this seems to be an
area of the code you've been paying special attention to), but
maybe you could extract the hash computation into a test harness
of its own and get some absolute timings of it in isolation, then
combine those with absolute timings of hash-plus-search to get an
idea of how much each component uses. (It won't be perfectly
accurate, but it's better than nothing.)

To measure this, I recommend hashlib. Hashlib is instrumented to
display hits and misses, from which you can deduce the efficiency
of the hash, and especially the count of actual hash calls. You
have full source in standard, compile-anywhere C, so you can compile
with profiling enabled. Hashlib was originally born as a means to
test hash functions. See:

<http://cbfalconer.home.att.net/download/>

 

Christopher Layne

Hallvard said:
A text processing program spends a lot of its time in a hash
function, even with the simple hash:
for (h = 0; *str; str++) h = h*33 ^ *(unsigned char *)str;
Improving that function doesn't seem to reduce the time in the
hash lookup function enough to matter.

However if the strings are int-aligned and zero-padded to
next int-boundary, this hash function seems much faster:

Is this on Solaris?
 

Christopher Layne

Christopher said:
Is this on Solaris?

BTW, the reason I mention this is that modern Solaris (and other OSes,
most likely) uses an "is there a NUL in these 4 bytes" trick in various
string functions (on x86 at least). But it depends on appropriate
alignment. It may be that if you're using a string function somewhere,
such as strlen() (most likely not) or strcmp() (possibly in your
post-hash lookup code), you're seeing a speed benefit there as a result
of the NUL-in-int trick, making the non-aligned situations look slower
than usual.
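
The trick in question is the classic zero-byte-in-a-word hack; a sketch
for 32-bit words and 8-bit bytes (not necessarily the exact code any
libc uses):

/* Nonzero iff some byte of v is zero: subtracting 1 from each byte
 * borrows into the high bit exactly when that byte was 0x00. */
#define HAS_ZERO_BYTE(v) ((((v) - 0x01010101u) & ~(v) & 0x80808080u) != 0)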
 

websnarf

Hallvard said:
What external media? It's for a hash function (and the companion
compare function) for character data.

Hm. I might as well ask about that here too, since I'm writing
anyway...

A text processing program spends a lot of its time in a hash
function, even with the simple hash:
for (h = 0; *str; str++) h = h*33 ^ *(unsigned char *)str;

This is both slow and low quality.
Improving that function doesn't seem to reduce the time in the
hash lookup function enough to matter.

Uhh ... this will depend on the strings you use in your test.
However if the strings are int-aligned and zero-padded to
next int-boundary, this hash function seems much faster:

(...)

...provided Mumble1() and Mumble2() are fast enough.

This is correct. Your insight goes beyond that of most of the other
posters in this thread. It can, exactly as you should expect, be
enormously faster to do things this way.

But(!) the alignment requirement is not as crucial as you might think.
I.e., if you simply perform something like
memcpy(&chunk, str + i*sizeof(int), sizeof(int));
and otherwise perform the rest of the computation as you suggest, it is
*STILL* potentially faster than most of the high-quality char-by-char
hash functions. It's the fact that you are *amortizing* your inputs
into larger chunks that provides the potential for speedup.
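
A sketch of that variant, reusing your Mask/Mumble placeholders (still
assuming strings zero-padded to the next int boundary, but no longer
assuming alignment):

#include <string.h>

unsigned hash_unaligned(const char *str) {
    unsigned h, chunk;
    memcpy(&chunk, str, sizeof chunk);   /* no alignment assumed */
    for (h = Mumble1(chunk); chunk & Mask; h = Mumble2(h, chunk)) {
        str += sizeof chunk;
        memcpy(&chunk, str, sizeof chunk);
    }
    return h;
}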
[...] Does anyone know any
fast hash functions which assume character data stored in integers?

Well as I say, that assumption is not necessary. Instead, when I
looked at this I said "in some cases I can assume my architecture
doesn't *require* alignment, at all" and just conditionally compile for
that case. In any event here is my hash function:

http://www.azillionmonkeys.com/qed/hash.html

It is used in Konqueror/Apple's WebCore, Adobe's Flash multimedia
player, and Google's open source sparse hash library. Google's new
code searching tool also reveals that it is being used in a *ton* of
lesser known projects.

My design is, rather than trying to read a whole int's worth of
sequential chars and mix it in, I just do 16 bits at a time, and
unroll once. This relieves the pressure on the mixing a bit and eases
portability, while still taking advantage of the general idea (i.e.,
less mixing per char read, via proper amortization). Intuitively, this
is the right trade-off, because the left-shift-then-add operations
continue to use *all* the input bits, since the full 16 bits still fit
into the 32-bit accumulators. Furthermore, 32-bit integers are kind of
the sweet spot of performance on contemporary machines.

Bob Jenkins has recently come up with an improvement to his own hash
function, but I have not analyzed it. It's probably worth taking a look
at nevertheless:

http://burtleburtle.net/bob/c/lookup3.c
[...] The
hash functions I've seen that munge integers instead of bytes are very
thorough; I don't want to trade that much speed for quality.

Yes, this is a misconception. What you want is the least amount of
mixing necessary to guarantee an "avalanching" property in the bits
(what Bob Jenkins calls "no funnelling"). It turns out that most
high-quality hash functions that operate on a char at a time actually
perform a tremendous overkill amount of mixing. The problem is that
they are trying to solve long-term and short-term avalanching solely
in the inner loop.

The hash function that I created splits the avalanching problem into
two distinct categories: inner-loop mixing, and final mixing (or
tempering).

In the inner loop, the goal of the mixing is to make the input bits all
represented in the accumulator, and as independent as possible
(complete independence is, of course, impossible, but the goal is to do
as well as you can).

The final mixing is used to make sure that the last accumulated bits
(in my case, I needed to examine 384 bits' worth) have an equal output
perturbation property without losing independence. The reason is that
(unless your hash design is really bad) early bits that are being
accumulated are going to get evenly spread and end up with a high noise
property regardless of what you do -- so it is only the final bits
which are at risk of not being spread thoroughly. But since this is a
finite problem, it requires only a finite amount of tempering to
provide this property.

So you can see the folly of the typical approach easily. On the final
inputs to the mixer, they have to do enough mixing in their inner loop
of those final input bits to have all the good bit-avalanche properties
for their whole hash function. That means that at each iteration this
amount of full avalanching is applied over and over again to the
accumulator, which holds prior input bit entropy that has already been
fully avalanched by previous iterations. It's obvious overkill. You
actually only need inter-bit independence and representation in the
inner loop, while final avalanching should be performed in a finite
operation at the end.
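
Schematically, the two-phase shape looks something like this sketch
(the particular operations and constants are placeholders, not my
actual choices -- see the URL above for those):

#include <stddef.h>

unsigned two_phase_hash(const unsigned char *p, size_t len) {
    unsigned h = (unsigned)len;
    size_t i;
    /* Inner loop: cheap mixing, just enough to keep all input bits
     * represented and reasonably independent in the accumulator. */
    for (i = 0; i + 1 < len; i += 2)
        h = (h << 5) + h + (unsigned)(p[i] | (p[i + 1] << 8));
    if (len & 1)
        h = (h << 5) + h + p[len - 1];
    /* Final mixing ("tempering"): a finite amount of extra work to
     * avalanche the last few input bits across the whole word. */
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;   /* arbitrary odd constant, for illustration */
    h ^= h >> 13;
    return h;
}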
[...] I guess I
might have to have different branches for endianness and integer size,
but that gets ugly pretty fast.

It's not likely worth it.
The keys are words in a dictionary or text file, average size just 5-6
characters + the padding zeroes. (And most searches find nothing.)

Oh well, these are extremely short strings. What you're going to want
to do is special-case them by *length*, not alignment. Then just
completely unroll the hash computation for all the short cases (say up
to 16 characters or so) and bail out to a more general hasher (like
mine, or Bob Jenkins') otherwise.
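
A sketch of that dispatch (the helper names and cutoffs are
hypothetical, just to show the shape):

#include <stddef.h>

extern unsigned hash_short4(const char *key, size_t len);  /* unrolled */
extern unsigned hash_short8(const char *key, size_t len);  /* unrolled */
extern unsigned hash_general(const char *key, size_t len);

unsigned hash_dispatch(const char *key, size_t len) {
    switch (len) {
    case 0:
        return 0;
    case 1: case 2: case 3: case 4:
        return hash_short4(key, len);
    case 5: case 6: case 7: case 8:
        return hash_short8(key, len);
    default:
        return hash_general(key, len);
    }
}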
 

Hallvard B Furuseth

Thanks to several people for answers and explanations, in particular
websnarf. A few details:

This is both slow and low quality.

I know. It's a very unoptimized program, so I just plugged in a
simple hash and kept moving for the time being. Then the time
came to look a bit closer at the hash function, and... it didn't
help, or even slowed things down. Much head-scratching ensued.
It helped to move away the strlen() which the above hash doesn't
need but others do...
Oh well, these are extremely short strings.

Actually that was quite wrong; I seem to have counted a test sample
instead, or something, late at night. 5-6 letters is more like it for
some inputs, ~10 for others.
What you're going to want to do is special-case them by *length*,
not alignment. Then just completely unroll the hash computation for
all the short cases (say up to 16 characters or so) and bail out
to a more general hasher (like mine, or Bob Jenkins') otherwise.

Aah, of course.
 

Hallvard B Furuseth

I said:
I know. (...)

Sorry, I meant I knew the low quality slowed down the hash lookup,
I hadn't realized it was actually slow by itself, for a hash function.
(Due to the extra strlen needed elsewhere.)
 

Ancient_Hacker

Hallvard said:
(Due to the extra strlen needed elsewhere.)


In case somebody hasn't mentioned the obvious, you need not do a strlen
if:

(1) You always hash up a fixed number of elements (which could be dumb
if there's a large variation in string lengths).

(2) You store the length somewhere, or the length rounded up in words
or ints or longs (a sketch follows below).
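
A sketch of option (2), with an illustrative layout (not from the
thread):

#include <stddef.h>

/* Store the length once, at insertion time, so neither the hash nor
 * the compare ever needs to call strlen again. */
struct key {
    size_t len;        /* exact length, or length rounded up to ints */
    const char *str;   /* int-aligned, zero-padded to an int boundary */
};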
 
