Typecast clarification

Dik T. Winter

>
> Right, but when 32-bit operations were implemented in software, they
> used the ordering 2143 (two adjacent 16-bit integers).

Not right. The units that did the FIS (floating-point instruction set) that
could be added to a few models were hardware, not software. And there were
other units (with different instruction sets) for other models.
 
Keith Thompson

Dik T. Winter said:
Not right. The units that did the FIS (floating-point instruction
set) that could be added to a few models were hardware, not
software. And there were other units (with different instruction
sets) for other models.

Note that I was referring to 32-bit integer operations, but you're
probably still right about it being in hardware.

A quick look at the Wikipedia page for the PDP-11 says that it stored
16-bit words in little-endian order; double-words (i.e., 32-bit
integers) were supported by the "Extended Instruction Set", using a
middle-endian ordering, so that the value 0x0A0B0C0D would be stored
in 4 bytes as { 0x0B, 0x0A, 0x0D, 0x0C }.
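
For anyone who wants to check their own machine, here is a minimal
sketch (assuming a C99 compiler whose <stdint.h> provides uint32_t)
that prints the bytes of that value in storage order:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t v = 0x0A0B0C0D;
    const unsigned char *p = (const unsigned char *)&v;

    /* Expected output: "0D 0C 0B 0A" on a little-endian machine,
       "0A 0B 0C 0D" on a big-endian one, and "0B 0A 0D 0C" on a
       PDP-11-style middle-endian one. */
    for (size_t i = 0; i < sizeof v; i++)
        printf("%02X ", (unsigned)p[i]);
    putchar('\n');
    return 0;
}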
 
Ben Pfaff

Keith Thompson said:
A quick look at the Wikipedia page for the PDP-11 says that it stored
16-bit words in little-endian order; double-words (i.e., 32-bit
integers) were supported by the "Extended Instruction Set", using a
middle-endian ordering, so that the value 0x0A0B0C0D would be stored
in 4 bytes as { 0x0B, 0x0A, 0x0D, 0x0C }.

Anyone know why? On the surface, this seems like a design that
only a crazy person would choose, but presumably there was some
adequate reason for it.
 
Kenny McCormack

Keith Thompson said:
A quick look at the Wikipedia page for the PDP-11 says that it stored
16-bit words in little-endian order; double-words (i.e., 32-bit
integers) were supported by the "Extended Instruction Set", using a
middle-endian ordering, so that the value 0x0A0B0C0D would be stored
in 4 bytes as { 0x0B, 0x0A, 0x0D, 0x0C }.

Yes, but you, of all people, know that, in CLC-think, Wikipedia is
garbage. In fact, the conventional wisdom here is that whatever
Wikipedia says, the exact opposite is generally held as gospel here.
 
Tim Rentsch

James Kuyper said:
Note: this code assumes that there are only two possible
representations. That's a good approximation to reality, but it's not
the exact truth. If 'int' is a four-byte type (which it is on many
compilers), there are 24 different byte orders theoretically possible, 6
of which would be identified as Little Endian by this code, 5 of them
incorrectly. 18 of them would be identified as Big Endian, 17 of them
incorrectly.

This would all be pure pedantry, if it weren't for one thing: of those
24 possible byte orders, something like 8 to 11 of them (I can't
remember the exact number) are in actual use on real world machines.
Even that would be relatively unimportant if big-endian and little-endian
were overwhelmingly the most popular choices, but that's not even the
case: the byte orders 2134 and 3412 have both been used in some fairly
common machines.

The really pedantic issue is that the standard doesn't even guarantee
that 'char' and 'int' number the bits in the same order. A conforming
implementation of C could use the same bit that is used by an 'int'
object to store a value of '1' as the sign bit when the byte containing
that bit is interpreted as a char.
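
For illustration (this sketch is not from the original post; it assumes
CHAR_BIT == 8 and that <stdint.h> provides a padding-free 4-byte
uint32_t), a check that reads back the whole permutation instead of
testing a single byte:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Each byte of the value names its own significance: byte 1 is the
       most significant, byte 4 the least. Dumping the object
       representation therefore reveals the full permutation: e.g.
       "1234" for big-endian, "4321" for little-endian, "2143" for the
       PDP-11 order mentioned elsewhere in this thread. */
    uint32_t v = 0x01020304;
    const unsigned char *p = (const unsigned char *)&v;

    for (size_t i = 0; i < sizeof v; i++)
        printf("%d", p[i]);
    putchar('\n');
    return 0;
}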


No, because you cannot dereference a pointer to void.


The key differences between char* and void* are that
a) you cannot dereference or perform pointer arithmetic on void*
b) there are implicit conversions between void* and any other pointer
to object type.

The general rule is that you should use void* whenever the implicit
conversions are sufficiently important. The standard library's mem*()
functions are a good example where void* is appropriate, because they
are frequently used on pointers to types other than char. You should use
char* whenever you're actually accessing the object as an array of
characters, which requires pointer arithmetic and dereferencing. You
should use unsigned char* when accessing the object as an array of
uninterpreted bytes.
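
A small sketch of that division of labour (the function names here are
invented for illustration):

#include <stdio.h>
#include <string.h>

/* void*: the caller's pointer type doesn't matter, and the object is
   never dereferenced here -- it is just handed on to memset(). */
static void clear(void *obj, size_t size)
{
    memset(obj, 0, size);
}

/* unsigned char*: the object is accessed as uninterpreted bytes. */
static void dump(const void *obj, size_t size)
{
    const unsigned char *p = obj;   /* implicit conversion from void* */

    for (size_t i = 0; i < size; i++)
        printf("%02X ", (unsigned)p[i]);
    putchar('\n');
}

int main(void)
{
    double d = 3.14;

    dump(&d, sizeof d);   /* no cast needed: double* converts to void* */
    clear(&d, sizeof d);
    dump(&d, sizeof d);
    return 0;
}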


There's no such thing as a typecast in C. There is a type conversion,
which can occur either implicitly, or explicitly. Explicit conversions
occur as a result of cast expressions.

The (char*) cast does not convert an integer into a char. It converts a
pointer to an int into a pointer to a char. The char object it points at
is the first byte of 'num'. The * operator interprets that byte as a char.


The result of the cast expression is a pointer to char; it can be
converted into a char and stored into a char variable, but the result of
that conversion is probably meaningless unless sizeof(intptr_t) == 1,
which is pretty unlikely. It would NOT, in general, have anything to do
with the value stored in the first byte of "num".

You could write:

char c = *(char*)&num;


The only type conversions that are reasonably safe in portable code are
the ones which occur implicitly, without the use of a cast, and even
those have dangers. Any use of a cast should be treated as a danger
sign. The pattern *(T*), where T is an arbitrary type, is called type
punning. In general, this is one of the most dangerous uses of a cast.
In the case where T is "char", it happens to be relatively safe.
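
For illustration, a sketch of the relatively safe character-type case
next to the dangerous general case (it assumes, but checks, that
unsigned int and float have the same size):

#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 1.0f;

    /* Relatively safe: any object may be examined through a pointer
       to a character type. */
    const unsigned char *b = (const unsigned char *)&f;
    printf("first byte: %02X\n", (unsigned)b[0]);

    /* Dangerous: *(unsigned *)&f accesses a float through an lvalue
       of the wrong type, and &f might not even be correctly aligned
       for unsigned int on some implementation. A portable way to get
       the same bits is memcpy: */
    if (sizeof(unsigned) == sizeof f) {
        unsigned u;
        memcpy(&u, &f, sizeof u);
        printf("all bits: %08X\n", u);   /* 3F800000 under IEEE 754 */
    }
    return 0;
}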

The best answer to your question is to read section 6.3 of the standard.
However, it may be hard for someone unfamiliar with standardese to
translate what section 6.3 says into "safe" or "unsafe", "portable" or
"unportable". Here's my quick attempt at a translation:

* Any value may be converted to void; there's nothing that you can do
with the result. The only use for such a cast would be to shut up the
diagnostics that some compilers generate when you fail to do anything
with the value returned by a function. However, it is perfectly safe.

* Converting any numeric value to a type that is capable of storing that
value is safe. If the value is currently of a type which has a range
which is guaranteed to be a subset of the range of the target type,
safety is automatic - for instance, when converting "signed char" to
"int". Otherwise, it's up to your program to make sure that the value is
within the valid range.

* Converting a value that is outside the valid range of the target type
to a signed integer or floating point type is not safe.

* Converting a numeric value to an unsigned type that is outside the
valid range is safe, in the sense that your program will continue
running; but the resulting value will be different from the original by
a multiple of the number that is one more than the maximum value which
can be stored in that type. If that change in value is desired and
expected (D&E), that's a good thing, otherwise it's bad.

* Converting a floating point value to an integer type will lose the
fractional part of that value. If this is D&E, good, otherwise, bad
(see the sketch after this post).

* Converting a floating point value to a type with lower precision will
generally lose precision. If this is acceptable and expected, good -
otherwise, bad.

* Converting a _Complex value to a real type will cause the imaginary
part of the value to be discarded. Converting it to an _Imaginary type
will cause the real part of the value to be discarded. Converting
between real and _Imaginary types will always result in a value of 0. In
each of these cases, if the change in value is D&E, good - otherwise, bad.

* Converting a null pointer constant to a pointer type results in a null
pointer of that type. Converting a null pointer to a different pointer
type results in a null pointer of that target type. Both conversions are
safe.

* Converting a pointer to an integer type is safe, but unless the target
type is either an intptr_t or a uintptr_t, the result is
implementation-defined, rendering it pretty much useless, at least in
portable code. If the target type is intptr_t or uintptr_t, the result
may be safely converted back to the original pointer type, and the
result of that conversion will compare equal to the original pointer.
You can safely treat that integer value just like any other integer
value, but conversion back to the original pointer type is the only
meaningful thing that can be done with it.

* Converting a pointer to _Bool is always safe and well-defined.

* Except as described above, converting an integer value into a pointer
type is always dangerous. Note: an integer constant expression with a
value of 0 qualifies as a null pointer constant. Therefore, it qualifies
as one of the cases "described above".

* Any pointer to a function type may be safely converted into a pointer
to a different function type. The result may be converted back to the
original pointer type, in which case it will compare equal to the
original pointer. However, you can only safely dereference a function
pointer if it points at a function whose actual type is compatible with
the type that the function pointer points at.

* Conversions which add a qualifier to a pointer type (such as int* =>
const int*) are safe.

* Conversions which remove a qualifier from a pointer type (such as
volatile double * => double *) are safe in themselves, but are
invariably needed only to perform operations that can be dangerous
unless you know precisely what the relevant rules are.

* A pointer to any object can be safely converted into a pointer to a
character type. The result points at the first byte of that object.

* Conversion of a pointer to an object or incomplete type into a pointer
to a different object or incomplete type is safe, but only if it is
correctly aligned for that type. There are only a few cases where you
can be portably certain that the alignment is correct, which limits the
usefulness of this case.

Except as indicated above, the standard says absolutely nothing about
WHERE the resulting pointer points at, which in principle even more
seriously restricts the usefulness of the result of such a conversion.
[snip]

Three cases (perhaps meant to be included under the "there are only
a few cases" clause above) were left out:

1. A pointer to a structure may be safely and reliably converted to
a pointer to the first member of the structure, and vice versa.

2. A pointer to a member of a union may be safely and reliably
converted to a pointer to any other member of the union,
or to the union itself, and vice versa.

3. A pointer converted to (void*) and then converted to (char*)
produces a pointer to the same byte as a direct conversion
to (char*). [I posted an analysis on this sometime within the
last several weeks.]
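
To make a few of the quoted rules, and case 1 above, concrete, here is
a minimal sketch (it assumes the optional uintptr_t type exists; the
commented output for the unsigned case depends on UCHAR_MAX):

#include <stdio.h>
#include <stdint.h>

struct point { int x; int y; };

int main(void)
{
    /* Out-of-range conversion to an unsigned type: well defined,
       reduced modulo UCHAR_MAX + 1 (prints 44 if UCHAR_MAX is 255). */
    unsigned char uc = (unsigned char)300;
    printf("%u\n", (unsigned)uc);

    /* float -> int discards the fractional part: prints 3. */
    printf("%d\n", (int)3.9);

    /* Pointer -> uintptr_t -> pointer round trip: the result compares
       equal to the original pointer, so this prints 1. */
    int n = 42;
    uintptr_t bits = (uintptr_t)(void *)&n;
    int *q = (void *)bits;
    printf("%d\n", q == &n);

    /* Case 1 above: a pointer to a structure converts to a pointer to
       its first member, and vice versa; prints 1 (the value of pt.x). */
    struct point pt = { 1, 2 };
    int *first = (int *)&pt;
    printf("%d\n", *first);
    return 0;
}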
 
Tim Rentsch

James Kuyper said:
No. The binary '&' operator works on the bits of the value, not the bits
of the representation. The expression 'i&1' returns a value of 1 if the
bit with a value of 1 is set in the representation of 'i', regardless of
which bit that is. The value of that expression will therefore be 1, a
value which will be preserved when converted to unsigned char, and will
still be preserved when it is promoted to either 'int' or 'unsigned
int', depending upon whether or not UCHAR_MAX < INT_MAX.

To test my assertion, you must look at the representation of 'i', not
just at its value:

for (char *p = (char *)&i; p < (char *)(&i + 1); p++)
    printf("%d ", *p);
printf("\n");

What I am saying is that the standard does not guarantee that any of the
values printed out by the above code will be '1'. If 'int' doesn't have
any padding bits, then exactly one of those values will be non-zero, and
the one that is non-zero will be either a power of two, or (if char is
signed) whatever value the sign bit represents, which depends upon
whether it has 2's complement, 1's complement, or sign-magnitude
representation.

There is also one other possibility, namely, that the
bit representation interpreted as (char) is a trap
representation rather than a valid (char) value.


[goes on to say this sort of example code should use
(unsigned char) rather than (char)...]

Yes, besides eliminating the problem of possible trap
representations, it simplifies the discussion and analysis.
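
For completeness, a runnable version of that loop rewritten with
unsigned char as suggested (what it prints is implementation-specific):

#include <stdio.h>

int main(void)
{
    int i = 1;

    for (unsigned char *p = (unsigned char *)&i;
         p < (unsigned char *)(&i + 1); p++)
        printf("%d ", *p);
    printf("\n");   /* e.g. "1 0 0 0 " with a little-endian 4-byte int */
    return 0;
}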
 
Richard Tobin

Keith Thompson said:
A quick look at the Wikipedia page for the PDP-11 says that it stored
16-bit words in little-endian order; double-words (i.e., 32-bit
integers) were supported by the "Extended Instruction Set", using a
middle-endian ordering, so that the value 0x0A0B0C0D would be stored
in 4 bytes as { 0x0B, 0x0A, 0x0D, 0x0C }.

Ben Pfaff said:
Anyone know why? On the surface, this seems like a design that
only a crazy person would choose, but presumably there was some
adequate reason for it.

As I recall, the instructions (16x16->32 multiply, 32/16->16 divide)
worked on register pairs, not memory. The high order part was in
Rn, and the low-order part in Rn+1, and n had to be even.

This would only really correspond to middle-endian addressing if the
registers were copied to or from memory, and I think you had to use
two 16-bit MOV instructions for that.

I don't remember any instructions that operated directly on 32-bit
memory operands, but I could be wrong.

-- Richard
 
Dik T. Winter

Oh, well, this text is wrong. The EIS did *not* use double words in memory.
There were four EIS instructions (I just checked in the manual): MUL, DIV,
ASH and ASHC. All four have one operand in a register pair and the other
operand as a single word in memory. For MUL the register operand is a
single register. For all four the result is a register pair (and under
some conditions, for MUL, only a single register).
That is why I mentioned the FIS, which operated on 32-bit memory operands.
> Anyone know why? On the surface, this seems like a design that
> only a crazy person would choose, but presumably there was some
> adequate reason for it.

Taking a look at the EIS, the register pair consisted of an even numbered
register together with the odd numbered register next higher. (In the
hardware the second register was found by XORing the register number with
1; what that would do with the DIV, ASH and ASHC instructions if the code
gave an odd register is not documented.)

The high order part was in the even numbered register, the low order part
in the odd numbered register. I think the middle-endian order came from
the way you were apt to store or load two registers when using a bottom-up
stack. And I think the FIS (which did use double-word memory operands) was
built on that model.
 
Richard Bos

Keith Thompson said:
CBFalconer said:
And people should also be aware that converting unsigned to signed
can also invoke UB.
[snip definition of undefined behavior]

As Harald pointed out, the result of such a conversion is
implementation-defined, or it can raise an implementation-defined
signal.

True, but raising an implementation-defined signal does itself have
undefined behaviour, by omission. In particular, there is no portable
way of telling which signal is raised; therefore, there is no way (short
of invoking UB yourself) of finding out what the (default or installed)
handler for that signal is; and if it isn't SIG_IGN, it starts out as
SIG_DFL, and the Standard doesn't define how that handler behaves,
either. Hence, undefined behaviour by omission.
Of course, I don't expect this to be much of a problem in practice, and
would assume that nearly all implementations take one out of at most a
handful of variants on the "I-D value" approach; but in principle,
the behaviour of anything which is allowed to raise a signal, is
undefined.
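
For instance, the conversion under discussion looks like this (a
sketch; the printed value is implementation-defined, though -1 is what
typical two's complement implementations give):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;

    /* UINT_MAX cannot be represented in an int, so the result of the
       conversion is implementation-defined -- or an
       implementation-defined signal is raised. Typical two's
       complement systems print -1 here. */
    int i = (int)u;
    printf("%d\n", i);
    return 0;
}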

Richard
 
Richard Bos

Yes, but you, of all people know that, in CLC-think, Wikipedia is
garbage.

Not just in CLC-think; I've found that to be a reasonable approach to
Pikiwedia on all even remotely controversial topics. I'd not even trust
it to get the birth and death of a Renaissance artist right.

Kenny McCormack said:
In fact, the conventional wisdom here is that whatever
Wikipedia says, the exact opposite is generally held as gospel here.

Oh, if only it were. That would mean that it would be useful. But no,
the truth value of an unchecked Pikiwedia article isn't 0 - which would
make it a usable resource, by negation - but as near to 1/2 as makes no
practical difference.
IOW, on any subject you choose, Pikiwedia might be completely correct,
or it might be entirely wrong, or (perhaps most likely) it might be
something which looks correct but is deceptively wrong in places -
and what's most devious, _you have no way to tell which_.

Richard
 
Stephen Sprunk

Richard said:
Not just in CLC-think; I've found that a reasonable approach to
Pikiwedia in all even remotely controversial topics. I'd not even trust
it to get the birth and death of a Renaissance artist right.

An independent study found that, except for current events, Wikipedia
has a comparable error rate to Britannica. For current events, it's
absolutely atrocious -- but Britannica wouldn't have entries on those
topics at all yet.

Richard said:
Oh, if only it were. That would mean that it would be useful. But no,
the truth value of an unchecked Pikiwedia article isn't 0 - which would
make it a usable resource, by negation - but as near to 1/2 as makes no
practical difference.
IOW, on any subject you choose, Pikiwedia might be completely correct,
or it might be entirely wrong, or (perhaps most likely) it might be
something which looks correct but is deceptively wrong in places -
and what's most devious, _you have no way to tell which_.

... which is true for anything you read on the Internet or even the
"professional" media, which these days mostly just reprints press
releases without checking facts. At least WP has a policy of requiring
citations to other sources, and you can easily check those sources
yourself, if given, or see that the article does not follow policy, if not.

A side benefit of the Internet is that, due to all the crap out there,
kids these days are learning better critical reading skills than their
parents who grew up getting similarly bad information from the boob tube
without knowing it.

S
 
Kenny McCormack

Richard Bos said:
Not just in CLC-think; I've found that to be a reasonable approach to
Pikiwedia on all even remotely controversial topics. I'd not even trust
it to get the birth and death of a Renaissance artist right.

Just in case there is any lack of clarity on the matter, I wasn't
defending Wikipedia. I have no real opinion on the matter, at least not
one that can be fully explained and documented with every t crossed and
every i and j dotted, to the satisfaction of the CLC-crowd. In short,
not something I'm going to go into here.

I was merely pointing out that Keith, as a member of the "Wiki is
garbage" He-Man CLC regs crowd, shouldn't be using it as a reference in
his posts. It just smells too much like "Wiki is crap (but only when it
disagrees with me)". The regs love to disparage Wiki when it fits their
agenda to do so.
 
