Typecast clarification


syuga2012

Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?
2. When do we use void * and when char * ?
3. Does the above typecast convert an integer to a char (1 byte) in
memory?
For e.g if I used a variable ch, to store the result of the above
typecast

4. In general, when can we safely do typecasts? Is such code
portable?

Thanks a lot for your help. Appreciate it.

syuga
 

WANG Cong

Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

Here? No, you can not dereference a void* pointer.
2. When do we use void * and when char * ?

void * is used for generic pointers; malloc(), for example, returns one,
and it lets you pass pointers around without casts.

Here, in your case, char * is used because the code only wants to
fetch one byte of a 4-byte int.
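
A minimal sketch of the difference (the variable names are only for
illustration, not anything from the original snippet):

#include <stdio.h>

int main(void)
{
    int num = 1;

    void *vp = &num;          /* fine: any object pointer converts to void * */
    char *cp = (char *)&num;  /* explicit cast: view num as raw bytes */

    /* printf("%d\n", *vp);      would not compile: void * cannot be dereferenced */
    printf("%d\n", *cp);      /* prints the value stored in the first byte of num */
    return 0;
}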
3. Does the above typecast convert an integer to a char (1 byte) in
memory?
For e.g if I used a variable ch, to store the result of the above
typecast

No, it casts an int pointer to char pointer.
4. In general, when can we safely do typecasts? Is such code
portable?

When you understand what you are doing. :)
 

Guest

Subject: "Typecast clarification"

technically, what you are doing is "casting" not "typecasting"
[prepare for flamewar]


Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
  printf ("\n Little Endian");
else
  printf("\n Big endian");

that takes the address of num, casts it to a pointer to char
(unsigned char might be slightly safer), then dereferences
it to give a char. If it's little endian the number will
be stored (assuming 8-bit chars and 32-bit ints) as

lo          hi
01 00 00 00

and if it's big endian as

lo          hi
00 00 00 01

so the code will do what you expect. Note there
are more than 2 ways to order 4 objects...
(and some of them *have* been used)
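
One way to see the actual order on a given machine is to print every byte
of the representation rather than just the first one. A sketch (it assumes
unsigned int is at least 32 bits, nothing else beyond what the standard
guarantees for unsigned char):

#include <stdio.h>

int main(void)
{
    unsigned int num = 0x01020304;            /* four distinct byte values */
    unsigned char *p = (unsigned char *)&num;

    for (size_t i = 0; i < sizeof num; i++)
        printf("%02x ", p[i]);                /* bytes from lowest address up */
    printf("\n");                             /* little endian: 04 03 02 01
                                                 big endian:    01 02 03 04 */
    return 0;
}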
I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

no. You cannot dereference a (void*)
(some compilers allow this but they are not compliant with the
standard)
2. When do we use void * and when char * ?

(void*) for anonymous or unknown type. (char*) for
pointer to characters (eg strings). (unsigned char*)
for getting at the representation. It is safe to cast
any data to unsigned char.
3. Does the above typecast convert an integer to a char (1 byte) in
memory?

it doesn't actually modify the value in memory, only
how the program looks at it.
    For e.g if I used a variable ch, to store the result of the above
typecast

sorry, lost me. Could you post code?

4. In general, when can we safely do typecasts ?

when necessary :) There's no short answer to this one.
Is such code portable?

Sometimes. More often than not, no, though.

Thanks a lot for your help. Appreciate it.

happy coding


--
Nick Keighley

"Half-assed programming was a time-filler that, like knitting,
must date to the beginning of human experience."
"A Fire Upon The Deep" by Verne Vinge
 

James Kuyper

Hi Folks,

To determine if a machine is little endian / big endian the foll. code
snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

Note: this code assumes that there are only two possible
representations. That's a good approximation to reality, but it's not
the exact truth. If 'int' is a four-byte type (which it is on many
compilers), there's 24 different byte orders theoretically possible, 6
of which would be identified as Little Endian by this code, 5 of them
incorrectly. 18 of them would be identified as Big Endian, 17 of them
incorrectly.

This would all be pure pedantry, if it weren't for one thing: of those
24 possible byte orders, something like 8 to 11 of them (I can't
remember the exact number) are in actual use on real world machines.
Even that would be relatively unimportant if big-endian and little-endian
were overwhelmingly the most popular choices, but that's not even the
case: the byte orders 2134 and 3412 have both been used in some fairly
common machines.

The really pedantic issue is that the standard doesn't even guarantee
that 'char' and 'int' number the bits in the same order. A conforming
implementation of C could use the same bit that is used by an 'int'
object to store a value of '1' as the sign bit when the byte containing
that bit is interpreted as a char.
I needed a few clarifications regarding this.

1. Can we use void * instead of char * ?

No, because you cannot dereference a pointer to void.
2. When do we use void * and when char * ?

The key differences between char* and void* are that
a) you cannot dereference or perform pointer arithmetic on void*
b) there are implicit conversions between void* and any other pointer
to object type.

The general rule is that you should use void* whenever the implicit
conversions are sufficiently important. The standard library's mem*()
functions are a good example where void* is appropriate, because they
are frequently used on pointers to types other than char. You should use
char* whenever you're actually accessing the object as an array of
characters, which requires pointer arithmetic and dereferencing. You
should use unsigned char* when accessing the object as an array of
uninterpreted bytes.
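
A small sketch of that convention; dump_bytes is a made-up helper for
illustration, not anything from the standard library:

#include <stdio.h>

/* Takes void * so callers can pass a pointer to any object type without a
   cast; internally uses unsigned char * to walk the raw bytes. */
static void dump_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;   /* implicit conversion from void * */
    while (n-- > 0)
        printf("%02x ", *p++);
    printf("\n");
}

int main(void)
{
    int i = 42;
    double d = 1.5;

    dump_bytes(&i, sizeof i);       /* no casts needed at the call sites */
    dump_bytes(&d, sizeof d);
    return 0;
}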
3. Does the above typecast convert an integer to a char (1 byte) in
memory?

There's no such thing as a typecast in C. There is a type conversion,
which can occur either implicitly, or explicitly. Explicit conversions
occur as a result of cast expressions.

The (char*) cast does not convert an integer into a char. It converts a
pointer to an int into a pointer to a char. The char object it points at
is the first byte of 'num'. The * operator interprets that byte as a char.
For e.g if I used a variable ch, to store the result of the above
typecast

The result of the cast expression is a pointer to char; it can be
converted into a char and stored into a char variable, but the result of
that conversion is probably meaningless unless sizeof(intptr_t) == 1,
which is pretty unlikely. It would NOT, in general, have anything to do
with the value stored in the first byte of "num".

You could write:

char c = *(char*)#
4. In general, when can we safely do typecasts? Is such code
portable?

The only type conversions that are reasonably safe in portable code are
the ones which occur implicitly, without the use of a cast, and even
those have dangers. Any use of a cast should be treated as a danger
sign. The pattern *(T*), where T is an arbitrary type, is called type
punning. In general, this is one of the most dangerous uses of a cast.
In the case where T is "char", it happens to be relatively safe.

The best answer to your question is to read section 6.3 of the standard.
However, it may be hard for someone unfamiliar with standardese to
translate what section 6.3 says into "safe" or "unsafe", "portable" or
"unportable". Here's my quick attempt at a translation:

* Any value may be converted to void; there's nothing that you can do
with the result. The only use for such a cast would be to shut up the
diagnostics that some compilers generate when you fail to do anything
with the value returned by a function. However, it is perfectly safe.

* Converting any numeric value to a type that is capable of storing that
value is safe. If the value is currently of a type which has a range
which is guaranteed to be a subset of the range of the target type,
safety is automatic - for instance, when converting "signed char" to
"int". Otherwise, it's up to your program to make sure that the value is
within the valid range.

* Converting a value to a signed or floating point type that is outside
of the valid range for that type is not safe.

* Converting a numeric value to an unsigned type that is outside the
valid range is safe, in the sense that your program will continue
running; but the resulting value will be different from the original by
a multiple of the number that is one more than the maximum value which
can be stored in that type. If that change in value is desired and
expected (D&E), that's a good thing, otherwise it's bad.

* Converting a floating point value to an integer type will lose the
fractional part of that value. If this is D&E, good, otherwise, bad.

* Converting a floating point value to a type with lower precision will
generally lose precision. If this is acceptable and expected, good -
otherwise, bad.

* Converting a _Complex value to a real type will cause the imaginary
part of the value to be discarded. Converting it to an _Imaginary type
will cause the real part of the value to be discarded. Converting
between real and _Imaginary types will always result in a value of 0. In
each of these cases, if the change in value is D&E, good - otherwise, bad.

* Converting a null pointer constant to a pointer type results in a null
pointer of that type. Converting a null pointer to a different pointer
type results in a null pointer of that target type. Both conversions are
safe.

* Converting a pointer to an integer type is safe, but unless the target
type is either an intptr_t or a uintptr_t, the result is
implementation-defined, rendering it pretty much useless, at least in
portable code. If the target type is intptr_t or uintptr_t, the result
may be safely converted back to the original pointer type, and the
result of that conversion will compare equal to the original pointer.
You can safely treat that integer value just like any other integer
value, but conversion back to the original pointer type is the only
meaningful thing that can be done with it.

* Except as described above, converting an integer value into a pointer
type is always dangerous. Note: an integer constant expression with a
value of 0 qualifies as a null pointer constant. Therefore, it qualifies
as one of the cases "described above".

* Any pointer to a function type may be safely converted into a pointer
to a different function type. The result may be converted back to the
original pointer type, in which case it will compare equal to the
original pointer. However, you can only safely dereference a function
pointer if it points at a function whose actual type is compatible with
the type that the function pointer points at.

* Conversions which add a qualifier to a pointer type (such as int* =>
const int*) are safe.

* Conversions which remove a qualifier from a pointer type (such as
volatile double * => double *) are safe in themselves, but are
invariably needed only to perform operations that can be dangerous
unless you know precisely what the relevant rules are.

* A pointer to any object can be safely converted into a pointer to a
character type. The result points at the first byte of that object.

* Conversion of a pointer to an object or incomplete type into a pointer
to a different object or incomplete type is safe, but only if it is
correctly aligned for that type. There are only a few cases where you
can be portably certain that the alignment is correct, which limits the
usefulness of this case.

Except as indicated above, the standard says absolutely nothing about
WHERE the resulting pointer points at, which in principle even more
seriously restricts the usefulness of the result of such a conversion.
However, in practice, on most real systems the resulting pointer will
point at the same location in memory as the original pointer.

However, it is only safe to dereference such a pointer if you do so in a
way that conforms to the aliasing rules (6.5p7). And that is what
makes type punning so dangerous.
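
Two of the cases in the list above, sketched in code: the unsigned
wraparound and the uintptr_t round trip. This is only a sketch; uintptr_t is
optional in C99 (though widely available), and the wraparound result shown in
the comment assumes UCHAR_MAX is 255.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Out-of-range conversion to an unsigned type: well defined, reduces
       the value modulo (maximum value + 1). */
    unsigned char uc = (unsigned char)300;   /* 300 - 256 == 44 if UCHAR_MAX is 255 */
    printf("%d\n", uc);

    /* Pointer -> uintptr_t -> pointer round trip: the only portable use of
       the integer value is converting it back. */
    int x = 7;
    void *vp = &x;
    uintptr_t bits = (uintptr_t)vp;          /* value is implementation-defined */
    void *back = (void *)bits;               /* guaranteed to compare equal to vp */
    printf("%d\n", back == vp);              /* prints 1 */
    return 0;
}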
 

Boon

syuga said:
To determine if a machine is little endian or big endian, the
following code snippet is used...

int num = 1;

if( * (char *)&num == 1)
printf ("\n Little Endian");

else
printf("\n Big endian");

You don't need casts if you use memcmp.

$ cat endian.c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    uint32_t i = 0x12345678;
    uint8_t msb_first[4] = { 0x12, 0x34, 0x56, 0x78 };
    uint8_t lsb_first[4] = { 0x78, 0x56, 0x34, 0x12 };

    if (memcmp(&i, msb_first, 4) == 0) puts("BIG ENDIAN");
    else if (memcmp(&i, lsb_first, 4) == 0) puts("LITTLE ENDIAN");
    else puts("SOMETHING ELSE");
    return 0;
}
 

Bruce Cook

James said:
Note: this code assumes that there are only two possible
representations. That's a good approximation to reality, but it's not
the exact truth. If 'int' is a four-byte type (which it is on many
compilers), there's 24 different byte orders theoretically possible, 6
of which would be identified as Little Endian by this code, 5 of them
incorrectly. 18 of them would be identified as Big Endian, 17 of them
incorrectly.

This would all be pure pedantry, if it weren't for one thing: of those
24 possible byte orders, something like 8 to 11 of them (I can't
remember the exact number) are in actual use on real world machines.
Even that would be relatively unimportant if big-endian and little-endian
were overwhelmingly the most popular choices, but that's not even the
case: the byte orders 2134 and 3412 have both been used in some fairly
common machines.

And there are arguments as to whether 2143, 3412 or 4321 is the "real" big-
endian. Once word sizes jumped from 16 bits to 32 bits, endianness became a bit
complicated. Its original intent was to enable short-word and word fetches
to fetch the same value, assuming the word contained a small value. This
came about because processors often had octet as well as word instructions.

Once 32 bits came about and instructions had 8-, 16- and 32-bit operand
sizes, the question was whether to optimize for 8-bit or 16-bit fetches.
Different processor designers came up with different solutions to this,
which led to all the differing endians.

Then when you get to 64-bit native machines such as the Alpha, there are even
more combinations (8 octets per word instead of just 4).

The Alpha is interesting because its endianness is controllable, although
in practice you'd have it fixed for a particular operating system, so testing
for it would still be valid.

[...]

Bruce
 

Keith Thompson

James Kuyper said:
Note: this code assumes that there are only two possible
representations. That's a good approximation to reality, but it's not
the exact truth. If 'int' is a four-byte type (which it is on many
compilers), there's 24 different byte orders theoretically possible, 6
of which would be identified as Little Endian by this code, 5 of them
incorrectly. 18 of them would be identified as Big Endian, 17 of them
incorrectly.

This would all be pure pedantry, if it weren't for one thing: of those
24 possible byte orders, something like 8 to 11 of them (I can't
remember the exact number) are in actual use on real world
machines. Even that would be relatively unimportant if big-endian and
little-endian were overwhelmingly the most popular choices, but that's
not even the case: the byte orders 2134 and 3412 have both been used
in some fairly common machines.

Really? I've only heard of 1234, 4321, 2143, and 3412 being used in
real life. In fact, I've only heard of one of the last two (whichever
one the PDP-11 used). What other orders have been used, and *why*?

[...]
* Converting a numeric value to an unsigned type that is outside the
valid range is safe, in the sense that your program will continue
running; but the resulting value will be different from the original
by a multiple of the number that is one more than the maximum value
which can be stored in that type. If that change in value is desired
and expected (D&E), that's a good thing, otherwise it's bad.

Almost. Converting a *signed or unsigned* value to an unsigned type
is safe, as you describe. Converting a floating-point value to
unsigned, if the value is outside the range of the unsigned type,
invokes undefined behavior.
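
A sketch of the distinction; the dangerous conversion is left commented out
so the program itself stays well defined:

#include <stdio.h>

int main(void)
{
    unsigned a = (unsigned)-1;   /* well defined: wraps to UINT_MAX */
    printf("%u\n", a);

    double d = -1.0;
    /* unsigned b = (unsigned)d;    undefined behavior: the value -1 cannot be
                                    represented in an unsigned type, and the
                                    floating-to-unsigned conversion has no
                                    wraparound rule */
    (void)d;                     /* silence any unused-variable warning */
    return 0;
}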
 

jameskuyper

Keith said:
Really? I've only heard of 1234, 4321, 2143, and 3412 being used in

My reference to 2134 was a typo - I meant 2143.
real life. In fact, I've only heard of one of the last two (whichever
one the PDP-11 used). What other orders have been used, and *why*?

I remember seeing a web site that listed a large number of
orders in current use, and cited specific machines for each byte
order. Unfortunately, I did not save the URL, so I can't cite it.
Sorry!
However it is sufficient for my purposes that 2143 and 3412 are in
use, and all you have to do to verify that is to do a web search for
"middle endian".
Almost. Converting a *signed or unsigned* value to an unsigned type
is safe, as you describe. Converting a floating-point value to
unsigned, if the value is outside the range of the unsigned type,
invokes undefined behavior.

You're right. It's not an issue I've had to worry about very often,
and I remembered it incorrectly. I did the first 7 items on my list
straight from memory, and I should have double-checked them against
the standard before posting.
 

LL

I'm a novice at C too, but this makes no sense to me.
Refer to the C Precedence Table
(http://isthe.com/chongo/tech/comp/c/c-precedence.html). There, unary * has
the highest precedence, then comes ==, then &. So what's this supposed to
mean? Dereferencing what?
Here? No, you can not dereference a void* pointer.


void* is used generally, for example, malloc(), it can help
you to pass pointers without castings.

Here, in your case, using char * is because it only wants to
fetch one byte from a 4-byte int.


No, it casts an int pointer to char pointer.


When you understand what you are doing. :)
Could someone tell me how this tests for endianness?
 

jameskuyper

LL said:
I'm a novice at C too, but this makes no sense to me.
Refer to the C Precedence Table
(http://isthe.com/chongo/tech/comp/c/c-precedence.html). There, unary * has
the highest precedence, then comes ==, then &. So what's this supposed to
mean? Dereferencing what?

It's a mistake to pay too much attention to precedence tables. The C
standard defines things in terms of grammar, not in terms of
precedence, and the relevant grammar rule is 6.5.3p1:

"unary-expression:
...
unary-operator cast-expression

unary-operator: one of
& * + - ~ !
"

Thus, & and * have the same "precedence"; the key issue is whether or
not the thing to the right of the operator can be parsed as a cast-
expression. You can't parse anything to the right of the '*' operator
as a cast-expression that is shorter than "(char*)&num". Therefore the
&num has to be evaluated first, giving a pointer to 'num'. Then
(char*) is applied to that pointer, converting it to a pointer to
char. Finally, the unary '*' is evaluated, returning the value of the
byte at that location, interpreted as a char. That value is then
compared to 1 for equality.
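
Spelling that parse out with redundant parentheses, as a sketch of the same
test and nothing more:

#include <stdio.h>

int main(void)
{
    int num = 1;

    /* The original expression parses as if written like this: */
    if (*((char *)(&num)) == 1)   /* 1: &num   2: cast to char *   3: unary * */
        printf("\n Little Endian");
    else
        printf("\n Big endian");
    return 0;
}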
Could someone tell me how this tests for endianness?

If 'int' is a little-endian type, the bit that will be set is in the
first byte; if it's a big-endian type, the bit that will be set is in
the last byte. If those were the only two possibilities, this would be
a good way to find out which one it is.
 

CBFalconer

LL said:
.... snip ...

I'm a novice at C too, but this makes no sense to me.
Refer to the C Precedence Table
(http://isthe.com/chongo/tech/comp/c/c-precedence.html). There,
unary * has the highest precedence, then comes ==, then &. So
what's this supposed to mean? Dereferencing what?

C doesn't define precedences. It defines the parsing grammar. See
the things marked C99 below, which refer to the C standard.
n869_txt.bz2 is a bzipped text file. I advise you to mistrust such
things as your reference (I haven't looked at it).

Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://c-faq.com/> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
<http://cbfalconer.home.att.net/download/n869_txt.bz2> (pre-C99)
<http://www.dinkumware.com/c99.aspx> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>
<http://clc-wiki.net/wiki/Introduction_to_comp.lang.c>
 

CBFalconer

Keith said:
.... snip ...

Almost. Converting a *signed or unsigned* value to an unsigned
type is safe, as you describe. Converting a floating-point value
to unsigned, if the value is outside the range of the unsigned
type, invokes undefined behavior.

And people should also be aware that converting unsigned to signed
can also invoke UB.

3.18
[#1] undefined behavior
behavior, upon use of a nonportable or erroneous program
construct, of erroneous data, or of indeterminately valued
objects, for which this International Standard imposes no
requirements

[#2] NOTE Possible undefined behavior ranges from ignoring
the situation completely with unpredictable results, to
behaving during translation or program execution in a
documented manner characteristic of the environment (with or
without the issuance of a diagnostic message), to
terminating a translation or execution (with the issuance of
a diagnostic message).

[#3] EXAMPLE An example of undefined behavior is the
behavior on integer overflow.
 

Harald van Dijk

And people should also be aware that converting unsigned to signed can
also invoke UB.

This is misleading. The result of an unsigned-to-signed conversion, when
the value is not representable in the signed type, is implementation-
defined (or in C99, an implementation-defined signal may be raised). It's
not considered an overflow, and it can be useful in correct programs.
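
For example (the value shown in the comment is the common two's complement
outcome, not a guarantee):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned u = UINT_MAX;
    int i = (int)u;   /* implementation-defined result, or an
                         implementation-defined signal; on typical two's
                         complement systems, i == -1 */
    printf("%d\n", i);
    return 0;
}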
 

Keith Thompson

CBFalconer said:
And people should also be aware that converting unsigned to signed
can also invoke UB.
[snip definition of undefined behavior]

As Harald pointed out, the result of such a conversion is
implementation-defined, or it can raise an implementation-defined
signal.

But of course James Kuyper already covered that case (though he said
it's "not safe", which was appropriate wording for the level at which
he was aiming). I only commented on the floating-point-to-unsigned
issue because James got the rest of it right.
 

Guest

The really pedantic issue is that the standard doesn't even guarantee
that 'char' and 'int' number the bits in the same order.

what!?

So this gives implementation defined behaviour:
int i = 1;
printf ("%d\n", (unsigned char)(i & 1));

[ignoring the fact that printf() itself might do something strange
if a Sanskrit font were selected]

A conforming
implementation of C could use the same bit that is used by an 'int'
object to store a value of '1' as the sign bit when the byte containing
that bit is interpreted as a char.

I read that three times and didn't understand it.
Assuming (for concreteness) that we have 16-bit ints
and 8-bit chars and the ints are lsb at the lower address.

int i = 0x0080;

lo hi
80 00

the one bit is not in a sign position, but
char c = *(char*)&i;

c will equal 0x80, so the one bit *is* in a sign
position. Is that what you were talking about?


The idea of the bits being renumbered is... unsettling
[Darth Vader's first C programming class]
 

James Kuyper

Richard said:
CBFalconer said:


I /have/ looked at it, and it doesn't contain any mistakes that p53
of K&R2 doesn't contain. It's actually slightly more useful than
the K&R2 version, since it labels ambiguous operators as being
either the unary or binary version (and does so correctly).

(But then I would expect nothing less from a page written by Landon
Curt Noll.)

Like Chuck, I didn't look at that page either, before posting my earlier
response. When I did look just now, I find that RH is right; insofar as
the grammar of C can be approximately described in terms of precedence,
and associativity, that page does so reasonably well.

LL was misinterpreting it when he assumed that it was telling him that
unary & has lower precedence than ==. I think he missed the distinction
between unary '&' and binary '&', and didn't notice the unary '&' right
next to the unary '*'.
 

James Kuyper

what!?

So this gives implementation defined behaviour:
int i = 1;
printf ("%d\n", (unsigned char)(i & 1));

No. The binary '&' operator works on the bits of the value, not the bits
of the representation. The expression 'i&1' returns a value of 1 if the
bit with a value of 1 is set in the representation of 'i', regardless of
which bit that is. The value of that expression will therefore be 1, a
value which will be preserved when converted to unsigned char, and will
still be preserved when it is promoted to either 'int' or 'unsigned
int', depending upon whether or not UCHAR_MAX < INT_MAX.

To test my assertion, you must look at the representation of 'i', not
just at its value:

for (char *p = (char *)&i; p < (char *)(&i + 1); p++)
    printf("%d ", *p);
printf("\n");

What I am saying is that the standard does not guarantee that any of the
values printed out by the above code will be '1'. If 'int' doesn't have
any padding bits, then exactly one of those values will be non-zero, and
the one that is non-zero will be either a power of two, or (if char is
signed) whatever value the sign bit represents, which depends upon
whether it has 2's complement, 1's complement, or sign-magnitude
representation.

In practice, I'd be very surprised if the non-zero value was anything
other than 1, and I think it might be a good thing for the standard to
be more specific about such things. I don't think it would cause any
problems if the standard specified that all of the value bits stored
within a given byte of an integer type must have values that represent
consecutive powers of 2, in the same order for all integer types, with
the sign bit being adjacent to the value bit with the highest value.
Does anyone know of an implementation for which that is not true?
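
For reference, a self-contained version of that check; just a sketch that
prints the representation bytes of i as char values:

#include <stdio.h>

int main(void)
{
    int i = 1;

    for (char *p = (char *)&i; p < (char *)(&i + 1); p++)
        printf("%d ", *p);   /* on most machines exactly one of these is 1 */
    printf("\n");
    return 0;
}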
I read that three times and didn't understand it.
Assuming (for concreteness) that we have 16-bit ints
and 8-bit chars and the ints are lsb at the lower address.

int i = 0x0080;

lo hi
80 00

the one bit is not in a sign position, but
char c = *(char*)&i;

c will equal 0x80 so the one bit *is* in a sign
position. is that what you were talking about.

No. It will be clearer if we use a different value for i, and talk about
unsigned char, rather than char, because my point really wasn't specific
to the sign bit; it applies equally well to any bit, and is easier to
explain without the distraction of the possible existence of a sign bit
in char.

Assume:
sizeof(int) = 2
UCHAR_MAX = 255
i == 0x0040
*(1 + (unsigned char*)&i) == 0.

I am saying that the standard allows the existence of an implementation
on which those things are true, while the value of *(unsigned char*)&i
could be any of the following: 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40,
or 0x80.
The idea of the bits being renumbered is... unsettling
[Darth Vader's first C programming class]

The C standard fails to say a great many things that most C programmers
take for granted. In some cases, there's a good reason for it. I'm not
sure this is one of those cases.
 

Bruce Cook

Keith said:
Really? I've only heard of 1234, 4321, 2143, and 3412 being used in
real life. In fact, I've only heard of one of the last two (whichever
one the PDP-11 used). What other orders have been used, and *why*?

The PDP-11 was originally big-endian, 21 - only 16 bits.

There were no 32-bit instructions, so the endianness was restricted to the
two options, big or little.

Bruce
 

Keith Thompson

Bruce Cook said:
The PDP-11 was originally big-endian, 21 - only 16 bits.

There were no 32-bit instructions, so the endianness was restricted to the
two options, big or little.

Right, but when 32-bit operations were implemented in software, they
used the ordering 2143 (two adjacent 16-bit integers).
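
If I have the layout right, that ordering stores a 32-bit value as two
little-endian 16-bit words with the more significant word first, so
0x12345678 would sit in memory (by increasing address) as:

0x34 0x12 0x78 0x56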
 

Kaz Kylheku

C doesn't define precedences. It defines the parsing grammar.

The C grammar does in fact define precedences, implicitly. The grammar can be
shown to exhibit precedence, by reduction to an equivalent grammar which uses
precedence to generate the same language. The concepts of associativity
and precedence are mathematically precise.

Most C programmers in fact work with these concepts, rather than the factored
grammar. The K&R2 gives an associativity and precedence table on page 53;
even Dennis Ritchie thinks of C expression grammar in terms of associativity
and precedence.

So it's a perfectly accurate remark to say that in the C expression a * b + c,
the * operator has a higher precedence than the + operator.
 
