when is typecasting (unsigned char*) to (char*) dangerous?

Kaz Kylheku

> thanks in advance for your help, tim

It's dangerous when it's done by someone who got other people
to answer his technical interview or homework questions.
 
Nick Keighley

include the subject of your post in the body of your post

"when is typecasting (unsigned char*) to (char*) dangerous?"

> thanks in advance for your help, tim

what is "typecasting"? It isn't defined by the C Standard, so I'm
guessing you're talking about the problems older actors have? Must be
some extended pun to do with stars and character actors not being
asked for autographs...
 
Malcolm McLean

> thanks in advance for your help, tim

In a narrow technical sense, it's almost never dangerous. chars can have
trap representations whilst unsigned chars can't, but it's most
unlikely your code will ever need to run on such a machine.

However, casting unsigned char to char, though not the other way round,
usually means that someone doesn't know what they are doing. char
should be used for characters, i.e. human-readable text; unsigned char
for bytes, usually arbitrary bits, occasionally for small integers. If
you need a tiny signed integer, use signed char. It doesn't normally
make sense to convert an arbitrary bit pattern to a human-readable
character.
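[A quick sketch of that division of labour; the values here are purely illustrative, except the PNG magic bytes, which are real:]

```c
#include <string.h>

/* char for human-readable text. */
static const char greeting[] = "hello";

/* unsigned char for raw bytes -- here the PNG file magic prefix. */
static const unsigned char png_magic[] = { 0x89, 'P', 'N', 'G' };

/* signed char only when a tiny signed integer is genuinely needed. */
static const signed char delta = -3;

size_t greeting_len(void)       { return strlen(greeting); }
unsigned first_magic_byte(void) { return png_magic[0]; }
int delta_value(void)           { return delta; }
```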
 
Markus Wichmann

> In a narrow technical sense, it's almost never dangerous. chars can have
> trap representations whilst unsigned chars can't, but it's most
> unlikely your code will ever need to run on such a machine.
>
> However, casting unsigned char to char, though not the other way round,
> usually means that someone doesn't know what they are doing. char
> should be used for characters, i.e. human-readable text; unsigned char
> for bytes, usually arbitrary bits, occasionally for small integers. If
> you need a tiny signed integer, use signed char. It doesn't normally
> make sense to convert an arbitrary bit pattern to a human-readable
> character.

Heh! Tell that to the guys who wrote libbzip2! They have a routine
called BZ2_bzBuffToBuffDecompress() and they think that the compressed
buffer pointer is of type "char*". Nice one. So I had to cast that
pointer to make the warning go away.

CYA,
Markus
 
nroberts

> In a narrow technical sense, it's almost never dangerous. chars can have
> trap representations whilst unsigned chars can't, but it's most
> unlikely your code will ever need to run on such a machine.
>
> However, casting unsigned char to char, though not the other way round,
> usually means that someone doesn't know what they are doing. char
> should be used for characters, i.e. human-readable text; unsigned char
> for bytes, usually arbitrary bits, occasionally for small integers.

Not since the 70's... maybe earlier. The character sets on today's
systems extend the full length of the 8-bit, unsigned character, if not
further.

> If you need a tiny signed integer, use signed char.

Not unless you've got a really good reason. It doesn't buy you
anything.
 
Kaz Kylheku

> THIS IS NOT A HOMEWORK

In that case, ...

If the unsigned char * value is already well defined and everything, it is
safe to convert it to char *, and even to access the memory. In C, this
is not considered to be invalid aliasing. Any object can be accessed
as an array of characters, plain, signed or unsigned.

Sometimes this conversion will be necessary. If you know that some region of
memory contains a null-terminated C string that you would like to compare with
strcmp, you will end up doing that cast.
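[A minimal sketch of that situation; the buffer and function names here are made up for illustration:]

```c
#include <string.h>

/* A region of raw bytes known to end in a terminating '\0'. */
static const unsigned char region[] = { 'o', 'k', '\0' };

/* strcmp() takes const char *, so the unsigned char * must be cast. */
int region_matches(const char *expected)
{
    return strcmp((const char *)region, expected) == 0;
}
```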

One thing that may be dangerous is that char may be a signed type. This
means that through a char * pointer, some of the byte values will appear
negative. You can trip up like this:

int translated_char = table[*char_ptr]; /* oops, negative index */

Accessing memory using an unsigned char * pointer ensures that bytes are
treated as positive binary numbers.
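[The usual defence is to convert each byte to unsigned char before indexing, so every value is non-negative. A sketch, with a hypothetical byte-frequency table:]

```c
#include <limits.h>

/* One counter per possible byte value: UCHAR_MAX + 1 entries. */
static unsigned counts[UCHAR_MAX + 1];

/* counts[*s] would be a negative index for bytes >= 128 on
 * implementations where plain char is signed; converting each
 * byte to unsigned char first keeps the index non-negative. */
void tally(const char *s)
{
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;
}

unsigned count_of(char c)
{
    return counts[(unsigned char)c];
}
```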
 
Ben Pfaff

Vincenzo Mercuri said:
> Ben Pfaff wrote:
>
> I thought about this as well, but the typecast "per se" is in fact
> a conversion between pointer types, so I think it would be safe.

I agree that the cast itself is not the problem.
 
Keith Thompson

(no need to shout)
> In that case, ...
>
> If the unsigned char * value is already well defined and everything, it is
> safe to convert it to char *, and even to access the memory. In C, this
> is not considered to be invalid aliasing. Any object can be accessed
> as an array of characters, plain, signed or unsigned.

Does the standard guarantee that? I was unable to find anything
that permits treating arbitrary objects as arrays of anything other
than unsigned char.

C99 6.2.6.1p4:

Values stored in non-bit-field objects of any other object type
consist of n * CHAR_BIT bits, where n is the size of an object
of that type, in bytes. The value may be copied into an object
of type unsigned char [n] (e.g., by memcpy); the resulting set
of bytes is called the *object representation* of the value.

C99 6.2.6.2p1:

For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits
and padding bits (there need not be any of the latter).

p2:

Which of these [sign and magnitude, two's complement, ones'
complement] applies is implementation-defined, as is whether
the value with sign bit 1 and all value bits zero (for the
first two), or with sign bit and all value bits 1 (for ones’
complement), is a trap representation or a normal value.

As far as I can tell, given that CHAR_BIT==8, it would be legal for an
implementation to have plain char (if it's signed) and signed char have
a range of -127 .. +127, with the extra representation being a trap
representation. It would even be legal for signed char to have padding
bits, possibly leading to even more trap representations; given the
requirements for SCHAR_MIN and SCHAR_MAX, that's possible only if
CHAR_BIT > 8.

I seriously doubt that any real-world implementation takes advantage
of this.

I have a vague memory of a statement that plain and signed char cannot
have trap representations, but I can't confirm that from the standard.
 
Kaz Kylheku

> (no need to shout)
>
> Does the standard guarantee that?

Even if there is a trap representation there, it's not an aliasing issue.

If you could not alias an object using chars, then no access at all would
be well-defined.
 
Keith Thompson

Kaz Kylheku said:
> Even if there is a trap representation there, it's not an aliasing issue.
>
> If you could not alias an object using chars, then no access at all would
> be well-defined.

I don't follow your reasoning.

Where does the standard say that you can alias any object with an
array of plain or signed char?

If you can't do so, how does that affect the ability to access an
object as its declared type, or as an array of unsigned char?
 
Kaz Kylheku

> I don't follow your reasoning.
>
> Where does the standard say that you can alias any object with an
> array of plain or signed char?

6.5 paragraph 7. An object can be accessed with an lvalue which
is of character type.
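[As a sketch of what that permission in 6.5p7 buys you: any object's bytes can be walked through an unsigned char pointer. The example assumes, as is overwhelmingly common, CHAR_BIT == 8 and no padding bits in unsigned int:]

```c
#include <stddef.h>

/* Accessing an arbitrary object through an lvalue of character type
 * is explicitly on 6.5p7's list of permitted accesses. */
unsigned byte_sum(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}
```

For `unsigned int x = 0x01020304`, `byte_sum(&x, sizeof x)` yields 1 + 2 + 3 + 4 = 10 whatever the byte order.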
 
James Kuyper

On 11/16/2011 02:52 PM, Keith Thompson wrote:
....
> Where does the standard say that you can alias any object with an
> array of plain or signed char?

6.5p7, last item: "a character type". The term "alias" is used only in
footnote 76, but that's sufficient for this purpose.
 
James Kuyper

On 11/16/2011 02:27 PM, Keith Thompson wrote:
....
> I have a vague memory of a statement that plain and signed char cannot
> have trap representations, but I can't confirm that from the standard.

I know of no reason why signed char (and therefore, char) cannot have
trap representations. However, every statement in 6.2.6.1p5 which says
that the behavior is undefined when a trap representation is involved,
explicitly excludes all character types, not just unsigned char. I'm not
quite sure what to make of that fact, but I'm sure that explicitly
excluding all character types was intentional; I'm not so sure whether
it was intentional to allow signed char to have trap representations.
 
H

Harald van Dijk

> On 11/16/2011 02:27 PM, Keith Thompson wrote:
> ...
>
> I know of no reason why signed char (and therefore, char) cannot have
> trap representations. However, every statement in 6.2.6.1p5 which says
> that the behavior is undefined when a trap representation is involved,
> explicitly excludes all character types, not just unsigned char. I'm not
> quite sure what to make of that fact, but I'm sure that explicitly
> excluding all character types was intentional; I'm not so sure whether
> it was intentional to allow signed char to have trap representations.

6.2.6.1p5 refers to the trap representations for the type of the
object. In other words, if an object p of type void * holds a trap
representation, 6.2.6.1p5 makes it explicit that reading that object
as void * is not valid. It doesn't say that signed char can be used to
access the bytes in p, it merely doesn't say that it can't. If signed
char has no trap representations, the required behaviour can be
inferred from other parts of the standard. If signed char does have
trap representations, then even though 6.2.6.1p5 doesn't explicitly
state that the behaviour is undefined, since the standard never
defines the behaviour, the end result is the same.
 
James Kuyper

> 6.2.6.1p5 refers to the trap representations for the type of the
> object. In other words, if an object p of type void * holds a trap
> representation, 6.2.6.1p5 makes it explicit that reading that object
> as void * is not valid.

So, in your opinion, what is the significance of the exclusion of
character types from those statements? What do those statements mean,
with those exclusions, that differs from what they would mean if those
exclusions were dropped? Please accompany your explanation with specific
examples of code that would have defined behavior under the existing
rules, but not with that modification, or vice-versa.
 
Harald van Dijk

> So, in your opinion, what is the significance of the exclusion of
> character types from those statements? What do those statements mean,
> with those exclusions, that differs from what they would mean if those
> exclusions were dropped? Please accompany your explanation with specific
> examples of code that would have defined behavior under the existing
> rules, but not with that modification, or vice-versa.

If those exclusions were dropped, then using memcpy (or rather, a
custom function written in standard C that behaves exactly like
memcpy) to copy an object holding a trap representation would be
invalid.

/* the standard function memcpy, but implemented in 100% standard C */
extern void *mymemcpy(void *dest, const void *src, size_t n);

struct S
{
    int ptrIsValid;
    void *ptr;
};

{
    struct S s1, s2;
    s2.ptrIsValid = 0; /* s2.ptr is left uninitialised */
    mymemcpy(&s1, &s2, sizeof(s1));
}

Without the exclusion in 6.2.6.1p5, if pointer types can have trap
representations, mymemcpy would potentially use a character type to
read a trap representation. This should be allowed, and by excluding
character types in that paragraph, this is allowed.
 
Keith Thompson

>> And I still don't. If, hypothetically, the standard permitted objects
>> to be aliased using unsigned chars but not signed or plain chars, how
>> would that imply that "no access at all would be well-defined"?
>
> 6.5 paragraph 7. An object can be accessed with an lvalue which
> is of character type.

Ah, thank you, that's one of the clues I was missing. The other is
6.2.6.1p5 (thanks to James Kuyper for catching that one); that says
explicitly that you can access an object via an lvalue of character
type.

So let's assume that you have an object of type unsigned char with
the value SCHAR_MAX + 1, and you access it as a signed char --
but that representation is a trap representation
for signed char:

unsigned char u = SCHAR_MAX + 1;
signed char s = *(signed char*)&u;

My reading is that the behavior is undefined by omission. 6.2.6.1p5
says that storing a non-character trap representation has undefined
behavior; it explicitly excludes character types. 6.5p7 says that
an object shall have its stored value accessed *only* by an lvalue of
certain types, including character types, but that doesn't imply that
the behavior of such an access is defined. For example, accessing
an int object by an lvalue of type int is permitted by 6.5p7,
but has undefined behavior if the object holds a trap representation.

If the behavior is defined, what is it?
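[For what it's worth, this is what that access does on the ubiquitous two's complement implementations with CHAR_BIT == 8, where signed char has no trap representations. The sketch below assumes exactly that; on an implementation where the pattern is a trap, the behaviour is, as argued above, undefined by omission.]

```c
#include <limits.h>

/* Reinterpret the bit pattern of SCHAR_MAX + 1 (i.e. 128 here)
 * through a signed char lvalue. On two's complement implementations
 * without signed-char trap representations this reads back as
 * SCHAR_MIN (-128); the standard itself does not pin this down. */
int reread_as_signed_char(void)
{
    unsigned char u = SCHAR_MAX + 1;
    return *(signed char *)&u;  /* character-type access per 6.5p7 */
}
```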
 
