Sort of mystified from an earlier thread

Chad

This was taken from the following:

http://groups.google.com/group/comp...802b3/663e9afae83d061c?hl=en#663e9afae83d061c

And I quote:

"Well, that's also ok for char**, since string literals are of type
char * in c. The general idea still stands, though.

The thing that irritates me is that despite all this, it's _trivial_
to violate const in C without resorting to all this.

const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char, but the prototype is defined as the following:

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

Thanks in advance.

Chad
 
Arctic Fidelity

"Well, that's also ok for char**, since string literals are of type
char * in c. The general idea still stands, though.

The thing that irritates me is that despite all this, it's _trivial_
to violate const in C without resorting to all this.

const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char, but the prototype is defined as the following:

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

I have actually been wondering about this as well. I know that char is an
integer type, but I still would have thought that int and char would have
brought up some kind of warning or whatnot. I'm not sure I understand
how all of that does its thing.

- Arctic
 
Chad

Arctic said:
"Well, that's also ok for char**, since string literals are of type
char * in c. The general idea still stands, though.

The thing that irritates me is that despite all this, it's _trivial_
to violate const in C without resorting to all this.

const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char, but the prototype is defined as the following:

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

I have actually been wondering about this as well. I know that char is an
integer type, but I still would have thought that int and char would have
brought up some kind of warning or whatnot. I'm not sure I understand
how all of that does its thing.

- Arctic

Outside of the sloppy wording, here is my best guess on what is going
on.

When we do *foo, we are getting each character from the string.
Internally at each pass, we would have a variable storing 'm', then
'y', etc. This would be the same as if we had done something like

char internal_string = 'm';

The char would then be automatically converted to int (for strchr's
int c parameter). This might explain why the GNU compiler didn't
complain even when I had warning flags enabled.
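
As a sketch of what I mean (my own toy example, not from the quoted
thread), this complete program compiles cleanly even with warnings
cranked up:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char foo[] = "mystring";

    /* *foo has type char (the 'm'); it is implicitly converted
       to int when passed as strchr's second argument */
    char *p = strchr(foo, *foo);

    if (p != NULL)
        printf("found '%c' at offset %ld\n", *p, (long)(p - foo));
    return 0;
}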
 
pete

Chad said:
This was taken from the following:

http://groups.google.com/group/comp...802b3/663e9afae83d061c?hl=en#663e9afae83d061c

And I quote:

"Well, that's also ok for char**, since string literals are of type
char * in c. The general idea still stands, though.

The thing that irritates me is that despite all this, it's _trivial_
to violate const in C without resorting to all this.

const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char, but the prototype is defined as the following:

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

Because there's no problem converting a char to an int,
unless (CHAR_MAX > INT_MAX),
which doesn't seem to be the case on any hosted systems.
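
A quick way to check on any given implementation:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* on typical hosted systems CHAR_MAX is 127 (or 255) and
       INT_MAX is far larger, so char -> int never loses a value */
    printf("CHAR_MAX = %d\n", CHAR_MAX);
    printf("INT_MAX  = %d\n", INT_MAX);
    return 0;
}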
 
pete

pete said:
This was taken from the following:

http://groups.google.com/group/comp...802b3/663e9afae83d061c?hl=en#663e9afae83d061c

And I quote:

"Well, that's also ok for char**, since string literals are of type
char * in c. The general idea still stands, though.

The thing that irritates me is that despite all this, it's _trivial_
to violate const in C without resorting to all this.

const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char, but the prototype is defined as the following:

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

Because there's no problem converting a char to an int,
unless (CHAR_MAX > INT_MAX),
which doesn't seem to be the case on any hosted systems.

There's also

N869
6.3.1 Arithmetic operands
6.3.1.1 Boolean, characters, and integers

[#2] The following may be used in an expression wherever an
int or unsigned int may be used:
-- An object or expression with an integer type whose
integer conversion rank is less than the rank of int
and unsigned int.
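
A small sketch of that rule in action (the promoted expression +c
has type int, which sizeof makes visible):

#include <stdio.h>

int main(void)
{
    char c = 'a';

    /* c's integer conversion rank is less than that of int, so
       in an expression it is promoted: +c has type int */
    printf("sizeof c    = %zu\n", sizeof c);
    printf("sizeof (+c) = %zu\n", sizeof (+c));
    return 0;
}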
 
Greg Comeau

...const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

What I don't get is that 'const char f[]="mystring" ' is defined
as a char,

No, it's defined as a const char[9]. I'm assuming you mean
*f, but that's a const char.
but the prototype is defined as the following:

char *strchr(const char *s, int c);
Correct.

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

Because it is one of the implicit conversions. You can do this with
no problem:

const char c = 'x';
int i;

i = c;

A similar thing happens during argument passing, since the prototype
specifically says the argument should be an int.

Why it is an int is a separate story, but no doubt has to do
with the fact that routines such as getchar return int (as
they, for better or worse, accommodate the returned character
_and_ sentinels such as EOF).
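
The classic idiom shows why that matters (a sketch, nothing
specific to strchr):

#include <stdio.h>

int main(void)
{
    int c;  /* int, not char: it must be able to hold every
               character value plus the out-of-band EOF value */

    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}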
 
Old Wolf

Arctic said:
Chad said:
const char foo[] = "mystring";
char *constviol = strchr(foo,*foo); "

char *strchr(const char *s, int c);

When foo gets de-referenced (ie *foo), how come the compiler doesn't
complain about the difference between 'int' and 'char'?

I have actually been wondering about this as well.

In C there is an implicit conversion from char to int.
This means that if there is a context expecting an int, but you
supply a char, then C will silently convert the char to an int.

Some people use this to call C a "weakly typed" language, and
say C has "holes in its type system". However those people are
usually Lisp trolls.

This means that the following code works:

char c = 5;
int i = c;
/* now 'i' has the value of 5 */

If C did not have this implicit conversion then you would have
to write something ugly like:

int i = (int)c;

To me, this is less type-safe than the real situation, as it
encourages the use of casts.

C also has an implicit conversion from int to char:

int i = 5;
char c = i;
/* now 'c' has a value of 5. */

But if 'i' had a value that couldn't be held by a char, then
we would have undefined behaviour (to cut a long story short).
Some compilers will issue a warning when you do a so-called
"narrowing conversion" like this.

C in fact has implicit conversions between all of the integral
and floating point types, with silent UB if the value can't
be represented.

By contrast, Java has implicit widening conversions, but no
implicit narrowing conversions. Java trolls often bring this up.
 
Keith Thompson

Old Wolf said:
C also has an implicit conversion from int to char:

int i = 5;
char c = i;
/* now 'c' has a value of 5. */

But if 'i' had a value that couldn't be held by a char, then
we would have undefined behaviour (to cut a long story short).

Cutting a long story short never works around here. :-)
Some compilers will issue a warning when you do a so-called
"narrowing conversion" like this.

C in fact has implicit conversions between all of the integral
and floating point types, with silent UB if the value can't
be represented.

Actually, overflow on a conversion has different rules than overflow
on an arithmetic operator. For arithmetic operators, overflow on a
signed integer type invokes undefined behavior. For conversion, it
either yields an implementation-defined result or raises an
implementation-defined signal (the latter is new in C99).

So, given
int i = <whatever>;
char c = i;
the implicit conversion of i to type char doesn't cause undefined
behavior -- and if plain char is unsigned, it yields a well-defined
result.
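
For instance (a minimal sketch; the printed value will vary by
implementation):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int i = 1000;   /* out of range for an 8-bit char */
    char c = i;     /* a conversion, not arithmetic overflow: the
                       result is implementation-defined (or an
                       implementation-defined signal is raised, in
                       C99), never undefined behavior */

    printf("c = %d (CHAR_MIN = %d, CHAR_MAX = %d)\n",
           (int)c, CHAR_MIN, CHAR_MAX);
    return 0;
}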
 
Old Wolf

Keith said:
Cutting a long story short never works around here. :-)

It either yields an implementation-defined result or raises an
implementation-defined signal (the latter is new in C99).

An implementation-defined signal might as well be UB, in practice.
I think the only thing you can do safely in a signal handler is set
a flag, or call exit(). What is the status of the char after the
signal handler returns? If it's indeterminate, then the subsequent
use of it will cause UB.
 
Keith Thompson

Old Wolf said:
An implementation-defined signal might as well be UB, in practice.
I think the only thing you can do safely in a signal handler is set
a flag, or call exit(). What is the status of the char after the
signal handler returns? If it's indeterminate, then the subsequent
use of it will cause UB.

So you set a flag in the signal handler; if the flag is set, you don't
look at the variable. It's not going to have anything useful in it
anyway.

I don't know of any implementation that takes advantage of the new
permission to raise a signal on overflow, and since the signal is
implementation-defined, you can't use it portably.

BTW, I think type char is guaranteed not to have any trap
representations, so an indeterminate value will just be one of the
values in the range CHAR_MIN..CHAR_MAX.
 
Jordan Abel

An implementation-defined signal might as well be UB, in practice.
I think the only thing you can do safely in a signal handler is set
a flag, or call exit().

Such a signal would most likely be SIGFPE [I can't imagine what
else it would be], but the point is that you can look at the
implementation's documentation and find out what it does, and
that it'll do the same thing every time.
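
Something like this, assuming (as above) that the documented
signal turns out to be SIGFPE:

#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t caught = 0;

static void handler(int sig)
{
    (void)sig;
    caught = 1;   /* setting a volatile sig_atomic_t flag is one of
                     the few things a conforming handler may do */
}

int main(void)
{
    signal(SIGFPE, handler);

    /* ... perform the suspect conversion here ... */

    if (caught)
        puts("the conversion raised the signal");
    return 0;
}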
 
Jordan Abel

So you set a flag in the signal handler; if the flag is set, you don't
look at the variable. It's not going to have anything useful in it
anyway.

I don't know of any implementation that takes advantage of the new
permission to raise a signal on overflow, and since the signal is
implementation-defined, you can't use it portably.

BTW, I think type char is guaranteed not to have any trap
representations, so an indeterminate value will just be one of the
values in the range CHAR_MIN..CHAR_MAX.

what about 10000000 on a signed-magnitude system?

unsigned types are guaranteed not to have any trap representations.
signed types are not. and char's signedness is implementation-defined.
 
Keith Thompson

Jordan Abel said:
what about 10000000 on a signed-magnitude system?

unsigned types are guaranteed not to have any trap representations.
signed types are not. and char's signedness is implementation-defined.

unsigned char is specifically guaranteed not to have trap
representations (it's represented using a pure binary notation).
Other unsigned types have no such guarantee; they can have padding
bits. (I think the presence of padding bits allows, but does not
require the existence of trap representations.)

But C99 6.2.6.1p5 says:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or
any part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation
is called a _trap representation_.

This is the definition of "trap representation" (the term is in
italics). I think the "does not have character type" wording implies
that plain char, even if it's signed, cannot have any trap
representations, but I'd be happier if I could find a clearer
statement to that effect.

Assume a 2's-complement signed representation for plain char, with
CHAR_BIT==8. Without the above statement, the binary value 11111111
could be a trap representation; with it, it must represent the value
-128.
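
For comparison, reading any object's bytes through unsigned char
is always defined, which is the usual way to poke at
representations (a sketch):

#include <stdio.h>

int main(void)
{
    double d = 3.14;
    const unsigned char *p = (const unsigned char *)&d;
    size_t i;

    /* unsigned char has no trap representations, so every byte
       read here yields an ordinary value */
    for (i = 0; i < sizeof d; i++)
        printf("%02x ", (unsigned)p[i]);
    putchar('\n');
    return 0;
}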
 
Tim Rentsch

Keith Thompson said:
unsigned char is specifically guaranteed not to have trap
representations (it's represented using a pure binary notation).
Other unsigned types have no such guarantee; they can have padding
bits. (I think the presence of padding bits allows, but does not
require the existence of trap representations.)

But C99 6.2.6.1p5 says:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or
any part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation
is called a _trap representation_.

This is the definition of "trap representation" (the term is in
italics). I think the "does not have character type" wording implies
that plain char, even if it's signed, cannot have any trap
representations, but I'd be happier if I could find a clearer
statement to that effect.

Assume a 2's-complement signed representation for plain char, with
CHAR_BIT==8. Without the above statement, the binary value 11111111
could be a trap representation; with it, it must represent the value
-128.

Of course you meant 10000000 as a possible trap representation;
11111111 in 2's complement is a representation for -1.

I think the conclusion here is wrong. A signed char type (and a
char type that matches signed char) is explicitly permitted to
have a trap representation, by 6.2.6.2 p3. Any consequences of
6.2.6.1 p5 cannot implicitly take away what 6.2.6.2 p3 expressly
allows.

Also, the wording in 6.2.6.1 p5 says that access through other
than a character type results in UB, not that access through a
character type doesn't result in UB. But, even if access through
a character type can't result in UB, that doesn't mean the access
doesn't yield a trap representation. A trap representation is a
representation that doesn't (or "need not") represent a value of
the object type. If a trap representation is accessed through an
other-than-char type, it results in UB; if a trap representation
is accessed through a char type, it still would yield a trap
representation, just not one that causes UB. The result of such
an access could subsequently cause UB if it were converted to
int, for example; but if the value were simply read and then
written, using character types in both cases, then a trap
representation could be transmitted without causing UB.

(Most of the previous paragraph is under the assumption that
accessing a char-type trap representation doesn't cause UB.)
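
The byte-copy idiom is presumably what that wording is protecting;
with unsigned char it is unambiguously fine, and on the reading
above a plain-char version would be safe as well (a sketch):

#include <stddef.h>

static void byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;   /* bytes move without ever being converted
                          to a wider type */
}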
 
Keith Thompson

Tim Rentsch said:
Of course you meant 10000000 as a possible trap representation;
11111111 in 2's complement is a representation for -1.

Of course; thanks for catching my error.
I think the conclusion here is wrong. A signed char type (and a
char type that matches signed char) is explicitly permitted to
have a trap representation, by 6.2.6.2 p3. Any consequences of
6.2.6.1 p5 cannot implicitly take away what 6.2.6.2 p3 expressly
allows.

Are you sure that's the right paragraph? 6.2.6.2p3 talks about
negative zeros, not trap representations.
Also, the wording in 6.2.6.1 p5 says that access through other
than a character type results in UB, not that access through a
character type doesn't result in UB. But, even if access through
a character type can't result in UB, that doesn't mean the access
doesn't yield a trap representation. A trap representation is a
representation that doesn't (or "need not") represent a value of
the object type. If a trap representation is accessed through an
other-than-char type, it results in UB; if a trap representation
is accessed through a char type, it still would yield a trap
representation, just not one that causes UB. The result of such
an access could subsequently cause UB if it were converted to
int, for example; but if the value were simply read and then
written, using character types in both cases, then a trap
representation could be transmitted without causing UB.

(Most of the previous paragraph is under the assumption that
accessing a char-type trap representation doesn't cause UB.)

Interesting. I had assumed that reading anything with a trap
representation must cause undefined behavior (the name "trap
representation" seems to imply that), but of course we can't really
infer anything about "trap representations" from the words "trap" and
"representation".

But if signed char can have trap representations, I don't see anything
in the standard that defines the behavior of reading such a
representation -- which makes it undefined behavior by default.

C99 6.2.6.1p5 says:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined.

I'm still confused about why it refers to "character type" rather than
just unsigned char.

I think this is a question for comp.std.c.
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
unsigned char is specifically guaranteed not to have trap
representations (it's represented using a pure binary notation).
Other unsigned types have no such guarantee; they can have padding
bits. (I think the presence of padding bits allows, but does not
require the existence of trap representations.)

But C99 6.2.6.1p5 says:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or
any part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation
is called a _trap representation_.

This is the definition of "trap representation" (the term is in
italics). I think the "does not have character type" wording implies
that plain char, even if it's signed, cannot have any trap
representations, but I'd be happier if I could find a clearer
statement to that effect.
[snip]
I think the conclusion here is wrong. A signed char type (and a
char type that matches signed char) is explicitly permitted to
have a trap representation, by 6.2.6.2 p3. Any consequences of
6.2.6.1 p5 cannot implicitly take away what 6.2.6.2 p3 expressly
allows.

Are you sure that's the right paragraph? 6.2.6.2p3 talks about
negative zeros, not trap representations.

Sorry, you're right, that's the wrong citation. It should
be 6.2.6.2 p2.

Interesting. I had assumed that reading anything with a trap
representation must cause undefined behavior (the name "trap
representation" seems to imply that), but of course we can't really
infer anything about "trap representations" from the words "trap" and
"representation".

But if signed char can have trap representations, I don't see anything
in the standard that defines the behavior of reading such a
representation -- which makes it undefined behavior by default.

C99 6.2.6.1p5 says:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined.

I'm still confused about why it refers to "character type" rather than
just unsigned char.

The best interpretation I can give to it is: it's legal to
read and write "values" of any character type, even "values"
that are trap representations; it's only when those trap
representations are converted to other types (as for
assignment or for operand promotion) that UB results. That
interpretation does fit pretty well both with what the
Standard says, and with what (IMO) "makes sense".

Granted, that interpretation does require (under existing
language in the Standard) some amount of "reading between
the lines"; I'm not sure I'd want to defend it. But that
interpretation does seem like it fits what the Standard "is
trying to say" better than any others I'm aware of.

I think this is a question for comp.std.c.

Ditto.
 
Keith Thompson

Tim Rentsch said:
The best interpretation I can give to it is: it's legal to
read and write "values" of any character type, even "values"
that are trap representations; it's only when those trap
representations are converted to other types (as for
assignment or for operand promotion) that UB results. That
interpretation does fit pretty well both with what the
Standard says, and with what (IMO) "makes sense".

I posted to comp.std.c. The one response so far is from Jack Klein;
he says the use of "character type" rather than "unsigned char" is a
flaw in the standard, and one that appears elsewhere. That makes
sense to me; with that one assumption everything else falls neatly
into place.
 
