sizeof C integral types

Dan Pop

In said:
Dan Pop said:
sizeof(long) > sizeof(char) ?

Implicitly guaranteed for hosted implementations, because the
library specification relies on INT_MAX >= UCHAR_MAX [...]

By which I presume you mean an 'int' must be able to hold all possible
values of an 'unsigned char', required in (for example) getchar()?

Yes. Both <stdio.h> and <ctype.h> rely on this property of int.

Dan
 
pete

nrk wrote:
This follows from the specification that states that
value bits in signed types have the same meaning as corresponding
value bits in the unsigned types

I know that's not supposed to mandate sign and magnitude
representation of negative integers, but it seems like it does.
and the stipulation
that an unsigned integer type with n bits must be
able to represent values in the range [0, 2^n - 1].
 
Dan Pop

Nor do I. And even though I at first thought it was technically
wrong because of padding bits, I now think that while it still may be
wrong, it's less wrong than I thought.

a) Plain char is unsigned. INT_MAX must be at least UCHAR_MAX so that
getchar() can return any plain char value, and INT_MIN must be less than
or equal to -32767. So the total number of values of 'int' must be at
least UCHAR_MAX+32768, which requires more bits than CHAR_BIT. Q.E.D.

b) Plain char is signed. The range of char, i.e., of signed char, must
be a subrange of the range of int. But is it possible we might have

The properties of plain char don't matter; it is unsigned char that
matters.
#define CHAR_BIT 16
#define UCHAR_MAX 65535
#define SCHAR_MIN -32767 /* !!! */
#define SCHAR_MAX 32767
#define INT_MIN -32768
#define INT_MAX 32767
#define EOF -32768

Is anything wrong, from the C standpoint, with these definitions?

Yes, for a hosted implementation: int cannot represent the whole range
of unsigned char.

Furthermore, INT_MIN and EOF, as defined above, do not have type int.

Dan
 
Dan Pop

In said:
Yes, something is wrong. If CHAR_BIT is 16, SCHAR_MIN *has* to be -32768.

Has it? How would you represent -32768 using one's complement or sign
magnitude? Furthermore, even for implementations using two's complement,
the representation with the sign bit set and all the value bits zero is
allowed to be a trap representation.

Dan
 
nrk

pete said:
I know that's not supposed to mandate sign and magnitude
representation of negative integers, but it seems like it does.

Sorry, that's my mistake for not reading further on. Just a little further
down the standard stipulates how a set sign bit will modify the value
represented in the value bits and gives three choices:

- sign and magnitude
- sign bit has value -(2^n): 2's complement
- sign bit has value -(2^n - 1): 1's complement

However, my earlier conclusion that SCHAR_MIN must be -32768 still stands,
as Arthur was trying to mix both 1's complement (SCHAR_MIN) and 2's
complement (INT_MIN) representations, which is not allowed. While the
interpretation of the sign bit is implementation defined, the
interpretation needs to be consistent across all the signed integer types.

-nrk.
 
Dan Pop

In said:
Sorry, that's my mistake for not reading further on. Just a little further
down the standard stipulates how a set sign bit will modify the value
represented in the value bits and gives three choices:

- sign and magnitude
- sign bit has value -(2^n): 2's complement
- sign bit has value -(2^n - 1): 1's complement

And just immediately afterwards that, it says:

Which of these applies is implementation-defined, as is
^^^^^
whether the value with sign bit 1 and all value bits zero
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(for the first two), or with sign bit and all value bits 1 (for
^^^^^^^^^^^^^^^^^^^
one's complement), is a trap representation or a normal value.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
However, my earlier conclusion that SCHAR_MIN must be -32768 still stands,

Does it?
as Arthur was trying to mix both 1's complement (SCHAR_MIN) and 2's
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
complement (INT_MIN) representations, which is not allowed.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Where did you get this idea from?
While the
interpretation of the sign bit is implementation defined, the
interpretation needs to be consistent across all the signed integer types.

Chapter and verse, please.

Dan
 
pete

nrk said:
Sorry, that's my mistake for not reading further on.
Just a little further
down the standard stipulates how a set sign bit will modify the value
represented in the value bits and gives three choices:

- sign and magnitude
- sign bit has value -(2^n): 2's complement
- sign bit has value -(2^n - 1): 1's complement

However, my earlier conclusion that SCHAR_MIN must
be -32768 still stands,
as Arthur was trying to mix both 1's complement (SCHAR_MIN) and 2's
complement (INT_MIN) representations, which is not allowed.

I believe that he may have conceived the whole thing in 2's
complement and that -32767 is a valid limit for 2's complement.
 
nrk

Dan said:
In <[email protected]> nrk


And just immediately afterwards that, it says:

Which of these applies is implementation-defined, as is
^^^^^
whether the value with sign bit 1 and all value bits zero
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(for the first two), or with sign bit and all value bits 1 (for
^^^^^^^^^^^^^^^^^^^
one's complement), is a trap representation or a normal value.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes.


Does it?

No.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Where did you get this idea from?

From the fact that his SCHAR_MIN is -32767 and INT_MIN is -32768 (however,
as you've shown, my conclusions were wrong).
Chapter and verse, please.

6.2.6.2, I interpreted this from the part that talks about signed integer
types. Specifically:
Which of these applies is implementation-defined, as is
^^^^^

which I assumed to mean that the definition is implementation-defined and
that definition applies to all signed integer types (since the section
talks about all signed integer types).

What you're saying is that the implementation can pick and choose how to
interpret the sign bit and whether the particular combination mentioned is
a trap representation in each signed integer type (Out of curiosity, are
there any real-world examples of this?). Since I am no expert at
interpreting the standard, I am convinced that your interpretation is
correct.

As you've shown elsethread, the problem with Arthur's implementation is that
int does not have the same range as unsigned char.

-nrk.
 
Arthur J. O'Dwyer

Yes, for a hosted implementation: int cannot represent the whole range
of unsigned char.

Okay, then I just don't see why you say that. I thought you were
talking about the EOF-and-getchar issue, but you're not. What *are*
you referring to? Chapter and verse would probably help.
Furthermore, INT_MIN and EOF, as defined above, do not have type int.

Sorry. But you know what I meant. :)

-Arthur
 
glen herrmannsfeldt

Hallvard B Furuseth wrote:
(snip)
No. BTW, sizeof(char) == 1 by definition.

However, sizeof(long) == 1, which I think implies sizeof(int) == 1,
would break several very common idioms, e.g.

int ch;
while ((ch = getchar()) != EOF) { ... }

because EOF is supposed to be a value which is different from all
'unsigned char' values. That is only possible when 'int' is wider
than 'unsigned char'.

While technically true, if long, int, and char were all 32 bits
or more I wouldn't worry about it.

Moore's law doesn't tend to apply to alphabets or character
sets, so we should be safe for many years to come.

-- glen
 
Dan Pop

In said:
6.2.6.2, I interpreted this from the part that talks about signed integer
types. Specifically:


which I assumed to mean that the definition is implementation-defined and
that definition applies to all signed integer types (since the section
talks about all signed integer types).

By your logic, all signed types would have the same number of value bits
and so on...
What you're saying is that the implementation can pick and choose how to
interpret the sign bit and whether the particular combination mentioned is
a trap representation in each signed integer type

If there is no wording prohibiting this, an implementor is free to do it.
And I can find no such wording.
(Out of curiosity, are there any real-world examples of this?).

I sincerely hope there aren't. But this doesn't affect the discussion
about hypothetical conforming implementations...

Dan
 
Dan Pop

Okay, then I just don't see why you say that. I thought you were
talking about the EOF-and-getchar issue, but you're not. What *are*
you referring to? Chapter and verse would probably help.

Yes, I was talking about the EOF-and-getchar issue. Why would you think
otherwise?

Dan
 
Dan Pop

In said:
Hallvard B Furuseth wrote:
(snip)


While technically true, if long, int, and char were all 32 bits
or more I wouldn't worry about it.

Moore's law doesn't tend to apply to alphabets or character
sets, so we should be safe for many years to come.

You seem to be blissfully ignoring the binary files, which have nothing
to do with alphabets or character sets.

Dan
 
Hallvard B Furuseth

Dan said:
You seem to be blissfully ignoring the binary files, which have nothing
to do with alphabets or character sets.

Also, there are people out there who deliberately send data to a program
which will cause it to misbehave. You may have heard of computer
viruses, for example...
 
pete

Dan said:
Yes, I was talking about the EOF-and-getchar issue.
Why would you think otherwise?

Arthur J. O'Dwyer may be under the impression that
"int cannot represent the whole range of unsigned char."
is an incorrect aphorism about C, being used to criticize the code,
rather than a direct criticism of the code.
 
nrk

Dan said:
In <[email protected]> nrk


By your logic, all signed types would have the same number of value bits
and so on...

No, because the section discussing limits.h will tell me otherwise. But I
see your point (which I've already conceded).
If there is no wording prohibiting this, an implementor is free to do it.
And I can find no such wording.

Precisely the point that I missed. This is a good lesson for me to learn as
far as interpreting the standard goes.
I sincerely hope there aren't. But this doesn't affect the discussion
about hypothetical conforming implementations...

Yes, of course. I didn't ask that question as a challenge to your
interpretation, only to see if any weird systems out there exploit this
leeway in the standard (so I can refuse to work on such a system :).

-nrk.
 
Hallvard B Furuseth

Dan said:
Implicitly guaranteed for hosted implementations, because the library
specification relies on INT_MAX >= UCHAR_MAX and this would be impossible
if sizeof(int) == 1.

I seem to remember this is an issue where some committee members on
comp.std.c sort of admit that the standard is buggy, but that they
couldn't agree on a fix. Unless I'm thinking of problems when 'char' is
not two's complement and/or int->char conversion overflow doesn't simply
silently strip the top bits.
 
Arthur J. O'Dwyer

Dan said:
Arthur J. O'Dwyer said:
Dan Pop wrote:

because the library
specification relies on INT_MAX >= UCHAR_MAX and this would be
impossible if sizeof(int) == 1.
But is it possible we might have [letting plain char be signed]
#define CHAR_BIT 16
#define UCHAR_MAX 65535
#define SCHAR_MIN -32767 /* !!! */
#define SCHAR_MAX 32767
#define INT_MIN -32768
#define INT_MAX 32767
#define EOF -32768

Is anything wrong, from the C standpoint, with these definitions?

Yes, for a hosted implementation:
int cannot represent the whole range of unsigned char.

Okay, then I just don't see why you say that. I thought you were
talking about the EOF-and-getchar issue, but you're not. What *are*
you referring to? Chapter and verse would probably help.

Yes, I was talking about the EOF-and-getchar issue.
Why would you think otherwise?

Arthur J. O'Dwyer may be under the impression that

I hereby give you permission to call me by my first name only. ;-)
"int cannot represent the whole range of unsigned char."
is an incorrect aphorism about C, being used to criticize the code,
rather than a direct criticism of the code.

And I have no idea what you mean by that, so I'll leave it for the
moment. However, re: Dan's reply: getchar() returns either a 'char'
value, cast to 'int', or it returns EOF, which is a negative 'int'
value unequal to any 'char' value. Right?
My #definitions above provide exactly enough numbers to do this
job: the range of 'char', which is signed, goes from -32767 to 32767,
and EOF is the 'int' value -32768. So if you were talking only about
the "EOF-and-getchar issue," you were wrong, AFAICT.

However, since I posted that message I noticed a post elsethread
talking about the <ctype.h> functions, which expect to be passed an
'unsigned char' value, cast to 'int'. That complicates things, or
^^^^^^^^
so I thought... but now I'm not so sure about that, either. I think
I'm really going to need the C&V here, or you're going to have to
show me a piece of code that pokes large holes in my #definitions.

-Arthur
 
ark

Implicitly guaranteed for hosted implementations, because the library
specification relies on INT_MAX >= UCHAR_MAX and this would be impossible
if sizeof(int) == 1. Since LONG_MAX cannot be lower than INT_MAX,
sizeof(long) cannot be 1, either, on a hosted implementation.
<snip>

Would anything be wrong with intentionally under-using the potential range?
#define CHAR_BIT 1024
#define UCHAR_MAX 255
#define SCHAR_MIN -127
.... etc ...
#define <under-used limits for int, long ...>

Thanks again,
Ark
 
Arthur J. O'Dwyer

Would anything be wrong with intentional under-using the potential range
#define CHAR_BIT 1024
#define UCHAR_MAX 255

Yes. 'unsigned char' must use a pure binary representation, and
may not contain any padding bits.

-Arthur
 
