May fgetc() and friends return 163? Or UCHAR_MAX?

  • Thread starter Christopher Benson-Manica

Christopher Benson-Manica

In a thread from substantially earlier this week,

Harald van Dĳk said:
getchar does not work with plain chars, it works with unsigned chars. 163
fits just fine in an unsigned char, so getchar is allowed to return 163.

Being rather pedantic, I decided to try to verify whether this was
true. I would appreciate knowing whether my reading of the Standard
is correct.

7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
"obtains [a] character as an unsigned char converted to an int".
There is nothing in the Standard (that I was able to find) which
states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
163 always representable as a signed int if sizeof(int) is 1?"
5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
answer to that question appears to be "yes".

On the other hand, I do not see anything in 5.2.4.2.1 which requires
that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
if sizeof(int) == 1, correct? In such a case, fgetc() may return
UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
magic to return a signed integer representing UCHAR_MAX, or invoke UB
by overflowing the signed type int. Both of these alternatives seem
ridiculous to me, so what am I missing?
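
For concreteness, here is a minimal sketch (assuming an ordinary hosted
implementation) that merely reports the limits this question hinges on:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* fgetc() obtains "a character as an unsigned char converted to an int",
       so the question is whether every value 0..UCHAR_MAX fits in an int. */
    printf("CHAR_BIT = %d, UCHAR_MAX = %lu, INT_MAX = %d\n",
           CHAR_BIT, (unsigned long)UCHAR_MAX, INT_MAX);

    if ((unsigned long)UCHAR_MAX > (unsigned long)INT_MAX)
        puts("not every unsigned char value fits in int (the sizeof(int) == 1 worry)");
    else
        puts("every unsigned char value fits in int (the usual case)");
    return 0;
}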
 

Harald van Dĳk

Christopher said:
In a thread from substantially earlier this week,

Harald van Dĳk said:
getchar does not work with plain chars, it works with unsigned chars. 163
fits just fine in an unsigned char, so getchar is allowed to return 163.

Being rather pedantic, I decided to try to verify whether this was
true. I would appreciate knowing whether my reading of the Standard
is correct.

7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
"obtains [a] character as an unsigned char converted to an int".
There is nothing in the Standard (that I was able to find) which
states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
163 always representable as a signed int if sizeof(int) is 1?"
5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
answer to that question appears to be "yes".
Right.

On the other hand, I do not see anything in 5.2.4.2.1 which requires
that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
if sizeof(int) == 1, correct?

Correct. signed int has at least INT_MAX - INT_MIN + 1 distinct
representations, and if sizeof(int) == 1, that means unsigned char must be
capable of storing at least that many values. However, it is allowed to be
capable of storing even more.
In such a case, fgetc() may return
UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
magic to return a signed integer representing UCHAR_MAX, or invoke UB
by overflowing the signed type int. Both of these alternatives seem
ridiculous to me, so what am I missing?

The behaviour is not undefined for integer conversions of out-of-range
values, not even for the signed types. Either the result is
implementation-defined, or an implementation-defined signal is raised; see
6.3.1.3p3. Either way, the outcome is the same: the value fgetc returns need
not (or cannot) be meaningful.

However, 7.19.2p3 states that

"A binary stream is an ordered sequence of characters that can transparently
record internal data. Data read in from a binary stream shall compare equal
to the data that were earlier written out to that stream, under the same
implementation. Such a stream may, however, have an implementation-defined
number of null characters appended to the end of the stream."

This requirement cannot be met by an implementation where the conversion of
out-of-range values results in a signal, or where the conversion of
out-of-range values cannot be reverted. So by my reading, only freestanding
implementations that do not provide the standard I/O functions at all are
allowed to define unsigned char and int in such ways.
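
A minimal sketch of the 7.19.2p3 round trip, assuming an ordinary hosted
implementation (the file name "roundtrip.bin" is arbitrary); on the
hypothetical sizeof(int) == 1 implementation it is exactly this read-back
comparison that could not be honoured:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("roundtrip.bin", "wb");
    if (f == NULL)
        return 1;
    /* fputc() converts its argument to unsigned char before writing,
       so this writes every unsigned char value once. */
    for (unsigned v = 0; ; v++) {
        fputc((int)v, f);
        if (v == UCHAR_MAX)
            break;
    }
    fclose(f);

    f = fopen("roundtrip.bin", "rb");
    if (f == NULL)
        return 1;
    /* 7.19.2p3: data read from a binary stream shall compare equal to the
       data written earlier, so every fgetc() result must match. */
    for (unsigned v = 0; ; v++) {
        int c = fgetc(f);
        if (c != (int)v)
            printf("mismatch at %u: got %d\n", v, c);
        if (v == UCHAR_MAX)
            break;
    }
    fclose(f);
    return 0;
}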
 

Christopher Benson-Manica

Harald van Dĳk said:
The behaviour is not undefined for integer conversions of out-of-range
values, not even for the signed types. Either the result is
implementation-defined, or an implementation-defined signal is raised, see
6.3.1.3p3. The result is the same: fgetc need not or cannot be meaningful.

The language in n869 does not mention signals, but I assume that is a
difference between the draft and the actual standards.
However, 7.19.2p3 states that
"A binary stream is an ordered sequence of characters that can transparently
record internal data. Data read in from a binary stream shall compare equal
to the data that were earlier written out to that stream, under the same
implementation. Such a stream may, however, have an implementation-defined
number of null characters appended to the end of the stream."
This requirement cannot be met by an implementation where the conversion of
out-of-range values results in a signal, or where the conversion of
out-of-range values cannot be reverted.

Yes, that makes sense, although on further reading, it seems that an
implementation could work internal magic to establish a one-to-one
relationship between all unsigned char values from 0 to UCHAR_MAX and all
signed int values from INT_MIN to INT_MAX. That would mean that an
implementation would have to ensure that there were at least as many
valid signed int values as unsigned char values, with an extra signed
int value representing EOF. It does sound like a tall order for an
implementation where sizeof(int) == 1, but possible on a DS9K level.
 

Harald van Dĳk

Christopher said:
The language in n869 does not mention signals, but I assume that is a
difference between the draft and the actual standards.

I don't remember when it was added. I believe it's part of C99, but I may be
misremembering. I'm reading from n1124.
Yes, that makes sense, although on further reading, it seems that an
implementation could work internal magic to establish a one-to-one
relationship between all unsigned char values from 0 to UCHAR_MAX and all
signed int values from INT_MIN to INT_MAX. That would mean that an
implementation would have to ensure that there were at least as many
valid signed int values as unsigned char values, with an extra signed
int value representing EOF. It does sound like a tall order for an
implementation where sizeof(int) == 1, but possible on a DS9K level.

EOF need not be distinct from any valid character converted to int. Strictly
speaking, after fgetc() returns EOF you should call feof() and ferror() to
check whether more characters can be read, though most code doesn't bother.
Note that this is necessary even on non-DS9K systems when using fgetwc() and
friends.
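
A sketch of that strictly correct reading loop (the function name
count_bytes is just for illustration):

#include <stdio.h>

long count_bytes(FILE *f)
{
    long n = 0;
    for (;;) {
        int c = fgetc(f);
        if (c == EOF) {
            /* EOF only means end-of-input (or an error) if the stream says
               so; otherwise it may be a genuine character value, which
               matters when sizeof(int) == 1 and, more practically, for
               fgetwc() and WEOF. */
            if (feof(f) || ferror(f))
                break;
        }
        n++;
    }
    return n;
}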
 

Malcolm McLean

Christopher Benson-Manica said:
In a thread from substantially earlier this week,

Harald van Dĳk said:
getchar does not work with plain chars, it works with unsigned chars. 163
fits just fine in an unsigned char, so getchar is allowed to return 163.

Being rather pedantic, I decided to try to verify whether this was
true. I would appreciate knowing whether my reading of the Standard
is correct.

7.19.7.1 (as we all know) states that fgetc() (and thus its friends)
"obtains [a] character as an unsigned char converted to an int".
There is nothing in the Standard (that I was able to find) which
states that sizeof(int) may not be 1, so it occurred to me to ask, "Is
163 always representable as a signed int if sizeof(int) is 1?"
5.2.4.2.1 states that INT_MAX may not be less than 32767, so the
answer to that question appears to be "yes".

On the other hand, I do not see anything in 5.2.4.2.1 which requires
that UCHAR_MAX not be greater than INT_MAX - which indeed it must be,
if sizeof(int) == 1, correct? In such a case, fgetc() may return
UCHAR_MAX (right?), and so either fgetc() must work behind-the-scenes
magic to return a signed integer representing UCHAR_MAX, or invoke UB
by overflowing the signed type int. Both of these alternatives seem
ridiculous to me, so what am I missing?
Yes. That's a known glitch. A system where sizeof(int) == 1 has no way of
returning EOF and distinguishing it from a legal character value.
In practice, files are probably read as octets, which is OK but breaks
fputc(), though only in binary mode.
 
