A question regarding Q20.1 from c-faq.com

jameskuyper

Mark said:
Smiley noted. But since some functions are obligated to use EOF to
indicate an error, it's completely correct to check for that return.

I would disagree; for the small set of routines where EOF is returned
through the same channel as valid characters, relying solely upon an
EOF value to diagnose an error/eof condition is the wrong approach,
unless the program first verifies that UCHAR_MAX < INT_MAX.
If in some implementations a programmer can't distinguish the error
return from a non-error return, ...

Since it's always possible to distinguish an error return from a non-
error return by use of ferror() and feof(), that's never an issue.
... code which is completely conforming will
malfunction. ...

There's no category named "completely conforming" defined by the
standard. The standard provides only two code-conformance categories;
one of which ("conforming code") is far too broad to be of any use
whatsoever, other than being a political compromise that allowed the
standard to be approved. The other category ("strictly conforming
code") is far too strict to describe any significant fraction of real-
world programs; it's useful for discussing the meaning of the
standard's requirements, but not for much else. However, what we're
doing right now is discussing the meaning of the standard's
requirements, so that's the term I'll assume you're referring to.

A program that is "strictly conforming" is prohibited from producing
output which depends upon implementation-defined behavior. Code such
as you describe has behavior which depends upon whether or not
UCHAR_MAX > INT_MAX, which is implementation-defined. If that behavior
has any visible effect on the output of the program, then the code is
not strictly conforming.
... My POV is that if an implementation can't translate and
_execute_ a conforming programme correctly, it can't be a properly
conforming implementation.

I believe that a conforming implementation of C with UCHAR_MAX >
INT_MAX could translate and execute a program such as you describe
correctly; it wouldn't behave as intended by the developer, but it
would behave in a manner which is completely consistent with the
standard's requirements for the behavior of such code.
 
jameskuyper

Mark said:
Keith said:
Mark McIntyre said:
You said

If they're different sizes, then they can't be compatible.
[I think that was Peter Nilsson.]
This is clearly incorrect.

No, it's clearly correct.

I still disagree...

The standard has a fairly strict definition
of what it means for two types to be "compatible".

Sure, but in those terms the entire discussion becomes meaningless,
since "Two types have compatible type if their types are the same."

The standard has a far more complicated definition of "compatible
types" than that one. However, despite the complications, it is still
impossible for compatible types to have different sizes.
The context of the discussion seemed different from merely a statement
of the words of the standard.

The first reference to compatibility along this thread was when it was
incorrectly given as a reason why a double value could not be assigned
to an object of pointer-to-double type. While it is true that those
types are not compatible, and it's also true that if they were
compatible then the assignment wouldn't be a problem, the fundamental
problem isn't about compatibility. It's the fact that such code
violates the constraints listed in section 6.5.16.1p1. Those
constraints require compatibility only when both the left and right
operands have pointer type, or both have structure types; neither case
applies to this code.
 
Peter Nilsson

James Kuyper said:
On what grounds?

Apart from the obvious PITA associated with EOF becoming
an inband signal, there's...

"The header <ctype.h> declares several functions useful
for classifying and mapping characters. In all cases the
argument is an int, the value of which shall be
representable as an unsigned char... [or be EOF]."

The whole header is effectively unusable if you allow
UCHAR_MAX > INT_MAX.

I acknowledge that the standard _can_ be interpreted
alternatively as allowing it (at a great cost to code
clarity for maximally portable code). However I think
the standard is defective in that regard.

I won't restart debates that have occurred many times
before in comp.std.c. I'll happily retract the earlier
statement if you'll let me continue thinking that these
implementations are nothing more than figments of over-
active imaginations. ;-)
 
James Kuyper

Peter Nilsson wrote:
....
Apart from the obvious PITA associated with EOF becoming
an inband signal, there's...

"The header <ctype.h> declares several functions useful
for classifying and mapping characters. In all cases the
argument is an int, the value of which shall be
representable as an unsigned char... [or be EOF]."

The whole header is effectively unusable if you allow
UCHAR_MAX > INT_MAX.

The problem isn't the <ctype.h> functions. The standard doesn't define
what any of those functions is supposed to return when called with a
value of EOF, it only says that they must accept it. Therefore, strictly
conforming code can't make any useful assumptions about the behavior of
those functions based upon the idea that EOF cannot be a valid character
value.

The problem is the getc() loop which terminates unconditionally when
getc()==EOF; and it's just as much of a problem no matter what the rest
of the loop does, whether or not it calls any of the <ctype.h>
functions. There's a lot of code written this way, and it would need to
be rewritten to use feof() and ferror() if it's to be portable to
systems where UCHAR_MAX>INT_MAX.

The programs I'm responsible for almost never use the
character-at-a-time functions for I/O. I realize that many programs do,
but in order for you to declare the entire header "unusable", it would
have to be an equally great problem for the other programs that don't
use those functions. It isn't. All issues related to having getc()
return a value of EOF are handled (presumably correctly) inside of
fread() or fscanf(), where we don't have to worry about them. If I fill
a buffer by calling one of those functions, and it contains a character
which compares equal to EOF, I know with certainty that it represents a
valid character with the same value, and indicates neither end-of-file
nor an I/O error.
I won't restart debates that have occurred many times
before in comp.std.c. I'll happily retract the earlier
statement if you'll let me continue thinking that these
implementations are nothing more than figments of over-
active imaginations. ;-)

Implementations where all integer types up to and including 'int' are
the same size are, I believe, a reality, though scarce, but I can't give
you any details because I don't know them.
 
Mark McIntyre

I would disagree; for the small set of routines where EOF is returned
through the same channel as valid characters, relying solely upon an
EOF value to diagnose an error/eof condition is the wrong approach,

Why? The standard requires the functions to return EOF on an error
condition. It seems to me entirely reasonable for a programmer to expect
be able to rely on a guaranteed behaviour.
Since it's always possible to distinguish an error return from a non-
error return by use of ferror() and feof(), that's never an issue.

Sorry but feof() or ferror() are irrelevant - sure, there are other
methods of detecting the error but that's not the point.
There's no category named "completely conforming" defined by the
standard.

Agreed. And since again this renders the *entire* thread utterly
pointless, it's useless information.

We're back to the "since ISO allows extensions, any code which contains
c-like syntax is C and discussable here" argument. I don't accept that
argument, and I don't accept we can abandon common sense when applying
the standard to define "conforming".
I believe that a conforming implementation of C with UCHAR_MAX >
INT_MAX could translate and execute a program such as you describe
correctly; it wouldn't behave as intended by the developer,

.... or as a reasonable person could expect it to, given the mandatory
requirements placed upon certain functions.
 
James Kuyper

Mark said:
Why? The standard requires the functions to return EOF on an error
condition. It seems to me entirely reasonable for a programmer to expect
be able to rely on a guaranteed behaviour.

Yes. However, the standard does not prohibit getc() from returning EOF
when there is no end of file and no error condition. A programmer who
assumes that a value of EOF indicates only those two possibilities is
relying upon a guarantee not actually provided by the standard (though
it is something that is actually true on most real implementations,
explaining the popularity of that assumption).
Sorry but feof() or ferror() are irrelevant - sure, there are other
methods of detecting the error but that's not the point.

It is precisely the point. An EOF returned by getc() can indicate either
an end of file, an I/O error, or (if UCHAR_MAX > INT_MAX) a successful
read of a byte with an unsigned char value that converts to the same
'int' value as EOF. Only feof() and ferror() can be used to disambiguate
these three possibilities.
... or as a reasonable person could expect it to, given the mandatory
requirements placed upon certain functions.

What mandatory requirement does the standard impose which prohibits
getc() from returning EOF when it successfully reads a byte from the file?
 
Robert Latest

Geoff said:
size_t is unsigned int?

You don't know, and don't need to know. Just cast it to something you know,
and then use the proper conversion specifier for that.

robert
 
Robert Latest

Geoff said:
Ah! Well casting makes everything alright then. Why go to all that
trouble? Why not just use %i or even better, %u?

Because you can't know what's behind size_t. You need the cast.
And why has anyone not noticed that sizeof 'A' returned 4, not 1?

Well, you seem to have tested your code on a machine that has 32-bit
integers. That makes four bytes. What's there to notice?

robert
 
Richard Heathfield

Mark McIntyre said:

He's already explained that.
The standard requires the functions to return EOF on an error
condition. It seems to me entirely reasonable for a programmer to expect
be able to rely on a guaranteed behaviour.

Yes, you're right that it's guaranteed that the functions return EOF on an
error condition. But what James is telling you is that the Standard does
*not* guarantee that EOF *cannot* be returned under other circumstances
too - no matter how much you might wish it.

<snip>
 
santosh

James said:
Yes. However, the standard does not prohibit getc() from returning EOF
when there is no end of file and no error condition. A programmer who
assumes that a value of EOF indicates only those two possibilities is
relying upon a guarantee not actually provided by the standard (though
it is something that is actually true on most real implementations,
explaining the popularity of that assumption).

The intent of the Standard was for an EOF return to indicate end-of-file
or error.
7.19.1 (3)

EOF

which expands to an integer constant expression, with type int and a
negative value, that is returned by several functions to indicate
end-of-file, that is, no more input from a stream;

The only sensible way to deal with character based I/O on systems where
UCHAR_MAX > INT_MAX is to use functions customised for the platform.

What mandatory requirement does the standard impose which prohibits
getc() from returning EOF when it successfully reads a byte from the
file?

As far as I can tell, the Standard doesn't explicitly say so, but it's
strongly implied that EOF is meant to indicate end-of-file or error.

If this is not so, then why have these functions return an int value at
all? They could return an ordinary unsigned char or char value and the
programmer could be told to verify feof and ferror after each and every
read.

EOF would become essentially meaningless on such systems.
 
Philip Potter

Mark said:
Why? The standard requires the functions to return EOF on an error
condition. It seems to me entirely reasonable for a programmer to expect
be able to rely on a guaranteed behaviour.

Your logic is broken. An error forces the function to return EOF, but we
cannot deduce from this that a function which returned EOF encountered
an error. One must first prove that there is no other way the function
could have returned EOF.

Philip
 
santosh

Robert said:
Because you can't know what's behind size_t. You need the cast.


Well, you seem to have tested your code on a machine that has 32-bit
integers. That makes four bytes. What's there to notice?

No. This assumes that a byte is exactly eight bits, which is invalid in
C.
 
Richard Heathfield

santosh said:
No. This assumes that a byte is exactly eight bits, which is invalid in
C.

Clarifying: the *assumption* is invalid. Implementations are, however,
entirely free to restrict themselves to a mere eight bits per byte if they
so choose, and indeed many do.
 
James Kuyper

santosh said:
The intent of the Standard was for an EOF return to indicate end-of-file
or error.

There's considerable reason to believe that C89 was written without
considering the possibility of UCHAR_MAX > INT_MAX. However, I know for
a fact that this issue was discussed on comp.std.c before C99 was
approved, and that several actual committee members participated in that
discussion. Therefore, I believe that this possibility was considered
when C99 was written. Yet, neither C99 nor any of the TCs contains any
changes that clarify this issue. Therefore, I believe that the standard,
as currently written, does indeed reflect the intent of the committee.

As currently written, there's nothing in the standard about EOF
exclusively indicating end-of-file or an error, only wording indicating
that an end-of-file or error will cause a return value of EOF.
7.19.1 (3)

EOF

which expands to an integer constant expression, with type int and a
negative value, that is returned by several functions to indicate
end-of-file, that is, no more input from a stream;

Sure, but does that mean that a value of EOF must only mean end-of-file?
The very same functions we're talking about (and many others as well)
also use EOF to indicate an I/O error, not an end-of-file. The wctob()
function is defined as returning EOF for reasons that have nothing to do
with I/O.

Taken to its extreme, your argument would imply that no function is
allowed to return a value equal to the value of EOF unless it indicated
end-of-file. That would be a rather obscure constraint to apply to, for
instance, the atoi() function.

....
....
If this is not so, then why have these functions return an int value at
all? They could return an ordinary unsigned char or char value and the
programmer could be told to verify feof and ferror after each and every
read.

EOF would become essentially meaningless on such systems.

C was originally developed on a system where all the char types were 8
bits and int was larger (probably 16 bits, but I don't know for sure).
The interface for getc() was chosen when similar systems were the only
ones being considered. I doubt very much that C89 was written with the
possibility of UCHAR_MAX>INT_MAX in mind. However, it didn't prohibit
that possibility, and that possibility can be worked around, and C99 was
approved without any changes addressing that issue, which had been
brought up by that time.
 
Keith Thompson

santosh said:
The intent of the Standard was for an EOF return to indicate end-of-file
or error.

7.19.1 (3)

EOF

which expands to an integer constant expression, with type int and a
negative value, that is returned by several functions to indicate
end-of-file, that is, no more input from a stream;

The only sensible way to deal with character based I/O on systems where
UCHAR_MAX > INT_MAX is to use functions customised for the platform.
[...]

But such customized functions aren't necessary. The usual idiom of
``while ((c = getchar()) != EOF)'' won't work properly on such
systems, but the workaround (checking feof() and ferror() after
getting an EOF result) solves the problem for exotic systems with
UCHAR_MAX > INT_MAX *and* works properly on ordinary systems.

It certainly would have been nice if the common idiom, which works on
almost all systems, were actually guaranteed by the standard to work
on all systems, but we're stuck with the current situation.

And I suspect that there are no *hosted* implementations with
UCHAR_MAX > INT_MAX. Non-hosted (freestanding) implementations aren't
even required to support <stdio.h>, and if they do support it it
needn't conform to the standard's requirements (though as a QoI issue
it should). Adding a requirement for sizeof(int) > 1 for hosted
implementations would largely solve the problem and *probably*
wouldn't affect any real-world implementations.
 
Richard Tobin

Keith Thompson said:
But such customized functions aren't necessary. The usual idiom of
``while ((c = getchar()) != EOF)'' won't work properly on such
systems, but the workaround (checking feof() and ferror() after
getting an EOF result) solves the problem for exotic systems with
UCHAR_MAX > INT_MAX *and* works properly on ordinary systems.

Can someone familiar with such an exotic system confirm that getchar()
really does read in a value that can take on all the values up to
UCHAR_MAX? Rather than, say, reading an 8-bit external value even
though char is 32 bits. Or are such systems just hypothetical?

-- Richard
 
Peter Nilsson

James Kuyper said:
Peter said:
Apart from the obvious PITA associated with EOF becoming
an inband signal, there's...
"The header <ctype.h> declares several functions useful
for classifying and mapping characters. In all cases the
argument is an int, the value of which shall be
representable as an unsigned char... [or be EOF]."
The whole header is effectively unusable if you allow
UCHAR_MAX > INT_MAX.

The problem isn't the <ctype.h> functions. The standard
doesn't define what any of those functions is supposed to
return when called with a value of EOF, it only says that
they must accept it. ...

You're missing the point. None of the extended characters
in the range (INT_MAX..UCHAR_MAX] can be passed to these
functions. In other words, you can't pass arbitrary input
data to these functions.
 
Chris Torek

Can someone familiar with such an exotic system confirm that getchar()
really does read in a value that can take on all the values up to
UCHAR_MAX? Rather than, say, reading an 8-bit external value even
though char is 32 bits. Or are such systems just hypothetical?

I believe they are real. Jack Klein in particular has mentioned
a TI C system with 32-bit "char". Others have mentioned systems
with 16-bit "char", though of course, if they have 16-bit "char"
and 32-bit "int", the problem does not arise.

I believe they are (all) also "standalone" rather than "hosted"
implementations, although I have not used any of them myself. As
such, they need not provide a <stdio.h> at all. I suspect that
at least some of them do, however -- and I suspect further that
on any system with sizeof(int) == 1, getc() returns a value
limited in range to [0..255] (or at most [0..1023], when talking
to a BBN "C machine" [%]), so that checking for getc() == EOF
suffices in all cases.

[% These machines, which implemented much of what eventually became
the Internet, were called "C machines" because they were *designed*
to run C code. They had 10-bit bytes. As far as I know, this was
the first hardware ever designed with C specifically in mind. One
might thus say that the C language "prefers" 10-bit bytes, and all
the 8-bit byte machines that exist today are inferior. :) ]
 
Richard Tobin

Can someone familiar with such an exotic system confirm that getchar()
really does read in a value that can take on all the values up to
UCHAR_MAX? Rather than, say, reading an 8-bit external value even
though char is 32 bits. Or are such systems just hypothetical?
I believe they are (all) also "standalone" rather than "hosted"
implementations, although I have not used any of them myself. As
such, they need not provide a <stdio.h> at all. I suspect that
at least some of them do, however -- and I suspect further that
on any system with sizeof(int) == 1, getc() returns a value
limited in range to [0..255] (or at most [0..1023], when talking
to a BBN "C machine" [%]), so that checking for getc() == EOF
suffices in all cases.

That's what I was suggesting.

Presumably such a system is non-conformant (except in that it doesn't
have to provide <stdio.h> at all), because if fread() etc. work as if
through getc(), then

int i = large_number;
fwrite(&i, sizeof(int), 1, ...);
...
fread(&i, sizeof(int), 1, ...);

will truncate the int value.

-- Richard
 
