3-byte ints

Keith Thompson · Sep 27, 2003

Jack Klein said:
These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
    int c;
    while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
        putchar(c);
    }
    return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.

Barry Schwarz · Sep 27, 2003

(It's actually an unsigned char converted to int, not plain char).

However, are you sure it has to be able to return all possible unsigned
chars? Isn't it possible for unsigned char to have 65536 possible
values, but there be only, say, 140 distinct _characters_ which the
string, input and output functions deal with? Does every possible
unsigned char value have to represent a character?

Obviously not since in the ASCII character set, values between 0x00
and 0x1f don't.


Kevin Easton · Sep 27, 2003

Keith Thompson said:
I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

- kevin.

Keith Thompson · Sep 27, 2003

Kevin Easton said:
ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

Right.

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.

Peter Nilsson · Sep 27, 2003

Keith Thompson said:
What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.

And it always has been. [Under C99 it is even worse, since the conversion of
an unsigned char to signed int can theoretically raise an
implementation-defined signal, thus reducing getc to the level of gets.]

The unwritten assumption about hosted implementations is naturally that
UCHAR_MAX <= INT_MAX. Why the standards never made this normative seems a
mystery to lesser minds like my own.

Barry Schwarz · Sep 27, 2003

I don't see where the standard requires that EOF has to be different
from any possible char value.

EOF must have type int and be negative. On those systems where char
is unsigned, it obviously cannot be a char value.

It could be a valid char on a system where char is signed. But, as
explained below, none of the normal character I/O functions can return
any negative value other than for end of file or I/O error.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

Coding problem here:

If c == EOF, then the remaining two expressions following the
first && will never be evaluated due to && short circuit. The while
will evaluate to false and the loop terminates immediately, regardless
of the status of feof and ferror. Consequently, you don't know if you
have hit the real EOF or merely a character that looks like it.

If c != EOF, you are pretty much guaranteed that !feof() and
!ferror will both be true also.

Therefore, the expression c != EOF defeats the purpose of what
you want the expression after the comma to do.

Logic problem also:

getchar "returns the next character of [stdin] as an unsigned
char (converted to an int), or an EOF if end of file or error occurs"
(from K&R2, B1.4). Since an unsigned char cannot be negative and EOF
has to be, getchar cannot return EOF for a normal character.

putchar(c);
}
return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.


Kevin Easton · Sep 27, 2003

Keith Thompson said:
What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

OK, you're right - it just has to be converted to an int.

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Perhaps you could read the requirement that anything written to a binary
stream will compare equal to the original value when it's read back as
meaning that the unsigned char / int conversions mentioned in the
character reading and writing functions have to be stable, reversible
and the inverse of each other.

You still break ungetc() if a valid character maps to EOF, since you
couldn't ungetc that character:

4 If the value of c equals that of the macro EOF, the operation fails
and the input stream is unchanged.

- Kevin.

Keith Thompson · Sep 27, 2003

Kevin Easton said:
...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Thank you, that's the point I was missing. I had assumed (because I
didn't bother to check) that the conversion from unsigned char to int
was well-defined.

Dave Thompson · Sep 29, 2003


(Within a struct, shown later.)

Pretty much, assuming it's in a structure. The only things I'd say are:

It will do exactly 24-bit arithmetic, which is 3 bytes IF a byte is 8
bits, as is very common but not required. It, or rather the
"allocation unit" containing it, is very likely to occupy 32 bits or 4
usual-bytes/octets. This difference matters only if you write out
the/a containing struct to a file or over a network etc., since you
can't form (or use) a pointer to a bitfield member; or if you (need
to) care about the actual memory/bus accesses performed by the
compiled (object) form of your code when executed.

1) C90 doesn't allow anything other than "int" and "unsigned int" for
bitfield types. C99 does allow implementations to offer other types
like "unsigned long"; presumably your implementation does - it's
a common extension.

(explicitly) signed int, unsigned int, or "plain" int, which, unlike
non-char integer types elsewhere, is not automatically signed; it is
implementation-defined as signed or unsigned. And C99 also standardly
allows _Bool (or bool with stdbool.h).

Plus _tmp already had type unsigned long.

Because it's buggy? Your code looks fine to me.

Unless perhaps the OP (or someone) did <GACK!> #define long int </>.
Since you are using gcc, check the preprocessor output with -E.

- David.Thompson1 at worldnet.att.net
