3-byte ints

Keith Thompson

Jack Klein said:
These are pretty much all free-standing environments, it is not really
possible to provide all the features of a hosted environment on a
platform where char and int have the same representation. It is
impossible to provide a getchar() function which complies with the
standard, namely that it returns all possible values of char and also
EOF, which is an int different from any possible char value.

I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
    int c;
    while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {
        putchar(c);
    }
    return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.
 
Barry Schwarz

(It's actually an unsigned char converted to int, not plain char).

However, are you sure it has to be able to return all possible unsigned
chars? Isn't it possible for unsigned char to have 65536 possible
values, but there be only, say, 140 distinct _characters_ which the
string, input and output functions deal with? Does every possible
unsigned char value have to represent a character?

Obviously not, since in the ASCII character set, values between 0x00
and 0x1f don't.


<<Remove the del for email>>
 
Kevin Easton

Keith Thompson said:
I don't see where the standard requires that EOF has to be different
from any possible char value.

If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))
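
Written out as a complete loop, the corrected condition might look like the sketch below. The helper name copy_stream and its FILE * parameters are made up for illustration (the original program reads stdin and writes stdout directly), and the write check uses ferror() rather than fputc()'s return value, since that return value would be just as ambiguous on an implementation where a character can equal EOF.

```c
#include <stdio.h>

/*
 * Copy every character from `in` to `out` and return how many were copied.
 * A result equal to EOF ends the loop only when the stream's end-of-file
 * or error indicator is actually set.
 * (copy_stream is a hypothetical helper, not part of the original post.)
 */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while (c = fgetc(in), c != EOF || (!feof(in) && !ferror(in))) {
        fputc(c, out);
        if (ferror(out))
            break;              /* write error: stop copying */
        copied++;
    }
    return copied;
}
```

Calling copy_stream(stdin, stdout) from main() should behave like the program quoted above with the || fix applied.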

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

- kevin.
 
Keith Thompson

Kevin Easton said:
ITYM

c != EOF || (!feof(stdin) && !ferror(stdin))

Right.

The real problem seems to be that getchar() is supposed to return an int
with a value in the range of unsigned char, or EOF. Returning any
negative non-EOF value is clearly out (not in the range of unsigned
char), so it'd have to map any characters with values between INT_MAX + 1
and UCHAR_MAX to some value between 0 and INT_MAX inclusive, which are
all already taken by other character values.

So it's implementable, but only in a way that loses information about
which character was actually read. Not really what you'd call
a practical way to write an input function.

What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.
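
The arithmetic can be tried on an ordinary machine by letting uint16_t and int16_t stand in for unsigned char and int on the hypothetical CHAR_BIT==16, sizeof(int)==1 platform. This is only a simulation: the function name is invented here, int16_t is an optional type, and the result of the out-of-range conversion is implementation-defined. It assumes a compiler that reduces the value modulo 2^16, which is what gcc documents.

```c
#include <stdint.h>

/*
 * Stand-in for "an unsigned char converted to an int" when both types
 * are 16 bits wide. For values above INT16_MAX the result is
 * implementation-defined; the comments assume modulo-2^16 wraparound.
 * (as_narrow_int is a hypothetical name used only for this sketch.)
 */
int16_t as_narrow_int(uint16_t ch)
{
    return (int16_t)ch;
}

/* 1 if the converted character is indistinguishable from an EOF of -1. */
int looks_like_eof(uint16_t ch)
{
    return as_narrow_int(ch) == -1;   /* assumes EOF == -1 */
}
```

Under those assumptions as_narrow_int(60000) gives -5536, and the character value 65535 converts to -1, which is where a bare comparison against EOF stops being reliable.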
 
Peter Nilsson

Keith Thompson said:
What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

Having sizeof(int)==1 breaks the common "while ((c=getchar()) != EOF)"
idiom, but I don't see that it breaks anything else -- which argues
that the common idiom is non-portable.

And it always has been. [Under C99 it is even worse, since the conversion of
an unsigned char to signed int can theoretically raise an
implementation-defined signal, thus reducing getc to the level of gets.]

The unwritten assumption about hosted implementations is naturally that
UCHAR_MAX <= INT_MAX. Why the standards never made this normative seems a
mystery to lesser minds like my own.
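
Code that relies on the idiom can at least make the assumption explicit. A minimal sketch, where the function name and the error message wording are invented for illustration:

```c
#include <limits.h>

/* Refuse to compile where an unsigned char value might not fit in an int. */
#if UCHAR_MAX > INT_MAX
#error "UCHAR_MAX > INT_MAX: (c = getchar()) != EOF cannot tell EOF from data"
#endif

/*
 * Runtime form of the same check: returns 1 when every unsigned char
 * value is representable as a non-negative int.
 * (eof_idiom_is_safe is a hypothetical name.)
 */
int eof_idiom_is_safe(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

On such an implementation the build simply fails, rather than the loop silently stopping early on a character that happens to equal EOF.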
 
Barry Schwarz

I don't see where the standard requires that EOF has to be different
from any possible char value.

EOF must have type int and be negative. On those systems where char
is unsigned, it obviously cannot be a char value.

It could be a valid char on a system where char is signed. But, as
explained below, none of the normal character I/O functions can return
any negative value other than for end of file or I/O error.

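
A small sketch of that distinction, assuming 8-bit bytes and the usual EOF of -1 (neither is guaranteed, and the function names are invented here). The byte 0xFF held in a plain char may compare equal to EOF when char is signed, but the same byte passed through unsigned char, which is how fgetc() reports it, does not:

```c
#include <limits.h>
#include <stdio.h>

/* 1 if the byte 0xFF, stored in a plain char, compares equal to EOF. */
int plain_char_collides_with_eof(void)
{
    char c = (char)0xFF;   /* implementation-defined result if char is signed */
    return c == EOF;
}

/* 1 if the same byte, seen the way fgetc() returns it, equals EOF. */
int fgetc_style_value_collides_with_eof(void)
{
    char c = (char)0xFF;
    return (int)(unsigned char)c == EOF;
}
```

The first result depends on whether plain char is signed; the second should be 0 wherever UCHAR_MAX <= INT_MAX.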
If EOF is a valid char value, you could just check the feof()
function. For example, the following program should copy stdin to
stdout on such an implementation:

#include <stdio.h>
int main(void)
{
int c;
while (c = getchar(), c != EOF && !feof(stdin) && !ferror(stdin)) {

Coding problem here:

If c == EOF, then the remaining two expressions following the
first && will never be evaluated due to && short circuit. The while
will evaluate to false and the loop terminates immediately, regardless
of the status of feof and ferror. Consequently, you don't know if you
have hit the real EOF or merely a character that looks like it.

If c != EOF, you are pretty much guaranteed that !feof() and
!ferror() will both be true also.

Therefore, the expression c != EOF defeats the purpose of what
you want the expression after the comma to do.
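
The short-circuit effect is easy to see if the loop condition is pulled out into a function of three plain values. The sketch below is illustrative only: the function names are invented, and at_eof and had_error stand in for the results of feof(stdin) and ferror(stdin). The second function is the || form suggested elsewhere in the thread.

```c
#include <stdio.h>

/* Original condition: any EOF-valued result stops the loop at once,
 * so the two flags are never consulted in that case. */
int keep_going_original(int c, int at_eof, int had_error)
{
    return c != EOF && !at_eof && !had_error;
}

/* || form: an EOF-valued result stops the loop only if a flag is set. */
int keep_going_corrected(int c, int at_eof, int had_error)
{
    return c != EOF || (!at_eof && !had_error);
}
```

With c == EOF and both flags clear (a character that merely looks like EOF), the original form returns 0 and the corrected form returns 1. Once either flag is set, both return 0.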

Logic problem also:

getchar "returns the next character of [stdin] as an unsigned
char (converted to an int), or an EOF if end of file or error occurs"
(from K&R2, B1.4). Since an unsigned char cannot be negative and EOF
has to be, getchar cannot return EOF for a normal character.

putchar(c);
}
return 0;
}

The comparison to EOF could be omitted, but it might save the overhead
of some function calls.


<<Remove the del for email>>
 
Kevin Easton

Keith Thompson said:
What's wrong with getchar() returning a negative non-EOF value?

getchar() is equivalent to getc() with the argument stdin; getc() is
equivalent to fgetc(), except that if it's a macro it can evaluate its
argument more than once.

The description of fgetc() says:

If the end-of-file indicator for the input stream pointed to by
stream is not set and a next character is present, the fgetc
function obtains that character as an unsigned char converted to
an int and advances the associated file position indicator for the
stream (if defined).

OK, you're right - it just has to be converted to an int.

Assume CHAR_BIT==16 and sizeof(int)==1. If the next input character
has the value, say, 60000, it's converted to the int value -5536 and
returned.

...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Perhaps you could read the requirement that anything written to a binary
stream will compare equal to the original value when it's read back as
meaning that the unsigned char / int conversions mentioned in the
character reading and writing functions have to be stable, reversible
and the inverse of each other.

You still break ungetc() if a valid character maps to EOF, since you
couldn't ungetc that character:

4 If the value of c equals that of the macro EOF, the operation fails
and the input stream is unchanged.
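
That paragraph can be checked directly. A minimal sketch with invented helper names; on a conventional implementation it only confirms that EOF itself is rejected, while on the hypothetical one a character equal to EOF would be rejected the same way:

```c
#include <stdio.h>

/* 1 if pushing back EOF fails, as the quoted paragraph requires. */
int ungetc_rejects_eof(FILE *stream)
{
    return ungetc(EOF, stream) == EOF;
}

/* 1 if an ordinary character pushed back is the next one read. */
int pushback_round_trips(FILE *stream, unsigned char ch)
{
    if (ungetc(ch, stream) == EOF)
        return 0;
    return fgetc(stream) == (int)ch;
}
```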

- Kevin.
 
Keith Thompson

Kevin Easton said:
...but the conversion to int that takes place is in no way defined (it
just says "as an unsigned char converted to int") - so you don't know
how it'll be converted. It doesn't say it has to be a reversible
conversion, or even a stable one.

Thank you, that's the point I was missing. I had assumed (because I
didn't bother to check) that the conversion from unsigned char to int
was well-defined.
 
Dave Thompson

(Within a struct, shown later.)
Pretty much, assuming it's in a structure. The only things I'd say are:
It will do exactly 24-bit arithmetic, which is 3 bytes IF a byte is 8
bits, as is very common but not required. It, or rather the
"allocation unit" containing it, is very likely to occupy 32 bits or 4
usual-bytes/octets. This difference matters only if you write out
the/a containing struct to a file or over a network etc., since you
can't form (or use) a pointer to a bitfield member; or if you (need
to) care about the actual memory/bus accesses performed by the
compiled (object) form of your code when executed.
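
The struct being discussed is not reproduced in this part of the thread, so the sketch below uses a made-up struct int24 with a single 24-bit unsigned bitfield to show the "exactly 24-bit arithmetic" point. It assumes unsigned int is at least 24 bits wide; the names int24 and add24 are hypothetical.

```c
/* Hypothetical 24-bit unsigned integer held in a bitfield. */
struct int24 {
    unsigned int value : 24;
};

/* Add two values with the result reduced modulo 2^24 by the bitfield. */
unsigned long add24(unsigned long a, unsigned long b)
{
    struct int24 x;

    x.value = a;    /* stored modulo 2^24 */
    x.value += b;   /* sum is again stored modulo 2^24 */
    return x.value;
}
```

So add24(0xFFFFFF, 1) wraps around to 0. sizeof(struct int24) is up to the implementation and, as noted above, is likely to be 4 rather than 3 where bytes are 8 bits.
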
1) C90 doesn't allow anything other than "int" and "unsigned int" for
bitfield types. C99 does allow implementations to offer other types
like "unsigned long"; presumably your implementation does - it's
a common extension.
(explicitly) signed int, unsigned int, or "plain" int which unlike
non-char integer types elsewhere is not automatically signed, it is
implementation-defined as signed or unsigned. And C99 also standardly
allows _Bool (or bool with stdbool.h).

Plus _tmp already had type unsigned long.
Because it's buggy? Your code looks fine to me.
Unless perhaps the OP (or someone) did <GACK!> #define long int </>.
Since you are using gcc, check the preprocessor output with -E.


- David.Thompson1 at worldnet.att.net
 
