getchar() and EOF confusion

arnuld

Mostly, when I want to take input from stdin, I use getchar(), but I get
this from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"


while( EOF != (ch = getchar()) ) ....


I use it like that. Can I run into problems with that?
 
Barry Schwarz

Mostly, when I want to take input from stdin, I use getchar(), but I get
this from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"


while( EOF != (ch = getchar()) ) ....


I use it like that. Can I run into problems with that?

getchar treats the data it obtains from the stream as unsigned. EOF
is guaranteed to be negative. Can you see where this leads?
 
danmath06

arnuld said:
Mostly, when I want to take input from stdin, I use getchar(), but I get
this from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"

while( EOF != (ch = getchar()) ) ....

I use it like that. Can I run into problems with that?

Yes, if ch is not an int. The prototype for getchar() is "int
getchar(void);", so you should use an int to hold the return value from
getchar().
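
For example, the usual idiom looks like this (a minimal sketch, error
handling aside):

    #include <stdio.h>

    int main(void)
    {
        int ch;              /* int, not char, so every byte value plus EOF fits */

        while ((ch = getchar()) != EOF)
            putchar(ch);     /* echo stdin to stdout */

        return 0;
    }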
 
Peter Nilsson

arnuld said:
Mostly, when I want to take input from stdin, I use getchar(), but I get
this from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"

The manual is poorly written. Integral promotion
is well defined and will always be value preserving
in the case of char values.

What is implementation defined is whether plain char
is signed or unsigned, but that too is mostly
incidental.
     while( EOF != (ch = getchar()) ) ....

I use it like that. Can I run into problems with that?

Did you read the FAQ?

http://c-faq.com/stdio/getcharc.html
 
arnuld

..SNIP...


Yes, I did, but I can't figure out what the FAQ means:


:: Two failure modes are possible if, as in the fragment
:: above, getchar's return value is assigned to a char.


:: 1. If type char is signed, and if EOF is defined (as is usual) as -1,
:: the character with the decimal value 255 ('\377' or '\xff' in C) will
:: be sign-extended and will compare equal to EOF, prematurely terminating
:: the input. [footnote]

Does it mean that if char is signed, an input of 255 will be equal to
-1, hence 255 == EOF?


:: 2. If type char is unsigned, an actual EOF value will be truncated
:: (by having its higher-order bits discarded, probably resulting in
:: 255 or 0xff) and will not be recognized as EOF, resulting in
:: effectively infinite input.


Does it mean that if char is unsigned, an input value equal to EOF, which
is -1, will be converted to 255?



Okay, whatever it is, why bother? Just use "int ch" for getchar(), getc(),
and fgetc().
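
For contrast, here is a sketch of the broken version the FAQ warns about
(which failure mode you get depends on whether plain char is signed, so
this is implementation-specific by design):

    #include <stdio.h>

    int main(void)
    {
        char ch;    /* wrong type: cannot hold every byte value plus EOF */

        while ((ch = getchar()) != EOF)   /* getchar()'s int result is truncated */
            putchar(ch);

        /* If plain char is signed: a byte of 0xff promotes to -1 and
           compares equal to a typical EOF, ending input early.
           If plain char is unsigned: the real EOF truncates to 255 and
           never compares equal, so the loop never ends. */
        return 0;
    }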
 
Pranav

Then does reading character-sized data into a variable of an integer type
cause an issue when porting code?
 
Nate Eldredge

Pranav said:
Then does reading character-sized data into a variable of an integer type
cause an issue when porting code?

I'm not sure I understand what you mean. Can you give an example of the
kind of code you have in mind?
 
arnuld

Then does reading character-sized data into a variable of an integer type
cause an issue when porting code?


No, as every character is converted into an integer at compilation.
Right, clc folks? (Or do you think I am confusing the ASCII table with
the compiler?)
 
Keith Thompson

arnuld said:
No, as every character is converted into an integer at compilation.
Right, clc folks? (Or do you think I am confusing the ASCII table with
the compiler?)

Pranav was talking about run-time input, not compilation.

Note that type char is an integer type. It's important to distinguish
between an integer type (of which there are several, including char,
int, unsigned long, etc.) and the specific integer type called "int".
The name "int" was obviously formed as an abbreviation of the word
"integer", but they mean different things.

getchar() attempts to read the next character from stdin. If it
succeeds, it treats the character as a value of type unsigned char,
and then converts the resulting unsigned char value to int. Since all
unsigned char values are non-negative, the result of the conversion is
non-negative. If it fails (either because there's no more input or
because of some error), it returns the int value EOF, which, since
it's negative, is distinct from any valid character value. (Plain
char may be either signed or unsigned -- but getchar() doesn't use
plain char.)

The answer to Pranav's question is no: this doesn't cause any
problems with porting code.

Well, mostly. Some exotic machines might have sizeof(int)==1 (which
can happen only if char is at least 16 bits). On such a system, it
can be difficult to distinguish between EOF (typically an int value of
-1) and a valid character with the unsigned char value 0xffff, which
when converted to int is likely to yield -1.

You're unlikely to run into this in practice. Machines with this
characteristic are typically DSPs (digital signal processors) which
typically have freestanding C implementations, so stdio.h might not
even be available. But if you want your code to be 100% portable, you
can first check whether the result returned by getchar() is equal to
EOF, and then check whether either feof() or ferror() returns a true
value. In practice, we don't generally bother.
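
In code, that fully portable check might look something like this (a
sketch of the approach just described, not the only way to write it):

    #include <stdio.h>

    int main(void)
    {
        int ch;

        for (;;) {
            ch = getchar();
            /* Where sizeof(int) == 1, a valid character can compare
               equal to EOF, so confirm with feof()/ferror() before
               treating it as the end of input. */
            if (ch == EOF && (feof(stdin) || ferror(stdin)))
                break;
            putchar(ch);
        }
        return 0;
    }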
 
arnuld

Note that type char is an integer type. It's important to distinguish
between an integer type (of which there are several, including char,
int, unsigned long, etc.) and the specific integer type called "int".
The name "int" was obviously formed as an abbreviation of the word
"integer", but they mean different things.


Now I am even more curious. What's the difference between an "integer" and
a variable of type "int"? Are "integer types" different from "int types"?



You're unlikely to run into this in practice. Machines with this
characteristic are typically DSPs (digital signal processors) which
typically have freestanding C implementations, so stdio.h might not
even be available. But if you want your code to be 100% portable, you
can first check whether the result returned by getchar() is equal to
EOF, and then check whether either feof() or ferror() returns a true
value. In practice, we don't generally bother.


Now I know why some clc lurker told me to distinguish between the real end
of file (no more input) and the not-so-real end of file (error in input)
and suggested I use feof() and ferror() for that.
 
James Kuyper

arnuld said:
Now I am even more curious. What's the difference between an "integer" and
a variable of type "int"? Are "integer types" different from "int types"?

The standard doesn't define any meaning for the phrase "int types". It
does define "integer types". "int" is the name of one particular integer
type.

Integer types (6.2.5p17):
  char

  signed integer types (6.2.5p4):
    standard signed integer types:
      signed char, short int, int, long int, long long int
    extended signed integer types (implementation-defined)

  unsigned integer types (6.2.5p6):
    standard unsigned integer types:
      _Bool, and unsigned types corresponding to standard signed
      integer types
    extended unsigned integer types (implementation-defined)

  enumerated types

It's not possible to be specific about the extended integer types. They
are implementation-defined types, such as _int36 for a 36-bit integer
type. In C90, such types were allowed only as an extension to C. This
meant that, in particular, things like size_t that were required to be
integer types could only be typedefs for standard types. In C99, the
concept of "extended integer types" was defined, and size_t is allowed
to refer to any unsigned integer type, whether standard or extended.

....
Now I know why some clc lurker told me to distinguish between the real end
of file (no more input) and the not-so-real end of file (error in input)
and suggested I use feof() and ferror() for that.

EOF is just a macro name; it's clearly named in reference to "End Of
File", but it's also used by the character-oriented I/O functions as a
general-purpose error flag, not exclusively to refer to the end of the file.
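
So after a read loop ends, the two meanings of EOF can be told apart
along these lines (a minimal sketch):

    #include <stdio.h>

    int main(void)
    {
        int ch;

        while ((ch = getchar()) != EOF)
            putchar(ch);

        if (ferror(stdin))
            fputs("EOF meant: read error\n", stderr);
        else if (feof(stdin))
            fputs("EOF meant: end of input\n", stderr);

        return 0;
    }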
 
Michael

arnuld said:
Mostly, when I want to take input from stdin, I use getchar(), but I get
this from the man page itself:

"If the integer value returned by getchar() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of type
char on widening to integer is implementation-defined"


while( EOF != (ch = getchar()) ) ....


I use it like that. Can I run into problems with that?

the function

int getchar();

reads a byte from the standard input and returns it.
If end-of-file is reached, it returns EOF (on my machine, it is 0xffffffff).
If ch is an int, there is no problem at all.
A common mistake is assigning the result of getchar() to a char variable.
For example, if ch is a char:

EOF!=(ch=getchar())

When the byte 0xff is read:

getchar()=0x000000ff
ch=0xff

Because EOF is an int, the value of ch is automatically converted to int.

If ch is unsigned, the R.H.S. of != is 0x000000ff.
If ch is signed, the R.H.S. of != is 0xffffffff, which is equal to EOF,
and the while loop will exit.

Therefore, if ch is a char, there will be a problem if the read character
is sign-extended to EOF, which depends on the signedness of char (again,
implementation-specific).
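
You can watch that promotion happen with a small test program (a sketch;
the exact output assumes plain char is signed and 2's complement, which
is implementation-specific):

    #include <stdio.h>

    int main(void)
    {
        char c = (char)0xff;   /* store the byte 0xff; out-of-range
                                  conversion to a signed char is
                                  implementation-defined */
        int promoted = c;      /* the promotion described above */

        printf("promoted: %d (0x%x as unsigned)\n", promoted, (unsigned)promoted);
        printf("equal to EOF? %s\n", promoted == EOF ? "yes" : "no");
        return 0;
    }

On a typical signed-char machine this prints -1 (0xffffffff) and "yes".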
 
James Kuyper

Michael said:
arnuld wrote: .... ....
If ch is an int, there is no problem at all.

Unless INT_MAX<UCHAR_MAX, which is possible on systems where CHAR_BIT >=
16. On such systems, it's possible for a valid byte, when converted to
'int', to have the same value as EOF. The only work-around for that
possibility is to check feof() and ferror().
 
Keith Thompson

Michael said:
the function

int getchar();

reads a byte from the standard input and returns it.
If end-of-file is reached, it returns EOF (on my machine, it is 0xffffffff).
[...]

No, EOF cannot be defined as 0xffffffff. It must expand to "an
integer constant expression, with type int and a negative value". A
typical definition is

#define EOF (-1)

If you convert the value of EOF to unsigned int on a 32-bit system,
the result is likely to be 0xffffffff; that's not the value of EOF,
it's the result of the conversion.
 
Michael

Keith said:
Michael said:
the function

int getchar();

reads a byte from the standard input and returns it.
If end-of-file is reached, it returns EOF (on my machine, it is 0xffffffff).
[...]

No, EOF cannot be defined as 0xffffffff. It must expand to "an
integer constant expression, with type int and a negative value". A
typical definition is

#define EOF (-1)

If you convert the value of EOF to unsigned int on a 32-bit system,
the result is likely to be 0xffffffff; that's not the value of EOF,
it's the result of the conversion.

0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.
 
Chris Dollin

Michael said:
Keith Thompson wrote:

0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

Not if it's an /unsigned/ int (see Keith's first sentence above).
 
jameskuyper

Michael said:
Keith Thompson wrote: ....

0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

Not in C. In C, 0xFFFFFFFF is just a different way of writing the same
value as 4294967295 - the only difference is that 0xFFFFFFFF might
have an unsigned type, while 4294967295 must have a signed type.
0xFFFFFFFF never has the meaning "-1". It can be converted to an int,
and if 'int' is a 32-bit 2's complement type the result of that
conversion will probably be -1, but that doesn't mean that 0xFFFFFFFF
is -1.
 
Keith Thompson

Michael said:
Keith said:
Michael said:
the function

int getchar();

reads a byte from the standard input and returns it.
If end-of-file is reached, it returns EOF (on my machine, it is 0xffffffff).
[...]
No, EOF cannot be defined as 0xffffffff. It must expand to "an
integer constant expression, with type int and a negative value". A
typical definition is
#define EOF (-1)
If you convert the value of EOF to unsigned int on a 32-bit system,
the result is likely to be 0xffffffff; that's not the value of EOF,
it's the result of the conversion.

0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.

No, 0xffffffff is an integer constant with the value 4294967295
(2**32-1, where "**" denotes exponentiation).

Assuming int is 32 bits, 2's-complement, no padding bits, no trap
representations, then that value cannot be represented by type int.
If you assign 0xffffffff to an int object, then, strictly speaking,
the result is an implementation-defined value (or, optionally and in
C99 only, an implementation-defined signal). In practice, it's very
likely that the value -1 will be assigned -- this is the
(implementation-defined but very common) result of the conversion.
Because of the conversion *the value changes*.

Assigning -1 to an object of type unsigned int will result in the
object having the value UINT_MAX, which, if unsigned int is 32 bits
with no padding bits, is 4294967295 or 0xffffffff. Again, the
implicit conversion from int (the type of the expression -1) to
unsigned int (the type of the object) changes the value. (Conversion
to unsigned types is defined differently by the standard than
conversion to signed types.)

I suspect that you're thinking of hexadecimal notation as a way of
specifying the representation of an object, as opposed to decimal
notation, which specifies a mathematical numeric value. If so, you
are mistaken. In C, decimal and hexadecimal are just two different
notations for representing integer values; there's nothing magical
about either one. 0xff, 0x00ff, and 255 mean *exactly* the same
thing.

On the other hand, in English text it's not unreasonable to use
hexadecimal notation to talk about object representations, so that
0xff refers to 8 bits all set to 1, and 0x00ff refers to 16 bits (and
thus is distinct from 0xff). But since C has a well-defined meaning
for hexadecimal notation, if you're going to use it that way you need
to say so explicitly.

For example, the representation of the 32-bit int value -1 is
0xffffffff.

(Octal is the third notation; it's probably not used as much these
days, though it was very useful on the PDP-11. Except that, strictly
speaking, 0 is an octal constant, so most C programmers use octal
every day without realizing it.)
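
A short test program makes the two conversions visible (a sketch assuming
a 32-bit 2's-complement int, as in the example above):

    #include <stdio.h>

    int main(void)
    {
        int i = 0xffffffff;    /* 4294967295 doesn't fit in a 32-bit int:
                                  implementation-defined result, commonly -1 */
        unsigned int u = -1;   /* well-defined: wraps modulo UINT_MAX+1,
                                  giving 4294967295 on 32 bits */

        printf("i = %d\n", i);
        printf("u = %u (0x%x)\n", u, u);
        return 0;
    }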
 
