getchar() and EOF confusion

Discussion in 'C Programming' started by arnuld, Oct 15, 2008.

  1. arnuld

    arnuld Guest

    Mostly when I want to take input from stdin I use getchar() but I get this
    from man page itself:

    "If the integer value returned by getchar() is stored into a variable of
    type char and then compared against the integer constant EOF, the
    comparison may never succeed, because sign-extension of a variable of type
    char on widening to integer is implementation-defined"


    while( EOF != (ch = getchar()) ) ....


    I use it like that. Can I run into problems with that?
     
    arnuld, Oct 15, 2008
    #1

  2. arnuld

    arnuld Guest

    Oh.. that's why K&R2 uses "int ch", thanks :)
     
    arnuld, Oct 15, 2008
    #2

  3. getchar treats the data it obtains from the stream as unsigned. EOF
    is guaranteed to be negative. Can you see where this leads?
     
    Barry Schwarz, Oct 15, 2008
    #3
  4. arnuld

    danmath06 Guest

    Yes, if ch is not an int. The prototype for getchar() is: "int
    getchar(void);". So you should use an int to hold the return from
    getchar();
     
    danmath06, Oct 15, 2008
    #4
  5. The manual is poorly written. Integral promotion
    is well defined and will always be value preserving
    in the case of char values.

    What is implementation defined is whether plain char
    is signed or unsigned, but that too is mostly
    incidental.
    Did you read the FAQ?

    http://c-faq.com/stdio/getcharc.html
     
    Peter Nilsson, Oct 15, 2008
    #5
  6. arnuld

    arnuld Guest


    yes, I did but can't figure out what the FAQ means:


    :: Two failure modes are possible if, as in the fragment
    :: above, getchar's return value is assigned to a char.


    :: 1. If type char is signed, and if EOF is defined (as is usual) as -1,
    :: the character with the decimal value 255 ('\377' or '\xff' in C) will
    :: be sign-extended and will compare equal to EOF, prematurely terminating
    :: the input. [footnote]

    Does it mean that if char is signed, an input of 255 will be
    sign-extended and compare equal to -1, hence 255 == EOF?


    :: 2. If type char is unsigned, an actual EOF value will be truncated
    :: (by having its higher-order bits discarded, probably resulting in
    :: 255 or 0xff) and will not be recognized as EOF, resulting in
    :: effectively infinite input.


    It means that if char is unsigned, an input value equal to EOF, which is
    -1, will be converted to 255?



    okay, whatever it is, why bother, just use "int ch" for getchar(), getc(),
    and fgetc()
     
    arnuld, Oct 15, 2008
    #6
  7. arnuld

    Pranav Guest

    Then does reading character-sized data into an integer-type variable
    cause an issue when porting the code?
     
    Pranav, Oct 15, 2008
    #7
  8. I'm not sure I understand what you mean. Can you give an example of the
    kind of code you have in mind?
     
    Nate Eldredge, Oct 15, 2008
    #8
  9. arnuld

    arnuld Guest


    No, as every character is converted into an integer at compilation. Right,
    clc folks? (or do you think I am confusing the ASCII table with the
    compiler?)
     
    arnuld, Oct 15, 2008
    #9
  10. Pranav was talking about run-time input, not compilation.

    Note that type char is an integer type. It's important to distinguish
    between an integer type (of which there are several, including char,
    int, unsigned long, etc.) and the specific integer type called "int".
    The name "int" was obviously formed as an abbreviation of the word
    "integer", but they mean different things.

    getchar() attempts to read the next character from stdin. If it
    succeeds, it treats the character as a value of type unsigned char,
    and then converts the resulting unsigned char value to int. Since all
    unsigned char values are non-negative, the result of the conversion is
    non-negative. If it fails (either because there's no more input or
    because of some error), it returns the int value EOF, which, since
    it's negative, is distinct from any valid character value. (Plain
    char may be either signed or unsigned -- but getchar() doesn't use
    plain char.)

    The answer to Pranav's question is no, this doesn't cause any
    problems with porting code.

    Well, mostly. Some exotic machines might have sizeof(int)==1 (which
    can happen only if char is at least 16 bits). On such a system, it
    can be difficult to distinguish between EOF (typically an int value of
    -1) and a valid character with the unsigned char value 0xffff, which
    when converted to int is likely to yield -1.

    You're unlikely to run into this in practice. Machines with this
    characteristic are typically DSPs (digital signal processors) which
    typically have freestanding C implementations, so stdio.h might not
    even be available. But if you want your code to be 100% portable, you
    can first check whether the result returned by getchar() is equal to
    EOF, and then check whether either feof() or ferror() returns a true
    value. In practice, we don't generally bother.
     
    Keith Thompson, Oct 15, 2008
    #10
  11. arnuld

    arnuld Guest


    Now I am much more curious. What's the difference between an "integer" and
    a variable of type "int"? Are "integer types" different from "int types"?



    Now I know why some clc lurker told me to distinguish between the real end
    of file (no more input) and the not-so-real end of file (an error in input)
    and suggested I use feof() and ferror() for that.
     
    arnuld, Oct 15, 2008
    #11
  12. arnuld

    James Kuyper Guest

    The standard doesn't define any meaning for the phrase "int types". It
    does define "integer types". "int" is the name of one particular
    integer type.

    Integer types (6.2.5p17):
        char
        signed integer types (6.2.5p4):
            standard signed integer types:
                signed char, short int, int, long int, long long int
            extended signed integer types (implementation-defined)
        unsigned integer types (6.2.5p6):
            standard unsigned integer types:
                _Bool, and unsigned types corresponding to standard
                signed integer types
            extended unsigned integer types (implementation-defined)
        enumerated types

    It's not possible to be specific about the extended integer types. They
    are implementation-defined types, such as _int36 for a 36-bit integer
    type. In C90, such types were allowed only as an extension to C. This
    meant that, in particular, things like size_t that were required to be
    integer types could only be typedefs for standard types. In C99, the
    concept of "extended integer types" was defined, and size_t is allowed
    to refer to any unsigned integer type, whether standard or extended.

    ....
    EOF is just a macro name; it's clearly named in reference to "End Of
    File", but it's also used by the character-oriented I/O functions as a
    general-purpose error flag, not exclusively to refer to the end of the file.
     
    James Kuyper, Oct 15, 2008
    #12
  14. arnuld

    Michael Guest

    the function

    int getchar(void);

    reads a byte from the standard input and returns it.
    If end-of-file is reached, it returns EOF (on my machine, it is 0xffffffff)
    If ch is an int, there is no problem at all.
    A common mistake is assigning getchar() into a char variable.
    For example, if ch is a char:

    EOF!=(ch=getchar())

    When the byte of 0xff is read:

    getchar()=0x000000ff
    ch=0xff

    Because EOF is an int, the value of ch is automatically converted to int.

    If ch is unsigned, the R.H.S. of != is 0x000000ff.
    If ch is signed, the R.H.S. of != is 0xffffffff, which is equal to EOF,
    and the while loop will exit.

    Therefore, if ch is a char, there can be a problem, depending on whether
    the read character is sign-extended to EOF's value (which is
    implementation-specific) and on the signedness of char (again
    implementation-specific)
     
    Michael, Oct 15, 2008
    #14
  15. arnuld

    James Kuyper Guest

    Unless INT_MAX<UCHAR_MAX, which is possible on systems where CHAR_BIT >=
    16. On such systems, it's possible for a valid byte, when converted to
    'int', to have the same value as EOF. The only work-around for that
    possibility is to check feof() and ferror().
     
    James Kuyper, Oct 15, 2008
    #15
  16. [...]

    No, EOF cannot be defined as 0xffffffff. It must expand to "an
    integer constant expression, with type int and a negative value". A
    typical definition is

    #define EOF (-1)

    If you convert the value of EOF to unsigned int on a 32-bit system,
    the result is likely to be 0xffffffff; that's not the value of EOF,
    it's the result of the conversion.
     
    Keith Thompson, Oct 15, 2008
    #16
  17. arnuld

    Michael Guest

    0xffffffff in hexadecimal *is* -1 in decimal on a 32-bit int.
     
    Michael, Oct 16, 2008
    #17
  18. arnuld

    Chris Dollin Guest

    Not if it's an /unsigned/ int (see Keith's first sentence above).
     
    Chris Dollin, Oct 16, 2008
    #18
  19. arnuld

    jameskuyper Guest

    Not in C. In C, 0xFFFFFFFF is just a different way of writing the same
    value as 4294967295 - the only difference is that 0xFFFFFFFF might
    have an unsigned type, while 4294967295 must have a signed type.
    0xFFFFFFFF never has the meaning "-1". It can be converted to an int,
    and if 'int' is a 32-bit 2's complement type the result of that
    conversion will probably be -1, but that doesn't mean that 0xFFFFFFFF
    is -1.
     
    jameskuyper, Oct 16, 2008
    #19
  20. No, 0xffffffff is an integer constant with the value 4294967295
    (2**32-1, where "**" denotes exponentiation).

    Assuming int is 32 bits, 2's-complement, no padding bits, no trap
    representations, then that value cannot be represented by type int.
    If you assign 0xffffffff to an int object, then, strictly speaking,
    the result is an implementation-defined value (or, optionally and in
    C99 only, an implementation-defined signal). In practice, it's very
    likely that the value -1 will be assigned -- this is the
    (implementation-defined but very common) result of the conversion.
    Because of the conversion *the value changes*.

    Assigning -1 to an object of type unsigned int will result in the
    object having the value UINT_MAX, which, if unsigned int is 32 bits
    with no padding bits, is 4294967295 or 0xffffffff. Again, the
    implicit conversion from int (the type of the expression -1) to
    unsigned int (the type of the object) changes the value. (Conversion
    to unsigned types is defined differently by the standard than
    conversion to signed types.)

    I suspect that you're thinking of hexadecimal notation as a way of
    specifying the representation of an object, as opposed to decimal
    notation, which specifies a mathematical numeric value. If so, you
    are mistaken. In C, decimal and hexadecimal are just two different
    notations for representing integer values; there's nothing magical
    about either one. 0xff, 0x00ff, and 255 mean *exactly* the same
    thing.

    On the other hand, in English text it's not unreasonable to use
    hexadecimal notation to talk about object representations, so that
    0xff refers to 8 bits all set to 1, and 0x00ff refers to 16 bits (and
    thus is distinct from 0xff). But since C has a well-defined meaning
    for hexadecimal notation, if you're going to use it that way you need
    to say so explicitly.

    For example, the representation of the 32-bit int value -1 is
    0xffffffff.

    (Octal is the third notation; it's probably not used as much these
    days, though it was very useful on the PDP-11. Except that, strictly
    speaking, 0 is an octal constant, so most C programmers use octal
    every day without realizing it.)
     
    Keith Thompson, Oct 16, 2008
    #20
