Reading from files and range of char and friends

Discussion in 'C Programming' started by Spiros Bousbouras, Mar 10, 2011.

  1. If you are reading from a file by successively calling fgetc() is there
    any point in storing what you read in anything other than unsigned
    char ? If you try to store it in char or signed char then it's possible
    that what you read may fall outside the range of the type in which case
    you get implementation defined behavior according to 6.3.1.3 p. 3. So
    then why doesn't fgets() get unsigned char* as first argument ? It
    would make the life of the user simpler and possibly also the life of
    the implementor.

    --
    Pain makes believers.
    Wally Jay
    Spiros Bousbouras, Mar 10, 2011
    #1

  2. Angel Guest

    On 2011-03-10, Spiros Bousbouras <> wrote:
    > If you are reading from a file by successively calling fgetc() is there
    > any point in storing what you read in anything other than unsigned
    > char ?


    Yes, when you read EOF which is not an unsigned char.

    "fgetc() reads the next character from stream and returns
    it as an unsigned char cast to an int, or EOF on end of file or
    error."
    (From the Linux man pages.)


    --
    The natural state of a spammer's website is a smoking crater.
    Angel, Mar 10, 2011
    #2

  3. On 10 Mar 2011 16:49:57 GMT
    Angel <> wrote:
    > On 2011-03-10, Spiros Bousbouras <> wrote:
    > > If you are reading from a file by successively calling fgetc() is there
    > > any point in storing what you read in anything other than unsigned
    > > char ?

    >
    > Yes, when you read EOF which is not an unsigned char.


    In my mind I was making a distinction between storing and temporarily
    assigning but I guess it wasn't clear. What I had in mind was something
    like:

    unsigned char arr[some_size] ;
    int a ;

    while ( (a = fgetc(f)) != EOF) arr[position++] = a ;

    Would there be any reason for arr to be something other than
    unsigned char ?
    Spiros Bousbouras, Mar 10, 2011
    #3
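
    A minimal sketch of the loop from the post above, assuming a
    caller-supplied capacity (the name read_bytes and the cap parameter
    are illustrative, not from the thread). It keeps the result of
    fgetc() in an int so the comparison with EOF stays valid, and it
    stops before the array overflows:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads up to cap bytes from f into buf and returns how many were stored.
   The result of fgetc() is held in an int so that EOF stays distinguishable
   from every valid byte value; only non-EOF values reach the array. */
size_t read_bytes(FILE *f, unsigned char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(f != NULL && buf != NULL);
    /* Test n < cap first so no byte is consumed once the buffer is full. */
    while (n < cap && (c = fgetc(f)) != EOF)
        buf[n++] = (unsigned char)c;   /* c is in 0..UCHAR_MAX here */
    return n;
}
```

    The cast is not required by the language (the assignment converts
    implicitly); it only documents the intent and quiets narrowing
    warnings on some compilers.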
  4. Angel Guest

    On 2011-03-10, Spiros Bousbouras <> wrote:
    > On 10 Mar 2011 16:49:57 GMT
    > Angel <> wrote:
    >> On 2011-03-10, Spiros Bousbouras <> wrote:
    >> > If you are reading from a file by successively calling fgetc() is there
    >> > any point in storing what you read in anything other than unsigned
    >> > char ?

    >>
    >> Yes, when you read EOF which is not an unsigned char.

    >
    > In my mind I was making a distinction between storing and temporarily
    > assigning but I guess it wasn't clear. What I had in mind was something
    > like:
    >
    > unsigned char arr[some_size] ;
    > int a ;
    >
    > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >
    > Would there be any reason for arr to be something other than
    > unsigned char ?


    No, but you should use a cast there or your compiler might balk because
    unsigned char is likely to have fewer bits than int.

    fgetc() returns an int because EOF has to have a value that cannot
    normally be read from a file. Once you've determined that the read value
    is not EOF, it's safe to store it as an unsigned char.

    And in C there is no difference between "storing" and "temporarily
    assigning". Every assignment lasts until overwritten.


    --
    The natural state of a spammer's website is a smoking crater.
    Angel, Mar 10, 2011
    #4
  5. Paul N Guest

    On Mar 10, 5:05 pm, Spiros Bousbouras <> wrote:
    > On 10 Mar 2011 16:49:57 GMT
    >
    > Angel <> wrote:
    > > On 2011-03-10, Spiros Bousbouras <> wrote:
    > > > If you are reading from a file by successively calling fgetc() is there
    > > > any point in storing what you read in anything other than unsigned
    > > > char ?

    >
    > > Yes, when you read EOF which is not an unsigned char.

    >
    > In my mind I was making a distinction between storing and temporarily
    > assigning but I guess it wasn't clear. What I had in mind was something
    > like:
    >
    > unsigned char arr[some_size] ;
    > int a ;
    >
    > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >
    > Would there be any reason for arr to be something other than
    > unsigned char ?


    char is normally used for storing characters, and I think that is what
    it was designed for. So it seems a bit odd not to use it. If you're
    going to use the str* functions to manipulate what you've read in,
    then storing it as char seems sensible, and not doing so is likely to
    require some nasty casts.

    In my view anyway...
    Paul N, Mar 10, 2011
    #5
  6. On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    Paul N <> wrote:
    > On Mar 10, 5:05 pm, Spiros Bousbouras <> wrote:
    > > On 10 Mar 2011 16:49:57 GMT
    > >
    > > Angel <> wrote:
    > > > On 2011-03-10, Spiros Bousbouras <> wrote:
    > > > > If you are reading from a file by successively calling fgetc() is there
    > > > > any point in storing what you read in anything other than unsigned
    > > > > char ?

    > >
    > > > Yes, when you read EOF which is not an unsigned char.

    > >
    > > In my mind I was making a distinction between storing and temporarily
    > > assigning but I guess it wasn't clear. What I had in mind was something
    > > like:
    > >
    > > unsigned char arr[some_size] ;
    > > int a ;
    > >
    > > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    > >
    > > Would there be any reason for arr to be something other than
    > > unsigned char ?

    >
    > char is normally used for storing characters, and I think that is what
    > it was designed for. So it seems a bit odd not to use it.


    But if arr[] is char how do you avoid the implementation defined
    behavior when doing arr[position++] = a ?
    Spiros Bousbouras, Mar 10, 2011
    #6
  7. Angel Guest

    On 2011-03-10, Spiros Bousbouras <> wrote:
    > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    > Paul N <> wrote:
    >> >
    >> > In my mind I was making a distinction between storing and temporarily
    >> > assigning but I guess it wasn't clear. What I had in mind was something
    >> > like:
    >> >
    >> > unsigned char arr[some_size] ;
    >> > int a ;
    >> >
    >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >> >
    >> > Would there be any reason for arr to be something other than
    >> > unsigned char ?

    >>
    >> char is normally used for storing characters, and I think that is what
    >> it was designed for. So it seems a bit odd not to use it.

    >
    > But if arr[] is char how do you avoid the implementation defined
    > behavior when doing arr[position++] = a ?


    Depends on what exactly you are reading. If it's a normal text file
    encoded in ASCII, converting the values read by fgetc() should be safe
    because ASCII values are only 7 bits and will fit into a char.

    If it's a binary file though, you'll have to use unsigned char, and
    you should consider using fread instead.


    --
    The natural state of a spammer's website is a smoking crater.
    Angel, Mar 10, 2011
    #7
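
    One way to act on the "7 bits will fit into a char" point is to
    check the bytes before copying them into a char array. This is only
    a sketch; the function name all_seven_bit is made up for the
    example. It relies on CHAR_MAX being at least 127, which the
    standard guarantees whether plain char is signed or not:

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every byte in buf[0..n-1] is in the range 0..127, so that
   storing it in a plain char cannot hit an out-of-range conversion.
   Returns 0 as soon as a byte above 127 is found. */
int all_seven_bit(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (buf[i] > 127)
            return 0;
    return 1;
}
```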
  8. On 10 Mar 2011 22:49:52 GMT
    Angel <> wrote:
    > On 2011-03-10, Spiros Bousbouras <> wrote:
    > > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    > > Paul N <> wrote:
    > >> >
    > >> > In my mind I was making a distinction between storing and temporarily
    > >> > assigning but I guess it wasn't clear. What I had in mind was something
    > >> > like:
    > >> >
    > >> > unsigned char arr[some_size] ;
    > >> > int a ;
    > >> >
    > >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    > >> >
    > >> > Would there be any reason for arr to be something other than
    > >> > unsigned char ?
    > >>
    > >> char is normally used for storing characters, and I think that is what
    > >> it was designed for. So it seems a bit odd not to use it.

    > >
    > > But if arr[] is char how do you avoid the implementation defined
    > > behavior when doing arr[position++] = a ?

    >
    > Depends on what exactly you are reading. If it's a normal text file
    > encoded in ASCII, converting the values read by fgetc() should be safe
    > because ASCII values are only 7 bits and will fit into a char.
    >
    > If it's a binary file though, you'll have to use unsigned char, and
    > you should consider using fread instead.


    And what if it's a non ASCII text file ? It could be ISO-8859-1 or
    UTF-8. An extra complication is that you may have to read some of the
    file in order to determine what kind of information it contains.
    Spiros Bousbouras, Mar 10, 2011
    #8
  9. Angel Guest

    On 2011-03-10, Spiros Bousbouras <> wrote:
    > On 10 Mar 2011 22:49:52 GMT
    > Angel <> wrote:
    >> On 2011-03-10, Spiros Bousbouras <> wrote:
    >> > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    >> > Paul N <> wrote:
    >> >> >
    >> >> > In my mind I was making a distinction between storing and temporarily
    >> >> > assigning but I guess it wasn't clear. What I had in mind was something
    >> >> > like:
    >> >> >
    >> >> > unsigned char arr[some_size] ;
    >> >> > int a ;
    >> >> >
    >> >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >> >> >
    >> >> > Would there be any reason for arr to be something other than
    >> >> > unsigned char ?
    >> >>
    >> >> char is normally used for storing characters, and I think that is what
    >> >> it was designed for. So it seems a bit odd not to use it.
    >> >
    >> > But if arr[] is char how do you avoid the implementation defined
    >> > behavior when doing arr[position++] = a ?

    >>
    >> Depends on what exactly you are reading. If it's a normal text file
    >> encoded in ASCII, converting the values read by fgetc() should be safe
    >> because ASCII values are only 7 bits and will fit into a char.
    >>
    >> If it's a binary file though, you'll have to use unsigned char, and
    >> you should consider using fread instead.

    >
    > And what if it's a non ASCII text file ? It could be ISO-8859-1 or
    > UTF-8. An extra complication is that you may have to read some of the
    > file in order to determine what kind of information it contains.


    fgetc() is guaranteed to return either an unsigned char or EOF, so that
    always works. Interpreting the read data is up to your program and will
    depend on what exactly you are trying to accomplish.

    UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
    char (it will fit in a signed char too, but values >127 will be
    converted to negative values), and so does ISO-8859-1. For character
    encodings with more bits, there is fgetwc().


    --
    The natural state of a spammer's website is a smoking crater.
    Angel, Mar 10, 2011
    #9
  10. Spiros Bousbouras <> writes:
    > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    > Paul N <> wrote:
    >> On Mar 10, 5:05 pm, Spiros Bousbouras <> wrote:
    >> > On 10 Mar 2011 16:49:57 GMT
    >> >
    >> > Angel <> wrote:
    >> > > On 2011-03-10, Spiros Bousbouras <> wrote:
    >> > > > If you are reading from a file by successively calling fgetc() is there
    >> > > > any point in storing what you read in anything other than unsigned
    >> > > > char ?
    >> >
    >> > > Yes, when you read EOF which is not an unsigned char.
    >> >
    >> > In my mind I was making a distinction between storing and temporarily
    >> > assigning but I guess it wasn't clear. What I had in mind was something
    >> > like:
    >> >
    >> > unsigned char arr[some_size] ;
    >> > int a ;
    >> >
    >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >> >
    >> > Would there be any reason for arr to be something other than
    >> > unsigned char ?

    >>
    >> char is normally used for storing characters, and I think that is what
    >> it was designed for. So it seems a bit odd not to use it.

    >
    > But if arr[] is char how do you avoid the implementation defined
    > behavior when doing arr[position++] = a ?


    Typically by ignoring the issue. (Well, this doesn't avoid
    the implementation defined behavior; it just assumes it's
    ok.) On any system where this is a sensible thing to do, the
    implementation-defined behavior is almost certain to be what you
    want. Assigning a value exceeding CHAR_MAX to a char (assuming
    plain char is signed) *could* give you a strange result, or even
    raise an implementation-defined signal, but any implementation that
    chose to do such a thing would break a lot of existing code.

    C uses plain char (which may be signed) for strings, but it reads
    characters from files as unsigned char values. IMHO this is a flaw
    in the language. A byte read from a file with a representation
    of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
    a copyright symbol in Latin-1, 'z' in EBCDIC).

    One solution might be to require plain char to be unsigned, but that
    causes inefficient code for some operations -- which was more of an
    issue in the PDP-11 days than it is now, but it's probably still
    significant.

    Another might be to have fgetc() return an int representing either
    a *plain* char value or EOF, but it's too late to change that.

    I'm usually a strong advocate for writing code as portably as possible,
    but in this case I suspect that working around the unsigned char vs.
    plain char mismatch would be more effort than it's worth.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 10, 2011
    #10
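
    The 169 versus -87 example can be made concrete. Going from char to
    unsigned char is fully defined (the value is reduced modulo
    UCHAR_MAX + 1); only the other direction is implementation-defined.
    A small sketch, with byte_value as an invented helper name and
    CHAR_BIT == 8 assumed for the numbers in the comment:

```c
#include <assert.h>
#include <limits.h>

/* Recovers the 0..UCHAR_MAX value of a byte that is held in a plain char.
   With 8-bit chars, a char holding -87 comes back as 169 (0xa9). */
int byte_value(char c)
{
    assert(UCHAR_MAX <= INT_MAX);   /* holds on ordinary hosted systems */
    return (unsigned char)c;
}
```

    The same cast is what the <ctype.h> functions need: isalpha() and
    friends take an int whose value must be representable as unsigned
    char or equal EOF, so a possibly negative char should be passed as
    isalpha((unsigned char)c).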
  11. Eric Sosman Guest

    On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
    > If you are reading from a file by successively calling fgetc() is there
    > any point in storing what you read in anything other than unsigned
    > char ?


    Sure. To see one reason in action, try

    unsigned char uchar_password[SIZE];
    ...
    if (strcmp(uchar_password, "SuperSecret") == 0) ...

    > If you try to store it in char or signed char then it's possible
    > that what you read may fall outside the range of the type in which case
    > you get implementation defined behavior according to 6.3.1.3 p. 3.


    Yes. This is, IMHO, a weakness in the library design, a weakness
    inherited from the pre-Standard days that also gave us gets(). The
    practical consequence is that the implementation must define the
    behavior "usefully" in order to make the library work as desired.
    (The situation is particularly bad for systems with signed-magnitude
    or ones' complement notations, where the sign of zero is obliterated
    on conversion to unsigned char and thus cannot be recovered again
    after getc().)

    > then why doesn't fgets() get unsigned char* as first argument ?


    Hysterical raisins, I'd guess.

    In-band signaling works well in some situations -- NULL for a
    failed malloc() or strchr() or getenv(), for example -- but C has
    used it in situations where the benefits are not so clear. getc()
    is one of those, strtoxxx() is another, and no doubt there are other
    situations where the "error return" can be confused with a perfectly
    valid value. Even a failed bsearch() could usefully return something
    more helpful than NULL, were there an independent channel to indicate
    "I didn't find it."

    --
    Eric Sosman
    Eric Sosman, Mar 11, 2011
    #11
  12. On 10/03/2011 18:05, Spiros Bousbouras wrote:
    > On 10 Mar 2011 16:49:57 GMT
    > Angel <> wrote:
    >> On 2011-03-10, Spiros Bousbouras <> wrote:
    >>> If you are reading from a file by successively calling fgetc() is there
    >>> any point in storing what you read in anything other than unsigned
    >>> char ?

    >>
    >> Yes, when you read EOF which is not an unsigned char.

    >
    > In my mind I was making a distinction between storing and temporarily
    > assigning but I guess it wasn't clear. What I had in mind was something
    > like:
    >
    > unsigned char arr[some_size] ;
    > int a ;
    >
    > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;


    Assuming position is initially 0 and you don't need a == EOF afterwards, try
    position = fread(arr,1,some_size,f);
    This will not cause UB if the input is too big, and it has
    a fair chance to be slightly faster.

    > Would there be any reason for arr to be something other than
    > unsigned char ?


    Usually no (possible exception: dead slow type conversion).
    Whenever fgetc(f) does not return EOF (being passed a valid f),
    it returns an unsigned char cast to an int, and casting that
    int back to unsigned char causes no data loss.

    Francois Grieu
    Francois Grieu, Mar 11, 2011
    #12
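
    The fread() suggestion, wrapped as a function for illustration
    (read_block and its parameter names are not from the thread). The
    return value is the number of bytes actually stored, which can be
    less than the capacity at end of file or on a read error:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads at most cap bytes from f into buf in a single call and returns
   the number of bytes stored. fread() never writes past cap elements. */
size_t read_block(FILE *f, unsigned char *buf, size_t cap)
{
    assert(f != NULL && buf != NULL);
    return fread(buf, 1, cap, f);   /* element size 1, so the count is in bytes */
}
```

    A short count by itself does not say whether the stream ended or
    failed; feof() and ferror() on the same stream tell the two apart.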
  13. On Thu, 10 Mar 2011 20:37:09 -0500
    Eric Sosman <> wrote:
    > On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
    > > If you are reading from a file by successively calling fgetc() is there
    > > any point in storing what you read in anything other than unsigned
    > > char ?

    >
    > Sure. To see one reason in action, try
    >
    > unsigned char uchar_password[SIZE];
    > ...
    > if (strcmp(uchar_password, "SuperSecret") == 0) ...


    Just to be clear , the only thing that can go wrong with this example
    is that strcmp() may try to convert the elements of uchar_password to
    char thereby causing the implementation defined behavior. The same
    issue could arise with any other str* function. Or is there something
    specific about your example that I'm missing ?

    > > If you try to store it in char or signed char then it's possible
    > > that what you read may fall outside the range of the type in which case
    > > you get implementation defined behavior according to 6.3.1.3 p. 3.

    >
    > Yes. This is, IMHO, a weakness in the library design, a weakness
    > inherited from the pre-Standard days that also gave us gets(). The
    > practical consequence is that the implementation must define the
    > behavior "usefully" in order to make the library work as desired.
    > (The situation is particularly bad for systems with signed-magnitude
    > or ones' complement notations, where the sign of zero is obliterated
    > on conversion to unsigned char and thus cannot be recovered again
    > after getc().)


    If getc() read int's from files instead of unsigned char's would it be
    realistically possible that reading from a file would return a negative
    zero ? That would be one strange file.

    > > then why doesn't fgets() get unsigned char* as first argument ?

    >
    > Hysterical raisins, I'd guess.


    For those who didn't get it , that's historical reasons.

    > In-band signaling works well in some situations -- NULL for a
    > failed malloc() or strchr() or getenv(), for example -- but C has
    > used it in situations where the benefits are not so clear. getc()
    > is one of those, strtoxxx() is another, and no doubt there are other
    > situations where the "error return" can be confused with a perfectly
    > valid value.


    I don't see how this can happen with getc(). The only improvement I
    can think of is that you could have two different return values to
    denote exceptional situations instead of just EOF , one value would
    denote end of file and the other error. But the current interface of
    getc() could accommodate this just fine , you would only need to make
    the 2 exceptional values negative.

    > Even a failed bsearch() could usefully return something
    > more helpful than NULL, were there an independent channel to indicate
    > "I didn't find it."


    --
    If strings doesn't work, then there's the "Read Microsoft" tool, rm,
    which gives you the useful content of Word files that strings can't
    extract and helpfully moves the hideous fonts, ugly typography, macro
    viruses, and general bloat that make up the rest of this class of Word
    files into the bit bucket for you.
    Dave Vandervies
    Spiros Bousbouras, Mar 11, 2011
    #13
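
    On telling end of file from error: with the existing interface the
    distinction is made after the fact with feof() and ferror() rather
    than through two different return values. A rough sketch, where the
    enum and the function name drain are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum stop_reason { STOP_END_OF_FILE, STOP_READ_ERROR };

/* Consumes the rest of the stream, stores the number of bytes seen in
   *count, and reports why fgetc() finally returned EOF. */
enum stop_reason drain(FILE *f, size_t *count)
{
    int c;

    assert(f != NULL && count != NULL);
    *count = 0;
    while ((c = fgetc(f)) != EOF)
        (*count)++;
    /* fgetc() returns EOF in both cases; the stream's indicators differ. */
    return ferror(f) ? STOP_READ_ERROR : STOP_END_OF_FILE;
}
```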
  14. Spiros Bousbouras <> writes:
    > On Thu, 10 Mar 2011 20:37:09 -0500
    > Eric Sosman <> wrote:
    >> On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
    >> > If you are reading from a file by successively calling fgetc() is there
    >> > any point in storing what you read in anything other than unsigned
    >> > char ?

    >>
    >> Sure. To see one reason in action, try
    >>
    >> unsigned char uchar_password[SIZE];
    >> ...
    >> if (strcmp(uchar_password, "SuperSecret") == 0) ...

    >
    > Just to be clear , the only thing that can go wrong with this example
    > is that strcmp() may try to convert the elements of uchar_password to
    > char thereby causing the implementation defined behavior. The same
    > issue could arise with any other str* function. Or is there something
    > specific about your example that I'm missing ?


    The call to strcmp() violates a constraint. strcmp() expects const
    char* (a non-const char* is also ok), but uchar_password, after
    the implicit conversion, is of type unsigned char*. Types char*
    and unsigned char* are not compatible, and there is no implicit
    conversion from one to the other.

    If you use an explicit cast, it will *probably* work as expected,
    but without the cast the compiler is permitted to reject it.

    [...]

    > If getc() read int's from files instead of unsigned char's would it be
    > realistically possible that reading from a file would return a negative
    > zero ? That would be one strange file.


    What would be so strange about it? If a file contains a sequence of
    ints, stored as binary, and the implementation has a distinct
    representation for negative zero, then the file could certainly contain
    negative zeros.

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Mar 11, 2011
    #14
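
    For illustration, this is what the explicit cast looks like when it
    is confined to one helper (equals_text is a hypothetical name). The
    buffer is assumed to be NUL-terminated. strcmp() compares the
    characters interpreted as unsigned char in any case, so the cast
    changes the pointer type and not the comparison:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the NUL-terminated bytes equal the ordinary string text,
   0 otherwise. The cast is required because unsigned char * does not
   convert implicitly to const char *. */
int equals_text(const unsigned char *bytes, const char *text)
{
    assert(bytes != NULL && text != NULL);
    return strcmp((const char *)bytes, text) == 0;
}
```

    Keeping the cast in one place avoids scattering it over every str*
    call, which is the "nasty casts" concern raised earlier in the
    thread.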
  15. On Thu, 10 Mar 2011 15:37:38 -0800
    Keith Thompson <> wrote:
    > Spiros Bousbouras <> writes:
    > > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    > > Paul N <> wrote:
    > >> On Mar 10, 5:05 pm, Spiros Bousbouras <> wrote:
    > >> >
    > >> > unsigned char arr[some_size] ;
    > >> > int a ;
    > >> >
    > >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    > >> >
    > >> > Would there be any reason for arr to be something other than
    > >> > unsigned char ?
    > >>
    > >> char is normally used for storing characters, and I think that is what
    > >> it was designed for. So it seems a bit odd not to use it.

    > >
    > > But if arr[] is char how do you avoid the implementation defined
    > > behavior when doing arr[position++] = a ?

    >
    > Typically by ignoring the issue. (Well, this doesn't avoid
    > the implementation defined behavior; it just assumes it's
    > ok.) On any system where this is a sensible thing to do, the
    > implementation-defined behavior is almost certain to be what you
    > want.


    Is there a system which has stdio.h but reading from a file and storing
    what you read in an array is not a sensible thing to do ?

    [...]

    > C uses plain char (which may be signed) for strings, but it reads
    > characters from files as unsigned char values. IMHO this is a flaw
    > in the language. A byte read from a file with a representation
    > of 10101001 (0xa9) is far more likely to mean 169 than -87 (it's
    > a copyright symbol in Latin-1, 'z' in EBCDIC).


    Which makes me wonder if there are any character encodings in use where
    some characters get encoded by negative numbers.

    > One solution might be to require plain char to be unsigned, but that
    > causes inefficient code for some operations -- which was more of an
    > issue in the PDP-11 days than it is now, but it's probably still
    > significant.
    >
    > Another might be to have fgetc() return an int representing either
    > a *plain* char value or EOF, but it's too late to change that.


    The standard could say that if an implementation offers stdio.h then
    the following function

    int foo(unsigned char a) {
    char b = a ;
    unsigned char c = b ;
    return a == c ;
    }

    always returns 1. This I think would be sufficient to be able to assign
    the return value of fgetc() to char (after checking for EOF) without
    worries. But does it leave any existing implementations out ? And while
    I'm at it , how do existing implementations handle conversion to a
    signed integer type if the value doesn't fit ? Does anyone have any unusual
    examples ?

    Another approach would be to have a macro __WBUC2CA (well behaved
    unsigned char to char assignment) which will have the value 1 or 0 and
    if it has the value 1 then foo() above will be guaranteed to return 1.

    > I'm usually a strong advocate for writing code as portably as possible,
    > but in this case I suspect that working around the unsigned char vs.
    > plain char mismatch would be more effort than it's worth.


    --
    If Larry Wall had instead written a paper describing Perl, it probably
    would have been dismissed as a joke.
    Kaz Kylheku
    Spiros Bousbouras, Mar 11, 2011
    #15
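
    The foo() test from the post above can be run over every unsigned
    char value to see what a given implementation does. This is a
    sketch only: the names round_trips and count_round_trip_failures
    are invented, the loop assumes UCHAR_MAX < INT_MAX, and the result
    says something about the implementation it runs on, not about the
    standard:

```c
#include <assert.h>
#include <limits.h>

/* Same logic as foo() in the post: does a survive a trip through char? */
int round_trips(unsigned char a)
{
    char b = (char)a;                    /* implementation-defined if a > CHAR_MAX */
    unsigned char c = (unsigned char)b;  /* always well defined */
    return a == c;
}

/* Counts the unsigned char values that do not survive the round trip. */
int count_round_trip_failures(void)
{
    int failures = 0;

    assert(UCHAR_MAX < INT_MAX);
    for (int v = 0; v <= UCHAR_MAX; v++)
        if (!round_trips((unsigned char)v))
            failures++;
    return failures;
}
```

    Implementations that define the out-of-range conversion as
    reduction modulo 2^N (GCC documents this behavior) should report no
    failures; an implementation that raises a signal or saturates would
    not.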
  16. Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 3/10/2011 11:40 AM, Spiros Bousbouras wrote:
    >> If you are reading from a file by successively calling fgetc() is there
    >> any point in storing what you read in anything other than unsigned
    >> char ?

    >
    > Sure. To see one reason in action, try
    >
    > unsigned char uchar_password[SIZE];
    > ...
    > if (strcmp(uchar_password, "SuperSecret") == 0) ...
    >
    >> If you try to store it in char or signed char then it's possible
    >> that what you read may fall outside the range of the type in which case
    >> you get implementation defined behavior according to 6.3.1.3 p. 3.

    >
    > Yes. This is, IMHO, a weakness in the library design, a weakness
    > inherited from the pre-Standard days that also gave us gets(). The
    > practical consequence is that the implementation must define the
    > behavior "usefully" in order to make the library work as desired.
    > (The situation is particularly bad for systems with signed-magnitude
    > or ones' complement notations, where the sign of zero is obliterated
    > on conversion to unsigned char and thus cannot be recovered again
    > after getc().) [snip subsequent paragraphs]


    Do you mean to say that if a file has a byte with a bit
    pattern corresponding to a 'char' negative-zero, and
    that byte is read (in binary mode) with getc(), the
    result of getc() will be zero? If that's what you're
    saying I believe that is wrong.
    Tim Rentsch, Mar 11, 2011
    #16
  17. Tim Rentsch Guest

    Spiros Bousbouras <> writes:

    > If getc() read int's from files instead of unsigned char's would it be
    > realistically possible that reading from a file would return a negative
    > zero ?


    A call to getc() cannot return negative zero. The reason is,
    getc() is defined in terms of fgetc(), which returns an
    'unsigned char' converted to an 'int', and such conversions
    cannot produce negative zeros.
    Tim Rentsch, Mar 11, 2011
    #17
  18. Tim Rentsch Guest

    Keith Thompson <> writes:

    >> If getc() read int's from files instead of unsigned char's would it be
    >> realistically possible that reading from a file would return a negative
    >> zero ? That would be one strange file.

    >
    > What would be so strange about it? If a file contains a sequence of
    > ints, stored as binary, and the implementation has a distinct
    > representation for negative zero, then the file could certainly contain
    > negative zeros.


    I think the question he was asking is something different, which
    is, "can the int values produced by getc() ever be (int) negative
    zeros?", to which the answer is they cannot.
    Tim Rentsch, Mar 11, 2011
    #18
  19. Tim Rentsch Guest

    Spiros Bousbouras <> writes:

    > On Thu, 10 Mar 2011 14:18:05 -0800 (PST)
    > Paul N <> wrote:
    >> On Mar 10, 5:05 pm, Spiros Bousbouras <> wrote:
    >> > On 10 Mar 2011 16:49:57 GMT
    >> >
    >> > Angel <> wrote:
    >> > > On 2011-03-10, Spiros Bousbouras <> wrote:
    >> > > > If you are reading from a file by successively calling fgetc() is there
    >> > > > any point in storing what you read in anything other than unsigned
    >> > > > char ?
    >> >
    >> > > Yes, when you read EOF which is not an unsigned char.
    >> >
    >> > In my mind I was making a distinction between storing and temporarily
    >> > assigning but I guess it wasn't clear. What I had in mind was something
    >> > like:
    >> >
    >> > unsigned char arr[some_size] ;
    >> > int a ;
    >> >
    >> > while ( (a = fgetc(f)) != EOF) arr[position++] = a ;
    >> >
    >> > Would there be any reason for arr to be something other than
    >> > unsigned char ?

    >>
    >> char is normally used for storing characters, and I think that is what
    >> it was designed for. So it seems a bit odd not to use it.

    >
    > But if arr[] is char how do you avoid the implementation defined
    > behavior when doing arr[position++] = a ?


    Assuming: the bits are in the same places for the implementation that
    wrote the file and the implementation reading the file; and CHAR_BIT
    is also the same; and UCHAR_MAX < INT_MAX; then you could do this:

    arr[position++] = a <= CHAR_MAX ? a : a - (UCHAR_MAX+1);

    which works for all values that the target machine supports.
    Tim Rentsch, Mar 11, 2011
    #19
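
    The expression above, wrapped in a function so it can be tried out
    (byte_to_char is an illustrative name). It carries the same
    assumptions as the post, in particular UCHAR_MAX < INT_MAX so that
    UCHAR_MAX + 1 does not overflow. The point is that the subtraction
    is done in int, so the value handed to char is already in range and
    no out-of-range conversion takes place:

```c
#include <assert.h>
#include <limits.h>

/* Maps a value in 0..UCHAR_MAX, as returned by fgetc(), to the plain char
   with the same bit pattern, using only in-range arithmetic. If plain char
   is unsigned, CHAR_MAX equals UCHAR_MAX and a is returned unchanged. */
char byte_to_char(int a)
{
    assert(a >= 0 && a <= UCHAR_MAX);
    return (char)(a <= CHAR_MAX ? a : a - (UCHAR_MAX + 1));
}
```

    Converting the result back with (unsigned char) gives the original
    byte, since that direction is defined as reduction modulo
    UCHAR_MAX + 1.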
  20. Tim Rentsch Guest

    Angel <> writes:

    > [snip]
    >
    > UTF-8, as the name implies, is 8 bits wide and will fit in an unsigned
    > char (it will fit in a signed char too,


    It will on most implementations but the Standard does not
    require that.

    > but values >127 will be converted to negative values),


    Again true on most implementations but not Standard-guaranteed.
    Tim Rentsch, Mar 11, 2011
    #20
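
    A small sketch of why code that inspects text bytes usually goes
    through unsigned char regardless of how plain char behaves
    (count_high_bytes is a made-up name). In UTF-8 the code units are
    8 bits wide, but a character may take several of them; for example
    U+00E9 is encoded as the two bytes 0xC3 0xA9, both above 127:

```c
#include <assert.h>
#include <stddef.h>

/* Counts the bytes above 127 in a NUL-terminated string. Each char is
   converted to unsigned char first, so the test does not depend on
   whether plain char is signed on this implementation. */
size_t count_high_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 127)
            n++;
    return n;
}
```

    Comparing *s > 127 directly would never be true where plain char is
    signed and 8 bits wide, which is the portability trap the replies
    above are pointing at.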