Reading from files and range of char and friends


Peter Nilsson

DSPs are special-purpose, but they're not exactly rare.

And various mobile phones.
I'm not really looking for them. I'm curious why the designer of
an ALU would choose something other than two's complement.

I've read they tend to be simpler to manufacture. Note there are many
ones' complement checksum routines. Also, signal processing tends to
be simpler if you don't have to work with an asymmetric range, i.e.
-127 to 127 is simpler to work with than -128 to 127.
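
To make that concrete, here is a minimal sketch of the classic RFC 1071
Internet checksum, which computes a ones' complement sum with an
end-around carry (the function name inet_checksum is mine, purely
illustrative):

#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of 16-bit words, as used by the Internet
   checksum: carries out of the top 16 bits are folded back into
   the low bits ("end-around carry"). */
uint16_t inet_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    while (count-- > 0)
        sum += *words++;
    while (sum >> 16)   /* fold carries back in until none remain */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;   /* ones' complement of the sum */
}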
 

James Kuyper

Sorry for the late response - I meant to reply to this message promptly,
but lost track of it.

On Fri, 11 Mar 2011 11:53:57 -0800, Spiros Bousbouras wrote:


I see. I assumed that the implicit conversion would be ok because
paragraph 27 of 6.2.5 says "A pointer to void shall have the same
representation and alignment requirements as a pointer to a character
type.39)" and footnote 39 says "The same representation and alignment
requirements are meant to imply interchangeability as arguments to
functions, return values from functions, and members of unions." I
assumed that the relation "same representation and alignment
requirements" is transitive.

It is, but having the "same representation and alignment requirements"
is not sufficient for two types to be compatible with each other, nor to
ensure that there's an implicit conversion from one to the other.

Now, "The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions", but even interchangeable types are
not the same thing as compatible types.

Also, that footnote is not normative, and while it is "meant to imply
....", it does not actually imply it. Two types, with exactly the same
representation and alignment, could fail to be interchangeable for those
purposes if an implementation chooses to treat them differently, for
instance by passing them in function calls by different mechanisms. The
fact that they are not compatible types is sufficient to allow such a
decision.
 

lawrence.jones

James Kuyper said:
Also, that footnote is not normative, and while it is "meant to imply
...", it does not actually imply it. Two types, with exactly the same
representation and alignment, could fail to be interchangeable for those
purposes if an implementation chooses to treat them differently, for
instance by passing them in function calls by different mechanisms. The
fact that they are not compatible types is sufficient to allow such a
decision.

But the fact that they are there is sufficient to indicate that such a
decision is exceedingly unwise.
 

James Kuyper

But the fact that they are there is sufficient to indicate that such a
decision is exceedingly unwise.

I did not mean to suggest that it would be a good idea to implement
function calls that way. I just object to the wording of the standard,
and I'm bringing up the possibility of such an implementation as a way
of demonstrating what's wrong with that wording.

Interchangeability should either be required or at least recommended; it
should not be incorrectly identified as an implication that can be
derived solely from the requirement of "same representation and
alignment". Saying "is meant to imply" rather than "implies" is simply
weasel-wording; it should have no place in an official standard: if the
committee really feels that the implication is valid, it should say so
explicitly.
 

Spiros Bousbouras

It will be executed if it is the only code there is. Just to state the
problem I'll be solving..
int c;
while ((c = fgetc(f)) != EOF)
..which doesn't work because char and int are the same width, so the value
of EOF cannot be distinguished from valid data. Why not..
int c;
while ((c = fgetc(f)), ! feof(f))
..which might work?

The problem with this is that if there is an error reading from the
file then fgetc(f) may keep returning EOF in which case you'll have
an infinite loop. The way I plan to do it from now on is

while (1) {
    c = fgetc(f);
    if (ferror(f)) {
        // handle the error
    }
    if (feof(f))
        break;  // exit the loop, possibly after clean-up
    // otherwise process c as an ordinary character
}

Even visually I like it better like this than burdening the condition
of the while.
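
For illustration, here is the same pattern as a self-contained function;
count_bytes and its -1 error convention are my own additions, not
something from the thread:

#include <stdio.h>

/* Count the bytes in a stream using the feof()/ferror() pattern
   above, so that a read error cannot cause an infinite loop even
   where EOF is a valid byte value. */
long count_bytes(FILE *f)
{
    long n = 0;
    while (1) {
        int c = fgetc(f);
        if (ferror(f))
            return -1;   /* read error */
        if (feof(f))
            break;       /* genuine end of file */
        (void)c;         /* a real program would process c here */
        n++;
    }
    return n;
}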
 

Spiros Bousbouras

I've read they tend to be simpler to manufacture. Note there are many
ones' complement checksum routines. Also, signal processing tends to
be simpler if you don't have to work with an asymmetric range, i.e.
-127 to 127 is simpler to work with than -128 to 127.

According to the standard, two's complement is also consistent with a
symmetrical range: an implementation may treat the most negative bit
pattern as a trap representation, giving e.g. SCHAR_MIN == -SCHAR_MAX.
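
A tiny illustrative check of which choice a given implementation made
(entirely my sketch, not from the thread):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* 6.2.6.2 lets a two's complement implementation treat the most
       negative bit pattern as a trap representation, which gives the
       symmetric range SCHAR_MIN == -SCHAR_MAX. */
    printf("signed char range: %d to %d (%s)\n",
           SCHAR_MIN, SCHAR_MAX,
           SCHAR_MIN == -SCHAR_MAX ? "symmetric" : "asymmetric");
    return 0;
}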
 

Spiros Bousbouras

But that wrongness can only occur on a vanishingly small number of
platforms -- quite possibly nonexistent in real life.

Possibly, but not certainly, either now or in the future. But even for
reasons of style I have now come to consider using feof() preferable.

[...]
Suppose your code is running on a system with 16-bit char and 16-bit
int, and it reads a byte with the value 0xffff, which yields -1 when
converted to int (note that even this is implementation-defined), which
happens to be the value of EOF. Since you've added extra checks, you
notice that feof() and ferror() both returned 0, meaning the -1 is a
value read from the file. How sure are you that you can handle that
value correctly?

That depends on what the code is doing. For a lot of code you wouldn't
need to handle EOF specially. Say you write code which reads lines of
input and only prints those which match some string. Then you just read
lines one by one and pass them to strcmp(). It makes no difference
whether EOF can be one of the characters in the line. But using feof()
and ferror() guarantees that you won't think the input finished before
it actually did.
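
Here is a hypothetical sketch of that line-matching example (the names
print_matching_lines and needle are mine); note that nothing in it needs
to care whether EOF collides with a data byte:

#include <stdio.h>
#include <string.h>

void print_matching_lines(FILE *f, const char *needle)
{
    char line[4096];
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
        if (strcmp(line, needle) == 0)
            puts(line);
    }
    if (ferror(f))   /* distinguish a read error from end of input */
        fprintf(stderr, "read error\n");
}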

Off the top of my head I can't think of examples where you would need
to do something special for EOF.
I'm not necessarily saying you can't, but I can
imagine that there might be some subtle problems.

For example?
More to the point, how are you going to be able to test your code?
Unless you have access to an exotic system, or unless you replace
fgetc() with your own version, the code that handles this case will
never be executed.

You can trigger the condition by appropriately modifying the source.
The real problem is that if you only have access to systems where EOF
can never be a valid char value then the executed code can't simulate
the exotic systems where EOF may be a valid value for char.

I cannot offer a general methodology for checking the code. But I don't
think it's a problem. If I had to deal with a specific piece of code
rather than talking abstractly I could probably come up with a way to
simulate the situation even on non-exotic hardware. But more
importantly, I think it's very unlikely that EOF will ever have to be
handled specially.
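
For what it's worth, one way to simulate the collision on ordinary
hardware is a wrapper around fgetc(); test_fgetc and its injection
scheme are mine, just a sketch:

#include <stdio.h>

/* On the first call, return EOF as if it were an ordinary data byte,
   without setting the stream's end-of-file or error indicator. Code
   using the feof()/ferror() pattern should treat it as data; code
   comparing against EOF will wrongly stop. */
static int test_fgetc(FILE *f)
{
    static int injected = 0;
    if (!injected) {
        injected = 1;
        return EOF;   /* simulated data byte that collides with EOF */
    }
    return fgetc(f);
}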
 

Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Phil Carmody said:
On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
[...]
Ok, I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int.
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?

Yes. Surprising but true.
Was that really intended?

Apparently it was.

And I think it makes a certain amount of sense. It means that this:

int i = /* something */
int j = -i;

won't store a negative zero in j, even if the value of i is 0.

The reason I say it's surprising is that the most natural
hardware implementation will yield negative zero in both
cases (ie, whether a constant zero or an expression with
the value (positive) zero is used).

Of course, one could argue about whether that's desirable. In any case,
having *some* rules that let you avoid spurious negative zeros in
general calculations seems like a good idea.

As long as the integer constant 0 never produces a
negative zero, it's easy to avoid them, eg,

#define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)

(Using two's-complement seems like an even better idea.)

Maybe so but it's not really a practical option if you're writing
a C compiler for a machine that doesn't use two's complement in
its hardware.
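
As an aside, a usage sketch of the AVOID_NEGATIVE_ZERO() macro quoted
above (the function negate is mine, purely illustrative):

#define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)

int negate(int i)
{
    /* On a ones' complement machine -i can yield negative zero when
       i == 0; the macro substitutes the integer constant 0, which is
       guaranteed never to produce a negative zero. */
    return AVOID_NEGATIVE_ZERO(-i);
}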
 

Tim Rentsch

Spiros Bousbouras said:
Spiros Bousbouras wrote:
If you are reading from a file by successively calling fgetc() is there
any point in storing what you read in anything other than unsigned
char?

Sure. To see one reason in action, try

unsigned char uchar_password[SIZE];
...
if (strcmp(uchar_password, "SuperSecret") == 0) ...

Just to be clear, the only thing that can go wrong with this example
is that strcmp() may try to convert the elements of uchar_password to
char, thereby causing implementation-defined behavior. The same
issue could arise with any other str* function. Or is there something
specific about your example that I'm missing?

The call to strcmp() violates a constraint. strcmp() expects const
char* (a non-const char* is also ok), but uchar_password, after
the implicit conversion, is of type unsigned char*. Types char*
and unsigned char* are not compatible, and there is no implicit
conversion from one to the other.
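
As an editorial aside, a minimal sketch of the usual workaround
(check_password is a made-up name): a cast at the call site removes the
constraint violation, and accessing the bytes through char * is
permitted because the character types may alias one another:

#include <string.h>

static int check_password(const unsigned char *uchar_password)
{
    /* The cast makes the argument type match strcmp()'s prototype;
       strcmp() itself compares the bytes as unsigned char anyway. */
    return strcmp((const char *)uchar_password, "SuperSecret") == 0;
}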

I see. I assumed that the implicit conversion would be ok because
paragraph 27 of 6.2.5 says "A pointer to void shall have the same
representation and alignment requirements as a pointer to a character
type.39)" and footnote 39 says "The same representation and alignment
requirements are meant to imply interchangeability as arguments to
functions, return values from functions, and members of unions." I
assumed that the relation "same representation and alignment
requirements" is transitive.

On the other hand footnote 35 of paragraph 15 says that char is not
compatible with signed or unsigned char and in 6.7.5.1 we read that
pointers to types are compatible only if the types are compatible. We
must conclude then that the relation "same representation and alignment
requirements" is not transitive. That's a damn poor choice of
terminology then.

Actually if the relation "same representation and alignment
requirements" were transitive *and* symmetric [snip rest]

The relation "same representation and alignment requirements"
is reflexive, symmetric, and transitive. Probably you are
confusing it with the 'compatible' relation.
 

Tim Rentsch

Spiros Bousbouras said:
As I just said in a different post, on this occasion invoking an
alternative C was pointless. But the way it works in general is that
you take the usual C, change some bits and pieces in the standard, and
you have your new C.

I guess you missed the point of the comment. What I was trying
to say (perhaps too subtly) is that asking a question about what
would happen in the hypothetical "Alternative C" is kind of a
dumb question, because since you made it up only you can answer
the question, or for that matter care about what the answer is.
 

Tim Rentsch

Spiros Bousbouras said:
The standard lists the operations that can *generate* a negative zero.
One could argue that operations like cast and assignment simply preserve
an existing negative zero rather than generating a new one.

That's my reading too but the problem is what happens when you read
from a file? I think the standard would be clearer if it said that
reading from a binary stream can also generate a negative zero.

It can't unless INT_MAX < UCHAR_MAX, in which case it's
obvious that it can because of how unsigned-to-signed
conversions work.
 

Tim Rentsch

Keith Thompson said:
Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
rather than 100% portable. It works just fine *unless* sizeof(int) == 1,
which implies CHAR_BIT >= 16.

As far as I know, all existing hosted C implementations have
CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
aren't even required to support stdio).

If I were worried about the possibility, rather than adding calls
to feof() and ferror(), I'd probably add
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
And if I ever see that error message, it almost certainly means
that I forgot to add the "#include <limits.h>".

(Actually, checking that sizeof(int) > 1 would be better, since
the usual EOF check works just fine on a system with 16-bit char
and 32-bit int, but that's a little harder to check at compile time.)

It's easy if a better test is used. The test against EOF is
guaranteed to work if

UCHAR_MAX > 0 && UCHAR_MAX <= INT_MAX
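
Assembled as a compile-time guard, combining Keith's #error technique
with this test (the wording of the message is mine):

#include <limits.h>

/* The usual (c = fgetc(f)) != EOF idiom is reliable when every
   unsigned char value fits in an int. UCHAR_MAX > 0 also catches a
   forgotten #include <limits.h>, since an undefined identifier
   evaluates to 0 inside #if. */
#if !(UCHAR_MAX > 0 && UCHAR_MAX <= INT_MAX)
#error "unsigned char does not fit in int: the EOF test is unreliable"
#endif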
 

Keith Thompson

Tim Rentsch said:
As long as the integer constant 0 never produces a
negative zero, it's easy to avoid them, eg,

#define AVOID_NEGATIVE_ZERO(x) ((x) ? (x) : 0)

Sure, it's easy to do that by adding an invocation of
AVOID_NEGATIVE_ZERO() to every expression where you want to avoid
a negative zero result -- something that is of no benefit unless
your code is actually running on a system that has negative zeros.
Maybe so but it's not really a practical option if you're writing
a C compiler for a machine that doesn't use two's complement in
its hardware.

True. Think of it as advice for hardware designers. (Not that
there's any particular reason they should listen to me.)
 

Keith Thompson

Tim Rentsch said:
It's easy if a better test is used. The test against EOF is
guaranteed to work if

UCHAR_MAX > 0 && UCHAR_MAX <= INT_MAX

That's good!

Is the "UCHAR_MAX > 0" test intended to catch forgetting the
#include <limits.h>?
 

Spiros Bousbouras

I guess you missed the point of the comment. What I was trying
to say (perhaps too subtly) is that asking a question about what
would happen in the hypothetical "Alternative C" is kind of a
dumb question, because since you made it up only you can answer
the question, or for that matter care about what the answer is.

No, I didn't miss the point. My post you are quoting addresses this
very point by explaining how someone other than myself can answer
questions about this alternative C. Here is a more detailed
explanation: the way they could do that is by taking the current
standard, making the modification I suggested, and then they would have
a description of how this alternative C is supposed to work. By
consulting this description they can answer questions about the
language. In fact, at least one person in the thread did address my
question, which means they cared somewhat about the answer.

Beyond that, contemplating functionality for C different from what the
standard specifies is a fairly common practice, both here and on
comp.std.c. For example, the thread "All inclusive header files?"
contains discussion of how C would/should behave if it were redesigned
from scratch. So I don't understand why you find my own contemplation
so problematic.
 

Spiros Bousbouras

The relation "same representation and alignment requirements"
is reflexive, symmetric, and transitive. Probably you are
confusing it with the 'compatible' relation.

Not at all, as
<[email protected]>
http://groups.google.com/group/comp.lang.c/msg/12f4d3ff0e739fdf?dmode=source
clearly demonstrates. But if it is indeed symmetric and transitive
then it follows that using a function argument of unsigned char where
the function prototype says char should be ok. But the compatibility
rules say it's not ok and that's why I believe that the standard is
misleading on this point.
 

Spiros Bousbouras

It can't unless INT_MAX < UCHAR_MAX, in which case it's
obvious that it can because of how unsigned-to-signed
conversions work.

But 6.2.6.2 p3 uses the word "only" and does not list reading from a
file as a way to generate a negative zero.
 

Tim Rentsch

Spiros Bousbouras said:
Not at all, as
<[email protected]>
http://groups.google.com/group/comp.lang.c/msg/12f4d3ff0e739fdf?dmode=source
clearly demonstrates. But if it is indeed symmetric and transitive
then it follows that using a function argument of unsigned char where
the function prototype says char should be ok. But the compatibility
rules say it's not ok and that's why I believe that the standard is
misleading on this point.

You are confused. The relation "has the same representation and
alignment requirements" is indeed reflexive, symmetric, and
transitive. The comment about function argument/parameter types of
unsigned char and char is irrelevant, because compatibility is not
based (just) on whether two types have the same representation and
alignment requirements.
 

Tim Rentsch

Spiros Bousbouras said:
No, I didn't miss the point. My post you are quoting addresses this
very point by explaining how someone other than myself can answer
questions about this alternative C. Here is a more detailed
explanation: the way they could do that is by taking the current
standard, making the modification I suggested, and then they would have
a description of how this alternative C is supposed to work. By
consulting this description they can answer questions about the
language. In fact, at least one person in the thread did address my
question, which means they cared somewhat about the answer.

Beyond that, contemplating functionality for C different from what the
standard specifies is a fairly common practice, both here and on
comp.std.c. For example, the thread "All inclusive header files?"
contains discussion of how C would/should behave if it were redesigned
from scratch. So I don't understand why you find my own contemplation
so problematic.

That's too bad; I expect you'll continue to be mystified.
 
