Questions about K&R (Kernighan and Ritchie)


sandeep

Hello friends ~

I am learning C from the K&R book. I have questions about Section 8.5
("an implementation of Fopen and Getc"). Although this section is UNIX(r)
specific I think all my questions are really about standard C... so the
ISO taliban can relax... :D

1> Look at this Macro
#define feof(p) (((p)->flag & _EOF) != 0)

My question is: feof is only specified to return 0 or not 0. There is no
requirement for it to only return 0 or 1. So why the unnecessary "!= 0"
to force it to be 0 or 1? This seems very inefficient, after all feof is
likely to be called many times.

2> Here is another macro
#define getc(p) (--(p)->cnt>=0 ?(unsigned char)*(p)->ptr++ :_fillbuf(p))
Shouldn't that _fillbuf(p) be _fillbuf((p)), one bracket for the
function call and one bracket to stop expansion of side effects in p?

3> In a comment on that getc Macro, K&R say: "The characters are returned
unsigned, which ensures that all characters will be positive". I don't
really understand the point of this, I usually use char not unsigned char
for characters. And in K&R, all strings are of type char* not unsigned
char*.

Also if sizeof(char) == sizeof(int) then the character (unsigned char)
UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

Regards ~
 
Keith Thompson

sandeep said:
I am learning C from the K&R book. I have questions about Section 8.5
("an implementation of Fopen and Getc"). Although this section is UNIX(r)
specific I think all my questions are really about standard C... so the
ISO taliban can relax... :D

I see the smiley, but referring to those of us who prefer to
discuss ISO C as "taliban" is a bit insulting, don't you think?
(And yes, I know the word literally means "students", but I doubt
that that's what you meant.)
1> Look at this Macro
#define feof(p) (((p)->flag & _EOF) != 0)

My question is: feof is only specified to return 0 or not 0. There is no
requirement for it to only return 0 or 1. So why the unnecessary "!= 0"
to force it to be 0 or 1? This seems very inefficient, after all feof is
likely to be called many times.

Yes, the "!= 0" could be omitted, but it's not likely to be a big deal.
Since it's a macro, a compiler is likely to omit the extra calculation
anyway.
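A minimal sketch of the difference, using a hypothetical stream struct and flag bit (my_stream, MY_EOF_FLAG) in place of K&R's _EOF: the raw mask yields either 0 or the flag's value, the "!= 0" form yields exactly 0 or 1, and inside an if both select the same branch.

```c
/* Hypothetical flag bit and stream struct, loosely modelled on K&R 8.5. */
enum { MY_EOF_FLAG = 010 };

struct my_stream {
    int flag;
};

/* Raw mask: yields 0 or MY_EOF_FLAG (8). */
#define MY_FEOF_RAW(p)  ((p)->flag & MY_EOF_FLAG)

/* Normalised form, same shape as the K&R macro: yields exactly 0 or 1. */
#define MY_FEOF_BOOL(p) (((p)->flag & MY_EOF_FLAG) != 0)
```

In a test such as `if (MY_FEOF_RAW(s))` the compiler is already comparing against zero, so the extra "!= 0" typically costs nothing there.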

And no, feof() isn't likely to be called many times in well-written
code. The way to determine whether you've reached the end of an input
stream is by checking the result of the reading function (for example,
getc() returns the value EOF). *After* that's happened, you can call
feof() to determine whether you reached end-of-file or encountered an
error.
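A sketch of that pattern, assuming a hypothetical count_chars() helper and result enum: the loop is controlled by getc()'s int return value, and feof()/ferror() are consulted only afterwards to find out why reading stopped.

```c
#include <stdio.h>

/* Hypothetical classification of why the read loop ended. */
enum read_end { END_OF_FILE, READ_ERROR, NEITHER };

/* Counts the characters in fp. */
long count_chars(FILE *fp, enum read_end *why)
{
    long n = 0;

    while (getc(fp) != EOF)   /* loop on getc()'s result, not on feof() */
        n++;

    if (ferror(fp))
        *why = READ_ERROR;
    else if (feof(fp))
        *why = END_OF_FILE;
    else
        *why = NEITHER;       /* only if a real character compared equal to EOF */
    return n;
}
```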
2> Here is another macro
#define getc(p) (--(p)->cnt>=0 ?(unsigned char)*(p)->ptr++ :_fillbuf(p))
Shouldn't that _fillbuf(p) be _fillbuf((p)), one bracket for the
function call and one bracket to stop expansion of side effects in p?

No, extra parentheses aren't needed. As long as the name of the macro
parameter is immediately surrounded by parentheses (or brackets),
there's no problem with operator precedence.

And it's not about "expansion of side effects", it's about operator
precedence, i.e., which operators are associated with which operands.
Any side effects will occur anyway.
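A small illustration with hypothetical macros (BAD_DOUBLE, GOOD_DOUBLE, CALL_TWICE): leaving the parameter unparenthesized changes the grouping, while the parentheses of a function call already group the argument by themselves. Neither choice changes how often a side effect in the argument happens.

```c
/* Parameter not parenthesized: BAD_DOUBLE(1 + 2) becomes (1 + 2 * 2). */
#define BAD_DOUBLE(x)  (x * 2)

/* Parameter parenthesized: GOOD_DOUBLE(1 + 2) becomes ((1 + 2) * 2). */
#define GOOD_DOUBLE(x) ((x) * 2)

static int twice(int n)
{
    return n * 2;
}

/* The call's own parentheses already group the argument: twice(1 + 2). */
#define CALL_TWICE(x)  twice(x)
```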
3> In a comment on that getc Macro, K&R say: "The characters are returned
unsigned, which ensures that all characters will be positive". I don't
really understand the point of this, I usually use char not unsigned char
for characters. And in K&R, all strings are of type char* not unsigned
char*.

Also if sizeof(char) == sizeof(int) then the character (unsigned char)
UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

getc() returns a result of type int, not char. For example, if
UCHAR_MAX is 255, then getc() will return the value 255 if you read a
'\xff' character, and the value -1 (assuming EOF==-1) if you encounter
the end of the stream or an error. They clash only if you store the
result in something smaller than an int. So don't do that.
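A sketch, assuming an implementation where UCHAR_MAX <= INT_MAX (for example 8-bit char and 32-bit int), showing that a 0xFF byte comes back from getc() as a positive int that is distinct from EOF; read_back_ff() is a hypothetical helper.

```c
#include <stdio.h>

/* Writes one byte with value 0xFF to a temporary binary stream and
   returns what getc() reads back (or EOF if tmpfile() fails). */
int read_back_ff(void)
{
    FILE *fp = tmpfile();
    int ch;                /* int, not char: a char could turn 0xFF into EOF */

    if (fp == NULL)
        return EOF;
    fputc(0xFF, fp);
    rewind(fp);
    ch = getc(fp);
    fclose(fp);
    return ch;
}
```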

See section 12 of the comp.lang.c FAQ,
<http://www.c-faq.com/stdio/index.html>, especially the first few
questions.
 
Seebs

1> Look at this Macro
#define feof(p) (((p)->flag & _EOF) != 0)

My question is: feof is only specified to return 0 or not 0. There is no
requirement for it to only return 0 or 1. So why the unnecessary "!= 0"
to force it to be 0 or 1? This seems very inefficient, after all feof is
likely to be called many times.

I've seen code like this written for the same reason that some people
write
if (p != NULL)
instead of
if (p)

It's clearer to the user. The compiler may well notice that no one uses
the specific value and just run past it.
2> Here is another macro
#define getc(p) (--(p)->cnt>=0 ?(unsigned char)*(p)->ptr++ :_fillbuf(p))
Shouldn't that _fillbuf(p) be _fillbuf((p)), one bracket for the
function call and one bracket to stop expansion of side effects in p?

No, because there's no such thing as "expansion of side effects". Parentheses
are used *only* to control grouping -- they have no effect on side
effects. As such, the () around p are sufficient whether or not they're
also part of the function call.
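To make the evaluation point concrete, here is a cut-down, hypothetical model of the K&R macro (kr_file, kr_fillbuf and KR_GETC are illustrative names, not the book's). The argument expression is evaluated twice per call however it is parenthesized, which is why the standard allows a macro getc() to evaluate its stream argument more than once and warns against passing it an expression with side effects.

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical, cut-down stand-in for K&R's FILE (section 8.5). */
struct kr_file {
    int cnt;            /* characters left in the buffer */
    const char *ptr;    /* next character in the buffer */
};

/* Stand-in for K&R's _fillbuf: there is no underlying file in this
   sketch, so an empty buffer simply means end of input. */
static int kr_fillbuf(struct kr_file *p)
{
    (void)p;
    return EOF;
}

/* Same shape as K&R's getc. p is evaluated twice per call: once in
   --(p)->cnt, then again in either (p)->ptr++ or kr_fillbuf(p).
   Wrapping p in more parentheses would not change that. */
#define KR_GETC(p) (--(p)->cnt >= 0 ? (unsigned char)*(p)->ptr++ : kr_fillbuf(p))
```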
3> In a comment on that getc Macro, K&R say: "The characters are returned
unsigned, which ensures that all characters will be positive". I don't
really understand the point of this, I usually use char not unsigned char
for characters. And in K&R, all strings are of type char* not unsigned
char*.

char may well be unsigned.

The point of this is that converting everything to unsigned char means
that every char value is necessarily non-negative, guaranteeing that no
value returned which represents a character can compare equal to EOF,
which is negative.
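A minimal sketch of the conversion K&R describe, assuming UCHAR_MAX <= INT_MAX; char_code() and plain_char_is_signed() are hypothetical helpers. Whether plain char is signed varies between implementations, but the code obtained through unsigned char is non-negative either way.

```c
#include <limits.h>

/* The value a getc-style function would report for the character c:
   going through unsigned char makes it non-negative, so it cannot
   equal the negative EOF. */
int char_code(char c)
{
    return (unsigned char)c;
}

/* Nonzero if plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```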
Also if sizeof(char) == sizeof(int) then the character (unsigned char)
UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

(Not necessarily, but I see your point.)

I am not aware of an implementation where this can actually happen;
specifically, I'm under the impression that such implementations are likely
to simply only ever yield values in some smaller range from getchar(),
so that EOF can never occur. A typical choice might be to have a 32-bit
char object, but to only store 8 bits at a time in files or retrieve
8 bits at a time when reading files.

-s
 
Alan Curry

|Hello friends ~
|
|I am learning C from the K&R book. I have questions about Section 8.5
|("an implementation of Fopen and Getc"). Although this section is UNIX(r)
|specific I think all my questions are really about standard C... so the
|ISO taliban can relax... :D

Was this sample implementation written from scratch for the 2nd edition, or
is it just an updated version of some code that predates the C standard?
That would explain some of the things you're seeing...


|
|1> Look at this Macro
|#define feof(p) (((p)->flag & _EOF) != 0)
|
|My question is: feof is only specified to return 0 or not 0. There is no
|requirement for it to only return 0 or 1. So why the unnecessary "!= 0"
|to force it to be 0 or 1? This seems very inefficient, after all feof is
|likely to be called many times.

If this implementation predates the standard, then what feof was "specified"
to return might have been less clear, so making it return 0 or 1 would have
been the safe thing to do.

|
|3> In a comment on that getc Macro, K&R say: "The characters are returned
|unsigned, which ensures that all characters will be positive". I don't
|really understand the point of this, I usually use char not unsigned char
|for characters. And in K&R, all strings are of type char* not unsigned
|char*.
|
|Also if sizeof(char) == sizeof(int) then the character (unsigned char)
|UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

Regardless of whether this implementation predates the standard, I think it's
safe to say that sizeof(char) == sizeof(int) was not even considered a remote
possibility when getc was designed.
 
Eric Sosman

sandeep said:
[...]
Also if sizeof(char) == sizeof(int) then the character (unsigned char)
UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

getc() returns a result of type int, not char. For example, if
UCHAR_MAX is 255, then getc() will return the value 255 if you read a
'\xff' character, and the value -1 (assuming EOF==-1) if you encounter
the end of the stream or an error. They clash only if you store the
result in something smaller than an int. So don't do that.

I think you've misunderstood the question. On a system
where UCHAR_MAX > INT_MAX, getc() et al. have a problem: It
is possible to read unsigned char values that won't fit in an
int and hence can't be returned properly. What happens later
is of little importance, since the damage has been done within
getc() itself.

On such a system, I think we can deduce (for hosted
implementations)

- Conversion of values in (INT_MAX, UCHAR_MAX] doesn't raise
a signal or do anything untoward, but instead yields some
implementation-defined value. (At least, it does so inside
getc() et al, which need not be written in C.)

- Each unsigned char value converts to a distinct int value;
even the out-of-range conversions preserve information.

- Since there must be as many values in [INT_MIN, -1] as in
the span of out-of-range values, INT_MIN + INT_MAX == -1.
That is, two's complement is mandatory.
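These conditions can be probed directly. A hedged sketch with hypothetical helper names; on common desktop platforms the first helper returns 0 and the second returns 1.

```c
#include <limits.h>

/* Nonzero if some unsigned char values cannot be represented as int
   (for instance when sizeof(char) == sizeof(int)). On such a system
   getc() must map some input character onto the value EOF. */
int uchar_exceeds_int(void)
{
#if UCHAR_MAX > INT_MAX
    return 1;
#else
    return 0;
#endif
}

/* The third deduction above: INT_MIN + INT_MAX == -1 holds exactly when
   int has the two's complement range. The sum itself cannot overflow. */
int int_range_is_twos_complement(void)
{
    return INT_MIN + INT_MAX == -1;
}
```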

To cater to such systems (should one feel it necessary), the
familiar

int ch;
while ((ch = getc(stream)) != EOF) ...

needs to be rewritten as

int ch;
while ((ch = getc(stream)) != EOF
       || !(feof(stream) || ferror(stream))) ...

because getc() must map one valid input character value to
the int value EOF.
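The same loop wrapped in a hypothetical count_chars_robust() helper, as a sketch. On ordinary platforms it behaves exactly like the familiar loop. It only differs where a real character can compare equal to EOF.

```c
#include <stdio.h>

/* Counts characters in fp without assuming EOF is distinct from every
   character value: an EOF return ends the loop only when feof() or
   ferror() confirms it. */
long count_chars_robust(FILE *fp)
{
    long n = 0;

    while (getc(fp) != EOF || !(feof(fp) || ferror(fp)))
        n++;
    return n;
}
```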

Let us now ponder the perils of in-band signalling.
 
Keith Thompson

Eric Sosman said:
sandeep said:
[...]
Also if sizeof(char) == sizeof(int) then the character (unsigned char)
UCHAR_MAX will clash with EOF == -1 when it gets promoted to int.

getc() returns a result of type int, not char. For example, if
UCHAR_MAX is 255, then getc() will return the value 255 if you read a
'\xff' character, and the value -1 (assuming EOF==-1) if you encounter
the end of the stream or an error. They clash only if you store the
result in something smaller than an int. So don't do that.

I think you've misunderstood the question.

I think you're right. I managed to miss the "sizeof(char) ==
sizeof(int)" part of the question.

Well, I answered *some* question, just not the one the OP asked.

[snip]
Let us now ponder the perils of in-band signalling.

And of a language design that encourages it (by, for example, not
providing a decent way for functions to return multiple values).
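One way to signal out of band in C is to return a small struct. A minimal sketch with hypothetical names (byte_result, read_byte). Internally it still relies on getc() plus the feof()/ferror() check from Eric's loop, so it moves the ambiguity out of the caller's interface rather than out of stdio itself.

```c
#include <stdio.h>

/* Status travels in its own member instead of a special int value. */
struct byte_result {
    int ok;                 /* 1 if a byte was read, 0 on end-of-file or error */
    unsigned char value;    /* meaningful only when ok is 1 */
};

struct byte_result read_byte(FILE *fp)
{
    struct byte_result r = { 0, 0 };
    int c = getc(fp);

    /* Trust an EOF return only if feof() or ferror() confirms it. */
    if (c != EOF || !(feof(fp) || ferror(fp))) {
        r.ok = 1;
        r.value = (unsigned char)c;
    }
    return r;
}
```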
 
Peter Nilsson

sandeep said:
I am learning C from the K&R book. I have questions about
Section 8.5 ("an implementation of Fopen and Getc").
Although this section is UNIX(r) specific I think all my
questions are really about standard C... so the
ISO taliban can relax... :D

Ahh, the Jacob Navia school of beginning by insulting the
very people you're seeking comments from. Sure has worked
well for him, hasn't it... ;)
1> Look at this Macro
#define feof(p) (((p)->flag & _EOF) != 0)

My question is: feof is only specified to return 0 or not 0.
There is no requirement for it to only return 0 or 1. So why
the unnecessary "!= 0" to force it to be 0 or 1? This seems
very inefficient, after all feof is likely to be called many
times.

True, but it's most likely to be called in a conditional. Most
compilers are quite capable of implementing expr != 0 without
actually evaluating the != operator.
2> Here is another macro
#define getc(p) (--(p)->cnt>=0 ?(unsigned char)*(p)->ptr++ :_fillbuf(p))
Shouldn't that _fillbuf(p) be _fillbuf((p)), one
bracket for the function call and one bracket to stop
expansion of side effects in p?

What do you mean by expansion of sideeffects?

Note that function call parentheses and the commas separating
arguments are syntactic, so there's (generally) no need
to 'protect' function arguments that represent expressions.

If someone wants to pass an argument containing a comma operator,
they'll have to supply parentheses to avoid a constraint
violation from invoking a function-like macro with too many
arguments. [Although C99 now supports variadic macros.]
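A short illustration with a hypothetical one-parameter macro ONE_ARG: a comma expression has to be wrapped in its own parentheses to arrive as a single macro argument. Here it is then evaluated exactly once, because the parameter appears only once in the replacement list. Writing ONE_ARG(calls++, 7) instead would be rejected for passing too many arguments.

```c
static int identity(int n)
{
    return n;
}

/* Hypothetical function-like macro taking exactly one argument. */
#define ONE_ARG(x) identity(x)
```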
3> In a comment on that getc Macro, K&R say: "The
characters are returned unsigned, which ensures that all
characters will be positive". I don't really understand
the point of this, I usually use char not unsigned char
for characters.

Character codes are non-negative, hence getc's return.
Plain char was invented for hysterical reasons.
And in K&R, all strings are of type char* not unsigned
char*.

Plain char is a bane of C. It should have had two 'byte'
types and char should have been a typedef char_t. But it
isn't...
Also if sizeof(char) == sizeof(int)

Then there are all sorts of problems for hosted
implementations. Despite what some members of the
Committee may say, many aspects of the standard
library were not designed with that implementation
in mind.
then the character (unsigned char) UCHAR_MAX will clash with EOF
== -1 when it gets promoted to int.

The mapping is implementation defined, but yes, there will be
overlap with EOF (which needn't be -1 BTW.) General practice
though is to ignore such systems as hosted environments.
 
Ben Bacarisse

Seebs said:
On 2010-04-22, sandeep wrote:

(Not necessarily, but I see your point.)

I am not aware of an implementation where this can actually happen;
specifically, I'm under the impression that such implementations are likely
to simply only ever yield values in some smaller range from getchar(),
so that EOF can never occur. A typical choice might be to have a 32-bit
char object, but to only store 8 bits at a time in files or retrieve
8 bits at a time when reading files.

That may be reasonable from a practical point of view, but I don't think
it is conforming. In

int i;
fread(&i, sizeof i, 1, fp);

fread's behaviour is defined in terms of fgetc: fgetc is called sizeof
i times. getchar is also (indirectly) defined in terms of fgetc so I
don't think there can be any special dispensation for it.
 
Keith Thompson

Ben Bacarisse said:
That may be reasonable from a practical point of view, but I don't think
it is conforming. In

int i;
fread(&i, sizeof i, 1, fp);

fread's behaviour is defined in terms of fgetc: fgetc is called sizeof
i times. getchar is also (indirectly) defined in terms of fgetc so I
don't think there can be any special dispensation for it.

I don't think that by itself makes Seebs's hypothetical implementation
non-conforming.

What does make it non-conforming is that you wouldn't be able to
write a byte with any value in the range 256..UCHAR_MAX to a file
(in binary mode) and then read it back (also in binary mode) and
get the same value.
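That round-trip requirement can be checked directly. A minimal sketch with a hypothetical roundtrip_all_bytes() helper, assuming tmpfile() succeeds (it opens the file in binary update mode, "wb+"). On an implementation with a very wide char it would walk every value, which could take a long time.

```c
#include <stdio.h>

/* Writes every unsigned char value with fputc(), reads them back with
   fgetc(), and returns 1 if all survive, 0 if not, -1 if tmpfile() fails. */
int roundtrip_all_bytes(void)
{
    FILE *fp = tmpfile();
    unsigned char c = 0;
    int ok = 1;

    if (fp == NULL)
        return -1;

    do {                            /* ++c wraps to 0 after UCHAR_MAX */
        if (fputc(c, fp) == EOF) {
            ok = 0;
            break;
        }
    } while (++c != 0);

    if (ok) {
        rewind(fp);
        c = 0;
        do {
            if (fgetc(fp) != c) {   /* EOF or a different value */
                ok = 0;
                break;
            }
        } while (++c != 0);
    }
    fclose(fp);
    return ok;
}
```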
 
Seebs

That may be reasonable from a practical point of view, but I don't think
it is conforming. In

int i;
fread(&i, sizeof i, 1, fp);

fread's behaviour is defined in terms of fgetc: fgetc is called sizeof
i times. getchar is also (indirectly) defined in terms of fgetc so I
don't think there can be any special dispensation for it.

Interesting point. Hadn't thought of that.

That brings us to the other answer, which is the frequent assertion that
the requirement for EOF to be a distinct value means that you can't really
have a fully conforming hosted implementation where sizeof(int) == 1.

-s
 
Keith Thompson

Seebs said:
Interesting point. Hadn't thought of that.

That brings us to the other answer, which is the frequent assertion that
the requirement for EOF to be a distinct value means that you can't really
have a fully conforming hosted implementation where sizeof(int) == 1.

I don't think so, but such an implementation would be inconvenient and
would break some code that's widely assumed to be portable.

The problem is that fgetc() would return EOF either if the stream is at
end-of-file, or a read error occurs, *or* the next character happens to
have a value that converts to the value of EOF.

I don't see anything in the standard that says this is illegal. But it
does mean that a program meant to be portable to such a system can't
just check whether fgetc() returns EOF; it would then also have to check
both feof() and ferror(). If it doesn't, encountering that character
(say, '\xFFFFFFFF') in an input file will fool it into thinking it's
reached the end of the file.
 
Ben Bacarisse

Keith Thompson said:
I don't think that by itself makes Seebs's hypothetical implementation
non-conforming.

I can't see how an implementation that does what Seebs suggests can
conform to 7.19.8.1 p2:

"The fread function reads, into the array pointed to by ptr, up to
nmemb elements whose size is specified by size, from the stream
pointed to by stream. For each object, size calls are made to the
fgetc function and the results stored, in the order read, in an array
of unsigned char exactly overlaying the object. The file position
indicator for the stream (if defined) is advanced by the number of
characters successfully read. If an error occurs, the resulting value
of the file position indicator for the stream is indeterminate. If a
partial element is read, its value is indeterminate."

That seems to say that when sizeof(int) == sizeof(char)

fread(&i, sizeof i, 1, fp);

must be equivalent to

((unsigned char *)&i)[0] = fgetc(fp);

I'd have thought that someone reading 7.19.8.1 p2 would be able to expect
that this second form is equivalent to the first in a conforming
implementation.
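A sketch of what 7.19.8.1 p2 describes for a single object, with a hypothetical helper name: size calls to fgetc(), with the results stored in order into an unsigned char array overlaying the object.

```c
#include <stddef.h>
#include <stdio.h>

/* Reads one object of the given size from fp, one fgetc() per byte.
   Returns 1 if a whole object was read, 0 otherwise. */
int fread_one_via_fgetc(void *obj, size_t size, FILE *fp)
{
    unsigned char *bytes = obj;    /* the overlaying unsigned char array */
    size_t i;

    for (i = 0; i < size; i++) {
        int c = fgetc(fp);
        if (c == EOF)
            return 0;              /* partial element: its value is indeterminate */
        bytes[i] = (unsigned char)c;
    }
    return 1;
}
```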
What does make it non-conforming is that you wouldn't be able to
write a byte with any value in the range 256..UCHAR_MAX to a file
(in binary mode) and then read it back (also in binary mode) and
get the same value.

That's a simpler argument!
 
Morris Keesan

1> Look at this Macro
#define feof(p) (((p)->flag & _EOF) != 0)

My question is: feof is only specified to return 0 or not 0. There is no
requirement for it to only return 0 or 1. So why the unnecessary "!= 0"
to force it to be 0 or 1? This seems very inefficient, after all feof is
likely to be called many times.

You're right that the "!= 0" is unnecessary, but in practice it's not
particularly inefficient, firstly because feof is almost only ever called
inside a conditional, where its value would be implicitly compared with
0 anyway, and any decent compiler would generate the same code for

"if(expr)" and "if(expr != 0)"

and secondly because when feof is used correctly, it's usually called at
most once per file, only after fread or fgetc or other input function has
returned a failure indication.
 
spinoza1111

Hello friends ~

I am learning C from the K&R book. I have questions about Section 8.5
("an implementation of Fopen and Getc"). Although this section is UNIX(r)
specific I think all my questions are really about standard C... so the
ISO taliban can relax... :D

At least the *taliban* of Afghanistan, as they listen to the *imam*
groan *fatwa* in the *madrassah*, are concerned with the important
question. Whereas the ISO *taliban* blaspheme for they make things of
men their God, and show no compassion to others who speak not their
*shibboleth*.
 
Keith Thompson

Ben Bacarisse said:
Keith Thompson said:
I don't think that by itself makes Seebs's hypothetical implementation
non-conforming.

I can't see how an implementation that does what Seebs suggests can
conform to 7.19.8.1 p2:

"The fread function reads, into the array pointed to by ptr, up to
nmemb elements whose size is specified by size, from the stream
pointed to by stream. For each object, size calls are made to the
fgetc function and the results stored, in the order read, in an array
of unsigned char exactly overlaying the object. The file position
indicator for the stream (if defined) is advanced by the number of
characters successfully read. If an error occurs, the resulting value
of the file position indicator for the stream is indeterminate. If a
partial element is read, its value is indeterminate."

That seems to say that when sizeof(int) == sizeof(char)

fread(&i, sizeof i, 1, fp);

must be equivalent to

((unsigned char *)&i)[0] = fgetc(fp);

I'd have thought that someone reading 7.19.8.1 p2 would be able to expect
that this second form is equivalent to the first in a conforming
implementation.

Sure, they're equivalent. Neither one (in Seebs's hypothetical
implementation) can ever read a value greater than 255. The standard
doesn't say that you can necessarily read such a value from a file --
unless you previously wrote it there, which brings us to:
 
Nick Keighley

[...] Although this section is UNIX(r)
specific I think all my questions are really about standard C... so the
ISO taliban can relax... :D

I see the smiley, but referring to those of us who prefer to
discuss ISO C as "taliban" is a bit insulting, don't you think?

I prefer to think of myself as part of the Congregation for the
Doctrine of the Faith

<snip>
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse writes:
That seems to say that when sizeof(int) == sizeof(char)

fread(&i, sizeof i, 1, fp);

must be equivalent to

((unsigned char *)&i)[0] = fgetc(fp);

I'd have thought that someone reading 7.19.8.1 p2 would be able to expect
that this second form is equivalent to the first in a conforming
implementation.

Sure, they're equivalent. Neither one (in Seebs's hypothetical
implementation) can ever read a value greater than 255. The standard
doesn't say that you can necessarily read such a value from a file --
unless you previously wrote it there, which brings us to:

Right. I see your point. My example simply shows the limited utility
of such an implementation, not its actual non-conformance.
 
Eric Sosman

That brings us to the other answer, which is the frequent assertion that
the requirement for EOF to be a distinct value means that you can't really
have a fully conforming hosted implementation where sizeof(int) == 1.

EOF is required to be an int and required to be negative,
but where is it *required* to be distinct from "legitimate"
unsigned char values converted to int? 7.4p1 says that the
argument to a <ctype.h> function must be representable as an
unsigned char "or" equal to EOF, but unless one takes the "or"
as "xor" I can't see a requirement for distinctness.
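A related practical consequence of 7.4p1: plain char values should be converted to unsigned char before being passed to a <ctype.h> function, since a negative plain char value generally satisfies neither condition. A minimal sketch with a hypothetical count_alpha() helper, assuming the default "C" locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts alphabetic characters in s, converting each char to unsigned
   char first so the argument to isalpha() is always in range. */
size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (isalpha((unsigned char)*s))
            n++;
    return n;
}
```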
 
Phil Carmody

Keith Thompson said:
I see the smiley, but referring to those of us who prefer to
discuss ISO C as "taliban" is a bit insulting, don't you think?
(And yes, I know the word literally means "students", but I doubt
that that's what you meant.) ....

No, extra parentheses aren't needed. As long as the name of the macro
parameter is immediately surrounded by parentheses (or brackets),
there's no problem with operator precedence.

And it's not about "expansion of side effects", it's about operator
precedence, i.e., which operators are associated with which operands.
Any side effects will occur anyway.

That's not quite the whole story, as the comma operator would
break the attempted _fillbuf call, were it to be able to reach
it, as it would not be interpreted as a comma operator. However,
that's not important because the processing of the macro can
never reach that stage, as it would require something that looks
like an invocation of the getc with 2 parameters, which wouldn't
be recognised as an instantiation of the above.
getc() returns a result of type int, not char. For example, if
UCHAR_MAX is 255, then

then sizeof(char) != sizeof(int), which was the predicate that
we were asked to address.

Phil
 
Eric Sosman

On 4/23/2010 2:49 PM, Phil Carmody wrote:
[...]
That's not quite the whole story, as the comma operator would
break the attempted _fillbuf call, were it to be able to reach
it, as it would not be interpreted as a comma operator. However,
that's not important because the processing of the macro can
never reach that stage, as it would require something that looks
like an invocation of the getc with 2 parameters, which wouldn't
be recognised as an instantiation of the above.

Consider: getc((foo,bar))

Considered, but I don't see your point. What breakage
do you believe would ensue?
 
