Why getchar() doesn't quit if EOF isn't the first char

lovecreatesbea...

Thank you for your time.


#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        putchar(c);
        fflush(stdout);
    }
    return 0;
}


/* [a console interact session]

aaa^Zbbb [INPUT surrounding ^Z followed by enter]
[a single enter here]
aaa [output]

^Z [here, ^Z followed by an enter]

*/
 
Mark Bluemel

(e-mail address removed) wrote:

[His subject line was "Why getchar() doesn't quit if EOF isn't the first
char"]

It's a good habit to put your question in the body of the message. Some
newsreaders, apparently, may not show the header and body together.

The C language doesn't, as far as I can see, define the behaviour of
interactive input streams in any detail. It's therefore down to the host
environment to define under what circumstances an interactive stream
indicates end-of-file.

The behaviour documented for POSIX, according to one of the texts I
have to hand, is that the EOF character (^D by default) only designates
end of file if it starts a line of input. It looks like Windows follows
the same convention.
 
Ben Pfaff

Subject: Why getchar() doesn't quit if EOF isn't the first char

The answer to this question is actually Unix-specific. The best
answer I have seen is in _The Unix Programming Environment_ by
Kernighan and Pike. I'd encourage you to obtain a copy, because
it is a good book. I see that amazon.com has used copies for
under $10.
 
Keith Thompson

Ben Pfaff said:
The answer to this question is actually Unix-specific. The best
answer I have seen is in _The Unix Programming Environment_ by
Kernighan and Pike. I'd encourage you to obtain a copy, because
it is a good book. I see that amazon.com has used copies for
under $10.

Since his sample session shows ^Z, not ^D, it's probably not Unix.
 
Keith Thompson

(e-mail address removed) wrote:
[ Subject: Why getchar() doesn't quit if EOF isn't the first char ]
#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        putchar(c);
        fflush(stdout);
    }
    return 0;
}


/* [a console interact session]

aaa^Zbbb [INPUT surrounding ^Z followed by enter]
[a single enter here]
aaa [output]

^Z [here, ^Z followed by an enter]

*/

EOF isn't a character. It's a value returned by getchar() to indicate an
end-of-file condition; that value is distinct from any character value.

Apparently when you enter control-Z in the middle of a line, it doesn't trigger
an end-of-file condition. The manner in which this condition will be triggered
depends on your operating system and your C implementation.
 
CBFalconer

Mark said:
.... snip ...

The behaviour documented for POSIX, according to one of the texts
I have to hand, is that the EOF character (^D by default) only
designates end of file if it starts a line of input. It looks
like Windows follows the same convention.

There is no such thing as "an EOF char". EOF, as returned by such
routines as getc(), is an out of band value (which is why getc()
returns an int, not a char). The only thing you know about it is
that it is negative. This is also why getc returns the int version
of the unsigned char input.

On the other hand various systems have ways of persuading a
terminal to signal EOF. On Unix it is often the CTRL-d char. On
Windoze it is often the CTRL-z char. On other systems, read the
docs. These signals are often only effective immediately after a
'\n', or end-of-line, condition.
 
Barry Schwarz

Thank you for your time.


#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        putchar(c);
        fflush(stdout);
    }
    return 0;
}


/* [a console interact session]

aaa^Zbbb [INPUT surrounding ^Z followed by enter]
[a single enter here]
aaa [output]

^Z [here, ^Z followed by an enter]

*/

Firstly, EOF is not a character at all. It is a special value
returned by some input functions to indicate the end-of-file condition
has been detected on the input stream.

Secondly, ^Z does not, by itself, cause this condition. It is a
convention used by some systems to allow a stream to simulate this
condition. But the convention may have restrictions. It is possible
your system has the restriction that ^Z will only serve this purpose
if it immediately follows an ENTER.

If you really want to know what is happening, you should print c as an
integer (I prefer hex) rather than a character. This way, you will
see exactly what characters getchar obtains. You might want to change
the while to a do-while so you see the EOF value also. (Currently you
may not be able to tell the difference between entering a ^Z and
entering a ^C.)


 
Mark Bluemel

CBFalconer said:
Mark Bluemel wrote:
... snip ...

There is no such thing as "an EOF char".

Of course there is in Operating Systems terms, which is what I meant
here - that should have been clear from my first sentence.

I'm well aware that there is no EOF character in C.
 
Mark McIntyre

Mark said:
> Of course there is in Operating Systems terms,

EOF is a condition set on the stream by the OS when no more data is
available to read.

> which is what I meant
> here - that should have been clear from my first sentence.

I would assume that you're referring to the end-of-file marker that a
few operating systems use for legacy reasons in text files. This is not
actually an EOF character, and for reference,

Unices don't use one. When you PRESS Ctrl-D it sets the flag. If you had
a char with value 0x04 in the stream, it has no effect and is treated as
an ordinary character.

Windows/DOS does use one, but it's Ctrl-Z. This is a hangover from CP/M, I
think. I'm too lazy to fire up my CP/M emulator to find out.

$ cat test.c
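#include <stdio.h>

int main(void)
{
    int c;
    FILE *s = fopen("test.txt", "r");

    if (s == NULL) {            /* fopen fails if test.txt is missing */
        perror("test.txt");
        return 1;
    }
    /* Print each byte as a number and as a character until end-of-file. */
    while ((c = fgetc(s)) != EOF)
        printf("%d %c\n", c, c);
    fclose(s);
    return 0;
}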
$ hexdump test.txt
ddddd
64 64 04 64 64 64 10 00

$ ./a.out
100 d
100 d
4
100 d
100 d
100 d
10
 
Keith Thompson

Mark said:
EOF is a condition set on the stream by the OS when no more data is
available to read.
[...]

No, EOF is neither a character nor a condition.

The condition set on a stream when no more data is available to read
(actually, set *after* an attempt to read more data has failed) is
called the "end-of-file indicator". The feof() function can be used to
query this indicator.

EOF is a macro defined in <stdio.h>. It expands to an integer constant
expression of type int with a negative value. This value matches the
value returned by several functions to indicate either an end-of-file
condition or an error condition.

EOF stands for End Of File, but EOF and end-of-file are two quite
different things.
 
Mark Bluemel

Mark said:
> EOF is a condition set on the stream by the OS when no more data is
> available to read.
>
> I would assume that you're referring to the end-of-file marker that a
> few operating systems use for legacy reasons in text files.

Nope - I'm referring to the special character recognised by an operating
system to indicate the end of interactive input.

> This is not
> actually an EOF character, and for reference,
>
> Unices don't use one. When you PRESS Ctrl-D it sets the flag. If you had
> a char with value 0x04 in the stream, it has no effect and is treated as
> an ordinary character.

"POSIX.1 defines 11 special characters that are handled specially on
input. SVR4 adds another 6 special characters and 4.3+BSD adds 7."
(W Richard Stevens "Advanced Programming in the Unix Environment")

$ man stty
STTY(1)
User Commands
STTY(1)

NAME
stty - change and print terminal line settings

[snip]
Special characters:
* dsusp CHAR
CHAR will send a terminal stop signal once input flushed

eof CHAR
CHAR will send an end of file (terminate the input)
[snip]
 
Richard Tobin

> > Of course there is in Operating Systems terms,
>
> EOF is a condition set on the stream by the OS when no more data is
> available to read.

In some operating systems. In others, it appears as a character
marking the end of the file.

> I would assume that you're referring to the end-of-file marker that a
> few operating systems use for legacy reasons in text files. This is not
> actually an EOF character, and for reference,

How do you determine whether it's "actually" an EOF character? It's
just terminology, there's no fact of the matter.

It's pointless to argue "EOF doesn't mean so-and-so". The term is used
outside C, and it is natural to refer to non-C uses of it when talking
about EOF in C.

-- Richard
 
santosh

Mark McIntyre said:
EOF is a condition set on the stream by the OS when no more data is
available to read.

Is it "EOF"? It's probably safer to say "end-of-file", since "EOF" is an
identifier defined only by Standard C.

<snip>
 
Dik T. Winter

> eof CHAR
> CHAR will send an end of file (terminate the input)

Except when it is not the first character on an input line, or not
immediately preceded by the same character. In the latter case also
the preceding occurrence will be removed.
 
Mark McIntyre

Mark said:
stty - change and print terminal line settings

[snip]
Special characters:
* dsusp CHAR
CHAR will send a terminal stop signal once input flushed

eof CHAR
CHAR will send an end of file (terminate the input)


*shrug*.

I gave a worked example showing that Ctrl-D (0x04) is NOT an EOF, and
will not terminate reading from a file.

I don't dispute you can press Ctrl-D and send an EOF signal to your
application, but encountering character 0x04 in a stream is not the same
as sending that stream a signal to say "end of data reached".

I have a feeling we had this dull discussion about 12 months ago. I was
right then too.... :)
 
Mark McIntyre

Richard said:
> How do you determine whether it's "actually" an EOF character?

Because the ASCII character set has no character called EOF. It has an
EOT and ESC, which occupy the positions commonly associated with the
control sequences many OSen use to send an end-of-data signal from the
keyboard.
EBCDIC also has no EOF character.

> It's just terminology, there's no fact of the matter.

On that basis, nothing is fact.

> It's pointless to argue "EOF doesn't mean so-and-so". The term is used
> outside C, and it is natural to refer to non-C uses of it when talking
> about EOF in C.

Agreed.
 
Dik T. Winter

>
> Because the ASCII character set has no character called EOF. It has an
> EOT and ESC, which occupy the positions commonly associated with the
> control sequences many OSen use to send an end-of-data signal from the
> keyboard.
> EBCDIC also has no EOF character.

^D is indeed EOT (End Of Transmission), but ^Z is not ESC but SUB.
 
