user input, getchar, and buffer - For C beginners and those with teaching skills...

bpascal123

Hi,

As a newcomer to C, I find handling user input somewhat difficult to
understand and implement. I have gathered posts from the "Can C do it?"
thread in this forum that I find very useful for learning the topic
and going further into the details.

Pascal
 
bpascal123

Morris Keesan said:
On Mon, 17 Aug 2009 21:16:59 -0400, Ben Bacarisse
...
Just seeing an alternative can sometimes be helpful: ...
char verb[20]; ...
scanf("%19s", verb);
This is just asking for trouble, when you modify the declaration of verb.
If instead, one writes
Agreed.

/* Pascal, look away: this is definitely overspeed */
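#define VERB_LEN 19
/* # does not expand its operand, so a second macro level is needed
   to turn VERB_LEN into 19 before it is stringified. */
#define Scan_String_(len, array) scanf("%" #len "s", array)
#define Scan_String(len, array) Scan_String_(len, array)
...
char verb[VERB_LEN + 1];
...
Scan_String(VERB_LEN, verb);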
then the scanf call remains correct if you change the length of
verb.

But consider the context. I wanted to show some tidying up with
minimal language features. Stringification in a macro body is not a
minimal feature!
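
A minimal sketch of the stringification idea discussed above, for readers who want to try it. The names STR_, STR, VERB_FORMAT and scan_verb are made up for this illustration and do not come from the thread. The point it tries to show: the # operator stringifies its operand exactly as written, so one extra macro level is needed if VERB_LEN is to become 19 first. sscanf is used instead of scanf only so that the input can be supplied as a string.

```c
#include <stdio.h>

#define VERB_LEN 19

/* STR_ stringifies its argument exactly as written;
   STR expands the argument first, then stringifies the result. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Adjacent string literals are concatenated: "%" "19" "s" becomes "%19s". */
#define VERB_FORMAT "%" STR(VERB_LEN) "s"

/* Hypothetical helper: reads one whitespace-delimited word of at most
   VERB_LEN characters from text into verb. Returns 1 on success. */
int scan_verb(const char *text, char verb[VERB_LEN + 1])
{
    return sscanf(text, VERB_FORMAT, verb) == 1;
}
```

If VERB_LEN is later changed, both the array size (VERB_LEN + 1) and the field width in the format follow automatically.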
 
bpascal123

Hello,

Just to say, I'm wary of pointers. I'm very careful with them because I
know they are very powerful, that C is the language that uses them best,
and that understanding them might help me, if I keep on learning C, to
understand some other languages and get the best out of what pointers
offer. To repeat, I started learning how to program from scratch 3-4
months ago (15-25 hours a week). I think the better I can deal with the
basics, or "baby stuff" as James calls it in the post above, the less
trouble I'll have with pointers....

Back to business, i found this code there (for those who can read
french...and c) : http://www.siteduzero.com/tutoriel-3-35363-realiser-des-saisies-secur...

#include <stdio.h>
#include <string.h>

static void purger(void)
{
    int c ;

    while ( ( c = getchar() ) != '\n' && c != EOF ) ;
}

static void clean(char *chaine)
{
    char *p = strchr(chaine, '\n') ;

    if (p)
        *p = 0 ;
    else
        purger() ;
}

int main(void)
{
    char chaine[20] ;

    printf("\n\nEnter some text : \n") ;
    fgets(chaine, sizeof chaine, stdin) ;
    clean(chaine) ;
    printf("\nThe text entered is : '%s' ", chaine) ;

    return 0 ;
}

The code above is about cleaning the buffer from '\n' until EOF ( EOF
= CTRL-Z or D ?)

I understand broadly the buffer is about memory management and is
closer to computer instructions from the processor than RAM. Anything
can go into the buffer : video streaming, printer jobs.

Fgets unlike scanf has a limit in the number of characters entered ?
However, like scanf, fgets reads and holds '\n'. I'm not sure of the
last one, if someone could confirm it would help.

while ( ( c = getchar() ) != '\n' && c != EOF ) ;
tells to read any newline left (here it would be located in array
"chaine" in main) after the user has entered more or less 20
characters and press ENTER... So, fgets reads <ENTER> or '\n' or EOF
if greater than 20 characters and keeps it in the buffer? So in the
next entry, if '\n' is in left in the buffer, the code will skip the
user's entry.
Then the instruction getchar() tells to grab anything left in the
buffer but '\n', assigns it to a variable c and in the meantime
somehow creates a new buffer that could hold any characters but '\n'.

char *p = strchr(chaine, '\n') ;
strchr returns the position of '\n' in chaine to *p . As i understand
from reading about strchr in webpages, char *p is an array of
characters ? And in this code it will only hold one value : '\n'.

if (p)
It tells to proceed if p is anything but empty or null. And if it's
not null, it assigns 0 in place of '\n'.

purger()
If *p is null, then it looks to empty the buffer from the presence of
'\n' and to do so purger() is called.

I'm not sure if what i understand is correct. I think this code is a
milestone for manipulating text entries and i intend to use it from
now on unless there is something better.

I feel sleepy and i hope what i have written makes some sense.

Thanks,

Pascal
 
bpascal123

Ben Bacarisse
Newsgroups: comp.lang.c
From: Ben Bacarisse <[email protected]>
Date: Thu, 20 Aug 2009 02:41:10 +0100
Subject: Re: Can C do it ?

<snip>

> The code above is about cleaning the buffer from '\n' until EOF ( EOF
> = CTRL-Z or D ?)

When there is no more input, getchar returns EOF (a negative integer)
to tell you that there is no more input. When you are typing at a
program you can signal that there is to be no more input using the
keyboard, but it is often configurable and it varies from system to
system. Ctrl+Z and Ctrl+D are common.

> I understand broadly the buffer is about memory management and is
> closer to computer instructions from the processor than RAM. Anything
> can go into the buffer : video streaming, printer jobs.

That is so general it won't help you here. This is all about an input
buffer. Most systems let you prepare a line and send it to your
program when you hit return. This lets you correct errors before the
program sees it. The code you show is all about reading to the end of
a line (usually one full input buffer).

> Fgets unlike scanf has a limit in the number of characters entered ?
> However, like scanf, fgets reads and holds '\n'. I'm not sure of the
> last one, if someone could confirm it would help.

fgets does (read and store the newline). scanf is very general and
can do either.
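
A small sketch, under stated assumptions, of how one might check this for oneself. The helper name line_has_newline is hypothetical; it reads from any FILE * so it can be tried on a file as well as on stdin. It relies only on the documented behaviour of fgets: at most size - 1 characters are stored, and the newline is stored if it was reached.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one chunk with fgets and reports whether the
   stored text contains '\n'. Returns 1 if it does, 0 if it does not
   (the line was longer than the buffer, or the input ended without a
   newline), and -1 if nothing could be read. */
int line_has_newline(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return -1;
    return strchr(buf, '\n') != NULL;
}
```

With an 8-character buffer, the line "hi" is stored as "hi\n", while a longer line such as "abcdefghijkl" is stored as "abcdefg" with the remainder left unread.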

> while ( ( c = getchar() ) != '\n' && c != EOF ) ;
> tells to read any newline left (here it would be located in array
> "chaine" in main) after the user has entered more or less 20
> characters and press ENTER... So, fgets reads <ENTER> or '\n' or EOF
> if greater than 20 characters and keeps it in the buffer? So in the
> next entry, if '\n' is in left in the buffer, the code will skip the
> user's entry.
> Then the instruction getchar() tells to grab anything left in the
> buffer but '\n', assigns it to a variable c and in the meantime
> somehow creates a new buffer that could hold any characters but
> '\n'.

This sounds wrong, though it may simply be that you are having trouble
saying it in English. The page you link to explained it all quite
well, I thought, so I am wary of trying to do better in a language
that is not your own.

> char *p = strchr(chaine, '\n') ;
> strchr returns the position of '\n' in chaine to *p . As i understand
> from reading about strchr in webpages, char *p is an array of
> characters ? And in this code it will only hold one value : '\n'.
> if (p)
> It tells to proceed if p is anything but empty or null. And if it's
> not null, it assigns 0 in place of '\n'.
> purger()
> If *p is null, then it looks to empty the buffer from the presence of
> '\n' and to do so purger() is called.
> I'm not sure if what i understand is correct. I think this code is a
> milestone for manipulating text entries and i intend to use it from
> now on unless there is something better.
> I feel sleepy and i hope what i have written makes some sense.

It is easier to explain backwards rather than in the order you show
the code. The user types some text. You read it using fgets. If it
fits (i.e. the input is not too long) then strchr finds a \n in the
array, which gets replaced by the string terminator (0). If the line
is too long for the array, then strchr finds no \n, but you (the user)
have typed more characters than have been read, so the program reads
and discards the characters that are left (i.e. up to the \n that
fgets never saw).

This is only one way to read input. It is useful in some programs and
harmful in others. It all depends on what the program is for. I'd
suggest that throwing input away like this is rarely the best thing to
do, but it can help in simple interactive programs.
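
The steps described above can be put together in one function. This is only a sketch: the name read_line, its return convention, and the choice to take a FILE * parameter (rather than always using stdin as the quoted code does) are assumptions made for illustration. Unlike the quoted program, it also checks the result of fgets.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the thread's code: reads one line from in
   into buf (capacity size) and strips the trailing '\n'.
   Returns 1 if a complete line was stored, 0 if no newline was found
   (the line was too long and its excess was discarded, or the input
   ended without a newline), and -1 if nothing could be read. */
int read_line(char *buf, size_t size, FILE *in)
{
    char *newline;
    int c;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    newline = strchr(buf, '\n');
    if (newline != NULL) {
        *newline = '\0';        /* the line fit: drop the newline */
        return 1;
    }

    /* No newline stored: consume the rest of the line, stopping at EOF
       too, so that the next read starts at the following line. */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;
    return 0;
}
```

A caller might write `if (read_line(name, sizeof name, stdin) == 0)` to warn the user that the entry was cut short. Whether discarding the excess is appropriate depends on the program, as noted above.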
 
bpascal123

> This is only one way to read input. It is useful in some programs and
> harmful in others. It all depends on what the program is for. I'd
> suggest that throwing input away like this is rarely the best thing to
> do, but it can help in simple interactive programs.

Hi Ben,

You say there are other ways to do the same thing. This way is not
easy to deal with, given only a basic knowledge of getchar and the
buffer. I found an explanation here :
http://ubuntuforums.org/showthread.php?t=1059917

However, why isn't it simpler to do :

....
char Txt[20] ;
printf("\nEnter some string : ") ;
fgets(Txt, 20, stdin) ;
cnt = strlen(Txt) ;
...

The last iteration of cnt will always be '\0' ?
So why use getchar() != '\n' when fgets with strlen doesn't store '\n' ?
I don't know if EOF would be read by strlen since I don't know much
about the behavior of EOF.

Thanks,
Pascal
 
bpascal123

user923005
Newsgroups: comp.lang.c
From: user923005 <[email protected]>
Date: Thu, 20 Aug 2009 14:17:31 -0700 (PDT)
Subject: Re: Can C do it ?
On Aug 20, 1:40 pm, "(e-mail address removed)"

> You say there are other ways to do the same thing. This way is not
> easy to deal with with little a basic knowledge of getchar and the
> buffer. I found an explanation there : http://ubuntuforums.org/showthread.php?t=1059917
> However, why isn't more simple to do :
> ...
> char Txt[20] ;
> printf("\nEnter some string : ") ;
> fgets(Txt, 20, stdin) ;
> cnt = strlen(Txt) ;
> ..
> The last iteration of cnt will always be '\0' ?
> So why use getchar() != '\n' when fgets with strlen don't store '\n'.
> I don't know if EOF would be read by strlen since i don't know much
> about the behavior of EOF.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char Txt[20];
    size_t cnt;

    printf("\nEnter some string : ");
    if (fgets(Txt, (int) sizeof Txt, stdin)) {
        cnt = strlen(Txt);
        /* Do more stuff here... */
    } else {
        /*
           It is possible that the read fails,
           in which case Txt might contain unterminated garbage.
        */
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
 
bpascal123

Flash Gordon said:
(e-mail address removed) wrote:
Someone else pointed out that you can overflow the buffer here. I don't
know about French, but in Britain there are place names a lot more than
20 characters.

He's not reading place names, he's reading verbs. AFAICT there are no
infinitive forms of verbs that are longer than 20 characters in
French.
It's still not wise, though, partly because some come close and a typo
is easily made, and partly because someone might feed it something
which isn't a verb at all, intentionally or by accident.

Richard
 
bpascal123

It's best to snip sig blocks (even tiny ones).

> You say there are other ways to do the same thing. This way is not
> easy to deal with with little a basic knowledge of getchar and the
> buffer. I found an explanation there :
> http://ubuntuforums.org/showthread.php?t=1059917
> However, why isn't more simple to do :
> ...
> char Txt[20] ;
> printf("\nEnter some string : ") ;
> fgets(Txt, 20, stdin) ;

Two things: (a) it is good to get into the habit of removing arbitrary
numbers from your code so I'd write fgets(Txt, sizeof Txt, stdin) so I
don't need to repeat the 20; (b) always check the return from input
functions -- one day it will save your life.

A good programmer feels nervous about seeing:
cnt = strlen(Txt) ;

after an input call that might not have done anything at all.

> The last iteration of cnt will always be '\0' ?

Yes, and something like the code above is fine if it suits your
purpose, but the whole point of the other code was to deal with the
situation where the input does not fit. The remaining input
characters are still there in the input buffer and this is, sometimes,
undesirable (sometimes it is exactly what you want).

> So why use getchar() != '\n' when fgets with strlen don't store '\n'.

If my program prompts for my name and then my age:

Enter your name: Benjamin Salvador Bacarisse
Enter your age:
Sorry, "acarisse" is not a valid age.

The simple fgets stopped after 19 chars (it used the 20th for the \0)
and when the program tried to get the age (maybe using scanf("%d",
&age)) the scanf saw the remaining characters.

If, after reading the name, I check if there was a newline (maybe
using strchr(Txt, '\n')) I can take action when the input is obviously
too long (i.e. when there is no newline in Txt). The action I need
might be to throw the data away:

int c;
while ((c = getchar()) != '\n') /* do nothing */;

but this loop won't ever end if there is no newline (for example if I
enter "Ben^D^D" on my Linux system). Hence we really need:

while ((c = getchar()) != '\n' && c != EOF) /* do nothing */;
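
To see the name-then-age problem in action, here is a sketch under stated assumptions. The function read_name_and_age and its drain flag are invented for this illustration; it reads from a FILE * so that the same input can be replayed with and without the discarding loop.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a name (at most size - 1 characters) and then
   an age from in. If drain is nonzero, the unread rest of an over-long
   name line is discarded before the age is read.
   Returns 1 if the age was read successfully, 0 otherwise. */
int read_name_and_age(FILE *in, char *name, int size, int *age, int drain)
{
    int c;

    if (fgets(name, size, in) == NULL)
        return 0;

    if (strchr(name, '\n') == NULL && drain) {
        while ((c = getc(in)) != '\n' && c != EOF)
            ;   /* do nothing */
    }
    return fscanf(in, "%d", age) == 1;
}
```

With a 20-character name buffer and a longer name on the first line, the call with drain set to 0 fails to read the age because the conversion meets the leftover letters of the name; with drain set to 1 the age on the second line is read normally.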

> I don't know if EOF would be read by strlen since i don't know much
> about the behavior of EOF.

EOF is not a character. It does not get put in the string (in fact it
can't be put into a string[1]). EOF is an int (not a char) designed
so that no valid character can be confused with it[1]. When you type
^D or ^Z or whatever, the program never sees this[2] -- it is an
instruction to the operating system to close the program's input
stream. Since stdin is closed, the next getchar() returns the special
(negative) value EOF.

[1] Except on peculiar hardware that beginners should definitely put
to one side for a while.

[2] Except on very old operating systems that are best forgotten.
 
bpascal123


Hi,

Your reply tells a lot, Ben. Many thanks. I started to write the post
below before I read yours. I already have the answer from you to
most of the questions, so what follows is additional information.

About getchar() != '\n': it seems gcc under Linux needs this more than
gcc (djgpp) in Windows, although gcc on Linux and gcc or djgpp on Windows
should be the same.

From http://ubuntuforums.org/showthread.php?t=1059917, someone says :

"while (getchar() != '\n');
is that the test - (getchar() != '\n') - gets executed each time you
go round the loop, so this fetches a character and tests it against
'\n', and goes back round to execute the condition again if the first
test fails (i.e. the character is '\n'). One of the key things here is
the semi-colon at the end of the statement - which effectively creates
an empty loop - so the condition (including the getchar() function)."

End of quotes (he and i are beginners)

So far, I understand '\n' is located somewhere in the buffer among
other characters that don't impact future input from stdin.

And as it is said above, getchar looks for characters (that are in
the buffer ?). What action does it take then ? It reads, memorizes or I
don't know what, these characters from the buffer and skips '\n'.

So in the end, we can say getchar() != '\n' CREATES and fills a new
buffer from reading the current one with while. That new buffer
doesn't include '\n'. I'm not sure of this assertion.

Not over, i don't think i'm right from what i have just written. It's
all i can understand. But next it gets worse :

int c ;
while ( ( c = getchar() ) != '\n' && c != EOF ) ;

getchar is supposed to work with characters ? Why c is declared as an
int.?

As Ben has said above (in the previous post), EOF is a number (-1 I
think) . So is it why we need an integer variable (here c) ? (Only to
deal with EOF)

I went through http://c-faq.com/~scs/cclass/notes/sx6c.html 6.2
section deals well with getchar but more with EOF than '\n'.

Variable c gets a char value from getchar. Does it then, for each
character it can get, fill a new buffer with anything but '\n' and
assign that very current character to variable c ?
Why is a variable needed here ? Just to secure the not-equal
expression ?

6.2 section from the site above says :
Quote :
"The simple example we've been discussing illustrates the tradeoffs
well. We have four things to do:

1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it out again).

We can't eliminate any of these steps. We have to assign getchar's
value to a variable (we can't just use it directly) because we have to
do two different things with it (test, and print). Therefore,
compressing the assignment and test into the same line is the only
good way of avoiding two distinct calls to getchar. You may not agree
that the compressed idiom is better for being more compact or easier
to read, but the fact that there is now only one call to getchar is a
real virtue.

Don't think that you'll have to write compressed lines like

while((c = getchar()) != EOF)

right away, or in order to be an ``expert C programmer.'' But, for
better or worse, most experienced C programmers do like to use these
idioms (whether they're justified or not), so you'll need to be able
to at least recognize and understand them when you're reading other
peoples' code."

End of quote

It says "we have to do two different things with it (test, and
print)". I think this assertion is closely related to the code they
run and explain in the same page.

However, with '\n', we just need to test it and not to print it ? So
why is a int variable needed ? The question should be how come int
variable c can be both in getchar() and !=EOF ?

What's the point of testing a int variable against '\n' ? '\n' is a
character, isn't it?

Pascal
 
bpascal123

Morris Keesan
Newsgroups: comp.lang.c
From: "Morris Keesan" <[email protected]>
Date: Fri, 21 Aug 2009 01:11:05 GMT
Subject: Re: Can C do it ?
On Thu, 20 Aug 2009 20:44:19 -0400, (e-mail address removed)

[lots of stuff asking about getchar(), and testing for EOF and '\n']

int c;
while (((c = getchar()) != EOF) && (c != '\n')) {}

(equivalent to what you wrote, but slightly repunctuated for my
preferences: extra parentheses for clarity, and {} to make it
more obvious that it's an empty while loop).

Yes, the reason c is declared as int is so that we can test it
against EOF, which is guaranteed to be a value different than
anything which can be represented by a char. (Usually implemented
as -1, but that value is not required by any standard).

The reason we do this instead of simply

while (getchar() != '\n') {}

is that it's possible that your standard input stream might end
with no newline character.
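
A short sketch of that point, with an invented function name (skip_line) and a FILE * parameter added so it can be tried on a file. It uses the same loop as above and returns whatever ended it, so a caller can tell a newline from the end of input.

```c
#include <stdio.h>

/* Hypothetical helper: discards characters up to and including the next
   '\n'. Returns '\n' if a newline was found, or EOF if the stream ended
   (or failed) first. Without the EOF test the loop would never finish on
   input that ends with no newline. */
int skip_line(FILE *in)
{
    int c;

    while (((c = getc(in)) != EOF) && (c != '\n')) {}
    return c;
}
```

On input consisting of "abc", a newline, and then "xyz" with no final newline, the first call returns '\n' and the second returns EOF instead of looping forever.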
 
bpascal123

[...]
> About getchar() != '\n' it seems gcc under linux needs this more than
> gcc(djgpp) in Windows. Although gcc linux and gcc or djgpp in windows
> should be the same.

Yes, they should be the same. The external representation of text
files differs between Linux and Windows, but the end-of-line sequence
should be converted to '\n' on either system. I don't understand what
you mean when you say that one system "needs this more" than the
other.

[...]
> And as it is said above, the getchar looks for characters (that are in
> the buffer ?). What action does it do then ? It reads, memorized or i
> don't know what, these characters from the buffer and skips '\n'.
> So at the end, we can say getchar() != '\n' CREATES and fills a new
> buffer from reading the current one with while. That new buffer
> doesn't include '\n'. I'm not sure of this assertion.

There is buffering going on, but it's not something you need to worry
about. getchar() reads the next character from the specified input
stream and returns its value (or returns the value of EOF if there are
no more characters to read). '\n' is just another character as far as
getchar() is concerned.

> Not over, i don't think i'm right from what i have just written. It's
> all i can understand. But next it gets worse :
> int c ;
> while ( ( c = getchar() ) != '\n' && c != EOF ) ;
> getchar is supposed to work with characters ? Why c is declared as an
> int.?

Because getchar() returns an int result.

Suppose you're designing a function that reads the next character from
an input stream and gives you its value. The obvious thing to do is
to return a char value, but that's not enough. You also need to be
able to indicate that there are no more characters to be read. So
it's returning two pieces of information: the next character (if any)
and an indication of whether there was a next character.

There are several ways this could be done. It could return a
structure containing a char and some sort of flag. It could take a
pointer argument, and store the flag value via that pointer. It could
store the flag in a global (ick!).

The method chosen by the creators of getchar() was something called
in-band signalling: returning a single result that's big enough to
store *either* a character value or a flag value.

If there is another character to be read, getchar() reads it, treats
it as an unsigned char value (so it's guaranteed to be non-negative),
converts that value to int, and returns it. If there isn't (either
because you've reached the end of the file or because there was an
error), getchar returns the value EOF (typically -1, but the only
guarantee is that it's negative). Assuming that INT_MAX >= UCHAR_MAX,
this single result gives you all the information you need.
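
As a hedged illustration of why the result is stored in an int, the sketch below counts bytes in a stream. The function name count_bytes is made up for this example, and it assumes the usual case where INT_MAX >= UCHAR_MAX. Because each byte comes back as an unsigned char value converted to int, a byte such as 0xFF is returned as 255 and cannot be mistaken for EOF. If c were declared as a plain char on a platform where char is signed and EOF is -1, that byte could compare equal to EOF and end the loop early.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in a stream, storing each getc()
   result in an int so that every byte value (0..UCHAR_MAX) stays distinct
   from EOF. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;              /* int, not char: must be able to hold EOF too */

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```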

> As Ben has said above (in the previous post), EOF is a number (-1 I
> think) . So is it why we need an integer variable (here c) ? (Only to
> deal with EOF)

Yes, only to deal with EOF. If it weren't for the need to store that
extra information, getchar() could return a char.

But note that char is also an integer type; it's just a relatively
narrow one, typically used to hold character codes. C represents
characters as numbers.

[...]
> What's the point of testing a int variable against '\n' ? '\n' is a
> character, isn't it?

'\n' is a character. It's also an integer. On most systems you're
likely to encounter, it's the ASCII linefeed character, so '\n'==10.
If getchar() reads a newline from its input stream, it returns the
value '\n'.

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do
this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
bpascal123

James Kuyper
Newsgroups: comp.lang.c
From: James Kuyper <[email protected]>
Date: Fri, 21 Aug 2009 11:32:28 GMT
Subject: Re: Can C do it ?
Morris Keesan wrote:

....
> Yes, the reason c is declared as int is so that we can test it
> against EOF, which is guaranteed to be a value different than
> anything which can be represented by a char.

The standard requires that all members of the basic character set are
represented by non-negative values, but allows char to be signed, and
also allows for an extended character set which may contain characters
whose values could be negative.
The standard does not require that EOF be distinct from all char values.
For a fully conforming implementation which chooses to have INT_MAX <
UCHAR_MAX, if you use getc() to read a byte from the input stream whose
value is (unsigned char)EOF, it must return EOF.
 
bpascal123

Consider this program:

#include <stdio.h>

int main(void) {
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}

There is nothing special about getchar(). It is exactly getc(stdin) just
as putchar(c) is putc(c, stdout). Both input and output are line
buffered and there are two separate buffers. Let's consider the input
buffer.

When we start the program without arguments, the input buffer is empty
and the call to getchar() blocks, waiting for data from the underlying
buffered i/o system.

As you now type 'Hello There' on your keyboard, getchar() is still
blocked and sees nothing. You can use some editing features to change
'There' to 'Sweetheart' for example. getchar() is still blocking until
you press Enter. This puts '\n' as the last character of the buffer and
un-blocks the buffer. Now all characters in the buffer, including the
'\n', are sent, one at a time, to getchar().

The buffer is effectively emptied in this process. There is no Standard
way to examine the buffer itself.

When the buffer is empty Unix-like systems interpret ^D as EOF to quit
the program. Windows interprets ^Z the same way.
 
bpascal123

Keith Thompson
> When the buffer is empty Unix-like systems interpret ^D as EOF to quit
> the program. Windows interprets ^Z the same way.

I think it's clearer to say that they interpret ^D or ^Z as an
end-of-file indicator. The word EOF is certainly derived from the
words End Of File, but EOF has a very specific meaning in C, a macro
defined in <stdio.h>, and it's best (especially when communicating
with newbies) to use the term EOF only to refer to that macro or to
its value.

When the user types ^D or ^Z, it's interpreted by (some layer of) the
OS as an end-of-file indication. If a C program is reading from the
corresponding input stream by repeatedly calling getchar(), after the
getchar() has processed the last character that preceded the ^D or ^Z,
the *end-of-file indicator* for the stream is set. When getchar() is
called on a stream whose end-of-file indicator is set, it returns the
value of EOF.

Note that neither ^D (ASCII value 4), nor ^Z (ASCII value 26), nor EOF
(a negative int value, typically -1) is (necessarily) ever stored in a
file. ^D and ^Z are used to signal to the OS that no more data from
the keyboard is to be written to the file. EOF is returned by
getchar() to indicate that there's no more data to be read from the
standard input stream. They're both a form of in-band signalling.

(Just to confuse matters, a ^Z character physically stored in an
MS-DOS or Windows text file *can* be used to indicate the logical end
of the file -- but it's more common for the end of a file to be
defined just by the lack of any more data, as indicated by the file
system's stored value for the size of the file in bytes.)
More options Aug 22, 5:26 am
Newsgroups: comp.lang.c
From: Keith Thompson (e-mail address removed)
Date: Fri, 21 Aug 2009 20:26:50 -0700
Local: Sat, Aug 22 2009 5:26 am
Subject: Re: Can C do it ?

[...]
When the buffer is empty Unix-like systems interpret ^D as EOF to quit
the program. Windows interprets ^Z the same way.

I think it's clearer to say that they interpret ^D or ^Z as an
end-of-file indicator. The word EOF is certainly derived from the
words End Of File, but EOF has a very specific meaning in C, a macro
defined in <stdio.h>, and it's best (especially when communicating
with newbies) to use the term EOF only to refer to that macro or
to its value.

When the user types ^D or ^Z, it's interpreted by (some layer of)
the OS as an end-of-file indication. If a C program is reading
from the corresponding input stream by repeatedly calling getchar(),
after the getchar() has processed the last character that preceded
the ^D or ^Z, the *end-of-file indicator* for the stream is set.
When getchar() is called on a stream whose end-of-file indicator
is set, it returns the value of EOF.

Note that neither ^D (ASCII value 4), nor ^Z (ASCII value 26),
nor EOF (a negative int value, typically -1) is (necessarily)
ever stored in a file. ^D and ^Z are used to signal to the OS
that no more data from the keyboard is to be written to the file.
EOF is returned by getchar() to indicate that there's no more data
to be read from the standard input stream. They're both a form of
in-band signalling.

(Just to confuse matters, a ^Z character physically stored in an
MS-DOS or Windows text file *can* be used to indicate the logical
end of the file -- but it's more common for the end of a file to
be defined just by the lack of any more data, as indicated by
the file system's stored value for the size of the file in bytes.)

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
P

pascal360

Gordon Burditt
Newsgroups: comp.lang.c
From: (e-mail address removed) (Gordon Burditt)
Date: Fri, 21 Aug 2009 22:32:40 -0500
Local: Sat, Aug 22 2009 5:32 am
Subject: Re: Can C do it ?
Under windows, i get : the same line over and over saying something
with Null ... and a strange number, the same over and over as if it's
a result from the loop in the code.

Does that error appear when you RUN the code or COMPILE the code?
Spell the error messages exactly as they appear and post them.
Loops in the code don't cause looping error messages during compilation.
(At least not in non-seriously-broken compilers.)

Under windows, i tried to add this line of code several times in the
code where the system expects an entry : while ( getchar() != '\n' ) {}

If you tried and failed to add this line of code, please explain
what error message your editor and/or compiler produced when you
failed. Did you run out of disk space for the source code?

I think if someone can tell me why these lines of code don't compile
in my Windows system (i'm using djgpp), it will be a great step for
understanding C.
Now again about getchar. I'd like to understand more about the part
where the user enters data with getchar() handling it like in the code
above. The user enters a character or a string of characters and hits
enter and (in linux) the string or characters entered are displayed.

stdin is not always connected to a terminal device (often, a terminal
device is the console keyboard, but it can be a serial terminal or
a virtual terminal over a network). Even Windows allows redirection
of stdin from a file. Terminal devices are not always read via
stdin. Some of the behavior you are seeing comes from a terminal
device driver. Some comes from the C library stdio code.

There are many potential buffers involved. If stdin is connected
to a virtual terminal connected to a physical keyboard over a
network, there are many buffers on the two end machines and possibly
a few buffers in each of possibly dozens of computers, VPN servers,
routers, wireless access points, or switches between them.

When the user enters a character, stdin places it into the buffer,
and so on for each character entered until <Enter> is pressed.
Let's say I entered "bcd<Enter>". Does it go into the buffer like
'b''c''d''\n' ?

The Linux terminal driver does line canonicalization (in the normal
mode it's used - to change mode you need OS-specific C code). A
line of input is collected, and nothing is passed on until the
entire line is finished (the user presses a key usually labelled
<ENTER>). Line editing is handled here: if you enter <backspace>
the line canonicalization process removes the previous character.
There is a character to cancel the line and start over. So, if you
type: b Q <backspace> c d <ENTER> after you press 'd', no data has
been sent on, and after you press <ENTER>, b c d \n is sent on.
Line canonicalization needs some kind of buffer to store a whole
line until it's finished.

I am not sure where line canonicalization is done in Windows.

Line canonicalization is done on terminal input, not files. Backspaces
in files will be read as backspaces.

The C library is responsible for translating line endings (\r\n in
Windows) in files or terminal input to the C standard \n, and
translates the other way on the way out. This happens whether you
are reading a terminal or a file on stdin. The C library, for
efficiency reasons, will generally grab a bunch of input (a disk
block or a whole line released from line canonicalization) and
dispense it in single-character chunks (if getchar() or getc() is
used), or smaller pieces (if fread() is used for small records).
Then it reads first 'b' and what does it do with it ?

stdio reads a bunch of input (typically a whole line or file block)
after <ENTER> is pressed. getchar() takes the first available
character ('b') and returns it. The next getchar() returns 'c'.
The next one returns 'd'. The next one returns a newline (\n).
For the next one the stdio buffer is now empty, so it waits while
reading another bunch of input.
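
The same one-character-at-a-time consumption can be used to throw away
the rest of a line. Note that the loop quoted earlier,
while ( getchar() != '\n' ) {}, never terminates if input ends before a
newline arrives, because getchar() then keeps returning EOF. A sketch
that also stops at EOF (discard_rest_of_line is a name invented here):

```c
#include <stdio.h>

/* Read and discard characters up to and including the next '\n',
   or until input ends. Returns how many characters were discarded. */
int discard_rest_of_line(FILE *fp)
{
    int c;
    int discarded = 0;

    while ((c = getc(fp)) != EOF) {
        discarded++;
        if (c == '\n')
            break;              /* the newline itself is consumed too */
    }
    return discarded;
}
```

With "bcd\n" waiting in the stdio buffer, one getchar() followed by
discard_rest_of_line(stdin) would leave the buffer empty.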
Does it just leave it in the buffer as it is since it's not '\n' ...
and right after putchar() prints it on the screen, it does the same
for 'c' until getchar() == '\n' or getchar() = '\n' (I don't know which
one is correct: getchar() == or getchar() = ).

"Until getchar() = 'c'" is wrong in C code. = is an assignment
operator, and you can't assign to a function return value like that.
If you want to step outside of the C code to talk ABOUT the code,
and use it as a short cut for saying "Until getchar() returns a
value of 'c'", most people but the hard-core pedants will understand
you. I'd still prefer to use "Until getchar() == 'c'".
Does Getchar() read and take '\n' from its location in the buffer ?

It's getchar(), not Getchar(). C function names are case-sensitive.
Where does '\n' go then ?

It is returned by getchar().
Getchar could be seen as "search and destroy" in the buffer (stdin
stream) for any value equal to '\n' otherwise it just leave it as it
is ?

No. Line canonicalization accumulates an entire line, then passes
it on, before any of it gets to the stdio buffer.

getchar() takes a character from the stdio buffer, updates the
bookkeeping to indicate the character has been used, and returns
it. If there are no characters in the stdio buffer, it will get
some, waiting if necessary.
Is there a way to access and take a picture of the buffer ?

Not in Standard C. There isn't a "THE" buffer, there are several.
Some of them are on other machines. It may be possible to write a
system-specific routine for the stdio buffer, which grunges around
inside a FILE object for stdin and determines what part of the
buffer is still in use. If you look at the contents of a FILE
struct and the code for getchar() and the functions it calls, it
shouldn't be too difficult. Chances are this will not be portable
between Windows and Linux.
Let's say, with hex values as descriptions ?

Does a printf() format of "%2.2x" suggest anything?
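
One way to read that hint, as a sketch only: format each byte that was
read with "%2.2x". The function name format_hex and its buffer-based
interface are assumptions made for this illustration; sprintf() is
standard.

```c
#include <stdio.h>
#include <stddef.h>

/* Write each byte of buf into out as two hex digits plus a space.
   Stops early if out has no room for another byte.
   Returns the number of characters written, not counting the '\0'. */
size_t format_hex(const unsigned char *buf, size_t len,
                  char *out, size_t outsize)
{
    size_t used = 0;
    size_t i;

    if (outsize == 0)
        return 0;
    out[0] = '\0';
    /* each byte needs 3 characters plus the terminating '\0' */
    for (i = 0; i < len && used + 4 <= outsize; i++)
        used += (size_t)sprintf(out + used, "%2.2x ", (unsigned)buf[i]);
    return used;
}
```

On an ASCII system, the four characters of "bcd\n" would come out as
62 63 64 0a, which makes the newline visible.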
 
P

pascal360

Joe Wright
Newsgroups: comp.lang.c
From: Joe Wright (e-mail address removed)
Date: Sat, 22 Aug 2009 01:03:47 -0400
Local: Sat, Aug 22 2009 7:03 am
Subject: Re: Can C do it ?
I think it's clearer to say that they interpret ^D or ^Z as an
end-of-file indicator. The word EOF is certainly derived from the
words End Of File, but EOF has a very specific meaning in C, a macro
defined in <stdio.h>, and it's best (especially when communicating
with newbies) to use the term EOF only to refer to that macro or
to its value.

Noted. I'll spell out End Of File in future.
When the user types ^D or ^Z, it's interpreted by (some layer of)
the OS as an end-of-file indication. If a C program is reading
from the corresponding input stream by repeatedly calling getchar(),
after the getchar() has processed the last character that preceded
the ^D or ^Z, the *end-of-file indicator* for the stream is set.
When getchar() is called on a stream whose end-of-file indicator
is set, it returns the value of EOF.

The end-of-file indicator is set differently for a file or for the
keyboard. And yes, the I/O system knows which. Sending a line to
getchar() ending with '\n' does not set the end-of-file (eof) indicator.
The eof gets set only when you attempt to read beyond the last byte of a
file or when, from the keyboard, ^D or ^Z follows '\n'.
Note that neither ^D (ASCII value 4), nor ^Z (ASCII value 26),
nor EOF (a negative int value, typically -1) is (necessarily)
ever stored in a file. ^D and ^Z are used to signal to the OS
that no more data from the keyboard is to be written to the file.
EOF is returned by getchar() to indicate that there's no more data
to be read from the standard input stream. They're both a form of
in-band signalling.

The ^D or ^Z keypresses immediately following '\n', followed by '\n' or
Enter is the keyboard version of end-of-file. Otherwise 4 and 26 are
just charters with that value.
(Just to confuse matters, a ^Z character physically stored in an
MS-DOS or Windows text file *can* be used to indicate the logical
end of the file -- but it's more common for the end of a file to
be defined just by the lack of any more data, as indicated by
the file system's stored value for the size of the file in bytes.)

The ^Z written as a character to signify the end of a text file is an
artifact of CP/M carried over into early MSDOS. CP/M wrote files in
128-byte chunks. The filesystem couldn't tell you whether a file was 3
bytes or 123 bytes. You needed to read it and stop at the 1A.

Relatively modern systems retain this legacy. Many Microsoft text files
still end with 1A for no other obvious reason.
 
P

pascal360

Joe Wright said:
Keith Thompson wrote: [...]
When the user types ^D or ^Z, it's interpreted by (some layer of)
the OS as an end-of-file indication. If a C program is reading
from the corresponding input stream by repeatedly calling getchar(),
after the getchar() has processed the last character that preceded
the ^D or ^Z, the *end-of-file indicator* for the stream is set.
When getchar() is called on a stream whose end-of-file indicator
is set, it returns the value of EOF.
The end-of-file indicator is set differently for a file or for the
keyboard. And yes, the I/O system knows which. Sending a line to
getchar() ending with '\n' does not set the end-of-file (eof)
indicator. The eof gets set only when you attempt to read beyond the
last byte of a file or when, from the keyboard, ^D or ^Z follows '\n'.

A line ending with '\n' doesn't signal end-of-file for a disk file
either.

The input model, as far as getchar() is concerned, is pretty much
the same either for a disk file or for input from a keyboard.
Both are modeled as input text streams, and both are composed of
a sequence of lines, each terminated by '\n'. The end-of-file
condition is triggered differently: for a disk file, typically as
defined by the filesystem's idea of how many bytes the file contains,
and for keyboard input, typically by the user entering some special
key combination. But the behavior as seen by a C program calling
getchar() is very similar. (The C implementation, including the OS,
goes to considerable effort to make them look similar.)
The ^D or ^Z keypresses immediately following '\n', followed by '\n'
or Enter is the keyboard version of end-of-file. Otherwise 4 and 26
are just charters with that value.

s/charters/characters/

Note also that, under Unix, typing ^D twice in the middle of a line
also signals an end-of-file condition; getchar() can then return
EOF without having previously returned '\n'. And ^V^D causes
getchar() to return '\004'. (Both ^V and ^D can be reconfigured
to other values.) I'm less familiar with how Windows does this.
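
Since getchar() can return EOF without a preceding '\n', code that
processes input line by line should not assume the last line is
terminated. A small sketch of a line counter that allows for this
(count_lines is a hypothetical name):

```c
#include <stdio.h>

/* Count the lines on fp. A final line that ends at end-of-file
   without a '\n' still counts as a line. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;
    int last = '\n';            /* so that empty input yields 0 lines */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;                /* unterminated final line */
    return lines;
}
```

The same function works whether fp is a disk file or stdin attached to
a keyboard, which is the point of the common stream model described
above.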
The ^Z written as a character to signify the end of a text file is an
artifact of CP/M carried over into early MSDOS. CP/M wrote files in
128-byte chunks. The filesystem couldn't tell you whether a file was 3
bytes or 123 bytes. You needed to read it and stop at the 1A.
Relatively modern systems retain this legacy. Many Microsoft text
files still end with 1A for no other obvious reason.

And, IIRC, a 1A byte in the middle of a Windows text file is treated
as an end-of-file marker (but only if you read it in text mode;
in binary mode it's just another byte).
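
Assuming the behaviour described above, a program that wants to see
every byte, including any 1A, would read the file in binary mode. A
sketch that reports where the first 1A byte sits; find_ctrl_z is a name
invented for this example, and the caller is assumed to have opened the
stream with "rb":

```c
#include <stdio.h>

/* Return the zero-based offset of the first 0x1A byte read from fp,
   or -1 if the stream ends without one. fp should be open in binary
   mode so that the library does no translation. */
long find_ctrl_z(FILE *fp)
{
    long offset = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == 0x1A)
            return offset;
        offset++;
    }
    return -1;
}
```

Reading continues normally after the 1A byte in binary mode, so any data
stored beyond it remains reachable.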

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
P

pascal360

Keith Thompson said:
Note also that, under Unix, typing ^D twice in the middle of a line
also signals an end-of-file condition; getchar() can then return
EOF without having previously returned '\n'. And ^V^D causes
getchar() to return '\004'. (Both ^V and ^D can be reconfigured
to other values.) I'm less familiar with how Windows does this.

Erk. ^D^D /may/ signal to the tty that it is to report on an e-o-f
condition to the program attached to it. On my Unix consoles here,
it certainly doesn't:

FreeBSD 5.3 zsh - mid-line ^D tab completes, ^D^D backs out
SunOS 5.8 csh - mid-line ^D tab completes, ^D^D duplicates.
SunOS 5.9 bash - mid-line ^D does nothing perpetually
Linux bash - mid-line ^D does nothing perpetually

So I can't reproduce it on a console. (It may behave differently
via a telnet or SSH session, perhaps, as there are two ends to
consider.)

Phil
--
 
P

pascal360

Richard Tobin
Newsgroups: comp.lang.c
From: (e-mail address removed) (Richard Tobin)
Date: 22 Aug 2009 10:05:49 GMT
Local: Sat, Aug 22 2009 12:05 pm
Subject: Re: Can C do it ?
Phil Carmody said:
Erk. ^D^D /may/ signal to the tty that it is to report on an e-o-f
condition to the program attached to it. On my Unix consoles here,
it certainly doesn't:
FreeBSD 5.3 zsh - mid-line ^D tab completes, ^D^D backs out
SunOS 5.8 csh - mid-line ^D tab completes, ^D^D duplicates.
SunOS 5.9 bash - mid-line ^D does nothing perpetually
Linux bash - mid-line ^D does nothing perpetually

All those shells are probably putting the terminal in raw mode (or at
least a not-fully-cooked mode) and interpreting the characters
themselves. Try it inside "cat" on each system. It should
echo the line after the first one, and exit after the second.

-- Richard
 
