What is the easiest way to read a string from the console, including
whitespace? The user of this program will enter a line of text that
consists of multiple tokens, but the number of tokens and the length of
the line will vary each time. Is there a way to get scanf() to read a
string that contains whitespace? Or is there another function that would
work?
getc() or getchar() is what you're looking for.
If your parser is simple, write it yourself; otherwise you may use
yacc/bison as a parser generator. Both approaches use getc() for direct
access to the stream and avoid dynamic buffers.
Get a char, check it against the characters you accept in your current
state, and change the state when needed.
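A minimal sketch of that loop, assuming the simplest possible grammar (a token is any run of non-whitespace characters). The state names `BETWEEN`/`IN_WORD` and the function `count_tokens()` are hypothetical; a real parser would have more states and would act on each token instead of merely counting it.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical states: between tokens, or inside a token. */
enum state { BETWEEN, IN_WORD };

/* Reads `in` one character at a time and counts whitespace-separated
 * tokens. Nothing is stored, so the line length does not matter. */
long count_tokens(FILE *in)
{
    enum state st = BETWEEN;
    long tokens = 0;
    int c;                          /* int, not char, so EOF is detectable */

    while ((c = getc(in)) != EOF) {
        switch (st) {
        case BETWEEN:
            if (!isspace(c)) {      /* first char of a new token */
                tokens++;
                st = IN_WORD;
            }
            break;
        case IN_WORD:
            if (isspace(c))         /* the current token has ended */
                st = BETWEEN;
            break;
        }
    }
    return tokens;
}
```

Calling `count_tokens(stdin)` behaves the same on a 10-character line as on a 2GB one, because no character is ever buffered.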
This way you can easily accept one token after another without a buffer
(unless the token itself needs one). You can convert any numerical value
on the fly, and you see any error immediately.
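For example, an unsigned number can be built digit by digit as the characters arrive. This is a sketch with hypothetical names (`read_ulong()`, `NUM_OK` and friends): overflow is caught at the very digit that causes it, and the first non-digit is handed back with ungetc() so the next token starts there. On overflow the function returns at once and leaves the remaining digits unread.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical result codes. */
enum { NUM_OK = 0, NUM_NONE = 1, NUM_OVERFLOW = 2 };

/* Skips leading whitespace, then converts decimal digits as they are read.
 * The first non-digit is pushed back with ungetc() for the next token. */
int read_ulong(FILE *in, unsigned long *out)
{
    unsigned long v = 0;
    int c, seen = 0;

    while ((c = getc(in)) != EOF && isspace(c))
        ;                                   /* skip separators */
    for (; c != EOF && isdigit(c); c = getc(in)) {
        unsigned d = (unsigned)(c - '0');
        if (v > (ULONG_MAX - d) / 10)       /* v * 10 + d would overflow */
            return NUM_OVERFLOW;
        v = v * 10 + d;
        seen = 1;
    }
    if (c != EOF)
        ungetc(c, in);                      /* leave the delimiter unread */
    if (!seen)
        return NUM_NONE;                    /* no digits at all */
    *out = v;
    return NUM_OK;
}
```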
You can use fscanf() or other library functions only if you are sure
that your input is trusted. But input can only be trusted when you
accept nothing but files written by a well-tested program.
Whenever you read anything from stdin, you must assume that your input is
untrusted and may contain anything. No library function is prepared to
read every possible kind of malformed input.
gets() is the worst case: it cannot limit how much it writes into your
buffer, so any overlong line overflows it (this is why C11 removed it).
It is easier to write a little parser that reads char by char and acts
on each one than to handle a buffer that may always be too short to hold
the input:
- you expect a maximum line length of 80: the input is 81 characters, so
you get a buffer overflow or need extra handling to grow the buffer until
a complete line has been read
- you expect a maximum line length of 132: there are 32K characters
without a '\n' waiting in the stream
- you expect 32K: the stream holds 2GB without a '\n'
You can never handle all possible errors with the higher-level standard
library functions.
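Reading char by char makes all three cases harmless. As a sketch, a hypothetical `read_line()` stores at most `cap - 1` characters, keeps consuming (and discarding) the rest of the line, and reports whether anything was dropped. For brevity a read error is treated like end of input.

```c
#include <stddef.h>
#include <stdio.h>

/* Reads one line (up to '\n' or EOF) into buf, which must have cap >= 1.
 * At most cap - 1 characters are stored, followed by '\0'; any excess is
 * read and thrown away, so buf can never overflow.
 * Returns 0 if the line fit, 1 if it was truncated, -1 at end of input. */
int read_line(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c, truncated = 0;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (n + 1 < cap)
            buf[n++] = (char)c;     /* room left: store it */
        else
            truncated = 1;          /* no room: drop it, keep reading */
    }
    buf[n] = '\0';
    if (c == EOF && n == 0 && !truncated)
        return -1;                  /* nothing was read */
    return truncated;
}
```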
Using token tables with getc()/getchar() makes it quite easy to extend
the functionality of your program without rewriting anything when you
have to handle a new token or change the token parameters.
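A sketch of such a table, assuming a small command language with hypothetical keywords `add`, `del` and `list` and a hypothetical per-token parameter `nargs`. The word itself would be collected by a getc() loop; supporting a new keyword, or changing its parameters, means editing one row.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for a small command language. */
enum token { TOK_UNKNOWN, TOK_ADD, TOK_DEL, TOK_LIST };

/* The token table: one row per keyword. */
static const struct {
    const char *name;
    enum token  tok;
    int         nargs;      /* hypothetical parameter: expected arguments */
} token_table[] = {
    { "add",  TOK_ADD,  1 },
    { "del",  TOK_DEL,  1 },
    { "list", TOK_LIST, 0 },
};

/* Looks up a collected word; stores its parameter in *nargs if non-NULL. */
enum token lookup_token(const char *word, int *nargs)
{
    for (size_t i = 0; i < sizeof token_table / sizeof token_table[0]; i++) {
        if (strcmp(word, token_table[i].name) == 0) {
            if (nargs)
                *nargs = token_table[i].nargs;
            return token_table[i].tok;
        }
    }
    return TOK_UNKNOWN;
}
```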
ungetc() is designed to push a single char back into the input stream,
but the character you push back can be any character, not only the one
you read last!
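A small demonstration with a hypothetical helper `replace_next()`: it reads one character and pushes a different one back in its place. Note that the C standard guarantees only one character of pushback and that EOF itself cannot be pushed back.

```c
#include <stdio.h>

/* Replaces the next character waiting in the stream with `repl`
 * (which must not be EOF). Returns the character that was dropped,
 * or EOF if the stream was empty or ungetc() failed. */
int replace_next(FILE *in, int repl)
{
    int old = getc(in);
    if (old == EOF)
        return EOF;
    if (ungetc(repl, in) == EOF)    /* ungetc() reports failure as EOF */
        return EOF;
    return old;
}
```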
Although ungetc() is only guaranteed to push back a single character, it
is not too hard to write a wrapper around getc()/ungetc() that stacks more
than one. Yes, you must then read only through the wrapper, but macros
are your friends anyway.
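A sketch of such a wrapper. The type `Input`, the functions `in_init()`, `in_getc()` and `in_ungetc()`, and the depth `PUSHBACK_MAX` are all hypothetical. Pushed-back characters come back in reverse order of pushing, which is also how the standard specifies repeated ungetc() calls.

```c
#include <stdio.h>

#define PUSHBACK_MAX 16     /* hypothetical depth; size it for your grammar */

/* A FILE* with its own pushback stack. All reads must use in_getc(). */
typedef struct {
    FILE *fp;
    int   stack[PUSHBACK_MAX];
    int   top;              /* number of characters currently pushed back */
} Input;

void in_init(Input *in, FILE *fp)
{
    in->fp = fp;
    in->top = 0;
}

int in_getc(Input *in)
{
    if (in->top > 0)
        return in->stack[--in->top];    /* last pushed, first returned */
    return getc(in->fp);
}

/* Like ungetc(): returns c on success, EOF if c is EOF or the stack is full. */
int in_ungetc(Input *in, int c)
{
    if (c == EOF || in->top == PUSHBACK_MAX)
        return EOF;
    in->stack[in->top++] = c;
    return c;
}
```

A macro such as `#define NEXT() in_getc(&input)` (hypothetical) keeps the call sites as short as plain getchar().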
At least getc() is in no way slower than (f)gets(), scanf() or the like,
because all of them read the stream through getc() or an equivalent
internally.
As you have to tokenise the stream anyway, you gain some runtime by using
getc() directly: you save the (formatted) copy from one buffer into
another and the repeated scans of that new buffer. Instead, you can move
each char straight into its real destination and convert it on the fly.
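As an illustration, a hypothetical `read_int_line()` reads one line of blank-separated non-negative integers and builds each value directly in the caller's array, with no line buffer or token buffer in between. It rejects any other character, an int overflow, or more than `max` values; on error the rest of the line is left unread.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Returns the number of values stored in dst[0..max-1], or -1 on error. */
int read_int_line(FILE *in, int *dst, int max)
{
    int count = 0, in_num = 0, c;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (isdigit(c)) {
            int d = c - '0';
            if (!in_num) {                      /* a new number starts */
                if (count == max)
                    return -1;                  /* no room left in dst */
                dst[count++] = 0;
                in_num = 1;
            }
            if (dst[count - 1] > (INT_MAX - d) / 10)
                return -1;                      /* would overflow int */
            dst[count - 1] = dst[count - 1] * 10 + d;   /* built in place */
        } else if (c == ' ' || c == '\t') {
            in_num = 0;                         /* a blank ends the number */
        } else {
            return -1;                          /* unexpected character */
        }
    }
    return count;
}
```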
OK, you may have some extra overhead for handling states, but if you make
a good design before you start coding, you get more flexibility for free,
and you save the overhead the higher-level library brings with it:
- a dynamic buffer just to get a complete line, and malloc() is a source
of errors you have to handle
- breaking the line into tokens by copying the data piecewise from the
input buffer into another buffer: temp (buffer) -> destination field, or
another temp to convert from string to something else
- checking the results of the library functions to catch anything
unexpected that comes along