Kevin said:
All I know is that unget() is much better on my PC, with my compiler, than
seekg. Is this likely to be true in general for relatively small (<100)
numbers of bytes? When would I want to use putback(char) instead?
To best answer these questions, let's have a look at the underlying
machinery. IOStreams are built on top of stream buffers (that is,
objects of type 'std::basic_streambuf<cT, traits>'). As the name says,
this class provides the concept of a buffer, although it is possible to
create unbuffered stream buffers (that is, the name is somewhat
misleading). File streams are very likely to use the internal buffer,
except, maybe, when using some special files like a tty or a named
pipe. If a buffer is set up for the stream buffer, most operations
are simple pointer operations: check whether the pointers are in the
allowed range and do something with the respective character.
For 'sungetc()', the stream buffer function called by the input stream's
'unget()', this basically means checking whether the current read
pointer is at the beginning of the buffer and, if it is not, moving it
one character back. This operation is very likely to be very fast. If
the read pointer is at the beginning of the buffer, 'sungetc()' calls
'pbackfail(traits::eof())'.
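In code, the fast path is roughly the following. This is only a
simplified sketch of the behaviour the standard describes, written as
a member of a hypothetical derived class (the names 'sketch_buf' and
'my_sungetc' are made up) so that the protected pointer functions are
accessible; it is not the code of any particular library:

    #include <streambuf>

    struct sketch_buf : std::streambuf {
        // what sungetc() boils down to
        int_type my_sungetc() {
            if (eback() < gptr()) {                  // put back position left?
                gbump(-1);                           // step one character back
                return traits_type::to_int_type(*gptr());
            }
            return pbackfail(traits_type::eof());    // hit the buffer boundary
        }
    };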
The operation of 'sputbackc()', which is, as you have correctly guessed,
the stream buffer function called by the input stream's 'putback()'
function, is a little bit more complex and slower: it starts by
checking whether the current position is at the beginning of the
buffer and, if it is not, whether the previous character matches
the one being put back. If either of these checks fails, 'sputbackc()'
calls 'pbackfail(c)' with the put back character as argument
(after being converted to 'int_type' using 'traits::to_int_type()').
Otherwise the current read position is moved one character back.
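The corresponding sketch for 'sputbackc()' differs only in the extra
check of the previous character (again just an illustration with
made-up names, not real library code):

    #include <streambuf>

    struct sketch_buf2 : std::streambuf {
        // what sputbackc(c) boils down to
        int_type my_sputbackc(char_type c) {
            if (eback() < gptr() && traits_type::eq(c, gptr()[-1])) {
                gbump(-1);                           // same fast path as above
                return traits_type::to_int_type(*gptr());
            }
            return pbackfail(traits_type::to_int_type(c)); // boundary/mismatch
        }
    };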
For the case where putting back a character does not hit a buffer
boundary, this explains why 'unget()' should be fast. A few
questions obviously remain:
- How many characters can be safely put back? The answer is quite
simple: none. If you are at a buffer boundary, put back can fail,
and there is no guarantee in the standard on the number of
available put back positions. I would expect any reasonable
standard library implementation to allow at least one character
to be put back, but this is really a quality of implementation
issue - and it is unclear what is better quality here: there is
rarely a need for put back (eg. none in the standard library I/O
functions) and providing a put back buffer would incur unnecessary
overhead. Also, this problem is easily worked around by
providing a filtering stream buffer which allows eg. a specified
number of put back characters (a sketch of such a stream buffer
follows this list).
- What does 'pbackfail()' do? Well, it obviously tries to back up
one position in the stream. In case a wrong character was put back,
it can choose to accept it (ie. using 'putback()' you might be
able to put characters into the stream which have not been there).
In case of hitting the beginning of the buffer, it might read the
previous page or simply put the character passed to 'pbackfail()'
into the buffer after making room somehow, thereby assuming that
the character was the right one (that is, 'putback()' might be
successful when 'unget()' is not).
- What happens when the END of a buffer is reached? Are characters
retained for put back? When the end of the input buffer is
reached, 'underflow()' is called. This function is supposed to
make a new buffer with at least one character available. It can set
up the new buffer in such a way that old characters are retained:
the buffer is set up with the call 'setg(begin, current, end)'.
The first argument is the beginning of the buffer, the second is
the current read position (ie. it points to the character made
available by 'underflow()'), and the third is the end of the
buffer. That is, the range [begin, current) is available for put
back. A library can copy "n" characters from the end of the
previous buffer to the beginning of the new buffer (the sketch
below shows how). Unfortunately, there is no guarantee that
"n > 0" for file buffers.
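To make this concrete, here is a minimal sketch of a filtering stream
buffer whose 'underflow()' retains a few already consumed characters
for put back. The class name 'putback_filter', the buffer sizes, and
the use of 'sgetn()' on the wrapped stream buffer are choices made up
for the illustration, not anything the standard mandates:

    #include <algorithm>
    #include <cstring>
    #include <streambuf>
    #include <vector>

    // Hypothetical filtering stream buffer: it reads from another stream
    // buffer and keeps up to 'putback_' characters of already consumed
    // input in front of the current read position.
    class putback_filter : public std::streambuf {
    public:
        explicit putback_filter(std::streambuf& src, std::size_t putback = 4)
            : src_(&src), putback_(putback < 1 ? 1 : putback),
              buffer_(putback_ + 1024)
        {
            char* p = buffer_.data() + putback_;
            setg(p, p, p);            // empty get area: first read underflows
        }

    protected:
        int_type underflow() override {
            if (gptr() < egptr())                     // characters still left
                return traits_type::to_int_type(*gptr());

            // Move the last few consumed characters into the put back area.
            std::size_t keep = std::min(putback_,
                static_cast<std::size_t>(gptr() - eback()));
            char* base = buffer_.data();
            std::memmove(base + putback_ - keep, gptr() - keep, keep);

            // Refill the rest of the buffer from the wrapped stream buffer.
            std::streamsize n = src_->sgetn(base + putback_,
                static_cast<std::streamsize>(buffer_.size() - putback_));
            if (n <= 0)
                return traits_type::eof();

            setg(base + putback_ - keep,   // begin: [begin, current) is put back
                 base + putback_,          // current read position
                 base + putback_ + n);     // end of the filled buffer
            return traits_type::to_int_type(*gptr());
        }

    private:
        std::streambuf*   src_;
        std::size_t       putback_;
        std::vector<char> buffer_;
    };

The 'setg()' call at the end is the whole trick: everything between
its first and second argument stays available for 'unget()' and
'putback()'.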
In practical terms, this means that you cannot rely on put back
doing anything useful for the standard streams! There are a few ways
to work around this problem:
- Check the documentation of the standard library you are using: It
might provide better guarantees for file streams. Of course, this
way you become dependent on a particular implementation.
- If the documentation does not tell you anything, you might be able
to look at the implementation. Note, however, that this is a very
dangerous path because the implementation may change in the next
version.
- The safest approach would be the creation of a simple filtering
stream buffer (like the 'putback_filter' sketched above): if you know
that you are simply reading the stream from beginning to end, except
for putting back a maximum of "n" characters, such a filtering stream
buffer is simple to write; a usage example follows this list. If
you mix things with seeking within the stream, things become
somewhat more complex...
- Avoid put back in the first place. What is the point of processing
read characters again? There is no problem with peeking at the
current read position: this always works, and for many cases it
is sufficient.
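For illustration, this is how the hypothetical 'putback_filter'
sketched above could be put in front of a file stream (the file name
and the number of guaranteed put back characters are made up):

    #include <fstream>
    #include <istream>

    int main() {
        std::ifstream file("data.txt");
        putback_filter filter(*file.rdbuf(), 8);   // guarantee 8 put back chars
        std::istream   in(&filter);

        char c;
        while (in.get(c) && c != ';')
            ;                                      // scan up to the first ';'
        if (in)
            in.unget();                            // safe: the filter keeps it
    }

Seeking on 'in' would simply fail unless the filter also implements
'seekoff()'/'seekpos()', which is the complication hinted at above.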
Kevin said:
The only explanation I can think of is the file may be buffered somehow;
There is no guarantee that files are buffered (and, in fact, you can
turn off buffering by calling 'pubsetbuf(0, 0)' on the stream buffer)
but I would bet that buffered file streams are the default on all
implementations: unbuffered file reading is just slow.
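For example (the file name is made up; the standard only promises the
unbuffered behaviour if the call happens before any I/O on the file):

    #include <fstream>

    int main() {
        std::ifstream in;
        in.rdbuf()->pubsetbuf(0, 0);   // request unbuffered operation...
        in.open("data.txt");           // ...before any I/O on the file
        // each read now goes (more or less) directly to the OS
    }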
Kevin continued:
then ungetting might take you past the beginning of the buffer, whereas
putback'ing will be able to expand the buffer in this case.
This is roughly the deal. Of course, you cannot count on it being
the case...