Mike said:
You'd need to define a standard way to express the boundary between the
binary and text portions.
Um, "this byte position here" (i.e. ftell()) is good enough, no need to
overengineer it.
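To make that concrete, here is a minimal sketch of the byte-position idea; the class and method names are made up for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: the text/binary boundary is just a byte offset.
public class BoundaryDemo {
    static int headerEnd(String header) {
        // ftell()-style position: number of bytes written before the binary part
        return header.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "name=payload\r\n";
        out.write(header.getBytes(StandardCharsets.UTF_8));
        int boundary = out.size();                          // same as headerEnd(header)
        out.write(new byte[] { 0x01, 0x02, (byte) 0xFF });  // binary portion follows
        System.out.println(boundary);  // 14
    }
}
```

The offset is counted in bytes, not characters, so it survives any encoding of the text part.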
In the case of a variable-length encoding like UTF-8, I'd expect (and
will probably implement for my case) behaviour like this:
- Backed by a buffer (the usual way, probably byte[])
- readByte() reads from the buffer, handles buffering of new data, etc.
- readChar() reads as many bytes as it needs to reconstitute a
character; for UTF-8 that can be one to four - it doesn't matter. If it
encounters a byte that is invalid under the encoding in use, it raises a
proper exception, because that's an encoding error in the stream.
- Introduce private or protected pushByte() and pushChar() that do the
reverse of readXXX() on the buffer. To account for one character
spanning several bytes, make the buffer 4+ bytes longer up front, but
don't use that extra space when filling the buffer in readByte(). As in
C, make pushXXX() work only for a single byte/character.
- Modify readLine() to use readChar(): it reads characters until CR+LF,
and it can reuse the existing logic that reads one char after a CR to
see if it's LF and pushes it back if it isn't.
- Every other readXXX method uses readByte() as usual.
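The points above could be sketched roughly like this. All names are illustrative, a real implementation would refill the buffer from an underlying stream rather than hold the whole input in memory, and this sketch skips overlong-form and surrogate checks in the UTF-8 decoder:

```java
import java.nio.charset.StandardCharsets;

public class MixedReader {
    private static final int PUSHBACK = 4;  // room for one pushed-back UTF-8 char
    private final byte[] buf;
    private int pos;                        // index of the next unread byte

    public MixedReader(byte[] data) {
        // The extra PUSHBACK bytes at the front are never filled with input;
        // they are only used by pushByte()/pushChar().
        buf = new byte[PUSHBACK + data.length];
        System.arraycopy(data, 0, buf, PUSHBACK, data.length);
        pos = PUSHBACK;
    }

    public int readByte() {                 // -1 at end of buffer
        return pos < buf.length ? buf[pos++] & 0xFF : -1;
    }

    protected void pushByte(int b) {        // single-byte pushback, like ungetc()
        buf[--pos] = (byte) b;
    }

    // Reads 1..4 bytes and reconstitutes one code point; an invalid byte is
    // an encoding error in the stream, so it raises an exception.
    public int readChar() {
        int b0 = readByte();
        if (b0 < 0) return -1;
        int len = b0 < 0x80 ? 1 : b0 < 0xC0 ? 0 : b0 < 0xE0 ? 2
                : b0 < 0xF0 ? 3 : b0 < 0xF8 ? 4 : 0;
        if (len == 0) throw new IllegalStateException("encoding error at byte " + (pos - 1));
        int cp = len == 1 ? b0 : b0 & (0x7F >> len);  // strip the length prefix bits
        for (int i = 1; i < len; i++) {
            int b = readByte();
            if (b < 0x80 || b > 0xBF)       // not a continuation byte (or EOF)
                throw new IllegalStateException("encoding error at byte " + (pos - 1));
            cp = cp << 6 | b & 0x3F;
        }
        return cp;
    }

    protected void pushChar(int cp) {       // single-character pushback
        byte[] enc = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        for (int i = enc.length - 1; i >= 0; i--) pushByte(enc[i]);
    }

    // Reads characters until CR+LF; a CR not followed by LF stays in the
    // line, and the byte after it is pushed back.
    public String readLine() {
        StringBuilder sb = new StringBuilder();
        for (int c; (c = readChar()) >= 0; sb.appendCodePoint(c)) {
            if (c == '\r') {
                int next = readByte();
                if (next == '\n') return sb.toString();
                if (next >= 0) pushByte(next);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}
```

Since readLine() goes through readChar(), which goes through readByte(), every call site sees one consistent byte position, which is what makes mixing the two kinds of reads on the same stream safe.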
The intended result: freely mix bytes and characters. In the extreme
(but supported!) case, the stream can have a UTF-8 character (encoded by
one or several bytes) followed by a "raw" byte, followed by a UTF-8
character, etc. The programmer is responsible for knowing how the
stream is formatted.
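For concreteness, here is what that extreme case looks like on the wire, decoded by hand with the standard library (the layout is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// A mixed stream: UTF-8 char, raw byte, UTF-8 char. Nothing in the bytes
// themselves marks the boundaries; only the agreed-on format does.
public class MixedLayout {
    static final byte[] STREAM = {
        (byte) 0xC3, (byte) 0xA9,  // 'é' as two UTF-8 bytes
        (byte) 0xFF,               // a raw byte (invalid as UTF-8 here)
        'x',                       // 'x' as one UTF-8 byte
    };

    public static void main(String[] args) {
        // The consumer must already know the layout: 2 bytes of text,
        // 1 raw byte, 1 byte of text.
        String first = new String(STREAM, 0, 2, StandardCharsets.UTF_8);
        int raw = STREAM[2] & 0xFF;
        String second = new String(STREAM, 3, 1, StandardCharsets.UTF_8);
        System.out.println(first + " " + raw + " " + second);  // é 255 x
    }
}
```

Note that a naive all-text decode of STREAM would trip over the 0xFF, which is exactly why the reader has to offer both readByte() and readChar().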