Ok, I wasn't aware of this.
Mmh, ok. I think I've got the idea ... still I wonder why a
file buffer shouldn't be accessible at random addresses.
Mainly because that operation isn't supported by all systems.
In the past, most systems used various file types, and text
files often imposed a fixed length line. So text output padded
with white space, and text input stripped the padding. And of
course, an integral value couldn't be made to mean anything
sensible.
The intent always was that you could seek to an arbitrary point
in a file opened in binary mode, although there may be problems
there as well when multi-byte encodings are involved. The
history here is quite complicated, since it involves trying to
make a concept designed for a simple environment (the Unix file
system and ASCII) work in more complicated environments.
Basically, the idea is that fpos (which is inspired by C's
fpos_t) will save both the OS's view of the position and the
encoding state. But of course, in an extreme case, you can't
know either unless you've actually been there. The C standard
takes the point of view that 1) for a file opened in text mode,
this is actually the case, and that except for seeking to either
end, you can only seek to where you've been, and 2) if the file
was opened in binary mode, you can at least specify an offset in
bytes relative to either end or the current position, and that
you know more about the content than the library does, and will
take whatever precautions necessary for encoding state.
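A minimal sketch of the "you can only seek to where you've been"
rule for text mode: save the streampos returned by tellg(), and
later hand the same value back to seekg(). The function name
reread_second_line and its parameters are hypothetical, chosen
only for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the second line of a text file twice,
// returning to it through a position previously obtained from tellg().
bool reread_second_line(const std::string& path,
                        std::string& first_pass,
                        std::string& second_pass)
{
    std::ifstream in(path.c_str());          // text mode
    std::string skipped;
    if (!std::getline(in, skipped)) {
        return false;                        // no first line
    }

    // Remember where we are; this is a position we have "been to".
    std::streampos mark = in.tellg();
    if (mark == std::streampos(-1)) {
        return false;                        // tellg() failed
    }

    if (!std::getline(in, first_pass)) {
        return false;                        // no second line
    }

    in.clear();                              // drop eofbit if we hit the end
    in.seekg(mark);                          // seek back to the saved position
    return static_cast<bool>(std::getline(in, second_pass));
}
```

Note that the code never does arithmetic on mark; it treats the
value as opaque, which is all that text mode guarantees.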
In addition, about the time C was being standardized, the
question of seeking in files longer than what a long could
contain became pertinent---fpos_t could use a struct or an
internal type longer than long to specify the OS position even
if it didn't fit into a long.
And finally, C++ throws in locale dependent code translation,
even in char based streams. (I think C uses locale dependent
code translation to convert bytes to wchar_t when reading and
writing wchar_t. With the added twist that the locale used is a
global variable, which can be changed in ways unknown to you by
any function you happen to call.)
In C++, streampos corresponds more or less to C's fpos_t (but
with some poorly specified twists), and streamoff corresponds to
the relative positioning, which is done with a long in C.
For the most part, in modern OS's, like Windows and Unix (but
not the OS's of mainframes), the only file structure is an array
of bytes. And most modern encodings don't depend on position
dependent state---if you're using UTF-8, for example, each
character may contain more than one byte, but how many bytes,
or the meaning of the character, doesn't depend on some byte
you've read an unspecified distance previously. And of course,
with 64 bit long long's, we're set for the foreseeable future
with regard to file size. In such circumstances, it would make
sense for both streampos and streamoff to be typedef's to long
long---although the standard doesn't allow it in the case of
streampos. You still have a slight problem in Windows, in that
for text files, the system's representation of the position
doesn't correspond exactly to the number of bytes you'd read to
get there (and in some cases, you can successfully position well
beyond the end of the file, and successfully read from that
position---but since doing so is undefined behavior, that's your
tough luck). But globally, it's not too big of a problem if one
is aware of it. Thus, while not required by the standard, from
a quality of implementation point of view, I would be very
disappointed in a library implementation for Windows or Unix in
which streampos was not a typedef to an integral type (long
long, if long is only 32 bits).
Sry, the int was a bad idea. I was just a bit lazy to look up
for the "size_type" which I guess would be the right type.
Maybe. I'd use long long, and cross my fingers. I've seen
library implementations in which streampos was based on a long
for the positional element, even though long wasn't large enough
to represent the position in a file. And while I'd like to just
say that such implementations are broken, there are historical
reasons which constrain implementors somewhat. In practice, I
don't think that there's any way to handle files longer than
about 2 GB that is both portable and safe.
Yap, but wasn't the OP looking for a way to read from the
file's end?
Yes, but by using the relative positioning functions, he
doesn't need to know the length of the file to do so. In a
binary file, he *can* seek to the end - some number of bytes.
And in a text file, under Unix or Windows, he can do it as well,
with the restriction that the actual position might be off
somewhat under Windows (but that's not necessarily a fatal
problem).
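A minimal sketch of seeking relative to the end of a binary file
without knowing its length. The helper name read_tail is
hypothetical; the only standard calls used are seekg() with
ios::end, read() and gcount().

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the last `count` bytes of a file opened
// in binary mode, or an empty string if the file cannot be opened or
// the seek fails (for example, if `count` exceeds the file length).
std::string read_tail(const std::string& path, std::streamoff count)
{
    if (count <= 0) {
        return std::string();
    }
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return std::string();
    }

    // Relative positioning: "end - count bytes". No file length needed.
    in.seekg(-count, std::ios::end);
    if (!in) {
        return std::string();
    }

    std::string result(static_cast<std::string::size_type>(count), '\0');
    in.read(&result[0], static_cast<std::streamsize>(count));
    // Keep only what was actually read.
    result.resize(static_cast<std::string::size_type>(in.gcount()));
    return result;
}
```

For a file containing the ten bytes "0123456789",
read_tail(path, 4) yields "6789". Opened in text mode under
Windows, the same seek might land slightly off, as noted above.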
Anyway, I've seen many, many implementations to retrieve
the file length (in binary mode I must admit) the way I described
above.
I've seen it a lot, too. That doesn't mean that the standard
says it's right.
Just wondering what the correct way would be.
The simple answer is that you can't find the file length,
portably, in standard C++, except by actually reading the file.
At least if by "file length" you mean the number of bytes you
can successfully read.
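A minimal sketch of the portable approach, assuming "file length"
means the number of bytes that can be read in binary mode. The
function name count_readable_bytes and the 4096 byte buffer size
are arbitrary choices for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: counts the bytes that can actually be read from
// the file in binary mode. Returns -1 if the file cannot be opened.
long long count_readable_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return -1;
    }

    long long total = 0;
    char buffer[4096];
    // read() fails on the final, partial block, but gcount() still
    // reports how many bytes that last call delivered.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        total += in.gcount();
    }
    return total;
}
```

This costs a full pass over the file, which is why most code uses
seekg/tellg or a system specific call instead, and accepts the
loss of strict portability.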
I totally agree, that in text mode the value retrieved won't
match the file size in bytes.
I've really looked around a lot and almost every hit said
something similar to
this:
http://www.cplusplus.com/reference/iostream/istream/tellg.html
They state, that the return value is integral ("An integral
value of type streampos with the number of characters between
the beginning of the input sequence and the current position")
and they also state that the position is absolute ("Returns
the absolute position of the get pointer.")
Interesting. The standard doesn't even allow streampos to be
"an integral value". At most, it will convert to a streamoff
(with loss of information in the case of a state dependent
encoding). There is also a requirement that you can convert a
streamoff to an integral type. If streamoff is not an integral
type (which should only happen on very exotic machines),
however, using streampos as an integral value involves two user
defined conversions, which can't happen implicitly.
And of course, the standard says absolutely nothing about the
semantics of this integral value, except that if you use it to
seek in the file (within the restrictions of the standard), it
will get you where you want. Although there is no guarantee
that the mappings aren't arbitrary, I can't see an
implementation doing anything arbitrary in the case of binary
files, since something like filebuf::seekoff( n, ios::beg ) is
guaranteed to put you at the place you would have been at if
you'd read n bytes---using an arbitrary mapping would require a
lot of juggling in some of the conversions between integral
types and streampos. (For a text file, of course, that function
call is undefined behavior.)
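A minimal sketch of getting an integral value out of a streampos
without relying on an implicit conversion: convert explicitly to
streamoff first, then to an integral type. The function name
position_after_seek is hypothetical, and the assumption that the
result equals n holds only for files opened in binary mode on
implementations (typical under Unix and Windows) where the mapping
is a plain byte count.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: seeks n bytes from the beginning of a binary
// file and reports the resulting position as a long long, or -1 on
// failure.
long long position_after_seek(const std::string& path, std::streamoff n)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return -1;
    }

    in.seekg(n, std::ios::beg);
    if (!in) {
        return -1;
    }

    std::streampos pos = in.tellg();
    if (pos == std::streampos(-1)) {
        return -1;
    }

    // Step 1: streampos -> streamoff, written explicitly.
    std::streamoff off = std::streamoff(pos);
    // Step 2: streamoff -> integral type.
    return static_cast<long long>(off);
}
```

On a six byte binary file, position_after_seek(path, 3) returns 3
with the common implementations, but the standard itself does not
spell out that meaning.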
Also it is said that construction from int has to be supported by
streampos ...
Construction from an int, yes (although the standard fails to
indicate what the semantics should be). Explicit
construction---I don't see anything which would prevent the
constructor from being explicit.
Do they simply ignore the facts or did I get your answers wrong?
The full requirements of fpos (and streampos is required to be
an instantiation of fpos) and streamoff are given in table 116;
there is certainly no requirement that fpos be directly
convertible to an integral type.
(I'm just wondering. I wonder if it would be worth writing up a
proposal to require 1) that streamoff be a typedef to an
integral type, and 2) that if the file is opened in binary mode,
its integral value correspond to the number of bytes from the
beginning of the file. I suspect that that would cause no
problems, and would in fact make the standard conform to the
practical reality for most programmers.)