Yes, you'd have to pass a parameter specifying which mode to use,
which means the OS, when writing a block of data, no longer merely
writes it but parses it: it has to look for any embedded characters that
would be translated into longer or shorter sequences, and record that
count as well. I suspect this is going to have an impact on performance -
assuming you can get 'em to do it at all.
I did not say the OS has to keep either size precalculated, let
alone both. If precalculating impacts performance more than reading
the whole file and counting bytes would, taking into account things like
how often each kind of file size is needed and how often writes
are done, then someone made a poor decision and pessimized the design.
I consider the text-mode file size (as used for reading a file into
memory) to be needed too rarely to be worth caching.
Your opinion may differ.
POSIX happens to keep *both* precalculated, since there's no
difference between binary and text mode. Windows keeps the binary-mode
size precalculated. Thus, performance for getting the text-mode
size may suck significantly more than getting the binary-mode size
on Windows.
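A minimal sketch of the asymmetry, in C. The function names (`binary_size`, `text_size`) are hypothetical; it assumes a POSIX-style stat() for the cheap case, and that text mode differs from binary mode only by CR-LF translation (as on Windows). The point is that the binary size is one cheap lookup while the text size forces a full scan:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Binary-mode size: one cheap lookup, because the OS keeps
   the byte count precalculated. Returns -1 on error. */
long binary_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1L;
    return (long)st.st_size;
}

/* Text-mode size on a CRLF system: must scan the whole file,
   counting each CR-LF pair as a single character. (On POSIX the
   two sizes are identical, so this scan is never needed there.) */
long text_size(const char *path)
{
    FILE *f = fopen(path, "rb");    /* read the raw bytes */
    long n = 0;
    int c, prev = EOF;

    if (f == NULL)
        return -1L;
    while ((c = getc(f)) != EOF) {
        if (!(prev == '\r' && c == '\n'))   /* count "\r\n" once */
            n++;
        prev = c;
    }
    fclose(f);
    return n;
}
```

The scan in `text_size` is exactly the "read the whole file and count bytes" cost being argued about above.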
Alternatively, the function itself could do the job: open the file
and read it, in the appropriate mode, from beginning to end. Can you
say performance hit?
If you don't need the correct answer, you can do it in zero bytes
and zero time. But if the performance hit is so bad, maybe you
shouldn't use a method that needs a precalculated file size. That
approach of reading the file in chunks (this is to read it into
memory, NOT precalculate the size) and realloc()ing when needed
(say, doubling each time, with fallback if you run out of memory)
is starting to look more and more efficient all the time, even with
the copying (if any).
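The chunked, realloc()-doubling approach described above can be sketched in a few lines of C. The function name `slurp` and the starting capacity are my own choices, not anything from the thread; the doubling-with-fallback strategy is the one described:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire stream into memory WITHOUT knowing its size in
   advance: grow the buffer by doubling, with a smaller-step
   fallback if a doubling realloc fails. Returns a malloc'd buffer
   (caller frees) and stores the length in *len, or NULL on error. */
char *slurp(FILE *f, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    char *buf = malloc(cap), *tmp;

    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + used, 1, cap - used, f)) > 0) {
        used += got;
        if (used == cap) {              /* buffer full: double it */
            tmp = realloc(buf, cap * 2);
            if (tmp != NULL) {
                cap *= 2;
            } else {
                /* fallback: try a modest fixed-size step instead */
                tmp = realloc(buf, cap + 4096);
                if (tmp == NULL) {
                    free(buf);
                    return NULL;
                }
                cap += 4096;
            }
            buf = tmp;
        }
    }
    *len = used;
    return buf;
}
```

Note that no file size is ever asked for: the length comes out as a by-product of the read, in whatever mode the stream was opened in.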
While we're at it, how about revisiting the strategy of reading the
entire file into memory? Is it really a good idea? If the file is
large, you may force parts of this program or other programs to page
out. Slow. Now, depending on what you are doing with the file, reading
it in chunks might be worse. Or better. If you're just dumping the
file in hex, reading chunks at a time lets your program run in much
less memory, and makes it work on files MUCH larger than what you can
fit in memory.
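The hex-dump case can illustrate this: a sketch in C, using a small fixed buffer so memory use stays constant no matter how large the file is. The one-byte-per-line output format is my simplification, not a claim about any particular dump tool:

```c
#include <stdio.h>

/* Dump a stream in hex, one byte per line, reading it in
   fixed-size chunks. Memory use is constant regardless of file
   size, so this works on files far larger than available RAM. */
void hexdump(FILE *in, FILE *out)
{
    unsigned char buf[4096];
    size_t n, i;
    unsigned long off = 0;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        for (i = 0; i < n; i++, off++)
            fprintf(out, "%08lx  %02x\n", off, buf[i]);
}
```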
This assumes I will only ever read the file in one mode, or that I
determine the size by reading the file, bytewise, at the time the size
is needed. The former isn't reliable; the latter is hellishly inefficient.
Each time you read the file into memory, you read it in *one* mode,
I hope (no switching in the middle of the file). When you want the
file size for that buffer, you read it in that one mode. How you
read it last time or will read it next time is irrelevant.
You made a bad decision, performance-wise, to use a precalculated
file length, especially in text mode if the OS doesn't keep the
value handy and text mode != binary mode. Stick with that decision,
and performance is going to suck.
If you *must* have a precalculated value, have the OS save the size
for the mode the file was written in (and record which mode that
was). My guess is that this will cover at least 80% of the
times that file size is needed for the purpose of reading the file
into memory.
Yes, but again - which value?
It returns the one associated with the mode you intend to use to read
the file into memory. You have to make up your mind which mode to use
before you start reading. Use the same decision when you determine
the file size.
Sure. Now, again, *which* file size? Determined *how*?
The size associated with the mode the file was written in, if your
application knows what that is (and no, I don't expect the OS to
keep track of it). It's up to your application to know what mode
to open its own files in. Either it knows from what prompt was
answered (e.g. text editors always do text files; graphics editors
always do binary files), or the file extension, or it asks the user,
or it just handles generic files and can do everything in binary
mode.
(This assumes that the reference "correct" output was generated on
THIS system or was converted to the local file format. If it wasn't,
well, size comparisons may be totally worthless.) Since here
you're using file size as a shortcut for comparing the files for
equality (to find a mismatch quickly), you can dispense with that step
entirely and proceed to reading the files byte by byte and comparing
them, if finding the size is a performance bottleneck.
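A sketch of that size-free comparison in C (the name `same_contents` is mine). A length mismatch needs no size lookup at all: it simply shows up as one stream hitting EOF before the other, at which point getc() returns EOF on one side and a byte value on the other:

```c
#include <stdio.h>

/* Compare two streams byte by byte; returns 1 if their contents
   are identical, 0 otherwise. No precalculated file size needed:
   unequal lengths surface as an EOF-vs-byte mismatch. */
int same_contents(FILE *a, FILE *b)
{
    int ca, cb;

    do {
        ca = getc(a);
        cb = getc(b);
        if (ca != cb)       /* differing byte, or one EOF early */
            return 0;
    } while (ca != EOF);    /* both hit EOF together: equal */
    return 1;
}
```

Open both streams in the same mode (text or binary), per the earlier point: the comparison is only meaningful within one mode.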