Standard C++ file size???

Peter Olcott

Is there any standard C++ way to determine the size of a
file before it is read?
 
Gennaro Prota

Victor said:
No. The "standard C++ way" is to open the file for reading, seek to the
end of the file and get the position. If you need the size of the file
on disk (and you have the name of the file) without "touching" it in any
way, use the existing platform (OS) mechanisms to get the "file stats"
(statistics). RTFM on programming your OS.

Note that "file size" is a less trivial notion than it might naively
appear: is it the number of bytes allocated on disk? The number of
characters you can read from the file in text mode? The number of bytes
you can read in binary mode? And what about symbolic links?

As Victor said, your platform is likely to expose a suitable function
yielding the number which corresponds to one particular definition of
"size" for the elements it can be applied to. POSIX systems must have
stat --note that this, modulo platform-specific extensions, doesn't know
what "size" is for some file types--; Win32 has GetFileAttributesEx and
GetFileAttributes, etc. Variants for "large file support" (e.g. stat64)
are also common.

Of course, Boost.Filesystem may provide what you need just out of the
box.
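
For readers on a newer toolchain: Boost.Filesystem was later standardized, in much the same shape, as std::filesystem in C++17. A minimal sketch, assuming a C++17 compiler; the helper name reported_size is made up for illustration. The value is whatever the implementation reports for a regular file, and the result for other file types is implementation-defined, so the caveats about what "size" means still apply.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <system_error>

// Hypothetical helper: ask the filesystem library for the size of a file
// without opening it. Returns std::nullopt if the query fails (for
// example, because the file does not exist).
std::optional<std::uintmax_t> reported_size(const std::filesystem::path& p)
{
    std::error_code ec;
    const std::uintmax_t n = std::filesystem::file_size(p, ec);
    if (ec) {
        return std::nullopt;
    }
    return n;
}
```

The error_code overload is used so that a missing file shows up as an empty optional rather than an exception.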
 
Peter Olcott

Victor Bazarov said:
No. The "standard C++ way" is to open the file for
reading, seek to the end of the file and get the position.

My best guess is that this is exactly what I need. I want to
read in an ASCII text file into a single contiguous block of
memory.

It would seem that I could do this using the method you
propose, and use a std::vector<unsigned char> for the memory
block, resized to position + 1. I would also guess that this
same method may also work for any possible type of data. Of
course I am assuming that the data is being read in binary
mode, in each case.
 
Maxim Yegorushkin

My best guess is that this is exactly what I need. I want to
read in an ASCII text file into a single contiguous block of
memory.

It would seem that I could do this using the method you
propose, and use a std::vector<unsigned char> for the memory
block, resized to position + 1. I would also guess that this
same method may also work for any possible type of data. Of
course I am assuming that the data is being read in binary
mode, in each case.

In this case you don't need to know the size. It could be as simple
as:

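#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    std::ifstream file("text.file");
    // istreambuf_iterator copies every character, whitespace included;
    // istream_iterator<char> would silently skip spaces and newlines.
    std::vector<char> file_in_memory(
        (std::istreambuf_iterator<char>(file)),
        (std::istreambuf_iterator<char>())
    );
    // the file has been read into file_in_memory
}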

However, if performance is paramount, or you need to know the exact
file errors, or the file is too big to fit into memory, you may like
to use your platform's native functions (like POSIX open(), fstat()
and mmap()).
 
James Kanze

No. The "standard C++ way" is to open the file for reading,
seek to the end of the file and get the position.

That's a frequently used method, but it certainly isn't standard
C++. There's no guarantee that the position is convertible to
an integral type, and there's no guarantee that the integral
value means anything if it is.

In practice, this will probably work under Unix, and with binary
(but not text) files under Windows. Elsewhere, who knows?

If you need the size of the file on disk (and you have the
name of the file) without "touching" it in any way, use the
existing platform (OS) mechanisms to get the "file stats"
(statistics). RTFM on programming your OS.

Supposing, of course, that the system has some sort of request
for determining what you mean by file size. The most obvious
meaning is the number of bytes you will read before encountering
EOF. And as far as I know, Unix is the only system which has a
request which will return this. Another reasonable meaning is
the number of bytes the file occupies on the disk, but I don't
know of any system which has a request for this. (Unix
certainly doesn't.)
 
Gennaro Prota

James Kanze wrote:
[file size]
Another reasonable meaning is
the number of bytes the file occupies on the disk, but I don't
know of any system which has a request for this. (Unix
certainly doesn't.)

I have never tried it, but I think a little math (and path manipulation),
using GetDiskFreeSpaceEx and GetDiskFreeSpaceA should do it for (recent)
Windows. There might be gotchas I'm not seeing offhand, though.

Hopefully as off-topic as occasionally tolerable,
 
PeteOlcott

That's a frequently used method, but it certainly isn't standard
C++.  There's no guarantee that the position is convertible to
an integral type, and there's no guarantee that the integral
value means anything if it is.

In practice, this will probably work under Unix, and with binary
(but not text) files under Windows.  Elsewhere, who knows?

Why would it not work for Text files under Windows?
(I am only looking for the size that can be block read into memory)
 
James Kanze

stat(), lstat(), fstat() will determine the number of blocks
used.

So they do. (I didn't remember it from when I learned stat.
But that was some time ago.) They also return the block size,
so with a little bit of multiplication... (Of course, this
doesn't include the space actually taken up by the inode:).
Or in the directory entry. As Gennaro pointed out, the
definition of size is a bit vague to begin with, and I'm sure
that with a little bit of effort, I can come up with one that no
system supports.)
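
A rough sketch of that multiplication, assuming a POSIX system. One caveat: st_blksize is the preferred I/O block size, not necessarily the unit in which st_blocks is counted. On Linux and the BSDs st_blocks counts 512-byte units, and POSIX itself leaves the unit unspecified, so the 512 below is an assumption to check against your system's documentation. The names FileSizes and posix_sizes are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <optional>
#include <sys/stat.h>   // POSIX only; not part of standard C++

struct FileSizes {
    long long readable_bytes;   // st_size: bytes up to end of file
    long long allocated_bytes;  // st_blocks * 512: an estimate only
};

// Hypothetical helper: returns both notions of "size" for a regular file,
// or std::nullopt if stat() fails or the path is not a regular file.
std::optional<FileSizes> posix_sizes(const char* path)
{
    struct stat st;
    if (::stat(path, &st) != 0 || !S_ISREG(st.st_mode)) {
        return std::nullopt;
    }
    return FileSizes{
        static_cast<long long>(st.st_size),
        // ASSUMPTION: st_blocks is in 512-byte units (true on Linux/BSD).
        static_cast<long long>(st.st_blocks) * 512
    };
}
```

The two numbers can differ in either direction: a sparse file may occupy fewer bytes than st_size, and a small file usually occupies at least one whole allocation unit.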
 
James Kanze

Why would it not work for Text files under Windows? (I am
only looking for the size that can be block read into memory)

Because it doesn't. Try it:

#include <iostream>
#include <fstream>
#include <vector>

void
readAll(
    char const* filename )
{
    std::ifstream f( filename ) ;
    if ( ! f ) {
        throw "cannot open" ;
    }
    f.seekg( 0, std::ios::end ) ;
    if ( ! f ) {
        throw "seek error" ;
    }
    long long size = f.tellg() ;
    std::cout << filename << ": size = " << size << std::endl ;
    if ( size != 0 ) {
        f.clear() ;
        f.seekg( 0, std::ios::beg ) ;
        if ( ! f ) {
            throw "rewind failed" ;
        }
        std::vector< char > v( size ) ;
        f.read( &v[ 0 ], size ) ;
        if ( ! f ) {
            throw "read failed" ;
        }
    }
}

int
main( int argc, char** argv )
{
    for ( int i = 1 ; i != argc ; ++ i ) {
        try {
            readAll( argv[ i ] ) ;
        } catch ( char const* error ) {
            std::cout << argv[ i ] << ": " << error << std::endl ;
        }
    }
    return 0 ;
}

Compile and try it on some text files. On a variant with some
extra comments, reading the source itself, I get:
readall.cc: size = 1677
under Solaris (g++ or Sun CC), but
readall.cc: size = 1733
readall.cc: read failed
under Windows (compiled with VC++).

If I open the file in binary mode, or use system level requests,
of course, I can make it work.
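
For comparison, a sketch of the binary-mode variant. It relies on the same assumption discussed earlier in the thread, namely that the position returned by tellg() converts to a meaningful byte count. That tends to hold for binary streams on Unix and Windows but is not promised by the standard. The name read_all_binary is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch: read a whole file opened in binary mode, using the
// seek-to-end position as the buffer size.
std::vector<char> read_all_binary(const std::string& filename)
{
    std::ifstream f(filename, std::ios::binary);
    if (!f) {
        throw std::runtime_error("cannot open " + filename);
    }
    f.seekg(0, std::ios::end);
    const std::streamoff size = f.tellg();   // -1 if the position is unavailable
    if (!f || size < 0) {
        throw std::runtime_error("cannot determine size of " + filename);
    }
    f.seekg(0, std::ios::beg);
    std::vector<char> v(static_cast<std::size_t>(size));
    if (size > 0) {
        f.read(v.data(), size);
        // Check how much was actually delivered rather than trusting 'size'.
        if (f.gcount() != size) {
            throw std::runtime_error("short read on " + filename);
        }
    }
    return v;
}
```

In binary mode no CR-LF translation takes place, so the number of characters delivered by read() matches the position reported at the end of the file, which is exactly what fails in the text-mode run.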
 
Gennaro Prota

James said:
Because it doesn't. Try it:

#include <iostream>
#include <fstream>
#include <vector>

void
readAll(
    char const* filename )
{
    std::ifstream f( filename ) ;
    if ( ! f ) {
        throw "cannot open" ;
    }
    f.seekg( 0, std::ios::end ) ;
    if ( ! f ) {
        throw "seek error" ;
    }
    long long size = f.tellg() ;

I think Victor meant that everything stopped here. Yes, the size so
obtained will happily count some garbage as well, and it's not likely
that read() will work with it, but at least that's the number you should
see in the Windows Explorer. In many cases that's all that is needed to
avoid a lot of user complaints :)

PS: of course, too, the match with Explorer properties and everything I
say above is all a big dance of "likely", "perhaps" and "should be";
nothing, as you mentioned, is really guaranteed.
 
Peter Olcott

Victor Bazarov said:
There is a difference between the number of bytes in the
file (physically on the disk) and the number of bytes you
get when you read the file due to the translation
happening for the sequence of CR-LF,

I am talking about reading a Text file in binary mode so
there is no translation. I am making a computer language
compiler so my lexical analyzer will treat the text as
binary data.
 
James Kanze

"Victor Bazarov" wrote in message

[...]
I am talking about reading a Text file in binary mode so
there is no translation.

You can't read a text file in binary mode. If you open a file
in binary mode, it is a binary file; if you open it in text
mode, it is a text file.

Outside of C/C++, some operating systems don't make a
distinction (Unix and Windows, for example); a file is a text
file or a binary file only in virtue of how you open it (and
only in C or C++). In other systems (probably most), if the
file was written as text, you can't open it as binary, and vice
versa.

I am making a computer language compiler so my lexical
analyzer will treat the text as binary data.

Hmmm. The most logical thing would be for a compiler to open
the files as text. (On some systems, the editors save the files
as text files, and you can't open them in binary.)
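
If the source is nevertheless read in binary mode on a system where that works, the line terminators arrive untranslated and the lexer has to deal with them itself. A minimal sketch, assuming only the Unix (LF) and Windows (CR-LF) conventions matter; the name normalize_newlines is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: collapse every CR-LF pair to a single LF, leaving all other
// characters (including a lone CR) untouched.
std::string normalize_newlines(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 1 < raw.size() && raw[i + 1] == '\n') {
            continue;  // drop the CR; the LF is copied on the next iteration
        }
        out += raw[i];
    }
    return out;
}
```

This is roughly the translation a text-mode stream performs on Windows, done by hand; it does nothing for systems with record-oriented text files, where opening the file as text is the only option.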
 
James Kanze

I think Victor meant that everything stopped here. Yes, the
size so obtained will happily count some garbage as well, and
it's not likely that read() will work with it, but at least
that's the number you should see in the Windows Explorer.

Which means?

In many cases that's all that is needed to avoid a lot of user
complaints :)

If the requirements specification says to display the value
shown by Windows Explorer, fine. If the goal is to allocate a
buffer so you can read it in one go, it doesn't work. If the
goal is to know exactly how much space the file takes on the
disk, it doesn't work. As you said yourself, size is a rather
vague concept when it comes to files. Unless you're determining
the size so you can display it, in a way that is compatible with
Windows Explorer, then I don't see this working.

More importantly, it's very implementation defined; on some
implementations, it might not even compile. As long as you're
being implementation defined, you might as well use the platform
specific functions and be done with it. Not that they'll
necessarily give you anything more useful, but they'll almost
certainly give you a useless answer a lot faster, and they'll
probably define more or less what their answer really
corresponds to.
 
Matthias Buelow

Peter said:
I am talking about reading a Text file in binary mode so
there is no translation. I am making a computer language
compiler so my lexical analyzer will treat the text as
binary data.

Just out of curiosity, why do you need to know the file size for that?
Does your language need a lookahead of more than 1 character?
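
To illustrate the point behind the question: with one character of lookahead, a lexer can pull characters straight from the stream and never needs the file size. A toy sketch, not anybody's actual compiler; the token rules (runs of alphanumerics, single-character punctuation, whitespace skipped) and the name tokenize are made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy lexer: uses peek() as its single character of
// lookahead and consumes input with get(), one character at a time.
std::vector<std::string> tokenize(std::istream& in)
{
    using traits = std::char_traits<char>;
    std::vector<std::string> tokens;
    for (int c = in.peek(); c != traits::eof(); c = in.peek()) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            in.get();                          // skip whitespace
        } else if (std::isalnum(static_cast<unsigned char>(c))) {
            std::string word;                  // identifier or number
            while ((c = in.peek()) != traits::eof()
                   && std::isalnum(static_cast<unsigned char>(c))) {
                word += static_cast<char>(in.get());
            }
            tokens.push_back(word);
        } else {
            // any other character is a one-character token
            tokens.push_back(std::string(1, static_cast<char>(in.get())));
        }
    }
    return tokens;
}
```

The function takes any std::istream, so it works the same on a std::ifstream, a std::istringstream, or std::cin. The stream's own buffering keeps the character-at-a-time calls reasonably cheap.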
 
Gennaro Prota

James Kanze wrote:
[seeking to the end and getting the position]
Which means?

I don't know. It might even vary from one Windows incarnation to
another. The attempt didn't get through, but...

...this was really meant to be humorous.

If the requirements specification says to display the value
shown by Windows Explorer, fine. If the goal is to allocate a
buffer so you can read it in one go, it doesn't work. If the
goal is to know exactly how much space the file takes on the
disk, it doesn't work. As you said yourself, size is a rather
vague concept when it comes to files. Unless you're determining
the size so you can display it, in a way that is compatible with
Windows Explorer, then I don't see this working.

Sure, I completely agree. "File" itself is probably hard to define in
general, too (for one thing: do you see it from the perspective of the
filesystem or from the perspective of the user and its contents? And
what interpretation of the contents?).

Considering the original question, what about this summary:

Q.: Is there any standard C++ way to determine the size of a file
without "reading" it?

A.: Not a strictly conforming one: the concepts of "file size" and
"file read" themselves, in fact, have no universal meaning; you'll
have to resort to an implementation-defined mechanism, if any,
such as stat() on POSIX platforms, GetFileSize()/GetFileSizeEx()
on Win32. The system documentation, or conformity to further
standards such as POSIX, may/should clarify what meaning of "size"
and/or "read" each of those mechanisms correspond to. This may not
apply to all of the supported file types.

Maybe this should be a FAQ (or two :).
 
gpderetta

Gennaro Prota wrote:
Considering the original question, what about this summary:

Q.: Is there any standard C++ way to determine the size of a file
without "reading" it?

A.: Not a strictly conforming one: the concepts of "file size" and
"file read" themselves, in fact, have no universal meaning; you'll
have to resort to an implementation-defined mechanism, if any,
such as stat() on POSIX platforms, GetFileSize()/GetFileSizeEx()
on Win32. The system documentation, or conformity to further
standards such as POSIX, may/should clarify what meaning of "size"
and/or "read" each of those mechanisms correspond to. This may not
apply to all of the supported file types.

Maybe this should be a FAQ (or two :).

BTW, I do not think that anybody mentioned in this thread that in most
systems (i.e. any system that allows concurrent access to files), the
data returned by any kind of get file size API might be stale the
instant after it has been returned.

You can't really rely on it, except to treat it as some kind of hint.
Unless of course the OS gives you some way to acquire exclusive access
to that file before the get file size request.
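
One way to act on this is to use the reported size only to pre-allocate, and let end-of-file decide how much data there really is. A minimal sketch, assuming a C++17 compiler for std::filesystem; the name read_until_eof and the 4096-byte chunk size are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

// Sketch: treat the reported size purely as a capacity hint, then read
// until end of file, so a file that grows or shrinks in the meantime is
// still read consistently with what the stream actually delivers.
std::string read_until_eof(const std::string& filename)
{
    std::string data;
    std::error_code ec;
    const std::uintmax_t hint = std::filesystem::file_size(filename, ec);
    if (!ec && hint <= data.max_size()) {
        data.reserve(static_cast<std::size_t>(hint));  // hint only
    }
    std::ifstream f(filename, std::ios::binary);
    if (!f) {
        throw std::runtime_error("cannot open " + filename);
    }
    char chunk[4096];
    // A short final read sets failbit but still reports its byte count
    // through gcount(), so that last partial chunk is appended too.
    while (f.read(chunk, sizeof chunk) || f.gcount() > 0) {
        data.append(chunk, static_cast<std::size_t>(f.gcount()));
    }
    return data;
}
```

If the hint turns out to be wrong, the only cost is a reallocation or some unused capacity; the returned contents do not depend on it.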
 
James Kanze

BTW, I do not think that anybody mentioned in this thread that in most
systems (i.e. any system that allows concurrent access to files), the
data returned by any kind of get file size API might be stale the
instant after it has been returned.

You can't really rely on it, except to treat it as some kind of hint.
Unless of course the OS gives you some way to acquire exclusive access
to that file before the get file size request.

Yes and no. C++ certainly doesn't give you any guarantees, or
any way to get any. And even at the system level, you're not
sure of getting any. On the other hand, you often have
practical guarantees at a higher level that you can more or less
count on: most programs don't handle the case where the file
contents change gracefully, and most programs don't have to.
 
