Determining size of file


googler

Is there any C library function that returns the size of a given file?
Otherwise, is there a way in which file size can be determined in a C
program? I need to get this for both Linux and Windows platforms, so a
generic solution is what I am looking for.

Thanks for your help.
 

Jordan Abel

Probably the best chance at getting a universally meaningful result is
eating the entire file with getc() and using ftell. That is likely not
the best solution. On those particular platforms, fseek() to the end
will probably work [there is an argument, based on the rationale, that
it will always give you at least as meaningful an answer as the getc()
loop], at least on a file opened in binary mode. Both also include
library functions that will give you the answer, but they are of course
different on each platform.
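Here is a minimal sketch of the fseek()-to-the-end approach. It assumes an ordinary file opened in binary mode on a platform such as Linux or Windows, where SEEK_END is meaningful for such files. The name file_size_by_seek is hypothetical. Because ftell() returns a long, files of 2 GiB or more cannot be reported this way where long is 32 bits, which includes 64-bit Windows.

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of the file at `path`, or -1L on
 * failure. Opens in binary mode and seeks to the end. The C standard does
 * not require SEEK_END to be meaningful for binary streams, so this relies
 * on the platform (e.g. Linux, Windows) behaving sensibly for ordinary files. */
long file_size_by_seek(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size = -1L;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);   /* ftell() itself returns -1L on failure */
    fclose(fp);
    return size;
}
```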
 

Walter Roberson

> Probably the best chance at getting a universally meaningful result is
> eating the entire file with getc() and using ftell. That is likely not
> the best solution. On those particular platforms, fseek() to the end
> will probably work [there is an argument, based on the rationale, that
> it will always give you at least as meaningful an answer as the getc()
> loop], at least on a file opened in binary mode.

"A binary stream need not meaningfully support fseek calls with
a whence value of SEEK_END".

On Linux, there is no seek to end for /dev/random or /dev/zero
or sockets or named pipes; possibly not for ttys or ptys either.
Reading to the end would work for sockets and named pipes... though if
you have two-way process communication going on, then you might end up
with a logical deadlock... Some Unices support an indefinitely-extensible
mmap() segment, extended by trying to read the data there...

> Both also include library functions that will give you the answer, but
> they are of course different on each platform.

If you don't know that what you've been handed is an "ordinary file",
then it can be difficult to define a meaningful "size" for it. But
of course there is nothing in the standard C library that would allow
you to test whether what you've been handed is an "ordinary file" or not.
 

Gordon Burditt

What do you want to use the data for?

*WHICH* "size of a given file"? There are several different definitions,
not all listed here, which are likely to give different answers:

(1) The number of characters you can read with fgetc() from the file
when it is opened in text mode. (On a system with \r\n line endings,
such as MS-DOS and Windows, this counts \r\n as one character, but on
disk it's two.) If the file really contains binary stuff, on some systems
a control-Z is treated as end-of-file, and it and the bytes after it
don't count. If you had ideas of reading the whole file into memory
in text mode, this is the size you want.

(2) The number of characters you can read with fgetc() from the file
when it is opened in binary mode. This may count bytes at the
end of the file from the last byte written to the end of a disk block
(e.g. on CP/M, which doesn't track end-of-file to a byte boundary).
If you had ideas of reading the whole file into memory in binary
mode, this is the size you want. It's also likely to be the size
given by "ls -l".

(3) The amount of disk space needed to store the file. This tends to
be the size of the data on disk rounded up to a block boundary.
There are variations on this as to whether to count "indirect"
blocks used to keep track of blocks belonging to the file.
Files with unwritten "holes" can make (3) drastically smaller
than (2) (e.g. the multi-gigabyte file with unwritten holes
that fits on a floppy).

(1) and (2) you can get by opening the file, reading it to the end,
and counting characters. (3) can be obtained on some systems with the
non-standard-C function stat(): st_blocks multiplied by the block unit
(512 bytes on Linux; POSIX leaves the unit unspecified). stat() will
often give you (2) in st_size as well.
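Definitions (1) and (2) can be sketched with a single counting loop in which only the fopen() mode differs. This assumes that reading the whole file is acceptable; count_chars is a hypothetical name. On Linux both modes give the same count. On Windows, the "r" count is expected to come out smaller when the file contains \r\n pairs.

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters fgetc() delivers from `path`
 * when it is opened with `mode`: "r" gives definition (1), "rb" gives
 * definition (2). Returns -1L if the file cannot be opened or a read
 * error occurs. Reads the whole file, so it takes time proportional to
 * the file's length. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;
    int c;

    if (fp == NULL)
        return -1L;
    while ((c = fgetc(fp)) != EOF)
        n++;
    if (ferror(fp))
        n = -1L;
    fclose(fp);
    return n;
}
```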

Gordon L. Burditt
 

Kenneth Brody

While not strictly C, as defined in this group, there are common POSIX
extensions for this. See the stat() function, which is available on
many platforms, including Linux and Windows.

If you have questions regarding stat(), you will need to take them to a
newsgroup for which it's not off-topic. (Perhaps comp.unix.programmer?)
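Here is a minimal sketch of the stat() approach. It assumes a POSIX <sys/stat.h>, so it is not standard C. It uses S_ISREG to reject things that are not ordinary files, because for those st_size may mean something other than a length in bytes. The name regular_file_size is hypothetical. On Windows the C runtime provides stat()/_stat() with an st_size member, but it may not define S_ISREG, so whether this exact code works there is an assumption.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical POSIX helper: on success stores the size of the regular
 * file `path` in *size and returns 0. Returns -1 if stat() fails or if
 * `path` is not a regular file (directory, FIFO, device, ...), since
 * st_size is only a plain byte length for regular files. Not standard C. */
int regular_file_size(const char *path, off_t *size)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISREG(st.st_mode))
        return -1;
    *size = st.st_size;
    return 0;
}
```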

--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:[email protected]>
 

Jordan Abel

>> Probably the best chance at getting a universally meaningful result is
>> eating the entire file with getc() and using ftell. That is likely not
>> the best solution. On those particular platforms, fseek() to the end
>> will probably work [there is an argument, based on the rationale, that
>> it will always give you at least as meaningful an answer as the getc()
>> loop], at least on a file opened in binary mode.
>
> "A binary stream need not meaningfully support fseek calls with
> a whence value of SEEK_END".
>
> On Linux, there is no seek to end for /dev/random or /dev/zero
> or sockets or named pipes; possibly not for ttys or ptys either.

And those don't have sizes, so it all works out. If the seek fails, the
file doesn't have a meaningful "size", though in some cases you might
want the number of bytes that can be read [a la "wc -c"].
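The "seek if you can, otherwise count what can be read" idea can be sketched as follows. It assumes a stream opened in binary mode and positioned at its start. The name stream_size is hypothetical. The fallback consumes the data, so it only suits cases where you wanted to read the data anyway.

```c
#include <stdio.h>

/* Hypothetical helper: size of a binary stream positioned at its start.
 * First tries fseek(SEEK_END) + ftell(); if that does not yield a
 * position (pipe, socket, tty, ...), it counts the bytes that can be
 * read, like "wc -c". The fallback consumes the data. Returns -1L on a
 * read error. On the seek path the stream is rewound to the start. */
long stream_size(FILE *fp)
{
    long n;
    int c;

    if (fseek(fp, 0L, SEEK_END) == 0) {
        n = ftell(fp);
        rewind(fp);             /* back to the start either way */
        if (n != -1L)
            return n;
    }
    clearerr(fp);               /* forget any error left by the failed seek */
    n = 0;
    while ((c = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1L : n;
}
```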
 

Jordan Abel

>> Probably the best chance at getting a universally meaningful result is
>> eating the entire file with getc() and using ftell. That is likely not
>> the best solution. On those particular platforms, fseek() to the end
>> will probably work [there is an argument, based on the rationale, that
>> it will always give you at least as meaningful an answer as the getc()
>> loop], at least on a file opened in binary mode.
>
> "A binary stream need not meaningfully support fseek calls with
> a whence value of SEEK_END".

The argument I mentioned [based on the rationale] is that, at least in
the case of the padding bytes, such a seek should seek to the end
including the padding bytes on any reasonable system. For non-ordinary
streams, a seek to the end is no better in text mode than in binary mode,
so this clause isn't the one in effect in that case.

For another example, stdin is a text stream, and you can't seek to the
end of it on many implementations.
 

googler

Thank you all. I used fseek(fp, 0, SEEK_END) followed by a call to
ftell(). It works fine on both platforms for the ordinary text files I'm
dealing with.
 

Simon Biber

Gordon said:
> [...]
> Files with unwritten "holes" can make (3) drastically smaller
> than (2) (e.g. the multi-gigabyte file with unwritten holes
> that fits on a floppy).

<OT>

This command will produce a file that is 92 KiB by (3), but almost 2 TiB
by (2).
dd seek=4091M count=1 if=/dev/zero of=big

[sbiber@eagle ~]$ du -h big
92K big
[sbiber@eagle ~]$ ls -lh big
-rw-rw-r-- 1 sbiber sbiber 2.0T Dec 2 01:36 big

It very easily fits on a floppy, yet it is more than two thousand GiB. Of
course, the floppy's file system must support sparse files, which rules
out using the usual FAT.
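The same experiment can be sketched in C with the POSIX calls open(), lseek(), write() and fstat(). The name make_sparse is hypothetical. The sketch compares st_size (definition (2)) with st_blocks * 512, an estimate of definition (3) that assumes Linux's 512-byte st_blocks unit. Whether a hole is actually left depends on the file system.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical POSIX helper: creates `path` with a single byte written at
 * `offset`, leaving a hole before it on file systems that support sparse
 * files. Stores st_size (definition (2)) in *bytes and st_blocks * 512
 * (definition (3), assuming Linux's 512-byte st_blocks unit) in *on_disk.
 * Returns 0 on success, -1 on failure. */
int make_sparse(const char *path, off_t offset, off_t *bytes, off_t *on_disk)
{
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1
        || write(fd, "", 1) != 1        /* one NUL byte at `offset` */
        || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    *bytes = st.st_size;
    *on_disk = (off_t)st.st_blocks * 512;
    return 0;
}
```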

</OT>
 

Kenneth Brody

Simon Biber wrote:
[... sparse files ...]

I once thought of copy-protecting a program by taking advantage of
sparse files. Back in the days when 40MB (yes, "MB") was a lot, I
figured that writing data to some scattered bytes within a sparse
file, and verifying those bytes upon startup, would be a simple way
of doing this. After all, there was no way you could actually have
a 2GB file, so there was no simple way of copying it somewhere else.

And, getting back to fseek(SEEK_END), let's not forget about sequential
access devices. (Does the standard have anything to say about such
things?) For example, pipes, ttys, and tape drives aren't known for
their rewind capabilities.
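One way to ask, in standard C, whether a stream supports seeking is to try it and then put the position back. A sketch follows; can_seek_to_end is a hypothetical name. On POSIX systems ftell() fails for pipes and sockets. A device that reports a position may still not have a meaningful end, so treat a result of 1 as a hint rather than a guarantee.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if `fp` can be positioned at its end and
 * then put back where it was, 0 otherwise. ftell() fails on streams with
 * no file position (on POSIX systems: pipes, FIFOs, sockets). A return of
 * 1 is only a hint: it does not prove the end position is meaningful. */
int can_seek_to_end(FILE *fp)
{
    long here = ftell(fp);

    if (here == -1L)
        return 0;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    return fseek(fp, here, SEEK_SET) == 0;   /* restore the old position */
}
```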

 

Mark McIntyre

> Thank you all. I used fseek(fp, 0, SEEK_END) followed by a call to
> ftell(). It works fine on both platforms for the ordinary text files I'm
> dealing with.

Be careful that you do not allow users to tell your programme to read
from a pipe or tcp port such as chargen. How you do that is beyond the
scope of this group.
 

googler

Mark said:
> Be careful that you do not allow users to tell your programme to read
> from a pipe or tcp port such as chargen. How you do that is beyond the
> scope of this group.

I still have one question. It would be great if anybody could help.

Earlier (before adding code for finding file size), my code was
something like
fp = fopen(file_name, "a");
fprintf(fp, ...);

After adding two lines for getting file size, it changes to
fp = fopen(file_name, "a");
fseek(fp, 0, SEEK_END);
n = ftell(fp); /* n is used somewhere else */
fprintf(fp, ...);

I just want to know if this has the same effect as the earlier code
with regard to appending to the end of the file.

Thanks in advance.
 

Mark McIntyre

> After adding two lines for getting file size, it changes to
> fp = fopen(file_name, "a");

This is opening the file in text mode. To reiterate:

For a text stream, its file position indicator contains unspecified
information, usable by the fseek function for returning the file
position indicator for the stream to its position at the time of the
ftell call; the difference between two such return values is not
necessarily a meaningful measure of the number of characters written
or read.

In other words, once opened in text mode, the result of ftell isn't
necessarily the file size. You can't rely on it.

> fseek(fp, 0, SEEK_END);

You don't need this; opening a file for append necessarily opens it with
the file pointer at the end.

> I just want to know if this has the same effect as the earlier code
> with regard to appending to the end of the file.

Yes.
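Here is a sketch of googler's pattern, switched to "ab" (binary append) because of the text-mode caveat on ftell quoted above. The name append_and_report is hypothetical. The C standard forces every write in append mode to the then-current end of file, regardless of intervening fseek() calls. The explicit fseek() therefore does not change where the text lands. It only makes the ftell() value independent of where the implementation initially places the position indicator.

```c
#include <stdio.h>

/* Hypothetical helper: opens `path` for binary append, records the size
 * before writing via fseek(SEEK_END) + ftell(), then appends `text`.
 * Append mode sends every write to the current end of file no matter
 * where fseek() left the position, so the seek only affects ftell().
 * Returns the size before appending, or -1L on any failure. */
long append_and_report(const char *path, const char *text)
{
    FILE *fp = fopen(path, "ab");
    long before;

    if (fp == NULL)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0 || (before = ftell(fp)) == -1L) {
        fclose(fp);
        return -1L;
    }
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1L;
    }
    if (fclose(fp) == EOF)      /* flush errors show up here */
        return -1L;
    return before;
}
```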
 

Joe Wright

Mark said:
> In other words, once opened in text mode, the result of ftell isn't
> necessarily the file size. You can't rely on it.

No real mystery here, is there? Assuming a DOS or Windows file system,
text lines are ended with two characters, CR and LF. C implementations
on these systems choose to be Unixy: they drop the CR when reading a
text file and put it back when writing one.

It is only this which accounts for the difference between file size and
how many bytes you can read before EOF.

On Unixy systems there is no difference: file size and bytes read
are the same.
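To see this effect, one can count the CR LF pairs while reading in binary mode. Assuming the file contains no control-Z, the text-mode character count on DOS/Windows should be the binary size minus this count, while on Unixy systems the two counts match. The name count_crlf is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: counts CR LF pairs in `path`, read in binary mode.
 * On DOS/Windows, text-mode reads collapse each pair to one '\n', so the
 * text-mode count is expected to be the binary size minus this number
 * (ignoring control-Z handling). Returns -1L on error. */
long count_crlf(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long pairs = 0;
    int c, prev = EOF;

    if (fp == NULL)
        return -1L;
    while ((c = getc(fp)) != EOF) {
        if (prev == '\r' && c == '\n')
            pairs++;
        prev = c;
    }
    if (ferror(fp))
        pairs = -1L;
    fclose(fp);
    return pairs;
}
```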
 
