File size

Andrew Clark

Hello all,

I recall several threads over the years about how reading file size
cannot be done consistently or portably, but I don't remember any good
reasons (not that I haven't read them, I'm sure, but it's more of a
failure to hold them in my brain). Here is an attempt that I was
commissioned to write, and I'd appreciate any comments and/or criticism
(specific or general) before I release it to my customer. Thanks!

Andrew

/* begin filesize.c */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main ( int argc, char *argv [] )
{
    int status;

    if ( 2 != argc )
    {
        fprintf ( stderr, "Usage: %s <filename>\n", argv [ 0 ] );
        status = EXIT_FAILURE;
    }
    else
    {
        FILE *fp;

        fp = fopen ( argv [ 1 ], "rb" );
        if ( !fp )
        {
            fprintf ( stderr, "Cannot open file \"%s\" for reading. [Error %d]\n",
                      argv [ 1 ], errno );
            status = EXIT_FAILURE;
        }
        else
        {
            char c;
            long unsigned size = 0;

            while ( fread ( &c, sizeof c, 1, fp ) )
            {
                size++;
            }

            fclose ( fp );
            printf ( "Size of file (in bytes): %lu\n", size );
#if 0
            if ( size > 1 << 10 )
            {
                /*** 1 KB ***/
                printf ( "[%lu KB]\n", size / ( 1 << 10 ) );
            }
#endif

            status = EXIT_SUCCESS;
        }
    }

#if 0
    printf ( "Returning status code: %d\n", status );
#endif
    return status;
}

/* end filesize.c */
 
Peter Nilsson

Andrew Clark said:
Hello all,

I recall several threads over the years about how reading file size
cannot be done consistently or portably, but I don't remember any good
reasons (not that I haven't read them, I'm sure, but it's more of a
failure to hold them in my brain).

You don't have to know facts, you merely have to know where to find them. ;)

Here is an attempt that I was commissioned to write, and I'd appreciate
any comments and/or criticism (specific or general) before I release it to
my customer. Thanks!

If it's commercial code, what's preventing you from using some POSIX or other non-standard
but _much_ more practical method?

/* begin filesize.c */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main ( int argc, char *argv [] )
{
int status;

if ( 2 != argc )
{
fprintf ( stderr, "Usage: %s <filename>\n", argv [ 0 ] );

argv[0] may be "" or even NULL.
status = EXIT_FAILURE;
}
else
{
FILE *fp;

fp = fopen ( argv [ 1 ], "rb" );
if ( !fp )
{
fprintf ( stderr, "Cannot open file \"%s\" for reading. [Error %d]\n",
argv [ 1 ], errno );
status = EXIT_FAILURE;
}
else
{
char c;
long unsigned size = 0;

Files can be larger than unsigned long can represent. Hence the presence of fpos_t. You
should check for overflow of size. [Using two unsigned longs (64+ bits) should be enough
for the next couple of years though.]
while ( fread ( &c, sizeof c, 1, fp ) )

getchar() is likely to be much more efficient. Faster still is reading in larger chunks,
say 1024 bytes at a time.

But why bother reading the file at all? Why not just keep fseek-ing by large amounts?
{
size++;
}

You don't check for read errors. You may be misreporting the size.

Actually, even if you read the file without error, you have no guarantee that the file
will be the same size on subsequent reading.

Note that neither reading nor fseeking will work for streams which can't be rewound.
fclose ( fp );
printf ( "Size of file (in bytes): %lu\n", size );
#if 0
if ( size > 1 << 10 )

1024 is clearer to me than 1 << 10.
{
/*** 1 KB ***/
printf ( "[%lu KB]\n", size / ( 1 << 10 ) );
}
#endif

status = EXIT_SUCCESS;
}
}

#if 0
printf ( "Returning status code: %d\n", status );
#endif
return status;
}

/* end filesize.c */
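
Several of the points above (read in larger chunks, check for read errors, check for overflow of the counter) can be combined in one routine. The following is only a minimal sketch, not the original poster's code: the helper name `count_bytes`, its return codes, and the 1024-byte buffer are assumptions chosen for illustration. It still cannot help with streams that change size between reads.

```c
#include <stdio.h>
#include <limits.h>

/* Hypothetical helper: counts the bytes readable from an open binary
   stream, starting at its current position.
   Returns 0 on success, -1 on a read error, -2 if the count would
   overflow unsigned long. *size is written only on success. */
int count_bytes(FILE *fp, unsigned long *size)
{
    unsigned char buf[1024];   /* chunk size suggested in the thread */
    unsigned long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
    {
        if (n > ULONG_MAX - total)
            return -2;         /* file is larger than unsigned long can hold */
        total += n;
    }

    if (ferror(fp))
        return -1;             /* loop ended because of an error, not end-of-file */

    *size = total;
    return 0;
}
```

A caller would open the file with "rb", call `count_bytes(fp, &size)`, and report a failure rather than a size whenever the return value is nonzero.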
 
Emmanuel Delahaye

Peter Nilsson said:
getchar() is likely to be much more efficient. Faster still is reading
in larger chunks, say 1024 bytes at a time.

You meant fgetc(), of course...
 
Malcolm

Peter Nilsson said:
If it's commercial code, what's preventing you from using some
POSIX or other non-standard
but _much_ more practical method?

Maybe the customer wants portable ANSI C, or maybe this has been used as a
selling point.

Files can be larger than unsigned long can represent. Hence the
presence of fpos_t. You should check for overflow of size. [Using
two unsigned longs (64+ bits) should be enough
for the next couple of years though.]

And then we've all the fun of writing an int_to_ascii function, because two
longs can't be passed to printf().

But why bother reading the file at all? Why not just keep fseek-ing by large amounts?

Unfortunately, if the file is text then fseek() / ftell() may not represent
the size. Here it is binary, so we don't have that problem, but there is
another issue with fseek() and ftell() not necessarily reporting the end of
the file.

That said, fseek()ing to the end of the file and calling ftell() is a good
enough method for most practical purposes.
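
The fseek()/ftell() approach mentioned above might look like the sketch below. The helper name `size_by_seek` is hypothetical. The caveats from this thread still apply: the stream should be opened in binary mode, the C standard does not require a binary stream to meaningfully support SEEK_END, and ftell() returns a long, so sizes beyond LONG_MAX cannot be reported.

```c
#include <stdio.h>

/* Hypothetical helper: reports the size of an open binary stream by
   seeking to its end, then restores the original position.
   Returns the size in bytes, or -1L on failure. */
long size_by_seek(FILE *fp)
{
    long start, size;

    start = ftell(fp);                    /* remember where we were */
    if (start == -1L)
        return -1L;

    if (fseek(fp, 0L, SEEK_END) != 0)     /* may fail on non-seekable streams */
        return -1L;

    size = ftell(fp);                     /* -1L here also signals failure */

    if (fseek(fp, start, SEEK_SET) != 0)  /* put the position back */
        return -1L;

    return size;
}
```

This works on most hosted implementations in practice, which is the "good enough" trade-off described above, but it is not guaranteed by ANSI C.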
 
Peter Nilsson

Malcolm said:
Maybe the customer wants portable ANSI C, ...

Why would a client want ANSI C for _this_ task? What's stopping a programmer from
informing their client of better options?

Files can be larger than unsigned long can represent. Hence the
presence of fpos_t. You should check for overflow of size. [Using
two unsigned longs (64+ bits) should be enough
for the next couple of years though.]

And then we've all the fun of writing an int_to_ascii function, because two
longs can't be passed to printf().

Here's one to get you started...

char *ul2toa(char *s, unsigned long hi, unsigned long lo)
{
    char *u, *v = s;
    unsigned long q, r, d;

    while (hi)
    {
        r = hi % 10;
        hi = hi / 10;

        d = (lo >> 16) + r * ((-1ul >> 16) + 1);
        r = d % 10;
        q = d / 10;

        lo = (r << 16) + (lo & 0xFFFF);
        r = lo % 10;
        lo = lo / 10 + (q << 16);

        *v++ = '0' + r;
    }

    do
    {
        *v++ = '0' + (lo % 10);
    }
    while (lo /= 10);

    *v = 0;
    for (u = s; u < --v; u++)
    { char c = *u; *u = *v; *v = c; }

    return s;
}
 
Malcolm

Peter Nilsson said:
Why would a client want ANSI C for _this_ task? [ file
size ] What's stopping a programmer from
informing their client of better options?

Let's say that the customer wants a suite of programs that use the stdin /
stdout model. Let's say that some are very processor intensive and they are
constantly investing in the latest hardware. If they have any sense they
will specify that all programs must be written in portable ANSI C, and
recompile the whole lot if new kit arrives.
 
