Dr Dobbs snippet


kid joe

Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur
2) main should return int not void
3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

Also this program makes me wonder why ftell returns long and not size_t.

Cheers,
Joe
 

Keith Thompson

kid joe said:
Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

Here's the code:
==================================================
/*
** FLENGTH.C - a simple function using all ANSI-standard functions
** to determine the size of a file.
**
** Public domain by Bob Jarvis.
*/

#include <stdio.h>
#include <io.h>

long flength(char *fname)
{
    FILE *fptr;
    long length = 0L;

    fptr = fopen(fname, "rb");
    if(fptr != NULL)
    {
        fseek(fptr, 0L, SEEK_END);
        length = ftell(fptr);
        fclose(fptr);
    }

    return length;
}

#ifdef TEST

void main(int argc, char *argv[])
{
    printf("Length of %s = %ld\n", argv[0], flength(argv[0]));
}

#endif /* TEST */
==================================================
I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur

If fseek fails but ftell succeeds, ftell will probably return 0.

If ftell fails, it returns -1.

If fopen fails, flength returns 0.

This all might be reasonable if it were documented, but I think the
author just assumed nothing would go wrong (other than failing to open
the file).
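
For what it's worth, a version that checks for errors might look
something like this (just a sketch; reporting failure as -1L is my
own convention, not the author's):

#include <stdio.h>

/* Returns the length of the file in bytes, or -1L on any error.
** Note that the standard does not guarantee that a binary stream
** meaningfully supports fseek with SEEK_END.
*/
long flength(const char *fname)
{
    FILE *fptr = fopen(fname, "rb");
    long length;

    if (fptr == NULL)
        return -1L;
    if (fseek(fptr, 0L, SEEK_END) != 0)
    {
        fclose(fptr);
        return -1L;
    }
    length = ftell(fptr); /* yields -1L itself if ftell fails */
    if (fclose(fptr) == EOF)
        return -1L;
    return length;
}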
2) main should return int not void
Absolutely.

3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

I think the intent is that, in test mode, the program reports the size
of its own executable. Using argv[1] would certainly enable more
thorough testing. (Or maybe the author forgot what argv[0] means.)
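
A corrected test driver (a sketch; it checks argc before touching
argv[1]) might be:

#ifdef TEST

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: flength filename\n");
        return 1;
    }
    printf("Length of %s = %ld\n", argv[1], flength(argv[1]));
    return 0;
}

#endif /* TEST */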
Also this program makes me wonder why ftell returns long and not size_t.

size_t is for the size of an object, not of a file. A system could
support files bigger than either SIZE_MAX or LONG_MAX bytes; for
example, 32-bit systems often support files bigger than 4 gigabytes.

fsetpos and fgetpos allow for files whose sizes can't be represented
as integers, by using an opaque type, fpos_t, that can represent any
position in a file. When they were invented (1989 or earlier(?)),
there was no requirement for integers wider than 32 bits, so there was
a real possibility of files whose size couldn't be represented in
*any* integer type. If this were being invented today, we'd probably
just use 64-bit integers.
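
The only portable things you can do with an fpos_t are obtain one
with fgetpos and hand it back to fsetpos. A sketch (peek_byte is
just an invented example):

#include <stdio.h>

/* Read one byte, then restore the stream to where it was. */
int peek_byte(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &pos) != 0) /* return to the saved position */
        return EOF;
    return c;
}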

One thing you missed: there's no <io.h> header in standard C.
 

Ben Pfaff

kid joe said:
Also this program makes me wonder why ftell returns long and not size_t.

I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file. For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.
 

Antoninus Twink

unsigned long getfilesize(FILE *fp)
{
    unsigned long count = 0;
    rewind(fp); /* optional, depending on the semantics you want */
    while(getc(fp) != EOF)
    {
        ++count;
    }
    return count;
}

GREAT.

The famous Heathfield efficiency strikes again.

How about this for an improvement, Heathfield?

unsigned long getfilesize(FILE *fp)
{
    unsigned long count = 0;
    volatile uint32_t u;
    int r = (srand48(time(0)), 10*drand48());
    /* make sure this trivial function takes plenty of time! */
    while(r--) for(u=1; u; u++);
    rewind(fp); /* optional, depending on the semantics you want */
    while(getc(fp) != EOF)
    {
        ++count;
    }
    return count;
}
 

Nobody

Also this program makes me wonder why ftell returns long and not size_t.

size_t is for memory regions.

POSIX uses off_t for file positions, and provides fseeko() and ftello()
which use off_t rather than long.

This allows access to files larger than 2GiB on 32-bit systems, although
there are a bunch of other complications. E.g. you have to use fopen64(),
either explicitly or by defining _FILE_OFFSET_BITS=64 (which redirects
various standard functions to 64-bit versions) before including the
headers.
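
For example (POSIX, not standard C; a sketch that assumes the
implementation honors _FILE_OFFSET_BITS, and flength_posix is just
my name for it):

/* These must be defined before any header is included. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <sys/types.h>

/* Returns the length of the file, or (off_t)-1 on error. */
off_t flength_posix(const char *fname)
{
    FILE *fp = fopen(fname, "rb");
    off_t len;

    if (fp == NULL)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
    {
        fclose(fp);
        return (off_t)-1;
    }
    len = ftello(fp); /* (off_t)-1 if ftello fails */
    fclose(fp);
    return len;
}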
 

Kaz Kylheku

Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur
2) main should return int not void
3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

Also this program makes me wonder why ftell returns long and not size_t.

Historic reasons: ftell was invented before the introduction of size_t. So why
wasn't it fixed to return size_t? ftell can return -1, which indicates an error,
whereas size_t is unsigned. So this would mean returning (size_t)-1, or
introducing some other interface change, like an extra argument.
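
If it had been changed, callers would have had to test against
(size_t)-1. Something like this (ftell_sz is purely hypothetical,
for illustration):

#include <stdio.h>
#include <stddef.h>

/* Hypothetical size_t-returning wrapper around ftell. */
size_t ftell_sz(FILE *fp)
{
    long pos = ftell(fp);
    return pos < 0 ? (size_t)-1 : (size_t)pos;
}

/* A caller would then write:
**     if (ftell_sz(fp) == (size_t)-1)
**         ... handle the error ...
*/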

These days it would be a bad idea to use size_t for file positions, since
files can be larger than address spaces. Thus file size/position has to be
abstracted separately from object size.

For instance, a not uncommon combination is a 32 bit environment on top
of an OS with large file support in which file offsets are 64 bits wide.
The C compiler for that environment will likely have a 32 bit size_t,
which won't serve well as a file offset.
 

Richard Bos

Ben Pfaff said:
I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

It is not explicitly stated in the Rationale, but the way I read it, it
does imply that you're correct.

Richard
 

Richard Bos

Eric Sosman said:
kid joe wrote:

Not much. Not much that's good, that is.

The sad thing is that Dr. Dobb's used to be a magazine that was worth
reading.

Richard
 

Nobody

Historic reasons: ftell was invented before the introduction of size_t. So why
wasn't it fixed to return size_t? ftell can return -1, which indicates an error,
whereas size_t is unsigned. So this would mean returning (size_t)-1, or
introducing some other interface change, like an extra argument.

These days it would be a bad idea to use size_t for file positions, since
files can be larger than address spaces. Thus file size/position has to be
abstracted separately from object size.

"These days"? Having more disk space than RAM (or address space) has
historically been the norm (the main exception being some microcomputers
which only had floppy drives).

The only context I can think of where size_t would be an improvement over
long is Win64, due to its 32-bit long type. The usual combinations are:

long   size_t   typical platform
----   ------   ----------------
 32      16     8-bit and 16-bit systems
 32      32     32-bit systems[1], plus some 8086 memory models.
 64      64     64-bit systems (other than Win64)

[1] meaning anything with 32-bit registers, regardless of bus width,
including e.g. 68000.
 

Squeamizh

GREAT.

The famous Heathfield efficiency strikes again.

It's not Heathfield's fault that the standard doesn't allow for an
efficient solution. If he were recommending that this solution always
be preferred to a sane non-portable solution, then you might have a
point, but I don't think he was doing that.
 

user923005

I would guess that ftell was invented before size_t.  size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file.  For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.

Right. For file sizes we should use an fpos_t. See (for instance):

12.25: What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos() and fsetpos() good for?

A: ftell() and fseek() use type long int to represent offsets
(positions) in a file, and may therefore be limited to offsets
of about 2 billion (2**31-1). The newer fgetpos() and fsetpos()
functions, on the other hand, use a special typedef, fpos_t, to
represent the offsets. The type behind this typedef, if chosen
appropriately, can represent arbitrarily large offsets, so
fgetpos() and fsetpos() can be used with arbitrarily huge files.
fgetpos() and fsetpos() also record the state associated with
multibyte streams. See also question 1.4.

References: K&R2 Sec. B1.6 p. 248; ISO Sec. 7.9.1,
Secs. 7.9.9.1,7.9.9.3; H&S Sec. 15.5 p. 252.

Of course, exact file sizes are problematic for many reasons (e.g.
end-of-line translation for text files, live updates after
measurement, etc.).
In a multi-user environment (or even a multi-threading single user
environment), unless we physically lock the file and count every byte
in binary mode, we should always assume that a file size is only an
estimate.

IMO-YMMV.
 

Ben Pfaff

user923005 said:
Right. For file sizes we should use an fpos_t. See (for instance):

No, fpos_t is not useful for obtaining the size of a file,
because there is no way to get the size of the file out of it.
All we know for sure about fpos_t is that it is an object type other
than an array type. It might be, for example, a structure type.

"unsigned long" is the best type available in C90 for
representing a file size. In C99, "unsigned long long" is
better. In POSIX, off_t is even better.
 

Nobody

Right. For file sizes we should use an fpos_t.

The problem with fpos_t is that it isn't necessarily a numeric type, so
you can't e.g. calculate the number of bytes between two file positions,
or even check whether the current file position is equal to some
previously-recorded position.
 

Phil Carmody

The sad thing is that Dr. Dobb's used to be a magazine that was worth
reading.

Not since the mid-90s, I*M*HO. It's interesting for your first couple
of years of programming; bearable for the next few; and then not
worth the subscription. Finally, it's not even worth a trip to the
library.

Phil
 

Eric Sosman

user923005 said:
I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file. For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.

Right. For file sizes we should use an fpos_t. [...]

... except that there's no portable way to extract a
file size (or even a byte count) from an fpos_t. It's an
opaque type, intentionally so.

If you'd said "For file *positions* we should use an
fpos_t," I wouldn't have objected. But fpos_t is useless
for the purpose the O.P. referred to.
 

Eric Sosman

Phil said:
Not since the mid-90s, I*M*HO. It's interesting for your first couple
of years of programming; bearable for the next few; and then not
worth the subscription. Finally, it's not even worth a trip to the
library.

I signed up for a one-year subscription in the late 1980's,
read about three issues carefully, skimmed about three more,
recycled six, and didn't renew. Piece of pretentious trash,
I thought it.

But that was in another country; and besides, the wench
is dead.
 

luserXtrog

     I signed up for a one-year subscription in the late 1980's,
read about three issues carefully, skimmed about three more,
recycled six, and didn't renew.  Piece of pretentious trash,
I thought it.

Is there something better? CUJ? NYT on Tuesday? Sci-Am? Pentacle?
     But that was in another country; and besides, the wench
is dead.

no comprendo.
 

Richard Bos

Ben Pfaff said:
"unsigned long" is the best type available in C90 for
representing a file size. In C99, "unsigned long long" is
better.

I would prefer uintmax_t. It's a required type, it's guaranteed to be at
least as large as unsigned long long, and if a file size can't be
represented in it, it can't be represented in any other integer type,
either. So you lose nothing compared to unsigned long long, and you
_may_ gain some.
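
A C99 sketch, counting bytes the slow but portable way ("somefile"
is just a placeholder name):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(void)
{
    FILE *fp = fopen("somefile", "rb");
    uintmax_t count = 0;

    if (fp == NULL)
        return EXIT_FAILURE;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);

    /* PRIuMAX is the printf conversion macro for uintmax_t */
    printf("%" PRIuMAX " bytes\n", count);
    return EXIT_SUCCESS;
}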

Richard
 

James Kuyper

Richard said:
I would prefer uintmax_t. It's a required type, it's guaranteed to be at
least as large as unsigned long long, and if a file size can't be
represented in it, it can't be represented in any other integer type,
either. So you lose nothing compared to unsigned long long, and you
_may_ gain some.

uintmax_t might, for example, be an emulated 128-bit type, with a
correspondingly large increase in the execution time for each operation
involving it.
 

Ben Bacarisse

Is there something better? CUJ? NYT on Tuesday? Sci-Am? Pentacle?

If you are interested in programming and algorithms, there used to be
a journal called "Software: Practice and Experience" (Wiley) which, in
its early days, was a very good read indeed. I have no idea what it
is like now, but a quick glance on-line suggests that it may have got
a bit "wooly". In its hey-day it was quite down to earth.

Of course, it is almost certainly too pricey to be worth buying. A
good library would have it (mine does not).
 
