Dr Dobbs snippet


kid joe

Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur
2) main should return int not void
3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

Also this program makes me wonder why ftell returns long and not size_t.

Cheers,
Joe
 

Keith Thompson

kid joe said:
Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

Here's the code:
==================================================
/*
** FLENGTH.C - a simple function using all ANSI-standard functions
** to determine the size of a file.
**
** Public domain by Bob Jarvis.
*/

#include <stdio.h>
#include <io.h>

long flength(char *fname)
{
    FILE *fptr;
    long length = 0L;

    fptr = fopen(fname, "rb");
    if(fptr != NULL)
    {
        fseek(fptr, 0L, SEEK_END);
        length = ftell(fptr);
        fclose(fptr);
    }

    return length;
}

#ifdef TEST

void main(int argc, char *argv[])
{
    printf("Length of %s = %ld\n", argv[0], flength(argv[0]));
}

#endif /* TEST */
==================================================
I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur

If fseek fails but ftell succeeds, ftell will probably return 0.

If ftell fails, it returns -1.

If fopen fails, flength returns 0.

This all might be reasonable if it were documented, but I think the
author just assumed nothing would go wrong (other than failing to open
the file).
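
For what it's worth, a version that checks for errors might look
something like this (just a sketch; reporting failure as -1L is my
own convention, not the author's):

#include <stdio.h>

/* Returns the length of the file in bytes, or -1L on any error.
** Note that the standard does not guarantee that a binary stream
** meaningfully supports fseek with SEEK_END.
*/
long flength(const char *fname)
{
    FILE *fptr = fopen(fname, "rb");
    long length;

    if (fptr == NULL)
        return -1L;
    if (fseek(fptr, 0L, SEEK_END) != 0)
    {
        fclose(fptr);
        return -1L;
    }
    length = ftell(fptr); /* yields -1L itself if ftell fails */
    if (fclose(fptr) == EOF)
        return -1L;
    return length;
}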
2) main should return int not void
Absolutely.

3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

I think the intent is that, in test mode, the program reports the size
of its own executable. Using argv[1] would certainly enable more
thorough testing. (Or maybe the author forgot what argv[0] means.)
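
A corrected test driver (a sketch; it checks argc before touching
argv[1]) might be:

#ifdef TEST

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: flength filename\n");
        return 1;
    }
    printf("Length of %s = %ld\n", argv[1], flength(argv[1]));
    return 0;
}

#endif /* TEST */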
Also this program makes me wonder why ftell returns long and not size_t.

size_t is for the size of an object, not of a file. A system could
support files bigger than either SIZE_MAX or LONG_MAX bytes; for
example, 32-bit systems often support files bigger than 4 gigabytes.

fsetpos and fgetpos allow for files whose sizes can't be represented
as integers, by using an opaque type, fpos_t, that can represent any
position in a file. When they were invented (1989 or earlier(?)),
there was no requirement for integers wider than 32 bits, so there was
a real possibility of files whose size couldn't be represented in
*any* integer type. If this were being invented today, we'd probably
just use 64-bit integers.
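
The only portable things you can do with an fpos_t are obtain one
with fgetpos and hand it back to fsetpos. A sketch (peek_byte is
just an invented example):

#include <stdio.h>

/* Read one byte, then restore the stream to where it was. */
int peek_byte(FILE *fp)
{
    fpos_t pos;
    int c;

    if (fgetpos(fp, &pos) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &pos) != 0) /* return to the saved position */
        return EOF;
    return c;
}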

One thing you missed: there's no <io.h> header in standard C.
 

Ben Pfaff

kid joe said:
Also this program makes me wonder why ftell returns long and not size_t.

I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file. For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.
 

Antoninus Twink

unsigned long getfilesize(FILE *fp)
{
    unsigned long count = 0;
    rewind(fp); /* optional, depending on the semantics you want */
    while(getc(fp) != EOF)
    {
        ++count;
    }
    return count;
}

GREAT.

The famous Heathfield efficiency strikes again.

How about this for an improvement, Heathfield?

unsigned long getfilesize(FILE *fp)
{
    unsigned long count = 0;
    volatile uint32_t u;
    int r = (srand48(time(0)), 10*drand48());
    /* make sure this trivial function takes plenty of time! */
    while(r--) for(u=1; u; u++);
    rewind(fp); /* optional, depending on the semantics you want */
    while(getc(fp) != EOF)
    {
        ++count;
    }
    return count;
}
 

Nobody

Also this program makes me wonder why ftell returns long and not size_t.

size_t is for memory regions.

POSIX uses off_t for file positions, and provides fseeko() and ftello()
which use off_t rather than long.

This allows access to files larger than 2GiB on 32-bit systems, although
there are a bunch of other complications. E.g. you have to use fopen64(),
either explicitly or by defining _FILE_OFFSET_BITS=64 (which redirects
various standard functions to 64-bit versions) before including the
headers.
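
For example (POSIX, not standard C; a sketch that assumes the
implementation honors _FILE_OFFSET_BITS, and flength_posix is just
my name for it):

/* These must be defined before any header is included. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200112L

#include <stdio.h>
#include <sys/types.h>

/* Returns the length of the file, or (off_t)-1 on error. */
off_t flength_posix(const char *fname)
{
    FILE *fp = fopen(fname, "rb");
    off_t len;

    if (fp == NULL)
        return (off_t)-1;
    if (fseeko(fp, 0, SEEK_END) != 0)
    {
        fclose(fp);
        return (off_t)-1;
    }
    len = ftello(fp); /* (off_t)-1 if ftello fails */
    fclose(fp);
    return len;
}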
 

Kaz Kylheku

Hi all,

What do people think of this snippet on Dr Dobbs today?

<URL:http://www.ddj.com/cpp/219100141>

I see these problems:
1) no error checking of fseek, ftell or fclose - could be problems if the
file is "special" in some way or if I/O errors occur
2) main should return int not void
3) using argv[1] instead of argv[0] would make more sense, but then need
to check argc>1

Also this program makes me wonder why ftell returns long and not size_t.

Historic reasons: ftell was invented before the introduction of size_t. So why
wasn't it fixed to return size_t? ftell can return -1, which indicates an error,
whereas size_t is unsigned. So this would mean returning (size_t)-1, or
introducing some other interface change, like an extra argument.
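
If it had been changed, callers would have had to test against
(size_t)-1. Something like this (ftell_sz is purely hypothetical,
for illustration):

#include <stdio.h>
#include <stddef.h>

/* Hypothetical size_t-returning wrapper around ftell. */
size_t ftell_sz(FILE *fp)
{
    long pos = ftell(fp);
    return pos < 0 ? (size_t)-1 : (size_t)pos;
}

/* A caller would then write:
**     if (ftell_sz(fp) == (size_t)-1)
**         ... handle the error ...
*/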

These days it would be a bad idea to use size_t for file positions, since
files can be larger than address spaces. Thus file size/position has to be
abstracted separately from object size.

For instance, a not uncommon combination is a 32 bit environment on top
of an OS with large file support in which file offsets are 64 bits wide.
The C compiler for that environment will likely have a 32 bit size_t,
which won't serve well as a file offset.
 

Richard Bos

Ben Pfaff said:
I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

It is not explicitly stated in the Rationale, but the way I read it, it
does imply that you're correct.

Richard
 

Richard Bos

Eric Sosman said:
kid joe wrote:

Not much. Not much that's good, that is.

The sad thing is that Dr. Dobb's used to be a magazine that was worth
reading.

Richard
 

Nobody

Historic reasons: ftell was invented before the introduction of size_t. So why
wasn't it fixed to return size_t? ftell can return -1, which indicates an error,
whereas size_t is unsigned. So this would mean returning (size_t)-1, or
introducing some other interface change, like an extra argument.

These days it would be a bad idea to use size_t for file positions, since
files can be larger than address spaces. Thus file size/position has to be
abstracted separately from object size.

"These days"? Having more disk space than RAM (or address space) has
historically been the norm (the main exception being some microcomputers
which only had floppy drives).

The only context I can think of where size_t would be an improvement over
long is Win64, due to its 32-bit long type. The usual combinations are:

long   size_t   typical platform
----   ------   ----------------
 32      16     8-bit and 16-bit systems
 32      32     32-bit systems[1], plus some 8086 memory models.
 64      64     64-bit systems (other than Win64)

[1] meaning anything with 32-bit registers, regardless of bus width,
including e.g. 68000.
 

Squeamizh

GREAT.

The famous Heathfield efficiency strikes again.

It's not Heathfield's fault that the standard doesn't allow for an
efficient solution. If he were recommending that this solution always
be preferred to a sane non-portable solution, then you might have a
point, but I don't think he was doing that.
 

user923005

I would guess that ftell was invented before size_t.  size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file.  For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.

Right. For file sizes we should use an fpos_t. See (for instance):

12.25: What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos() and fsetpos() good for?

A: ftell() and fseek() use type long int to represent offsets
(positions) in a file, and may therefore be limited to offsets
of about 2 billion (2**31-1). The newer fgetpos() and fsetpos()
functions, on the other hand, use a special typedef, fpos_t, to
represent the offsets. The type behind this typedef, if chosen
appropriately, can represent arbitrarily large offsets, so
fgetpos() and fsetpos() can be used with arbitrarily huge files.
fgetpos() and fsetpos() also record the state associated with
multibyte streams. See also question 1.4.

References: K&R2 Sec. B1.6 p. 248; ISO Sec. 7.9.1,
Secs. 7.9.9.1,7.9.9.3; H&S Sec. 15.5 p. 252.

Of course, exact file sizes are problematic for many reasons (e.g.
end-of-line translation for text files, live updates after
measurement, etc.).
In a multi-user environment (or even a multi-threading single user
environment), unless we physically lock the file and count every byte
in binary mode, we should always assume that a file size is only an
estimate.

IMO-YMMV.
 

Ben Pfaff

user923005 said:
Right. For file sizes we should use an fpos_t. See (for instance):

No, fpos_t is not useful for obtaining the size of a file,
because there is no way to get the size of the file out of it.
All we know for sure about fpos_t is that it is an object type other
than an array type. It might be, for example, a structure type.

"unsigned long" is the best type available in C90 for
representing a file size. In C99, "unsigned long long" is
better. In POSIX, off_t is even better.
 

Nobody

Right. For file sizes we should use an fpos_t.

The problem with fpos_t is that it isn't necessarily a numeric type, so
you can't e.g. calculate the number of bytes between two file positions,
or even check whether the current file position is equal to some
previously-recorded position.
 

Phil Carmody

The sad thing is that Dr. Dobb's used to be a magazine that was worth
reading.

Not since the mid-90s, I*M*HO. It's interesting for your first couple
of years of programming; bearable for the next few; and then not
worth the subscription. Finally, it's not even worth a trip to the
library.

Phil
 

Eric Sosman

user923005 said:
I would guess that ftell was invented before size_t. size_t is
not in 1st edition K&R, so it was probably an invention of the C
standards committee.

Also, size_t is not a very good type for measuring the size of a
file. For example, 16-bit MS-DOS has 16-bit size_t, but supports
files larger than 65535 bytes.

Right. For file sizes we should use an fpos_t. [...]

... except that there's no portable way to extract a
file size (or even a byte count) from an fpos_t. It's an
opaque type, intentionally so.

If you'd said "For file *positions* we should use an
fpos_t," I wouldn't have objected. But fpos_t is useless
for the purpose the O.P. referred to.
 

Eric Sosman

Phil said:
Not since the mid-90s, I*M*HO. It's interesting for your first couple
of years of programming; bearable for the next few; and then not
worth the subscription. Finally, it's not even worth a trip to the
library.

I signed up for a one-year subscription in the late 1980's,
read about three issues carefully, skimmed about three more,
recycled six, and didn't renew. Piece of pretentious trash,
I thought it.

But that was in another country; and besides, the wench
is dead.
 

luserXtrog

     I signed up for a one-year subscription in the late 1980's,
read about three issues carefully, skimmed about three more,
recycled six, and didn't renew.  Piece of pretentious trash,
I thought it.

Is there something better? CUJ? NYT on Tuesday? Sci-Am? Pentacle?
     But that was in another country; and besides, the wench
is dead.

no comprendo.
 

Richard Bos

Ben Pfaff said:
"unsigned long" is the best type available in C90 for
representing a file size. In C99, "unsigned long long" is
better.

I would prefer uintmax_t. It's a required type, it's guaranteed to be at
least as large as unsigned long long, and if a file size can't be
represented in it, it can't be represented in any other integer type,
either. So you lose nothing compared to unsigned long long, and you
_may_ gain some.
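
A C99 sketch, counting bytes the slow but portable way ("somefile"
is just a placeholder name):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(void)
{
    FILE *fp = fopen("somefile", "rb");
    uintmax_t count = 0;

    if (fp == NULL)
        return EXIT_FAILURE;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);

    /* PRIuMAX is the printf conversion macro for uintmax_t */
    printf("%" PRIuMAX " bytes\n", count);
    return EXIT_SUCCESS;
}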

Richard
 

James Kuyper

Richard said:
I would prefer uintmax_t. It's a required type, it's guaranteed to be at
least as large as unsigned long long, and if a file size can't be
represented in it, it can't be represented in any other integer type,
either. So you lose nothing compared to unsigned long long, and you
_may_ gain some.

uintmax_t might, for example, be an emulated 128-bit type, with a
correspondingly large increase in the execution time for each operation
involving it.
 

Ben Bacarisse

Is there something better? CUJ? NYT on Tuesday? Sci-Am? Pentacle?

If you are interested in programming and algorithms, there used to be
a journal called "Software: Practice and Experience" (Wiley) which, in
its early days, was a very good read indeed. I have no idea what it
is like now, but a quick glance on-line suggests that it may have got
a bit "wooly". In its hey-day it was quite down to earth.

Of course, it is almost certainly too pricey to be worth buying. A
good library would have it (mine does not).
 
