Programming in standard C

Malcolm McLean

Julienne Walker said:
I get the distinct impression that you're basing these complaints on
requirements that I'm not aware of. Can you give me a formal
description of this function so that I have a better idea of what I'm
dealing with?
It's got to load a text file into a contiguous block of RAM, on any platform
running ANSI standard C.
Implied is that it shouldn't waste memory, make too many passes over the
data, or repeatedly reallocate.
It can't be done, because implementations don't have to return an index from
ftell(). So you need to call fgetc() iteratively to get the size of the
file. However MiniBasic has to load scripts, and the function I used is in
practice good enough.
 
Eric Sosman

Bart said:
I've been using a function like the following:

unsigned int getfilesize(FILE* handle)
{
    unsigned int p,size;
    p=ftell(handle); /*p=current position*/
    fseek(handle,0,2); /*get eof position*/
    size=ftell(handle); /*size in bytes*/
    fseek(handle,p,0); /*restore file position*/
    return size;
}

What is wrong with this, is it non-standard? (Apart from the likely 4Gb
limit)

Several things are wrong with it, even apart from the
possible 64KB limit.

Zeroth, you should have #include'd <stdio.h>. I'll let
you get away with this one, though, on the grounds that since
you're using FILE you probably *have* #include'd it but just
failed to show the inclusion.

First, there's no error checking. None, nada, zero, zip.

Second, ftell() returns a long. When you store the long
value in an unsigned int, the conversion might not preserve
the value; you may end up seeking back to a different place
than you started. (Or, on a text stream, you may invoke
undefined behavior since the value of `p' in the second fseek()
may not be the value ftell() returned.)

Third, what are the magic numbers 2 and 0 that you use
as the third arguments in the fseek() calls? My guess is
that they are the expansions of the macros SEEK_END and
SEEK_SET on some system you once used, and that you've
decided for some bizarre reason to avoid using the macros.
So the values will be right (one supposes) on that system,
but there's no telling what they might mean on another.

Fourth, for a text stream the value returned by ftell() is
not necessarily a byte count; it is a value with an unspecified
encoding. Calling it a "file size" makes unwarranted assumptions.

Fifth, there's 7.19.9.2p3: "A binary stream need not
meaningfully support fseek calls with a whence value of SEEK_END."
So if SEEK_END expands to the value 2 (see above), the first
fseek() call may be meaningless on a binary stream.

Sixth, for a binary stream there may be an unspecified
number of extraneous zero bytes after the last byte actually
written to the file. (This isn't as bad as the others, because
if you read the file you'll actually be able to read those
zeroes if they are present: They behave as if they're in the
file, even though they may never have been written to it.)

But other than that, it looks pretty good.
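
A minimal sketch of how the first three objections might be addressed, assuming a stream opened in binary mode. The name stream_size_estimate is hypothetical and does not come from the thread. The fourth, fifth and sixth objections still apply: the Standard does not promise that the result is an exact byte count, so it is best treated as an estimate.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): best-effort size of a stream
 * opened in binary mode. Returns -1L if any call fails. The value is only
 * an estimate: a binary stream need not support SEEK_END meaningfully and
 * may carry trailing padding bytes. */
long stream_size_estimate(FILE *fp)
{
    long start, end;

    start = ftell(fp);                    /* keep the position as a long */
    if (start == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)     /* named macro, checked result */
        return -1L;
    end = ftell(fp);                      /* may itself be -1L on failure */
    if (fseek(fp, start, SEEK_SET) != 0)  /* restore the original position */
        return -1L;
    return end;
}
```

A caller would compare the result with -1L before using it, and would still be prepared for the file to turn out longer or shorter when it is actually read.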
 
Eric Sosman

jacob said:
int main(void) { int n = printf("hello\n");}
How much is n?

Answer for the code as shown: Impossible to tell,
because the code needn't even compile under C99 rules,
and invokes undefined behavior in both C90 and C99.

Answer for the code as probably intended: Either
six or an unspecified negative number.
No way to know since the error codes of printf
are NOT standardized. This means that I can only
know that n can be... *ANYTHING*. Maybe it wrote
some characters, then stopped, or whatever!

No, n cannot be "*ANYTHING*". For example, it cannot
be forty-two.
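
A short sketch of the "code as probably intended", assuming the missing piece was the <stdio.h> include. The function name say_hello is hypothetical. It relies only on what the Standard says about printf: the return value is the number of characters transmitted, or a negative value on an output or encoding error.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole line was written,
 * 0 if printf reported an error. */
int say_hello(void)
{
    int n = printf("hello\n");  /* 6 on success: five letters plus '\n' */

    if (n < 0)
        return 0;               /* some output or encoding error occurred */
    return n == 6;
}
```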
The problem with the lack of standardization of error codes
means that I can't do error checking in a portable way
and thus, no portable program of any importance can be
written that handles the different error situations that
could arise.

No such program can be portable anyhow, since the list
of potential failure modes is system-specific. Do you want
to force an implementation to throw away information about
the cause of a failure, simply to cram its diagnosis into
one least-common-denominator framework of failure codes?
Perhaps you do: I see that your "Happy christmas" effort
diagnoses *every* fopen() failure as "file not found" --
no "file locked by another user," no "too many open files,"
no "insufficient memory," no "permission denied," just "file
not found." (Well, at least you're following an established
precedent: "Tapes? What tapes? There are no such tapes, and
besides, we burned 'em.")
In normal software, you *are* interested in why this program/function
call failed. You can't portably do that in standard C;

Right. When you have enumerated all the failure conditions
for all the file systems that C has run on, runs on today, or
will run on in the future, then you can talk about a comprehensive
and portable encoding scheme for them.
You can't even know the size of a file without reading it all.

This is true, and sometimes a problem. Not usually, but
sometimes.
A bit of more functionality would be better for all of us. But
if I am in this group obviously, it is not because I
believe standard C is useless but because I want to fix some
problems with it.

Either you don't comprehend the difficulty, or you have
seen a way to solve it that has eluded a lot of other people.
The latter would be better for everyone (if you're willing to
share the solution under not-too-expensive terms), but from
the content of your posts over the years I greatly fear that
the case is the former.
Does this answer your question?

No. You made a blanket, all-inclusive statement that
"You can't do *anything* in just standard C," and I asked
whether you stood by it or would retreat from it. You have
still neither affirmed nor recanted your claim.
 
army1987

jacob said:
fpos_t filesize(FILE *);

would be useful isn't it?
On my system fpos_t isn't an integer. It isn't an arithmetic type, either.
It isn't a scalar, either.
How do I convert an object whose type looks like
typedef struct
{
__off_t __pos;
__mbstate_t __state;
} _G_fpos_t;
typedef _G_fpos_t fpos_t;
to a number?
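
For context, a minimal sketch of the one thing Standard C does let a program do with an fpos_t: hand it back to fsetpos() on the same stream. The helper name peek_char is hypothetical. Nothing in it looks inside the object, which is why it works whether fpos_t is an integer or a struct like the one above.

```c
#include <stdio.h>

/* Hypothetical helper: returns the next character without consuming it,
 * or EOF at end of file or on error. The fpos_t is treated as opaque. */
int peek_char(FILE *fp)
{
    fpos_t here;
    int ch;

    if (fgetpos(fp, &here) != 0)   /* remember where we are */
        return EOF;
    ch = fgetc(fp);
    if (fsetpos(fp, &here) != 0)   /* go back to the saved position */
        return EOF;
    return ch;
}
```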
 
jacob navia

army1987 said:
On my system fpos_t isn't an integer. It isn't an arithmetic type, either.
It isn't a scalar, either.
How do I convert an object whose type looks like
typedef struct
{
__off_t __pos;
__mbstate_t __state;
} _G_fpos_t;
typedef _G_fpos_t fpos_t;
to a number?

you convert the __pos member into a long long.
Read the docs, maybe you are interested in the
mbstate member, maybe not.

In any case I would say that a long long
result would be a better return type.
 
jacob navia

Eric said:
Answer for the code as shown: Impossible to tell,
because the code needn't even compile under C99 rules,
and invokes undefined behavior in both C90 and C99.

WOW. How clever you are.

No such program can be portable anyhow, since the list
of potential failure modes is system-specific. Do you want
to force an implementation to throw away information about
the cause of a failure, simply to cram its diagnosis into
one least-common-denominator framework of failure codes?

The way of arguing is obvious:

You establish a false alternative. If somebody asks for
better standardization of error codes, you say that the
alternatives are
o NOTHING (no standardization at all)
o a comprehensive error list of all possible error codes.

The OBVIOUS alternative of standardizing the most common ones
(IO error, not enough memory, incorrect argument, etc)
and leaving to the implementation to return more explicit error codes
is not at all considered...

This could be done by making errno return two error codes:

main error code (general type of failure)
system specific error code

For an IO error, an implementation would set the
main error code to EIO, and the specific error code
to the code of the device that failed, or the code
of the faulty controller, or WHATEVER.

Programs would have the choice of just testing for the more common
error codes or being specific to a particular system.

Note that as it is NOW, portable programs can't do any type
of error analysis at all.

Case in point:

fopen.

The most common errors are

file not found
file locked
privileges problem
IO error
no memory

There is NO WAY a program in standard C can analyze those common
errors to take appropriate action!

For the people here, this is not serious at all, anyway you do
not care about error analysis because we are in C, isn't it?

Or what is the rationale behind this attitude?
 
Richard Heathfield

jacob navia said:
WOW. How clever you are.

Sarcasm doesn't work very well when you're in the wrong. If you don't want
people to post blindingly obvious corrections to your code, don't make
blindingly obvious mistakes.

You establish a false alternative. If somebody asks for
better standardization of error codes, you say that the
alternatives are
o NOTHING (no standardization at all)
o a comprehensive error list of all possible error codes.

The OBVIOUS alternative of standardizing the most common ones
(IO error, not enough memory, incorrect argument, etc)
and leaving to the implementation to return more explicit error codes
is not at all considered...

On the contrary, that's what ISO did. That's why we have EDOM and ERANGE.
The difference between what you suggest and what they actually
standardised is mere haggling over where to draw the line. If you want
more error codes added to the Standard, lobby ISO to that effect.
Complaining about it in comp.lang.c won't achieve anything, because
comp.lang.c doesn't write the Standard.

<snip>
 
Flash Gordon

Malcolm McLean wrote, On 27/12/07 12:12:
/*
function to slurp in an ASCII file
Params: path - path to file
Returns: malloced string containing whole file
*/
char *loadfile(char *path)
{
FILE *fp;
int ch;
long i = 0;
long size = 0;
char *answer;

fp = fopen(path, "r");

OK, you got the mode right for the file so you've done better than Jacob.
if(!fp)
{
printf("Can't open %s\n", path);
return 0;
}

fseek(fp, 0, SEEK_END);

You should check for success.
size = ftell(fp);

Using a method you know is not portable is hardly the best way to answer
Jacob's challenge.
fseek(fp, 0, SEEK_SET);

answer = malloc(size + 100);
if(!answer)
{
printf("Out of memory\n");
fclose(fp);
return 0;

You should try for consistent indenting.
}

while( (ch = fgetc(fp)) != EOF)
answer[i++] = ch;

This could overrun your buffer since you don't check.
answer[i++] = 0;

fclose(fp);

return answer;
}

This will do it. Add 100 + size/10 for luck if paranoid.
You are right that a perverse implementation can break this, which is a
bug in the standard.

Or a limitation due to the limitations of existing systems.

Of course, if you had bothered to add in a few simple checks you could
have produced a solution that would work for files up to the maximum
size of block that can be allocated. So get your best guess of the file
size and then expand the buffer if the file turns out to be larger (or
the fseek or ftell failed) and optionally shrink it down at the end.

Since the systems I work with can have larger files than the total of
physical+virtual memory such a function is of no real use to me.
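
The approach described above (start from a best guess, grow the buffer if the file turns out to be larger, optionally shrink at the end) could be sketched roughly as follows. The names slurp, hint and len_out are hypothetical, and the doubling growth policy is an assumption rather than anything proposed in the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: read the rest of a stream into a malloc'ed,
 * NUL-terminated buffer. 'hint' is the caller's best guess at the size
 * (0 if unknown). Returns NULL on read error or allocation failure. */
char *slurp(FILE *fp, size_t hint, size_t *len_out)
{
    size_t cap = (hint > 0 && hint < (size_t)-1) ? hint + 1 : 256;
    size_t len = 0;
    char *buf = malloc(cap);
    char *tmp;
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) {           /* keep one byte for the terminator */
            size_t newcap = cap * 2;
            if (newcap <= cap) {        /* size_t overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)ch;
    }
    if (ferror(fp)) {                   /* EOF was caused by a read error */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    tmp = realloc(buf, len + 1);        /* optional shrink; keep buf if it fails */
    if (tmp != NULL)
        buf = tmp;
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

With a good hint the loop never reallocates; with a bad one, or when fseek() or ftell() failed and the hint is 0, it still reads the whole file, limited only by what malloc() can provide.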
 
Eric Sosman

jacob said:
WOW. How clever you are.

Thank you.
Case in point:

fopen.

The most common errors are

file not found
file locked
privileges problem
IO error
no memory

Are you willing to share the empirical evidence you have
gathered that shows these are the "most common" errors? My
guess would have been that both "is a directory" and "too many
open files" would have come higher on the list than "no memory,"
and it would be interesting to see the counts you've accumulated.
There is NO WAY a program in standard C can analyze those common
errors to take appropriate action!

What is the appropriate action for "file locked," and how
does it differ from that for "privileges problem?" Portable
actions only, please: you keep talking about portable error
analysis, which is mere mockery without portable error response.
For the people here, this is not serious at all, anyway you do
not care about error analysis because we are in C, isn't it?
Or what is the rationale behind this attitude?

My attitude is that your claim "You can't do *anything* in
just standard C" is a load of horse feathers, and that you know
it's a load of horse feathers, but you're too smart to affirm it
and too proud to recant.
 
Flash Gordon

Malcolm McLean wrote, On 27/12/07 12:21:
It's got to load a text file into a contiguous block of RAM, on any
platform running ANSI standard C.

Easy for reasonably sized files where it is possible, not possible if
the file is larger than the memory available to the process.
Implied is that it shouldn't waste memory, make too many passes over the
data, or repeatedly reallocate.

Those are not implied by the initial statement of requirements. They
also make it impossible even if you leave behind the strictures of
standard C, since the only way to avoid waste memory is to find the file
size, and on Windows (to take one example) the only way to find the
space required is to do a complete scan of the file since Windows uses 2
bytes in a file to indicate the end of a line and can signal the end of
a text file with another byte at *any* point in the physical file. So
the impossibility is nothing to do with C but everything to do with the
way *common* systems work.
It can't be done, because implementations don't have to return an index
from ftell().

That is a limitation of C because it is a limitation of some of the
underlying systems C runs on, such as Windows.
So you need to call fgetc() iteratively to get the size of
the file.

Any "getfilesize()" function that worked "correctly" for text files on
Windows (i.e. reported the number of characters you can read if the file
is not modified) would have to read the file a byte at a time anyway.
However MiniBasic has to load scripts, and the function I used
is in practice good enough.

Well, I've claimed that writing a function that can (subject to system
limitations) read an entire text file is not hard, so I'm not surprised
by your claim.
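
As an illustration of the byte-at-a-time scan mentioned above, here is a minimal sketch that counts the characters a text-mode read would deliver. The function name count_text_chars is hypothetical. On a system that stores line ends as two bytes, the count can be smaller than the on-disk size, which is exactly why a seek-based size is not a substitute.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: number of characters readable from 'path' in text
 * mode, or -1L if the file cannot be opened, a read fails, or the count
 * does not fit in a long. */
long count_text_chars(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (fgetc(fp) != EOF) {
        if (n == LONG_MAX) {    /* refuse to overflow the counter */
            n = -1L;
            break;
        }
        n++;
    }
    if (n >= 0 && ferror(fp))   /* EOF came from an error, not end of file */
        n = -1L;
    fclose(fp);
    return n;
}
```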
 
Stephen Montgomery-Smith

Richard said:
[Stephen's reply, whilst long, was well worth reading. I only have comments
to make on a tiny portion of it. Please imagine that, instead of snipping
the rest, I had quoted it all and written <aol>I agree!</aol> underneath.]

Stephen Montgomery-Smith said:
jacob navia wrote:

As a newcomer to this group who hasn't even read the FAQ, let me
nevertheless brazenly seek to answer your question.

I think you are correct in that standard C is of somewhat limited value.

*All* tools are of somewhat limited value. I think many people would be
astounded at just how much can be done with standard C, and just how
widely that functionality can be implemented.
But perhaps we should see standard C as perhaps a tool to be embedded
into real C, rather than as an object with value in and of itself.

How do you feel about s/rather than/as well/ - because I think that such a
change reflects reality rather more closely. Certainly for my own part, I
know that my use of what you call "real C" (by which you appear to mean "C
+ non-ISO9899 libraries") is dwarfed by my use of ISO C. Most of the C
programs I write are ISO C programs. Only a very small proportion use
non-ISO9899 libraries.

Of course, you are correct.

But to reiterate my points - many years ago I used to program in PASCAL.
The problem was PASCAL had certain limitations, and so to overcome
them every implementation had to have certain non-standard extensions.

Then I switched to C. C also has limitations, because a programming
language simply cannot cover every eventuality that a user or OS might
need. But C was defined in a sufficiently ambiguous manner that all the
extensions were permitted by the standard, and one still had standard C.
Somehow the inventors of C (and their successor standards bodies)
attained that delicate balance, because of course to be too ambiguous
would be just as bad as being too strict.

Another thing about C - somehow it is easy to use. PASCAL, I remember,
was very klunky, and it took too many typestrokes to accomplish
something very simple. Next, the other day, a friend sent me a program
written in FORTRAN, and I simply couldn't read it! And this program
was performing numerical analysis, something that while perhaps
mathematically difficult, is simple from a programming point of view.
On the other hand, I can read C code for OS internals, minimally
commented, and as long as I know broadly what the code is meant to do,
it reads very easily.

Stephen
 
jacob navia

What do I mean with error analysis?

Something like this
FOPEN
[snip]

ERRORS
The fopen() function shall fail if:
[EACCES]
Search permission is denied on a component of the path prefix, or the
file exists and the permissions specified by mode are denied, or the
file does not exist and write permission is denied for the parent
directory of the file to be created.
[EINTR]
A signal was caught during fopen().
[EISDIR]
The named file is a directory and mode requires write access.
[ELOOP]
A loop exists in symbolic links encountered during resolution of the
path argument.
[EMFILE]
{OPEN_MAX} file descriptors are currently open in the calling process.
[ENAMETOOLONG]
The length of the filename argument exceeds {PATH_MAX} or a pathname
component is longer than {NAME_MAX}.
[ENFILE]
The maximum allowable number of files is currently open in the system.
[ENOENT]
A component of filename does not name an existing file or filename is an
empty string.
[ENOSPC]
The directory or file system that would contain the new file cannot be
expanded, the file does not exist, and the file was to be created.
[ENOTDIR]
A component of the path prefix is not a directory.
[ENXIO]
The named file is a character special or block special file, and the
device associated with this special file does not exist.
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t.
[EROFS]
The named file resides on a read only file system and write access was
specified.

You see?
An implementation would be allowed to extend these errors but we could
portably test for a certain kind of error.

To test if a file does not exist I could test for ENOENT when I try
to open it. I could test EISDIR to see if this file is a directory...
etc etc!
 
Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 27/12/07 12:12:

You should check for success.


Using a method you know is not portable is hardly the best way to answer
Jacob's challenge.
The code is designed to be used in a production environment, and it is
adequate for that. It reads in a MiniBasic script file. If the file is huge
the function will fail, but the interpreter will choke on such an input
anyway.
 
Francine.Neary

jacob said:
I do not want to argue that it is impossible to write this program
in C. I am arguing that it is not possible to write it in STANDARD C.

Look, all those bugs can be easily corrected and your approach is maybe
sounder than mine. You will agree however, that

fpos_t filesize(FILE *);

would be useful isn't it?

It's hardly a surprise that complicated file operations end up
requiring platform-specific code. Many programs will benefit from or
require platform-specific code. The obvious solution is to isolate all of the
platform-specific code. The obvious solution is to isolate all of the
non-portable code in your program into a single place, so that when
it comes time to port it to a different platform, only this one module
needs to be rewritten. Your filesize() function is an excellent
candidate for inclusion in this non-standard module.

It's a bad workman who blames his tools: if Standard C isn't the right
tool for a particular job, why not use an appropriate tool rather than
berating Standard C?
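
One way the isolation described above might look, as a rough sketch. The name platform_filesize and the HAVE_POSIX_STAT macro are hypothetical, and the platform test (__unix__ or __APPLE__) is an assumption about which compilers predefine which macros. Only this one function would need attention when porting; everything else calls it.

```c
#include <limits.h>
#include <stdio.h>

/* Assumption: these predefined macros indicate a system with POSIX stat(). */
#if defined(__unix__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/stat.h>
#define HAVE_POSIX_STAT 1
#endif

/* Hypothetical portability module: size of 'path' in bytes, or -1L. */
long platform_filesize(const char *path)
{
#ifdef HAVE_POSIX_STAT
    struct stat st;

    if (stat(path, &st) != 0)
        return -1L;
    if (st.st_size > LONG_MAX)      /* does not fit the return type */
        return -1L;
    return (long)st.st_size;
#else
    /* Standard C fallback: count the bytes by reading them. */
    FILE *fp = fopen(path, "rb");
    long n = 0;

    if (fp == NULL)
        return -1L;
    while (fgetc(fp) != EOF) {
        if (n == LONG_MAX) {
            n = -1L;
            break;
        }
        n++;
    }
    if (n >= 0 && ferror(fp))
        n = -1L;
    fclose(fp);
    return n;
#endif
}
```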
 
Ben Pfaff

Richard Heathfield said:
Most of the C programs I write are ISO C programs. Only a very
small proportion use non-ISO9899 libraries.

If one may inquire, what kind of task do most of your programs
seek to accomplish?
 
Richard Heathfield

Ben Pfaff said:
If one may inquire, what kind of task do most of your programs
seek to accomplish?

It varieth exceeding great, and I'm struggling to pick a form of words that
could briefly and communicatively describe "most" of the programs I write.
At present, however, I'm writing a PHP code generator. In this case, it
happens to be written in ISO C not because it must be, but because I have
no particular need for extensions (why use 'em for the sake of it?).
 
Keith Thompson

jacob navia said:
What do I mean with error analysis?

Something like this
FOPEN
[snip]

ERRORS
The fopen() function shall fail if:
[EACCES]
Search permission is denied on a component of the path prefix, or the
file exists and the permissions specified by mode are denied, or the
file does not exist and write permission is denied for the parent
directory of the file to be created.
[EINTR]
A signal was caught during fopen().
[EISDIR]
The named file is a directory and mode requires write access. [snip]

You see?
An implementation would be allowed to extend these errors but we could
portably test for a certain kind of error.

To test if a file does not exist I could test for ENOENT when I try
to open it. I could test EISDIR to see if this file is a directory...
etc etc!

Test and do what? Do you expect a program to behave differently if an
fopen() call fails because the file doesn't exist than if it fails
because it's a directory? Most of the time, I'd expect the program to
print an error message (which is readily available by calling
strerror(errno)) and then doing some generic error handling.

But ok, let's assume you want different behavior beyond just a
differently worded error message. If you know you're on a
POSIX-compliant system, you can go ahead and use all those E* macros.
(POSIX already exists, and is widely supported; I see no need to
incorporate large chunks of it into the C standard, particularly since
C is intended to be supported on a wider variety of underlying
platforms than POSIX is.) Or, if you want both flexibility and
portability, you can do something like this (untested code):

if (... fopen failed ...) {
    switch (errno) {

#ifdef EACCES
    case EACCES:
        /* handle permission error */
        break;
#endif

#ifdef EINTR
    case EINTR:
        /* handle signal error */
        break;
#endif

#ifdef EISDIR
    case EISDIR:
        /* handle directory error */
        break;
#endif

    default:
        /* Unrecognized error; use strerror() to obtain a
           message */
        break;
    }
}
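
A compilable variant of the same idea, offered as a sketch only. The function name open_or_report and the message wording are hypothetical. It assumes nothing beyond Standard C: errno is cleared first because the C standard does not require fopen() to set it, and each E* case is compiled in only where the implementation defines that macro.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fopen() wrapper that prints a diagnostic on failure. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp;
    int err;

    errno = 0;                  /* so a stale value is not misread below */
    fp = fopen(path, mode);
    if (fp != NULL)
        return fp;
    err = errno;                /* save it before any other library call */

    switch (err) {
#ifdef ENOENT
    case ENOENT:
        fprintf(stderr, "%s: no such file\n", path);
        break;
#endif
#ifdef EACCES
    case EACCES:
        fprintf(stderr, "%s: permission denied\n", path);
        break;
#endif
    case 0:                     /* fopen failed without setting errno */
        fprintf(stderr, "%s: cannot open file\n", path);
        break;
    default:                    /* anything else: generic message */
        fprintf(stderr, "%s: %s\n", path, strerror(err));
        break;
    }
    return NULL;
}
```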
 
CJ

Or, if you want both flexibility and
portability, you can do something like this (untested code):

if (... fopen failed ...) {
switch (errno) {

#ifdef EACCES
case EACCES:
/* handle permission error */
break;
#endif

#ifdef EINTR
case EINTR:
/* handle signal error */
break;
#endif

#ifdef EISDIR
case EISDIR:
/* handle directory error */
break;
#endif

default:
/* Unrecognized error; use strerror() to obtain a
message */
break;
}
}

But if you're not using POSIX, you could define EACCES etc. as global
macros yourself, and then this code could break badly...
 
Eric Sosman

jacob said:
What do I mean with error analysis?

Something like this
FOPEN
[snip]

ERRORS
The fopen() function shall fail if:
[EACCES]
Search permission is denied on a component of the path prefix, or the
file exists and the permissions specified by mode are denied, or the
file does not exist and write permission is denied for the parent
directory of the file to be created.
[EINTR]
A signal was caught during fopen().
[EISDIR]
The named file is a directory and mode requires write access.
[ELOOP]
A loop exists in symbolic links encountered during resolution of the
path argument.
[EMFILE]
{OPEN_MAX} file descriptors are currently open in the calling process.
[ENAMETOOLONG]
The length of the filename argument exceeds {PATH_MAX} or a pathname
component is longer than {NAME_MAX}.
[ENFILE]
The maximum allowable number of files is currently open in the system.
[ENOENT]
A component of filename does not name an existing file or filename is an
empty string.
[ENOSPC]
The directory or file system that would contain the new file cannot be
expanded, the file does not exist, and the file was to be created.
[ENOTDIR]
A component of the path prefix is not a directory.
[ENXIO]
The named file is a character special or block special file, and the
device associated with this special file does not exist.
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t.
[EROFS]
The named file resides on a read only file system and write access was
specified.

You see?
An implementation would be allowed to extend these errors but we could
portably test for a certain kind of error.

To test if a file does not exist I could test for ENOENT when I try
to open it. I could test EISDIR to see if this file is a directory...
etc etc!

I repeat the question you snipped and have not yet even
begun to answer:

If you like, we can now rephrase it in terms of your list:
For each error code, please explain what "appropriate action"
(your phrase) should be taken, how its action differs from the
actions taken for other codes, and how to take action portably.

Also, do you imagine that the list you exhibit here is an
exhaustive list of failure modes for fopen()? What code should
an implementation use if fopen() fails for lack of memory? What
code reports an incompatibility between the file organization and
C's sequential "stream of bytes" model (cf. OpenVMS)? What code
is appropriate for a security violation (e.g., low-privilege
program attempting to read a high-privilege file -- note that your
description of EACCES does not cover this case)? What code should
be used if the file name references an environment variable that
has no definition? What code should be used -- ah, the hell with
it. You have not even begun to think about these problems, much
less solve them.
 
Erik Trulsson

jacob navia said:
er... YES!


We could restrict this to normal files.


I just can't imagine a file system that doesn't provide a way
of knowing the length of a file.

Your imagination is obviously not very good.

Take for example CP/M. CP/M only keeps track of how many disk-blocks
a file uses. It does not keep track of how many bytes are used in each block.
It is normally the case that the last block of a file has one or more unused bytes.
Under CP/M there is no general way of finding out how many of those bytes are
unused for a given file. For text files the end is marked by a CONTROL-Z (this
was later inherited by MS-DOS). For binary files each program will have to come
up with some way of keeping track of that information inside the data files, or
just accept that there may be some garbage bytes at the end of a file.


Another example would be a file stored on a magnetic tape. There it might not be
possible to find out how large the file is without reading the file until
you reach an end-of-file marker. I am fairly certain that such devices
are still in use.


Maybe there is SOMEWHERE in
the world a crazy file system like that but why should we
care about it?

The world contains many weird file systems and storage devices that are actually
used quite a bit.
 
