Is this fully portable and/or smart?

Chris Torek

[on fseek()ing to offset -1 from SEEK_END]
Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working,

Obviously you have not used VMS. :)
and as a matter of fact, if it works ONCE I can't see why it
would fail to work forever...

VMS has dozens of file formats, and using the -1 trick works on
some of them, but not all of them. So it would depend on the
file format of the file you opened.

The answer (which by now should be obvious) to the first part of
the question (in the subject line) is "it is not fully portable".
As for whether it is "smart", that one is trickier.

In my ancient TeX-DVI-file-handling library, which had a rather
different but related problem to solve, I had a machine-dependent
function I called "make seekable", so that you could string DVI-file
commands together with pipes. Some readers here in comp.lang.c
may be aware that Unix-like systems (including Linux) cause seek
(including fseek()) operations on pipes to fail. Even if you
fopen() your file, it is possible that the name refers to a pipe
(e.g., a "named pipe", or perhaps simply /dev/stdin), so that the
seek will fail.

The fully-portable, but ugly, solution is simply to copy the entire
file, adding a newline at the end if and only if the original
version did not have one. This is obviously going to be slower
than a machine-specific function that can use the seek-to-end trick.
Whether it is "significantly" slower depends on many other things.
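
A minimal sketch of the copy-everything approach described above, using only standard C and no seeking at all, so it should also work when the input is a pipe. The function name copy_with_final_newline is hypothetical (it is not from the DVI library mentioned above), and treating an empty input as "already terminated" is an assumption of this sketch.

```c
#include <stdio.h>

/* Copy everything from `in` to `out`, appending '\n' only if the input
 * is non-empty and does not already end with one.
 * Returns 0 on success, -1 on a read or write error.
 * (Hypothetical helper; the name is not from the original post.) */
int copy_with_final_newline(FILE *in, FILE *out)
{
    int c;
    int last = '\n';   /* assumption: an empty input needs no newline */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        last = c;
    }
    if (ferror(in))
        return -1;
    if (last != '\n' && putc('\n', out) == EOF)
        return -1;
    return 0;
}
```

The cost is one pass over the whole file, which is the slowdown referred to above.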
 
Chris Torek

Finally, consider VMS. It's been years, but as I recall, text
files under VMS cannot be randomly-seeked, as the files are
stored as variable-length records, and you can only seek to
a record boundary. (This is a good example of only being able
to pass certain values, such as those returned from ftell.)

Actually, fseek() on VMS is considerably more complicated than
that.

VMS record formats fall into a couple of different categories. A
text file with fixed-length records "acts like" a punched card (the
format more or less goes with old-style mainframe punched-card
formats). If the record format is "fixed" and the length is N,
then it is easy to map from an offset-in-bytes to a <line,
record-offset> pair: offset o is line (o/N)+1 (line numbers start
at 1), record-offset (o%N). Of course, mapping this to a C stdio
text stream is more work, because the text stream has to insert
"apparent newlines" between each record, and optionally remove
trailing blanks.
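
The fixed-length mapping above can be sketched in a few lines of C. This only illustrates the arithmetic; it is not how the VMS C library is written, and the struct and function names are hypothetical.

```c
/* Position inside a file of fixed-length records. */
struct record_pos {
    long line;    /* 1-based record (line) number */
    long offset;  /* 0-based byte offset within that record */
};

/* Map byte offset o to <line, record-offset> for record length n.
 * Assumes o >= 0 and n > 0 (hypothetical helper, for illustration). */
struct record_pos locate_fixed(long o, long n)
{
    struct record_pos p;

    p.line = o / n + 1;   /* line numbers start at 1 */
    p.offset = o % n;
    return p;
}
```

For 80-byte "card" records, offset 165 comes out as line 3, record-offset 5.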

A text file with variable-length records is tougher. The
variable-length records may have prefix byte-count fields. (In
early versions of VMS, this was the only other kind of text file.)
The VMS C library, however, still allows seeking to any position
within such a file, *provided* you have been to that position
earlier and used ftell() to find out where you are. The value
returned by ftell() is encoded: the ftell() routine packages
up a fairly large set of information that allows the C library
to find that record and its offset again, puts the data into a
table entry, and returns a pointer into the table entry (either
as an offset or as an actual pointer -- I never learned which).
(The table is freed when the stdio "FILE *" stream is closed.)
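
In portable terms, the pattern this supports is "remember a position with ftell(), come back to it later with fseek(..., SEEK_SET)". A minimal sketch, assuming the saved value came from ftell() on the same open stream; the helper name reread_from is hypothetical.

```c
#include <stdio.h>

/* Return to a position previously obtained from ftell() on the same
 * stream and read one line from there into buf.
 * Returns 0 on success, -1 if the position is invalid or the seek or
 * read fails. The saved value is treated as opaque: no arithmetic is
 * done on it, which is all a text stream promises to support. */
int reread_from(FILE *fp, long saved_pos, char *buf, int size)
{
    if (saved_pos == -1L)              /* ftell() itself had failed */
        return -1;
    if (fseek(fp, saved_pos, SEEK_SET) != 0)
        return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

A caller would do `long pos = ftell(fp);` at the point of interest, keep reading, and later pass pos back in.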

Finally, in VMS version 5 and later, there is a new [%] text format
called "stream-LF". In this format, text has variable-length
records that are separated/terminated by a linefeed (or "newline")
character. In other words, these files look exactly like any
Unix-like system's files. Here seeking to any arbitrary offset is
trivial, and everything works nicely for C.

[% Well, it was new back then. :) ]
 
Keith Thompson

Joe Wright said:
Peter Nilsson wrote: [...]
No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?

If "test_file" isn't seekable, then rewind(test_file) will fail to
rewind it -- but it has no way to tell you that it failed.
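
Since rewind() returns void, one way to get a diagnosable "rewind" is to call fseek() directly and check its result. A minimal sketch; the name checked_rewind is hypothetical.

```c
#include <stdio.h>

/* Like rewind(), but reports failure.
 * Returns 0 if the stream is now positioned at its start, -1 if the
 * seek failed (for example, because the stream is not seekable). */
int checked_rewind(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    clearerr(fp);   /* rewind() also clears the error indicator */
    return 0;
}
```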
 
Paul Hsieh

This is how I handle a check that the last character of a text
file is a newline:

/* checks if newline is last character of text file */
unsigned check_text_file_newline_termination(FILE *test_file) {
int end_char;

fseek(test_file,-1L,SEEK_END);

You have found one of the most worthless semantics in the entire C
standard library. According to the standard fseek() on text files
cannot function in a manner superior to fgetpos(), which you should
probably use instead.

In any event, the most obvious problem is for systems that use
multiple bytes to denote an end of line (like DOS/Windows and most
internet protocols.) Unless the system is willing to perform
immediate parsing and it maintains a strict isolation of the
termination characters (which is possible) it will not back you up to
a point where:
end_char=getc(test_file);

will give you the '\n' you were looking for. When one realizes this
nonsense one is inevitably led to the question: What is the use of
text files in the C language anyways? Personally, I prefer to always
open files as binary and use the following grammar to read them:

contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n

(Where \n = LF and \r = CR and DOSEOF = \032, i.e. Ctrl-Z.) For ASCII and UTF-8,
this makes you compatible with Unix, Mac and DOS all at the same
time. You can even open a file which has mistakenly mixed the line
terminator formats without issue.

This also suggests a method for you to determine if the text file has
a line terminator -- open it as binary, do a fseek to the end with
offset -1L as you do above, and check the last character for either \r
or \n; if it's DOSEOF, then back up one more and check for \r or
\n. This *may* be wrong on UNIX systems that can allow \r and DOSEOF
to be legitimate content characters, but you can typically make
demands that text files not contain control characters other than \n
and \t.
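
A rough sketch of the binary-mode check just described. Assumptions: the stream was opened with "rb"; DOS_EOF is taken to be Ctrl-Z (0x1A), the conventional DOS end-of-file marker; the function name is hypothetical. Note that the C standard does not require a binary stream to meaningfully support SEEK_END, so this is "works on the usual desktop systems" code rather than strictly portable code. It finds the file size first instead of seeking to -1L directly, so that an empty file is not reported as an error.

```c
#include <stdio.h>

#define DOS_EOF 0x1A   /* Ctrl-Z; an assumption of this sketch */

/* fp must be a binary stream ("rb").
 * Returns 1 if the last character (ignoring one trailing DOS_EOF) is
 * '\n' or '\r', 0 if it is not or the file is empty, -1 on failure. */
int ends_with_line_terminator(FILE *fp)
{
    long size;
    int c;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    if (size < 0)
        return -1;
    if (size == 0)
        return 0;
    if (fseek(fp, size - 1, SEEK_SET) != 0)
        return -1;
    c = getc(fp);
    if (c == DOS_EOF) {
        if (size < 2)
            return 0;          /* file holds nothing but the marker */
        if (fseek(fp, size - 2, SEEK_SET) != 0)
            return -1;
        c = getc(fp);
    }
    if (c == EOF)
        return -1;
    return c == '\n' || c == '\r';
}
```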
rewind(test_file);

if(end_char=='\n') return TRUE;
else return FALSE;
}

The question is: is this actually guaranteed to work properly on
all "conforming" C "implementations"?

Did this actually work on a Windows system? I am too lazy to check.
If it did, I can only assume that the C compiler library is just
promoting a LF by itself to a '\n'. My recollection was that on some
DOS systems text files were terminated by a 26 (Ctrl-Z, the DOS EOF) character as
well. Either way, I don't believe you can expect the C libraries for
all Windows/DOS compilers to support this (but I could be wrong.)
[...] My reading of the spec says
"no", but of course it works just fine on the several systems I've
used it on...

Maybe some weird EBCDIC system would fail. Or maybe the standard
worshipers here might pull some random nonsense system like the
Epilepsy or Tandem pit stop where it fails. Personally, I prefer to
pick standards which have the most relevance. In this case, the three
main desktop OSes cover pretty much all the text file formats that
matter, and the grammar I gave above reads all of them
simultaneously. The C standard has less to offer me than that.
 
Spiros Bousbouras

When one realizes this
nonsense one is inevitably led to the question: What is the use of
text files in the C language anyways? Personally, I prefer to always
open files as binary and use the following grammar to read them:

contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n

(Where \n = LF and \r = CR and DOSEOF = \032, i.e. Ctrl-Z.)

What if the file contains empty lines ? It seems to
me you should either do
linebody := [^\n\r]*
or
line := linebody? lineterminator
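
To make the difference concrete, here is a small sketch of a line counter over an in-memory buffer that follows the grammar with the first fix (linebody := [^\n\r]*), so empty lines are counted. Assumptions of this sketch: the data was read in binary mode, a single trailing Ctrl-Z (0x1A) is taken as the DOS EOF marker and ignored, and a final unterminated but non-empty linebody counts as a line. The function name is hypothetical.

```c
#include <stddef.h>

/* Count lines in buf[0..len) where a line terminator is "\n", "\r",
 * or "\r\n". Empty lines count. */
size_t count_lines(const char *buf, size_t len)
{
    size_t i = 0;
    size_t lines = 0;
    size_t body = 0;   /* characters seen since the last terminator */

    if (len > 0 && buf[len - 1] == 0x1A)
        len--;                         /* drop a trailing DOS EOF */

    while (i < len) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;                   /* treat "\r\n" as one terminator */
            lines++;
            body = 0;
        } else if (buf[i] == '\n') {
            lines++;
            body = 0;
        } else {
            body++;
        }
        i++;
    }
    if (body > 0)
        lines++;                       /* unterminated last line */
    return lines;
}
```

With the original linebody := [^\n\r]+ rule, the input "a\n\nb" would not parse at all; here it counts as three lines.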
 
Bill Reid

Chris Torek said:
[on fseek()ing to offset -1 from SEEK_END]
Yes, but in the instant case I don't think it is possible for it
to have worked thousands of times on a particular "implementation"
and then to quit working,

Obviously you have not used VMS. :)

Au contraire, but it was so long ago I hardly remember...

What I do recall was that I worked with a LOT of text files created
using VMS "Edit" (or whatever their text editor was called), and these
files were actually parsed by UNIX scripts, commands, and utilities,
with only a small amount of conversion required for line endings (I think
that was all they needed)...and if I remember correctly (I may not), I
think the files were all cross-mounted on UNIX servers...
VMS has dozens of file formats, and using the -1 trick works on
some of them, but not all of them. So it would depend on the
file format of the file you opened.

This of course is always true of ALL systems, since SEEK_END
is not required to be meaningful for binary streams according to the
"standard"...and the standard specifically says you must use
"0" as an offset for SEEK_END...once again, somebody didn't
"get the memo"...
The answer (which by now should be obvious) to the first part of
the question (in the subject line) is "it is not fully portable".

Particularly for ports to the 20-year-old past!
As for whether it is "smart", that one is trickier.

In my ancient TeX-DVI-file-handling library, which had a rather
different but related problem to solve, I had a machine-dependent
function I called "make seekable", so that you could string DVI-file
commands together with pipes. Some readers here in comp.lang.c
may be aware that Unix-like systems (including Linux) cause seek
(including fseek()) operations on pipes to fail. Even if you
fopen() your file, it is possible that the name refers to a pipe
(e.g., a "named pipe", or perhaps simply /dev/stdin), so that the
seek will fail.

Yes, a "pipe" will cause a fseek() error, and set errno to
ESPIPE...and this type of error is unknown to the C standard,
but "implementation-defined", like in POSIX...
The fully-portable, but ugly, solution is simply to copy the entire
file, adding a newline at the end if and only if the original
version did not have one. This is obviously going to be slower
than a machine-specific function that can use the seek-to-end trick.
Whether it is "significantly" slower depends on many other things.

Yeah, I can imagine. In any event, the nature of the way I use
this is I don't use "pipes", stdin, CTRL-Z terminated files (I actually
filter this type of stuff out routinely before the file is saved in the
first place), non-disk files of any sort...so we're kind of down to
only an unopened file or illegal seek value error, and the file is
guaranteed to be opened by the calling function, and I would
think that if -1L from SEEK_END is a legal seek value (even
though the "standard" says it isn't) once on an "implementation",
it always will be...
 
Bill Reid

Keith Thompson said:
Joe Wright said:
Peter Nilsson wrote: [...]
No. But even if you read the whole file, rewind can fail.
Assuming you have opened "test_file" successfully, what do you think
might cause 'rewind(test_file);' to fail?

If "test_file" isn't seekable, then rewind(test_file) will fail to
rewind it -- but it has no way to tell you that it failed.

At the risk of repeating my reply to "vippstar", any non-seekable
"file", like a "pipe", will of course fail to fseek(), which in this case
means the rewind() is irrelevant...
 
Bill Reid

Flash Gordon said:
Bill Reid wrote, On 20/05/08 02:00:

You asked if it would work on all possible conforming implementations,
so why are you shouting at someone for pointing out places where it
might fail without your code spotting it?

Well, there are actually a limited number of reasons why it
would fail in the first place, and most if not all of those don't
apply to this particular usage. Now on my "silent but deadly"
system, I'm further limited to ONE possible failure, an unopened
file, and THAT error is handled in the calling function, riiiiiiiight?
(I know, some goofball could inherit my personal code after
my death and start calling it with unopened files...)
 
Bill Reid

Keith Thompson said:
Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?

Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......
They're standard C functions (both C90 and C99), and
therefore they're POSIX-compliant as well.

Wrong, so very wrong.

"Standard C" != POSIX

This is by the clear language of the POSIX standard. A "C
implementation" may only be called "POSIX-conformant" if
it includes certain extensions and changes to the "standard"
C libraries. To the extent those changes exist, the C
"implementation" can no longer be called "standard" C,
and certainly can't be considered "portable" (except of course,
to other POSIX systems).
If you're willing to settle for POSIX compliance without necessarily
having code that's fully portable C, you should ask for advice in
comp.unix.programmer (this would let you use fseeko() and ftello(),
for example).

Don't forget lseek(), fileno(), filedes, and on and on and on...

fseek() and ftell(), et al., are the clearly-described overlapping
requirements of both the "C" standard and POSIX, so that's what
I use...FOR MAXIMUM POSSIBLE PORTABILITY!!!
 
Keith Thompson

Bill Reid said:
Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......

And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.
Wrong, so very wrong.

No. Take a look at any draft of the C standard, or any C textbook, or
your online documentation.
"Standard C" != POSIX

Do you seriously think I'm not perfectly well aware of that?
This is by the clear language of the POSIX standard. A "C
implementation" may only be called "POSIX-conformant" if
it includes certain extensions and changes to the "standard"
C libraries.
Right.

To the extent those changes exist, the C
"implementation" can no longer be called "standard" C,
and certainly can't be considered "portable" (except of course,
to other POSIX systems).

The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard. For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.

(Some POSIX extensions, as I recall, are in the form of additional
declarations in <stdio.h>, but I *think* those extensions are enabled
only if you define a certain preprocessor symbol. I don't remember
the details.)

But all of that is beside the point.
Don't forget lseek(), fileno(), filedes, and on and on and on...

I didn't forget them; I didn't mention them because they weren't
relevant to my point.
fseek() and ftell(), et al., are the clearly-described overlapping
requirements of both the "C" standard and POSIX, so that's what
I use...FOR MAXIMUM POSSIBLE PORTABILITY!!!

Uh huh.

The point that you're persistently missing is that fgetpos() and
fsetpos() are *also* specified in *both* the C and POSIX standards.
fseek(), ftell(), fgetpos(), and fsetpos() all have exactly the same
status with respect to the C standard (C89, C90, C95, C99) and the
POSIX standard.
 
Bill Reid

Keith Thompson said:
And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.

What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
No. Take a look at any draft of the C standard, or any C textbook, or
your online documentation.

I think you're having logic problems again. "POSIX" is only listed
in the bibliography of the C "standard". My online documentation
hardly mentions POSIX at all, except for some of the supported
extensions. It tends to use the term "UNIX" to designate what I
assume to be POSIX portability for most stuff, and fgetpos() and
fsetpos() are specifically listed as NOT being portable to "UNIX"...

In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
Do you seriously think I'm not perfectly well aware of that?

<ATOMIC_BOGGLE!!!>

B-b-b-but...well, I'm just too stunned to think of something to
say here...

You're losing your power to shock me...
The C standard specifically allows for extensions. Most (or all?)
POSIX extensions are compatible with the C standard.

You mean "undefined behavior" is now "compatible" with the
C "standard"? Maybe you're regaining your ability to confound
me with contradictory nonsense...
For example,
POSIX specifies a <unistd.h> header; this doesn't conflict with
anything in the C standard.

I can't believe I'm actually reading this...even more so, actually
bothering to reply to it...
(Some POSIX extensions, as I recall, are in the form of additional
declarations in <stdio.h>, but I *think* those extensions are enabled
only if you define a certain preprocessor symbol. I don't remember
the details.)

Try REAL hard and you might remember what you're talking
about here...
But all of that is beside the point.

Yes, it is, it really is...
I didn't forget them; I didn't mention them because they weren't
relevant to my point.

You had a point?
Uh huh.

The point that you're persistently missing is that fgetpos() and
fsetpos() are *also* specified in *both* the C and POSIX standards.

Where are fgetpos() and fsetpos() "specified" in the POSIX standard?
Chapter and verse, please...
fseek(), ftell(), fgetpos(), and fsetpos() all have exactly the same
status with respect to the C standard (C89, C90, C95, C99) and the
POSIX standard.

Not in MY copy of the POSIX standard, or my "implementation"
documentation...
 
Ian Collins

Bill said:
What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?
http://www.opengroup.org/onlinepubs/000095399/functions/fgetpos.html


In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?
It may reference the C standard, but it does not include the text.
You mean "undefined behavior" is now "compatible" with the
C "standard"? Maybe you're regaining your ability to confound
me with contradictory nonsense...
I can't see any reference to undefined behavior in Keith's postings. A
standard such as POSIX is free to define behavior that is implementation
defined in the C standard. It is also free to extend features (signals,
errno values and so on).
I can't believe I'm actually reading this...even more so, actually
bothering to reply to it...
Well you just have.
 
Keith Thompson

Bill Reid said:
Keith Thompson said:
Bill Reid said:
Where did you get the idea that fgetpos and fsetpos aren't
POSIX-compliant?

Well, if you want to get "technical" about it, I don't think the
presence of the two in a C compiler makes the C compiler
"non-conformant" to POSIX, and they ARE mentioned IN PASSING
in the POSIX standard, BUUYUUTTTT......

And, in fact, the *absence* of fgetpos and fsetpos in a C
implementation would cause that implementation to be non-conformant to
both C and POSIX.

What authority do you rely on to say that absence of those two in
a C "implementation" makes it not POSIX-conformant?

I don't know POSIX as well as I know C, but my understanding is that
POSIX requires a conforming C implementation. You say you have a copy
of the POSIX standard, so you can verify that yourself. Look up the
"c99" command. Or, if you have an older version, perhaps there's a
"c89" or "c95" command, or *some* command that's supposed to be a C
compiler.

The reference I've been using is the set of web pages at
<http://www.opengroup.org/onlinepubs/NNNNNNNNN/nframe.html>, where
"NNNNNNNNN" needs to be replaced with a decimal number that, if I
recall correctly, I obtained by registering at the site. The header
says:

The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright (c) 2001-2004 The IEEE and The Open Group

Quoting the page that describes fgetpos() (which you can find among
the first few hits of a Google search for "fgetpos"):

The functionality described on this reference page is aligned with
the ISO C standard. Any conflict between the requirements
described here and the ISO C standard is unintentional. This
volume of IEEE Std 1003.1-2001 defers to the ISO C standard.
I think you're having logic problems again. "POSIX" is only listed
in the bibliography of the C "standard". My online documentation
hardly mentions POSIX at all, except for some of the supported
extensions. It tends to use the term "UNIX" to designate what I
assume to be POSIX portability for most stuff, and fgetpos() and
fsetpos() are specifically listed as NOT being portable to "UNIX"...

I suggest that your online documentation is wrong, or perhaps merely
very old.
In any event, you're looking at this the wrong way...since even by
your "logic" POSIX is a superset of the C "standard", why would
I read "C" documentation to figure out what is in POSIX?

I suggest that quotation marks don't mean what you think they mean.

If POSIX is a superset of the C standard, then everything that's part
of C is part of POSIX. fgetpos() and fsetpos() are part of C.
Therefore, fgetpos() and fsetpos() are part of POSIX.

[snip]
Where are fgetpos() and fsetpos() "specified" in the POSIX standard?
Chapter and verse, please...

I don't have a copy of the POSIX standard. You claim that you do.
Try looking in the index or the table of contents.

[snip]

Strictly speaking, POSIX is off-topic here, but fgetpos and fsetpos
are topical. Both are standard C functions, and have been since the
first C standard was issued in 1989. I'd be surprised if you could
find a modern system with a C compiler on which they don't work as
specified.
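
For readers who have not used them, a minimal sketch of the fgetpos()/fsetpos() pair: save the current position, read ahead, then restore it. Only standard C is used; the helper name peek_line is hypothetical.

```c
#include <stdio.h>

/* Read the next line into buf without consuming it: the stream is put
 * back where it was before the call.
 * Returns 0 on success, -1 if saving the position, reading, or
 * restoring the position fails. */
int peek_line(FILE *fp, char *buf, int size)
{
    fpos_t pos;

    if (fgetpos(fp, &pos) != 0)
        return -1;
    if (fgets(buf, size, fp) == NULL)
        return -1;
    if (fsetpos(fp, &pos) != 0)
        return -1;
    return 0;
}
```

Unlike the long returned by ftell(), an fpos_t is an opaque object, so it can only be handed back to fsetpos() on the same stream.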
 
Paul Hsieh

When one realizes this
nonsense one is inevitably led to the question: What is the use
of text files in the C language anyways? Personally, I prefer to
always open files as binary and use the following grammar to read
them:
contents := line* linebody? DOSEOF?
line := linebody lineterminator
linebody := [^\n\r]+
lineterminator := \n | \r | \r\n
(Where \n = LF and \r = CR and DOSEOF = \032, i.e. Ctrl-Z.)

What if the file contains empty lines ? It seems to
me you should either do
linebody := [^\n\r]*
or
line := linebody? lineterminator

Good catch. I'd go with the first option.
 
Bill Reid

Ian Collins said:

OK, NOW it is in POSIX, as of "Issue 4" (look at the change history
section); it was NOT in "Issues 1, 2, and 3"...but fseek() was in there
from the beginning. I'm relying on documentation that is several years
old at least, and who knows, possibly POSIX-compliance of similar
vintage if I should try to port my application...

For some really relevant fun, read the description of fseek() from
the same source:

http://www.opengroup.org/onlinepubs/000095399/functions/fseek.html

Note carefully that the restriction on only seeking a non-zero offset
from SEEK_SET applies only to wide-character I/O, conflicting with
more restrictive language of the C "standard"...
It may reference the C standard, but it does not include the text.

Huh? You just gave me a link to what purports to be the "POSIX"
description of a C "standard" function. ALL POSIX versions I have
read either explicitly referenced the C "standard" section for functions
that were "identical" for both, or explicitly described the differences,
or provided a full description for functions/defines/etc. that were
unique to POSIX. In this latest version, they apparently have full
descriptions of all the C "standard" functions, with a little notation
indicating what is an "extension" to the C "standard".

So if I want POSIX-compliance, why the hell would I read the
C "standard" again?
I can't see any reference to undefined behavior in Keith's postings.

Which postings? He's never posted the words "undefined behavior"?

If you're talking about this post, I know that in the past at least some
of the POSIX extensions, specifically extra arguments to strftime(), were
listed as "undefined behavior" by the C "standard", and were documented
as "extensions" in POSIX (and in my own "implementation" documentation).
So in this post, he claimed that most or ALL POSIX extensions were
"compatible" with the C "standard", he must have been referring to
some "undefined behavior" rather than just "implementation-defined"
behavior...
A
standard such as POSIX is free to define behavior that is implementation
defined in the C standard. It is also free to extend features (signals,
errno values and so on).

Well, sure, I guess...still won't make an application "portable" that
relies on any of that stuff, at least from a C "portability" standpoint,
which up until today I thought was the monomaniacal focus of the
group...
Well you just have.

Yup, did it again too...

Anyway, learned something today: if a system is POSIX "Issue 4"
compliant, I CAN use fgetpos() and fsetpos()...
 
Ian Collins

Bill said:
Huh? You just gave me a link to what purports to be the "POSIX"
description of a C "standard" function.

There's more to the C standard than the standard library section.
 
Keith Thompson

Bill Reid said:
Anyway, learned something today: if a system is POSIX "Issue 4"
compliant, I CAN use fgetpos() and fsetpos()...

I'm glad we could help.

Don't expect much more help here unless you change your attitude.
 
vippstar

OK, NOW it is in POSIX, as of "Issue 4" (look at the change history
section); it was NOT in "Issues 1, 2, and 3"...but fseek() was in there
from the beginning. I'm relying on documentation that is several years
old at least, and who knows, possibly POSIX-compliance of similar
vintage if I should try to port my application...

For some really relevant fun, read the description of fseek() from
the same source:

http://www.opengroup.org/onlinepubs/000095399/functions/fseek.html

Note carefully that the restriction on only seeking a non-zero offset
from SEEK_SET applies only to wide-character I/O, conflicting with
more restrictive language of the C "standard"...



Huh? You just gave me a link to what purports to be the "POSIX"
description of a C "standard" function. ALL POSIX versions I have
read either explicitly referenced the C "standard" section for functions
that were "identical" for both, or explicitly described the differences,
or provided a full description for functions/defines/etc. that were
unique to POSIX. In this latest version, they apparently have full
descriptions of all the C "standard" functions, with a little notation
indicating what is an "extension" to the C "standard".
POSIX and ISO C99 have many differences. For example, POSIX requires
CHAR_BIT to be exactly 8 while the standard only guarantees it to be
equal to or greater than 8.
In POSIX, int is at least 32 bits, while in ISO C (and ANSI C) it's at
least 16 bits.
POSIX allows a void pointer to hold a function pointer, while ISO C
doesn't allow that. There are more differences than the above.
POSIX is not a superset of ISO C just like ISO C++ is not a superset
of ISO C.
 
