fopen() questions

No Such Luck

I have the need to open an arbitrary file and perform random access
reads, writes and seeks. The file may or may not exist. If it doesn't
exist, it needs to be created. I'm having a hard time trying to
figure out which mode the file should be opened in. None of them seem
like a good fit.

rb+ and r+b modes won't create the file if it doesn't exist
wb+ and w+b will overwrite the file if it does exist.

I think a possible solution would be to try and open the file in rb+
mode, and if that fails, open it in wb+ mode, creating it. However, I
feel it is sort of risky doing this in 2 steps. If something (e.g.,
network hiccup, disk issue, etc.) was responsible for the initial rb+
failure, coincidental race timing could see me eradicating a perfectly
good file on accident.
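
One way to narrow that window, assuming a C11 library, is to make the second step refuse to touch an existing file. The "x" (exclusive create) flag used here is not in C99 and is not mentioned elsewhere in this thread, and the helper name open_update_or_create is made up for illustration. A minimal sketch:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open `path` for binary update, creating it if it is
   absent, without ever truncating an existing file.
   Assumes a C11 library that understands the "x" mode flag. */
FILE *open_update_or_create(const char *path)
{
    assert(path != NULL);

    /* Step 1: an existing file is opened as-is, never truncated. */
    FILE *f = fopen(path, "rb+");
    if (f != NULL)
        return f;

    /* Step 2: create the file, but fail if it turns out to exist after all. */
    f = fopen(path, "wb+x");
    if (f != NULL)
        return f;

    /* Step 3: the file may have appeared between steps 1 and 2; try once more. */
    return fopen(path, "rb+");
}
```

If the first open failed because of a transient problem rather than a missing file, the second open fails as well instead of wiping the file, and the caller gets NULL.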

The errno offers little help here, too. The same errno is reported
whether the destination is unreachable, or the destination is
reachable but the file just isn't there.

Any advice on the best way to handle this?

Thanks in advance...
 
Seebs

I have the need to open an arbitrary file and perform random access
reads, writes and seeks. The file may or may not exist. If it doesn't
exists, it needs to be created. I'm having a hard time trying to
figure out which mode the file should be opened in. None of them seem
like a good fit.

Long story short, I don't think you can do this.

rb+ and r+b modes won't create the file if it doesn't exist
wb+ and w+b will overwrite the file if it does exist.

And the a modes are useless because they force writes to the end of the
file.

Honestly, I don't know of a portable solution. You can reliably do this
in *nix by messing around with hard links, or you need to use some kind
of secondary mutex mechanism.

I think a possible solution would be to try and open the file in rb+
mode, and if that fails, open it in wb+ mode, creating it. However, I
feel it is sort of risky doing this in 2 steps. If something (i.e.,
network hiccup, disk issue, etc) was responsible for the initial rb+
failure, coincidental race timing could see me eradicating a perfectly
good file on accident.

"by" accident -- but I'll warn you, nothing anywhere can prevent that
kind of error.

-s
 
Keith Thompson

No Such Luck said:
I have the need to open an arbitrary file and perform random access
reads, writes and seeks. The file may or may not exist. If it doesn't
exists, it needs to be created. I'm having a hard time trying to
figure out which mode the file should be opened in. None of them seem
like a good fit.

rb+ and r+b modes won't create the file if it doesn't exist
wb+ and w+b will overwrite the file if it does exist.

I think a possible solution would be to try and open the file in rb+
mode, and if that fails, open it in wb+ mode, creating it. However, I
feel it is sort of risky doing this in 2 steps. If something (i.e.,
network hiccup, disk issue, etc) was responsible for the initial rb+
failure, coincidental race timing could see me eradicating a perfectly
good file on accident.

Yes.

The errno offers little help here, too. The same errno is reported
whether the destination is unreachable, or the destination is
reachable but the file just isn't there.

And the standard doesn't even require fopen() to set errno on failure
(though I expect most implementations do so).

Any advice on the best way to handle this?

It looks like "ab+" will do the trick:

    a+b or ab+ append; open or create binary file for update, writing at
                       end-of-file

You can then fseek() to wherever you like.
 
No Such Luck

It looks like "ab+" will do the trick:

    a+b or ab+ append; open or create binary file for update, writing at
                       end-of-file

You can then fseek() to wherever you like.

I didn't think this would work. All the documentation I have seen
claims the following: "Opening a file with append mode (a as the first
character in the mode argument) shall cause all subsequent writes to
the file to be forced to the then current end-of-file, regardless of
intervening calls to fseek()."
 
Mark Storkamp

No Such Luck said:
I have the need to open an arbitrary file and perform random access
reads, writes and seeks. The file may or may not exist. If it doesn't
exists, it needs to be created. I'm having a hard time trying to
figure out which mode the file should be opened in. None of them seem
like a good fit.

rb+ and r+b modes won't create the file if it doesn't exist
wb+ and w+b will overwrite the file if it does exist.

I think a possible solution would be to try and open the file in rb+
mode, and if that fails, open it in wb+ mode, creating it. However, I
feel it is sort of risky doing this in 2 steps. If something (i.e.,
network hiccup, disk issue, etc) was responsible for the initial rb+
failure, coincidental race timing could see me eradicating a perfectly
good file on accident.

The errno offers little help here, too. The same errno is reported
whether the destination is unreachable, or the destination is
reachable but the file just isn't there.

Any advice on the best way to handle this?

Thanks in advance...

I think the paranoid solution would be to create a new file under a new
name, try to open the (possibly) existing file and copy its contents
into the new file, then if there is a hiccup, disk issue &c, you would
still leave the original file unmolested. I'm somewhat stuck as to what
you would do when you're done with the new file, since renaming it will
destroy the original file if it did indeed exist.

At some point, it seems, you just need to trust the OS.
 
Keith Thompson

No Such Luck said:
I didn't think this would work. All the documentation I have seen
claims the following: "Opening a file with append mode (a as the first
character in the mode argument) shall cause all subsequent writes to
the file to be forced to the then current end-of-file, regardless of
intervening calls to fseek()."

I just tried it, and you're right, it didn't work. The Linux man
page says:

    a+  Open for reading and appending (writing at end of
        file). The file is created if it does not exist.
        The initial file position for reading is at the
        beginning of the file, but output is always appended
        to the end of the file.

It doesn't talk about fseek(), but it does say "always".

After opening a file with "ab+", fseek(f, 0, SEEK_SET) returns 0
(indicating success), but a subsequent write does go to the end of
the file. It seems to me like unfriendly behavior.
 
Alan Curry

After opening a file with "ab+", fseek(f, 0, SEEK_SET) returns 0
(indicating success), but a subsequent write does go to the end of
the file. It seems to me like unfriendly behavior.

It's friendly in the right context. It combines seek-to-the-end and
write-something into an atomic operation. Doing them separately causes
a race among multiple processes appending to the same file. It's made
less useful by the fact that writes at the stdio layer aren't
necessarily atomic themselves.

fopen modes are half-assed imitations of the UNIX open() flags, which
let creation, truncation, and append mode be specified separately, for
8 different combinations, only some of which are achievable with
fopen. Creation with neither truncation nor append mode is one of the
missing ones.
 
Keith Thompson

It's friendly in the right context. It combines seek-to-the-end and
write-something into an atomic operation. Doing them separately causes
a race among multiple processes appending to the same file. It's made
less useful by the fact that writes at the stdio layer aren't
necessarily atomic themselves.

fopen() doesn't write anything to the file; that requires a separate
write operation (fwrite(), whatever).

So if I open the file in append mode and then start writing to it, my
data is appended to the file. That's just how it should work.

But if I open the file in append mode, calls to fseek() appear to do
nothing, not even return an error indication.

Here's what happens in my small test program (I'll post it later if
there's sufficient interest). This is under Ubuntu.

Create a file and write 292 bytes of data to it, then close the file.
Open the file again in "ab+" mode.
ftell() returns 0.
Write some data to (the end of) the file.
ftell() now returns 305.
fseek to position 0.
ftell() now returns 0.
Write 3 bytes of data.
Now ftell() returns 308, and the new data is appended to the end of the
file.

I could understand it if fseek() and ftell() just treated the initial
end of the file as position 0 and only let you maneuver within the new
portion of the file, but the behavior isn't even consistent.
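
For anyone who wants to reproduce this, a minimal sketch of such a test follows. It is not the program described above: the function name append_ignores_seek and the ten-byte starting contents are made up for illustration, and it only checks where the write lands, not what ftell() reports.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical test helper. Returns 1 if a write issued after
   fseek(f, 0, SEEK_SET) on an "ab+" stream still landed at the end of the
   file, 0 if it overwrote the start, and -1 on any I/O error. */
int append_ignores_seek(const char *path)
{
    assert(path != NULL);

    /* Start from known contents: exactly ten bytes. */
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    if (fputs("0123456789", f) == EOF) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;

    /* Reopen in append/update mode, seek to the start, then write. */
    f = fopen(path, "ab+");
    if (f == NULL) return -1;
    if (fseek(f, 0, SEEK_SET) != 0) { fclose(f); return -1; }
    if (fputs("XYZ", f) == EOF) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;

    /* Read the whole file back and see where "XYZ" ended up. */
    f = fopen(path, "rb");
    if (f == NULL) return -1;
    char buf[16];
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    if (n == 13 && buf[0] == '0' && buf[10] == 'X')
        return 1;   /* file is "0123456789XYZ": the seek did not move the write */
    return 0;
}
```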
 
Alan Curry

fopen() doesn't write anything to the file; that requires a separate
write operation (fwrite(), whatever).

That doesn't contradict (or usefully enhance) anything I wrote.

stdio doesn't have atomic writes.

So if I open the file in append mode and then start writing to it, my
data is appended to the file. That's just how it should work.

But if I open the file in append mode, calls to fseek() appear to do
nothing, not even return an error indication.

That's because append mode is meant for append ONLY. Not for overwriting. It
exists to provide protection against accidental overwriting, by making
overwriting impossible. If you're seeking an append-mode file, you're missing
the point.

You can combine appending with reading, and seeks are effective for the read
operations, which might be useful on rare occasions.

This was the most important piece of my post. fopen modes are a mess, and are
best understood by memorizing the mapping from mode string to O_* flags. The
O_* flags make sense. Even non-unix implementations are effectively required
to imitate the unix O_* flags.

And when you need one of those combinations that doesn't have a corresponding
fopen mode string, fdopen(open(...)) is the only way to get it.
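
A minimal sketch of that fdopen(open(...)) combination for the original question, assuming a POSIX system. The helper name open_rw_create and the 0666 permission bits are illustrative choices, not something taken from this thread:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): open `path` for random-access reads and
   writes, creating it if missing and leaving existing contents alone. */
FILE *open_rw_create(const char *path)
{
    assert(path != NULL);

    /* O_CREAT with neither O_TRUNC nor O_APPEND: the combination that has
       no fopen mode string. The create-or-open decision is one system call. */
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1)
        return NULL;

    /* Wrap the descriptor in a stdio stream; "r+b" matches O_RDWR. */
    FILE *f = fdopen(fd, "r+b");
    if (f == NULL)
        close(fd);   /* fdopen failed, so the descriptor is still ours to close */
    return f;
}
```

The returned stream behaves like one opened with "rb+", so fseek() followed by a write overwrites in place.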
 
Keith Thompson

That doesn't contradict (or usefully enhance) anything I wrote.

stdio doesn't have atomic writes.

Then I must have misunderstood what you wrote:

    It combines seek-to-the-end and write-something into an atomic
    operation.

What was this supposed to refer to?

That's because append mode is meant for append ONLY. Not for overwriting. It
exists to provide protection against accidental overwriting, by making
overwriting impossible. If you're seeking an append-mode file, you're missing
the point.

You can combine appending with reading, and seeks are effective for the read
operations, which might be useful on rare occasions.

I think your description is valid for "ab" mode:

    append; open or create binary file for writing at end-of-file

I'm talking about "ab+" mode:

    append; open or create binary file for update, writing at
    end-of-file

So if I've opened a file with fopen("foo.dat", "ab+") and then written
some data to it, what should fseek(f, 0, SEEK_SET) do? If you think
it's just a bad idea, that's fine, but then shouldn't it return an error
indication?

This was the most important piece of my post. fopen modes are a mess, and are
best understood by memorizing the mapping from mode string to O_* flags. The
O_* flags make sense. Even non-unix implementations are effectively required
to imitate the unix O_* flags.

And when you need one of those combinations that doesn't have a corresponding
fopen mode string, fdopen(open(...)) is the only way to get it.

Assuming your system supports fdopen and open, of course.
 
Seebs

So if I've opened a file with fopen("foo.dat", "ab+") and then written
some data to it, what should fseek(f, 0, SEEK_SET) do? If you think
it's just a bad idea, that's fine, but then shouldn't it return an error
indication?

No, because you could validly *READ* at the beginning of the file.

-s
 
Alan Curry

Then I must have misunderstood what you wrote:

    It combines seek-to-the-end and write-something into an atomic
    operation.

OK, that could have been interpreted the wrong way... what I meant was it
implicitly adds a seek before every write. And the only reason we'd ever want
an implicit seek instead of an explicit seek is that the implicit seek
combines with the following write atomically, but the explicit seek+write
could go like this:

[Start with empty file "foo"]
Process 1                           Process 2
---------                           ---------
fp=fopen("foo", "w")
                                    fp=fopen("foo", "w")
fseek(fp, 0, SEEK_END)
/* now at end of file, which is
   byte 0, since it's empty */
                                    fseek(fp, 0, SEEK_END)
                                    /* now at end of file, which is
                                       byte 0, since it's empty */
fputs("I am process 1\n", fp)
/* written at offset 0 */
                                    fputs("I am process 2\n", fp)
                                    /* written at offset 0,
                                       clobbering "I am process 1" */

With append-mode, it can only do this:

[Start with empty file "foo"]
Process 1                           Process 2
---------                           ---------
fp=fopen("foo", "a")
                                    fp=fopen("foo", "a")
fputs("I am process 1\n", fp)
/* implicitly seeks to EOF, which
   is byte 0 since it's empty.
   writes at offset 0 */
                                    fputs("I am process 2\n", fp)
                                    /* implicitly seeks to EOF, which is
                                       now after the "I am process 1" line,
                                       and writes there */

Or the inverse, with the "I am process 2" line first. Whichever way it
goes, both lines will be in the file.

It's a primitive corruption-preventing measure that doesn't always work,
since writes aren't atomic. To give it a chance of working most of the
time, you have to turn off stdio buffering for the stream, or make it
line-buffered, or fflush after every independent log entry. That still
leaves the problem of big entries being split into multiple writes at a
lower level.
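
A minimal sketch of the "fflush after every independent log entry" option. The helper names open_log and log_entry are made up for illustration, and the sketch assumes each entry is small enough to fit in the stream's buffer so that it reaches the OS as a single write:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a log file in append mode, so every write is
   forced to the current end-of-file. */
FILE *open_log(const char *path)
{
    assert(path != NULL);
    return fopen(path, "a");
}

/* Hypothetical helper: write one complete entry and push it out immediately,
   so the stdio buffer never holds half an entry when another writer appends.
   Returns 0 on success, -1 on error. */
int log_entry(FILE *log, const char *msg)
{
    assert(log != NULL && msg != NULL);

    if (fprintf(log, "%s\n", msg) < 0)
        return -1;
    return fflush(log) == 0 ? 0 : -1;
}
```

Two writers that each call log_entry() on their own append-mode stream should both end up with their lines in the file, in whichever order the flushes happen.
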
I'm talking about "ab+" mode:

    append; open or create binary file for update, writing at
    end-of-file

Maybe you're reading too much into the word "update". It doesn't mean
you can overwrite existing contents.

So if I've opened a file with fopen("foo.dat", "ab+") and then written
some data to it, what should fseek(f, 0, SEEK_SET) do? If you think
it's just a bad idea, that's fine, but then shouldn't it return an error
indication?

If the next operation is a read, you'll read from the beginning of the
file. If the next operation is a write, append mode takes effect. I'm
not sure that's what it *should* do, in the sense of "if you could go
back to 1972 and start fresh, what would you do" - maybe O_RDWR|O_APPEND
(and therefore "a+") should have been disallowed all along. Fail the
open, fail the fopen, errno=EINVAL. But then...

    UNIX was never designed to keep people from doing stupid things,
    because that policy would also keep them from doing clever
    things. --Doug Gwyn
 
lawrence.jones

Keith Thompson said:
It looks like "ab+" will do the trick:

    a+b or ab+ append; open or create binary file for update, writing at
                       end-of-file

You can then fseek() to wherever you like.

You can, but it doesn't do any good:

    Opening a file with append mode ('a' as the first character in
    the mode argument) causes all subsequent writes to the file to
    be forced to the then current end-of-file, regardless of
    intervening calls to the fseek function.
 
Keith Thompson

OK, that could have been interpreted the wrong way... what I meant was it
implicitly adds a seek before every write. And the only reason we'd ever want
an implicit seek instead of an explicit seek is that the implicit seek
combines with the following write atomically, but the explicit seek+write
could go like this:
[snip]

And 7.19.7.3, describing fputc() (which I should have checked earlier)
says:

    If the file cannot support positioning requests, or if the stream
    was opened with append mode, the character is appended to the output
    stream.

So if you open a file in append mode, all writes to that file
are appended to the file.

[...]
Maybe you're reading too much into the word "update". It doesn't mean
you can overwrite existing contents.

Well, it does for "r+" mode (open text file for update (reading and
writing)).

If the next operation is a read, you'll read from the beginning of the
file. If the next operation is a write, append mode takes effect. I'm
not sure that's what it *should* do, in the sense of "if you could go
back to 1972 and start fresh, what would you do" - maybe O_RDWR|O_APPEND
(and therefore "a+") should have been disallowed all along. Fail the
open, fail the fopen, errno=EINVAL. But then...

    UNIX was never designed to keep people from doing stupid things,
    because that policy would also keep them from doing clever
    things. --Doug Gwyn

Ok. But in this case, C (as opposed to UNIX) seems to be preventing you
from doing something that might be useful. The OP wanted to open a
binary file if it already exists, or create it if it doesn't, and then
possibly modify the existing data. That seems like a perfectly
reasonable thing to want to do.

And I think there's a way to do it; please shoot holes in this.

    f = fopen("tmp.dat", "ab+");
    freopen(NULL, "rb+", f);

One problem with freopen() (C99 7.19.5.4p3):

    It is implementation-defined which changes of mode are permitted
    (if any), and under what circumstances.

But at least if it doesn't work, it should give you an error indication.
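
Spelled out with the error checks, the idea looks something like the sketch below. The helper name open_via_freopen is made up for illustration, and whether the mode change is permitted at all is implementation-defined, as quoted above:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: create-or-open with "ab+", then ask the library to
   reopen the same stream in "rb+" mode. Returns NULL if either step fails. */
FILE *open_via_freopen(const char *path)
{
    assert(path != NULL);

    /* "ab+" creates the file if it is missing and never truncates it. */
    FILE *f = fopen(path, "ab+");
    if (f == NULL)
        return NULL;

    /* A null filename (C99) requests a mode change on the existing stream.
       If this fails, the old stream must not be used again, so the only
       thing to hand back is the NULL result. */
    return freopen(NULL, "rb+", f);
}
```

A NULL return does not say which step failed. The file may already have been created by the first step even if the mode change was refused.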
 
Keith Thompson

You can, but it doesn't do any good:

    Opening a file with append mode ('a' as the first character in
    the mode argument) causes all subsequent writes to the file to
    be forced to the then current end-of-file, regardless of
    intervening calls to the fseek function.

Some day I'll learn to keep reading after I find what I think I'm
looking for.
 
Alan Curry

Ok. But in this case, C (as opposed to UNIX) seems to be preventing you
from doing something that might be useful. The OP wanted to open a
binary file if it already exists, or create it if it doesn't, and then
possibly modify the existing data. That seems like a perfectly
reasonable thing to want to do.

Yeah, the open() interface, which allows the mixing of flags arbitrarily,
opens the way for stupid things and clever things... fopen, by limiting to a
few combinations, is cutting them off. And I assume both of those interfaces
go way back before there was any dividing line between UNIX and C.

And I think there's a way to do it; please shoot holes in this.

    f = fopen("tmp.dat", "ab+");
    freopen(NULL, "rb+", f);

That's clever. It might even work. In fact... it did work. I ran a successful
test using glibc 2.7, and it does everything we want it to. The glibc
implementation is clumsy, though: it reopens the file through /proc/self/fd/.

The underlying operation to get out of append mode is pretty simple:

    fcntl(fileno(f), F_SETFL, fcntl(fileno(f), F_GETFL) & ~O_APPEND);

and your freopen could reasonably be interpreted as a request for exactly
that operation, even though there's no specification that explicitly says
so.
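
The same one-liner with its return values checked, as a sketch for POSIX systems. The helper name clear_append_flag is made up for illustration. It only changes the descriptor's flags; whether a given stdio implementation then honors seeks before writes on a stream it opened in append mode is an assumption that would need testing on each library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper (POSIX only): clear O_APPEND on the descriptor behind
   a stdio stream. Returns 0 on success, -1 on error. */
int clear_append_flag(FILE *f)
{
    assert(f != NULL);

    /* Flush first, so data buffered while in append mode is still appended. */
    if (fflush(f) != 0)
        return -1;

    int fd = fileno(f);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;

    /* Write back the same status flags with only O_APPEND removed. */
    return fcntl(fd, F_SETFL, flags & ~O_APPEND) == -1 ? -1 : 0;
}
```
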
One problem with freopen() (C99 7.19.5.4p3):

    It is implementation-defined which changes of mode are permitted
    (if any), and under what circumstances.

But at least if it doesn't work, it should give you an error indication.

I never saw an freopen with NULL first argument before, so I did a little
survey of a few other libc's to see how well supported it might be. The
results:

OpenBSD[1]
The code looks like it doesn't support the freopen(NULL, ...) although I
might be missing something.

NetBSD[2]
This one actually starts off with _DIAGASSERT(file != NULL); which I don't
think is a good sign.

OpenSolaris[3]
This one definitely supports all feasible mode changes using fcntl.

FreeBSD[4]
This one also does the right thing with fcntl.

Test results from someone with an OpenBSD or NetBSD installation would be
nice to have.

[1] http://www.openbsd.org/cgi-bin/cvsw...13;content-type=text/plain;only_with_tag=MAIN
[2] http://cvsweb.netbsd.org/bsdweb.cgi...16&content-type=text/plain&only_with_tag=MAIN
[3] http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/stdio/fopen.c
[4] http://www.freebsd.org/cgi/cvsweb.c...en.c?rev=1.21.2.1.4.1;content-type=text/plain
 
Keith Thompson

I never saw an freopen with NULL first argument before, so I did a little
survey of a few other libc's to see how well supported it might be. The
results:

OpenBSD[1]
The code looks like it doesn't support the freopen(NULL, ...) although I
might be missing something.

NetBSD[2]
This one actually starts off with _DIAGASSERT(file != NULL); which I don't
think is a good sign.

OpenSolaris[3]
This one definitely supports all feasible mode changes using fcntl.

FreeBSD[4]
This one also does the right thing with fcntl.

Test results from someone with an OpenBSD or NetBSD installation would be
nice to have.

Interesting. The ability to use a NULL first argument is new in C99;
C90 just refers to "the string pointed to by filename".
 
Morris Keesan

I have the need to open an arbitrary file and perform random access
reads, writes and seeks. The file may or may not exist. If it doesn't
exists, it needs to be created. I'm having a hard time trying to
figure out which mode the file should be opened in. None of them seem
like a good fit.

Rather than knocking yourself out trying to do this with Standard C,
the best thing to do here may be to take advantage of non-C-standard
facilities offered by your operating system, e.g. POSIX calls, as
someone else suggested. To keep your program more portable, I suggest
isolating these OS dependencies in one or more separate source files,
whose interface you can recreate if you need to port from one OS to
another, e.g. from Linux to Windows.
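
A sketch of what such an isolated source file might look like. The function name file_open_update is made up for illustration, and the Windows branch is an untested assumption based on the Microsoft C runtime's _open and _fdopen; only the POSIX branch follows what was discussed in this thread.

```c
#if !defined(_WIN32)
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#if defined(_WIN32)
#include <io.h>
#include <sys/stat.h>
#else
#include <unistd.h>
#endif

/* Hypothetical portability wrapper: callers see only standard C (a FILE *),
   while the OS-specific create-without-truncate logic stays in this file. */
FILE *file_open_update(const char *path)
{
    assert(path != NULL);
    FILE *f;

#if defined(_WIN32)
    /* Assumed Windows CRT equivalent; not tested in this thread. */
    int fd = _open(path, _O_RDWR | _O_CREAT | _O_BINARY, _S_IREAD | _S_IWRITE);
    if (fd == -1)
        return NULL;
    f = _fdopen(fd, "r+b");
    if (f == NULL)
        _close(fd);
#else
    /* POSIX: create if missing, never truncate, no append mode. */
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd == -1)
        return NULL;
    f = fdopen(fd, "r+b");
    if (f == NULL)
        close(fd);
#endif
    return f;
}
```

Porting to another OS then means adding one more branch here, with no changes in the code that reads, writes, and seeks.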
 
