Ben Morrow said:
You are certainly right for a large (and important) class of end-users.
Programs aimed at such people should, as you say, simply give a message
saying 'there was a problem' plus some way to send someone competent a
logfile with as much detailed information as possible.
That wasn't what I was saying. I intended to argue in favor of
omitting the \n because an event which causes a program to die is
usually one some kind of developer will need to deal with, and hence
including information in the diagnostic which is only relevant to a
developer makes sense.
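In Perl terms, the two alternatives look like this (a sketch; $path and
the example file names are made up):

    # with a trailing \n the message is printed as-is:
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    # cannot open data.txt: No such file or directory

    # without it, perl appends " at FILE line LINE.", ie, exactly the
    # location information a developer needs:
    open my $fh, '<', $path or die "cannot open $path: $!";
    # cannot open data.txt: No such file or directory at script.pl line 7.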
[...]
Look at the error messages given by the standard Unix utilities. They
(usually) do include filenames and strerror messages, but very rarely
include __FILE__ and __LINE__ information.
In C code, I always include the name of the function where the error
occurred. Granted, the default 'sudden death message suffix' is more
verbose than I consider necessary; OTOH, it contains enough
information to locate the operation that failed in the code, and it is
already built in.
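The Perl analogue of that habit would be something along these lines (a
sketch; the sub and file names are made up, (caller(0))[3] yields the
name of the currently running sub):

    sub read_header {
        my ($path) = @_;
        open my $fh, '<', $path
            or die((caller(0))[3] . ": cannot open $path: $!");
        return scalar <$fh>;
    }
    # on failure this dies with something like
    # main::read_header: cannot open hdr.dat: No such file or directory at script.pl line 5.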
[...]
You don't need to go anywhere near that far. PerlIO (and stdio in C)
buffers writes, so when you call close (that is, the equivalent of
fclose(3), not close(2)) there is still a partial bufferful of data
which hasn't been sent to the kernel yet. If that write(2) gives an
error, that error will be returned from close.
So, more generally put: Using hidden buffering mechanisms is risky in
the sense that they decouple the operation which provides the data from
the operation which moves the data to its destination. As side effects
of that, the time window during which data loss could happen becomes
much larger, and the application can no longer determine where in the
produced 'output data stream' the problem actually occurred (and try
to recover from that). The justification for hidden buffering is
'usually it works, and while it works, it usually improves the
performance of the system significantly'.
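A sketch of what that decoupling looks like in practice (the path is
made up and assumed to point to a file system with no free space):

    open my $fh, '>', '/full/out.dat' or die "cannot open /full/out.dat: $!";
    print {$fh} "important data\n"
        or warn "print failed: $!";   # usually 'succeeds': the data only went into the buffer
    # ... arbitrarily much later ...
    close $fh
        or die "close failed: $!";    # the ENOSPC from the deferred write(2) may only show up here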
'Reliable I/O' or even just 'reliably reporting I/O errors' and
'hidden buffering' can't go together and ...
Additionally, close will return an error if *any* of the previous
writes failed (unless you call IO::Handle->clearerr), so checking
the return value of close means you don't need to check the return
value of print.
... checking the return value of close is just a fig leaf covering the
problem: If everything worked, it is redundant, and when something
didn't work, reporting that "something didn't work" (but I know neither
what nor when and can't do anything to repair it) is not helpful.
It is. Always. It's better to *know* something failed, and have some
idea as to why, than to think it succeeded when it didn't.
Let's assume that somebody runs a long-running data processing task
that is supposed to generate a lot of important output. On day three, the disk
is full but the program doesn't notice that. On day fifteen, after the
task has completed, it checks the return value of close and prints
"The disk became full! You lost!".
Do you really think a user will appreciate that?
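Checking each operation when it is performed would at least move the
report to day three; a sketch ($outfile and @records are made up):

    use IO::Handle;

    open my $out, '>', $outfile or die "cannot open $outfile: $!";
    $out->autoflush(1);               # defeat the hidden buffering
    for my $record (@records) {
        print {$out} $record
            or die "writing to $outfile failed: $!";   # reported on day three, not on day fifteen
    }
    close $out or die "closing $outfile failed: $!";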
Yes, if you have strict integrity requirements simply checking close
isn't enough. It does guard against the two most common causes of error,
though: disk full (or quota exceeded) and network failure.
Or so you hope: In the case of a regular file residing on some local
persistent storage medium, a write system call encountering a 'too
little free space to write that' problem is supposed to write as many
bytes as can be written and to return this count to the caller. This
means that a final write performed by close will not return ENOSPC but
will silently truncate the file. Since close doesn't return a count,
there's no way to detect that. And there are - of course - people who
think "Well, 0 is as good a number as any other number, so, in the case
that write really can't write anything, it is supposed to write as many
bytes as it can, namely, 0 bytes, and report this as a successful write
of size 0 to the caller" (I'm not making this up). Then, there are
'chamber of horrors' storage media where 'successfully written' doesn't
mean 'can be read back again' and 'can be read back now' doesn't mean
'can still be read back five minutes from now', aka 'flash ROM'. In the
case of network communication, a remote system is perfectly capable of
sending an acknowledgment for something and then dying a sudden death
before the acknowledged data have actually hit the persistent storage
medium, and so on.
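Coming back to the ENOSPC case: with unbuffered I/O, the partial write
is at least visible to the caller. Roughly (a sketch; $fh and $data are
assumed, $fh is used with syswrite only, and syswrite maps more or less
directly to write(2)):

    my $off = 0;
    while ($off < length $data) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        die "write failed at offset $off: $!" unless defined $n;
        die "short write at offset $off (out of space?)" if $n == 0;
        $off += $n;
    }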
E.g., so far, I have written one program supposed to work reliably
despite running in Germany and writing to an NFS-mounted file system
residing in the UK, and this program not only checks the return value
of each I/O operation but additionally checks that the state of the
remote file system is what it was supposed to be if the operation was
actually performed successfully, and retries everything (with
exponential backoff) until success or administrative intervention ---
network programming remains network programming, no matter if the file
system API is used for high-level convenience.
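In outline, the structure was roughly the following (a sketch with
made-up names; write_chunk and chunk_is_on_server stand in for the
actual write and the subsequent check of the remote state):

    my $delay = 1;
    until (write_chunk($path, $chunk) and chunk_is_on_server($path, $chunk)) {
        warn "writing to $path failed or could not be verified: $!";
        sleep $delay;
        $delay *= 2;                  # exponential backoff
        $delay = 300 if $delay > 300; # capped at some administrative maximum
    }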