Joe Wright said:
OK, vippstar, whoever you are. You weren't listening. Among the three
systems (Apple, PC and Unix) there are three line endings: Apple used a
single CR, Unix a single LF, and the PC two bytes, CRLF.
Jacob said "There is no portable way to know what line separator the
system uses". Chuck said "Just open the file in text mode
(i.e. without the "rb") and wait until you detect a '\n' in the
stream".
My response to Chuck was that Hell will freeze over before a single CR
in a text stream is converted to LF for me to see (on my system
here).
Agreed, what Chuck suggested will not reliably detect what line
terminator a system uses. You might never see a '\n' when reading a
text file in binary mode (if the system doesn't use '\n' as its line
terminator, or as part of it).
If you write a small file in text mode, then read the same file in
binary mode, then you *might* be able to determine how the system
represents line endings. At least it should work for Unix (LF), MacOS
<= 9 (CR), and DOS/Windows (CRLF). But there are stranger systems
than any of those out there, for example, some that don't use a
character sequence to terminate a line.
But the point is (or should be) that most of the time *it doesn't
matter* how the system represents line endings. If you want to read a
text file, use text mode and let the implementation take care of it
for you. If you want to read a "foreign" text file, you need to know
how it's represented; you might be able to get away with guessing, but
it's better to know by other means. (How do you tell the difference
between a Windows text file and a Unix text file where each line
happens to end with a carriage-return?)
Do all of us the favor of assuming we know LF is '\n' and CR is
'\r'. Now that you understand my reply to Chuck, please feel free to
comment on it.
LF isn't *necessarily* '\n'. '\n', or new-line, is a character that
the C implementation uses internally to represent a line terminator.
Whatever representation the OS uses is translated to '\n' on text-mode
input. On old MacOS, for example, it would have been sensible for
'\n' to be the ASCII CR character; I don't know whether it was
actually done that way.
It may be the case that every non-EBCDIC implementation uses the ASCII
LF character for '\n', but the standard doesn't guarantee this.
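A small C sketch makes the distinction visible. It prints the values this implementation gives '\n' and '\r' and checks whether '\n' has the value 0x0A. The ASCII_LF/ASCII_CR macros and both function names are illustrative assumptions. The answer is only meaningful relative to ASCII if you already know the host character set is ASCII-based, which is exactly what the standard does not promise.

```c
#include <stdio.h>

/* ASCII codes; an assumption about the host character set, not a C rule. */
#define ASCII_LF 0x0A
#define ASCII_CR 0x0D

/* Nonzero if this implementation's '\n' has the ASCII LF value. */
int newline_is_ascii_lf(void)
{
    return '\n' == ASCII_LF;
}

/* Print the numeric values the implementation uses for '\n' and '\r'. */
void show_newline_values(void)
{
    printf("'\\n' = 0x%02X, '\\r' = 0x%02X\n", (unsigned)'\n', (unsigned)'\r');
    printf("'\\n' %s ASCII LF\n", newline_is_ascii_lf() ? "is" : "is not");
    printf("'\\r' %s ASCII CR\n", '\r' == ASCII_CR ? "is" : "is not");
}
```

On typical Unix and Windows compilers this reports 0x0A and 0x0D. Even on Windows, '\n' is a single 0x0A inside the program; the CRLF pair exists only in the file, produced by text-mode translation on output.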
[...]