newlines in text files


rihad

Hi, I have this problem: when reading a M$ Windows text file under Windows in
text mode (fopen("blah", "r")), the Windows newline sequence \r\n is returned as
is, i.e. it's not replaced by a single '\n'. Is this the correct behaviour? But
doesn't it make writing portable programs a bit harder? It would be nice if
whatever-newline-sequence-the-platform-has were replaced by a single \n, i.e.
the "C platform" line terminator, just as ints are always 32 bits in Java
regardless of what platform lies under the Java platform.

If that is indeed the case, how is my program supposed to know which line
terminator is being used?
 

Richard Heathfield

rihad said:
Hi, I have this problem: when reading a M$ Windows text file under Windows
in text mode (fopen("blah", "r")) the Windows newline sequence, \r\n is
returned as is, i.e. it's not replaced by a single '\n'. Is this the
correct behaviour?

No. Check your text file using a hex-capable file browser (e.g. LIST.COM if
you have it). If you don't have a hex-capable file browser, here's one:

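    #include <stdio.h>

    int main(int argc, char **argv)
    {
      FILE *fp = NULL;
      int ch = 0;
      int i = 0;

      if(argc < 2)
      {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
      }
      fp = fopen(argv[1], "rb"); /* binary mode: bytes exactly as stored */
      if(fp != NULL)
      {
        while((ch = getc(fp)) != EOF)
        {
          printf(" %02X", ch); /* 0D is \r, 0A is \n */
          i++;
          if(16 == i) /* 16 bytes per output line */
          {
            putchar('\n');
            i = 0;
          }
        }
        fclose(fp);
      }
      putchar('\n');
      return 0;
    }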

But doesn't it make writing portable programs a bit
harder? It would be nice if whatever-newline-sequence-the-platform-has
were replaced by a single \n, i.e. the "C platform" line terminator.

It would be nice, and indeed it /is/ nice. Check your input file, and check
your fopen call. It's far more likely that your data or code is broken than
that your compiler is broken.
 

CBFalconer

rihad said:
Hi, I have this problem: when reading a M$ Windows text file under
Windows in text mode (fopen("blah", "r")) the Windows newline
sequence, \r\n is returned as is, i.e. it's not replaced by a
single '\n'. Is this the correct behaviour? But doesn't it make
writing portable programs a bit harder? It would be nice if
whatever-newline-sequence-the-platform-has were replaced by a
single \n, i.e. the "C platform" line terminator, just as ints
are always 32 bits in Java regardless of what platform lies
under the Java platform.

In case it is indeed like that, how is my program supposed to
know which line terminator is being used?

There is something wrong with your installation and/or library.
The only way you should see both the \r and the \n is if you open
the file in binary mode. Similarly, writing a \n in text mode
should produce the \r\n sequence.

However, you may be operating under Cygwin or something similar,
which attempts to create a complete Li/U-nix environment under
Windoze. I don't know just what provisions they make.

At any rate this is not a C language problem, and should be dealt
with in a newsgroup dedicated to your compiler/system.
 

rihad

However, you may be operating under Cygwin or something similar,
which attempts to create a complete Li/U-nix environment under
Windoze. I don't know just what provisions they make.

You're right! I recompiled the program with mingw's gcc and the problem went
away. I'll have to ask why Cygwin worked that way in a different newsgroup.
 

Micah Cowan

rihad said:
You're right! I recompiled the program with mingw's gcc and the problem went
away. I'll have to ask why Cygwin worked that way in a different newsgroup.

Because you probably told it to (IIRC, you are asked to specify
whether you want Cygwin to use Windows or Unix line-endings).

Cygwin is a different implementation from the implementation that
created the text file in question, so all bets are off. Cygwin
attempts to emulate a UNIX-like operating system, whereas if you
used Notepad or somesuch, you wrote the text file from within a
Windows operating system.

-Micah
 

rihad

Cygwin is a different implementation from the implementation that
created the text file in question, so all bets are off. Cygwin
attempts to emulate a UNIX-like operating system, whereas if you
used Notepad or somesuch, you wrote the text file from within a
Windows operating system.

Then because of C's Unix heritage, can it be said that \n text files are the
most portable?
 

CBFalconer

rihad said:
Then because of C's Unix heritage, can it be said that \n text
files are the most portable?

No. There are three common flavors of text files, distinguished by
their line termination sequences:

  <crlf>  \r\n  CP/M, MsDos, Windoze, others
  <lf>    \n    Linux, Unix, etc.
  <cr>    \r    classic (pre-OS X) Macintosh, possibly other Apples

and filters are available on most systems to convert between the
standards. Bear in mind that many other systems do not even have
a line termination sequence - they use other means, such as a
count of chars in a fixed-length record, or whatever.

On MsDos/Windoze you can easily see the differences with hex-capable
tools such as Buerg's LIST. TextPad can generate files in, and convert
between, at least the Linux and DOS conventions, but can't switch
between hex and character displays.

As far as your C program is concerned, you end lines with a \n.
Nothing else enters into it. The i/o libraries will handle the
rest in a manner suitable for your system.
 

Micah Cowan

rihad said:
Then because of C's Unix heritage, can it be said that \n text files are the
most portable?

No, it cannot. What can be said is that all text files must be
represented as having "\n" line-endings to any C program that
opens them as a text file. However, what constitutes a "text
file" is defined by the particular implementation. In this case,
the text files made in Notepad do not fit Cygwin's idea of a
"text file".

-Micah
 
