fgetc - end of line - where is 0xD?

Zero

Hi there,

I have the following file:

--------------------
Hello
world
--------------------

When I open this file in binary code,
the end of the first line is 0xD 0xA.

When I read this file with fgetc like
while( (c = fgetc(pFilePointer)) != '-1')
{
printf("\n%d", c);
}

I only get the 0xA at the end of line, not 0xD.

Does anybody know what happens?

Zeh Mau
 
Richard Tobin

Zero said:
When I open this file in binary code,
the end of the first line is 0xD 0xA.

You're probably using a Microsoft operating system. If you were using
Unix, you'd see a single 0xA byte (a linefeed character). If you were
using an old Mac operating system, you'd see a single 0xD byte (a
carriage return). If you were using some ancient mainframe system,
you'd see lots of nulls padding each line to 80 characters.

When I read this file with fgetc like
while( (c = fgetc(pFilePointer)) != '-1')
{
printf("\n%d", c);
}

I only get the 0xA at the end of line, not 0xD.

It would be inconvenient if you had to know how lines end on every
different operating system that your program might run on, so C
converts line ends to a single linefeed character.

If you want to see the actual bytes in the file, open it in binary mode.
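For example, a minimal sketch (untested; "inputdata" is just a
placeholder file name):

#include <stdio.h>

int main(void)
{
    int c;
    FILE *f = fopen("inputdata", "rb");  /* binary mode: no translation */
    if (f == NULL)
        return 1;
    while ((c = fgetc(f)) != EOF)
        printf("%02x\n", (unsigned) c);  /* on Windows you'll see 0d 0a */
    fclose(f);
    return 0;
}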

-- Richard
 
Zero

You're probably using a Microsoft operating system.

Yes, I am.

so C converts line ends to a single linefeed character.

Does it mean the 0xD is there, but the fgetc function simply ignores
it?
As I said, in binary code, both 0xD and 0xA are shown.

Zeh Mau
 
Sri Harsha Dandibhotla

Hi there,

I have the following file:

--------------------
Hello
world
--------------------

When I open this file in binary code,
the end of the first line is 0xD 0xA.

When I read this file with fgetc like
while( (c = fgetc(pFilePointer)) != '-1')
{
   printf("\n%d", c);
}

I only get the 0xA at the end of line, not 0xD.

Does anybody know what happens?

Zeh Mau


This is what I found on Wikipedia
(http://en.wikipedia.org/wiki/Newline#Newline_in_programming_languages).

See point 2:
When writing a file in text mode, '\n' is transparently translated to
the native newline sequence used by the system, which may be longer
than one character. (Note that a C implementation is allowed to not
store newline characters in files. For example, the lines of a text
file could be stored as rows of a SQL table or as fixed-length
records.) When reading in text mode, the native newline sequence is
translated back to '\n'. In binary mode, the second mode of I/O
supported by the C library, no translation is performed, and the
internal representation of any escape sequence is output directly.

The stored representation on Windows is \r\n, which binary mode shows
as-is instead of converting it to a single \n character.
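
For example (a hedged sketch; the file names are made up), writing the
same string through a text stream and a binary stream shows the
translation:

#include <stdio.h>

int main(void)
{
    FILE *t = fopen("text.out", "w");   /* text mode: '\n' -> CR LF on Windows */
    FILE *b = fopen("bin.out", "wb");   /* binary mode: '\n' stays 0x0A */
    if (t == NULL || b == NULL)
        return 1;
    fputs("Hello\n", t);
    fputs("Hello\n", b);
    fclose(t);
    fclose(b);
    /* On Windows, text.out ends up 7 bytes (Hello CR LF) and bin.out
       6 bytes (Hello LF); on Unix both are 6 bytes. */
    return 0;
}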
 
viza

Yes, I am.

Does it mean the 0xD is there, but the fgetc function simply ignores
it?
As I said, in binary code, both 0xD and 0xA are shown.

In practice, it ignores it when it comes at the end of a line (before a
0xA), and it shouldn't appear elsewhere. In theory, the input file on
disc is converted into an abstract series of lines, and then the lines
are separated by newline characters; in US-ASCII a newline is 0xA.
 
Martin Ambuhl

Zero wrote, asking a frequently asked question (FAQ) about end of lines

Using the two-line input file containing

Hello
world

we run the following. Notice the difference between reading in text
mode ("r"), which just sees that the end-of-line is marked in a
system-specific way, and binary mode ("rb"), which sees the actual
characters:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void)
{
    int c;
    FILE *f;
    const char fname[] = "inputdata";

    printf("Opening \"%s\" for input in text mode.\n", fname);
    if (!(f = fopen(fname, "r"))) {
        fputs("fopen failed. Quitting.\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("The characters read from the file are (in text mode):\n");
    while ((c = fgetc(f)) != EOF) {
        printf("%#04x %#05o %03d ", (unsigned) c, (unsigned) c, c);
        if (iscntrl(c))
            printf(" (a control character)\n");
        else if (isspace(c))
            printf(" (whitespace)\n");
        else if (!isgraph(c))
            printf(" (other non-graphic)\n");
        else
            printf("'%c'\n", c);
    }

    putchar('\n');

    printf("Reopening \"%s\" for input in binary mode.\n", fname);
    if (!(f = freopen(fname, "rb", f))) {
        fputs("freopen failed. Quitting.\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("The characters read from the file are (in binary mode):\n");
    while ((c = fgetc(f)) != EOF) {
        printf("%#04x %#05o %03d ", (unsigned) c, (unsigned) c, c);
        if (iscntrl(c))
            printf(" (a control character)\n");
        else if (isspace(c))
            printf(" (whitespace)\n");
        else if (!isgraph(c))
            printf(" (other non-graphic)\n");
        else
            printf("'%c'\n", c);
    }
    fclose(f);
    return 0;
}

[output on a Windows system]
Opening "inputdata" for input in text mode.
The characters read from the file are (in text mode):
0x48 00110 072 'H'
0x65 00145 101 'e'
0x6c 00154 108 'l'
0x6c 00154 108 'l'
0x6f 00157 111 'o'
0x0a 00012 010 (a control character)
0x57 00127 087 'W'
0x6f 00157 111 'o'
0x72 00162 114 'r'
0x6c 00154 108 'l'
0x64 00144 100 'd'
0x0a 00012 010 (a control character)

Reopening "inputdata" for input in binary mode.
The characters read from the file are (in binary mode):
0x48 00110 072 'H'
0x65 00145 101 'e'
0x6c 00154 108 'l'
0x6c 00154 108 'l'
0x6f 00157 111 'o'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)
0x57 00127 087 'W'
0x6f 00157 111 'o'
0x72 00162 114 'r'
0x6c 00154 108 'l'
0x64 00144 100 'd'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)
 
Sri Harsha Dandibhotla

Why are you comparing the result of fgetc with the multi-character literal
'-1'? I'm surprised that loop ever terminates. Actually, I imagine the
real answer is that the above code is NOT the actual code you ran.

He meant to test for -1 and not '-1'.
Though, he should rather test for EOF instead.
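
In other words, the loop should read (a corrected sketch of the code
from the original post):

int c;  /* int, not char, so that EOF can be distinguished from data */
while ((c = fgetc(pFilePointer)) != EOF)
{
    printf("\n%d", c);
}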

I have read that EOF doesn't always have the value of -1. Can someone
please list a few implementations where the value differs from -1?
Thanks
 
George

.... snip (Martin's program, quoted in full above) ...

Many of Martin's posts are short, error-free, legible programs one can copy
and adapt easily. I get the same output he does for a different data set
on the same platform:

Opening "george.txt" for input in text mode.
The characters read from the file are (in text mode):
0x31 00061 049 '1'
0x20 00040 032 (whitespace)
0x20 00040 032 (whitespace)
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x0a 00012 010 (a control character)
....
Reopening "george.txt" for input in binary mode.
The characters read from the file are (in binary mode):
0x31 00061 049 '1'
0x20 00040 032 (whitespace)
0x20 00040 032 (whitespace)
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x30 00060 048 '0'
0x31 00061 049 '1'
0x0d 00015 013 (a control character)
0x0a 00012 010 (a control character)

The tool that I have found very helpful for this type of work is od.exe
found here:

http://downloads.sourceforge.net/unxutils/UnxUtils.zip

Copy the .exe to a convenient directory and invoke it using the batch file
dump.bat. Dump.bat contains:

od -tx1 -Ax -v %1

-t == how to display data
x1 == one hex byte
-A == how to display address (offset from start of file)
x == hex
-v == show all data, including runs of duplicates

%1 == the first argument to the .bat file

For example, if I have a file "chars.dat", then the appropriate command is:

C:\Users\epc\temp>dump chars.dat
--
George

When you turn your heart and your life over to Christ, when you accept
Christ as the savior, it changes your heart.
George W. Bush

Picture of the Day http://apod.nasa.gov/apod/
 
CBFalconer

Sri said:
.... snip ...

I have read that EOF doesn't always have the value of -1.
Can someone please list a few implementations where the
value differs from -1?

No. That is why you should always use the macro EOF, which is
defined in the standard includes. See the C standard.

Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://c-faq.com/> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
<http://cbfalconer.home.att.net/download/n869_txt.bz2> (pre-C99)
<http://www.dinkumware.com/c99.aspx> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>
 
Harald van Dijk

No. That is why you should always use the macro EOF, which is defined
in the standard includes.

Non sequitur. If every implementation in the world defines EOF as -1,
there is little benefit in using the macro. If some implementation gives
it a different value, you have a definite need to use the macro for your
code to work.

I don't have an example of an implementation where EOF is anything other
than -1.
 
Keith Thompson

Sri Harsha Dandibhotla said:
He meant to test for -1 and not '-1'.
Though, he should rather test for EOF instead.

I have read that EOF doesn't always have the value of -1. Can someone
please list a few implementations where the value differs from -1?

I don't know of any, and it's entirely possible that there are no C
implementations where EOF has a value other than -1.

Nevertheless, you should never write -1 where EOF would be
appropriate. For one thing, your code could break if some future
implementation, or some present implementation I don't know about,
uses a value other than -1 for EOF (which would be perfectly legal).
For another, writing EOF rather than -1 makes your intent much clearer
to anyone reading your code.
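
For what it's worth, the standard only guarantees that EOF is an
integer constant expression of type int with a negative value, so a
quick (hedged) way to check your own implementation is:

#include <stdio.h>

int main(void)
{
    /* EOF must be negative, but need not be -1. */
    printf("On this implementation, EOF == %d\n", EOF);
    return 0;
}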
 
Keith Thompson

Harald van Dijk said:
Non sequitur. If every implementation in the world defines EOF as -1,
there is little benefit in using the macro. If some implementation gives
it a different value, you have a definite need to use the macro for your
code to work.

Yes, there is a benefit: clarity.

0 is a valid null pointer constant on every C implementation in the
world, but I still prefer to use NULL.
 
Keith Thompson

Jujitsu Lizard said:
We're talking Windows here. Unix ends lines with a 0xA only.

The safest approach (for portability given the universe of two
conventions) is probably to open every file in binary mode, then have
your code contain an automaton that treats 13-10 and 10 the same way.
I believe a common approach is to consider only the 10's.
[...]

No, the safest approach is to open text files in text mode, so you
don't have to worry about how line endings are represented. That's
what text mode is for.

(If you have to deal with text files in a format not native to the
operating system you're running on, that's a different matter. If
possible, the best approach is usually to convert such files to native
format.)
 
Harald van Dijk

To clarify, by "little" I did not mean "no", I meant "significantly
smaller".
Yes, there is a benefit: clarity.

if ((c = getchar()) == -1)

seems almost equally straightforward to me, given that all successful
results are nonnegative. If you include non-standard functions, there are
plenty more that return a fixed negative value to indicate an error.

I do agree that EOF is more readable, but I think it's a relatively small
point when compared to a concrete implementation where EOF != -1.
0 is a valid null pointer constant on every C implementation in the
world, but I still prefer to use NULL.

But I imagine you have no problems reading code by others that uses 0 to
initialise pointers. If so, here too the benefit is there, but it is not
great (to me).
 
Harald van Dijk

Keith Thompson said:
Harald van Dijk said:
[...] If every implementation in the world defines EOF as -1,
[...]
[...]
I conclude that either the ANSI C Committee were fruitcakes or there
really were portability concerns with -1.

Well, I don't know if there are, but according to K&R, there were. It
describes two common conventions: end of file is indicated by -1, or by 0.
The latter was later disallowed by ANSI C, and I have no idea if those
implementations that used it have been changed, and if so, what value for
EOF they have changed to.
 
Bartc

Keith said:
Jujitsu Lizard said:
We're talking Windows here. Unix ends lines with a 0xA only.

The safest approach (for portability given the universe of two
conventions) is probably to open every file in binary mode, then have
your code contain an automaton that treats 13-10 and 10 the same way.
I believe a common approach is to consider only the 10's.
[...]

No, the safest approach is to open text files in text mode, so you
don't have to worry about how line endings are represented. That's
what text mode is for.

(If you have to deal with text files in a format not native to the
operating system you're running on, that's a different matter. If
possible, the best approach is usually to convert such files to native
format.)

This is exactly the problem. C's text mode /assumes/ a native format, and
might go wrong on anything else. In that case you might as well work in
binary and sort out the CR/LF combinations yourself.
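
That sorting-out can be as small as a wrapper around fgetc; here is an
untested sketch that folds CR LF, bare CR, and LF into '\n':

#include <stdio.h>

/* Read one character from a binary-mode stream, normalising line ends. */
static int getc_norm(FILE *f)
{
    int c = fgetc(f);
    if (c == '\r') {
        int next = fgetc(f);
        if (next != '\n' && next != EOF)
            ungetc(next, f);  /* bare CR (old Mac): push the byte back */
        return '\n';
    }
    return c;
}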

(Possibly related: if I execute printf("Hello World\n") under Windows, and
redirect the output to a file, as in hello >output, I get CR CR LF at the
end. I've forgotten the reason for this; does anyone know why?)
 
Ben Bacarisse

Bartc said:
Keith said:
Jujitsu Lizard said:
When I open this file in binary code,
the end of the first line is 0xD 0xA.

We're talking Windows here. Unix ends lines with a 0xA only.

The safest approach (for portability given the universe of two
conventions) is probably to open every file in binary mode, then have
your code contain an automaton that treats 13-10 and 10 the same way.
I believe a common approach is to consider only the 10's.
[...]

No, the safest approach is to open text files in text mode, so you
don't have to worry about how line endings are represented. That's
what text mode is for.

(If you have to deal with text files in a format not native to the
operating system you're running on, that's a different matter. If
possible, the best approach is usually to convert such files to native
format.)

This is exactly the problem. C's text mode /assumes/ a native format,
and might go wrong on anything else. In that case you might as well
work in binary and sort out the CR/LF combinations yourself.

If you have to deal with files from various systems you simply have a
general program design problem. Every choice you make will involve a
set of compromises between convenience for you and your users and the
formats that your program can handle. There is very little general
advice one can give.

In the dark ages, this was less of a problem. There were so many
kinds of file that any software that moved data between systems had to
know what to do with them all. You could move a file from a
record-oriented EBCDIC machine to a Unix one and the right things
would be done. The problem you see is partly caused by the similarity
between formats, rather than the differences, and partly by the fact
that the data gets moved between systems without regard to the data's
"type".

(Possibly related: if I execute printf("Hello World\n") under Windows,
and redirect the output to a file, as in hello >output, I get CR CR LF
at the end. I've forgotten the reason for this; does anyone know why?)

Name and shame the compiler and (more likely "or") the library. It
helps to know what to avoid. I've not seen that behaviour and I would
want to avoid it as far as possible.
 
Bartc

Ben Bacarisse said:
Name and shame the compiler and (more likely "or") the library. It
helps to know what to avoid. I've not seen that behaviour and I would
want to avoid it as far as possible.

I've just remembered the reason: I was calling C's printf() from a language
that expanded "\n" to CR,LF in the string literal itself.

Because printf writes to stdout and stdout is in text mode, the LF results
in an extra expansion. But the CR,CR,LF is only seen when directed to a
file.

So not a C problem other than stdout being awkward to set to binary mode.
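
In C terms, the effect can be reproduced deliberately (a sketch):

#include <stdio.h>

int main(void)
{
    /* The literal already contains CR LF; text-mode stdout then
       expands the LF to CR LF again, so a redirected file on
       Windows ends with CR CR LF. */
    printf("Hello World\r\n");
    return 0;
}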
 
Guest

Keith Thompson said:
[...] If every implementation in the world defines EOF as -1,

[all: note the conditional - Harald is not making this claim, merely
reasoning about it. I don't know (or care) whether the claim is true.]
Yes, there is a benefit: clarity.

Yes. He said "little benefit", not "no benefit". If EOF did not exist (in
the sense that we know and love so well), it would hardly be necessary to
invent it unless there really were portability issues with -1.

I disagree. I think the clarity point is important.

The reason
you give is a tiny reason. Lots of Unix people hard-code -1s into their
code knowing full well that they will be understood as failure tests by
lots of other Unix people.

A bad idea, I think.
I conclude that either the ANSI C Committee were fruitcakes or there really
were portability concerns with -1.


Yes. It's a small thing, though - if NULL didn't exist and everybody used
0, we'd all know what it meant, right?

But again, it would be a bad idea. After all, we *know* that 0 will
work on all implementations, but many programmers (including me)
use the NULL macro.

The main reason to use macros like these is semantic clarity.
There are two lesser reasons (or beneficial side effects):

1. If a value changes, it can be changed in one place.
2. If a value changes, you don't have to worry that
a global substitution will change unexpected values.

#define MAX_BASE_STATIONS 9
#define RESET_COMMAND 9
#define HEADER_SIZE 9
#define SEEK_FIELD_OFFSET -1
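
(Three distinct macros happen to share the value 9 here; a careless
global substitution of 9 would change all of them at once, which is
exactly the hazard point 2 warns about.)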


--
Nick Keighley

"Initialize constants with DATA statements or INITIAL attributes;
initialize variables with excutable code."
Kernighan and Plauger "The Elements of Programming Style"
 
James Kuyper

Bartc wrote:
....
This is exactly the problem. C's text mode /assumes/ a native format,
and might go wrong on anything else. In that case you might as well work
in binary and sort out the CR/LF combinations yourself.

If there were only a few possible choices, that would make sense. But
what about, for instance, files from systems where end-of-line is
indicated by padding to a fixed block length with '\0'? That's just
one of several real-world options that involve neither CR nor LF.
 
