Text files

Kenneth Brody

Is the difference in the interpretation of \n the only difference?

Difference in what?

Oh, I see... You put some critical information in the subject
and failed to include it in the body of your message. The
relevant information is "text files".

So, I assume your question is "are there any differences between
text mode and binary mode, other than '\n' interpretation?"

There can be plenty of differences besides '\n' interpretation.

Consider that MS-DOS/Windows will probably treat '\x1a' as EOF
when reading in text mode. Consider also what constitutes "file
size", which may be calculated differently between text and
binary modes. Finally, consider that some O/Ses actually store
the data on disk differently based on file type.
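
A minimal sketch for observing the first two points, assuming a hosted C
implementation. The names count_chars and write_sample and the sample
contents are invented for this illustration. The sample is written in
binary mode so the bytes on disk are known exactly (17 of them), and is
then counted once per mode:

```c
#include <assert.h>
#include <stdio.h>

/* Write a fixed 17-byte sample in binary mode: two CR LF terminated lines,
 * one 0x1A byte, then a third CR LF terminated line.
 * Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    static const char sample[] = "one\r\ntwo\r\n\x1a" "tail\r\n";
    const size_t len = sizeof sample - 1;   /* exclude the terminating '\0' */

    assert(path != NULL);
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(sample, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}

/* Count the characters fgetc() delivers when `path` is opened with `mode`
 * ("r" for text mode, "rb" for binary mode).
 * Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    assert(path != NULL && mode != NULL);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Calling count_chars(path, "rb") should report all 17 bytes on any system.
On Unix-like systems count_chars(path, "r") should report the same number;
on MS-DOS/Windows implementations the text-mode count is typically smaller,
because CR LF pairs collapse to '\n' and reading usually stops at the 0x1A.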

I'm sure others can tell you other differences which I may have
either forgotten or am not aware of.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
santosh

Difference in what?

Oh, I see... You put some critical information in the subject
and failed to include it in the body of your message. The
relevant information is "text files".

So, I assume your question is "are there any differences between
text mode and binary mode, other than '\n' interpretation?"

I rather think the OP meant to ask whether there were any other
differences apart from end-of-line interpretation, between different
text files on different systems.

To the OP:

As far as text files of different systems are concerned, they are
meant to be portable, and to a Standard C program, there should be
no difference between text files created by it and those created by
other implementations.

Under the hood, there may well be many more differences than just the
end-of-line one, but they're all supposed to be well hidden by the
operating system and the C library.
 
Malcolm McLean

Is the difference in the interpretation of \n the only difference?
No. Text files often have a control Z as end of file marker. However on most
systems the text and binary formats are in fact identical.
 
Flash Gordon

Malcolm McLean wrote, On 19/07/07 20:59:
No. Text files often have a control Z as end of file marker. However on
most systems the text and binary formats are in fact identical.

So you think most systems are not DOS/Windows/VMS/MacOS-9/whatever? I'm
sure at least one other OS I know has a different format for text files
to binary files, but I'll stick to the ones where I know there are
differences. Also, remember that Windows probably includes WinCE and its
successors which are used in some PDAs and mobile phones.
 
Keith Thompson

Flash Gordon said:
Malcolm McLean wrote, On 19/07/07 20:59:

So you think most systems are not DOS/Windows/VMS/MacOS-9/whatever?
I'm sure at least one other OS I know has a different format for text
files to binary files, but I'll stick to the ones where I know there
are differences. Also, remember that Windows probably includes WinCE
and its successors which are used in some PDAs and mobile phones.

What exactly does "binary format" mean? On many (most?) systems,
there's no such thing as a "binary format" -- or rather, there's a
practically infinite number of such formats, at least potentially.

A text file is (usually) just a special case of a binary file, with
some structure imposed on it.

Now it would make sense to ask what translations are performed when
accessing a file in text mode as opposed to binary mode, but I'm not sure
whether that's what the OP asked.
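
One way to see which translations a given implementation performs on
output is to write through a text-mode stream and then inspect the raw
bytes in binary mode. This is only a sketch; raw_bytes_of_text_write and
print_hex are hypothetical helper names, and the result depends entirely
on the platform it runs on:

```c
#include <assert.h>
#include <stdio.h>

/* Write "a\nb\n" through a text-mode stream, then read the file back in
 * binary mode into out[]. Returns the number of raw bytes read (at most
 * cap), or 0 on failure. */
size_t raw_bytes_of_text_write(const char *path, unsigned char *out, size_t cap)
{
    assert(path != NULL && out != NULL);

    FILE *fp = fopen(path, "w");            /* text mode */
    if (fp == NULL)
        return 0;
    fputs("a\nb\n", fp);
    if (fclose(fp) != 0)
        return 0;

    fp = fopen(path, "rb");                 /* binary mode: no translation */
    if (fp == NULL)
        return 0;
    size_t n = fread(out, 1, cap, fp);
    fclose(fp);
    return n;
}

/* Print each byte as two hex digits, followed by a newline. */
void print_hex(const unsigned char *bytes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%02X ", bytes[i]);
    putchar('\n');
}
```

On Unix-like systems the four bytes 61 0A 62 0A are expected; on
MS-DOS/Windows implementations the usual result is six bytes,
61 0D 0A 62 0D 0A. Record-oriented systems may show something else again.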
 
Flash Gordon

Keith Thompson wrote, On 20/07/07 00:06:
What exactly does "binary format" mean? On many (most?) systems,
there's no such thing as a "binary format" -- or rather, there's a
practically infinite number of such formats, at least potentially.

A text file is (usually) just a special case of a binary file, with
some structure imposed on it.

Taking that attitude means that "binary format" and "text format" are
identical on ALL systems, so asking the question is pointless. Therefore
that is obviously not what it meant, and it is obviously not what
Malcolm meant in his reply.

Now it would make sense to ask what translations are performed when
accessing a file in text mode as opposed to binary mode, but I'm not sure
whether that's what the OP asked.

I think the OP almost certainly did mean whether you get the same thing
reading/writing in text mode and binary mode. This is not just because
it is the most obvious C way to interpret the questions, but also
because as you note in some sense ALL text files can be considered
binary files.

The only other way I can see to interpret the question that makes sense
to me is whether all binary files are also text files, i.e. at a minimum
will not do silly things if you display them on your terminal, and that
is clearly false for most systems that I have used.
 
Richard

Malcolm McLean said:
No. Text files often have a control Z as end of file marker. However
on most systems the text and binary formats are in fact identical.

What on earth is "binary format"?

And what are "most systems"?
 
Richard Heathfield

santosh said:
A catch-all name, I presume, for all human-unreadable
information.

Binary doesn't necessarily mean human-unreadable. Those amongst us who
have written programs in binary, as I have, will almost certainly agree
that, whilst reading binary isn't necessarily top of their Ten Fun
Things To Do Today list, it is nevertheless possible.

In any case, if you're right /and/ Malcolm is right, then text files are
unreadable too, on most systems. Is that your contention?
 
osmium

Malcolm McLean said:
No. Text files often have a control Z as end of file marker. However on
most systems the text and binary formats are in fact identical.

The most common form of text files use ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data, EOF is a
*condition* detected by the OS. One can signal EOF from a keyboard by
pressing the key combination ctrl + 'z', usually written as ^Z, on a DOS
based machine. I think that is what is referred to above, but it has
nothing to do with files. Files are an *external* representation of a data
set, a keyboard is not an external representation.
 
santosh

osmium said:
The most common form of text files use ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data, EOF is a
*condition* detected by the OS. One can signal EOF from a keyboard by
pressing the key combination ctrl + 'z', usually written as ^Z, on a DOS
based machine. I think that is what is referred to above, but it has
nothing to do with files. Files are an *external* representation of a data
set, a keyboard is not an external representation.

Many systems though *did* have a so-called end-of-file marker.
For DOS it's ASCII character code 26, which is generated by
pressing CTRL-Z. DOS doesn't use it itself but CP/M systems used
it to mark the end of valid data in a disk block.
 
Roberto Waltman

osmium said:
The most common form of text files use ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data, EOF is a
*condition* detected by the OS.

[Off-Topic] CP/M used a 0x1A character (a.k.a "Control-Z") as an EOF
marker "encoded in the data", since the OS kept track of file sizes
only as multiples of 128-byte blocks. (This convention was used only
in text files, obviously.)
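
To make the convention concrete, here is a rough sketch of how a program
could recover the logical length of such a text file: the data ends at the
first 0x1A, or at the end of the last block if there is none. The names
CPM_RECORD_SIZE, CPM_EOF, cpm_pad_record and cpm_text_length are invented
for this example and are not part of any CP/M API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CPM_RECORD_SIZE 128   /* block size mentioned above */
#define CPM_EOF         0x1A  /* Control-Z */

/* Fill one 128-byte record with `text` (truncated if too long) and pad the
 * remainder with 0x1A, as a CP/M-era writer would leave the final block.
 * Returns the number of text bytes stored. */
size_t cpm_pad_record(unsigned char record[CPM_RECORD_SIZE], const char *text)
{
    assert(record != NULL && text != NULL);
    size_t n = strlen(text);
    if (n > CPM_RECORD_SIZE)
        n = CPM_RECORD_SIZE;
    memcpy(record, text, n);
    memset(record + n, CPM_EOF, CPM_RECORD_SIZE - n);
    return n;
}

/* Logical length of text stored in whole blocks: the offset of the first
 * 0x1A, or `size` if no 0x1A is present. */
size_t cpm_text_length(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size == 0)
        return 0;
    const unsigned char *mark = memchr(data, CPM_EOF, size);
    return mark != NULL ? (size_t)(mark - data) : size;
}
```

For example, padding a record with "hello\r\n" and then calling
cpm_text_length on the full 128 bytes yields 7, the number of bytes before
the first Control-Z.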

Since every CP/M program that manipulated text expected this, and
Microsoft's DOS started its life as a CP/M look alike for the 8086/8
family, DOS programs also interpreted (and many still do) a CTRL-Z as
an EOF mark, and automatically added a CTRL-Z at the end of text files
when closing them. (Try COPYing a file that contains a CTRL-Z in a
Windows system, with and without the /B switch.)

There is enough software relying on this, and enough files created
with a CTRL-Z at the end, that the C# language definition includes
provisions to deal with it:

"The C# Programming Language", 2nd ed., (c) 2006

2 - Lexical Structure ...
2.3 - Lexical Analysis ...
2.3.1 - Line Terminators ...
"If the last character of the source file is
a Control-Z character (U+001A) this
character is deleted"

I was forced to do the same eons ago, when writing C programs that had
to be portable between MSDOS, CP/M and DEC's operating systems.

Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 
Nick Keighley

The most common form of text files use ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data, EOF is a
*condition* detected by the OS. One can signal EOF from a keyboard by
pressing the key combination ctrl + 'z', usually written as ^Z, on a DOS
based machine. I think that is what is referred to above, but it has
nothing to do with files. Files are an *external* representation of a data
set, a keyboard is not an external representation.


Yes, but http://en.wikipedia.org/wiki/ASCII at least mentions
control character encodings for some of the values.

So 0x1a is actually the "substitute" character rather than ^Z?
 
Peter J. Holzer

osmium said:
The most common form of text files use ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data, EOF is a
*condition* detected by the OS.

[Off-Topic] CP/M used a 0x1A character (a.k.a "Control-Z") as an EOF
marker "encoded in the data", since the OS kept track of file sizes
only as multiples of 128-byte blocks. (This convention was used only
in text files, obviously.)

Since every CP/M program that manipulated text expected this, and
Microsoft's DOS started its life as a CP/M look alike for the 8086/8
family, DOS programs also interpreted (and many still do) a CTRL-Z as
an EOF mark, and automatically added a CTRL-Z at the end of text files
when closing them. [...]
I was forced to do the same eons ago, when writing C programs that had
to be portable between MSDOS, CP/M and DEC's operating systems.

When I wrote C programs for MS-DOS (eons ago), the stdio libraries of
the compilers I used at the time (Borland and Microsoft) did that
automatically if a file was opened in text mode.

hp
 
Roberto Waltman

Peter J. Holzer said:
When I wrote C programs for MS-DOS (eons ago), the stdio libraries of
the compilers I used at the time (Borland and Microsoft) did that [deleting control-Z]
automatically if a file was opened in text mode.

Correct. I never had that problem with Borland or MS compilers. The
ones that gave me trouble were: a very early C compiler for DOS (it
may have been one of the first versions of Lattice-C, I am not sure)
and another for VAX/VMS.
In the second case it was not the compiler's fault. I had to process
data produced by several programs run in sequence, the last one was
some file conversion utility that left the CTRL-Z in place when
producing a regular VMS record-oriented text file.
I found it easier to write a wrapper around the standard file
operations and use it in all systems, deleting a trailing CTRL-Z if it
was there, and doing nothing if it wasn't.
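
A wrapper of that kind could look roughly like the sketch below. This is
not the original code; read_text_no_ctrl_z and CTRL_Z are hypothetical
names, and the sketch assumes the caller's buffer is larger than the file
so that end of file is reached within one read:

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A

/* Read a text file into buf (at most cap bytes) and drop one trailing
 * CTRL-Z if the library left it in place. Returns the number of characters
 * kept, or -1 on error. The CTRL-Z is only stripped when end of file was
 * reached, so a 0x1A in the middle of a larger file is left alone. */
long read_text_no_ctrl_z(const char *path, char *buf, size_t cap)
{
    assert(path != NULL && buf != NULL);

    FILE *fp = fopen(path, "r");            /* text mode */
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, cap, fp);
    int failed = ferror(fp);
    int at_end = feof(fp);
    fclose(fp);
    if (failed)
        return -1;

    /* Libraries that already treat CTRL-Z as EOF never deliver it, so this
     * test is simply false there and nothing is changed. */
    if (at_end && n > 0 && (unsigned char)buf[n - 1] == CTRL_Z)
        n--;
    return (long)n;
}
```

Because the check is harmless when no CTRL-Z is present, the same call can
be used unchanged on every system, which is the point of the approach
described above.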

Roberto Waltman

[ Please reply to the group,
return address is invalid ]
 
