fseek

  • Thread starter Christopher Benson-Manica

Glen Herrmannsfeldt

Alan Balmer said:
This paragraph only describes the mode that the stream is opened in,
saying nothing about the files themselves. It does not say that
opening a file in the "wrong" mode results in undefined behavior. The
mode is important to the user, since in text mode it tells the I/O
implementation that it must look for, and possibly map, newline
characters. In binary mode, newline characters are treated the same as
any other character.

Newline conversion is one of the common changes, but not the only one.

The PDP-10 (under the TOPS-10 and TOPS-20 operating systems) stores text files as five 7-bit ASCII
characters per 36-bit word. A possible binary format is four 9-bit chars per
word. The results will be very different if the wrong one is used.

CDC had a series of machines with 60 bit words, which used either 6 or 12
bits per character, depending on the bit patterns. (Similar to UTF-8
coding, where some bits indicate the length.) I don't know what CHAR_BIT
might be, though. In this case, with variable length characters, or for
that matter with UTF-8, one can imagine that text streams and binary
streams would work differently.
Opening a text file in binary mode is perfectly legitimate - in fact
the standard provides no way to distinguish between a binary file and
a text file. Refer to 7.19.2, where the two types of streams are
defined. Now, consider a "text" file containing one "line." The thing
that makes it a text file is that each "line" has a terminating
newline character. But the standard says that the last line need not
have the terminating newline (it's implementation dependent.) How can
this file be distinguished from a binary file? What will the
implementation do to me if I open it in binary mode?

On IBM's mainframe OSs, lines are never terminated by newline characters on disk, though
they could contain newline characters as data.

There could be systems that keep text and binary files completely separate,
such that no operations are allowed on the wrong type. I don't know of any,
though.
On the other hand, presume that I have a binary file, say an
executable program. This file may contain numerous instances of a
character with the value 0xA, which happens to be the newline
character on the system I'm using now. Does that make it a text file?
Obviously not, but it may meet all the criteria for opening as a text
stream.

I have heard about people trying to make executable text files on various
systems. That is, only opcodes that are printable characters are allowed.
Very strange, but some systems allow it.
Perhaps Dan is right, and the writers made a mistake. I'm in no
position to make a judgment on that.

The standard tries to allow for reasonable differences in hardware, OS, and
file system design. Many features are to accommodate features of existing
systems, or ones that existed in the past.

-- glen
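
A minimal sketch of the text-versus-binary point discussed above: the same file is read once through a text stream and once through a binary stream, and the characters delivered by getc() are counted. The function names and the file name in the usage note are made up for illustration. On a Unix-like system the two counts normally agree; on a system that stores a line ending as more than one byte, the binary count would be expected to be larger.

```c
#include <stdio.h>

/* Count how many characters getc() delivers when `path` is opened with
   `mode`. Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}

/* Write two short lines through a text stream; returns 0 on success. */
int write_sample(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    fputs("one\ntwo\n", fp);
    return fclose(fp) == 0 ? 0 : -1;
}
```

After write_sample("demo.txt"), comparing count_chars("demo.txt", "r") with count_chars("demo.txt", "rb") shows whether this particular implementation maps anything in text mode.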
 
Dan Pop

That may be true, but I would be disappointed to find a system where a C program
reading in text mode could not read the text files commonly available on that system, such
as those produced by the system's normal text editor(s).

There is nothing preventing the system's normal text editor(s) from being
written in C ;-) It's even current practice, these days.

Dan
 
Dan Pop

Alan Balmer said:
Nor is it prohibited by the standard.

Name me a system where it can't be done.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you have to resort to Trollsdale's favourite "argument", there *is*
something wrong with your position in the debate.

Dan
 
Dan Pop

The standard tries to allow for reasonable differences in hardware, OS, and
file system design. Many features are to accommodate features of existing
systems, or ones that existed in the past.

Then again, we have the DeathStation. Text files *must* have the .t
extension (otherwise fopen in text mode fails) and binary files *must*
have the .b extension (otherwise fopen in binary mode fails). Files with
other extensions cannot be opened. Neither rename() nor any system
utility allows changing the file's extension. Perfectly allowed by the
standard.

Dan
 
Alan Balmer

See 7.19.5.3 (fopen) -- the standard does not provide any way to connect
a text stream to a binary file or vice versa. Attempting to open a
binary file in text mode or vice versa results in undefined behavior.

Seems to be the appropriate section, yes; but re-reading it I
stumbled over something I've overlooked till now:

7.19.5.3p6
[...]
Opening (or creating) a text file with update mode may instead
open (or create) a binary stream in some implementations.

I'm not sure if this can be used to form an argument in the debate
about the relationship of binary and text files, and if so, pro or
contra which side; I'm just puzzled that such a strange behaviour
is sanctioned by the standard...

Regards

Like you, I don't know that it's pertinent to the current discussion,
but it is interesting, and rather oddly tucked into the section. An
afterthought to cover some actual implementation?
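
For readers following the update-mode paragraph quoted above, here is a hedged sketch of what using such a stream looks like. It assumes an existing file and a made-up helper name, patch_first_char. The one rule it relies on comes from the same section of the standard: output may not be followed directly by input without an intervening fflush or file positioning call such as fseek. Whether the stream opened by "r+" is really a text stream is, per the quoted sentence, up to the implementation.

```c
#include <stdio.h>

/* Overwrite the first character of an existing file with `ch`, then read
   the first character back through the same update stream.
   Returns the character read, or EOF on any failure. */
int patch_first_char(const char *path, int ch)
{
    FILE *fp = fopen(path, "r+");   /* update mode; may be binary on some systems */
    int got;

    if (fp == NULL)
        return EOF;
    if (putc(ch, fp) == EOF) {
        fclose(fp);
        return EOF;
    }
    /* A positioning call (or fflush) is required between output and input. */
    if (fseek(fp, 0L, SEEK_SET) != 0) {
        fclose(fp);
        return EOF;
    }
    got = getc(fp);
    fclose(fp);
    return got;
}
```

Replacing one character with another keeps the record length unchanged, which is the only kind of in-place edit a record-oriented file system might tolerate.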
 
Alan Balmer

mode is important to the user, since in text mode it tells the I/O

Newline conversion is one of the common changes, but not the only one.

The PDP-10 (under the TOPS-10 and TOPS-20 operating systems) stores text files as five 7-bit ASCII
characters per 36-bit word. A possible binary format is four 9-bit chars per
word. The results will be very different if the wrong one is used.

It really doesn't matter - the standard refers to streams, not the
physical implementation. Both binary and text streams are a sequence
of characters as far as the standard is concerned. You ask for a
character, the system must do whatever is necessary to provide one.
CDC had a series of machines with 60 bit words, which used either 6 or 12
bits per character, depending on the bit patterns. (Similar to UTF-8
coding, where some bits indicate the length.) I don't know what CHAR_BIT
might be, though. In this case, with variable length characters, or for
that matter with UTF-8, one can imagine that text streams and binary
streams would work differently.

I wrote programs to convert tapes from those 60-bit CDC machines :)
That wasn't the only strange thing - as I recall they would split data
at the end of a record.

However, I don't think a C implementation would provide variable
length characters, in spite of the hardware.
On IBM's mainframe OSs, lines are never terminated by newline characters on disk, though
they could contain newline characters as data.

Again, you appear to be confusing bit patterns on a disk with a C
stream. If there's a text stream, and it has more than one line, each
line must end with a newline.
There could be systems that keep text and binary files completely separate,
such that no operations are allowed on the wrong type. I don't know of any,
though.

For good reason. Think about it.

What "operations" would you prohibit for each type of file?
I have heard about people trying to make executable text files on various
systems. That is, only opcodes that are printable characters are allowed.
Very strange, but some systems allow it.

Unix scripts are executable text files ;-)

You may be thinking about certain loader formats which use only
printable characters.
 
Alan Balmer

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you have to resort to Trollsdale's favourite "argument", there *is*
something wrong with your position in the debate.

Dan

Incorrect premise - I did not "have to resort" to anything. Striking
the last line does not affect the truth of the posting.

OTOH, when you have to resort to this type of comment, I am forced to
conclude that you have no further useful input.
 
Eric Sosman

Alan said:
Nor is it prohibited by the standard.

Name me a system where it can't be done.

Unix. fopen(".", "wb") will fail, even though the
file "." exists and even if it is writeable by the user
running the program.
 
Alan Balmer

Unix. fopen(".", "wb") will fail, even though the
file "." exists and even if it is writeable by the user
running the program.

And fopen(".", "w") will succeed? I've never tried this (since "." is
the current directory in Unix, I've never had a need.) What's the
error code, if you know off-hand? Is this on all Unices? (The
questions are obviously off-topic for c.l.c, I'm just trying to
minimize my test time.)
 
Eric Sosman

Alan said:
And fopen(".", "w") will succeed? I've never tried this (since "." is
the current directory in Unix, I've never had a need.) What's the
error code, if you know off-hand? Is this on all Unices? (The
questions are obviously off-topic for c.l.c, I'm just trying to
minimize my test time.)

<off-topic>

fopen(".", "w") also fails, because Unix forbids
writing to directory files through the ordinary file
interfaces -- you've got to use special-purpose calls
to modify a directory. On Solaris, the error code is
EISDIR; I don't know whether that's mandated by POSIX.

As far as I know, all Unices behave this way. But
there might be some combinations of Unix version with
"foreign" file systems where directories can be written
(garbled, most likely) as plain files.

</off-topic>
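
A small sketch for anyone who wants to try this on their own system. The helper name try_open is invented here. ISO C does not require fopen to set errno, so the code treats a zero errno after a failure as "no reason given"; POSIX systems are expected to report a specific code such as the EISDIR mentioned above.

```c
#include <errno.h>
#include <stdio.h>

/* Try to open `path` with `mode` and close it again at once.
   Returns 0 on success; on failure returns errno if the library set it,
   or -1 if errno was left at zero.
   Beware: a "w" mode truncates an existing regular file on success. */
int try_open(const char *path, const char *mode)
{
    FILE *fp;

    errno = 0;
    fp = fopen(path, mode);
    if (fp == NULL)
        return errno != 0 ? errno : -1;
    fclose(fp);
    return 0;
}
```

Calling try_open(".", "w") and try_open(".", "wb") and passing any positive result to strerror() shows what a given system reports.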
 
Glen Herrmannsfeldt

Dan Pop said:
In <p7ysb.134611$275.397546@attbi_s53> "Glen Herrmannsfeldt"

There is nothing preventing the system's normal text editor(s) from being
written in C ;-) It's even current practice, these days.

In that case, there shouldn't be any problems reading the files. In many
older systems, though, that wasn't true. Continuing my previous comments, I
don't believe that the TOPS-10 editors were written in C. It would be
unfortunate for a C implementation for TOPS-10 not to read the text files
written by other programs, even though it could follow a C standard exactly.

-- glen
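
To make the PDP-10 packing mentioned in this thread concrete, here is a sketch that holds a 36-bit word in the low bits of a uint64_t and unpacks it both ways. It assumes the usual layout of five 7-bit characters left-justified with the lowest bit unused; the function names are invented, and this is an illustration on a modern host, not code from any PDP-10 C library.

```c
#include <stdint.h>

/* A 36-bit word is held in the low 36 bits of a uint64_t. */

/* Five 7-bit characters, left-justified; the lowest bit is unused. */
void unpack_5x7(uint64_t word, unsigned char out[5])
{
    int i;

    for (i = 0; i < 5; i++)
        out[i] = (unsigned char)((word >> (29 - 7 * i)) & 0x7F);
}

/* Four 9-bit bytes filling the whole word. */
void unpack_4x9(uint64_t word, unsigned short out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned short)((word >> (27 - 9 * i)) & 0x1FF);
}

/* Inverse of unpack_5x7, used only to build sample data.
   `s` must point to at least five characters. */
uint64_t pack_5x7(const char *s)
{
    uint64_t word = 0;
    int i;

    for (i = 0; i < 5; i++)
        word |= (uint64_t)(s[i] & 0x7F) << (29 - 7 * i);
    return word;
}
```

Unpacking a word built by pack_5x7("HELLO") with unpack_4x9 gives values that do not match the original characters, which is the "very different results" effect of reading text data with the binary layout.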
 
Glen Herrmannsfeldt

Alan Balmer said:
On Thu, 13 Nov 2003 05:41:11 GMT, "Glen Herrmannsfeldt"
Alan Balmer said:
On Wed, 12 Nov 2003 23:13:01 GMT, (e-mail address removed) wrote:

Alan Balmer <[email protected]> wrote [quoting Dan Pop]:
(snip)
Newline conversion is one of the common changes, but not the only one.

The PDP-10 (under the TOPS-10 and TOPS-20 operating systems) stores text files as five 7-bit ASCII
characters per 36-bit word. A possible binary format is four 9-bit chars per
word. The results will be very different if the wrong one is used.

It really doesn't matter - the standard refers to streams, not the
physical implementation. Both binary and text streams are a sequence
of characters as far as the standard is concerned. You ask for a
character, the system must do whatever is necessary to provide one.


Yes, and the 'whatever' can be very different for text and binary.
I wrote programs to convert tapes from those 60-bit CDC machines :)
That wasn't the only strange thing - as I recall they would split data
at the end of a record.
However, I don't think a C implementation would provide variable
length characters, in spite of the hardware.

I didn't use them for very long, but I thought that to get lower case you
needed the variable length. Also, I wouldn't be surprised to see UTF-8 get
more popular in the future.

(snip)
Again, you appear to be confusing bit patterns on a disk with a C
stream. If there's a text stream, and it has more than one line, each
line must end with a newline.

When you mix text files and binary files together, that is what happens.
Yes, when a C program reads a line from a text file it will find a '\n' at
the end. It is not required, though, that it be stored that way on disk.

With some OS it is also possible to have '\n' that are not at the end of a
line, and even in a text file! There are systems that store lines with a
length prefix, and allow all possible bit combinations for characters in the
line. C programs writing a text file would never do that, but files written
in binary mode, or in other languages, could.
For good reason. Think about it.
What "operations" would you prohibit for each type of file?

(snip)

-- glen
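
A sketch of the length-prefixed case described above, using a purely hypothetical on-disk layout (one length byte, then that many data bytes, no stored newline). It shows the kind of work a text stream would have to do so that a C program sees each line end in '\n'. The function name and the layout are assumptions for illustration only.

```c
#include <stddef.h>

/* Hypothetical on-disk layout: each record is one length byte followed by
   that many data bytes; no newline is stored. Copy the records into `out`
   as C-style lines, appending '\n' after each one.
   Returns the number of characters written, or (size_t)-1 if the input is
   truncated or `out` is too small. */
size_t records_to_lines(const unsigned char *disk, size_t disk_len,
                        char *out, size_t out_cap)
{
    size_t in = 0, n = 0;

    while (in < disk_len) {
        size_t len = disk[in++];

        if (len > disk_len - in || len + 1 > out_cap - n)
            return (size_t)-1;
        for (; len > 0; len--)
            out[n++] = (char)disk[in++];
        out[n++] = '\n';
    }
    return n;
}
```

A record whose data happens to contain the newline value is copied through unchanged, so the C program would see it as two lines. That is the ambiguity that arises when such a file was written in binary mode or by another language.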
 
Alan Balmer

<off-topic>

fopen(".", "w") also fails, because Unix forbids
writing to directory files through the ordinary file
interfaces -- you've got to use special-purpose calls
to modify a directory.

As I thought, though I was willing to see what happens ;-) However,
that leaves the question of why you thought this fact was pertinent to
the discussion of opening text files as binary, or a counter example
showing that some text files could not be opened as binary.
 
Eric Sosman

Alan said:
[several intermediate quoting levels snipped]
^^^^^^^^
[...] However,
that leaves the question of why you thought this fact was pertinent to
the discussion of opening text files as binary, or a counter example
showing that some text files could not be opened as binary.

The claim was "any file may be opened in binary mode."
Counter-examples were solicited; I offered one.
 
Mark McIntyre

The claim was "any file may be opened in binary mode."
Counter-examples were solicited; I offered one.

<flame bait>
"." isn't a file. The fact that you can't fopen it proves that. It
might be a named block of disk space, but it's not a file.
</flame bait>
 
Eric Sosman

Mark said:
<flame bait>
"." isn't a file. The fact that you can't fopen it proves that. It
might be a named block of disk space, but it's not a file.
</flame bait>

<bite type="off-topic">

"." is a file. The fact that you can fopen() it for input
proves that. It might not be a named block of disk space at
all (consider "cd /proc/$$"), but it is a file.

</bite>
 
lawrence.jones

Irrwahn Grausewitz said:
7.19.5.3p6
[...]
Opening (or creating) a text file with update mode may instead
open (or create) a binary stream in some implementations.

Some systems, particularly those with record-oriented file systems,
have a native text file format that is not generally updatable. (You
can't insert, delete, or change the length of an existing record.) But,
as a number of people have pointed out, it is generally desirable to
have C text files use the native text file format. The committee
decided that, in this case, it was better to get an updatable stream/
file that didn't conform to the native text file format than to have it
always fail.

-Larry Jones

When I want an editorial, I'll ASK for it! -- Calvin
 
lawrence.jones

Alan Balmer said:
This paragraph only describes the mode that the stream is opened in,
saying nothing about the files themselves.

On the contrary, it specifically says that modes containing "b" open
binary files and modes without "b" open text files. It does not say
what happens if the named file is not of the correct type, so you get
undefined behavior if you use the returned stream (if there is one).

-Larry Jones

It's either spectacular, unbelievable success, or crushing, hopeless
defeat! There is no middle ground! -- Calvin
 
Alan Balmer

<bite type="off-topic">

"." is a file. The fact that you can fopen() it for input
proves that. It might not be a named block of disk space at
all (consider "cd /proc/$$"), but it is a file.

</bite>

OK, then, back on subject: Can it be opened for *input* in binary
mode? I didn't claim that any file can be opened for writing.
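
One way to answer that question empirically on a given system is to separate "the open succeeded" from "a read succeeded". This is a sketch with invented names (probe_binary_input and its result codes). What it reports for "." is system-dependent, and the standard makes no promise either way.

```c
#include <stdio.h>

enum probe_result {
    PROBE_OPEN_FAILED,   /* fopen returned NULL */
    PROBE_READ_FAILED,   /* opened, but the first read reported an error */
    PROBE_EMPTY,         /* opened, and end-of-file came at once */
    PROBE_READ_OK        /* opened, and one byte was read */
};

/* Open `path` as a binary input stream and attempt to read one byte. */
enum probe_result probe_binary_input(const char *path)
{
    FILE *fp = fopen(path, "rb");
    enum probe_result r;
    int c;

    if (fp == NULL)
        return PROBE_OPEN_FAILED;
    c = getc(fp);
    if (c != EOF)
        r = PROBE_READ_OK;
    else if (ferror(fp))
        r = PROBE_READ_FAILED;
    else
        r = PROBE_EMPTY;
    fclose(fp);
    return r;
}
```

Running probe_binary_input(".") distinguishes a system that refuses the open from one that opens the directory but refuses to deliver data.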
 
Dan Pop

In that case, there shouldn't be any problems reading the files. In many
older systems, though, that wasn't true. Continuing my previous comments, I
don't believe that the TOPS-10 editors were written in C. It would be
unfortunate for a C implementation for TOPS-10 not to read the text files
written by other programs, even though it could follow a C standard exactly.

AFAIK, standard C has never been implemented on the PDP-10.

Furthermore, my definition was in the context of the C standard, which
doesn't address OS issues, so you can't expect the C standard to specify
when a file created by a non-C program can be opened in text (or binary)
mode. In real systems, it is the OS that defines the format of a
text file and the C implementation follows that definition, instead
of inventing its own.

However, one can imagine a system defining multiple formats for the
text files and the C implementation not supporting all of them.

Dan
 
