fgets() equivalent?

J

J de Boyne Pollard

HS> fgets() is standard in C file I/O.
HS>
HS> The only issue you need to pay attention to is RAW
HS> (binary) vs COOKED mode. It relates to the EOL (end
HS> of line) definitions of MS-DOS (<CR><LF>) vs
HS> Unix (<LF>). Depending on your application that
HS> may or may not pertain.

TR> This is a handy definition, but it is NOT CORRECT. [...]
TR> The raw vs cooked distinction in Unix is VERY different
TR> from the binary vs text distinction in MS-DOS. [...]

Actually, the binary/text dichotomy comes from the C language. The
operating systems themselves neither have nor make any such distinction.
(To the operating systems themselves, files are just octet streams.
There are no lines, no newline sequences, and no EOF marker characters.)
It is simply the case that C language implementations targeting
PC/MS/DR-DOS use the either-CR+LF-or-LF newline convention for text
files (although they are not required to do so), and C language
implementations targeting Unices and Linux use the LF newline
convention for text files (and are required to do so by the POSIX
standard, which places additional restrictions on C implementations).
 
S

santosh

J said:
HS> fgets() is standard in C file I/O.
HS>
HS> The only issue you need to pay attention to is RAW
HS> (binary) vs COOKED mode. It relates to the EOL (end
HS> of line) definitions of MS-DOS (<CR><LF>) vs
HS> Unix (<LF>). Depending on your application that
HS> may or may not pertain.

TR> This is a handy definition, but it is NOT CORRECT. [...]
TR> The raw vs cooked distinction in Unix is VERY different
TR> from the binary vs text distinction in MS-DOS. [...]

Actually, the binary/text dichotomy comes from the C language. The
operating systems themselves have and make no such distinction. (To
the operating systems themselves, files are just octet streams. There
are no lines, no newline sequences, and no EOF marker characters.)

Not the case with all operating systems. Many systems, such as CP/M and
some mainframes, have a record-oriented file system, where the file is
represented as a sequence of records. CP/M also had an end-of-file
marker. And systems without 8-bit bytes may not view files as an octet
stream.

<snip>
 
J

J de Boyne Pollard

TR> This is a handy definition, but it is NOT CORRECT. [...]
TR> The raw vs cooked distinction in Unix is VERY different
TR> from the binary vs text distinction in MS-DOS. [...]

JdeBP> Actually, the binary/text dichotomy comes from the C
JdeBP> language. The operating systems themselves have
JdeBP> and make no such distinction. (To the operating
JdeBP> systems themselves, files are just octet streams.
JdeBP> There are no lines, no newline sequences, and no
JdeBP> EOF marker characters.)

s> Not the case with all operating systems. [...]

M. Roberts wasn't talking about all operating systems. The operating
systems that xe was talking about xe mentioned by name.
 
T

Tim Roberts

J de Boyne Pollard said:
Actually, the binary/text dichotomy comes from the C language. The
operating systems themselves have and make no such distinction. (To
the operating systems themselves, files are just octet streams. There
are no lines, no newline sequences, and no EOF marker characters.)

I'm sorry, but you are incorrect. Apparently, you never got burned trying
to use the "copy" command without "/b" in the early versions of MS-DOS on a
file that happened to contain an embedded Ctrl-Z (the text-mode "end of
file" character). It, in turn, inherited that behavior from CP/M.

The C run-time library had to ADD the text/binary distinction because CP/M
and MS-DOS embedded it in their file system mechanisms. That concept was
certainly not part of the C run-time before implementations were built for
those operating systems.
 
K

Keith Thompson

Tim Roberts said:
I'm sorry, but you are incorrect. Apparently, you never got burned trying
to use the "copy" command without "/b" in the early versions of MS-DOS on a
file that happened to contain an embedded Ctrl-Z (the text-mode "end of
file" character). It, in turn, inherited that behavior from CP/M.

The C run-time library had to ADD the text/binary distinction because CP/M
and MS-DOS embedded it in their file system mechanisms. That concept was
certainly not part of the C run-time before implementations were built for
those operating systems.

Are you sure that CP/M and MS-DOS were the specific reasons for this
C feature? There are certainly other operating systems (including
VMS) that distinguish between text files and binary files.
 
G

Gary Chanson

Keith Thompson said:
Are you sure that CP/M and MS-DOS were the specific reasons for this
C feature? There are certainly other operating systems (including
VMS) that distinguish between text files and binary files.

My understanding is that it was originally imported into CP/M from
Unix.
 
K

Keith Thompson

Gary Chanson said:
My understanding is that it was originally imported into CP/M from
Unix.

That doesn't make sense. CP/M (or at least a C implementation under
CP/M) has to distinguish between text and binary files, because it
uses a two-character CR-LF sequence to mark the end of a line. Unix
uses a single LF character, and thus doesn't need to distinguish
between text and binary.
 
D

David Craig

How about the fact that even Unix needs to generate a CR/LF pair when a
'Newline - 0x0A' is encountered in output to a tty-type device? Unix is
old and works/worked with teletype terminals, where a CR returns the
carriage to column one and the LF causes the paper to feed up one line.
Some even required multiple CR characters because they were so slow and
would lose characters that followed too quickly when a major movement
of the carriage was required.
 
J

J. J. Farrell

Gary said:
My understanding is that it was originally imported into CP/M from
Unix.

Your understanding is incorrect. One of the key concepts of UNIX was
that files were just files. There was no distinction between different
types of file, and no "special data" in the file to indicate
end-of-file. I don't know if UNIX originated this concept, but it was
relatively novel at the time and UNIX did much to popularize it. The
distinction between binary and text files in the Standard I/O library
was added when C was ported to other OSes.
 
D

Dik T. Winter

> How about that even Unix needs to generate a CR/LF pair when a 'Newline -
> 0x0A' is encountered in output to a tty type device.

How about the fact that there is a difference between how files are
stored on disk and what happens when such a file is displayed on a
tty-type device? The conversion is done by the tty driver. Similarly, a
MacOS tty driver would convert a CR to the combined CR/LF. Normally such
*tty drivers* expect that what is being displayed is a text file. With
respect to the C programming environment there is no difference between
text files and binary files.
 
D

Dik T. Winter

> Your understanding is incorrect. One of the key concepts of UNIX was
> that files were just files. There was no distinction between different
> types of file, and no "special data" in the file to indicate
> end-of-file. I don't know if UNIX originated this concept, but it was
> relatively novel at the time and UNIX did much to popularize it. The
> distinction between binary and text files in the Standard I/O library
> was added when C was ported to other OSes.

The concept was much older. On all the older systems I have worked with,
end-of-file was not special data in the file, but merely metadata held
by the system in the information about the file. I think that CP/M was
the first system that made that metadata part of the file itself. On the
other hand, the distinction between text and binary files has been
present in many file systems, but at a quite different level. And the
only level where they differed was whether to interpret a particular
sequence of bytes as end-of-line, never whether something should be
interpreted as end-of-file.
 
C

CBFalconer

David Craig wrote: *** and top-posted - fixed ***
Keith Thompson said:
Gary Chanson said:
[...

The C run-time library had to ADD the text/binary distinction
because CP/M and MS-DOS embedded it in their file system
mechanisms. That concept was certainly not part of the C
run-time before implementations were built for those operating
systems.

Are you sure that CP/M and MS-DOS were the specific reasons
for this C feature? There are certainly other operating
systems (including VMS) that distinguish between text files
and binary files.

My understanding is that it was originally imported into
CP/M from Unix.

That doesn't make sense. CP/M (or at least a C implementation
under CP/M) has to distinguish between text and binary files,
because it uses a two-character CR-LF sequence to mark the end
of a line. Unix uses a single LF character, and thus doesn't
need to distinguish between text and binary.

How about the fact that even Unix needs to generate a CR/LF
pair when a 'Newline - 0x0A' is encountered in output to a
tty-type device. Unix is old and works/worked with teletype
terminals, where a CR returns the carriage to column one and
the LF causes the paper to feed up one line. Some even
required multiple CR characters because they were so slow and
would lose characters that followed too quickly when a major
movement of the carriage was required.

This was usually handled by having the terminal driver emit "CR,
LF, DC3" to prompt for a new line. At line end, the echoing
machinery would emit "DC1, CR". I think I have the sequence
right. At any rate, there was enough idle time for the carriage to
recover, and the sequences would also stop/start the tape reader,
if present and loaded. When the input line was half duplex those
sequences would also prompt the sending device to unload another
line.

Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. I fixed this one. See the following links:

--
<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)
 
C

CBFalconer

Dik T. Winter said:
The concept was much older. On all the older systems I have
worked with, end-of-file was not special data in the file, but
merely metadata held by the system in the information about
the file. I think that CP/M was the first system that made
that metadata part of the file itself. On the other hand, the
distinction between text and binary files has been present in
many file systems, but at a quite different level. And the
only level where they differed was whether to interpret a
particular sequence of bytes as end-of-line, never whether
something should be interpreted as end-of-file.

No, EOF has always meant "we hit the end of recorded data". The
CP/M solution existed because the file length was recorded in terms
of 128-byte records, and these did not match the structure of text
files. Therefore CP/M added an EOF character to the stored text.

Similarly, CP/M didn't do any LF --> CR/LF --> LF translation while
writing and reading, but just wrote the CR/LF sequence. Less code
that way :). DOS just copied it, out of laziness and because of its
primary market.
 
P

Pops

Finally, someone who can relate to history. Thanks David. :)

The following note is not specifically to you; it is just general.

In general, the terms "raw" vs "cooked" broadly mean non-translation
vs translation. Generally, we cook something in order to establish some
structural consistency, either for input or for output.

Specifically, when it comes to text vs binary ideas, we are generally
talking about how the control codes, ASCII codes 0 to 31, are handled.

For file storage we are mostly talking about three control codes,
dealing with the EOL (end of line) and EOF (end of file) entities:

<lf> 0x0A ^J
<cr> 0x0D ^M
<eof> 0x1A ^Z

(Note, I am not using strict ASCII mnemonics here.)

It is still possible to have other control codes in a text file. So the
idea of text vs binary is really only relevant to the application or
usage in question. If a text file has ANSI escape codes, then it can be
viewed as binary as well. In fact, there is software for which it is
important to detect a file as binary if it has control codes other than
<lf>, <cr>, and <eof>. But it really depends on how they are used.

Typically, when it comes to an interactive input device, other control
codes come into play:

<xon> 0x11 ^Q
<xoff> 0x13 ^S
<etx> 0x03 ^C
<lf> 0x0A ^J
<cr> 0x0D ^M
<eof> 0x1A ^Z

When it comes to output devices, like printers and terminals, other
control codes come into play. Printers and smart terminals with ANSI or
VT10x emulations typically interpret these, especially the escape code:

<xon> 0x11 ^Q
<xoff> 0x13 ^S
<cr> 0x0D ^M
<lf> 0x0A ^J
<eof> 0x1A ^Z
<esc> 0x1B ^[

Anyway, even Unix has to deal with the outside I/O world, and even
within its own world, it is all done transparently.

The issue for the OP, and my main point about using fgets(), is that it
is important to note the idea of "cooked" and "raw" translations,
especially in the Windows or MS-DOS world, where the standard devices
are cooked and files opened in text mode are 100% cooked. So it depends
on where the input is coming from when he is dealing with the line
reader he was seeking.


--
HLS


David said:
How about the fact that even Unix needs to generate a CR/LF pair when a
'Newline - 0x0A' is encountered in output to a tty-type device. Unix is
old and works/worked with teletype terminals, where a CR returns the
carriage to column one and the LF causes the paper to feed up one line.
Some even required multiple CR characters because they were so slow and
would lose characters that followed too quickly when a major movement of
the carriage was required.

Keith Thompson said:
Gary Chanson said:
news:[email protected]... [...
The C run-time library had to ADD the text/binary distinction
because CP/M and MS-DOS embedded it in their file system
mechanisms. That concept was certainly not part of the C run-time
before implementations were built for those operating systems.
Are you sure that CP/M and MS-DOS were the specific reasons for this
C feature? There are certainly other operating systems (including
VMS) that distinguish between text files and binary files.
My understanding is that it was originally imported into CP/M from
Unix.
That doesn't make sense. CP/M (or at least a C implementation under
CP/M) has to distinguish between text and binary files, because it
uses a two-character CR-LF sequence to mark the end of a line. Unix
uses a single LF character, and thus doesn't need to distinguish
between text and binary.

--
Keith Thompson (The_Other_Keith) <[email protected]>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
P

Pops

Dik said:
How about that there is a difference between how files are stored on disk
and what happens if said file is displayed on a tty type device? The
conversion is done by the tty driver. As a MacOS tty driver would
convert a CR to the combined CR/LF. Normally such *tty drivers* would
expect that it is a text file that will be displayed. With respect to
the C programming environment there is no difference between text files
and binary files.


+1; while C itself is device-independent, the type of device itself
means something, as you elegantly pointed out in regard to the device
driver in question.

The original poster, I presume migrating or posting from Unix, wanted
the equivalent behavior of fgets().

My basic point in my reply was that he needs to deal with cooked vs raw
concepts, especially on Windows, and especially if his application has
to interface with devices or files from various places.

When a device is opened using the standard C I/O functions with a mode
attribute containing "t" by a Windows or MS-DOS target application, the
C/C++ RTL (run-time library) will read/write in cooked mode by default.
It's all clearly there in the MS C/C++ RTL source code provided in
every distribution.

Now if the application needs to interface with the outside world to get
input, then it MAY need to be compiled or switch at run time to do I/O
in non-cooked mode.

You know how many times you see people post simple C fetch code using
the standard device I/O heuristics, claiming it's 100% portable, and
Windows developers run into cooked standard I/O problems? Quite a few
times.

In general, for Windows, all you need to do is add a few lines to make
the standard I/O devices raw:

_setmode( _fileno( stdin ), _O_BINARY );
_setmode( _fileno( stdout ), _O_BINARY );

Here is an example:

/* fetch.c -- fetch via HTTP and dump the entire session to stdout
posted by a unix weenie claiming portability.

- ported to windows to illustrate need to change the stdout
default _O_TEXT cooked mode to _O_BINARY raw mode.

*/

#ifdef _WIN32

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <winsock.h>
#include <fcntl.h>
#include <io.h>

#pragma comment(lib,"wsock32.lib")
#define close(a) closesocket(a)
#define read(a,b,c) recv(a,b,c,0)
#define write(a,b,c) send(a,b,c,0)

#else
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <signal.h>
#endif

int main(int argc, char **argv)
{
int pfd; /* fd from socket */
int len;
char *hostP, *fileP;
char buf[1024];
struct hostent *hP; /* for host */
struct sockaddr_in sin;
#ifdef _WIN32
WSADATA wd;
if (WSAStartup(MAKEWORD(1, 1), &wd) != 0) {
exit(1);
}
_setmode( _fileno( stdin ), _O_BINARY );
_setmode( _fileno( stdout ), _O_BINARY );
#endif

if ( argc != 3 ) {
fprintf( stderr, "Usage: %s host file\n", argv[0] );
exit( 1 );
}

hostP = argv[1];
fileP = argv[2];

hP = gethostbyname( hostP );
if ( hP == NULL ) {
fprintf( stderr, "Unknown host \"%s\"\n", hostP );
exit( 1 );
}

pfd = socket( AF_INET, SOCK_STREAM, 0 );
if ( pfd < 0 ) {
perror( "socket" );
exit( 1 );
}


sin.sin_family = hP->h_addrtype;
memcpy( (char *)&sin.sin_addr, hP->h_addr, hP->h_length );
sin.sin_port = htons( 80 );
if ( connect( pfd, (struct sockaddr *)&sin, sizeof(sin) ) < 0 ) {
perror( "connect" );
close( pfd );
exit( 1 );
}

sprintf( buf, "GET %s HTTP/1.0\r\n\r\n", fileP );
write( pfd, buf, strlen(buf));

while ( ( len = read( pfd, buf, sizeof(buf)) ) > 0)
fwrite( buf, 1, len, stdout );

close( pfd );
fflush( stdout );
exit( 0 );
}
 
P

Pops

Dik said:
The concept was much older. On all the older systems I have worked with,
end-of-file was no special data in the file, but merely metadata held by
the system in the information about the file. I think that CP/M was the
first system that made that metadata part of the file. On the other hand,
the distinction between text and binary files has been present in many
file systems, but at a quite different level. And the only level were
they were different was whether to interprete a particular sequence of
bytes as end-of-line. Never whether something should be interpreted as
end-of-file.

I can't speak for unix, I haven't worked on it in a long time, but in
Windows, ^Z is interpreted as an EOF (end of file).

In a DOS window:

c:\> type con > foo
aasdsadad<CR>
asdadlaskdlas<CR>
asdsada<CR>
^Z<CR>

and the pipe will close. Read a file containing ^Z in cooked mode and
feof() returns true.

Now, in our FTP server and client applications, this is important. If a
file is transferred in TYPE BINARY, the data passes through untouched.
If a file was transferred in TYPE ASCII, some FTP servers/clients will
truncate any run of trailing ^Z characters when saving the file. Some
won't, because other uses of the file may depend on them. Some systems
will pad the storage to the nearest block-size boundary they are using.

This is common with the old XMODEM file transfer protocol and its
128-byte blocks. So it is not uncommon to see downloaded xmodem files
whose size is an even multiple of the 128-byte block size; if you
looked at one, the bottom was full of ^Z characters, or sometimes even
junk. The smarter XMODEM receivers would do the truncation upon
reception.

Today, standards organizations such as the IETF recognize <CR><LF> as
the standard delimiter in client/server communications, especially in
email formats (RFC 2822). The irony is that Unix, on which most of
these protocols were founded and which never had to deal with this, is
today the system that needs to take cooking concepts more into account
in order to deal with the predominant <CR><LF> outside world.
 
J

J de Boyne Pollard

JdeBP> Actually, the binary/text dichotomy comes from the C
JdeBP> language. The operating systems themselves have
JdeBP> and make no such distinction. (To the operating
JdeBP> systems themselves, files are just octet streams.
JdeBP> There are no lines, no newline sequences, and no EOF
JdeBP> marker characters.)

TR> I'm sorry, but you are incorrect.

False. Your understanding of the operation of the COPY command is
wrong, and you have an erroneous idea of where the behaviour that you
observed actually originates.

TR> Apparently, you never got burned trying to use the
TR> "copy" command without "/b" in the early versions of
TR> MS-DOS on a file that happened to contain an embedded
TR> Ctrl-Z (the text-mode "end of file" character).

I encountered that behaviour. I encountered the silly behaviour of
the COPY command that caused it to fail to copy zero-length files,
too. However, that behaviour doesn't mean what you think it means.

TR> It, in turn, inherited that behavior from CP/M.

No, it didn't. PIP has no equivalent option.

TR> The C run-time library had to ADD the text/binary
TR> distinction because CP/M and MS-DOS embedded
TR> it in their file system mechanisms.

False. And this is where your error lies. The behaviour of the COPY
command _is embedded in that command itself_. It has to comprise code
for processing in "binary mode" and in "text mode". (You can see that
code in the FreeDOS COMMAND at <URL:https://
freedos.svn.sourceforge.net/svnroot/freedos/freecom/trunk/cmd/copy.c>,
for example. This, in its turn, uses the stream mode flags of the C
language's standard library, which is where all of the code to make a
distinction between "text" and "binary" streams actually resides.)
The operating system _makes no such distinction_. I suggest actually
taking a look at the PC/MS/DR-DOS system API. There is no text/binary
distinction embedded in the filesystem mechanism. Files are, as I
said, just octet streams.
 
K

Kaz Kylheku

> How about that even Unix needs to generate a CR/LF pair when a 'Newline -
> 0x0A' is encountered in output to a tty type device.

Fortunately, Thompson was intelligent enough to realize that the
control characters for printing devices should not determine the
representation of text files. The conversion is tucked away into the
kernel, and can be turned on and off.

The people who designed CR-LF into the various Internet protocols
really dropped the ball. There was an opportunity to fix this
braindamage in HTTP, but alas.
> Unix is old and
> works/worked with teletype terminals where a CR returns the carriage to
> column one and the LF causes the paper to feed up one line.

That's, like, because CR actually stands for carriage return, and LF
for line feed, which is enshrined in the USASCII code. :)

It's wrong for a character display or printing device to give any
other meanings to these standardized codes.

The VT100 terminal, which is widely emulated today, also works this
way, and so Unix systems in general nearly always have the ONLCR flag
turned on when communicating with their own character consoles or
terminal emulators like xterm, etc.
 
A

Alexander Grigoriev

It's not "Windows"; it's the particular CON pseudo-device and the C runtime library.
 
E

Ernie Wright

J said:
TR> The C run-time library had to ADD the text/binary
TR> distinction because CP/M and MS-DOS embedded
TR> it in their file system mechanisms.

False. And this is where your error lies. The behaviour of the COPY
command _is embedded in that command itself_. It has to comprise code
for processing in "binary mode" and in "text mode". [...]
The operating system _makes no such distinction_. I suggest actually
taking a look at the PC/MS/DR-DOS system API. There is no text/binary
distinction embedded in the filesystem mechanism. Files are, as I
said, just octet streams.

But *devices* are not. MS-DOS character-mode devices do distinguish
between text and binary streams. Devices include AUX, PRN and CON.
Since these can be a source or destination for the COPY command, COPY
must also respect the distinction, and so must any other interface that
treats devices as if they were files.

Including C streams. C's stdin, stdout, and stderr streams are
typically mapped to the MS-DOS CON device.

MS-DOS Int 21h functions 4400h and 4401h get and set device status. Bit
5 of DX determines whether the device is functioning in text or binary
mode.

I don't think CP/M makes this distinction, but I don't know. I think
the *convention* of terminating text files with Ctrl-Z arose because
CP/M couldn't store the exact byte size of the file. Its file size
granularity was 128 bytes.

- Ernie http://home.comcast.net/~erniew
 
