fgets() equivalent?

Discussion in 'C Programming' started by J de Boyne Pollard, Nov 30, 2007.

  1. HS> fgets() is standard in C file I/O.
    HS>
    HS> The only issue you need to pay attention to, is RAW
    HS> (binary) vs COOKED mode. It will relate the EOL (end
    HS> of line) definitions of MS-DOS (<CR><LF>) vs
    HS> Unix (<LF>). Depending on your application that
    HS> may or may not pertain.

    TR> This is a handy definition, but it is NOT CORRECT. [...]
    TR> The raw vs cooked distinction in Unix is VERY different
    TR> from the binary vs text distinction in MS-DOS. [...]

    Actually, the binary/text dichotomy comes from the C language. The
    operating systems themselves have and make no such distinction. (To
    the operating systems themselves, files are just octet streams. There
    are no lines, no newline sequences, and no EOF marker characters.) It
    is simply the case that C language implementations targeting PC/MS/DR-
    DOS use the either-CR+LF-or-LF newline convention for text files
    (although they are not required to do so), and C language
    implementations targeting Unices and Linux use the LF newline
    convention for text files (and are required to do so by the POSIX
    standard, which defines additional restrictions on C implementations).
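    The distinction being described can be seen directly at the C level. The
    sketch below (the file name "demo.tmp" is an arbitrary choice for the
    illustration) round-trips a few octets through a binary-mode stream: the
    C standard requires binary streams to return exactly the octets written,
    while a text stream on a CR+LF platform may translate newline sequences.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Round-trip octets through a binary-mode stream.  Binary streams
     * ("wb"/"rb") must preserve the data exactly; it is text streams
     * ("w"/"r") that are allowed to translate the platform's newline
     * convention, e.g. CR+LF <-> '\n' on DOS-family systems. */
    static size_t roundtrip_binary(const char *path,
                                   const unsigned char *data, size_t n,
                                   unsigned char *out, size_t cap)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return 0;
        fwrite(data, 1, n, f);
        fclose(f);

        f = fopen(path, "rb");
        if (!f) return 0;
        size_t got = fread(out, 1, cap, f);
        fclose(f);
        remove(path);
        return got;
    }

    int main(void)
    {
        const unsigned char msg[] = { 'a', '\r', '\n', 'b', '\n' };
        unsigned char buf[16];
        size_t got = roundtrip_binary("demo.tmp", msg, sizeof msg,
                                      buf, sizeof buf);
        assert(got == sizeof msg);           /* no octets added or removed */
        assert(memcmp(buf, msg, got) == 0);  /* the CR survived untouched */
        return 0;
    }
    ```

    On a POSIX system the "b" in the mode string is a no-op, which is exactly
    the point: the translation, where it exists, lives in the C
    implementation, not in the operating system's view of the file.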
    J de Boyne Pollard, Nov 30, 2007
    #1

  2. J de Boyne Pollard

    santosh Guest

    J de Boyne Pollard wrote:

    > HS> fgets() is standard in C file I/O.
    > HS>
    > HS> The only issue you need to pay attention to, is RAW
    > HS> (binary) vs COOKED mode. It will relate the EOL (end
    > HS> of line) definitions of MS-DOS (<CR><LF>) vs
    > HS> Unix (<LF>). Depending on your application that
    > HS> may or may not pertain.
    >
    > TR> This is a handy definition, but it is NOT CORRECT. [...]
    > TR> The raw vs cooked distinction in Unix is VERY different
    > TR> from the binary vs text distinction in MS-DOS. [...]
    >
    > Actually, the binary/text dichotomy comes from the C language. The
    > operating systems themselves have and make no such distinction. (To
    > the operating systems themselves, files are just octet streams. There
    > are no lines, no newline sequences, and no EOF marker characters.)


    Not the case with all operating systems. Many systems, like CP/M and
    some mainframes, have a record-oriented file system, where the file is
    represented as a sequence of records. CP/M also had an end-of-file
    marker. Also, systems with non-8-bit bytes may not view files as an
    octet stream.

    <snip>
    santosh, Nov 30, 2007
    #2

  3. TR> This is a handy definition, but it is NOT CORRECT. [...]
    TR> The raw vs cooked distinction in Unix is VERY different
    TR> from the binary vs text distinction in MS-DOS. [...]

    JdeBP> Actually, the binary/text dichotomy comes from the C
    JdeBP> language. The operating systems themselves have
    JdeBP> and make no such distinction. (To the operating
    JdeBP> systems themselves, files are just octet streams.
    JdeBP> There are no lines, no newline sequences, and no
    JdeBP> EOF marker characters.)

    s> Not the case with all operating systems. [...]

    M. Roberts wasn't talking about all operating systems. The operating
    systems that xe was talking about xe mentioned by name.
    J de Boyne Pollard, Nov 30, 2007
    #3
  4. Tim Roberts Guest

    J de Boyne Pollard <> wrote:
    >
    >Actually, the binary/text dichotomy comes from the C language. The
    >operating systems themselves have and make no such distinction. (To
    >the operating systems themselves, files are just octet streams. There
    >are no lines, no newline sequences, and no EOF marker characters.)


    I'm sorry, but you are incorrect. Apparently, you never got burned trying
    to use the "copy" command without "/b" in the early versions of MS-DOS on a
    file that happened to contain an embedded Ctrl-Z (the text-mode "end of
    file" character). It, in turn, inherited that behavior from CP/M.

    The C run-time library had to ADD the text/binary distinction because CP/M
    and MS-DOS embedded it in their file system mechanisms. That concept was
    certainly not part of the C run-time before implementations were built for
    those operating systems.
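    The effect being described can be sketched as a pure function. This
    mimics what a text-mode read loop on those systems does with an embedded
    Ctrl-Z; it is an illustration of the behaviour only, not the actual
    run-time library code.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Mimic CP/M / MS-DOS text-mode semantics: the first Ctrl-Z (0x1A)
     * acts as a soft end-of-file marker, so everything from it onward is
     * invisible to a text-mode reader. */
    static size_t text_mode_length(const unsigned char *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == 0x1A)   /* Ctrl-Z: in-band EOF */
                return i;
        return n;
    }

    int main(void)
    {
        const unsigned char file[] = "hello\x1A" "world";
        size_t visible = text_mode_length(file, sizeof file - 1);
        assert(visible == 5);  /* only "hello" is seen before the soft EOF */
        return 0;
    }
    ```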
    --
    Tim Roberts,
    Providenza & Boekelheide, Inc.
    Tim Roberts, Dec 1, 2007
    #4
  5. Tim Roberts <> writes:
    > J de Boyne Pollard <> wrote:
    >>Actually, the binary/text dichotomy comes from the C language. The
    >>operating systems themselves have and make no such distinction. (To
    >>the operating systems themselves, files are just octet streams. There
    >>are no lines, no newline sequences, and no EOF marker characters.)

    >
    > I'm sorry, but you are incorrect. Apparently, you never got burned trying
    > to use the "copy" command without "/b" in the early versions of MS-DOS on a
    > file that happened to contain an embedded Ctrl-Z (the text-mode "end of
    > file" character). It, in turn, inherited that behavior from CP/M.
    >
    > The C run-time library had to ADD the text/binary distinction because CP/M
    > and MS-DOS embedded it in their file system mechanisms. That concept was
    > certainly not part of the C run-time before implementations were built for
    > those operating systems.


    Are you sure that CP/M and MS-DOS were the specific reasons for this
    C feature? There are certainly other operating systems (including
    VMS) that distinguish between text files and binary files.

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 1, 2007
    #5
  6. Gary Chanson Guest

    "Keith Thompson" <> wrote in message
    news:...
    > Tim Roberts <> writes:
    >> J de Boyne Pollard <> wrote:
    >>>Actually, the binary/text dichotomy comes from the C language. The
    >>>operating systems themselves have and make no such distinction. (To
    >>>the operating systems themselves, files are just octet streams. There
    >>>are no lines, no newline sequences, and no EOF marker characters.)

    >>
    >> I'm sorry, but you are incorrect. Apparently, you never got burned
    >> trying to use the "copy" command without "/b" in the early versions
    >> of MS-DOS on a file that happened to contain an embedded Ctrl-Z (the
    >> text-mode "end of file" character). It, in turn, inherited that
    >> behavior from CP/M.
    >>
    >> The C run-time library had to ADD the text/binary distinction because
    >> CP/M and MS-DOS embedded it in their file system mechanisms. That
    >> concept was certainly not part of the C run-time before
    >> implementations were built for those operating systems.

    >
    > Are you sure that CP/M and MS-DOS were the specific reasons for this
    > C feature? There are certainly other operating systems (including
    > VMS) that distinguish between text files and binary files.


    My understanding is that it was originally imported into CP/M from
    Unix.

    --

    - Gary Chanson (Windows SDK MVP)
    - Abolish Public Schools
    Gary Chanson, Dec 2, 2007
    #6
  7. "Gary Chanson" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...

    [...
    >>> The C run-time library had to ADD the text/binary distinction
    >>> because CP/M and MS-DOS embedded it in their file system
    >>> mechanisms. That concept was certainly not part of the C run-time
    >>> before implementations were built for those operating systems.

    >>
    >> Are you sure that CP/M and MS-DOS were the specific reasons for this
    >> C feature? There are certainly other operating systems (including
    >> VMS) that distinguish between text files and binary files.

    >
    > My understanding is that it was originally imported into CP/M from
    > Unix.


    That doesn't make sense. CP/M (or at least a C implementation under
    CP/M) has to distinguish between text and binary files, because it
    uses a two-character CR-LF sequence to mark the end of a line. Unix
    uses a single LF character, and thus doesn't need to distinguish
    between text and binary.
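    As a sketch of the translation Keith describes, here is the read-side
    conversion a C implementation on a CR-LF system has to perform for text
    streams (illustrative only, not any particular implementation's code):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Collapse each CR+LF pair to a single '\n', in place; returns the
     * new length.  This is the translation a text-mode fgets()/fread()
     * must do on a CR-LF system, and why such systems need a text/binary
     * distinction while a single-LF system does not. */
    static size_t crlf_to_lf(char *s, size_t n)
    {
        size_t out = 0;
        for (size_t i = 0; i < n; i++) {
            if (s[i] == '\r' && i + 1 < n && s[i + 1] == '\n')
                continue;  /* drop the CR; the LF becomes the newline */
            s[out++] = s[i];
        }
        return out;
    }

    int main(void)
    {
        char buf[] = "one\r\ntwo\r\n";
        size_t n = crlf_to_lf(buf, sizeof buf - 1);
        assert(n == 8);
        assert(memcmp(buf, "one\ntwo\n", n) == 0);
        return 0;
    }
    ```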

    --
    Keith Thompson (The_Other_Keith) <>
    Looking for software development work in the San Diego area.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Dec 3, 2007
    #7
  8. David Craig Guest

    How about that even Unix needs to generate a CR/LF pair when a 'Newline -
    0x0A' is encountered in output to a tty type device. Unix is old and
    works/worked with teletype terminals, where a CR returns the carriage to
    column one and the LF causes the paper to feed up one line. Some even
    required multiple CR characters because they were so slow and would lose
    characters that followed too quickly when a major movement of the carriage
    was required.

    "Keith Thompson" <> wrote in message
    news:...
    > "Gary Chanson" <> writes:
    >> "Keith Thompson" <> wrote in message
    >> news:...

    > [...
    >>>> The C run-time library had to ADD the text/binary distinction
    >>>> because CP/M and MS-DOS embedded it in their file system
    >>>> mechanisms. That concept was certainly not part of the C run-time
    >>>> before implementations were built for those operating systems.
    >>>
    >>> Are you sure that CP/M and MS-DOS were the specific reasons for this
    >>> C feature? There are certainly other operating systems (including
    >>> VMS) that distinguish between text files and binary files.

    >>
    >> My understanding is that it was originally imported into CP/M
    >> from Unix.

    >
    > That doesn't make sense. CP/M (or at least a C implementation under
    > CP/M) has to distinguish between text and binary files, because it
    > uses a two-character CR-LF sequence to mark the end of a line. Unix
    > uses a single LF character, and thus doesn't need to distinguish
    > between text and binary.
    >
    > --
    > Keith Thompson (The_Other_Keith) <>
    > Looking for software development work in the San Diego area.
    > "We must do something. This is something. Therefore, we must do this."
    > -- Antony Jay and Jonathan Lynn, "Yes Minister"
    David Craig, Dec 3, 2007
    #8
  9. Gary Chanson wrote:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> Tim Roberts <> writes:
    >>> J de Boyne Pollard <> wrote:
    >>>> Actually, the binary/text dichotomy comes from the C language. The
    >>>> operating systems themselves have and make no such distinction. (To
    >>>> the operating systems themselves, files are just octet streams. There
    >>>> are no lines, no newline sequences, and no EOF marker characters.)
    >>> I'm sorry, but you are incorrect. Apparently, you never got burned
    >>> trying to use the "copy" command without "/b" in the early versions
    >>> of MS-DOS on a file that happened to contain an embedded Ctrl-Z (the
    >>> text-mode "end of file" character). It, in turn, inherited that
    >>> behavior from CP/M.
    >>>
    >>> The C run-time library had to ADD the text/binary distinction because
    >>> CP/M and MS-DOS embedded it in their file system mechanisms. That
    >>> concept was certainly not part of the C run-time before
    >>> implementations were built for those operating systems.

    >> Are you sure that CP/M and MS-DOS were the specific reasons for this
    >> C feature? There are certainly other operating systems (including
    >> VMS) that distinguish between text files and binary files.

    >
    > My understanding is that it was originally imported into CP/M from
    > Unix.


    Your understanding is incorrect. One of the key concepts of UNIX was
    that files were just files. There was no distinction between different
    types of file, and no "special data" in the file to indicate
    end-of-file. I don't know if UNIX originated this concept, but it was
    relatively novel at the time and UNIX did much to popularize it. The
    distinction between binary and text files in the Standard I/O library
    was added when C was ported to other OSes.
    J. J. Farrell, Dec 3, 2007
    #9
  10. In article <> "David Craig" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...
    > > That doesn't make sense. CP/M (or at least a C implementation under
    > > CP/M) has to distinguish between text and binary files, because it
    > > uses a two-character CR-LF sequence to mark the end of a line. Unix
    > > uses a single LF character, and thus doesn't need to distinguish
    > > between text and binary.


    > How about that even Unix needs to generate a CR/LF pair when a 'Newline -
    > 0x0A' is encountered in output to a tty type device.


    How about that there is a difference between how files are stored on disk
    and what happens if said file is displayed on a tty type device? The
    conversion is done by the tty driver. For example, a MacOS tty driver would
    convert a CR to the combined CR/LF. Normally such *tty drivers* would
    expect that it is a text file that will be displayed. With respect to
    the C programming environment there is no difference between text files
    and binary files.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Dec 3, 2007
    #10
  11. In article <fivmf5$22ru$> "J. J. Farrell" <> writes:
    ....
    > Your understanding is incorrect. One of the key concepts of UNIX was
    > that files were just files. There was no distinction between different
    > types of file, and no "special data" in the file to indicate
    > end-of-file. I don't know if UNIX originated this concept, but it was
    > relatively novel at the time and UNIX did much to popularize it. The
    > distinction between binary and text files in the Standard I/O library
    > was added when C was ported to other OSes.


    The concept was much older. On all the older systems I have worked with,
    end-of-file was no special data in the file, but merely metadata held by
    the system in the information about the file. I think that CP/M was the
    first system that made that metadata part of the file. On the other hand,
    the distinction between text and binary files has been present in many
    file systems, but at a quite different level. And the only level where
    they were different was whether to interpret a particular sequence of
    bytes as end-of-line. Never whether something should be interpreted as
    end-of-file.
    --
    dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
    home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
    Dik T. Winter, Dec 3, 2007
    #11
  12. CBFalconer Guest

    David Craig wrote: *** and top-posted - fixed ***
    > "Keith Thompson" <> wrote in message
    >> "Gary Chanson" <> writes:
    >>> "Keith Thompson" <> wrote in message
    >>>

    >> [...
    >>
    >>>>> The C run-time library had to ADD the text/binary distinction
    >>>>> because CP/M and MS-DOS embedded it in their file system
    >>>>> mechanisms. That concept was certainly not part of the C
    >>>>> run-time before implementations were built for those operating
    >>>>> systems.
    >>>>
    >>>> Are you sure that CP/M and MS-DOS were the specific reasons
    >>>> for this C feature? There are certainly other operating
    >>>> systems (including VMS) that distinguish between text files
    >>>> and binary files.
    >>>
    >>> My understanding is that it was originally imported into
    >>> CP/M from Unix.

    >>
    >> That doesn't make sense. CP/M (or at least a C implementation
    >> under CP/M) has to distinguish between text and binary files,
    >> because it uses a two-character CR-LF sequence to mark the end
    >> of a line. Unix uses a single LF character, and thus doesn't
    >> need to distinguish between text and binary.

    >
    > How about that even Unix needs to generate a CR/LF pair when a
    > 'Newline - 0x0A' is encountered in output to a tty type device.
    > Unix is old and works/worked with teletype terminals where a CR
    > returns the carriage to column one and the LF causes the paper
    > to feed up one line. Some even required multiple CR characters
    > because they were so slow and would lose characters that
    > followed too quickly when a major movement of the carriage was
    > required.


    This was usually handled by having the terminal driver emit "CR,
    LF, DC3" to prompt for a new line. At line end, the echoing
    machinery would emit "DC1, CR". I think I have the sequence
    right. At any rate, there was enough idle time for the carriage to
    recover, and the sequences would also stop/start the tape reader,
    if present and loaded. When the input line was half duplex those
    sequences would also prompt the sending device to unload another
    line.

    Please do not top-post. Your answer belongs after (or intermixed
    with) the quoted material to which you reply, after snipping all
    irrelevant material. I fixed this one. See the following links:

    --
    <http://www.catb.org/~esr/faqs/smart-questions.html>
    <http://www.caliburn.nl/topposting.html>
    <http://www.netmeister.org/news/learn2quote.html>
    <http://cfaj.freeshell.org/google/> (taming google)
    <http://members.fortunecity.com/nnqweb/> (newusers)



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Dec 3, 2007
    #12
  13. CBFalconer Guest

    "Dik T. Winter" wrote:
    > "J. J. Farrell" <> writes:
    > ...
    >> Your understanding is incorrect. One of the key concepts of UNIX
    >> was that files were just files. There was no distinction between
    >> different types of file, and no "special data" in the file to
    >> indicate end-of-file. I don't know if UNIX originated this
    >> concept, but it was relatively novel at the time and UNIX did
    >> much to popularize it. The distinction between binary and text
    >> files in the Standard I/O library was added when C was ported to
    >> other OSes.

    >
    > The concept was much older. On all the older systems I have
    > worked with, end-of-file was no special data in the file, but
    > merely metadata held by the system in the information about the
    > file. I think that CP/M was the first system that made that
    > metadata part of the file. On the other hand, the distinction
    > between text and binary files has been present in many file
    > systems, but at a quite different level. And the only level
    > where they were different was whether to interpret a particular
    > sequence of bytes as end-of-line. Never whether something
    > should be interpreted as end-of-file.


    No, EOF has always meant "we hit the end of recorded data". The
    CP/M solution was because the file length was recorded in terms of
    128-byte records, and these did not match the structure of text
    files. Therefore CP/M added an EOF character to the text stored.

    Similarly, CP/M didn't do any LF --> CR/LF --> LF translation while
    writing and reading, but just wrote the CR/LF sequence. Less code
    that way :). DOS just copied it, because of laziness and because of
    the primary market.
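    The record-size arithmetic behind this is simple to sketch (illustrative
    only, not CP/M source):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* CP/M tracked file length as a count of 128-byte records, so a text
     * file's stored size was always rounded up to a record boundary and
     * the slack in the last record was filled with Ctrl-Z (0x1A) bytes;
     * readers stopped at the first one. */
    static size_t cpm_padded_size(size_t text_len)
    {
        const size_t rec = 128;
        return ((text_len + rec - 1) / rec) * rec;  /* round up to a record */
    }

    int main(void)
    {
        assert(cpm_padded_size(1)   == 128);  /* 1 text byte + 127 pads */
        assert(cpm_padded_size(128) == 128);  /* exact fit, no padding */
        assert(cpm_padded_size(129) == 256);  /* spills into a 2nd record */
        return 0;
    }
    ```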

    --
    Chuck F (cbfalconer at maineline dot net)
    <http://cbfalconer.home.att.net>
    Try the download section.


    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Dec 3, 2007
    #13
  14. Pops Guest

    Finally, someone who can relate to history. Thanks David. :)

    The following note is not specifically to you, just in general.

    In general, the terms "raw" vs "cooked" broadly mean
    non-translation vs translation. Generally, we cook
    something in order to establish some structural consistency, either for
    input or output.

    Specifically, when it comes to dealing with text vs binary ideas, in
    general, we are talking about dealing with the control codes, ASCII
    codes 0 to 31.

    For file storage, we are mostly talking about three control codes
    dealing with EOL (End of Line) or EOF (End of File) entities:

    <lf>  0x0A ^J
    <cr>  0x0D ^M
    <eof> 0x1A ^Z

    (note, I am not using strict ASCII mnemonics here.)

    It is still possible to have other control codes in a text file. So the
    idea of text vs binary is really only relevant to the application or
    usage in question. If the text file has ANSI escape codes, then it can
    be viewed as binary as well. In fact, there is software for which it is
    important to detect a binary file if it has control codes other than
    <lf>, <cr>, and <eof>. But it really depends on how they are used.
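    A detector of the kind described might look like the following sketch
    (one plausible heuristic, not any particular program's code; which
    control codes count as "text" is the application's choice):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Treat a buffer as binary if it holds any control code other than
     * the usual text ones: TAB, LF, CR, and the Ctrl-Z soft-EOF marker. */
    static int looks_binary(const unsigned char *buf, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != 0x1A)
                return 1;
        }
        return 0;
    }

    int main(void)
    {
        assert(!looks_binary((const unsigned char *)"plain text\r\n", 12));
        assert( looks_binary((const unsigned char *)"\x1B[1m ansi", 9)); /* ESC */
        return 0;
    }
    ```

    By this rule a file full of ANSI escape sequences classifies as binary,
    which matches the point above that text vs binary is relative to usage.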

    Typically, when it comes to interactive input devices, other control
    codes come into play:

    <xon>  0x11 ^Q
    <xoff> 0x13 ^S
    <etx>  0x03 ^C
    <lf>   0x0A ^J
    <cr>   0x0D ^M
    <eof>  0x1A ^Z

    When it comes to output devices, like printers and terminals, other
    control codes come into play. Printers and smart terminals with ANSI or
    VT10x emulations typically interpret these, especially the escape code:

    <xon>  0x11 ^Q
    <xoff> 0x13 ^S
    <cr>   0x0D ^M
    <lf>   0x0A ^J
    <eof>  0x1A ^Z
    <esc>  0x1B ^[

    Anyway, even Unix has to deal with the outside I/O world; even within
    its own world, it is all done transparently.

    The issue with the OP, and my main point, is that when using fgets() it
    is important to note the idea of "cooked" and "raw" translations,
    especially in the Windows or MS-DOS world, where the standard devices
    are cooked and files opened in text mode are 100% cooked. So it depends
    on where the input is coming from when he is dealing with a line
    reader, which was what he was seeking.


    --
    HLS


    David Craig wrote:
    > How about that even Unix needs to generate a CR/LF pair when a 'Newline -
    > 0x0A' is encountered in output to a tty type device. Unix is old and
    > works/worked with teletype terminals where a CR returns the carriage to
    > column one and the LF causes the paper to feed up one line. Some even
    > required multiple CR characters because they were so slow and would lose
    > characters that followed too quickly when a major movement of the carriage
    > was required.
    >
    > "Keith Thompson" <> wrote in message
    > news:...
    >> "Gary Chanson" <> writes:
    >>> "Keith Thompson" <> wrote in message
    >>> news:...

    >> [...
    >>>>> The C run-time library had to ADD the text/binary distinction
    >>>>> because CP/M and MS-DOS embedded it in their file system
    >>>>> mechanisms. That concept was certainly not part of the C run-time
    >>>>> before implementations were built for those operating systems.
    >>>> Are you sure that CP/M and MS-DOS were the specific reasons for this
    >>>> C feature? There are certainly other operating systems (including
    >>>> VMS) that distinguish between text files and binary files.
    >>> My understanding is that it was originally imported into CP/M
    >>> from Unix.

    >> That doesn't make sense. CP/M (or at least a C implementation under
    >> CP/M) has to distinguish between text and binary files, because it
    >> uses a two-character CR-LF sequence to mark the end of a line. Unix
    >> uses a single LF character, and thus doesn't need to distinguish
    >> between text and binary.
    >>
    >> --
    >> Keith Thompson (The_Other_Keith) <>
    >> Looking for software development work in the San Diego area.
    >> "We must do something. This is something. Therefore, we must do this."
    >> -- Antony Jay and Jonathan Lynn, "Yes Minister"

    >
    >
    Pops, Dec 3, 2007
    #14
  15. Pops Guest

    Dik T. Winter wrote:
    >
    > > How about that even Unix needs to generate a CR/LF pair when a 'Newline -
    > > 0x0A' is encountered in output to a tty type device.

    >
    > How about that there is a difference between how files are stored on disk
    > and what happens if said file is displayed on a tty type device? The
    > conversion is done by the tty driver. For example, a MacOS tty driver would
    > convert a CR to the combined CR/LF. Normally such *tty drivers* would
    > expect that it is a text file that will be displayed. With respect to
    > the C programming environment there is no difference between text files
    > and binary files.



    +1; while C itself is device independent, the type of device itself
    means something, as you elegantly pointed out in regard to the device
    driver in question.

    The original poster, I presume migrating or posting from Unix, wanted
    the equivalent behavior of fgets().

    My basic point in my reply was that he needs to deal with cooked vs raw
    concepts, especially in Windows, and especially if his application has
    to interface with devices or files that come from various places.

    When a device is opened using the standard C I/O functions with the mode
    attribute containing "t" by the Windows or MS-DOS target application,
    the C/C++ RTL (run-time library) will read/write in cooked mode by
    default. It's all clearly there in the MS C/C++ RTL source code provided
    in every distribution.

    Now if the application needs to interface with the outside world to get
    input, then it MAY need to be compiled or switched at run time to do I/O
    in non-cooked mode.

    You know how many times you see people posting a simple C fetch using
    the standard device I/O heuristics, claiming it's 100% portable, and
    Windows developers run into cooked standard I/O problems? Quite a few
    times.

    In general, for Windows, all you need is to add a few lines to make the
    standard I/O devices raw:

    _setmode( _fileno( stdin ), _O_BINARY );
    _setmode( _fileno( stdout ), _O_BINARY );

    Here is an example:

    /* fetch.c -- fetch via HTTP and dump the entire session to stdout
       posted by some Unix weenie claiming portability.

       - ported to Windows to illustrate the need to change the stdout
         default _O_TEXT cooked mode to _O_BINARY raw mode.
    */

    #ifdef _WIN32

    #include <windows.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <winsock.h>
    #include <fcntl.h>
    #include <io.h>

    #pragma comment(lib,"wsock32.lib")
    #define close(a) closesocket(a)
    #define read(a,b,c) recv(a,b,c,0)
    #define write(a,b,c) send(a,b,c,0)

    #else
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>
    #include <signal.h>
    #endif

    int main(int argc, char **argv)
    {
        int pfd;                 /* fd from socket */
        int len;
        char *hostP, *fileP;
        char buf[1024];
        struct hostent *hP;      /* for host */
        struct sockaddr_in sin;
    #ifdef _WIN32
        WSADATA wd;
        if (WSAStartup(MAKEWORD(1, 1), &wd) != 0) {
            exit(1);
        }
        _setmode( _fileno( stdin ), _O_BINARY );
        _setmode( _fileno( stdout ), _O_BINARY );
    #endif

        if ( argc != 3 ) {
            fprintf( stderr, "Usage: %s host file\n", argv[0] );
            exit( 1 );
        }

        hostP = argv[1];
        fileP = argv[2];

        hP = gethostbyname( hostP );
        if ( hP == NULL ) {
            fprintf( stderr, "Unknown host \"%s\"\n", hostP );
            exit( 1 );
        }

        pfd = socket( AF_INET, SOCK_STREAM, 0 );
        if ( pfd < 0 ) {
            perror( "socket" );
            exit( 1 );
        }

        sin.sin_family = hP->h_addrtype;
        memcpy( (char *)&sin.sin_addr, hP->h_addr, hP->h_length );
        sin.sin_port = htons( 80 );
        if ( connect( pfd, (struct sockaddr *)&sin, sizeof(sin) ) < 0 ) {
            perror( "connect" );
            close( pfd );
            exit( 1 );
        }

        sprintf( buf, "GET %s HTTP/1.0\r\n\r\n", fileP );
        write( pfd, buf, strlen(buf) );

        while ( ( len = read( pfd, buf, sizeof(buf) ) ) > 0 )
            fwrite( buf, 1, len, stdout );

        close( pfd );
        fflush( stdout );
        exit( 0 );
    }


    --
    HLS
    Pops, Dec 3, 2007
    #15
  16. Pops Guest

    Dik T. Winter wrote:
    > In article <fivmf5$22ru$> "J. J. Farrell" <> writes:
    > ...
    > > Your understanding is incorrect. One of the key concepts of UNIX was
    > > that files were just files. There was no distinction between different
    > > types of file, and no "special data" in the file to indicate
    > > end-of-file. I don't know if UNIX originated this concept, but it was
    > > relatively novel at the time and UNIX did much to popularize it. The
    > > distinction between binary and text files in the Standard I/O library
    > > was added when C was ported to other OSes.

    >
    > The concept was much older. On all the older systems I have worked with,
    > end-of-file was no special data in the file, but merely metadata held by
    > the system in the information about the file. I think that CP/M was the
    > first system that made that metadata part of the file. On the other hand,
    > the distinction between text and binary files has been present in many
    > file systems, but at a quite different level. And the only level where
    > they were different was whether to interpret a particular sequence of
    > bytes as end-of-line. Never whether something should be interpreted as
    > end-of-file.


    I can't speak for Unix, I haven't worked on it in a long time, but in
    Windows, ^Z is interpreted as an EOF (end of file).

    In DOS window:

    c:\> type con > foo
    aasdsadad<CR>
    asdadlaskdlas<CR>
    asdsada<CR>
    ^Z<CR>

    and the pipe will close. Read a file with ^Z in cooked mode and
    feof() returns true.

    Now, in our FTP server and client applications, this is important. If a
    file is transferred in TYPE BINARY, then it is all escaped. If a file
    was transferred in TYPE ASCII, some FTP servers/clients will truncate
    any runoff of ^Z characters to save the file. Some don't, because other
    usages of the file may deal with that. Some systems will pack the
    storage to the nearest block-size boundary they are using.

    This is common in the old XMODEM file transfer protocol with its 128-byte
    blocks. So it is not uncommon to see downloaded XMODEM files whose size
    is an even multiple of the 128-byte block size, and if you looked at one,
    the bottom was full of ^Z characters or even junk sometimes. The smarter
    XMODEM receiver would do the truncation upon reception.
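    What that smarter receiver does can be sketched in a few lines
    (illustrative only, not any particular XMODEM implementation):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Strip the run of Ctrl-Z (0x1A) pad bytes that XMODEM's fixed
     * 128-byte blocks leave at the end of a received file. */
    static size_t trim_ctrlz(const unsigned char *buf, size_t n)
    {
        while (n > 0 && buf[n - 1] == 0x1A)
            n--;
        return n;
    }

    int main(void)
    {
        unsigned char blk[128];
        memset(blk, 0x1A, sizeof blk);  /* a fully padded final block ... */
        memcpy(blk, "data", 4);         /* ... carrying 4 real bytes */
        assert(trim_ctrlz(blk, sizeof blk) == 4);
        return 0;
    }
    ```

    Note the caveat already raised above: a genuine trailing ^Z in the data
    is indistinguishable from padding, which is why some receivers chose not
    to truncate.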

    Today, the standards organizations such as the IETF recognize that
    <CR><LF> is the standard delimiter in client/server communications,
    especially in email formats (RFC 2822). The irony is that Unix, on which
    most of the protocols were founded and which never had to deal with
    this, is today the one that needs to take cooking concepts more into
    account in order to deal with the predominant <CR><LF> outside world.

    --
    HLS
    Pops, Dec 3, 2007
    #16
  17. JdeBP> Actually, the binary/text dichotomy comes from the C
    JdeBP> language. The operating systems themselves have
    JdeBP> and make no such distinction. (To the operating
    JdeBP> systems themselves, files are just octet streams.
    JdeBP> There are no lines, no newline sequences, and no EOF
    JdeBP> marker characters.)

    TR> I'm sorry, but you are incorrect.

    False. Your understanding of the operation of the COPY command is
    wrong, and you have an erroneous idea of where the behaviour that you
    observed actually originates.

    TR> Apparently, you never got burned trying to use the
    TR> "copy" command without "/b" in the early versions of
    TR> MS-DOS on a file that happened to contain an embedded
    TR> Ctrl-Z (the text-mode "end of file" character).

    I encountered that behaviour. I encountered the silly behaviour of
    the COPY command that caused it to fail to copy zero-length files,
    too. However, that behaviour doesn't mean what you think it means.

    TR> It, in turn, inherited that behavior from CP/M.

    No, it didn't. PIP has no equivalent option.

    TR> The C run-time library had to ADD the text/binary
    TR> distinction because CP/M and MS-DOS embedded
    TR> it in their file system mechanisms.

    False. And this is where your error lies. The behaviour of the COPY
    command _is embedded in that command itself_. It has to comprise code
    for processing in "binary mode" and in "text mode". (You can see that
    code in the FreeDOS COMMAND at
    <URL:https://freedos.svn.sourceforge.net/svnroot/freedos/freecom/trunk/cmd/copy.c>,
    for example. This, in its turn, uses the stream mode flags of the C
    language's standard library, which is where all of the code to make a
    distinction between "text" and "binary" streams actually resides.)
    The operating system _makes no such distinction_. I suggest actually
    taking a look at the PC/MS/DR-DOS system API. There is no text/binary
    distinction embedded in the filesystem mechanism. Files are, as I
    said, just octet streams.
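    The point that the text/binary distinction lives in the C library's
    stream modes, not in the filesystem, can be demonstrated by writing a
    single '\n' through a text stream and through a binary stream (file
    names are illustrative):

    ```c
    #include <stdio.h>

    int main(void)
    {
        long text_size, bin_size;
        FILE *f;

        f = fopen("nl_text.tmp", "w");   /* text mode */
        fputc('\n', f);
        fclose(f);

        f = fopen("nl_bin.tmp", "wb");   /* binary mode */
        fputc('\n', f);
        fclose(f);

        f = fopen("nl_text.tmp", "rb");
        fseek(f, 0, SEEK_END);
        text_size = ftell(f);
        fclose(f);

        f = fopen("nl_bin.tmp", "rb");
        fseek(f, 0, SEEK_END);
        bin_size = ftell(f);
        fclose(f);

        /* On POSIX systems both files are 1 byte, because POSIX forbids
           any text/binary difference.  With a DOS-targeted C library the
           text file is 2 bytes (CR LF): the translation happens in the C
           runtime, not in the operating system's filesystem. */
        printf("text=%ld binary=%ld\n", text_size, bin_size);

        remove("nl_text.tmp");
        remove("nl_bin.tmp");
        return 0;
    }
    ```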
    J de Boyne Pollard, Dec 3, 2007
    #17
  18. J de Boyne Pollard

    Kaz Kylheku Guest

    On Dec 2, 5:29 pm, "David Craig" <> wrote:
    > How about that even Unix needs to generate a CR/LF pair when a 'Newline -
    > 0x0A' is encountered in output to a tty type device.


    Fortunately, Thompson was intelligent enough to realize that the
    control characters for printing devices should not determine the
    representation of text files. The conversion is tucked away into the
    kernel, and can be turned on and off.
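    That kernel conversion is the termios ONLCR output flag, and turning
    it on and off really is just a flag flip. A sketch that inspects it
    (this only reads the settings; the tcsetattr call that would apply a
    change is left commented out):

    ```c
    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void)
    {
        struct termios t;

        if (!isatty(STDOUT_FILENO)) {
            /* Output is a pipe or file: no tty line discipline applies. */
            printf("ONLCR: stdout is not a tty\n");
            return 0;
        }

        if (tcgetattr(STDOUT_FILENO, &t) != 0)
            return 1;

        printf("ONLCR is %s\n", (t.c_oflag & ONLCR) ? "on" : "off");

        /* Turning the LF -> CR LF translation off is one bit: */
        t.c_oflag &= ~ONLCR;
        /* tcsetattr(STDOUT_FILENO, TCSANOW, &t);  -- would apply it */
        return 0;
    }
    ```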

    The people who designed CR-LF into the various Internet protocols
    really dropped the ball. There was an opportunity to fix this
    braindamage in HTTP, but alas.

    > Unix is old and
    > works/worked with teletype terminals where a CR returns the carriage to
    > column one and the LF causes the paper to feed up one line.


    That's, like, because CR actually stands for carriage return, and LF
    for line feed, which is enshrined in the USASCII code. :)

    It's wrong for a character display or printing device to give any
    other meanings to these standardized codes.

    The VT100 terminal, which is widely emulated today, also works this
    way, and so Unix systems in general nearly always have the ONLCR flag
    turned on when communicating with their own character consoles or
    terminal emulators like xterm, etc.
    Kaz Kylheku, Dec 3, 2007
    #18
  19. It's not "Windows", it's the particular CON pseudo-device and the C runtime library.

    "Pops" <> wrote in message
    news:...
    >
    > I can't speak for unix, I havn't work on it in a long time, but in
    > Windows, ^Z is interpreted as an EOF (end of file).
    > [...]
    Alexander Grigoriev, Dec 3, 2007
    #19
  20. J de Boyne Pollard

    Ernie Wright Guest

    J de Boyne Pollard wrote:

    > TR> The C run-time library had to ADD the text/binary
    > TR> distinction because CP/M and MS-DOS embedded
    > TR> it in their file system mechanisms.
    >
    > False. And this is where your error lies. The behaviour of the COPY
    > command _is embedded in that command itself_. It has to comprise code
    > for processing in "binary mode" and in "text mode". [...]
    > The operating system _makes no such distinction_. I suggest actually
    > taking a look at the PC/MS/DR-DOS system API. There is no text/binary
    > distinction embedded in the filesystem mechanism. Files are, as I
    > said, just octet streams.


    But *devices* are not. MS-DOS character-mode devices do distinguish
    between text and binary streams. Devices include AUX, PRN and CON.
    Since these can be a source or destination for the COPY command, COPY
    must also respect the distinction, and so must any other interface that
    treats devices as if they were files.

    Including C streams. C's stdin, stdout, and stderr streams are typically
    mapped to the MS-DOS CON device.

    MS-DOS Int 21h functions 4400h and 4401h get and set device status. Bit
    5 of DX determines whether the device is functioning in text or binary
    mode.

    I don't think CP/M makes this distinction, but I don't know. I think
    the *convention* of terminating text files with Ctrl-Z arose because
    CP/M couldn't store the exact byte size of the file. Its file size
    granularity was 128 bytes.

    - Ernie http://home.comcast.net/~erniew
    Ernie Wright, Dec 3, 2007
    #20