C Text/Binary Files

Discussion in 'C Programming' started by Bartc, Jun 22, 2008.

  1. Bartc

    Bartc Guest

    The stdin/stdout files of C seem to be always in Text mode.

    Is there any way of running a C program so that these (but especially
    stdout) are in Binary mode instead?

    (I'm in the process of wrapping a different language around C which doesn't
    want the concept of text and binary files. But if I output a string such as
    "ONE\nTWO\n", this will behave differently between stdout and a regular
    (binary) file. Examples on my OS:

    "\n" Output 13,10 in text mode; 10 in binary mode
    "\w" Output 13,13,10 in text mode; 13,10 in binary mode

    (\w is a new escape code equivalent to \r\n). Workarounds will be awkward
    (and I could never stop \n expanding to 13,10 for stdout) so would be nice
    to avoid them)

    --
    Thanks,

    Bartc
     
    Bartc, Jun 22, 2008
    #1

  2. Bartc

    santosh Guest

    Bartc wrote:

    > The stdin/stdout files of C seem to be always in Text mode.
    >
    > Is there any way of running a C program so that these (but especially
    > stdout) are in Binary mode instead?


    Yes, use freopen like this:

    FILE *fin, *fout, *ferr;

    fin = freopen(NULL, "rb", stdin);
    fout = freopen(NULL, "ab", stdout);
    ferr = freopen(NULL, "ab", stderr);

    You could assign the return values to stdin, stdout and stderr
    themselves, but the standard says that they are not necessarily
    modifiable lvalues. However it will probably work on most systems you
    would care about.

    See section 7.19.5.4 of the standard for details.

    <snip>
     
    santosh, Jun 23, 2008
    #2

  3. santosh <> writes:

    > Bartc wrote:
    >
    >> The stdin/stdout files of C seem to be always in Text mode.
    >>
    >> Is there any way of running a C program so that these (but especially
    >> stdout) are in Binary mode instead?

    >
    > Yes, use freopen like this:
    >
    > FILE *fin, *fout, *ferr;
    >
    > fin = freopen(NULL, "rb", stdin);
    > fout = freopen(NULL, "ab", stdout);
    > ferr = freopen(NULL, "ab", stderr);
    >
    > You could assign the return value to stdin, stdout and stderr itself,
    > but the standards says that they are not necessarily modifiable
    > lvalues. However it will probably work on most systems you would care
    > about.


    More importantly, freopen is not guaranteed to do what Bartc wants.
    Thus the key information is not what the standard says but what
    typical implementations do on systems where there is difference
    between text and binary mode. I can give only one data point:
    lcc-win32 returns NULL from the freopen call (for stdout).

    --
    Ben.
     
    Ben Bacarisse, Jun 23, 2008
    #3
  4. Bartc

    santosh Guest

    Ben Bacarisse wrote:

    > santosh <> writes:
    >
    >> Bartc wrote:
    >>
    >>> The stdin/stdout files of C seem to be always in Text mode.
    >>>
    >>> Is there any way of running a C program so that these (but
    >>> especially stdout) are in Binary mode instead?

    >>
    >> Yes, use freopen like this:
    >>
    >> FILE *fin, *fout, *ferr;
    >>
    >> fin = freopen(NULL, "rb", stdin);
    >> fout = freopen(NULL, "ab", stdout);
    >> ferr = freopen(NULL, "ab", stderr);
    >>
    >> You could assign the return value to stdin, stdout and stderr itself,
    >> but the standards says that they are not necessarily modifiable
    >> lvalues. However it will probably work on most systems you would care
    >> about.

    >
    > More importantly, freopen is not guaranteed to do what Bartc wants.
    > Thus the key information is not what the standard says but what
    > typical implementations do on systems where there is difference
    > between text and binary mode. I can give only one data point:
    > lcc-win32 returns NULL from the freopen call (for stdout).


    And it similarly fails for stdin too. It's perhaps surprising that it
    should fail. What difficulty would an implementation like lcc-win32 have
    with this?
     
    santosh, Jun 23, 2008
    #4
  5. Bartc

    Ali Karaali Guest

    >
    > See section 7.19.5.4 of the standard for details.
    >
    > <snip>


    Anyway, how can I find the standard's documents?
     
    Ali Karaali, Jun 23, 2008
    #5
  6. Bartc

    Bartc Guest

    "Bartc" <> wrote in message
    news:LCA7k.14088$...
    > The stdin/stdout files of C seem to be always in Text mode.


    Thanks for the replies.

    I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings and
    internal functions that generate newlines, then this will work for binary
    files.

    For stdout, this will generate (on my OS) 13,13,10, but for console output
    that is not critical. The only problem will be when stdout is piped or
    redirected to a file at the OS command line, then I will need to process the
    output to take out the extra 13.

    I can live with that.

    I have tried freopen() as suggested, and that sort of works, but output is
    then sent to a file. So this is an alternative perhaps to redirection by the
    OS and the mode /will/ be binary.

    --
    Bartc
     
    Bartc, Jun 23, 2008
    #6
  7. Ali Karaali <> writes:

    >>
    >> See section 7.19.5.4 of the standard for details.
    >>
    >> <snip>

    >
    > Anyway, How can I find out standard's documents?


    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf is a recent
    draft of C99. The same site has lots of other useful documents.

    --
    Ben.
     
    Ben Bacarisse, Jun 23, 2008
    #7
  8. Bartc

    rahul Guest

    On Jun 23, 8:21 am, santosh <> wrote:

    > And it similarly fails for stdin too. It's perhaps surprising that it
    > should fail. What difficulty would an implementation like win-lcc have
    > with this?



    The following works for me:
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void) {
        stdout = freopen(NULL, "ab", stdout);
        return 0;
    }

    I compiled that with gcc on Linux. It probably works because Linux/Unix
    does not distinguish between text and binary mode.
     
    rahul, Jun 23, 2008
    #8
  9. Bartc

    Richard Bos Guest

    santosh <> wrote:

    > Bartc wrote:
    >
    > > The stdin/stdout files of C seem to be always in Text mode.
    > >
    > > Is there any way of running a C program so that these (but especially
    > > stdout) are in Binary mode instead?

    >
    > Yes, use freopen like this:
    >
    > FILE *fin, *fout, *ferr;
    >
    > fin = freopen(NULL, "rb", stdin);
    > fout = freopen(NULL, "ab", stdout);
    > ferr = freopen(NULL, "ab", stderr);


    Note that freopen() with a null first argument is new in C99. In C89,
    you had to give a new file name.

    Richard
     
    Richard Bos, Jun 23, 2008
    #9
  10. "Bartc" <> writes:
    > "Bartc" <> wrote in message
    > news:LCA7k.14088$...
    > > The stdin/stdout files of C seem to be always in Text mode.

    >
    > Thanks for the replies.
    >
    > I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings and
    > internal functions that generate newlines, then this will work for binary
    > files.

    [...]

    What is "\w"? It's not a standard escape sequence; its value is
    implementation-defined.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 23, 2008
    #10
  11. On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
    > What is "\w"? It's not a standard escape sequence; its value is
    > implementation-defined.


    "\w" does not match the syntax of a string literal, so by the rule of the
    longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
    undefined if a double quote character occurs as a single token. There need
    not be any value given to "\w", and if there is, it need not be documented.
     
    Harald van Dijk, Jun 23, 2008
    #11
  12. Bartc

    Bartc Guest

    "Keith Thompson" <> wrote in message
    news:...
    > "Bartc" <> writes:
    >> "Bartc" <> wrote in message
    >> news:LCA7k.14088$...
    >> > The stdin/stdout files of C seem to be always in Text mode.

    >>
    >> Thanks for the replies.
    >>
    >> I think if I use exclusively "\w" for newlines (ie. "\r\n") in strings
    >> and
    >> internal functions that generate newlines, then this will work for binary
    >> files.

    > [...]
    >
    > What is "\w"? It's not a standard escape sequence; its value is
    > implementation-defined.


    Sorry. In my original post I'd indicated (not very clearly) that \w was a
    new escape in a language I was creating to wrap around C.

    So it's not a C escape but is translated to "\r\n". It represents 'windows
    newline'; (or more generally, the full newline sequence used in the target
    OS).

    --
    Bartc
     
    Bartc, Jun 23, 2008
    #12
  13. Harald van Dijk <> writes:
    > On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
    > > What is "\w"? It's not a standard escape sequence; its value is
    > > implementation-defined.

    >
    > "\w" does not match the syntax of a string literal, so by the rule
    > of the longest match this is tokenised as {"}{\}{w}{"}. The
    > behaviour is undefined if a double quote character occurs as a
    > single token. There need not be any value given to "\w", and if
    > there is, it need not be documented.


    I believe you're mostly or entirely right, and I was wrong.

    I misinterpreted the second clause of C99 6.4.4.4p10:

    The value of an integer character constant containing more than
    one character (e.g., 'ab'), or containing a character or escape
    sequence that does not map to a single-byte execution character,
    is implementation-defined.

    as applying to things like '\w'; instead, it applies to things like
    '\xffffffff'.

    "\w" is split into 4 preprocessor tokens:
    " \ w "
    The " is not a punctuator; it's in the category "each non-white-space
    character that cannot be one of the above" (C99 6.4), which means the
    behavior is undefined.

    In addition, though, this preprocessor token cannot be converted to a
    token. The constraint in 6.4p2 is:

    Each preprocessing token that is converted to a token shall have
    the lexical form of a keyword, an identifier, a constant, a string
    literal, or a punctuator.

    So, assuming that "\w" isn't surrounded by something like "#if 0"
    ... "#endif", it would seem to be a constraint violation. By C99
    5.1.1.3, this requires a diagnostic even if the behavior is also
    undefined.

    Note that, by the same reasoning, "abcd\w" should be split into 5
    preprocessing tokens:

    " abcd \ w "

    which just seems confusing. But since such cases require a diagnostic
    anyway, a compiler doesn't actually have to pp-tokenize it that way;
    as long as it prints a warning or error message, its job is done.

    Still, I think the description would have been simpler if a \ followed
    by any character in a character or string literal were allowed
    syntactically, with a constraint limiting the following character to
    the ones that are specified. Then "\w" would be a single pp-token and
    a single token (a string literal), with a diagnostic required because
    of the constraint violation.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 23, 2008
    #13
  14. On 23 Jun 2008 at 21:43, Keith Thompson wrote:
    > 䡡牡汤â¶æ…®â„ij欠㱴牵敤晸ä§æµ¡æ¥¬â¹£æ½­ã¸ ç²æ¥´æ•³?
    > 㸠佮â潮Ⱐ㈳âŠç•®â€²ã€°ã  ã„²ã¨µã¤ºã€±â€­ã€·ã€°â° ä­¥æ¥´æ  å‘¨æ½­ç³æ½®â·ç‰¯ç‘¥?


    You may want to check whether you really mean to include this header:
    Content-Type: text/plain; charset=utf-16be
     
    Antoninus Twink, Jun 23, 2008
    #14
  15. On Mon, 23 Jun 2008 14:43:50 -0700, Keith Thompson wrote:
    > Harald van Dijk <> writes:
    >> On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
    >> > What is "\w"? It's not a standard escape sequence; its value is
    >> > implementation-defined.

    >>
    >> "\w" does not match the syntax of a string literal, so by the rule of
    >> the longest match this is tokenised as {"}{\}{w}{"}. The behaviour is
    >> undefined if a double quote character occurs as a single token. There
    >> need not be any value given to "\w", and if there is, it need not be
    >> documented.

    > [...]
    > "\w" is split into 4 preprocessor tokens:
    > " \ w "
    > The " is not a punctuator; it's in the category "each non-white-space
    > character that cannot be one of the above" (C99 6.4), which means the
    > behavior is undefined.


    Yes. This would normally cause nothing more than a constraint violation
    (as you pointed out below) or syntax error, but in the special case of '
    or ", the behaviour is explicitly undefined.

    > In addition, though, this preprocessor token cannot be converted to a
    > token. The constraint in 6.4p2 is:
    >
    > Each preprocessing token that is converted to a token shall have the
    > lexical form of a keyword, an identifier, a constant, a string
    > literal, or a punctuator.
    >
    > So, assuming that "\w" isn't surrounded by something like "#if 0" ...
    > "endif", it would seem to be a constraint violation. By C99 5.1.1.3,
    > this requires a diagnostic even if the behavior is also undefined.


    That's a fair point, though I'm not sure this is intended. As I understand
    it, the point of making a stray " undefined was (in part) to allow for
    implementations to support multi-line string literals as an extension. An
    example similar to what I've posted on c.l.c before:

    #define IGNORE(arg) /* nothing */
    int main(void) {
    IGNORE(")
    void *p = 1;
    IGNORE(")
    }

    Strictly by the standard, the two identical lines are tokenised as
    {IGNORE}{(}{"}{)}, which expands to nothing. So after preprocessing, a
    non-zero integer constant is used to initialise a pointer, which violates
    a constraint. Some implementations, however, are unable to diagnose this,
    because they take the undefined behaviour of a stray " as permission to
    tokenise the body of main as

    {IGNORE}
    {(}
    {")\n void *p = 1;\n IGNORE("}
    {)}

    I believe that since the behaviour is undefined in translation phase 3,
    any constraint violations in later phases should not require a diagnostic.
    I cannot back this up with wording from the standard, only explain with
    examples.

    > Note that, by the same reasoning, "abcd\w" should be split into 5
    > preprocessing tokens:
    >
    > " abcd \ w "


    Yes, and then by my interpretation, the behaviour is undefined, so an
    implementation may choose to make this a single string literal, with or
    without a diagnostic, without any requirement on generated code (if any).

    > which just seems confusing. But since such cases require a diagnostic
    > anyway, a compiler doesn't actually have to pp-tokenize it that way; as
    > long as it prints a warning or error message, its job is done.
    >
    > Still, I think the description would have been simpler if a \ followed
    > by any character in a character or string literal were allowed
    > syntactically, with a constraint limiting the following character to the
    > ones that are specified. Then "\w" would be a single pp-token and a
    > single token (a string literal), with a diagnostic required because of
    > the constraint violation.


    Agreed.
     
    Harald van Dijk, Jun 23, 2008
    #15
  16. [Apologies for the binary garbage I posted earlier. I'm having
    multiple system problems, and the system I'm now using apparently
    didn't like the non-ASCII character in Harald's last name. My "From:"
    address has also been incorrect in most of today's postings; the
    "" address hasn't existed for several years.]

    Harald van Dijk <> writes:
    > On Mon, 23 Jun 2008 12:59:01 -0700, Keith Thompson wrote:
    > > What is "\w"? It's not a standard escape sequence; its value is
    > > implementation-defined.

    >
    > "\w" does not match the syntax of a string literal, so by the rule
    > of the longest match this is tokenised as {"}{\}{w}{"}. The
    > behaviour is undefined if a double quote character occurs as a
    > single token. There need not be any value given to "\w", and if
    > there is, it need not be documented.


    I believe you're mostly or entirely right, and I was wrong.

    I misinterpreted the second clause of C99 6.4.4.4p10:

    The value of an integer character constant containing more than
    one character (e.g., 'ab'), or containing a character or escape
    sequence that does not map to a single-byte execution character,
    is implementation-defined.

    as applying to things like '\w'; instead, it applies to things like
    '\xffffffff'.

    "\w" is split into 4 preprocessor tokens:
    " \ w "
    The " is not a punctuator; it's in the category "each non-white-space
    character that cannot be one of the above" (C99 6.4), which means the
    behavior is undefined.

    In addition, though, this preprocessor token cannot be converted to a
    token. The constraint in 6.4p2 is:

    Each preprocessing token that is converted to a token shall have
    the lexical form of a keyword, an identifier, a constant, a string
    literal, or a punctuator.

    So, assuming that "\w" isn't surrounded by something like "#if 0"
    ... "#endif", it would seem to be a constraint violation. By C99
    5.1.1.3, this requires a diagnostic even if the behavior is also
    undefined.

    Note that, by the same reasoning, "abcd\w" should be split into 5
    preprocessing tokens:

    " abcd \ w "

    which just seems confusing. But since such cases require a diagnostic
    anyway, a compiler doesn't actually have to pp-tokenize it that way;
    as long as it prints a warning or error message, its job is done.

    Still, I think the description would have been simpler if a \ followed
    by any character in a character or string literal were allowed
    syntactically, with a constraint limiting the following character to
    the ones that are specified. Then "\w" would be a single pp-token and
    a single token (a string literal), with a diagnostic required because
    of the constraint violation.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Jun 23, 2008
    #16
  17. Bartc wrote:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> "Bartc" <> writes:
    >>> "Bartc" <> wrote in message
    >>> news:LCA7k.14088$...
    >>>> The stdin/stdout files of C seem to be always in Text mode.
    >>>
    >>> Thanks for the replies.
    >>>
    >>> I think if I use exclusively "\w" for newlines (ie. "\r\n") in
    >>> strings and
    >>> internal functions that generate newlines, then this will work for
    >>> binary files.

    >> [...]
    >>
    >> What is "\w"? It's not a standard escape sequence; its value is
    >> implementation-defined.

    >
    > Sorry. In my original post I'd indicated (not very clearly) that \w
    > was a new escape in a language I was creating to wrap around C.
    >
    > So it's not a C escape but is translated to "\r\n". It represents
    > 'windows newline'; (or more generally, the full newline sequence used
    > in the target OS).

    So where then does your '\w' differ from C's '\n'? In Windows '\n' results
    in CR LF, in UNIX in LF, in MacOS in CR (or the other way round?), on other
    platforms in whatever that platform uses to separate lines.

    Bye, Jojo
     
    Joachim Schmitz, Jun 24, 2008
    #17
  18. Bartc

    Bartc Guest

    "Joachim Schmitz" <> wrote in message
    news:g3qc5m$qic$...
    > Bartc wrote:
    >> "Keith Thompson" <> wrote in message
    >> news:...


    >>> What is "\w"? It's not a standard escape sequence; its value is
    >>> implementation-defined.

    >>
    >> Sorry. In my original post I'd indicated (not very clearly) that \w
    >> was a new escape in a language I was creating to wrap around C.
    >>
    >> So it's not a C escape but is translated to "\r\n". It represents
    >> 'windows newline'; (or more generally, the full newline sequence used
    >> in the target OS).


    > So where then does your '\w' differ from C's '\n'? In Windows '\n' results
    > in CR LF, in UNIX in LF, in MacOS in CP (or the other way round?), on
    > other platforms in whatever that platform uses to separate lines.


    \w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
    \n stays as \n (typically LF) at compile-time.

    \n only expands to all those other combinations at runtime, and only for
    text modes.
    At runtime, \w would result in \r followed by the expansion of \n, for text
    modes.

    Actual code:
    printf("Hello World\w")

    After translating to C:
    printf("Hello World\r\n");

    At runtime (using printf, stdout directed to a file):
    150C:0100 48 65 6C 6C 6F 20 57 6F-72 6C 64 0D 0D 0A 30 3A  Hello World...0:

    --
    Bartc
     
    Bartc, Jun 24, 2008
    #18
  19. In article <tD38k.14756$>,
    Bartc <> wrote:

    >\w expands to \r\n (eg. CR,LF) at compile-time (in the other language).
    >\n stays as \n (typically LF) at compile-time.


    I can see this might be useful for writing to binary files in the
    system's native text format.

    It's limited to systems where the line break is represented by a
    sequence of characters: it doesn't make sense on systems with lines
    implemented in some other way (e.g. with a count). Of course, you may
    not consider that important nowadays.

    For a purely C solution you could just define a macro; e.g. for
    Windows

    #define LINEEND "\015\012"

    and you can use it easily in constant strings

    "hello" LINEEND "world" LINEEND

    -- Richard
    --
    In the selection of the two characters immediately succeeding the numeral 9,
    consideration shall be given to their replacement by the graphics 10 and 11 to
    facilitate the adoption of the code in the sterling monetary area. (X3.4-1963)
     
    Richard Tobin, Jun 24, 2008
    #19
  20. Bartc

    santosh Guest

    BigRelax wrote:

    > Hello ``
    > I am a student from china.
    > I like c.
    >
    > If you make a friend with me, I am very happy.
    > My MSN ID is


    This is not a group for "making friends" or idle chit-chat. If you have
    questions or problems concerning standard C, post them here.

    > --
    > Message posted using
    > http://www.talkaboutprogramming.com/group/comp.lang.c/ More
    > information at http://www.talkaboutprogramming.com/faq.html


    Complain to the maintainer of the above forum that the signature
    separator that they add is broken.
     
    santosh, Jun 25, 2008
    #20
