"\x1337\xcafe"?

Discussion in 'C Programming' started by Ivan Shmakov, Nov 19, 2012.

  1. Ivan Shmakov

    Ivan Shmakov Guest

    Somehow, I assumed that the following two definitions are
    equivalent:

    unsigned char buf[]
    = ("\x1337\xcafe");

    unsigned char buf[]
    = ("\x13" "37\xca" "\xfe");

    However, it turns out that \x1337 is understood as 0x1337, which
    is then truncated to the size of unsigned char: 0x37. Thus, the
    compiler reads the first definition as:

    unsigned char buf[]
    = ("\x37\xfe");

    I wonder what do the specifications say on this matter?

    TIA.

    $ gcc --version
    gcc (Debian 4.7.2-4) 4.7.2
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

    $

    --
    FSF associate member #7257
     
    Ivan Shmakov, Nov 19, 2012
    #1
  2. Nobody

    Nobody Guest

    On Mon, 19 Nov 2012 13:55:21 +0700, Ivan Shmakov wrote:

    > I wonder what do the specifications say on this matter?


    6.4.4.4 (Character constants) p1:

    ...

    escape-sequence:
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
    universal-character-name

    ...

    octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit

    hexadecimal-escape-sequence:
    \x hexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit

    ...

    6.4.5 (String Literals) p3:

    The same considerations apply to each element of the sequence in a
    character string literal or a wide string literal as if it were in an
    integer character constant or a wide character constant, except that the
    single-quote ' is representable either by itself or by the escape
    sequence \', but the double-quote " shall be represented by the escape
    sequence \"

    Note the disparity between octal (which is limited to 3 digits) and
    hexadecimal (which isn't limited).
     
    Nobody, Nov 19, 2012
    #2
  3. Ivan Shmakov <> wrote:
    > Somehow, I assumed that the following two definitions are
    > equivalent:


    > unsigned char buf[]
    > = ("\x1337\xcafe");


    > unsigned char buf[]
    > = ("\x13" "37\xca" "\xfe");


    > However, it turns out that \x1337 is understood as 0x1337, which
    > is then truncated to the size of unsigned char: 0x37. Thus, the
    > compiler reads the first definition as:


    > unsigned char buf[]
    > = ("\x37\xfe");


    In addition to the following post, note that C requires char
    to be at least eight bits, but it can be more. In implementations
    where it is more, it doesn't need to truncate.

    -- glen
     
    glen herrmannsfeldt, Nov 19, 2012
    #3
  4. Ivan Shmakov <> writes:
    > Somehow, I assumed that the following two definitions are
    > equivalent:
    >
    > unsigned char buf[]
    > = ("\x1337\xcafe");
    >
    > unsigned char buf[]
    > = ("\x13" "37\xca" "\xfe");
    >
    > However, it turns out that \x1337 is understood as 0x1337, which
    > is then truncated to the size of unsigned char: 0x37. Thus, the
    > compiler reads the first definition as:
    >
    > unsigned char buf[]
    > = ("\x37\xfe");
    >
    > I wonder what do the specifications say on this matter?

    [...]

    It's not hard to find out.

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 19, 2012
    #4
  5. BGB

    BGB Guest

    On 11/19/2012 12:55 AM, Ivan Shmakov wrote:
    > Somehow, I assumed that the following two definitions are
    > equivalent:
    >
    > unsigned char buf[]
    > = ("\x1337\xcafe");
    >
    > unsigned char buf[]
    > = ("\x13" "37\xca" "\xfe");
    >
    > However, it turns out that \x1337 is understood as 0x1337, which
    > is then truncated to the size of unsigned char: 0x37. Thus, the
    > compiler reads the first definition as:
    >
    > unsigned char buf[]
    > = ("\x37\xfe");
    >
    > I wonder what do the specifications say on this matter?
    >


    this is the defined behavior for C.


    \x will parse any number of characters provided they all look like hex,
    and is used regardless of character size (say, if the string uses
    wide-characters, ...).


    this is not the case for many other languages though, which often
    take the interpretation that \x is followed by exactly 2 hex digits, and
    add things like \u and \U to deal with wider characters.

    but, yeah, this is "just one of those things...".


    > TIA.
    >
    > $ gcc --version
    > gcc (Debian 4.7.2-4) 4.7.2
    > Copyright (C) 2012 Free Software Foundation, Inc.
    > This is free software; see the source for copying conditions. There is NO
    > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    >
    > $
    >
     
    BGB, Nov 19, 2012
    #5
  6. BartC

    BartC Guest

    "Keith Thompson" <> wrote in message
    news:...
    > Ivan Shmakov <> writes:


    >> I wonder what do the specifications say on this matter?

    > [...]
    >
    > It's not hard to find out.
    >
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


    It is without a page number! That document has 700 pages.

    --
    Bartc
     
    BartC, Nov 19, 2012
    #6
  7. Ike Naar

    Ike Naar Guest

    On 2012-11-19, BartC <> wrote:
    > "Keith Thompson" <> wrote in message
    > news:...
    >> Ivan Shmakov <> writes:

    >
    >>> I wonder what do the specifications say on this matter?

    >> [...]
    >>
    >> It's not hard to find out.
    >>
    >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    >
    > It is without a page number! That document has 700 pages.


    Fortunately it also has a table of contents.
     
    Ike Naar, Nov 19, 2012
    #7
  8. Ivan Shmakov

    Ivan Shmakov Guest

    >>>>> Keith Thompson <> writes:
    >>>>> Ivan Shmakov <> writes:


    [...]

    >> I wonder what do the specifications say on this matter?


    > It's not hard to find out.


    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


    ACK, thanks for the pointer!

    Curiously enough, for the last twelve years or so, the GNU C
    Library manual and, (considerably) later, POSIX were the only
    sources I've ever needed to consult to code in C.

    --
    FSF associate member #7257
     
    Ivan Shmakov, Nov 19, 2012
    #8
  9. BGB <> wrote:

    (snip)

    > this is the defined behavior for C.


    > \x will parse any number of characters provided they all look like hex,
    > and is used regardless of character size (say, if the string uses
    > wide-characters, ...).


    > this is not the case though for many other languages though, which often
    > take the interpretation that \x is followed by exactly 2 characters, and
    > add things like \u and \U to deal with wider characters.


    But \u and \U, in Java at least, do more than that, as one will find
    out if one tries to put a \u0022 into a string.

    -- glen
     
    glen herrmannsfeldt, Nov 19, 2012
    #9
  10. BGB

    BGB Guest

    On 11/19/2012 9:31 AM, glen herrmannsfeldt wrote:
    > BGB <> wrote:
    >
    > (snip)
    >
    >> this is the defined behavior for C.

    >
    >> \x will parse any number of characters provided they all look like hex,
    >> and is used regardless of character size (say, if the string uses
    >> wide-characters, ...).

    >
    >> this is not the case though for many other languages though, which often
    >> take the interpretation that \x is followed by exactly 2 characters, and
    >> add things like \u and \U to deal with wider characters.

    >
    > But \u and \U, in Java at least, do more than that, as one will find
    > out if one tries to put a \u0022 into a string.
    >


    yeah... Java handles escapes before it parses the strings...


    this isn't really the case though for other languages which have
    borrowed the \u and \U syntax, which have (AFAICT) generally
    interpreted it as a character escape for use within strings.


    in the case of my language, it is a special case currently handled in
    two places:
    as a character escape in a string;
    as part of an identifier.

    most everything else is limited to plain ASCII, or would involve using
    UTF-8 (my parser is kind-of hard-coded to assume UTF-8). (note that my
    project stores most strings internally as UTF-8).


    or such...
     
    BGB, Nov 19, 2012
    #10
  11. Ivan Shmakov

    Guest

    Ike Naar <> wrote:
    > On 2012-11-19, BartC <> wrote:
    > > "Keith Thompson" <> wrote in message
    > > news:...
    > >> Ivan Shmakov <> writes:

    > >
    > >>> I wonder what do the specifications say on this matter?
    > >> [...]
    > >>
    > >> It's not hard to find out.
    > >>
    > >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    > >
    > > It is without a page number! That document has 700 pages.

    >
    > Fortunately it also has a table of contents.


    And an index.
    --
    Larry Jones

    I think football is a sport the way ducks think hunting is a sport. -- Calvin
     
    , Nov 20, 2012
    #11
  12. James Kuyper

    James Kuyper Guest

    On 11/20/2012 05:42 PM, wrote:
    > Ike Naar <> wrote:
    >> On 2012-11-19, BartC <> wrote:
    >>> "Keith Thompson" <> wrote in message
    >>> news:...
    >>>> Ivan Shmakov <> writes:
    >>>
    >>>>> I wonder what do the specifications say on this matter?
    >>>> [...]
    >>>>
    >>>> It's not hard to find out.
    >>>>
    >>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
    >>>
    >>> It is without a page number! That document has 700 pages.

    >>
    >> Fortunately it also has a table of contents.

    >
    > And an index.


    All of which can be quite useless to someone unfamiliar with the
    document who's uncertain what words the standard uses in connection with
    the relevant rule.

    For the record, the relevant grammar rule is in 6.4.5, "String
    literals", paragraph 1 on page 70. However, the key part of that rule is
    "escape sequence", and you need to go back to 6.4.4.4p1 to learn the
    specific grammar rules for escape sequences that actually explain this
    behavior.
     
    James Kuyper, Nov 20, 2012
    #12
  13. Keith Thompson <> writes:

    > It's not hard to find out.
    >
    > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


    Yeah, it's quite obvious that a file called n1570.pdf contains
    the ISO C standard. ;-) (Actually it's a draft.)

    Sorry for the sarcasm, but seriously, why isn't it called
    something like c-language-standard-2011.pdf?
    Or c-language-standard-201x-version-1570.pdf?

    Is there a simple algorithm to find the latest C standard?
    Or the latest draft of the latest C standard?
    Or, for extra credit, the latest draft of the latest commonly
    implemented C standard?
    The only algorithm I know is "ask somebody who knows".

    - Bob
     
    Robert A Duff, Nov 24, 2012
    #13
  14. Robert A Duff <> wrote:
    > Keith Thompson <> writes:


    >> It's not hard to find out.


    >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf


    > Yeah, it's quite obvious that a file called n1570.pdf contains
    > the ISO C standard. ;-) (Actually it's a draft.)


    > Sorry for the sarcasm, but seriously, why isn't it called
    > something like c-language-standard-2011.pdf?
    > Or c-language-standard-201x-version-1570.pdf?


    Because then you would have to pay big bucks
    (or Euros) for it.

    > Is there a simple algorithm to find the latest C standard?
    > Or the latest draft of the latest C standard?
    > Or, for extra credit, the latest draft of the latest commonly
    > implemented C standard?
    > The only algorithm I know is "ask somebody who knows".



    -- glen
     
    glen herrmannsfeldt, Nov 24, 2012
    #14
  15. James Kuyper

    James Kuyper Guest

    On 11/23/2012 07:52 PM, Robert A Duff wrote:
    > Keith Thompson <> writes:
    >
    >> It's not hard to find out.
    >>
    >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    >
    > Yeah, it's quite obvious that a file called n1570.pdf contains
    > the ISO C standard. ;-) (Actually it's a draft.)
    >
    > Sorry for the sarcasm, but seriously, why isn't it called
    > something like c-language-standard-2011.pdf?
    > Or c-language-standard-201x-version-1570.pdf?


    Because they've got thousands of documents, and an official document
    numbering scheme, and if you're looking for a specific document you just
    look it up in an appropriate list.

    > Is there a simple algorithm to find the latest C standard?


    Yes, go to ISO, <http://www.iso.org/iso/home.html> or your national
    standards body <http://www.iso.org/iso/home/about/iso_members.htm>, and
    order it. A direct ISO link for the current version is:
    <http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57853>
    but it's cheaper from INCITS:
    <http://webstore.ansi.org/RecordDetail.aspx?sku=INCITS%2FISO%2FIEC+9899-2012>

    > Or the latest draft of the latest C standard?


    The most useful link I found with a Google search was
    <http://www.iso-9899.info/wiki/The_Standard>, which contains a link to
    the latest draft, n1570.pdf, which is almost as good as the official
    standard, and free.

    > Or, for extra credit, the latest draft of the latest commonly
    > implemented C standard?


    The wiki page I gave above indicates that C90 is the most commonly
    implemented version, but does not list any free sources for that
    document. C99 is less commonly implemented, and there's a free draft
    version, n1256.pdf, which, oddly enough, is more useful than any version
    you would have to pay for.
    There was a fierce debate in this newsgroup a few years ago about
    whether C99 was commonly implemented. The key point of disagreement was
    about how complete the support for C99's new features had to be, in
    order to qualify.
    --
    James Kuyper
     
    James Kuyper, Nov 24, 2012
    #15
  16. Robert A Duff <> writes:
    > Keith Thompson <> writes:
    >
    >> It's not hard to find out.
    >>
    >> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

    >
    > Yeah, it's quite obvious that a file called n1570.pdf contains
    > the ISO C standard. ;-) (Actually it's a draft.)


    Sorry, the person I was addressing has said he has a copy of N1570, and
    most regular readers here know about it.

    (Other responders have said how one could find it, so I won't repeat
    that information.)

    [...]

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Nov 24, 2012
    #16