"\x1337\xcafe"?

Ivan Shmakov · Nov 19, 2012

Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what the specifications say on this matter?

TIA.

$ gcc --version
gcc (Debian 4.7.2-4) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$

Nobody · Nov 19, 2012

I wonder what the specifications say on this matter?

6.4.4.4 (Character constants) p1:

...

escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
universal-character-name

...

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit

hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

...

6.4.5 (String Literals) p3:

The same considerations apply to each element of the sequence in a
character string literal or a wide string literal as if it were in an
integer character constant or a wide character constant, except that the
single-quote ' is representable either by itself or by the escape
sequence \', but the double-quote " shall be represented by the escape
sequence \".

Note the disparity between octal (which is limited to 3 digits) and
hexadecimal (which isn't limited).

glen herrmannsfeldt · Nov 19, 2012

Ivan Shmakov said:
Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

In addition to the other reply, note that C requires char to be at
least eight bits wide, but it can be wider. On an implementation where
char is wide enough to hold the value, nothing needs to be truncated.

-- glen

Keith Thompson · Nov 19, 2012

Ivan Shmakov said:
Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what the specifications say on this matter?

[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

BGB · Nov 19, 2012

Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what the specifications say on this matter?

This parsing is the defined behavior for C.

\x consumes any number of following characters, provided they all look
like hex digits, and it works the same way regardless of character size
(say, if the string uses wide characters, ...).

This is not the case for many other languages, though, which often
take the interpretation that \x is followed by exactly 2 characters, and
add things like \u and \U to deal with wider characters.

But, yeah, this is "just one of those things...".

BartC · Nov 19, 2012

Keith Thompson said:
I wonder what the specifications say on this matter?

[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

It is without a page number! That document has 700 pages.

Ike Naar · Nov 19, 2012

BartC said:
Keith Thompson said:

I wonder what the specifications say on this matter?

[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

It is without a page number! That document has 700 pages.

Fortunately it also has a table of contents.

Ivan Shmakov · Nov 19, 2012

Keith Thompson said:
[...]

I wonder what the specifications say on this matter?

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

ACK, thanks for the pointer!

Curiously enough, for the last twelve years or so, the GNU C Library
manual and (considerably later) POSIX were the only sources I ever
needed to consult to write C.

glen herrmannsfeldt · Nov 19, 2012

(snip)

This parsing is the defined behavior for C.

\x consumes any number of following characters, provided they all look
like hex digits, and it works the same way regardless of character size
(say, if the string uses wide characters, ...).

This is not the case for many other languages, though, which often
take the interpretation that \x is followed by exactly 2 characters, and
add things like \u and \U to deal with wider characters.

But \u and \U, in Java at least, do more than that, as one will find
out if one tries to put a \u0022 into a string.

-- glen

BGB · Nov 19, 2012

(snip)

But \u and \U, in Java at least, do more than that, as one will find
out if one tries to put a \u0022 into a string.

Yeah... Java translates Unicode escapes before it tokenizes the source,
so a \u0022 turns into a plain double quote and ends the string.

This isn't really the case, though, for other languages which have
borrowed the \u and \U syntax; they have (AFAICT) generally
interpreted it as a character escape for use within strings.

In the case of my language, it is a special case currently handled in
two places:
as a character escape in a string;
as part of an identifier.

Most everything else is limited to plain ASCII, or would involve using
UTF-8 (my parser is kind-of hard-coded to assume UTF-8). (Note that my
project stores most strings internally as UTF-8.)

or such...

lawrence.jones · Nov 20, 2012

Ike Naar said:
Keith Thompson said:

Ivan Shmakov writes:

I wonder what the specifications say on this matter?
[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

It is without a page number! That document has 700 pages.

Fortunately it also has a table of contents.

And an index.

James Kuyper · Nov 20, 2012

lawrence.jones said:
Ike Naar said:

I wonder what the specifications say on this matter?
[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

It is without a page number! That document has 700 pages.

Fortunately it also has a table of contents.

And an index.

All of which can be quite useless to someone unfamiliar with the
document who's uncertain what words the standard uses in connection with
the relevant rule.

For the record, the relevant grammar rule is in 6.4.5, "String
literals", paragraph 1 on page 70. However, the key part of that rule is
"escape sequence", and you need to go back to 6.4.4.4p1 to learn the
specific grammar rules for escape sequences that actually explain this
behavior.

Robert A Duff · Nov 24, 2012

Keith Thompson said:
It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Is there a simple algorithm to find the latest C standard?
Or the latest draft of the latest C standard?
Or, for extra credit, the latest draft of the latest commonly
implemented C standard?
The only algorithm I know is "ask somebody who knows".

- Bob

glen herrmannsfeldt · Nov 24, 2012

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Because then you would have to pay big bucks
(or Euros) for it.

Is there a simple algorithm to find the latest C standard?
Or the latest draft of the latest C standard?
Or, for extra credit, the latest draft of the latest commonly
implemented C standard?
The only algorithm I know is "ask somebody who knows".

-- glen

James Kuyper · Nov 24, 2012

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Because they've got thousands of documents, and an official document
numbering scheme, and if you're looking for a specific document you just
look it up in an appropriate list.

Is there a simple algorithm to find the latest C standard?

Yes, go to ISO, <http://www.iso.org/iso/home.html> or your national
standards body <http://www.iso.org/iso/home/about/iso_members.htm>, and
order it. A direct ISO link for the current version is:
<http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57853>
but it's cheaper from INCITS:

Or the latest draft of the latest C standard?

The most useful link I found with a Google search was
<http://www.iso-9899.info/wiki/The_Standard>, which contains a link to
the latest draft, n1570.pdf, which is almost as good as the official
standard, and free.

Or, for extra credit, the latest draft of the latest commonly
implemented C standard?

The wiki page I gave above indicates that C90 is the most commonly
implemented version, but does not list any free sources for that
document. C99 is less commonly implemented, and there's a free draft
version, n1256.pdf, which, oddly enough, is more useful than any version
you would have to pay for.
There was a fierce debate in this newsgroup a few years ago about
whether C99 was commonly implemented. The key point of disagreement was
about how complete the support for C99's new features had to be, in
order to qualify.

Keith Thompson · Nov 24, 2012

Robert A Duff said:
Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry, the person I was addressing has said he has a copy of N1570, and
most regular readers here know about it.

(Other responders have said how one could find it, so I won't repeat
that information.)

[...]
