"\x1337\xcafe"?

Ivan Shmakov

Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Likewise,
\xcafe is understood as 0xcafe and truncated to 0xfe. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what the specifications say on this matter?

TIA.

$ gcc --version
gcc (Debian 4.7.2-4) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$
 
Nobody

I wonder what do the specifications say on this matter?

6.4.4.4 (Character constants) p1:

...

escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
universal-character-name

...

octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit

hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit

...

6.4.5 (String Literals) p3:

The same considerations apply to each element of the sequence in a
character string literal or a wide string literal as if it were in an
integer character constant or a wide character constant, except that the
single-quote ' is representable either by itself or by the escape
sequence \', but the double-quote " shall be represented by the escape
sequence \"

Note the disparity between octal (which is limited to 3 digits) and
hexadecimal (which isn't limited).
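
As a minimal sketch of what that grammar implies in practice (the names buf_split, buf_octal and check_escapes are made up for illustration): a hexadecimal escape can be ended by closing the string literal and relying on adjacent-literal concatenation, while an octal escape ends by itself after three digits.

```c
#include <assert.h>
#include <string.h>

/* Escape sequences are converted before adjacent string literals are
   concatenated, so each \x escape ends at the closing quote. */
static const unsigned char buf_split[] = "\x13" "37\xca" "\xfe";

/* An octal escape takes at most three digits, so "\0237" is the byte
   023 (0x13) followed by the ordinary character '7'. */
static const unsigned char buf_octal[] = "\0237";

/* Checks that both arrays hold the intended bytes plus the terminating null. */
static void check_escapes(void)
{
    static const unsigned char expected[] = { 0x13, '3', '7', 0xca, 0xfe, 0x00 };

    assert(sizeof buf_split == sizeof expected);
    assert(memcmp(buf_split, expected, sizeof expected) == 0);
    assert(sizeof buf_octal == 3);
    assert(buf_octal[0] == 0x13 && buf_octal[1] == '7');
}
```

Calling check_escapes() from main should pass silently on a hosted implementation.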
 
glen herrmannsfeldt

Ivan Shmakov said:
Somehow, I assumed that the following two definitions are
equivalent:
unsigned char buf[]
= ("\x1337\xcafe");
unsigned char buf[]
= ("\x13" "37\xca" "\xfe");
However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:
unsigned char buf[]
= ("\x37\xfe");

In addition to the following post, note that C requires char
to be at least eight bits wide, but it can be wider. On implementations
where it is wide enough to hold the value, no truncation is needed.
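
A rough way to see this from code, assuming only the standard <limits.h> macros (fits_in_uchar and check_char_width are hypothetical helper names, not anything from the thread):

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if value can be stored in an unsigned char without truncation. */
static int fits_in_uchar(unsigned long value)
{
    return value <= UCHAR_MAX;
}

/* Checks the width guarantee and when the escape value 0x1337 would fit. */
static void check_char_width(void)
{
    assert(CHAR_BIT >= 8);        /* guaranteed by the standard */
    assert(fits_in_uchar(0x37));
    /* 0x1337 needs 13 bits, so it fits only where CHAR_BIT is at least 13. */
    assert(fits_in_uchar(0x1337) == (CHAR_BIT >= 13));
}
```

On the usual 8-bit-char platforms fits_in_uchar(0x1337) returns 0, which matches the out-of-range escape seen in the original post.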

-- glen
 
Keith Thompson

Ivan Shmakov said:
Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what do the specifications say on this matter?
[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
 
BGB

Somehow, I assumed that the following two definitions are
equivalent:

unsigned char buf[]
= ("\x1337\xcafe");

unsigned char buf[]
= ("\x13" "37\xca" "\xfe");

However, it turns out that \x1337 is understood as 0x1337, which
is then truncated to the size of unsigned char: 0x37. Thus, the
compiler reads the first definition as:

unsigned char buf[]
= ("\x37\xfe");

I wonder what do the specifications say on this matter?

this is the defined behavior for C.


\x will consume any number of following characters, provided they are
all hexadecimal digits, and this applies regardless of character size
(say, if the string uses wide characters, ...).


this is not the case for many other languages though, which often
take the interpretation that \x is followed by exactly 2 characters, and
add things like \u and \U to deal with wider characters.

but, yeah, this is "just one of those things...".
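
A small sketch of the wide-character case, under the assumption that wchar_t can represent 0x1337 (true where it is 16 or 32 bits wide, which is not guaranteed everywhere); wide_buf and check_wide are illustrative names:

```c
#include <assert.h>
#include <stddef.h>

/* In a wide string literal the same four-digit escape denotes a single
   wchar_t element. Assumption: wchar_t is wide enough to hold 0x1337. */
static const wchar_t wide_buf[] = L"\x1337";

/* Checks that the literal is one element plus the terminating null. */
static void check_wide(void)
{
    assert(sizeof wide_buf / sizeof wide_buf[0] == 2);
    assert(wide_buf[0] == 0x1337);
    assert(wide_buf[1] == 0);
}
```

So the escape that overflows an 8-bit char is in range here, because the limit comes from the element type rather than from a fixed digit count.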
 
glen herrmannsfeldt

(snip)
this is the defined behavior for C.
\x will parse any number of characters provided they all look like hex,
and is used regardless of character size (say, if the string uses
wide-characters, ...).
this is not the case for many other languages though, which often
take the interpretation that \x is followed by exactly 2 characters, and
add things like \u and \U to deal with wider characters.

But \u and \U, in Java at least, do more than that, as one will find
out if one tries to put a \u0022 into a string.

-- glen
 
BGB

(snip)




But \u and \U, in Java at least, do more than that, as one will find
out if one tries to put a \u0022 into a string.

yeah... Java handles escapes before it parses the strings...


this isn't really the case though for other languages that have
borrowed the \u and \U syntax, which (AFAICT) have generally
treated it as a character escape for use within strings.


in the case of my language, it is a special case currently handled in
two places:
as a character escape in a string;
as part of an identifier.

most everything else is limited to plain ASCII, or would involve using
UTF-8 (my parser is kind-of hard-coded to assume UTF-8). (note that my
project stores most strings internally as UTF-8).


or such...
 
James Kuyper

Ike Naar said:
I wonder what do the specifications say on this matter?
[...]

It's not hard to find out.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

It is without a page number! That document has 700 pages.

Fortunately it also has a table of contents.

And an index.

All of which can be quite useless to someone unfamiliar with the
document who's uncertain what words the standard uses in connection with
the relevant rule.

For the record, the relevant grammar rule is in 6.4.5, "String
literals", paragraph 1 on page 70. However, the key part of that rule is
"escape sequence", and you need to go back to 6.4.4.4p1 to learn the
specific grammar rules for escape sequences that actually explain this
behavior.
 
Robert A Duff

Keith Thompson said:

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Is there a simple algorithm to find the latest C standard?
Or the latest draft of the latest C standard?
Or, for extra credit, the latest draft of the latest commonly
implemented C standard?
The only algorithm I know is "ask somebody who knows".

- Bob
 
glen herrmannsfeldt

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)
Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Because then you would have to pay big bucks
(or Euros) for it.
Is there a simple algorithm to find the latest C standard?
Or the latest draft of the latest C standard?
Or, for extra credit, the latest draft of the latest commonly
implemented C standard?
The only algorithm I know is "ask somebody who knows".


-- glen
 
James Kuyper

Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry for the sarcasm, but seriously, why isn't it called
something like c-language-standard-2011.pdf?
Or c-language-standard-201x-version-1570.pdf?

Because they've got thousands of documents, and an official document
numbering scheme, and if you're looking for a specific document you just
look it up in an appropriate list.
Is there a simple algorithm to find the latest C standard?

Yes, go to ISO, <http://www.iso.org/iso/home.html> or your national
standards body <http://www.iso.org/iso/home/about/iso_members.htm>, and
order it. A direct ISO link for the current version is:
<http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=57853>
but it's cheaper from INCITS:
Or the latest draft of the latest C standard?

The most useful link I found with a Google search was
<http://www.iso-9899.info/wiki/The_Standard>, which contains a link to
the latest draft, n1570.pdf, which is almost as good as the official
standard, and free.
Or, for extra credit, the latest draft of the latest commonly
implemented C standard?

The wiki page I gave above indicates that C90 is the most commonly
implemented version, but does not list any free sources for that
document. C99 is less commonly implemented, and there's a free draft
version, n1256.pdf, which, oddly enough, is more useful than any version
you would have to pay for.
There was a fierce debate in this newsgroup a few years ago about
whether C99 was commonly implemented. The key point of disagreement was
about how complete the support for C99's new features had to be, in
order to qualify.
 
Keith Thompson

Robert A Duff said:
Yeah, it's quite obvious that a file called n1570.pdf contains
the ISO C standard. ;-) (Actually it's a draft.)

Sorry, the person I was addressing has said he has a copy of N1570, and
most regular readers here know about it.

(Other responders have said how one could find it, so I won't repeat
that information.)

[...]
 
