is 0xE-2 a valid expression?

Francois Grieu · Oct 15, 2009

Hello,

one of my compilers [gcc (GCC) 3.4.5 (mingw-vista special r3)]
barks at this program:

#include <stdio.h>
int main(void)
{
printf("%X\n",0xE-2);
return 0;
}

with 'invalid suffix "-2" on integer constant'.

If I change "0xE-2" to "0xE -2" or to "0xF-3", things work.
It appears the compiler goes in the "this is a double constant
with exponent" mood. Is this compiler broken ?

TIA,
François Grieu

Tim Rentsch · Oct 15, 2009

Francois Grieu said:
[snip]

If I change "0xE-2" to "0xE -2" or to "0xF-3", things work.
It appears the compiler goes in the "this is a double constant
with exponent" mood. Is this compiler broken ?

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

See section 6.4.4.2. Note especially the syntax in
paragraph 1.

jacob navia · Oct 15, 2009

Francois Grieu wrote:

[snip]

If I change "0xE-2" to "0xE -2" or to "0xF-3", things work.
It appears the compiler goes in the "this is a double constant
with exponent" mood. Is this compiler broken ?

Mmmm, lcc-win is also broken. In my opinion this should be parsed
correctly; it is a normal expression:
0xE - 2

It can't be a hexadecimal floating constant because it is missing
the point and the "P".

Francois Grieu · Oct 15, 2009

Tim Rentsch wrote:

Francois Grieu said:
[snip]

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

See section 6.4.4.2. Note especially the syntax in
paragraph 1.

I can understand that the compiler parses 0xE as the beginning of
a hexadecimal-floating-constant; but there is no binary-exponent-part
after that, so this is not what the user meant. Doesn't the standard
mandate that the compiler understand this as a hexadecimal-constant
(6.4.4.1)?

François Grieu

jacob navia · Oct 15, 2009

Francois Grieu wrote:

I can understand that the compiler parses 0xE as the beginning of
a hexadecimal-floating-constant; but there is no binary-exponent-part
after that, so this is not what the user meant. Doesn't the standard
mandate that the compiler understand this as a hexadecimal-constant
(6.4.4.1)?

François Grieu

Of course. Why should you be forbidden to write
0xE - 2?

White space should NOT be significant!

Tim Rentsch · Oct 15, 2009

Francois Grieu said:
Tim Rentsch wrote:

Francois Grieu said:

[snip]

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

See section 6.4.4.2. Note especially the syntax in
paragraph 1.

I can understand that the compiler parses 0xE as the beginning of
a hexadecimal-floating-constant; but there is no binary-exponent-part
after that, so this is not what the user meant. Doesn't the standard
mandate that the compiler understand this as a hexadecimal-constant
(6.4.4.1)?

Sorry, I misunderstood the point of your question. It sure does
seem like a misdiagnosis on the part of the compiler. Also
present in gcc 4.2.1. The way the expression is written does
make it likely it will be misread, and maybe there should be
a warning for that, but the compiler shouldn't issue a bogus
error message (and it does show as an error on both compilers
I tried it on).

jacob navia · Oct 15, 2009

Tim Rentsch wrote:

Sorry, I misunderstood the point of your question. It sure does
seem like a misdiagnosis on the part of the compiler. Also
present in gcc 4.2.1. The way the expression is written does
make it likely it will be misread, and maybe there should be
a warning for that, but the compiler shouldn't issue a bogus
error message (and it does show as an error on both compilers
I tried it on).

Yes, it is a bug. lcc-win had that bug too, but it doesn't have
it anymore.

Francois Grieu · Oct 15, 2009

Tim Rentsch wrote:

Francois Grieu said:
[snip]

> It sure does seem like a misdiagnosis on the part of the compiler.

Sure, the diagnostic makes no sense.
But is there an error in the program in the first place?

> Also present in gcc 4.2.1.
> The way the expression is written does make it likely it will
> be misread, and maybe there should be a warning for that, but
> the compiler shouldn't issue a bogus error message (and it does
> show as an error on both compilers I tried it on).

My reading of the standard is that any C compiler (with printf
support) should be just happy with my program. Many are.

François Grieu

Chris Dollin · Oct 15, 2009

jacob said:
Francois Grieu wrote:

Of course. Why should you be forbidden to write
0xE - 2?

White space should NOT be significant!

That boat sailed long ago: intx=0, a=b+++c, #define isNil (x) (x == 0).

Francois Grieu · Oct 15, 2009

Chris Dollin wrote:

That boat sailed long ago: intx=0, a=b+++c, #define isNil (x) (x == 0).

I agree that whitespace is often significant. But is it in that case?
And at the end of the day, is 0xE-2 a valid expression?

François Grieu

Francois Grieu · Oct 15, 2009

I said:
is 0xE-2 a valid expression?

Thinking about it, the answer is no under C99 at least.

We have (6.4 clause 4): "If the input stream has been parsed into
preprocessing tokens up to a given character, the next preprocessing
token is the longest sequence of characters that could constitute a
preprocessing token."

0xE-2 is legitimately parsed as a single preprocessing-token because:
0 is a digit, thus a pp-number; (6.4.8)
x is an identifier-nondigit, thus
0x is a pp-number of the form "pp-number identifier-nondigit";
0xE- is a pp-number of the form "pp-number E sign";
0xE-2 is a pp-number of the form "pp-number digit";
0xE-2 is a preprocessing-token of the form "pp-number"; (6.4)

The rest follows. Even the diagnostic given by GCC is arguably OK.

François Grieu

jacob navia · Oct 15, 2009

Francois Grieu wrote:

Thinking about it, the answer is no under C99 at least.

[snip]

I think that is incorrect. Lcc-win gave that diagnostic before
the correction I did this morning.

If there is some obscure spec that makes 0xE-2 a "pp-number",
it is plain wrong.

It would be a bug in the specs

James Kuyper · Oct 15, 2009

Francois said:
[snip]

If I change "0xE-2" to "0xE -2" or to "0xF-3", things work.
It appears the compiler goes in the "this is a double constant
with exponent" mood. Is this compiler broken ?

The fundamental problem here is that C has two separate grammars; a
lexical grammar (Annex A.1) used in translation phase 3, and a phrase
structure grammar (Annex A.2) used in translation phase 7. During
translation phase 3, the source is parsed into preprocessing tokens
(5.1.1.2p3); one of the possible forms for a preprocessing token is
called a pp-number (6.4p1), and 0xE-2 meets the requirements to be
parsed as a pp-number (6.4.8p1). This is because the committee wanted to
keep translation phase 3 relatively simple, so it accepts as pp-numbers
many things that will not actually qualify as numeric constants.

During translation phase 7, preprocessing tokens are converted into
tokens (5.1.1.2p7). This process is a one-for-one transformation, so the
preprocessing token "0xE-2" cannot be converted into the three tokens
"0xE", "-", and "2", which is what you want. Parsing from left to
right, it qualifies as a hexadecimal constant right up until the parser
looks at the '-', at which point it can't be matched to any token type
listed in 6.4p1.

Therefore, you need to make sure that you don't use the '-' operator in
a way that makes it look, as far as the preprocessor is concerned, as if
it were part of a pp-number.

Francois Grieu · Oct 15, 2009

jacob navia wrote:

Francois Grieu wrote:

I think that is incorrect. Lcc-win gave that diagnostic before
the correction I did this morning.

If there is some obscure spec that makes 0xE-2 a "pp-number",

There is. Quoting ISO/IEC 9899:1999

6.4.8 Preprocessing numbers
Syntax
pp-number:
digit
. digit
pp-number digit
pp-number identifier-nondigit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

Description
A preprocessing number begins with a digit optionally preceded
by a period (.) and may be followed by valid identifier characters
and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.
Preprocessing number tokens lexically include all floating and
integer constant tokens.

it is plain wrong.

It would be a bug in the specs

I agree this is confusing. OTOH, at some point one has to consider
the spec to be Right. And I can't find a simple change to the spec
that would fix this without changing 6.4 clause 4 (quoted at the
beginning of this post).

François Grieu

Richard Tobin · Oct 15, 2009

If there is some obscure spec that makes 0xE-2 a "pp-number",
it is plain wrong.

I'm not convinced of this. The standard could of course ensure that
nothing that won't successfully be parsed as a number gets tokenised
as one, but that would (a) allow visually confusing expressions like
the one shown and (b) limit compatible extensions in the future.

Other languages have similar issues; see for example

http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node189.html

-- Richard

Dik T. Winter · Oct 15, 2009

> Chris Dollin wrote: ....
> I agree that whitespace is often significant. But is it in that case?

Yes, when two symbols are separated by white space they cannot be part
of a single token.

Dik T. Winter · Oct 15, 2009

> Francois Grieu wrote: ....
> Of course. Why should you be forbidden to write
> 0xE - 2?
>
> White space should NOT be significant!

Eh? I know of three languages where white space in statements indeed was
not significant: Fortran, Algol 60 and Algol 68. The first is difficult
to parse (you need a lot of read-ahead); the other two have special
notation for the keywords.

jameskuyper · Oct 15, 2009

jacob said:
Francois Grieu wrote:

I think that is incorrect. Lcc-win gave that diagnostic before
the correction I did this morning.

If the correction causes "0xE-2" to be parsed the same as if it were
"0xE - 2", then it renders your compiler non-conforming.

If there is some obscure spec that makes 0xE-2 a "pp-number",

Did you bother reviewing how the standard specifies pp-number before
posting that message? It's right there, in precisely the most obvious
place to look for the definition of a pp-number: 6.4.8p1. Knowing that
you need to look for the definition of pp-number, rather than the
definition of an integer constant, is the tricky part of this issue.

it is plain wrong.

It would be a bug in the specs

The committee's decision to simplify the specification of pp-number by
allowing it to match things that are not actual numeric constants was
deliberate, not an accident. You have every right to disagree with
that decision, but if you care to convince them that it was a bad
decision, you had better fully understand the reasons they had for
making it.

The C Rationale says: "In the interests of keeping the description
simple, occasional spurious forms are scanned as preprocessing
numbers. For example, 0x123E+1 is a single token under the rules. The
C89 Committee felt that it was better to tolerate such anomalies than
burden the preprocessor with a more exact, and exacting, lexical
specification. It felt that this anomaly was no worse than the
principle under which the characters a+++++b are tokenized as a ++ ++
+ b (an invalid expression), even though the tokenization a ++ + ++ b
would yield a syntactically correct expression. In both cases,
exercise of reasonable precaution in coding style avoids surprises."

James Kuyper · Oct 15, 2009

Dik said:
Yes, when two symbols are separated by white space they cannot be part
of a single token.

Unless the token is a character constant or a string literal. It's
actually preprocessing tokens that are relevant in this case; the list
of preprocessing tokens that can include whitespace also includes header
names.

It's difficult to make simple general statements about the grammar that
are also correct.

jameskuyper · Oct 15, 2009

jameskuyper said:
If the correction causes "0xE-2" to be parsed the same as if it were
"0xE - 2", then it renders your compiler non-conforming.

Sorry - I got that backwards. Removing that diagnostic renders lcc-win
non-conforming. If, after generating the required diagnostic, your
compiler then chose to break up the preprocessing token into multiple
tokens, that would be perfectly conforming, since the standard does
not define the behaviour when a pp-number fails to parse as a valid
token in phase 7.
