number formats

James Brown

All,

This is a bit of an odd question, but please bear with me:

Suppose I have the following (bad) C expression:

unsigned int x = 0xABCDEFg;

Note the illegal 'g' at the end of the hex-literal. My question is, what
would the expected behavior of an ANSI-C compiler be in this case? I would
expect it either to say something along the lines of "illegal suffix on
number 0xABCDEF" or "unexpected identifier 'g' "

Is there an expected, 'correct' way for the compiler to deal with this
scenario? In other words, if I was writing a simple C-parser (which I am),
what would be the proper way to deal with this?

thanks,
James
 
Richard Heathfield

James Brown said:

Is there an expected, 'correct' way for the compiler to deal with
[unsigned int x = 0xABCDEFg; ]

It's a syntax error. The implementation must emit at least one diagnostic
message for any translation unit containing any syntax errors or constraint
violations.

In other words, if I was writing a simple C-parser (which I am),
what would be the proper way to deal with this?

Emit a diagnostic message.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Eric Sosman

James Brown wrote On 11/13/06 17:49,:
[...]

Is there an expected, 'correct' way for the compiler to deal with this
scenario? In other words, if I was writing a simple C-parser (which I am),
what would be the proper way to deal with this?

It depends on the purposes of your parser. If you were
writing a full-blown C compiler, it would parse 0xABCDEFg as
a preprocessing token and later on would issue a diagnostic
when unable to convert that preprocessing token to a token.
(Note that the grammar for pp-numbers matches all manner of
nonsense: 1.TWO.3, for example. Such things are invalidated
on semantic rather than syntactic grounds.)
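
The pp-number grammar can be sketched as a small scanner. This is only an illustration, not code from any particular compiler: the function name pp_number_length is made up, the rules follow the C99 grammar (C89 has the same shape without the p/P exponent letters), and universal character names are ignored.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the pp-number that starts at s,
 * or 0 if no pp-number starts there. (Hypothetical helper.) */
size_t pp_number_length(const char *s)
{
    size_t i;

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;     /* exponent letter immediately followed by a sign */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;     /* digit, identifier-nondigit, or '.' */
        else
            break;      /* anything else ends the pp-number */
    }
    return i;
}
```

Given "0xABCDEFg;" this consumes all nine characters up to the semicolon, and "1.TWO.3" is consumed whole as well. Nothing is rejected at this stage; rejection happens later, when the preprocessing token has to become a real token.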
 
Ben Pfaff

Eric Sosman said:
(Note that the grammar for pp-numbers matches all manner of
nonsense: 1.TWO.3, for example. Such things are invalidated
on semantic rather than syntactic grounds.)

I would think that this would qualify as a lexical error. It
occurs early in translation phase 7, when pp-tokens are converted
to tokens. Syntactic and semantic analysis happens after that,
although still in the same phase.
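
The conversion step where such a token gets rejected might look roughly like the following. It is a sketch under stated assumptions: is_integer_constant is a made-up name, only C89 integer constants are recognised (no C99 ll suffix), and a real lexer would also try the floating-constant grammar before giving up.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if the len characters at tok spell a C89 integer constant,
 * 0 otherwise. (Hypothetical helper for illustration.) */
int is_integer_constant(const char *tok, size_t len)
{
    size_t i = 0, digits = 0;
    int seen_u = 0, seen_l = 0;

    if (len >= 2 && tok[0] == '0' && (tok[1] == 'x' || tok[1] == 'X')) {
        /* hexadecimal: 0x followed by at least one hex digit */
        for (i = 2; i < len && isxdigit((unsigned char)tok[i]); i++)
            digits++;
    } else if (len >= 1 && tok[0] == '0') {
        /* octal: leading 0, then digits 0-7 only */
        for (i = 0; i < len && tok[i] >= '0' && tok[i] <= '7'; i++)
            digits++;
    } else {
        /* decimal */
        for (i = 0; i < len && isdigit((unsigned char)tok[i]); i++)
            digits++;
    }
    if (digits == 0)
        return 0;

    /* Optional suffix: at most one u/U and one l/L, in either order. */
    for (; i < len; i++) {
        if ((tok[i] == 'u' || tok[i] == 'U') && !seen_u)
            seen_u = 1;
        else if ((tok[i] == 'l' || tok[i] == 'L') && !seen_l)
            seen_l = 1;
        else
            return 0;   /* e.g. the 'g' in 0xABCDEFg */
    }
    return 1;
}
```

When this returns 0 for a token that also fails the floating-constant check, the lexer has found its error and can report the whole token as a malformed constant.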
 
David Wade

James Brown said:
[...]

Note the illegal 'g' at the end of the hex-literal. My question is, what
would the expected behavior of an ANSI-C compiler be in this case? I would
expect it either to say something along the lines of "illegal suffix on
number 0xABCDEF" or "unexpected identifier 'g' "

How can you be sure that it's the "g" that's wrong? There could be a missing
"+" between the "0" and the "X", someone could have used a lower case "x"
instead of a "*", or there could be an operator missing between the "F" and the
"g". All you can say is that it's a syntax error.
 
James Brown

Eric Sosman said:
It depends on the purposes of your parser. If you were
writing a full-blown C compiler, it would parse 0xABCDEFg as
a preprocessing token and later on would issue a diagnostic
when unable to convert that preprocessing token to a token.
(Note that the grammar for pp-numbers matches all manner of
nonsense: 1.TWO.3, for example. Such things are invalidated
on semantic rather than syntactic grounds.)

OK, thanks. So I think what you (and Richard) are saying is that as long as
an appropriate error is issued, it doesn't really matter. If I class it as a
'bad number' syntax error then this is fine, and likewise reporting that an
'identifier follows a number-literal' is also suitable. And the reason is that
it depends entirely on the stage at which my compiler finds and classifies
the error? I guess what I was trying to get at was, what is the most
appropriate message to give the user:

1# treat '0xABCDEFg' as a single unit (malformed integer constant),
2# treat '0xABCDEFg' in the same way as I would treat '0xABCDEF'
<whitespace> 'g', because my lexer knows to stop processing hex digits when
it finds the first non-digit (the 'g') and it returns two tokens representing
the hex part and the 'g'.

I'll go with option#1, seems more natural to me at least.

James
 
Random832

James Brown said:
[...]
1# treat '0xABCDEFg' as a single unit (malformed integer constant),
2# treat '0xABCDEFg' in the same way as I would treat: '0xABCDEF'
<whitespace> 'g', because my lexer knows to stop processing hex-digits when
it finds the first non-digit (the 'g') and it returns two tokens representing
the hex-part and the 'g'.

I'll go with option#1, seems more natural to me at least.

Also, if you use option #2, you might forget to handle 0xE+1 as
a malformed number constant, since it _would_ be valid if you split it
up (which you're not allowed to do).
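
The reason 0xE+1 cannot be split is that the pp-number grammar lets an 'e' or 'E' be followed by a sign, so the whole thing is consumed as one preprocessing token that then fails to convert to a constant. A small sketch of the difference, where spaced_sum is just a name made up for the illustration:

```c
/* With whitespace, 0xE, + and 1 are three separate tokens: 14 + 1. */
int spaced_sum(void)
{
    return 0xE +1;
}

/* Without whitespace, 0xE+1 is a single pp-number (the 'E' followed by
 * '+' continues the token). It is not a valid constant, so a conforming
 * compiler must diagnose it. Left commented out so the file compiles.
 *
 * int unspaced_sum(void) { return 0xE+1; }
 */
```

By contrast, 0xF+1 is fine without whitespace, because 'F' is not an exponent letter and the '+' ends the pp-number.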
 
Eric Sosman

James said:
[...]
1# treat '0xABCDEFg' as a single unit (malformed integer constant),
2# treat '0xABCDEFg' in the same way as I would treat: '0xABCDEF'
<whitespace> 'g', because my lexer knows to stop processing hex-digits when
it finds the first non-digit (the 'g') and it returns two tokens representing
the hex-part and the 'g'.

I'll go with option#1, seems more natural to me at least.

Your instincts are good. Consider

#define g + 42
unsigned int x = 0xABCDEFg;

... as it would be treated under the two options. #2 would
separate the `g' and lead to `= 0xABCDEF + 42', while #1 would
group the `g' with the rest and eventually toss an error for
an ill-formed constant. That's what a C compiler does, so
that's what your parser should imitate.
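
For contrast, here is a sketch of what the macro does when whitespace really does separate the two pieces. The function name spaced_value is made up for the illustration; the macro is the one from the example above.

```c
#define g + 42

/* With whitespace the lexer sees two preprocessing tokens, 0xABCDEF and g,
 * and g is macro-expanded, giving 0xABCDEF + 42. */
unsigned int spaced_value(void)
{
    unsigned int x = 0xABCDEF g;
    return x;
}

/* Without whitespace, 0xABCDEFg is one pp-number. It contains no separate
 * identifier g, so the macro is never expanded and the compiler reports
 * an invalid constant. Commented out so the file compiles.
 *
 * unsigned int x = 0xABCDEFg;
 */

#undef g   /* keep the macro from leaking into later code */
```

A lexer that implements option #2 would silently turn the second form into the first, which is exactly the behaviour a C compiler is not allowed to have.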
 
