Question about comment parsing between C and C++ compiler

linq936 · May 7, 2007

Hi,
I am reading the book "Expert C Programming", which has the following
quiz:

a //*
//*/ b

What does the above code turn out to be under a C compiler and under a
C++ compiler?

I think it is simple for a C compiler: it is a/b.

But for a C++ compiler, the book says it is a. The reason is that "//"
makes the rest of the line a comment.

I am wondering about this.

Just a couple of pages back, the book mentions that the compiler has a
"maximal munch strategy". To me, when the C++ compiler reads the first
line, there is an ambiguous interpretation: it could be "a// *" or
"a / /*". If we apply the "maximal munch strategy", it should use the
second one and parse the code as

a / /*
// */ b

which is a/b.

I think I am confused somewhere. Could you shed some light?

Thanks.

pete · May 7, 2007

What does "maximal munch strategy" mean?

Richard Heathfield · May 7, 2007

linq936 said:

Hi,
I am reading the book "Expert C Programming", which has the following
quiz:

a //*
//*/ b

What does the above code turn out to be under a C compiler and under a
C++ compiler?

I think it is simple for a C compiler: it is a/b.

It's fairly simple, but nowadays it is not quite as simple as you make
out. What PvdL didn't realise was that //-comments would be introduced
into C in the 1999 language revision!
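
To make the C90-versus-C99 difference concrete, here is a minimal sketch. The function name quiz is made up for illustration; only the two-line expression comes from the book's puzzle.

```c
/* Hypothetical wrapper around the quiz expression.
   C99 and later (and C++): both lines end in // comments, so this
   returns a.
   Strict C90: the first '/' is division and everything up to the
   star-slash on the next line is one block comment, so this
   returns a / b. */
int quiz(int a, int b)
{
    return a //*
    //*/ b
    ;
}
```

With a compiler in C99 or later mode, quiz(8, 2) should yield 8. In a strict C90 mode where // is not treated as a comment (GCC's -std=c90 or -ansi, for instance), it should yield 4.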

But for a C++ compiler, the book says it is a. The reason is that "//"
makes the rest of the line a comment.

Yeah.

I am wondering about this.

Just a couple of pages back, the book mentions that the compiler has a
"maximal munch strategy".

Grab the biggest token you can, yes.

To me, when the C++ compiler reads the first line, there is an
ambiguous interpretation: it could be "a// *" or "a / /*"

No, it can't be either of those. It could be

a / /*

or

a // *

and maximal munch dictates the second.

Whether C++ actually has a maximal munch rule is a question that our
friends in comp.lang.c++ would undoubtedly be able to answer.

Keith Thompson · May 7, 2007

pete said:
What does "maximal munch strategy" mean?

It means that, when determining the next token, the compiler grabs as
many characters as possible to get a valid token.

For example, this:

x+++++y

is tokenized as

x ++ ++ + y

which results in a syntax error, even though this:

x ++ + ++ y

would result in a valid parse. (Tokenization doesn't account for
later phases.)
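
A greedy tokenizer is easy to sketch. The following is only an illustration, not how any real compiler is written: the names longest_token and tokenize are invented here, and only identifiers plus the + and ++ operators are recognised.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest token starting at s. */
size_t longest_token(const char *s)
{
    /* Operators are listed longest first, so the first match is the
       greedy (maximal munch) one. */
    static const char *const ops[] = { "++", "+" };
    size_t i;

    if (isalpha((unsigned char)*s) || *s == '_') {
        size_t n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
        return n;
    }
    for (i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i]);
        if (strncmp(s, ops[i], n) == 0)
            return n;
    }
    return 1; /* anything else: treat it as a one-character token */
}

/* Writes the tokens of src into out, separated by single spaces.
   Each token is chosen by looking only at the current position;
   nothing checks whether the result will parse later. */
void tokenize(const char *src, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return;
    out[0] = '\0';
    while (*src != '\0') {
        size_t n;

        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }
        n = longest_token(src);
        if (used + n + 2 > cap) /* separator + token + terminator */
            break;
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, src, n);
        used += n;
        out[used] = '\0';
        src += n;
    }
}
```

Given "x+++++y" and a large enough buffer, tokenize leaves "x ++ ++ + y" in out, matching the tokenization described above.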

Rg · May 7, 2007

[...]

It means that, when determining the next token, the compiler grabs as
many characters as possible to get a valid token.

[...]

In other words, it means the lexical analyzer is greedy.

Ain't that much simpler to say?

Old Wolf · May 7, 2007

Whether C++ actually has a maximal munch rule is a question that our
friends in comp.lang.c++ would undoubtedly be able to answer.

C++98 does (I'll save the OP the effort of making a new post there).

Off-topic but possibly interesting aside: C++ uses "<" and
">" like brackets in some contexts, but the maximal munch
rule has the effect that <a<b>> gets parsed unexpectedly
because the closing chevrons get tokenised as the right-shift
operator.

There's been a DR accepted to change this so that >> is not
maximally munched in this situation -- for better or worse.

Richard Heathfield · May 7, 2007

Old Wolf said:

C++98 does (I'll save the OP the effort of making a new post there).

Off-topic but possibly interesting aside: C++ uses "<" and
">" like brackets in some contexts, but the maximal munch
rule has the effect that <a<b>> gets parsed unexpectedly
because the closing chevrons get tokenised as the right-shift
operator.

I don't see why that's unexpected. Maximum munch is hardly a secret in
C, and I presume it's no secret in C++ either. I didn't know it applied
in C++, but in my C++ programming I have always conservatively assumed
that it does.

There's been a DR accepted to change this so that >> is not
maximally munched in this situation -- for better or worse.

It's for worse. Hard cases make bad law.

Richard Tobin · May 7, 2007

There's been a DR accepted to change this so that >> is not
maximally munched in this situation -- for better or worse.

It's for worse. Hard cases make bad law.

Unless the exception proves the rule.

-- Richard

Richard Heathfield · May 8, 2007

Richard Tobin said:

It's for worse. Hard cases make bad law.

Unless the exception proves the rule.

No, not really. Exceptions are sometimes necessary, but never elegant.

Old Wolf · May 8, 2007

Richard Heathfield said:

It's for worse. Hard cases make bad law.

Funny situation really. I assume the DR came about because
many newbies were being tripped up by the situation; maximal
munch must be 'unintuitive' for most people. As it is for me,
I might add; my mind tends to parse a sentence in the way
that makes the most sense and I suspect others' minds work
that way too (as evinced by the fact that people can read
all sorts of mis-spelled garbage). In the <a<b>> case, the
pairs of matching chevron brackets are clearly what was intended.

Of course it makes the compiler writers' job harder too, but
C++ parsing is already so convoluted and context sensitive
that the horse has long since bolted on the idea of having
an easily-parsable syntax.

Keith Thompson · May 8, 2007

Richard Heathfield said:

No, not really. Exceptions are sometimes necessary, but never elegant.

Except when they are, of course.

Default User · May 8, 2007

Old Wolf said:
Funny situation really. I assume the DR came about because
many newbies were being tripped up by the situation; maximal
munch must be 'unintuitive' for most people.

Shortly after I had started my current position at work, I "solved"
that for a guy. Of course, I was able to do so because I'd just read
about it on clc++, but hey, never let them see behind the curtain.

Brian

Richard Heathfield · May 8, 2007

Default User said:

Shortly after I had started my current position at work, I "solved"
that for a guy. Of course, I was able to do so because I'd just read
about it on clc++, but hey, never let them see behind the curtain.

It wouldn't matter if you did. With a few honorable exceptions,
comp.lang.c can be viewed as an inordinately long series of
unsuccessful attempts to persuade people to lift the curtain.

linq936 · May 8, 2007

We agree on what the options for the parser are, namely, in your
format,

a / /*

or

a // *

The reason I think the "maximal munch strategy" should take option 1 is
that, if the parser sees /*, it would take everything up to the
matching */ as one token, which definitely has more characters than
option 2.

I have not checked any compiler's parser implementation, or whether the
standard mandates this, so could this be compiler dependent?

Thanks.

Richard Heathfield · May 8, 2007

linq936 said:

We agree on what the options for the parser are, namely, in your
format,

[...]

The reason I think the "maximal munch strategy" should take option 1 is
that, if the parser sees /*, it would take everything up to the
matching */ as one token, which definitely has more characters than
option 2.

6.4.9 Comments

1 Except within a character constant, a string literal, or a comment,
the characters /* introduce a comment. The contents of such a comment
are examined only to identify multibyte characters and to find the
characters */ that terminate it.

2 Except within a character constant, a string literal, or a comment,
the characters // introduce a comment that includes all multibyte
characters up to, but not including, the next new-line character. The
contents of such a comment are examined only to identify multibyte
characters and to find the terminating new-line character.

As you can see if you read carefully, //* falls within para 2, not para
1. The characters // are encountered first, so they fall within the
purview of para 2 before we get as far as the * which would otherwise
have invoked para 1.

Or, if you prefer, we can think of it in maximum munch terms again.
Maximum munch is not predictive. We don't say "which parse will give us
the biggest tokens possible?" but "starting with and including THIS
CHARACTER, what is the biggest token we can grab?"

And thus we take // rather than / /*, because // is bigger than /
whichever way you slice it.
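
The "not predictive" point can be seen in a toy comment scanner. This is a sketch under stated assumptions: strip_comments and its line_comments flag are hypothetical names, and string and character literals are deliberately ignored to keep it short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy src to out, replacing each comment with one
   space. A nonzero line_comments enables C99/C++ style // comments.
   The scanner only ever asks what starts at the current position. */
void strip_comments(const char *src, char *out, size_t cap, int line_comments)
{
    size_t n = 0;

    if (cap == 0)
        return;
    while (*src != '\0' && n + 1 < cap) {
        if (line_comments && strncmp(src, "//", 2) == 0) {
            /* The two slashes start here, so the '*' that follows is
               already inside the line comment. */
            src += 2;
            while (*src != '\0' && *src != '\n')
                src++;
            out[n++] = ' ';
        } else if (strncmp(src, "/*", 2) == 0) {
            src += 2;
            while (*src != '\0' && strncmp(src, "*/", 2) != 0)
                src++;
            if (*src != '\0')
                src += 2; /* skip the closing delimiter */
            out[n++] = ' ';
        } else {
            out[n++] = *src++;
        }
    }
    out[n] = '\0';
}
```

With line_comments set, the quiz text reduces to just a plus whitespace. With it cleared, the first slash is copied through as a division operator, the block comment swallows the rest, and what remains is a / b.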

I have not checked any compiler's parser implementation, or whether the
standard mandates this, so could this be compiler dependent?

No. Your decision not to check whether the Standard mandates a given
behaviour does not affect the wording of the Standard or the
conformance of implementations to that Standard.

linq936 · May 8, 2007

That is it.

Thanks for the elaboration.
