macros

borophyll · Aug 23, 2007

Hi

Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when the
macro invocation has none?

Regards, B

Eric Sosman · Aug 23, 2007

borophyll said:
Hi

Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when the
macro invocation has none?

The preprocessor is often -- and incorrectly -- said to
operate on the text of the program's source code. In fact,
it operates on tokens (formally, "preprocessing tokens")
that have been derived from the text. These tokens simply
come one after another, pretty maids all in a row; they do
not need to be separated by spaces. Equally, though, they
do not magically combine with each other just because they
happen to be adjacent.

So: By the time the sample code reaches the preprocessor,
it has been divided into three tokens which we may represent as

A
.
B

Macro substitution replaces A and B with other tokens -- one
apiece, in this case -- leading to

1
.
2

They are still three distinct and separate tokens, and do
not somehow collapse into a single 1.2 token.

Similarly,

#define S /
#define A *
S* A/

... is a syntax error, not a comment. Macro substitution
operates on already-scanned tokens, not on text that will
later be scanned into tokens.
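A small sketch of the same point, using a hypothetical macro NEG that does not appear in the post above. After expansion the two minus signs are separate tokens, so the expression means -(-x) and is never re-read as the decrement operator:

```c
/* NEG is a hypothetical macro used only for this illustration. */
#define NEG -

/* After macro replacement the token sequence is:  -  -  x
   The two '-' tokens stay separate; they are not re-scanned as '--'. */
int negate_twice(int x)
{
    return -NEG x;
}
```

Calling negate_twice(3) should give back 3. A preprocessor that writes its result out as text has to print a blank between the two minus signs for exactly this reason.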

As for the spaces you see, well, that's just a sort of
artifact introduced for the sake of convenience. Remember,
the preprocessor operates on tokens and not on text. What
you see is the compiler's attempt to present these tokens in
a readable form. But this form can only be an approximation,
because the process that converts source characters to tokens
is not always invertible. That is one of the reasons the
Standard does not specify what the preprocessor's output looks
like, nor even that the output must be available outside the
compiler itself.

CBFalconer · Aug 23, 2007

borophyll said:
Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when
the macro invocation has none?

Because that is specified by the C standard. As a purely practical
matter, it keeps individual words separated in macros.

pete · Aug 23, 2007

borophyll said:
Hi

Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when the
macro invocation has none?

Why did you put a dot in between A and B?

What should AB mean?

Keith Thompson · Aug 23, 2007

Eric Sosman said:
The preprocessor is often -- and incorrectly -- said to
operate on the text of the program's source code. In fact,
it operates on tokens (formally, "preprocessing tokens")
that have been derived from the text. These tokens simply
come one after another, pretty maids all in a row; they do
not need to be separated by spaces. Equally, though, they
do not magically combine with each other just because they
happen to be adjacent.

[...]

Quite correct.

However, some preprocessors are *implemented* as text-to-text
translators; they take input text, analyze it as a sequence of
preprocessor tokens, and generate output text that is then analyzed by
later compiler phases as a sequence of tokens.

In this case, the preprocessor implementation inserts blanks between
'1', '.', and '2' precisely so that the next phase will see them as
distinct tokens, rather than as a single token '1.2' (a floating
constant).

The fact that the preprocessor implementation allows you to see its
output is just an extra feature, not required by the language. It
could just as correctly have produced a sequence of tokens in some
binary form, as long as the next compilation phase is able to
interpret that form as tokens.

(Historically, of course, the preprocessor was a separate program that
did text processing, as it still is in some implementations. This
stuff about "preprocessor tokens" and "tokens" is a formalization of
what it does; text-to-text processing is now just one way to implement
that formalized process.)

borophyll · Aug 23, 2007

CBFalconer said:
Because that is specified by the C standard.

Care to quote which part?

regards, B

Ivan Gotovchits · Aug 23, 2007

borophyll said:
Hi

Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when the
macro invocation has none?

Regards, B

Maybe you want this:
A##.##B /* No dots will be */

CBFalconer · Aug 23, 2007

Keith said:
.... snip ...

The preprocessor is often -- and incorrectly -- said to operate
on the text of the program's source code. In fact, it operates
on tokens (formally, "preprocessing tokens") that have been
derived from the text. These tokens simply come one after
another, pretty maids all in a row; they do not need to be
separated by spaces. Equally, though, they do not magically
combine with each other just because they happen to be adjacent.

[...]

Quite correct.

This sounds as if you are agreeing, which I trust is not so. The
point is that the tokens may be alphabetical (or numeric) strings,
and need to remain separated. For example:

#define foo(a, b) a b
....
foo(sizeof, char) ---> sizeof char
or ...> sizeofchar

which have much different meanings.
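To make that concrete, here is a minimal sketch, assuming the macro is meant to take two plain parameters and emit them side by side. The arguments unsigned and char are my substitution for illustration, because sizeof char without parentheses would not compile on its own:

```c
#include <stddef.h>

/* Two arguments emitted one after the other as two separate tokens. */
#define foo(a, b) a b

/* foo(unsigned, char) expands to the two tokens  unsigned  char,
   which together name a type.  If the tokens were glued into the
   single identifier "unsignedchar" this would not name any type. */
size_t size_via_macro(void)
{
    return sizeof(foo(unsigned, char));
}
```

size_via_macro() should return 1, since sizeof(unsigned char) is 1 by definition.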

Laurent Deniau · Aug 23, 2007

Ivan Gotovchits said:
Maybe you want this:
A##.##B /* No dots will be */

This doesn't work and gives 1##.##2

#define AB(a,b) AB_(a,b)
#define AB_(a,b) a ## . ## b

AB(A,B) -> 1.2

a+, ld.
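A self-contained sketch of the two-level macro above, for anyone who wants to check it. The ## operator only acts inside a macro's replacement list, which is why the bare A##.##B passes through unpasted. STR and XSTR are helper macros added here for illustration and are not from the post:

```c
#define A 1
#define B 2

/* Two-level macro from the post: AB expands its arguments first,
   then AB_ pastes the resulting tokens together. */
#define AB(a, b)  AB_(a, b)
#define AB_(a, b) a ## . ## b

/* Helper macros (an addition for this example) that turn the fully
   expanded result into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* AB(A, B) becomes the single floating constant 1.2 */
double pasted_value(void)
{
    return AB(A, B);
}

/* The same single token, stringified. */
const char *pasted_text(void)
{
    return XSTR(AB(A, B));
}
```

Here pasted_value() should return 1.2 and pasted_text() should return the string "1.2", which shows that pasting really produced one token and not three.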

borophyll · Aug 23, 2007

Are the # and ## operators only allowed within a macro definition?

B.

Kenneth Brody · Aug 23, 2007

borophyll said:
Hi

Can anyone explain why the output of the following code:

#define A 1
#define B 2
A.B

after preprocessing only, is:

1 . 2

Why does the preprocessor put spaces between each character, when the
macro invocation has none?

Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

Note that it does replace "A->B" with "1->2", which is why I may not
have noticed this before. I do have code with macros for struct
entries, but they're probably all within pointers.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+

Keith Thompson · Aug 23, 2007

Kenneth Brody said:
Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

Probably, but it's hard to be certain by looking at the output of the
preprocessor, which is normally internal to the compiler. The token
sequence 1 . 2 can't appear in a legal program; if the behavior
doesn't affect any legal programs, and doesn't cause the compiler to
fail to issue a diagnostic for some program with a syntax error or
constraint violation, then strictly speaking it's not a bug as far as
the C standard is concerned. (If your preprocessor has an additional
requirement to generate textual output, it may be a bug if it doesn't
meet that requirement correctly.)

Try the following program. It should print "ok".

#include <stdio.h>

#define A a
#define B b
int main(void)
{
    struct { int b; } a;
    a.b = 42;
    if (A.B == 42) {
        puts("ok");
    }
    else {
        puts("bug");
    }
    return 0;
}

But here A and B expand to identifiers, not integer constants, so if
there is a bug this may not trigger it.

Harald van Dĳk · Aug 23, 2007

CBFalconer said:
Because that is specified by the C standard.

The C standard does not require or even acknowledge any externally visible
intermediate forms between translation phases.

CBFalconer · Aug 23, 2007

Kenneth said:
Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

Your compiler is broken. What is it?

Eric Sosman · Aug 24, 2007

Kenneth said:
Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

No. The only required output is a diagnostic; anything
else is outside the scope of the Standard, hence not "broken"
by appeal to the Standard's requirements.

borophyll · Aug 24, 2007

I've just noted another funny thing. Why does

#define A stdio
#define B h
A.B

produce stdio.h and not stdio . h? Would it not see these as three
individual tokens just as in the 1 . 2 example? Why does it combine
them in this instance and not the other? I suspect it is because
there is no ambiguity with stdio.h, which cannot possibly be confused
as a single token in the main compilation phase, whereas 1.2 can??

regards, B.

Keith Thompson · Aug 24, 2007

borophyll said:
I've just noted another funny thing. Why does

#define A stdio
#define B h
A.B

produce stdio.h and not stdio . h? Would it not see these as three
individual tokens just as in the 1 . 2 example? Why does it combine
them in this instance and not the other? I suspect it is because
there is no ambiguity with stdio.h, which cannot possibly be confused
as a single token in the main compilation phase, whereas 1.2 can??

Once again, the preprocessor (as far as the standard is concerned)
produces a sequence of tokens that are processed by later compilation
phases. The standard doesn't specify how this sequence of tokens is
represented.

One common implementation (but not the only one) is for the
preprocessor to generate text as output, and for the later compiler
phases to parse the preprocessor's output just as if it were C source
code (i.e., if the preprocessor's input is legal C, its output is also
legal C). In that representation,
stdio.h
and
stdio . h
are exactly equivalent. But given
#define A 1
#define B 2
A.B
the outputs
1.2
and
1 . 2
are different (one token vs. three), so a text-based preprocessor has
to insert blanks to ensure that later compiler phases see the correct
token sequences.
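As a rough sketch of how a text-emitting preprocessor might decide where blanks are needed: the helper below is hypothetical and deliberately incomplete, and it is not a description of what any real compiler does. It assumes both arguments are non-empty token spellings and answers "could these two be re-read differently if printed with nothing between them?"

```c
#include <ctype.h>
#include <string.h>

/* Characters that can continue an identifier, keyword or number. */
static int is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Does the spelling look like a preprocessing number (1, 2.5, .5)? */
static int is_number(const char *tok)
{
    return isdigit((unsigned char)tok[0]) ||
           (tok[0] == '.' && isdigit((unsigned char)tok[1]));
}

/* Hypothetical helper: return 1 if printing prev directly followed by
   next might be re-read as different tokens, 0 if they can touch. */
int needs_space(const char *prev, const char *next)
{
    char last  = prev[strlen(prev) - 1];
    char first = next[0];

    if (is_word_char(last) && is_word_char(first))
        return 1;   /* sizeof char  would become  sizeofchar */
    if (is_number(prev) && (first == '.' || first == '+' || first == '-'))
        return 1;   /* 1 .  would become  1.  ;  1e +  would become  1e+ */
    if (strcmp(prev, ".") == 0 && isdigit((unsigned char)first))
        return 1;   /* . 2  would become  .2 */
    if (ispunct((unsigned char)last) && ispunct((unsigned char)first))
        return 1;   /* - -  would become  --  (conservative) */
    return 0;
}
```

With these rules "1", ".", "2" get blanks between them while "stdio", ".", "h" do not, which matches the two outputs discussed above. Other cases, such as an identifier L immediately before a string literal, are not handled here.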

CBFalconer · Aug 24, 2007

Keith said:
.... snip ...

One common implementation (but not the only one) is for the
preprocessor to generate text as output, and for the later
compiler phases to parse the preprocessor's output just as if it
were C source code (i.e., if the preprocessor's input is legal C,
its output is also legal C). In that representation,
stdio.h
and
stdio . h
are exactly equivalent. But given
#define A 1
#define B 2
A.B
the outputs
1.2
and
1 . 2
are different (one token vs. three), so a text-based preprocessor
has to insert blanks to ensure that later compiler phases see the
correct token sequences.

Nicely explained. I have always been basing my attitude purely on
the textual version, which was short sighted.

Kenneth Brody · Aug 24, 2007

Kenneth Brody wrote:
[...]

#define A 1
#define B 2
A.B

[...]
Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

Stranger still. Given:

#define A 111
#define B 222
#define C foo

A.B
C.B

This expands:

A.B --> 111.B
C.B --> foo.222

(In case anyone cares, this is MSVC 6.0a, aka "version 12.00.8804
for 80x86". I have other compilers on other systems, but this is
the one I'm in front of for the nonce.)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+

Keith Thompson · Aug 24, 2007

Kenneth Brody said:
Kenneth Brody wrote:
[...]

#define A 1
#define B 2
A.B

[...]
Strange... My compiler's preprocessor outputs "1.B". (No spaces,
but it failed to expand "B".) Is it broken?

Stranger still. Given:

#define A 111
#define B 222
#define C foo

A.B
C.B

This expands:

A.B --> 111.B
C.B --> foo.222

(In case anyone cares, this is MSVC 6.0a, aka "version 12.00.8804
for 80x86". I have other compilers on other systems, but this is
the one I'm in front of for the nonce.)

Interesting. As far as I can tell, we have yet to see a case where
the unexpected preprocessor output can affect the legality of a
program. Perhaps the preprocessor is clever enough to give up in
cases where it knows that its output is going to be syntactically
illegal.
