Case - said:
The preprocessor takes several steps (called phases of translation). The first
is the translation of external encoding, universal character names, and
trigraphs into some uniform internal format. The second phase deletes all
instances of backslash immediately followed by a newline. The third phase
tokenizes the result of the above--from that point on all processing is done
with preprocessing tokens--not text--until the seventh phase (where they are
converted to regular tokens). Macro expansion, specifically, doesn't happen
until the fourth phase.
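A small C sketch of that ordering, offered only as an illustration (the names VALUE, spliced, in_string, and check_phases are made up for this example). It assumes a conforming preprocessor and shows that line splicing happens before tokenization, and that macro replacement then works on whole preprocessing tokens:

```c
#include <assert.h>
#include <string.h>

#define VALUE 42

/* Phase 2 deletes the backslash-newline pair before phase 3 tokenizes,
   so the two halves below become the single identifier VALUE, which
   phase 4 then replaces with 42. */
static int spliced(void) { return VA\
LUE; }

/* A string literal is one preprocessing token, so the characters VALUE
   inside it are never seen as a macro name. */
static const char *in_string(void) { return "VALUE"; }

/* Self-check of the two claims above. */
static void check_phases(void) {
    assert(spliced() == 42);
    assert(strcmp(in_string(), "VALUE") == 0);
}
```

If the preprocessor really did search and replace on raw text, the string literal would be fair game too. It isn't.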
Preprocessing tokens, BTW, are not a fancy way of saying "text" either. They
are distinct tokens--with a formal grammar and everything. The main differences
between "preprocessing tokens" and "tokens" are that 1) there are more of them,
and 2) the grammar for pp-numbers is much simpler. For example, # and ## are
preprocessing tokens, but they cannot be converted to tokens in
translation phase seven.
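To make the pp-number point concrete, here is a hedged sketch in C. The macro abc and the helpers STR, glued, split, odd, and check_pp_numbers are hypothetical names chosen for this example. It relies on stringizing so that the odd pp-numbers never have to survive conversion to real tokens:

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x) STR_(x)   /* expand the argument first, then stringize it */
#define abc 100          /* hypothetical macro, only for this illustration */

/* 12abc is a single pp-number, so abc is not a separate identifier here
   and is not macro-replaced. */
static const char *glued(void) { return STR(12abc); }

/* With whitespace there are two preprocessing tokens, and abc is replaced. */
static const char *split(void) { return STR(12 abc); }

/* 1.2.3 is a valid pp-number even though it could never become a token
   in phase seven; stringizing it means it never has to. */
static const char *odd(void) { return STR(1.2.3); }

/* Self-check of the three cases above. */
static void check_pp_numbers(void) {
    assert(strcmp(glued(), "12abc") == 0);
    assert(strcmp(split(), "12 100") == 0);
    assert(strcmp(odd(), "1.2.3") == 0);
}
```

A plain-text substitution of abc would have turned 12abc into 12100. The tokenizer decides what counts as a name before macro replacement ever looks at it.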
What matters to me is helping the OP better understand the
level at which CPP works. The term 'text processor' fits
well in this context of explaining the basics. I'm convinced
that the newbie OP is not helped much by strict definitions
in terms of 'preprocessing tokens'.
A term that is incorrect, like 'text processor', is never better. All it
does is perpetuate the common myth that the preprocessor operates on text--which
it doesn't.
I admit that you reminded me that the first leads to an error,
and say thanks! However, this does not change my opinion that
the metaphoric 'kind of text processor' can help in understanding
the differences between preprocessing and compilation.
In the metaphoric sense, the compiler of the underlying language is just as much
a text processor as the preprocessor. Despite what your intentions may have
been and despite what your level of understanding of the preprocessor may be,
simplifying something so as to make it incorrect (even if only slightly) is
misleading and causes poor understanding to continue.
Readdressing the OP's original issue... The preprocessor does not transform
"A[B]" to "*(A + B)". Even though the underlying language parser might do so
internally--it doesn't have to. It only has to treat the two syntaxes
equivalently. E.g. the identity relation compares equal semantically.
Further, regardless of what macro definitions exist, the preprocessing token
sequence
A[B]
cannot be transformed via macro expansion to
*(A + B)
unless it is used as an argument in another macro invocation. E.g. it would
have to be something like this:
#define EMPTY()
#define DEFER(id) id EMPTY()
#define EAT(...)
#define A *(A + DEFER(EAT)(
#define B ) B) DEFER(EAT)(
#define M(...) __VA_ARGS__)
M( A[B] ) // *(A + B)
In other words, it isn't easy to do even when you're trying. (The above
requires a pretty conformant preprocessor, BTW--like gcc).
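For anyone who wants to try it, here is one way to wrap those macros in a compilable C file. This is a sketch only: the array contents, the value of B, and the helper names via_subscript, via_macros, and check_rewrite are assumptions added for the test, and it has only been reasoned through against a conforming preprocessor of the gcc kind.

```c
#include <assert.h>

/* Hypothetical data; declared before the macros named A and B exist,
   because afterwards a bare A or B would be macro-replaced. */
static int A[] = {10, 20, 30};
static int B = 2;

/* Ordinary subscripting, compiled before any macro is defined. */
static int via_subscript(void) { return A[B]; }

#define EMPTY()
#define DEFER(id) id EMPTY()
#define EAT(...)
#define A *(A + DEFER(EAT)(
#define B ) B) DEFER(EAT)(
#define M(...) __VA_ARGS__)

/* M( A[B] ) expands to *(A + B); the deferred EAT calls swallow [ and ]. */
static int via_macros(void) { return M( A[B] ); }

/* Both spellings must read the same element. */
static void check_rewrite(void) {
    assert(via_subscript() == 30);
    assert(via_macros() == 30);
}
```

Note that the rewrite only works because A[B] is passed through M as an argument. Written on its own, A[B] would leave the deferred EAT invocations dangling.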
Regards,
Paul Mensonides