Is there a command that can take C source code as input and output a token tree?

learner1020

I know gcc compiles by converting C source code into a token
tree, but I don't know if there is a command-line option to make it output just
the token tree (in, say, XML format).

Thanks in advance.
 
Nobody

Not in XML. You can use e.g. -fdump-tree-original-raw to get the parse
tree as a list of nodes.
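To try this, a tiny input file is enough. Below is an illustrative sample.c (the file name and the function are arbitrary) with the gcc invocation in its comment. gcc writes the node list to a separate dump file rather than to the terminal, and the exact name of that file varies between gcc versions, so treat the name pattern in the comment as approximate.

```c
/* sample.c: a tiny translation unit to feed to gcc's tree dump.
 *
 * Illustrative command:
 *   gcc -c -fdump-tree-original-raw sample.c
 *
 * Besides sample.o, gcc writes a dump file whose name ends in ".original"
 * (the numbered part of the name differs between gcc versions). With the
 * -raw variant each tree node is listed with an @N id and its fields.
 */
int add(int a, int b)
{
    return a + b;
}
```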
 
Jorgen Grahn

And if I recall correctly there are people experimenting with the gcc
source code in this area. People are interested in using gcc as a C++
parser for use in static analysis, because it's so hard to write one
from scratch. (This might not apply to the C compiler; I don't know
much about this.)

/Jorgen
 
Gene

If you are not tied to gcc, look at clang. I recall that one of the
project's threads of work is to emit abstract syntax trees as XML for C,
Objective-C, and C++. I don't know where that effort stands. Clang is a
new codebase that had the benefit of "going to school" on gcc and of lots of
recent research and experience. Its code looks much easier to get a handle on
than gcc's.
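For comparison, here is a sample input with the Clang options that, as far as I know, dump the token stream and the AST. These are -Xclang (frontend) options rather than a stable public interface, the JSON form needs a reasonably recent Clang, and none of them produce XML. The file and function names are arbitrary.

```c
/* sample.c: input for Clang's dump options (illustrative commands):
 *   clang -fsyntax-only -Xclang -dump-tokens sample.c      token stream
 *   clang -fsyntax-only -Xclang -ast-dump sample.c         AST as text
 *   clang -fsyntax-only -Xclang -ast-dump=json sample.c    AST as JSON (newer Clang)
 */
struct point {
    int x, y;
};

/* Sum of absolute coordinates; just something with a few node kinds in it. */
int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```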
 
BGB / cr88192

Jorgen Grahn said:
And if I recall correctly there are people experimenting with the gcc
source code in this area. People are interested in using gcc as a C++
parser for use in static analysis, because it's so hard to write one
from scratch. (This might not apply to the C compiler; I don't know
much about this.)

parsing C is not particularly difficult...

a few kloc of code can do the trick, although it may take a little work to
understand how to write it (it helps to first have experience with simpler
languages, like Scheme and JavaScript, as each gives experience and
a foundation to build on).


(the real evils are deeper in the compiler internals...).

if my server were up right now (it has been down lately because internet
bandwidth here is too limited, and others complain if I "waste" the bandwidth
on something as trivial as running a webserver...), I could post a
link to my parser, which can parse C (and also Java and C#) and emits an
XML-based AST (not a token tree / CST, though, if that is what the OP
wanted).
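To illustrate what an XML-based AST can look like (this is not BGB's parser, whose format isn't shown here), here is a minimal sketch in C: a hypothetical struct ast node type and a recursive writer that renders it as nested XML elements, escaping attribute values. The element names and the value attribute are made up for the example.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST node: element name, optional value, up to two children. */
struct ast {
    const char *tag;            /* e.g. "binary", "ident", "number" */
    const char *value;          /* operator or spelling; NULL if none */
    const struct ast *kid[2];   /* NULL where a child is absent */
};

/* Fixed-size output buffer; text that does not fit is silently truncated. */
struct out {
    char *buf;
    size_t cap, len;
};

static void emit(struct out *o, const char *fmt, ...)
{
    va_list ap;
    if (o->len + 1 >= o->cap)           /* no room left */
        return;
    va_start(ap, fmt);
    int n = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (n > 0)
        o->len += (size_t)n;
    if (o->len >= o->cap)               /* output was cut short */
        o->len = o->cap - 1;
}

/* Write s with the XML special characters replaced by entities. */
static void emit_escaped(struct out *o, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<':  emit(o, "&lt;");   break;
        case '>':  emit(o, "&gt;");   break;
        case '&':  emit(o, "&amp;");  break;
        case '"':  emit(o, "&quot;"); break;
        default:   emit(o, "%c", *s); break;
        }
    }
}

/* Recursively write a node and its children as nested elements. */
static void to_xml(struct out *o, const struct ast *n)
{
    emit(o, "<%s", n->tag);
    if (n->value != NULL) {
        emit(o, " value=\"");
        emit_escaped(o, n->value);
        emit(o, "\"");
    }
    if (n->kid[0] == NULL && n->kid[1] == NULL) {
        emit(o, "/>");                  /* leaf: empty element */
        return;
    }
    emit(o, ">");
    for (int i = 0; i < 2; i++)
        if (n->kid[i] != NULL)
            to_xml(o, n->kid[i]);
    emit(o, "</%s>", n->tag);
}

/* Render the tree rooted at n into buf (cap must be > 0); returns buf. */
static const char *ast_to_xml(const struct ast *n, char *buf, size_t cap)
{
    struct out o = { buf, cap, 0 };
    buf[0] = '\0';
    to_xml(&o, n);
    return buf;
}
```

For a tree built from "a < 1", with a "binary" node holding "<" and two leaves, ast_to_xml produces <binary value="&lt;"><ident value="a"/><number value="1"/></binary>.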


personally, my bias is to avoid things like parser generators, as to me they
seem more like a trick to make people *think* they are making the task
easier for themselves, while setting themselves up for much pain once they get
past simple languages and into languages with all sorts of bizarre stuff
going on (such as tokens which may or may not exist, or may be parsed
differently depending on context, as in languages such as C++ or
C#, or syntax which is ambiguous without knowing prior declarations,
as in C and C++, ...).
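A concrete case of the declaration-dependent ambiguity: in C, "T * x;" declares x as a pointer if T is a typedef name, and is a multiplication expression otherwise. A common way for hand-written C parsers to deal with this (often called the "lexer hack") is to keep a table of typedef names and consult it when an identifier is seen. The sketch below uses a flat, hypothetical global table and ignores scoping, so it shows only the core idea, not a complete solution.

```c
#include <stdbool.h>
#include <string.h>

/* "T * x;" has two readings in C:
 *   typedef int T;  T * x;   declares x as a pointer to int
 *   int T, x;       T * x;   multiplies T by x (result discarded)
 * The parser can only choose once it knows what T currently names. */

enum stmt_kind { STMT_DECLARATION, STMT_EXPRESSION };

/* Hypothetical, flat table of typedef names seen so far (no scoping). */
#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Called by the parser after it finishes a typedef declaration. */
static void declare_typedef(const char *name)
{
    if (typedef_count < MAX_TYPEDEFS)
        typedef_names[typedef_count++] = name;
}

static bool is_typedef_name(const char *name)
{
    for (int i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return true;
    return false;
}

/* Decide how to parse a statement of the form "IDENT * IDENT ;". */
static enum stmt_kind classify_star_statement(const char *first_ident)
{
    return is_typedef_name(first_ident) ? STMT_DECLARATION : STMT_EXPRESSION;
}
```

A real C parser also has to handle scopes, since an inner declaration can hide a typedef name, so the table would need to be a stack of scopes rather than one flat list.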

personally, I am a fan of hand-written recursive descent, as IME it seems to
work fairly well, and I just haven't really run into problems where parser
generators would seem to be the right tool for the job.
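Here is a minimal sketch of the hand-written recursive-descent style, using a small integer expression grammar rather than C's full grammar. Each grammar rule becomes one C function; operator precedence comes from which rule calls which, and the loops give left associativity. The names (struct parser, parse_expr, eval, ...) are made up for the example, and error handling is deliberately crude (no overflow checks).

```c
#include <ctype.h>

/* Grammar, one function per rule:
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')' | '-' factor            */
struct parser {
    const char *s;  /* current position in the input */
    int err;        /* set to 1 on any syntax error or division by zero */
};

static void skip_ws(struct parser *p)
{
    while (isspace((unsigned char)*p->s))
        p->s++;
}

static long parse_expr(struct parser *p);

static long parse_factor(struct parser *p)
{
    skip_ws(p);
    if (*p->s == '(') {
        p->s++;
        long v = parse_expr(p);
        skip_ws(p);
        if (*p->s == ')')
            p->s++;
        else
            p->err = 1;                 /* missing ')' */
        return v;
    }
    if (*p->s == '-') {                 /* unary minus */
        p->s++;
        return -parse_factor(p);
    }
    if (!isdigit((unsigned char)*p->s)) {
        p->err = 1;
        return 0;
    }
    long v = 0;
    while (isdigit((unsigned char)*p->s))
        v = v * 10 + (*p->s++ - '0');
    return v;
}

static long parse_term(struct parser *p)
{
    long v = parse_factor(p);
    for (;;) {
        skip_ws(p);
        char op = *p->s;
        if (op != '*' && op != '/')
            return v;
        p->s++;
        long rhs = parse_factor(p);
        if (op == '*')
            v *= rhs;
        else if (rhs == 0) {
            p->err = 1;
            return 0;
        } else
            v /= rhs;
    }
}

static long parse_expr(struct parser *p)
{
    long v = parse_term(p);
    for (;;) {
        skip_ws(p);
        char op = *p->s;
        if (op != '+' && op != '-')
            return v;
        p->s++;
        long rhs = parse_term(p);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Returns 1 and stores the result in *out on success, 0 on error. */
static int eval(const char *src, long *out)
{
    struct parser p = { src, 0 };
    long v = parse_expr(&p);
    skip_ws(&p);
    if (p.err || *p.s != '\0')
        return 0;
    *out = v;
    return 1;
}
```

For example, eval("1 + 2 * 3", &v) stores 7 and eval("10 - 4 - 3", &v) stores 3. A C parser written this way has many more rules, but each one follows the same pattern.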

a lexer may make sense to generate from a tool, although personally I don't
really think this is necessary either.
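Since tokens are what the original question asked about, here is a hand-written lexer sketch: a single next_token function that skips whitespace and comments, then recognizes identifiers, a few keywords, integer-like numbers, and one- or two-character punctuators, trying the two-character form first. The keyword and punctuator lists are deliberately incomplete, and string/character literals, floating constants, and preprocessor lines are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok_kind { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_NUMBER, TOK_PUNCT };

struct token {
    enum tok_kind kind;
    const char *start;  /* points into the source text, not NUL-terminated */
    size_t len;
};

/* Deliberately incomplete keyword list, for illustration only. */
static const char *const keywords[] = {
    "char", "else", "for", "if", "int", "return", "struct", "typedef", "void", "while"
};

/* Two-character punctuators, tried before falling back to one character. */
static const char *const puncts2[] = {
    "==", "!=", "<=", ">=", "&&", "||", "++", "--", "->", "+=", "-="
};

static int is_keyword(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], s, len) == 0)
            return 1;
    return 0;
}

/* Return the next token from *pp and advance *pp past it. */
static struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t = { TOK_EOF, p, 0 };

    /* Skip whitespace, line comments and block comments. */
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            p = end ? end + 2 : p + strlen(p);  /* unterminated: stop at end */
        } else {
            break;
        }
    }

    t.start = p;
    if (*p == '\0') {
        /* end of input: TOK_EOF with length 0 */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.len = (size_t)(p - t.start);
        t.kind = is_keyword(t.start, t.len) ? TOK_KEYWORD : TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* Crude: accepts 42, 0x1F, 10u; floating constants are not handled. */
        while (isalnum((unsigned char)*p))
            p++;
        t.len = (size_t)(p - t.start);
        t.kind = TOK_NUMBER;
    } else {
        t.kind = TOK_PUNCT;
        t.len = 1;
        for (size_t i = 0; i < sizeof puncts2 / sizeof puncts2[0]; i++)
            if (strncmp(p, puncts2[i], 2) == 0) {
                t.len = 2;
                break;
            }
        p += t.len;
    }
    *pp = p;
    return t;
}
```

Calling next_token in a loop until it returns TOK_EOF yields the token stream. Printing each token's kind and text, or wrapping each one in an XML element, would give roughly the output the original question asked for.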

or such...
 
