Problem of finding function names in any C file

athiane

I want a way to parse out all function names that appear in a couple of
C files.
When the parsing logic finds a function name in a file, it should print
out the Function name, line number and file in which the Function was
found.

What approach should I follow to tackle this problem?
Is there any option in the gcc compiler that prints out this
information?
 
pemo

athiane said:
I want a way to parse out all function names that appear in a couple
of C files.
When the parsing logic finds a function name in a file, it should
print out the Function name, line number and file in which the
Function was found.

What approach should I follow to tackle this problem?
Is there any option in the gcc compiler that prints out this
information?

Don't know what you mean by 'parse out' - display in a console, provide
output/code for 'homework'?
 
Micah Cowan

pemo said:
Don't know what you mean by 'parse out' - display in a console, provide
output/code for 'homework'?

'parse out' has a pretty clear meaning, IMHO. It means to recognize it
apart from everything else in the file in question.

To answer the OP's question: this is not a trivial problem. In order
to do what you want, you have to write a program that understands C:
you'll need a preprocessor, a lexical scanner, and a grammar
parser. While tools like lex and yacc (flex/bison) make this easier,
it's overkill.

Is this a homework assignment? If so, please provide more context: the
problem as stated is far too difficult to be mere homework. Does the
source file have to conform to certain additional restrictions?

If this is solely for your own benefit, I strongly suggest you look
for programs that already do this (why reinvent the wheel?). The
standard "ctags" program can do something awfully close to, if not
exactly what you need. Check out http://ctags.sourceforge.net/. You
should be able to use grep, perhaps in combination with sed or awk, to
make it do /exactly/ what you want. For more information on any of
those tools (all of which are off-topic for this newsgroup), please
ask at comp.unix.programmer.

HTH.
 
John Tsiombikas (Nuclear / Mindlapse)

Micah Cowan said:
To answer the OP's question: this is not a trivial problem. In order
to do what you want, you have to write a program that understands C:
you'll need a preprocessor, a lexical scanner, and a grammar
parser. While tools like lex and yacc (flex/bison) make this easier,
it's overkill.

I might be missing something, since I haven't given much thought to the
matter admittedly, but I believe that detecting the functions should be
quite easy. I remember I did something similar for C++ and it didn't
work because my simple algorithm was also matching constructors. (so I
then did a preprocessing stage to collect everything followed by the
class keyword in a file, and exclude those, but that's way OT now).

So my idea would be to first pass the source from a pre-processor to get
rid of macros, and then match every valid symbol (sequence of characters
and numbers starting from a character), followed by a '('.

Am I missing something obvious that would ruin this?
 
Keith Thompson

John Tsiombikas (Nuclear / Mindlapse) said:
I might be missing something, since I haven't given much thought to the
matter admittedly, but I believe that detecting the functions should be
quite easy. I remember I did something similar for C++ and it didn't
work because my simple algorithm was also matching constructors. (so I
then did a preprocessing stage to collect everything followed by the
class keyword in a file, and exclude those, but that's way OT now).

So my idea would be to first pass the source from a pre-processor to get
rid of macros, and then match every valid symbol (sequence of characters
and numbers starting from a character), followed by a '('.

Am I missing something obvious that would ruin this?

I think that by "valid symbol (sequence of characters and numbers
starting from a character)", you really mean (or *should* mean) "valid
identifier (sequence of letters, digits, and underscores starting with
a letter or underscore)".

According to your description, you'd find all function *calls* as well
as function definitions and declarations; it's hard to tell whether
that's consistent with the requirements. You'd also find things like
sizeof(int), unless you filter out keywords.

You'll also miss some legal occurrences of function names like:
(printf)("Hello, world\n");
and catch some occurrences of names of function pointer objects.

It's probably possible to find all function names using some simplified
grammar. The problem is simpler if you assume the source is correct
and don't care about catching syntax errors.

On the other hand, since full C parsers already exist in the wild,
adapting one to give you the information you want is probably easier
than writing a simpler parser that does *only* what you want.
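
As a rough illustration of the two corrections above (the identifier rule and the keyword filter), here is a minimal sketch in C. The function names is_identifier, is_paren_keyword and looks_like_function_name are made up for this example, and the keyword list is deliberately short and not exhaustive. It only classifies a single token that has already been found in front of a '('; it does not distinguish calls from definitions, and it does nothing about the (printf)(...) form.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if s is a valid C identifier: letters, digits and underscores,
   starting with a letter or an underscore. */
int is_identifier(const char *s)
{
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_'))
            return 0;
    }
    return 1;
}

/* Nonzero if s is a keyword that can legally be followed by '('.
   Assumption: this short list is enough for illustration only. */
int is_paren_keyword(const char *s)
{
    static const char *const keywords[] = {
        "if", "for", "while", "switch", "return", "sizeof"
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    }
    return 0;
}

/* Nonzero if a token seen directly before '(' could be a function name. */
int looks_like_function_name(const char *s)
{
    return is_identifier(s) && !is_paren_keyword(s);
}
```

With these definitions, "printf" and "_foo1" are accepted, while "sizeof" and "1foo" are rejected.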
 
Jaspreet

athiane said:
I want a way to parse out all function names that appear in a couple of
C files.
When the parsing logic finds a function name in a file, it should print
out the Function name, line number and file in which the Function was
found.

What approach should I follow to tackle this problem?
Is there any option in the gcc compiler that prints out this
information?

Why reinvent the wheel when there are so many wheel manufacturers
around? You could use any of the freely available parsers (like ctags).
I have been using cscope, though.

I don't know much about ctags, but cscope does exactly that (prints the
function name, line number and file name) when it finds a function call
in a file.

Would they not serve you, or do you have some specific requirements?
 
John Tsiombikas (Nuclear / Mindlapse)

Keith Thompson said:
I think that by "valid symbol (sequence of characters and numbers
starting from a character)", you really mean (or *should* mean) "valid
identifier (sequence of letters, digits, and underscores starting with
a letter or underscore)".

Yes, that's what I meant; I forgot to include the _ and used the wrong
term.

Keith Thompson said:
According to your description, you'd find all function *calls* as well
as function definitions and declarations; it's hard to tell whether
that's consistent with the requirements. You'd also find things like
sizeof(int), unless you filter out keywords.

You'll also miss some legal occurrences of function names like:
(printf)("Hello, world\n");
and catch some occurrences of names of function pointer objects.

True, you are absolutely right. I forgot about sizeof(int) and calls
through function pointers, and (printf)("Hello, world\n"); wouldn't
even have crossed my mind. I had to look twice to realize that it calls
a function: most unusual, but valid :)

So it is indeed more involved than I initially thought.
 
CBFalconer

John Tsiombikas (Nuclear / Mindlapse) said:
I might be missing something, since I haven't given much thought
to the matter admittedly, but I believe that detecting the
functions should be quite easy. I remember I did something
similar for C++ and it didn't work because my simple algorithm
was also matching constructors. (so I then did a preprocessing
stage to collect everything followed by the class keyword in a
file, and exclude those, but that's way OT now).

So my idea would be to first pass the source from a pre-processor
to get rid of macros, and then match every valid symbol (sequence
of characters and numbers starting from a character), followed by
a '('.

Am I missing something obvious that would ruin this?

Don't bother with the macro sweep; that way you also pick up
function-like macros. My xrefc program does this, and tags all
function references with a terminal () in the symbol table. That
distinguishes them from pointer references, but the results are
adjacent in the output. If, in addition, you organize the code
properly (definition before use), the first occurrence of the
function name is its definition. Handy. You don't need to bother
following include files.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
Netocrat

"John Tsiombikas (Nuclear / Mindlapse)" <[email protected]> writes:
[re a simple parser to extract a list of function names from C source]
I think that by "valid symbol (sequence of characters and numbers
starting from a character)", you really mean (or *should* mean) "valid
identifier (sequence of letters, digits, and underscores starting with
a letter or underscore)".

According to your description, you'd find all function *calls* as well
as function definitions and declarations; it's hard to tell whether
that's consistent with the requirements. You'd also find things like
sizeof(int), unless you filter out keywords.

You'll also miss some legal occurrences of function names like:
(printf)("Hello, world\n");
and catch some occurrences of names of function pointer objects.

You could also get false positives from string literals:
char help_msg[] = "Declare the prototype as int somefunc(int);";
 
CBFalconer

Netocrat said:
Keith Thompson wrote:
.... snip on parsing function names ...
You'll also miss some legal occurrences of function names like:
(printf)("Hello, world\n");
and catch some occurrences of names of function pointer objects.

You could also get false positives from string literals:
char help_msg[] = "Declare the prototype as int somefunc(int);";

Not if you make the lexical scanner handle complete strings in the
first place. Pseudocode:

while (EOF != (ch = getnextch())) {
    switch (chclass(ch)) {
    case alpha:
    case '_': acquireid(); break;
    case '"': acquirestring(); break;
    case ' ': break;
    .....
    }
}


 
athif2

I want to be able to find function names in any C file and use that
information to build a hierarchical function call tree, with details of
the line number and file name each function is in. I would like to
print this function call tree to the console or to a file.
For example, given the C file test.c:

1 int func1()
2 {
3 func3();
4 }
5
6 int func2()
7 {
8 func4();
9 func5();
10 }
11
12
13 int main()
14 {
15
16 func1();
17 func2();
18
19 return 0;
20 }

The function call tree after processing above C file should be :

+main() (line=13, file=test.c)
|_____func1(line=16, file=test.c)
|     |_____func3(line=3, file=test.c)
|
|_____func2(line=17, file=test.c)
      |_____func4(line=8, file=test.c)
      |_____func5(line=9, file=test.c)

If some tool already provides such a listing, it would fit my needs.
Any shortcuts for tackling this problem are welcome.
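
For the printing half of this request, a sketch along the following lines might do, assuming the hard part (extracting caller/callee pairs from the source) has already been done by some other tool or scanner. The struct call record, the print_callees function and the MAX_DEPTH limit are all invented for this illustration; the indentation only approximates the layout shown above, and the root function's own line has to be printed separately by the caller.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16   /* assumed guard against recursive call chains */

/* Hypothetical record: one call site, as some earlier pass might report it. */
struct call {
    const char *caller;  /* function containing the call */
    const char *callee;  /* function being called */
    int line;            /* line number of the call site */
    const char *file;    /* file containing the call site */
};

/* Print everything reachable from `caller`, one call per line, indented
   by depth. Returns the number of lines printed. */
int print_callees(FILE *out, const struct call *calls, size_t ncalls,
                  const char *caller, int depth)
{
    int printed = 0;

    if (depth >= MAX_DEPTH)
        return 0;
    for (size_t i = 0; i < ncalls; i++) {
        if (strcmp(calls[i].caller, caller) != 0)
            continue;
        /* "%*s" with an empty string emits depth * 6 spaces of indentation */
        fprintf(out, "%*s|_____%s (line=%d, file=%s)\n", depth * 6, "",
                calls[i].callee, calls[i].line, calls[i].file);
        printed++;
        printed += print_callees(out, calls, ncalls, calls[i].callee,
                                 depth + 1);
    }
    return printed;
}
```

Filling an array of struct call with the five call sites from test.c and calling print_callees(stdout, calls, 5, "main", 0) would walk the tree depth-first. A linear search per node is fine for a couple of files; a larger code base would want an index by caller name.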
 
Rod Pemberton

athif2 said:
I want to be able to find function names from any C file and use that
information to form a hierarchical function call tree, with details on
which line number, file name the function is in. I would like it to
print this function call tree to the console prompt or a file.

As Jaspreet pointed out, there are a large number of parsers available.
Some of the specialized ones are check, cproto, cdecl, ctool, cxref, etc. I
think cxref is one of the ones you want. I recall there being another
program on DECUS which would build a complete searchable database from your
source, but I don't know what it was called. Most of these are available
from comp.sources.unix or DECUS.

comp.sources.unix, cxref is in volume1:
http://ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.unix/

DECUS:
Index (files unavailable) http://www.decus.org/encompass/software/
Files ftp://ftp.encompassus.org/lib/


Rod Pemberton
 
Netocrat

Netocrat said:
Keith Thompson wrote:
... snip on parsing function names ...
You'll also miss some legal occurrences of function names like:
(printf)("Hello, world\n");
and catch some occurrences of names of function pointer objects.

You could also get false positives from string literals:
char help_msg[] = "Declare the prototype as int somefunc(int);";

CBFalconer said:
Not if you make the lexical scanner handle complete strings in the first
place. Pseudocode:

while (EOF != (ch = getnextch())) {
    switch (chclass(ch)) {
    case alpha:
    case '_': acquireid(); break;
    case '"': acquirestring(); break;
    case ' ': break;
    ....
    }
}

Sure, there are ways to handle it, but as the suggestion stood there were
a few holes. You'd also need to make sure that the scanner didn't count
double-quotes within comments.
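
To make the holes discussed in this subthread concrete, here is one way such a scanner could look. This is only a sketch under stated assumptions, not anyone's actual program from the thread: struct hit, scan_candidates and is_keyword are names invented here, the input is taken as a single in-memory string rather than read with getnextch(), and the keyword list is not exhaustive. It skips comments, string literals and character literals, tracks line numbers, and reports every identifier followed by '('. It still reports calls and function-like macros alongside definitions, and it still misses (printf)("Hello, world\n").

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64

/* Hypothetical result record: a candidate name and the line it was seen on. */
struct hit {
    char name[MAX_NAME];
    int line;
};

/* Keywords that may be followed by '(' (short list, for illustration). */
static int is_keyword(const char *s)
{
    static const char *const kw[] = {
        "if", "for", "while", "switch", "return", "sizeof"
    };
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++)
        if (strcmp(s, kw[i]) == 0)
            return 1;
    return 0;
}

/* Scan src and store up to max candidates in out; returns the count stored. */
size_t scan_candidates(const char *src, struct hit *out, size_t max)
{
    const char *p = src;
    size_t n = 0;
    int line = 1;

    while (*p != '\0') {
        if (*p == '\n') {
            line++;
            p++;
        } else if (p[0] == '/' && p[1] == '*') {        /* block comment */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) {
                if (*p == '\n')
                    line++;
                p++;
            }
            if (*p != '\0')
                p += 2;
        } else if (p[0] == '/' && p[1] == '/') {        /* line comment */
            while (*p != '\0' && *p != '\n')
                p++;
        } else if (*p == '"' || *p == '\'') {           /* string or char literal */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                                /* skip the escaped character */
                if (*p == '\n')
                    line++;
                p++;
            }
            if (*p != '\0')
                p++;
        } else if (isdigit((unsigned char)*p)) {        /* number such as 1e5 or 0x1f */
            while (isalnum((unsigned char)*p) || *p == '_' || *p == '.')
                p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *start = p;
            const char *q;
            size_t len;

            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            len = (size_t)(p - start);
            for (q = p; *q == ' ' || *q == '\t'; q++)
                ;                                       /* allow "name (" */
            if (*q == '(' && len < MAX_NAME && n < max) {
                memcpy(out[n].name, start, len);
                out[n].name[len] = '\0';
                out[n].line = line;
                if (!is_keyword(out[n].name))
                    n++;                                /* keep this candidate */
            }
        } else {
            p++;
        }
    }
    return n;
}
```

A caller would read each file into a buffer, pass it to scan_candidates, and print the file name next to each name and line. Telling definitions apart from calls would need more context than this sketch keeps, which is the argument for adapting an existing parser or tool instead.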
 
