Regex for C function header

chris.ritchie

I'm parsing a C source file for function headers. Anyone know a good
algorithm to extract these?

It's difficult because I have to distinguish headers from calls, and
headers can look very different, e.g.:
void name(int var){

void
name(int var)
{

void name(
int var){

etc.

Thanks.
 
Dr.Ruud

(e-mail address removed) wrote:
I'm parsing a C source file for function headers. Anyone know a good
algorithm to extract these?

It's difficult because I have to distinguish headers from calls and
headers can look very different.
i.e.
void name(int var){

void
name(int var)
{

void name(
int var){

etc.

Those all look very similar:

/ (\w+) \s+ (\w+) \s*
[(] \s* ([^)]* \S) \s* [)] \s*
[{]
/x

(untested; it is supposed to set $1, $2 and $3)

But you won't get away with it. Think about how comments can change
things. Read 'man indent'.
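
A minimal sketch of how that pattern might be tried against a whole file held in one string rather than line by line, so that the \s+ and \s* parts can span newlines. The helper name find_headers and the sample text are made up for illustration, and the pattern is still the untested one above, so it shares its limits (comments, keywords such as "else if", pointer return types).

```perl
use strict;
use warnings;

# Hypothetical helper: returns [return_type, name, args] for every
# header the pattern finds in a string that holds the whole file.
sub find_headers {
    my ($source) = @_;
    my @found;
    while ( $source =~ / (\w+) \s+ (\w+) \s*
                         [(] \s* ([^)]* \S) \s* [)] \s*
                         [{]
                       /xg ) {
        push @found, [ $1, $2, $3 ];
    }
    return @found;
}

# The three layouts from the question, kept in one string.
my $sample = <<'END_C';
void name(int var){
}

void
name(int var)
{
}

void name(
int var){
}
END_C

printf "%s | %s | %s\n", @$_ for find_headers($sample);
```

With this sample it should print "void | name | int var" three times, once per layout.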
 
chris.ritchie

But wouldn't this require arbitrarily concatenating lines (from my
@file_lines)? I ran that without any tweaking and didn't get it to
match.

My life would be much easier if there couldn't be newlines between
words in the declaration.

Hmm, running Tidy on the source sounds like a good idea, but I'm not
sure I can bank the effectiveness of my algorithm on it -- you mean
counting tabs I assume.

I'm not concerned about comments. I could just remove them in a
preprocessing step.
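
A rough sketch of what such a preprocessing step might look like. It is only an assumption about how one could do it: the helper name strip_c_comments is made up, and the pattern ignores corner cases such as backslash-newline continuations. String and character literals are matched first so that comment markers inside them are left alone, and each comment is replaced by one space.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every C comment with a single space,
# leaving string and character literals untouched.
sub strip_c_comments {
    my ($source) = @_;
    $source =~ s{
        ( " (?: \\. | [^"\\] )* "      # a string literal, kept as it is
        | ' (?: \\. | [^'\\] )* '      # or a character literal
        )
      | /\* .*? \*/                    # a block comment, possibly multi-line
      | // [^\n]*                      # a line comment (C99 style)
    }{ defined $1 ? $1 : ' ' }gsex;
    return $source;
}

my $code = <<'END_C';
void name(int var) /* entry point */
{
    puts("// not a comment");   // but this is
}
END_C

print strip_c_comments($code);
```

Running the stripper before the header regex means a "{" or ";" inside a comment can no longer confuse the match.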

Thanks.
 
Xicheng Jia

But wouldn't this require arbitrarily concatenating lines (from my
@file_lines)? I ran that without any tweaking and didn't get it to
match.
=> My life would be much easier if there couldn't be newlines between
=> words in the declaration.

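You can read your file in slurp mode (perl -0777), so the whole file is
one string and the newlines inside a declaration no longer matter.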
Hmm, running Tidy on the source sounds like a good idea, but I'm not
sure I can bank the effectiveness of my algorithm on it -- you mean
counting tabs I assume.

You need at least some anchor to distinguish function names from
blocks. :) Here is a snippet that handles the data you gave; you
still need to do some more work on it. :)

perl -0777ne '
while( /^(?![\s#}\/])([^{]+)/mg ) {
    # turn each line break into one space, then drop whitespace after ( or )
    ($x=$1)=~s/\s*\n\s*/ /g;
    $x=~s/(?<=[()])\s+//g;
    print "$x;\n";
}' yourfile.c

___output_for_your_data___
void name(int var);
void name(int var);
void name(int var);
___
1. Add an anchor and use multi-line mode: ^ and the 'm' modifier.
^(?![\s#}\/]) : the first character on the line cannot be whitespace, #,
} or a slash.
([^{]+) : all characters up to the next '{'.
2. You may need to remove the variable names between the parentheses, though. :)
3. You may need to remove "int main();", though.
4. Top-level extern variable declarations... ??? I don't know how to handle those.
5. There is still a long way to go. :)

Xicheng
 
Xicheng Jia

Sorry, I was in a rush and made lots of errors in my English. :-(

I assumed your C code is well indented and only a few special lines
have no leading whitespace.

The following regexes work roughly on the function definitions in my
current C code. They may have problems with the declarations/definitions
of external variables (struct, enum, or union, which have an opening
brace), may miss something like

extern "C" {...}

and may have problems when the function arguments contain pointers
to functions.

But I guess you want to export only "extern" functions and not "static"
ones; in that case you might be able to add some more keywords to these
regexes. If so, things would be easier. :)
______________________________________
perl -0777ne '
# slurp in the C source code
# starting from the beginning of a line,
# capture all consecutive non-opening-brace characters
# into $1 and make sure the first char
# is not whitespace, #, }, {, /, * ...
while( /^([^\s#}{*\/][^{]+)/mg ) {
    my $x = $1;
    # skip function declarations
    # may also skip the first function definition. :(
    next if $x =~ /;/;
    # get rid of the names of the function arguments
    $x =~ s/\s+(\**)\s*\w+([,)])\s*/$1$2 /g;
    # get rid of extra whitespace
    $x =~ s/(?<=[()])\s+//g;
    # replace newlines in prototypes with a single space
    $x =~ s/\s*\n\s*/ /g;
    print "$x;\n";
}' file.c
_______________________________________
Have fun... :) hehe

XC..
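
The remark about "static" versus "extern" functions could be sketched roughly as follows. This is only an illustration under the same assumptions as the one-liner (well-indented code, comments already removed). The helper name exported_headers and the extra check that a header must end in ")" are additions made here to skip struct/enum/union bodies and initialisers; argument names are left in place.

```perl
use strict;
use warnings;

# Hypothetical helper: given a whole C file as one string, return the
# headers of the non-static function definitions it contains.
sub exported_headers {
    my ($source) = @_;
    my @headers;
    while ( $source =~ /^([^\s#}{*\/][^{]+)/mg ) {
        my $header = $1;
        next if $header =~ /;/;             # declarations, not definitions
        next if $header =~ /^static\b/;     # file-local, so not exported
        next if $header !~ /\)\s*$/;        # struct/enum/union bodies and the like
        $header =~ s/\s+/ /g;               # collapse newlines and runs of blanks
        $header =~ s/(?<=[()])\s+//g;       # drop whitespace after ( or )
        push @headers, "$header;";
    }
    return @headers;
}

my $source = <<'END_C';
static int helper(int a)
{
    return a;
}

void
name(int var)
{
}

struct point {
    int x;
};
END_C

print "$_\n" for exported_headers($source);
```

Keeping the filters as separate "next if" lines makes it easy to add more keywords later without touching the main pattern.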
 
chris.ritchie

Ah thanks, this is good. It's coming along.

But part of it is now matching words that aren't reserved.

Ordinarily, I'd put
$var !~ (for|if|case...)
to match any word that isn't reserved. But this is part of a large
affirmative match.

That is, it's
$var =~ <large block> <word that must not be reserved> <large block>

I need something like [^abcd] but instead of single characters, it
needs to match words.

Continuing thanks...
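
One common way to get "a word that is not one of these" inside a larger affirmative match is a negative lookahead, (?!...). The sketch below is a minimal illustration, not a complete header matcher: the keyword list is an assumed, partial one, and the names $reserved, $word and function_name are made up here.

```perl
use strict;
use warnings;

# Assumed keyword list; extend it to match your own code base.
my $reserved = qr/(?:if|else|for|while|do|switch|case|return|sizeof)\b/;

# "A word that is not reserved": (?!...) refuses to start at a keyword,
# and the leading \b stops \w+ from starting in the middle of one.
my $word = qr/\b(?!$reserved)\w+/;

# Hypothetical helper: the function name if $text starts like a header,
# otherwise undef. $word is simply dropped into the larger pattern.
sub function_name {
    my ($text) = @_;
    return $text =~ /^\s*$word\s+($word)\s*\([^)]*\)\s*\{/ ? $1 : undef;
}

for my $line ( 'void name(int var){', 'else if (var) {', 'int iffy(void) {' ) {
    my $found = function_name($line);
    printf "%-20s => %s\n", $line, defined $found ? $found : '(no match)';
}
```

The trailing \b in the keyword list matters: it lets identifiers such as "iffy" through while still rejecting a bare "if".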


 
