Source Code Analyzing

Daniel Zinn

Hey,

I am not sure if I tried hard enough[1], but perhaps you can point me to
where to continue reading, or even propose some code snippets. OK, here is
the problem:

I want to analyze a Perl program. I want to transform every "line/statement"
of a given Perl script into a more abstract syntax which only deals with
function and variable definitions and calls/usage.

So, basically my new "simplified" grammar looks like:
type VarName  = String  -- assumed: names are plain strings
type FuncName = String  -- assumed: names are plain strings
data Prog = Var VarName        -- a variable is used
          | Call FuncName      -- a function is called
          | Sub FuncName Prog  -- FuncName is a new function; body=Prog
          | SubClos Prog       -- an anon. function is created as closure
          | Dyn VarName        -- a "local" variable declaration
          | Lex VarName        -- a "my" variable declaration
          | Block Prog         -- just a { ... } block
          | Skip               -- something else
(sorry for this Haskell code over here, please don't feel offended ;))

So, since Perl is 'kind of' not easy to parse directly, I thought of using
the B module. After some browsing, it turned out that B::Xref is pretty
close. Unfortunately, I don't get the blocks, and the ordering is strange.
Since it should be very easy to get a representation like the one described
above, once you have the parse tree, I want to ask if someone has some
experience with the B module(s). If not, it would be nice if you could
point me to some reading where I could learn how to handle this framework.

Apart from this, I also want to do some variable assignment tracking, so I
really would like to understand how the parse tree is organized and how you
can do transformations on it. So, is there any good place to start reading?


Thank you so much,
Daniel

[1] I experimented with many B::???? modules,
tried to understand the B::Xref source code,
tried to figure out how to traverse the parse tree,
but could not find anywhere to start, for example how to get this
tree in the first place...
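
On the question of how to get hold of the tree at all: the following is a minimal sketch, not a full analyzer, of reaching a sub's op tree from inside a running script using only the core B module. The sub `foo` and the helper method name `record_name` are made up for illustration; `svref_2object`, `ROOT`, `walkoptree` and `name` are part of B itself.

```perl
use strict;
use warnings;
use B qw(svref_2object walkoptree);

our $x;

# Sample sub to inspect; it is compiled but never called here.
sub foo {
    {
        local $x = 5;
        print $x, "\n";
    }
    print "$x\n";
}

# walkoptree calls the named method on every op it visits, parents before
# children, so the method is placed in B::OP where all op classes inherit it.
# "record_name" is a hypothetical name chosen for this sketch.
my @op_names;
sub B::OP::record_name { push @op_names, $_[0]->name }

# svref_2object turns the code reference into a B::CV object;
# ROOT is the top op of that sub's tree.
walkoptree(svref_2object(\&foo)->ROOT, 'record_name');

print join(' ', @op_names), "\n";
```

For a whole script, running `perl -MO=Concise script.pl` prints the same kind of tree from the command line, which can be easier to read while getting oriented.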
 
Dr.Ruud

Daniel Zinn wrote:
DZ> I want to analyze a Perl program. I want to transform every
DZ> "line/statement" of a given Perl script into a more abstract syntax
DZ> which only deals with function and variable definitions and
DZ> calls/usage.

http://ali.as/ "Parsing Perl"

CPAN: PPI
 
Uri Guttman

R> Daniel Zinn wrote:
R> http://ali.as/ "Parsing Perl"

R> CPAN: PPI

i was going to mention that module too. it will do what the OP wanted
but it has one caveat, it isn't a true deep parser of perl5. it does a
high quality lexical analysis (which is what the OP wanted). it won't
load modules and pragmas which can change the syntax of following code
(which perl5 handles, of course).

uri
 
Daniel Zinn

Uri said:
UG> i was going to mention that module too. it will do what the OP wanted
UG> but it has one caveat, it isn't a true deep parser of perl5. it does a
UG> high quality lexical analysis (which is what the OP wanted). it won't
UG> load modules and pragmas which can change the syntax of following code
UG> (which perl5 handles, of course).

First, thank you for that hint. I didn't know about PPI.
After reading about it (and trying it out), I really think I want to use the
B framework[1].

B::Xref is almost what I need. Unfortunately, it does not tell me where
{ and } are in the code[2]. I am pretty sure that B knows about these
blocks, but they are just not noticed by the B::Xref module.

So, can anyone help me to understand how B::Xref (or B in general) works,
and even better could anyone give me a hint how to "print out" { and } --
starting and ending block delimiters?

Thank you,
Daniel

[1]
The main reason is that I want to change the Perl code slightly without
breaking its meaning. The B framework is geared to transform Perl code
without changing the meaning (well, it tries its best). The PPI interface,
on the other hand, is too high-level. For example, in 'print "x = $x\n";' PPI
tells me that there is a double quoted string - and I have to parse the
string myself if I want to figure out that $x is used inside this string.
Also, the B modules do a much better job of understanding the Perl code
(they, for example, load pm files...)

[2]
this is important, because:
8<---------------------------
sub foo {
    {
        local $x = 5;     # line 5
        print $x,"\n";    # line 6
    }
    print "$x\n";         # line 8
}
my $bla;
local $x = 6;
foo();
8<---------------------------
is transformed into something like:
localNesting.pl foo 5 main $ x intro
localNesting.pl foo 6 main $ x used
localNesting.pl foo 8 main $ x used

Unfortunately, the $x in line 8 is not the $x which is introduced in line 5,
because the curly braces end the scope of that local.
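
One possible way to recover this block structure from B, sketched under assumptions rather than taken from B::Xref: as far as I know, perl compiles a bare { ... } block as a loop that runs once, so it shows up in the op tree as a leaveloop op with the block's statements beneath it. The helper `statement_depths` below is hypothetical; it walks a sub's tree by hand and records, for every statement (nextstate op), its line number and how many leaveloop ops enclose it.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

our $x;

# Sample sub to inspect; it is compiled but never called here.
sub foo {
    {
        local $x = 5;
        print $x, "\n";
    }
    print "$x\n";
}

# Hypothetical helper: collects one [line, depth] pair per statement, where
# depth counts the enclosing leaveloop ops (bare blocks and real loops).
sub statement_depths {
    my ($op, $depth, $out) = @_;
    my $name = $op->name;

    # Every statement starts with a nextstate (or dbstate under the
    # debugger) op, which carries the source line number.
    push @$out, [ $op->line, $depth ]
        if $name eq 'nextstate' || $name eq 'dbstate';

    # Everything below a leaveloop op is inside that block.
    $depth++ if $name eq 'leaveloop';

    # Visit the children: first child, then its siblings, until a null op.
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            statement_depths($kid, $depth, $out);
        }
    }
    return $out;
}

my $pairs = statement_depths(svref_2object(\&foo)->ROOT, 0, []);
printf "line %d, block depth %d\n", @$_ for @$pairs;
```

This only sees blocks that compile to loops. Bodies of if/else and similar constructs use other ops (scope, or an enter/leave pair), so a complete version would have to track those as well.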
 
Uri Guttman

DZ> The main reason is that I want to change the Perl code slightly without
DZ> breaking its meaning. The B framework is geared to transform Perl code
DZ> without changing the meaning (well, it tries its best). The PPI interface,
DZ> on the other hand, is too high-level. For example, in 'print "x = $x\n";' PPI
DZ> tells me that there is a double quoted string - and I have to parse the
DZ> string myself if I want to figure out that $x is used inside this string.
DZ> Also, the B modules do a much better job of understanding the Perl code
DZ> (they, for example, load pm files...)

B:: has its problems too. just thought you should know it. PPI will
allow you to also modify the code and print it out.

DZ> [2]
DZ> this is important, because:
DZ> 8<---------------------------
DZ> sub foo {
DZ>     {
DZ>         local $x = 5;     # line 5
DZ>         print $x,"\n";    # line 6
DZ>     }
DZ>     print "$x\n";         # line 8
DZ> }
DZ> my $bla;
DZ> local $x = 6;
DZ> foo();
DZ> 8<---------------------------
DZ> is transformed into something like:
DZ> localNesting.pl foo 5 main $ x intro
DZ> localNesting.pl foo 6 main $ x used
DZ> localNesting.pl foo 8 main $ x used

DZ> unfortunately, $x in line 8 is not the $x which is introduced in line 5,
DZ> because of the curly braces.

i believe PPI will help you with nesting. for sure it will tell you line
numbers and such. but good luck with either module. i am curious as to
what perl code you need to parse and why you need to modify the code?

uri
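
To make the nesting point concrete, here is a rough sketch of how PPI could report the block depth of each declaration in the example above. It assumes PPI is installed from CPAN (it is not a core module), and the report format is invented for illustration. It finds every `my`/`local`/`our` statement and counts how many PPI::Structure::Block nodes enclose it.

```perl
use strict;
use warnings;
use PPI;

# The example script from the thread, held in a string so that PPI
# parses it without ever running it.
my $source = <<'END_PERL';
sub foo {
    {
        local $x = 5;
        print $x, "\n";
    }
    print "$x\n";
}
my $bla;
local $x = 6;
foo();
END_PERL

my $doc = PPI::Document->new(\$source)
    or die "PPI could not parse the source";

my @report;
# find returns an array reference, or a false value when nothing matches.
for my $decl (@{ $doc->find('PPI::Statement::Variable') || [] }) {
    # Climb towards the document root, counting enclosing { ... } blocks.
    my $depth = 0;
    for (my $p = $decl->parent; $p; $p = $p->parent) {
        $depth++ if $p->isa('PPI::Structure::Block');
    }
    push @report, sprintf '%s %s at line %d, block depth %d',
        $decl->type, join(', ', $decl->variables),
        $decl->line_number, $depth;
}
print "$_\n" for @report;
```

The sub body counts as a block too, so the `local $x` inside foo should come out deeper than the top-level declarations, which are at depth 0. Uses of `$x` inside double-quoted strings are still invisible to this approach, which is the limitation discussed elsewhere in the thread.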
 
Daniel Zinn

Uri Guttman wrote:

By the way, is there someone who has some experience with the B::??? stuff?

UG> B:: has its problems too. just thought you should know it. PPI will
UG> allow you to also modify the code and print it out.

Can you think of some specific problems?

Apart from cases like this one:
8<---------------------------------------------------------------------------
BEGIN {
    eval ( time % 2 ? 'sub foo() { print "foo()\n"; }' :
                      'sub foo($) { print "foo(".shift.")\n"; }' );
}
foo();
8<---------------------------------------------------------------------------
UG> i believe PPI will help you with nesting. for sure it will tell you line
UG> numbers and such.

Yes, it does. PPI is good for the nesting. But it is still _very_ close to
the original source. Well, perhaps I should use PPI - at least I understand
how to use it...

Though I don't like that I have to parse strings on my own, since this can
be very tedious:

my $x = 1; my $y = 3;
print "well: @{[ $x + $y + 38]} \n";

resolves to PPI::Token::Quote::Double '"well: @{[ $x + $y + 38]} \n"'

whereas B::Xref tells me:
parseStr.pl (main) 4 (lexical) $ x intro
parseStr.pl (main) 4 (lexical) $ y intro
parseStr.pl (main) 5 main $ " used
parseStr.pl (main) 5 (lexical) $ x used
parseStr.pl (main) 5 (lexical) $ y used
parseStr.pl (main) 5 ? @? ? used

though these ? are not very good either :-/

UG> but good luck with either module. i am curious as to
UG> what perl code you need to parse and why you need to modify the code?

It's for a class project. I want to identify functions that can be
bypassed/cached. To do this, I need to know (among other things) where and
how each variable is defined and used, and the same for the functions. Well,
based on the grammar above, I have a small Hugs+Perl program that does the
variable usage/definition analysis - but I still need to transform the
program :-/


Daniel
 
