Source Code Analyzing

Discussion in 'Perl Misc' started by Daniel Zinn, Mar 12, 2006.

  1. Daniel Zinn

    Daniel Zinn Guest

    Hey,

    I am not sure if I tried hard enough[1], but perhaps you can point me to
    further reading, or even propose some code snippets. OK, here is the
    problem:

    I want to analyze a Perl program. I want to transform every "line/statement"
    of a given Perl script into a more abstract syntax which only deals with
    function and variable definitions and calls/usage.

    So, basically my new "simplified" grammar looks like:
    data Prog = Var VarName       -- a variable is used
              | Call FuncName     -- a function is called
              | Sub FuncName Prog -- FuncName is a new function; body=Prog
              | SubClos Prog      -- an anonymous function is created as a closure
              | Dyn VarName       -- a "local" variable declaration
              | Lex VarName       -- a "my" variable declaration
              | Block Prog        -- just a { ... } block
              | Skip              -- something else
    (sorry for this Haskell code over here, please don't feel offended ;))
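
    As an illustration only, the same simplified syntax could be held in plain
    Perl data. The sketch below assumes each node is an array ref of the form
    [TAG, args...] and that a body is a list of nodes (the Haskell type above
    has a single Prog there). The helper name render and the sample tree, which
    encodes the localNesting.pl example from later in this thread, are made up
    for this sketch.

```perl
use strict;
use warnings;

# Assumed encoding: [TAG, args...], where a trailing array ref is the body.
my $prog = [
    [ Sub => 'foo', [
        [ Block => [ [ Dyn => '$x' ], [ Var => '$x' ] ] ],
        [ Var => '$x' ],
    ] ],
    [ Lex  => '$bla' ],
    [ Dyn  => '$x' ],
    [ Call => 'foo' ],
];

# Hypothetical helper: turn the tree into indented lines, one per node.
sub render {
    my ($nodes, $depth) = @_;
    my @lines;
    for my $node (@$nodes) {
        my ($tag, @args) = @$node;
        # Split off the body, if the last argument is a list of nodes.
        my $body = (@args && ref $args[-1] eq 'ARRAY') ? pop @args : undef;
        push @lines, ('  ' x $depth) . join(' ', $tag, @args);
        push @lines, render($body, $depth + 1) if $body;
    }
    return @lines;
}

my @out = render($prog, 0);
print "$_\n" for @out;
```

    Keeping bodies as lists makes it easy to tell which Dyn and Var nodes share
    a Block, which is the information the later posts say is missing.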

    So, since Perl is 'kind of' not easy to parse directly, I thought of using
    the B module. After some browsing, it turned out that B::Xref is pretty
    close. Unfortunately, it does not give me the blocks, and the ordering is
    strange. Since it should be very easy to get a representation like the one
    described above once you have the parse tree, I want to ask whether someone
    has some experience with the B module(s). If not, it would be nice if you
    could point me to some reading where I could learn how to handle this
    framework.

    Apart from this, I also want to do some variable assignment tracking, so I
    would really like to understand how the parse tree is organized and how you
    can do transformations on it. So, is there any good place to start reading?


    Thank you so much,
    Daniel

    [1] I experimented with many B::???? modules,
    tried to understand the B::Xref source code,
    tried to figure out how to traverse the parse tree,
    but could not find anywhere to start, for example how to get this
    tree in the first place...
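
    For what it is worth, the core B module can hand over the op tree of the
    running program directly: B::main_root returns the root op, and
    B::walkoptree calls a named method on every op below it. A minimal sketch,
    assuming it is run as a plain script; the method name collect_name is
    invented here and is not part of B:

```perl
use strict;
use warnings;
use B qw(main_root walkoptree);

my @seen;    # op names in the order walkoptree visits them

# Illustrative method injected into B::OP so that every op class inherits it.
sub B::OP::collect_name {
    my ($op) = @_;
    push @seen, $op->name;
}

# A little code so that the main program has some ops worth looking at.
my $x = 1;
print "x = $x\n";

# Walk the op tree of the main program (not of any subs it defines).
walkoptree(main_root, 'collect_name');
print join(' ', @seen), "\n";
```

    The tree of a named sub can be reached in a similar way through
    B::svref_2object(\&name)->ROOT.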
    Daniel Zinn, Mar 12, 2006
    #1

  2. Dr.Ruud

    Dr.Ruud Guest

    Daniel Zinn wrote:

    > I want to analyze a Perl program. I want to transform every
    > "line/statement" of a given Perl script into a more abstract syntax
    > which only deals with function and variable definitions and
    > calls/usage.


    http://ali.as/ "Parsing Perl"

    CPAN: PPI

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    echo 78B3649D3C106E1056C42A10F |perl -pe 'tr/0-9A-F/rectal JunkshoP,/'
    Dr.Ruud, Mar 12, 2006
    #2

  3. Uri Guttman

    Uri Guttman Guest

    >>>>> "R" == Ruud writes:

    R> Daniel Zinn wrote:
    >> I want to analyze a Perl program. I want to transform every
    >> "line/statement" of a given Perl script into a more abstract syntax
    >> which only deals with function and variable definitions and
    >> calls/usage.


    R> http://ali.as/ "Parsing Perl"

    R> CPAN: PPI

    i was going to mention that module too. it will do what the OP wanted
    but it has one caveat, it isn't a true deep parser of perl5. it does a
    high quality lexical analysis (which is what the OP wanted). it won't
    load modules and pragmas which can change the syntax of following code
    (which perl5 handles, of course).

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Mar 12, 2006
    #3
  4. Daniel Zinn

    Daniel Zinn Guest

    Uri Guttman wrote:
    >
    > i was going to mention that module too. it will do what the OP wanted
    > but it has one caveat, it isn't a true deep parser of perl5. it does a
    > high quality lexical analysis (which is what the OP wanted). it won't
    > load modules and pragmas which can change the syntax of following code
    > (which perl5 handles, of course).


    First, thank you for that hint. I didn't know about PPI.
    After reading about it (and trying it out), I really think I want to use the
    B framework[1].

    B::Xref is almost what I need. Unfortunately, it does not tell me where
    { and } are in the code[2]. I am pretty sure that B knows about these
    blocks, but they are just not noticed by the B::Xref module.

    So, can anyone help me to understand how B::Xref (or B in general) works,
    and even better could anyone give me a hint how to "print out" { and } --
    starting and ending block delimiters?

    Thank you,
    Daniel

    [1]
    The main reason is that I want to change the Perl code slightly without
    breaking its meaning. The B framework is geared to transform Perl code
    without changing the meaning (well, it tries its best). The PPI interface,
    on the other hand, is too high-level. For example, in 'print "x = $x\n";' PPI
    tells me that there is a double-quoted string, and I have to parse the
    string myself if I want to figure out that $x is used inside this string.
    Also, the B modules do a much better job of understanding the Perl code
    (for example, they load .pm files...)

    [2]
    This is important because:
    8<---------------------------
    sub foo {
        {
            local $x = 5;      # line 5
            print $x,"\n";     # line 6
        }
        print "$x\n";          # line 8
    }
    my $bla;
    local $x = 6;
    foo();
    8<---------------------------
    is transformed into something like:
    localNesting.pl foo 5 main $ x intro
    localNesting.pl foo 6 main $ x used
    localNesting.pl foo 8 main $ x used

    Unfortunately, the $x in line 8 is not the $x that is introduced in line 5,
    because of the curly braces, and the output gives no way to tell.
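
    One way to see the scopes that B::Xref leaves out is to walk the op tree of
    foo by hand and watch for scope-related ops. The sketch below is only an
    illustration, not how B::Xref works: the walk helper and the list of op
    names treated as scope markers are choices made here, and which of those
    ops actually appear (for example enterloop/leaveloop for a bare block) can
    differ between perl versions.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

our $x = 6;

sub foo {
    {
        local $x = 5;
        print $x, "\n";
    }
    print "$x\n";
}

# Op names treated as scope markers (an assumption of this sketch).
my %scope_op = map { $_ => 1 } qw(enter leave scope enterloop leaveloop leavesub);

my @events;    # [depth, description] pairs in tree order

# Hypothetical helper: depth-first walk that records scopes and statements.
sub walk {
    my ($op, $depth) = @_;
    return unless $$op;    # a B::NULL object (address 0) means "no op"
    my $name = $op->name;
    if ($scope_op{$name}) {
        push @events, [ $depth, $name ];
    }
    elsif ($name eq 'nextstate' || $name eq 'dbstate') {
        # Statement boundaries are COPs and carry the source line number.
        push @events, [ $depth, 'statement at line ' . $op->line ];
    }
    return unless $op->flags & OPf_KIDS;
    for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
        walk($kid, $depth + 1);
    }
}

# Start at the root op of foo's body; foo itself is never called.
walk(svref_2object(\&foo)->ROOT, 0);
printf "%s%s\n", '  ' x $_->[0], $_->[1] for @events;
```

    Statements inside the inner block are recorded at a greater depth than the
    final print, which is the nesting information needed to keep the two $x
    uses apart.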
    Daniel Zinn, Mar 12, 2006
    #4
  5. Uri Guttman

    Uri Guttman Guest

    >>>>> "DZ" == Daniel Zinn writes:

    DZ> The main reason is that I want to change the Perl code slightly without
    DZ> breaking its meaning. The B framework is geared to transform Perl code
    DZ> without changing the meaning (well, it tries its best). The PPI interface,
    DZ> on the other hand, is too high-level. For example, in 'print "x = $x\n";' PPI
    DZ> tells me that there is a double-quoted string, and I have to parse the
    DZ> string myself if I want to figure out that $x is used inside this string.
    DZ> Also, the B modules do a much better job of understanding the Perl code
    DZ> (for example, they load .pm files...)

    B:: has its problems too. just thought you should know it. PPI will
    allow you to also modify the code and print it out.

    DZ> [2]
    DZ> this is important, because:
    DZ> 8<---------------------------
    DZ> sub foo {
    DZ> {
    DZ> local $x = 5; # line 5
    DZ> print $x,"\n"; # line 6
    DZ> }
    DZ> print "$x\n"; # line 8
    DZ> }
    DZ> my $bla;
    DZ> local $x = 6;
    DZ> foo();
    DZ> 8<---------------------------
    DZ> is transformed into something like:
    DZ> localNesting.pl foo 5 main $ x intro
    DZ> localNesting.pl foo 6 main $ x used
    DZ> localNesting.pl foo 8 main $ x used

    DZ> unfortunately, $x in line 8 is not the $x which is introduced in line 5,
    DZ> because of the curly braces.

    i believe PPI will help you with nesting. for sure it will tell you line
    numbers and such. but good luck with either module. i am curious as to
    what perl code do you need to parse and why you need to modify the code?

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Mar 12, 2006
    #5
  6. Daniel Zinn

    Daniel Zinn Guest

    Uri Guttman wrote:

    By the way, is there someone who has some experience with the B::??? stuff?

    > B:: has its problems too. just thought you should know it. PPI will
    > allow you to also modify the code and print it out.


    Can you think of some specific problems?

    Apart from ones like this:
    8<---------------------------------------------------------------------------
    BEGIN {
        eval ( time % 2 ? 'sub foo() { print "foo()\n"; }' :
                          'sub foo($) { print "foo(".shift.")\n"; }' );
    }
    foo();
    8<---------------------------------------------------------------------------

    > i believe PPI will help you with nesting. for sure it will tell you line
    > numbers and such.


    Yes, it does. PPI is good for the nesting. But it is still _very_ close to
    the original source. Well, perhaps I should use PPI - at least I understand
    how to use it...

    Though I don't like that I have to parse strings on my own, since this can
    be very tedious: my $x = 1; my $y = 3; print "well: @{[ $x + $y + 38]} \n";

    resolves to PPI::Token::Quote::Double '"well: @{[ $x + $y + 38]} \n"'

    whereas B::Xref tells me:
    parseStr.pl (main) 4 (lexical) $ x intro
    parseStr.pl (main) 4 (lexical) $ y intro
    parseStr.pl (main) 5 main $ " used
    parseStr.pl (main) 5 (lexical) $ x used
    parseStr.pl (main) 5 (lexical) $ y used
    parseStr.pl (main) 5 ? @? ? used

    though these ? are not very good either :-/
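
    If PPI is used, the body of a double-quoted token still has to be scanned
    for variables. A deliberately naive sketch with a plain regex follows.
    interpolated_vars is a hypothetical helper: it only spots $name and @name
    style variables that are not preceded by a backslash, and it ignores
    element access, method calls and everything else real interpolation allows.

```perl
use strict;
use warnings;

# Hypothetical helper: list simple $name / @name variables in a string body.
sub interpolated_vars {
    my ($body) = @_;
    my @vars;
    # Sigil not preceded by a backslash, optional "{", then an identifier.
    while ($body =~ /(?<!\\)([\$\@])\{?([A-Za-z_]\w*)/g) {
        push @vars, "$1$2";
    }
    return @vars;
}

# Single quotes keep the sample text literal, as it would come out of a token.
my @vars = interpolated_vars('well: @{[ $x + $y + 38]} \n');
print "@vars\n";
```

    For the sample string this reports $x and $y, but unlike B::Xref it knows
    nothing about scoping or about which variable each name refers to.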

    > but good luck with either module. i am curious as to
    > what perl code do you need to parse and why you need to modify the code?


    It's for a class project. I want to identify functions that can be
    bypassed/cached. To do this, I need to know (among other things) where and
    how each variable is defined and used, and the same for the functions. Well,
    based on the grammar above, I have a small Hugs+Perl program that does the
    variable usage/definition analysis - but I still need to transform the
    program :-/


    Daniel
    Daniel Zinn, Mar 13, 2006
    #6