Removing Perl comments and strings using regexps

Discussion in 'Perl Misc' started by Brendan Byrd/SineSwiper, Jul 17, 2003.

  1. I'm in the middle of fixing the LXR tool to work with Perl. Almost
    finished, but I've run into a problem that completely fries my brain.
    In order to detect Perl subroutine/variable declarations, I need to
    remove comments and strings so that they don't get confused for
    declarations.

    My current code works for most cases:

    # Remove escaped/variable characters for strings/comments
    $contents =~ s/\\\$//g;
    $contents =~ s/[\\\$][\"\'\#]//g;

    # Remove literal strings
    $contents =~ s/\"[^\"]*?\"//gs;
    $contents =~ s/\'[^\']*?\'//gs;

    # Remove comments
    $contents =~ s/\#[^\n]*//g;

    However, if you run into a comment with a quote mark, it screws up
    everything, because the string removal runs first and treats the
    apostrophe as the start of a string:

    # I'll be back

    I can switch the order of the removes, but then I encounter the reverse
    problem: a string with a pound sign.

    $aaa = '#FF0088';

    The comment-removal line kills the second quote mark, and the string
    removal then runs away and gobbles up everything in sight (until it hits
    another string with the same pound sign problem). Is there any sort of
    regexp that would work for this program?

    # Sine's program
    $aaa = 'test #'; # test statement
    $bbb = 'halo #
    there'; # multi-line using ' marks
    # #'#'#'#''######'

    --
    Brendan Byrd/SineSwiper <>
    Perl hacker, computer wizard, and all-around internet guru
    Resonator Software <http://www.ResonatorSoft.org/>
     
    Brendan Byrd/SineSwiper, Jul 17, 2003
    #1
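The ordering problem described above comes from running separate passes. A minimal sketch of one common alternative, assuming only '...' strings, "..." strings and # comments need handling: match all three in a single left-to-right substitution, so whichever construct starts first wins. The name strip_strings_and_comments is hypothetical, and the sketch does not attempt q//, qq//, here-docs, regex literals or POD.

```perl
use strict;
use warnings;

# Hypothetical helper: delete quoted strings and # comments in one pass.
# Because the regex engine scans left to right, a # inside a string is
# consumed as part of the string, and a quote inside a comment is consumed
# as part of the comment.
sub strip_strings_and_comments {
    my ($src) = @_;
    $src =~ s{
        ( [\\\$] ["'\#] )            # $1: \" \' \# or $" $' $# ; kept as-is
      | " (?: [^"\\] | \\. )* "      # double-quoted string, honoring \-escapes
      | ' (?: [^'\\] | \\. )* '      # single-quoted string, may span lines
      | \# [^\n]*                    # comment up to the end of the line
    }{ defined $1 ? $1 : '' }gsex;
    return $src;
}
```

Called on the sample program from the post, this leaves only the two assignments with their string values removed. Since multi-line strings are deleted outright, line numbers after them shift; an indexer that needs stable line numbers would have to keep the newlines.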

  2. Jay Tilton (Guest)

    Brendan Byrd/SineSwiper <> wrote:

    : I'm in the middle of fixing the LXR tool to work with Perl. Almost
    : finished, but I've run into a problem that completely fries my brain.
    : In order to detect Perl subroutine/variable declarations, I need to
    : remove comments and strings so that they don't get confused for
    : declarations.

    How about just running the program through the Xref backend?

    perl -MO=Xref foo.pl
     
    Jay Tilton, Jul 17, 2003
    #2

  3. Jay Tilton wrote:

    > Brendan Byrd/SineSwiper <> wrote:
    >
    > : I'm in the middle of fixing the LXR tool to work with Perl. Almost
    > : finished, but I've run into a problem that completely fries my brain.
    > : In order to detect Perl subroutine/variable declarations, I need to
    > : remove comments and strings so that they don't get confused for
    > : declarations.
    >
    > How about just running the program through the Xref backend?
    >
    > perl -MO=Xref foo.pl
    >


    Interesting. Is there a way to load that as a module and feed Xref the
    program source from a variable?

     
    Brendan Byrd/SineSwiper, Jul 18, 2003
    #3
  4. Also sprach Brendan Byrd/SineSwiper:

    > Jay Tilton wrote:
    >
    >> Brendan Byrd/SineSwiper <> wrote:
    >>
    >> : I'm in the middle of fixing the LXR tool to work with Perl. Almost
    >> : finished, but I've run into a problem that completely fries my brain.
    >> : In order to detect Perl subroutine/variable declarations, I need to
    >> : remove comments and strings so that they don't get confused for
    >> : declarations.
    >>
    >> How about just running the program through the Xref backend?
    >>
    >> perl -MO=Xref foo.pl
    >>

    >
    > Interesting. Is there a way to load that as a module and feed Xref the
    > program source from a variable?


    If such a way existed, I'd like to know it, too. As I understand the
    B:: modules and the generic compiler backend O, there is no such way.
    They are triggered in a CHECK block right after a script has been
    compiled. A B:: module simply provides a callback that is invoked on the
    optree of a script. So it can't work on strings.

    Tassilo
    --
    $_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
    pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
    $_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval
     
    Tassilo v. Parseval, Jul 18, 2003
    #4
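Since the backend only sees a compiled script, one possible workaround when the source lives in a variable is to write it to a temporary file first and point perl -MO=Xref at that file. A minimal sketch using the core File::Temp module; the helper name source_to_tempfile and the 'xrefXXXXXX' template are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: dump program text into a temporary .pl file and
# return its path. UNLINK => 1 asks File::Temp to delete the file when
# this program exits.
sub source_to_tempfile {
    my ($source) = @_;
    my ($fh, $path) = tempfile('xrefXXXXXX',
                               SUFFIX => '.pl',
                               TMPDIR => 1,
                               UNLINK => 1);
    print {$fh} $source;
    close($fh) or die "cannot write $path: $!";
    return $path;
}
```

The returned path can then be given to a separate perl process in place of foo.pl.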
  5. Tassilo v. Parseval wrote:

    > Also sprach Brendan Byrd/SineSwiper:
    >
    > If such a way existed, I'd like to know it, too. As I understand the
    > B:: modules and the generic compiler backend O, there is no such way.
    > They are triggered in a CHECK block right after a script has been
    > compiled. A B:: module simply provides a callback that is invoked on the
    > optree of a script. So it can't work on strings.


    Well, if I could code this into the current development version of LXR,
    it wouldn't need strings (it's all looking at files anyway), and I could
    just call another instance of perl with an open() call. A little sloppy,
    but it beats re-inventing the wheel.

     
    Brendan Byrd/SineSwiper, Jul 18, 2003
    #5
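The "another instance of perl" idea could look roughly like the sketch below: a piped open that runs the Xref backend on a file and reads its report. The function names xref_command and xref_lines are hypothetical. The -r option asks B::Xref for raw, one-record-per-line output, which should be easier to parse than the default report. Note that the child perl compiles the target file, so BEGIN blocks and use statements in it are executed; this is only reasonable for trusted code.

```perl
use strict;
use warnings;

# Hypothetical helper: argument list for a child perl running B::Xref.
# $^X is the path of the perl binary that is currently running.
sub xref_command {
    my ($file) = @_;
    return ($^X, '-MO=Xref,-r', $file);
}

# Hypothetical helper: run the child and return its output lines.
# The list form of a piped open bypasses the shell, so odd characters
# in the file name are not interpreted.
sub xref_lines {
    my ($file) = @_;
    open(my $fh, '-|', xref_command($file))
        or die "cannot run perl on $file: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}
```

A caller would loop over xref_lines('foo.pl') and split each record on whitespace to pick out the definitions it cares about.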
