Writing a C++ Style Checker

Discussion in 'Perl Misc' started by ids, Sep 19, 2007.

  1. ids

    Hi,

    I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
    to validate various coding standards followed in our organization.

    Some of the things I need to do in this tool include:
    - verifying whether identifier naming conventions have been followed
    - differentiating between member variables and local variables
    (because their naming conventions are different)
    - determining method/function boundaries
    - identifying control structures such as 'if', 'while', etc., to see
    whether they are written with code blocks (i.e. { }) all the time
    - checking whether statements are longer than a given width (say 100
    columns)
    - etc. etc.

    This need not go into the semantics of the program; what I need is a
    basic style checker.

    What I see is that parsing line by line independently is not going to
    help. This parser needs to build context and remember stuff across
    lines to satisfy the above goals.
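    One piece of cross-line context that nearly every check here depends on is knowing which text is a comment or a string literal: a /* ... */ comment can span several lines, and braces or keywords inside comments and strings should be ignored. Below is a minimal Perl sketch, assuming ordinary C++ input (raw string literals, trigraphs and backslash line continuations are not handled); the sub name strip_comments_and_strings is hypothetical. It blanks those regions with spaces but keeps the newlines, so line numbers and line lengths stay the same for whatever checks run afterwards.

```perl
use strict;
use warnings;

# Blank out C/C++ comments and string/char literals so later checks
# (brace matching, keyword searches) are not fooled by their contents.
# Every removed character becomes a space except newlines, which are kept,
# so line numbers and line lengths are unchanged.
sub strip_comments_and_strings {
    my ($src) = @_;
    $src =~ s{
        ( /\* .*? \*/                  # block comment, may span lines
        | // [^\n]*                    # line comment
        | " (?: \\. | [^"\\\n] )* "    # string literal
        | ' (?: \\. | [^'\\\n] )* '    # character literal
        )
    }{
        (my $blank = $1) =~ tr/\n/ /c;
        $blank;
    }gsex;
    return $src;
}

# Usage: the braces and the "if" inside the comments and the string vanish.
my $code = <<'END';
int x = 1;  // if (x) { y(); }
const char *s = "{ not a brace }";
/* multi-line
   comment { */ int y;
END
print strip_comments_and_strings($code);
```

    Because the alternatives are tried left to right at each position, a "//" inside a string literal is consumed as part of the string rather than treated as a comment.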

    Are there any existing Perl based style checkers? If not, can you give
    some advice on how best to structure this program? Or else can you
    give some good references on *design* aspects of Perl?

    (I have a C/C++ background. So I'm familiar with OO design. I'm trying
    to develop a *similar mental model* for Perl programs.)

    Thanks in advance,
    Ishan.
    ids, Sep 19, 2007
    #1

  2. Ben Bullock

    On Wed, 19 Sep 2007 13:00:42 +0000, ids wrote:

    > I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
    > to validate various coding standards followed in our organization.
    >
    > Some of the things I need to do in this tool include:
    > - verifying whether identifier naming conventions have been followed
    > - differentiating between member variables and local variables
    > (because their naming conventions are different)
    > - determining method/function boundaries
    > - identifying control structures such as 'if', 'while', etc., to see
    > whether they are written with code blocks (i.e. { }) all the time


    Well, as a start (probably misses some cases):

    my $mycode = do { local $/; <> };    # slurp the whole input so the match can span lines
    print "Aw shucks\n" if $mycode =~ /\b(if|while)\b[^{}]*?;/s;

    > - checking whether statements are longer than a given width (say 100
    > columns)


    That doesn't sound hard:

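    #!/usr/bin/perl
    use warnings; use strict;
    while (<>) {
        chomp;
        # flag lines wider than 100 columns (a tab counts as one column here)
        print "$ARGV line $.: Oops! Too long!\n" if length($_) > 100;
        close ARGV if eof;    # reset $. at the end of each input file
    }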

    > - etc. etc.
    >
    > This need not go into the semantics of the program; what I need is a
    > basic style checker.
    >
    > What I see is that parsing line by line independently is not going to
    > help.


    Well, it can do some of this stuff very rapidly.

    > This parser needs to build context and remember stuff across
    > lines to satisfy the above goals.
    >
    > Are there any existing Perl based style checkers? If not, can you give
    > some advice on how best to structure this program? Or else can you
    > give some good references on *design* aspects of Perl?




    > (I have a C/C++ background. So I'm familiar with OO design. I'm trying
    > to develop a *similar mental model* for Perl programs.)


    Are you sure your problem is difficult enough to warrant developing a
    mental model? Sounds like a relatively simple job for Perl to me. Why not
    just code something up and see how it goes?
    Ben Bullock, Sep 19, 2007
    #2

  3. Ben Morrow

    Quoth ids:
    > Hi,
    >
    > I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
    > to validate various coding standards followed in our organization.
    >
    > Some of the things I need to do in this tool include:
    > - verifying whether identifier naming conventions have been followed
    > - differentiating between member variables and local variables
    > (because their naming conventions are different)
    > - determining method/function boundaries
    > - identifying control structures such as 'if', 'while', etc., to see
    > whether they are written with code blocks (i.e. { }) all the time
    > - checking whether statements are longer than a given width (say 100
    > columns)
    > - etc. etc.
    >
    > This need not go into the semantics of the program; what I need is a
    > basic style checker.
    >
    > What I see is that parsing line by line independently is not going to
    > help. This parser needs to build context and remember stuff across
    > lines to satisfy the above goals.


    This is going to be seriously hard work. What you need is a parser for
    C++, and as C++ is a *very* complex language this is not going to be
    easy to get right, unless you are content to only recognize simple
    constructions without parsing the code properly.

    There is a Parse::RecDescent grammar for some subset of C++ included in
    the Inline::CPP distribution. You may find it useful to start there.

    Alternatively, you may be able to persuade your compiler to do the
    parsing for you. While some of your criteria above (such as blocks on
    ifs) will be lost by such an approach, you may be able to handle these
    with a relatively simple parser, leaving the hard work of 'is this
    identifier a local or member variable' to the compiler.

    What you need to do is either persuade your compiler to produce some
    intermediate parsed form of output (such as from gcc's -fdump-* and -d*
    options), or compile objects with debugging info and then parse that.
    One example of a (now very old) program that does this is c2ph in the
    Perl distribution, which was intended to allow access to C structures by
    parsing stabs debugging information.

    > Are there any existing Perl based style checkers?


    There is Perl::Critic, for style-checking Perl, but that is based on the
    excellent PPI, a Perl module that parses Perl, which took a *lot* of
    work to produce.

    Basically: you have set yourself an *extremely* hard problem :(. You can
    either produce a very 'shallow' and rather incomplete solution, or
    produce a proper solution only after a lot of work. OTOH, a proper C++
    parser for Perl would probably be a good thing... :)

    Ben
    Ben Morrow, Sep 19, 2007
    #3
  4. Ben Bullock

    On Wed, 19 Sep 2007 17:18:21 +0100, Ben Morrow wrote:

    > Quoth ids:
    >> Hi,
    >>
    >> I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
    >> to validate various coding standards followed in our organization.
    >>
    >> Some of the things I need to do in this tool include:
    >> - verifying whether identifier naming conventions have been followed
    >> - differentiating between member variables and local variables
    >> (because their naming conventions are different)
    >> - determining method/function boundaries
    >> - identifying control structures such as 'if', 'while', etc., to see
    >> whether they are written with code blocks (i.e. { }) all the time
    >> - checking whether statements are longer than a given width (say 100
    >> columns)
    >> - etc. etc.
    >>
    >> This need not go into the semantics of the program; what I need is a
    >> basic style checker.
    >>
    >> What I see is that parsing line by line independently is not going to
    >> help. This parser needs to build context and remember stuff across
    >> lines to satisfy the above goals.

    >
    > This is going to be seriously hard work. What you need is a parser for
    > C++, and as C++ is a *very* complex language this is not going to be
    > easy to get right, unless you are content to only recognize simple
    > constructions without parsing the code properly.


    Similarly, one also needs to completely parse the English language in order
    to write a spelling checker. For example, checking for "write" misspelt
    as "right" requires one to comprehensively parse all possible English
    sentences, distinguish between verbs and adjectives, and detect the word
    "right" where a verb should be. Making such a spelling checker is going to
    be seriously hard work too - maybe it will take the rest of your life. But
    a simple system which catches 99% of errors is a few lines of code.
    Ben Bullock, Sep 20, 2007
    #4
  5. Ben Morrow

    Quoth Ben Bullock:
    > On Wed, 19 Sep 2007 17:18:21 +0100, Ben Morrow wrote:
    > > Quoth ids:
    > >>
    > >> I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
    > >> to validate various coding standards followed in our organization.

    <snip>
    > >
    > > This is going to be seriously hard work. What you need is a parser for
    > > C++, and as C++ is a *very* complex language this is not going to be
    > > easy to get right, unless you are content to only recognize simple
    > > constructions without parsing the code properly.

    >
    > Similarly, one also needs to completely parse the English language in order
    > to write a spelling checker. For example, checking for "write" misspelt
    > as "right" requires one to comprehensively parse all possible English
    > sentences, distinguish between verbs and adjectives, and detect the word
    > "right" where a verb should be. Making such a spelling checker is going to
    > be seriously hard work too - maybe it will take the rest of your life. But
    > a simple system which catches 99% of errors is a few lines of code.


    Heh, yes, of course. However, what 'ids' is looking for is more akin to
    a grammar checker than a spellchecker; in fact, it's similar in spirit
    to the MS Word 'grammar' checker (many of its admonitions are more
    matters of style than incorrect grammar per se), which does actually
    have a pretty good grasp of English grammar nowadays.

    In any case, distinguishing a local variable from a class member (or,
    indeed, identifying a declaration at all in C++) is going to require a
    good deal more than a bit of pattern matching, which was really my
    point.

    Ben
    Ben Morrow, Sep 20, 2007
    #5
  6. ids

    On Sep 19, 9:18 pm, Ben Morrow wrote:
    >
    > This is going to be seriously hard work. What you need is a parser for
    > C++, and as C++ is a *very* complex language this is not going to be
    > easy to get right, unless you are content to only recognize simple
    > constructions without parsing the code properly.
    >
    > There is a Parse::RecDescent grammar for some subset of C++ included in
    > the Inline::CPP distribution. You may find it useful to start there.
    >
    > Alternatively, you may be able to persuade your compiler to do the
    > parsing for you. While some of your criteria above (such as blocks on
    > ifs) will be lost by such an approach, you may be able to handle these
    > with a relatively simple parser, leaving the hard work of 'is this
    > identifier a local or member variable' to the compiler.
    >
    > What you need to do is either persuade your compiler to produce some
    > intermediate parsed form of output (such as from gcc's -fdump-* and -d*
    > options), or compile objects with debugging info and then parse that.
    > One example of a (now very old) program that does this is c2ph in the
    > Perl distribution, which was intended to allow access to C structures by
    > parsing stabs debugging information.
    >
    > > Are there any existing Perl based style checkers?

    >
    > There is Perl::Critic, for style-checking Perl, but that is based on the
    > excellent PPI, a Perl module that parses Perl, which took a *lot* of
    > work to produce.
    >
    > Basically: you have set yourself an *extremely* hard problem :(. You can
    > either produce a very 'shallow' and rather incomplete solution, or
    > produce a proper solution only after a lot of work. OTOH, a proper C++
    > parser for Perl would probably be a good thing... :)
    >


    Well, I guess what I need is something in between the two extremes
    the two of you proposed.

    What BenB suggested is not sufficient. Applying pattern matching on a
    line-by-line basis is not going to help. For example, it won't allow me
    to recognize a function implementation.

    In a BNF grammar we can define a function using something similar to
    the following.

    func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;

    block : '{' opt_statement_list '}' ;

    // define the other nonterminals here

    In order to do this, you need to *build state* as you scan
    through, i.e. you need to remember that you saw the opening brace on a
    previous line and find the matching closing brace on a later line.
    This gets even more difficult because of nested blocks, so you need
    to keep pushing and popping braces.
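    The brace bookkeeping described above takes only a few lines of Perl. This is a sketch, assuming comments and string/char literals have already been blanked out (otherwise braces inside them would be counted); the sub name match_braces is hypothetical. It pushes the line number of each '{' onto a stack, pops it at the matching '}', and reports whatever is left unmatched.

```perl
use strict;
use warnings;

# Match braces across lines with an explicit stack.  Assumes comments and
# string/char literals were already blanked out.  Returns a reference to a
# list of [open_line, close_line] pairs and a reference to a list of problems.
sub match_braces {
    my ($src) = @_;
    my (@stack, @pairs, @problems);
    my $line = 1;
    for my $ch (split //, $src) {
        if    ($ch eq "\n") { $line++ }
        elsif ($ch eq '{')  { push @stack, $line }              # remember where it opened
        elsif ($ch eq '}') {
            if (@stack) { push @pairs, [ pop(@stack), $line ] } # innermost open brace closes first
            else        { push @problems, "line $line: unmatched '}'" }
        }
    }
    push @problems, "line $_: unclosed '{'" for @stack;        # still open at end of input
    return (\@pairs, \@problems);
}

# Usage
my $code = <<'END';
void f() {
    if (x) {
        g();
    }
}
}
END
my ($pairs, $problems) = match_braces($code);
printf "block from line %d to line %d\n", @$_ for @$pairs;
print "$_\n" for @$problems;
```

    For the sample input this reports the inner block (lines 2 to 4) before the outer one (lines 1 to 5) and flags the stray brace on line 6. The same stack, holding richer entries than a line number, is where per-block context such as "this block is a class body" can be kept.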

    Building state requires some well-organized data structures. I can
    visualize how this is to be done in a language like C++: what you
    need is a set of classes with a suitable inheritance structure. I don't
    know how to do it in Perl. That's why I asked about a mental model. I
    suppose Perl does support the OO paradigm, but I didn't find any
    material to read about it. I mean, I would like to read an "OOA/D with
    Perl" sort of thing.
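    As a starting point for that mental model: in Perl a class is just a package, an object is usually a hash reference that has been bless()ed into that package, and inheritance is declared through @ISA (the core parent pragma sets it for you). The sketch below uses hypothetical class names to show a base rule class and two derived checks that override one method, loosely in the spirit of how Perl::Critic organizes its policies as classes. The perlootut and perlobj manual pages cover the details.

```perl
use strict;
use warnings;

# Base class: a package whose constructor blesses a hash reference.
package StyleRule;
sub new {
    my ($class, %args) = @_;
    my $self = { name => $class, %args };    # per-object data lives in the hash
    return bless $self, $class;
}
sub name  { return $_[0]{name} }
sub check { die ref($_[0]) . " must override check()\n" }  # acts like a pure virtual

# Derived classes inherit new() and name() and override check().
# 'use parent -norequire' sets @ISA without loading a separate .pm file.
package LineLengthRule;
use parent -norequire, 'StyleRule';
sub check {
    my ($self, $line_no, $text) = @_;
    return length($text) > $self->{max}
        ? ("line $line_no: longer than $self->{max} columns")
        : ();
}

package TrailingWhitespaceRule;
use parent -norequire, 'StyleRule';
sub check {
    my ($self, $line_no, $text) = @_;
    return $text =~ /[ \t]+$/ ? ("line $line_no: trailing whitespace") : ();
}

# Usage: run every rule polymorphically over every line.
package main;
my @rules = (LineLengthRule->new(max => 100), TrailingWhitespaceRule->new);
my @lines = ('int x;   ', 'x' x 120);
for my $i (0 .. $#lines) {
    for my $rule (@rules) {
        printf "%s: %s\n", $rule->name, $_ for $rule->check($i + 1, $lines[$i]);
    }
}
```

    Adding a new check then means writing one more small package with its own check() method; the driver loop does not change.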

    The alternative here is to use Flex/Bison with C++. The problem is the
    complexity of the grammar needed to handle C++. I thought I would try
    Perl because of its powerful pattern-matching ability. But
    whether I use Flex/Bison or Perl, the need to parse the grammar
    is still there. That's what BenM has said.

    I will try starting off with a very simple thing and then expanding
    it.

    Thanks for your help.

    Cheers,
    Ishan.
    ids, Sep 20, 2007
    #6
  7. Helmut Wollmersdorfer

    ids wrote:

    > What I see is that parsing line by line independently is not going to
    > help. This parser needs to build context and remember stuff across
    > lines to satisfy the above goals.


    You can start with simple "one-liners" to catch the "low-hanging fruit".

    My first Perl script was something like

    if ($line =~ m/$regex_pattern/) {
    print "$line\n";
    }

    to find errors in a very large XML file.

    > Are there any existing Perl based style checkers? If not, can you give
    > some advice on how best to structure this program?


    You should look into the source code and documentation of PPI and
    Perl::Critic. Both have a very nice architecture.

    Helmut Wollmersdorfer
    Helmut Wollmersdorfer, Sep 20, 2007
    #7
  8. Ben Bullock

    On Thu, 20 Sep 2007 04:52:07 +0000, ids wrote:


    > What BenB suggested is not sufficient. Applying pattern matching on a
    > line-by-line basis is not going to help. For example, it won't allow me
    > to recognize a function implementation.


    Well, if it gets you 50% of the result for 1% of the effort ...

    >
    > In a BNF grammar we can define a function using something similar to
    > the following.
    >
    > func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
    >
    > block : '{' opt_statement_list '}' ;
    >
    > // define the other nonterminals here
    >
    > In order to do this, you need to *build state* as you scan
    > through, i.e. you need to remember that you saw the opening brace on a
    > previous line and find the matching closing brace on a later line.
    > This gets even more difficult because of nested blocks, so you need
    > to keep pushing and popping braces.


    How much state do you really care about? I don't think you need to "parse
    C++" to do this. Incidentally, I once wrote a program to create C header
    files from ANSI C code:

    http://sourceforge.net/projects/cfunctions

    It extracts function declarations from C source using a lex (flex)
    scanner and a single stack, written in C, basically by discarding
    everything inside a function. (If I had done it in Perl it would have
    been easier. I don't recommend writing it in C.)
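    The "discard everything inside a function" idea carries over directly to Perl. This is not the cfunctions code, only a sketch of the same approach under stated assumptions: comments, string literals and preprocessor lines have already been removed, and only functions defined at file scope are found (bodies nested inside class or namespace braces are thrown away with everything else). The sub name function_prototypes is hypothetical.

```perl
use strict;
use warnings;

# Collect top-level function definitions by discarding everything inside
# braces.  Assumes comments, string literals and preprocessor lines were
# removed beforehand; functions nested in class or namespace braces are skipped.
sub function_prototypes {
    my ($src) = @_;
    my ($depth, $header, @protos) = (0, '');
    for my $ch (split //, $src) {
        if ($ch eq '{') {
            if ($depth == 0) {
                # Normalize whitespace in the text seen since the last ';' or '}'
                (my $h = $header) =~ s/\s+/ /g;
                $h =~ s/^ | $//g;
                # A ')' (optionally followed by const) just before '{' looks
                # like a function body; struct/enum/initializer braces do not.
                push @protos, "$h;" if $h =~ /\)(?: const)?$/;
            }
            $depth++;
        }
        elsif ($ch eq '}') {
            $depth-- if $depth > 0;
            $header = '' if $depth == 0;   # end of a top-level body
        }
        elsif ($depth == 0) {
            $header .= $ch;
            $header = '' if $ch eq ';';    # a complete declaration ends here
        }
    }
    return @protos;
}

# Usage
my $code = <<'END';
static int counter = 0;
int add(int a, int b)
{
    if (a) { return a + b; }
    return b;
}
struct Point { int x, y; };
END
print "$_\n" for function_prototypes($code);
```

    Only one piece of state matters here, the current brace depth, which is the point being made above: a lot can be done without parsing C++ properly.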

    > Building state requires some well-organized data structures. I can
    > visualize how this is to be done in a language like C++: what you
    > need is a set of classes with a suitable inheritance structure. I don't
    > know how to do it in Perl. That's why I asked about a mental model. I
    > suppose Perl does support the OO paradigm, but I didn't find any
    > material to read about it. I mean, I would like to read an "OOA/D with
    > Perl" sort of thing.


    I don't know if it's what you want, but O'Reilly has a book on Perl
    objects, "Learning Perl Objects, References and Modules", whose second
    edition was retitled "Intermediate Perl".

    > The alternative here is to use Flex/Bison with C++. The problem is the
    > complexity of the grammar needed to handle C++. I thought I would try
    > Perl because of its powerful pattern-matching ability. But whether I
    > use Flex/Bison or Perl, the need to parse the grammar is still
    > there. That's what BenM has said.


    But I expect you can just throw away most of the grammar; you don't need
    to parse most of it, just discard it. If you're just using the program
    to automatically check for style mistakes, and I assume it is not a
    fatal problem if you turn up a false positive or miss one or two badly
    named variables, then "lazy" methods can do most of the
    job.
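    As an illustration of such a lazy check, here is a sketch for the member/local naming rule from the original question. The convention itself (data members start with m_, locals never do) is an assumption made up for the example, as is the sub name check_member_names. It only tracks whether the innermost open brace belongs to a class or struct, and uses a rough regex for simple one-line declarations, so it will miss cases and occasionally complain wrongly (a "template <class T>" line, for instance, confuses it). That is exactly the trade-off described above.

```perl
use strict;
use warnings;

# Hypothetical convention for illustration only: data members start with
# "m_", locals never do.  Lazy, line-based, and assumes comments/strings
# were stripped first.
my %not_a_type = map { $_ => 1 } qw(return delete throw goto else case using typedef);

sub check_member_names {
    my ($src) = @_;
    my (@scopes, @warnings);
    my ($pending_class, $line_no) = (0, 0);
    for my $line (split /\n/, $src) {
        $line_no++;
        # "class Foo" or "struct Foo" not ending in ';': its '{' opens a class scope
        $pending_class = 1 if $line =~ /\b(?:class|struct)\s+\w+[^;]*$/;
        # Very rough match for a one-line declaration such as "int m_x = 0;"
        if ($line !~ /[{}]/
            and $line =~ /^\s*(\w+)[\w:<>,\s*&]*?[\s*&]([A-Za-z_]\w*)\s*(?:=[^;]*)?;/
            and !$not_a_type{$1})
        {
            my $name     = $2;
            my $in_class = @scopes && $scopes[-1] eq 'class';
            if ($in_class && $name !~ /^m_/) {
                push @warnings, "line $line_no: member '$name' should start with m_";
            }
            elsif (!$in_class && @scopes && $name =~ /^m_/) {
                push @warnings, "line $line_no: local '$name' should not start with m_";
            }
        }
        # Update the scope stack after looking at the line
        for my $brace ($line =~ /[{}]/g) {
            if ($brace eq '{') { push @scopes, $pending_class ? 'class' : 'block'; $pending_class = 0 }
            else               { pop @scopes }
        }
    }
    return @warnings;
}

# Usage
print "$_\n" for check_member_names(<<'END');
class Shape
{
    int m_sides;
    double area;
};
END
```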
    Ben Bullock, Sep 20, 2007
    #8
  9. Ted Zlatanov

    On Thu, 20 Sep 2007 04:52:07 -0000 ids wrote:

    i> What BenB suggested is not sufficient. Applying pattern matching on a
    i> line-by-line basis is not going to help. For example, it won't allow me
    i> to recognize a function implementation.

    i> In a BNF grammar we can define a function using something similar to
    i> the following.

    i> func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
    i> block : '{' opt_statement_list '}' ;
    i> // define the other nonterminals here

    If you feel comfortable with this kind of grammar definition, definitely
    look at the existing Parse::RecDescent solutions and try to extend one
    of them.

    I would suggest that coding standards are much easier to enforce by peer
    review, especially since peer review will catch many errors (bugs and
    inefficiencies) that an automatic checker won't. So if that's your
    goal, please consider what you'll accomplish versus what you are
    actually trying to do.

    Ted
    Ted Zlatanov, Sep 20, 2007
    #9
  10. Ben Morrow

    Quoth ids:
    >
    > In a BNF grammar we can define a function using something similar to
    > the following.
    >
    > func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
    >
    > block : '{' opt_statement_list '}' ;
    >
    > // define the other nonterminals here
    >
    > In order to do this, you need to *build state* as you scan
    > through, i.e. you need to remember that you saw the opening brace on a
    > previous line and find the matching closing brace on a later line.
    > This gets even more difficult because of nested blocks, so you need
    > to keep pushing and popping braces.


    There are several parser modules on CPAN. Parse::RecDescent is very
    flexible but rather slow; Parse::Yapp is a direct clone of yacc, which
    it sounds like you're familiar with. The grammar in Inline::CPP is
    written with P::RD, and may well be sufficient for your needs.
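    For comparison with those modules, the func_impl rule quoted above can also be hand-written as recursive descent, one Perl sub per nonterminal, with no CPAN dependency. This is a toy sketch rather than a C++ parser: type_spec, the parameter list and the statement list are drastically simplified (statements are just skipped, with nested blocks matched recursively), and every sub name here is hypothetical.

```perl
use strict;
use warnings;

my @tok;    # tokens of the text being parsed
my $pos;    # index of the next unread token

sub peek { return $pos < @tok ? $tok[$pos] : '' }    # '' means end of input

sub expect {
    my ($want) = @_;
    my $got = peek();
    die "expected '$want' but found '$got'\n" unless $got eq $want;
    $pos++;
}

sub identifier {
    my $t = peek();
    die "expected identifier but found '$t'\n" unless $t =~ /^[A-Za-z_]\w*$/;
    $pos++;
    return $t;
}

# type_spec : IDENTIFIER ( '::' IDENTIFIER )* ( '*' | '&' )*
sub type_spec {
    my $type = identifier();
    while (peek() eq '::') { $pos++; $type .= '::' . identifier() }
    while (peek() =~ /^[*&]$/) { $type .= $tok[$pos++] }
    return $type;
}

# opt_param_list : <empty> | type_spec IDENTIFIER ( ',' type_spec IDENTIFIER )*
sub opt_param_list {
    return if peek() eq ')';
    while (1) {
        type_spec();
        identifier();
        last unless peek() eq ',';
        $pos++;
    }
}

# block : '{' opt_statement_list '}'
# Statements are skipped token by token; a nested '{' recurses.
sub block {
    expect('{');
    until (peek() eq '}') {
        die "unexpected end of input inside block\n" if peek() eq '';
        if (peek() eq '{') { block() } else { $pos++ }
    }
    expect('}');
}

# func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block
sub func_impl {
    my ($src) = @_;
    @tok = $src =~ /::|[A-Za-z_]\w*|\d+|\S/g;
    $pos = 0;
    my $type = type_spec();
    my $name = identifier();
    expect('(');
    opt_param_list();
    expect(')');
    block();
    die "unexpected tokens after the function body\n" if $pos < @tok;
    return "$type $name";
}

# Usage
print func_impl('int add(int a, int b) { if (a) { return a + b; } return b; }'), "\n";
my $ok = eval { func_impl('int f(int a { }') };
print defined $ok ? "$ok\n" : "rejected: $@";
```

    The shape of each sub mirrors its grammar rule, which is also roughly what Parse::RecDescent generates for you from a grammar, without the backtracking and error recovery that module adds.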

    Ben
    Ben Morrow, Sep 20, 2007
    #10

Similar Threads
  1. 王冬

    I need a C++ code style checker.

    王冬, Jul 24, 2007, in forum: C++
    Replies: 3, Views: 2,890
    Ondra Holub, Jul 25, 2007
  2. Kinokunya
    Replies: 3, Views: 402
  3. Pager O Rama

    MSN BLOCK CHECKER-MSN STATUS CHECKER-MSN PROBLEMS

    Pager O Rama, Apr 4, 2006, in forum: ASP General
    Replies: 0, Views: 213
    Pager O Rama, Apr 4, 2006
  4. Jacob Grover
    Replies: 5, Views: 299
    Jacob Grover, Jul 18, 2008
  5. Philipp Kraus

    style guide checker

    Philipp Kraus, Jan 1, 2013, in forum: C++
    Replies: 3, Views: 208
    Bertwim, Jan 5, 2013