FAQ 6.11 How do I use a regular expression to strip C style comments from a file?

Discussion in 'Perl Misc' started by PerlFAQ Server, Feb 10, 2011.

  1. This is an excerpt from the latest version of perlfaq6.pod, which
    comes with the standard Perl distribution. These postings aim to
    reduce the number of repeated questions as well as allow the community
    to review and update the answers. The latest version of the complete
    perlfaq is at http://faq.perl.org .

    --------------------------------------------------------------------

    6.11: How do I use a regular expression to strip C style comments from a file?

    While this actually can be done, it's much harder than you'd think. For
    example, this one-liner

    perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

    will work in many but not all cases. You see, it's too simple-minded for
    certain kinds of C programs, in particular, those with what appear to be
    comments in quoted strings. For that, you'd need something like this,
    created by Jeffrey Friedl and later modified by Fred Curtis.

    $/ = undef;
    $_ = <>;
    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
    print;
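
    To see the difference between the two approaches, here is a small
    sketch that runs both substitutions on the same input. The sample C
    text and the variable names ($src, $naive, $aware) are made up for
    illustration; only the two regular expressions come from the answer
    above.

```perl
use strict;
use warnings;

# Hypothetical sample input: a string literal that merely looks like
# it contains a comment, plus two real comments.
my $src = <<'END_C';
char *s = "/* not a comment */"; /* real comment */
int x = 1; /* multi
line */ int y = 2;
END_C

# Naive version: deletes everything between /* and */, even when the
# delimiters sit inside a quoted string.
(my $naive = $src) =~ s{/\*.*?\*/}{}gs;

# String-aware version: comments are replaced by nothing, while strings
# and all other text are captured in $2 and put back unchanged.
(my $aware = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

print "naive:\n$naive\naware:\n$aware";
```

    With this input the naive version empties the string literal to "",
    while the string-aware version leaves it intact and removes only the
    two real comments.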

    This could, of course, be more legibly written with the "/x" modifier,
    adding whitespace and comments. Here it is expanded, courtesy of Fred
    Curtis.

    s{
       /\*         ##  Start of /* ... */ comment
       [^*]*\*+    ##  Non-* followed by 1-or-more *'s
       (
         [^/*][^*]*\*+
       )*          ##  0-or-more things which don't start with /
                   ##    but do end with '*'
       /           ##  End of /* ... */ comment

     |             ##  OR various things which aren't comments:

       (
         "           ##  Start of " ... " string
         (
           \\.         ##  Escaped char
         |             ##    OR
           [^"\\]      ##  Non "\
         )*
         "           ##  End of " ... " string

       |             ##  OR

         '           ##  Start of ' ... ' string
         (
           \\.         ##  Escaped char
         |             ##    OR
           [^'\\]      ##  Non '\
         )*
         '           ##  End of ' ... ' string

       |             ##  OR

         .           ##  Any other char
         [^/"'\\]*   ##  Chars which don't start a comment, string or escape
       )
     }{defined $2 ? $2 : ""}gxse;

    A slight modification also removes C++ comments, possibly spanning
    multiple lines using a continuation character:

    s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
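
    As a rough check of the C++ variant, the sketch below applies it to
    a few lines that mix both comment styles with a string containing
    "//". The sample input and the variable names ($src, $out) are
    assumptions for illustration; the regular expression is the one
    given above.

```perl
use strict;
use warnings;

# Hypothetical sample input. The heredoc is single-quoted, so the
# trailing backslash on the "url" line is a literal continuation char.
my $src = <<'END_C';
int a = 1; // line comment
int b = 2; /* block */
const char *u = "http://example.com"; // url \
continued comment
int c = 3;
END_C

# Group 1 belongs to the /* ... */ branch, group 2 to the // branch,
# so the text to keep now lands in $3 instead of $2.
(my $out = $src) =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

print $out;
```

    Note that the "//" branch also consumes the newline that ends the
    comment, so code following a "//" comment is joined onto the same
    line. The "//" inside the quoted URL is left alone because the whole
    string is matched by the string branch first.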



    --------------------------------------------------------------------

    The perlfaq-workers, a group of volunteers, maintain the perlfaq. They
    are not necessarily experts in every domain where Perl might show up,
    so please include as much information as possible and relevant in any
    corrections. The perlfaq-workers also don't have access to every
    operating system or platform, so please include relevant details for
    corrections to examples that do not work on particular platforms.
    Working code is greatly appreciated.

    If you'd like to help maintain the perlfaq, see the details in
    perlfaq.pod.
     