find certain strings in java files not inside comments

Hike Mike

I want to determine if there are any of the following strings in a java
file that do not exist as part of a comment:

System.out.print
System.err.print
printStatckTrace

I'm totally new at Perl and I wrote the following script that gets
called from CVS when developers check in code (the script gets called
with a list of java files starting at $ARGV[1]). The idea is to not
allow any strings of the above type into the repository
unless they are inside comments.

It does not work for instances of a dis-allowed string that is on a
newline inside of a comment:

/* blah blah blah
System.out.println("foo");
*/

basically the problem is the following match code:


unless ($item =~ m/^((\/\/+)|(\/\*+)|(\*+))/) {
terminate($_, $stringToCheck);



thanks
------

#!/usr/bin/perl -w


use Socket;
use CGI qw(:standard escape);
use strict;


######### That's all you have to do! ###########
# #
# #
# You shouldn't need to edit anything below #
# here! #
################################################

my $sArg = "";
shift @ARGV;
foreach( @ARGV )
{
#print "received ARGV: $_\n";
if ($_ =~ m/java/) {
#print "received java file: $_\n";
$sArg .= $_ . " ";
}
}

chop( $sArg );

if ($sArg) {
print "CVS commitinfo is checking .java files for disallowed code. If your commit fails, check for un-commented 'System.out.print', 'System.err.print', or 'printStackTrace()' in: $sArg \n";
}

my @rgFiles = split(" ", $sArg);
my $SystemOut = "System.out.print";
my $SystemErr = "System.err.print";
my $StackTrace = "printStackTrace";
my @badFiles;
my @rawData;
my $counter = 0;
my @instances;
my $item;
foreach( @rgFiles )
{
check($_, $SystemOut);
check($_, $SystemErr);
checkStackTrace($_, $StackTrace);
}

sub check {
my($file, $stringToCheck) = @_;
open(FILE, $file) || return ("could not open file to verify on
commit: $file\n");
@rawData=<FILE>;
@instances = grep(/$stringToCheck/, @rawData);
#print "file " . $_ . " has " . @instances . " elements.\n";
foreach $item (@instances) {
$item = trim($item);
unless ($item =~ m/^((\/\/+)|(\/\*+)|(\*+))/) {
terminate($_, $stringToCheck);
}
}
#if (@instances > 0) {
#terminate($_, $stringToCheck);
#}
close(FILE);
}

sub checkStackTrace {
my($file, $stringToCheck) = @_;
open(FILE, $file) || return ("could not open file to verify on
commit: $file\n");
@rawData=<FILE>;
@instances = grep(/$stringToCheck\({1}\){1}/, @rawData);
#print "file " . $_ . " has " . @instances . " elements.\n";
foreach $item (@instances) {
$item = trim($item);
unless ($item =~ m/^((\/\/+)|(\/\*+)|(\*+))/) {
terminate($_, $stringToCheck);
}
}
close(FILE);
}


sub terminate {
my($file, $badString) = @_;
die "Aborting CVS commit because file $file contains the un-commented
code string: $badString\n";
}

sub trim {
my @out = @_;
for (@out) {
s/^\s+//;
s/\s+//;
}
return wantarray ? @out : $out[0];
}
 
A. Sinan Unur

I want to determine if there any of the following strings in a java
file that do not exist as part of a comment:

System.out.print
System.err.print
printStatckTrace

ITYM printStackTrace
I'm totally new at Perl and I wrote the following script that get's
called from CVS when developers check in code (the script gets called
with a list of java files starting at $ARGV[1]. The idea is to not
allow any uncommented strings of the above type into the repository
unless they are inside comments.

It does not work for instances of a dis-allowed string that is on a
newline inside of a comment:

/* blah blah blah
System.out.println("foo");
*/

Well, there is, of course, nothing that is stopping you from doing a lot
of work here but, why not just run the file through the C
pre-processor, and match on the strings you are looking for?
basically the problem is the following match code:


unless ($item =~ m/^((\/\/+)|(\/\*+)|(\*+))/) {

I am never ever going to read code like this.

unless ($item =~ m{^((//+)|(/\*+)|(\*+))}) {

Anyway, what's up with all the capturing? What do you think this line of
code does?
use Socket;
use CGI qw(:standard escape);

How are these two modules even relevant to what you are doing?
use strict;
use warnings;

rather than perl -w
my $sArg = "";
shift @ARGV;
foreach( @ARGV )
{
#print "received ARGV: $_\n";
if ($_ =~ m/java/) {
#print "received java file: $_\n";
$sArg .= $_ . " ";
}
}

So, do you consider java.c to be a java file? I do not know why you
would do such a thing but the above can be reduced to (untested):

shift @ARGV;
my $sArg = join(' ', grep { /\.java$/ } @ARGV);
chop( $sArg );
Why?

if ($sArg) {
print "CVS commitinfo is checking .java files for disallowed code. If your commit fails, check for un-commented 'System.out.print', 'System.err.print', or 'printStackTrace()' in: $sArg \n";
}

my @rgFiles = split(" ", $sArg);

OK, so you joined all the filenames together, now you are blindly
splitting on space. What if one of the elements of @ARGV contained a
path with a space?
my $SystemOut = "System.out.print";
my $SystemErr = "System.err.print";
my $StackTrace = "printStackTrace";

Do you not see how pointless it is to have n variables whose names are
the same as the values they hold?

my @bad = qw(System.out.print System.err.print printStackTrace);
my @badFiles;
my @rawData;
my $counter = 0;
my @instances;
my $item;

Declare your variables in the smallest applicable scope.
foreach( @rgFiles )
{
check($_, $SystemOut);
check($_, $SystemErr);
checkStackTrace($_, $StackTrace);
}

If the check finds an uncommented 'System.out.print', do you need to
check the other two?


Sinan
 
A. Sinan Unur

....

Why not take a stab at stripping comments (and I dare say: quoted
strings) from a copy of $item first? The following will (sort of) do
this, but won't cope with unbalanced quotes in comments or comments
inside quotes - or "\" escaped quotes. But should get you a step or
two closer.

$data = $item;
$data =~ s{//.*?\n}{\n}sg;
$data =~ s{/\*.*?\*/}{\n}sg;
$data =~ s{".*?"}{""}sg;
$data =~ s{'.*?'}{''}sg;
# now check $data

There's probably a slicker way.

You are expected to check the FAQ before posting:

perldoc -q comment

FYI, your signature delimiter is still incorrect. It should be two
dashes, followed by a space, followed by a newline.

Sinan
 
Hike Mike

these are interesting. Let me see if I understand you. I have
included some questions as well.
$data =~ s{//.*?\n}{\n}sg;
replace comments that begin with // on a single line with a newline.
Why do I need to include '\n' in '.' if these comments do not extend
beyond a single line (s modifier)?
$data =~ s{/\*.*?\*/}{\n}sg;
replace comments inside of /* */, that extend beyond a single line (s
modifier), with a single newline.

I think that covers all java comments. what is the purpose of the
following:

$data =~ s{".*?"}{""}sg;
$data =~ s{'.*?'}{''}sg;

and also what is the difference between the two (single vs. double
quotes)?
but wont cope with unbalanced quotes in comments or comments
inside quotes - or "\" escaped quotes. But should get you a step or two
closer.

doesn't '.*?' cover all characters including " and \?
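
On the last question: '.*?' does match any character, but because it is non-greedy it stops at the very next quote, including one that is backslash-escaped inside the string. The following is a minimal sketch of that failure mode, not code from the thread; the sample Java line and the variable names are made up for illustration, and the escape-aware pattern is the string-literal part of the substitution in perldoc -q comment.

```perl
use strict;
use warnings;

# Hypothetical Java line: a string literal containing an escaped quote,
# followed by a real (uncommented) System.out.print call.
my $code = 'log("quote: \" "); System.out.print("x");';

# Naive version from the post above: the first match ends at the escaped
# quote, so the second match starts at the real closing quote and swallows
# the code between the two string literals.
(my $naive = $code) =~ s{".*?"}{""}sg;

# Escape-aware version: a backslash followed by any character is consumed
# as a unit, so an escaped quote cannot end the literal.
(my $aware = $code) =~ s{"(?:\\.|[^"\\])*"}{""}sg;

print "naive: $naive\n";
print "aware: $aware\n";
```

With the naive pattern the forbidden call disappears from the stripped text, so the check would wrongly let the file through; the escape-aware pattern leaves it in place.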
 
Hike Mike

Who are you talking to? Please quote some context when you reply.

I did include quotes. Do you see your quote above?

Have you read
perldoc -q comment

I have now. thanks.
 
Hike Mike

I want to determine if there any of the following strings in a java
ITYM printStackTrace

I didn't understand this from your first reply

Anyway perldoc -q comment gave me the substitution that removes C++
style comments (same as Java). Here is the new script. Pointers welcome
(not memory locations).

BTW, This script is called by CVS commitinfo with a list of file names
being committed. The script strips comments out and checks for
unwanted code.

-----
#!/usr/bin/perl
use warnings;
use strict;


shift @ARGV;
my @rgFiles = grep { /\.java$/ } @ARGV;

if (@rgFiles) {
print "CVS commitinfo is checking .java files for disallowed code. If
your commit fails, check for un-commented 'System.out.print',
'System.err.print', or 'printStackTrace' in: @rgFiles \n";
} else {
exit 0;
}

my @bad = qw(System.out.print System.err.print printStackTrace);


foreach( @rgFiles )
{
my $cvsFile = $_;
foreach( @bad )
{
my $badWord = $_;
#print "checking $cvsFile for $badWord\n";
check($cvsFile, $badWord);
}
}

sub check {
my($file, $stringToCheck) = @_;
open(FILE, $file) || return ("could not open file to verify on
commit: $file\n");
$/ = undef;
$_ = <FILE>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
my @rawData = $_;
my $numOfInstances = grep(/$stringToCheck/, @rawData);
unless($numOfInstances == 0) {
close(FILE);
terminate($file, $stringToCheck);
}

close(FILE);
}


sub terminate {
my($file, $badString) = @_;
die "Aborting CVS commit because file $file contains the un-commented
code string: $badString\n";
}

sub trim {
my @out = @_;
for (@out) {
s/^\s+//;
s/\s+//;
}
return wantarray ? @out : $out[0];
}
 
A. Sinan Unur

I didn't understand this from your first reply

I think you meant (ITYM) printStackTrace.
Anyway perldoc -q comment gave me the substitution that removes c++
style quotes (same as Java). Here is the new script. Pointers
welcome (not memory locations).

Your life would still be easier if you ran the file through a C or
C++ preprocessor first.
my @bad = qw(System.out.print System.err.print printStackTrace);


foreach( @rgFiles )
{
my $cvsFile = $_;

for my $cvsFile (@rgFiles) {
foreach( @bad )
{
my $badWord = $_;

for my $word (@bad) {
#print "checking $cvsFile for $badWord\n";
check($cvsFile, $badWord);
}
}

A better design would be for check to indicate if a bad word was found,
and then to skip to the next file if it was.

As it is, you are opening, slurping, substituting three times for each
file. Obviously wasteful.
sub check {
my($file, $stringToCheck) = @_;
open(FILE, $file) || return ("could not open file to verify on
commit: $file\n");
$/ = undef;

local $/;
$_ = <FILE>;

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;

This is the one place where a comment about the origin of this piece of
code would be helpful to your reader who will be wasting a lot of time
trying to figure out where the heck this pattern came from.

Again, it would be much better to run the file through a C/C++
preprocessor.
my @rawData = $_;

This is utter nonsense. You read the file into the *scalar* $_ which now
contains all of the lines of the file as a single string. Then, you
create an array with only one member.
my $numOfInstances = grep(/$stringToCheck/, @rawData);

You don't need to find all instances, all you need is one:

perldoc -f index
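
A side benefit of index is that it compares literally. Used as a pattern, 'System.out.print' treats each '.' as "any character", so it can match text that is not the forbidden string at all. A small sketch of the difference follows; the helper names and the sample text are hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# True if $needle, used as a raw regex, matches somewhere in $text.
sub regex_match   { my ($text, $needle) = @_; return $text =~ /$needle/ ? 1 : 0 }

# True if $needle matches once its metacharacters are escaped with \Q...\E.
sub quoted_match  { my ($text, $needle) = @_; return $text =~ /\Q$needle\E/ ? 1 : 0 }

# True if $needle occurs in $text as a plain substring.
sub literal_match { my ($text, $needle) = @_; return index($text, $needle) >= 0 ? 1 : 0 }

my $needle   = 'System.out.print';
my $harmless = 'mySystemXoutYprinter();';   # made-up identifier, no dots

printf "regex: %d, quoted: %d, index: %d\n",
    regex_match($harmless, $needle),
    quoted_match($harmless, $needle),
    literal_match($harmless, $needle);
```

Only the raw regex reports a hit on the harmless line; \Q...\E and index both require the literal dots.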
sub trim {
my @out = @_;
for (@out) {
s/^\s+//;
s/\s+//;
}
return wantarray ? @out : $out[0];
}

What is the use of this?

Sinan
 
A. Sinan Unur

Here is how I would have attempted to solve this. Note that the code
below is untested. If necessary, you can change the pre-processing to
use the pattern substitution given in the FAQ:

#!/usr/bin/perl

use strict;
use warnings;

my @files = grep { /\.java$/ } @ARGV;

if( @files ) {
print <<MSG;
CVS commitinfo is checking .java files for disallowed code. If
your commit fails, check for un-commented 'System.out.print',
'System.err.print', or 'printStackTrace' in: @files.

MSG
} else {
exit 0;
}

use constant BAD_WORDS => qw(
System.out.print
System.err.print
printStackTrace
);

for my $file (@files) {
my $content;

eval {
$content = pre_process( $file );
};

if( $@ ) {
warn $@;
next;
}

if( my $bad = contains_bad_words($content, BAD_WORDS) ) {
die <<MSG;
Aborting CVS commit because file $file contains the
un-commented code string: $bad.

MSG
}
}

sub pre_process {
my ($file) = @_;

open my $in, '-|', "cpp -x c++ $file"
or die "Cannot open pipe to cpp for reading: $!";
my $content = do { local $/; <$in> };
close $in or die "Cannot close pipe to cpp: $!";
return $content;
}

sub contains_bad_words {
my ($content, @bad_words) = @_;
for my $word ( @bad_words ) {
return $word if -1 < index $content, $word;
}
return;
}

__END__

D:\Home\asu1\UseNet\clpmisc> cat Test.java
public class Test {
public static void main(String args[]) {
System.out.println("Hello World\n");
}
}

D:\Home\asu1\UseNet\clpmisc> cat Test2.java
public class Test {
public static void main(String args[]) {
// System.out.println("Hello World\n");
}
}

D:\Home\asu1\UseNet\clpmisc> javafilter.pl Test2.java Test.java
CVS commitinfo is checking .java files for disallowed code. If
your commit fails, check for un-commented 'System.out.print',
'System.err.print', or 'printStackTrace' in: Test2.java Test.java.

Aborting CVS commit because file Test.java contains the
un-commented code string: System.out.print.
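
For systems without cpp, the pre-processing step could use the FAQ substitution instead. The following is an untested-in-production sketch under that assumption: strip_comments and pre_process_without_cpp are hypothetical names, and the pattern is the one from perldoc -q comment, rewritten with /x and non-capturing groups so that the kept text ends up in the first capture group rather than the second.

```perl
use strict;
use warnings;

# Remove C/C++/Java comments from a string, keeping string and character
# literals intact. Adapted from the pattern in "perldoc -q comment".
sub strip_comments {
    my ($source) = @_;
    $source =~ s{
        /\*[^*]*\*+(?:[^/*][^*]*\*+)*/      # block comment
      | //[^\n]*                            # line comment up to end of line
      | (                                   # group 1: text that is kept
            "(?:\\.|[^"\\])*"               # double-quoted string literal
          | '(?:\\.|[^'\\])*'               # character literal
          | .[^/"'\\]*                      # any other run of text
        )
    }{ defined $1 ? $1 : "" }gsex;
    return $source;
}

# Hypothetical drop-in for pre_process(): read the file directly and strip
# comments in Perl instead of piping it through cpp.
sub pre_process_without_cpp {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!\n";
    my $content = do { local $/; <$in> };
    close $in;
    return strip_comments($content);
}
```

Like cpp, this leaves string literals in place, so a forbidden string that appears only inside a Java string literal would still be reported.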
 
Hike Mike

Here is how I would have attempted to solve this. Note that the code
below is untested. If necessary, you can change the pre-processing to
use the pattern substitution given in the FAQ:

I want to use the pattern substitution if the cpp pipe fails, so I
tested your code on a Windows system. Maybe this is a waste of time but
I saw something I didn't understand.

sub pre_process {
my ($file) = @_;
my $result = open (my $in, '-|', "cpp -x c++ $file") or die "Cannot
open pipe to cpp for reading: $!";
print "result: $result\n";
my $content = do { local $/; <$in> };
close $in or die "Cannot close pipe to cpp: $!";
return $content;

}

the open call returns the integer '1264' and not 'undef' on windows
where cpp does not exist. The program then dies on the close with the
following output:

__BEGIN__

D:\projects\edge\workspace\CVSROOT>.\javaFilter.pl foo LogFormatter.java
CVS commitinfo is checking .java files for disallowed code. If
your commit fails, check for un-commented 'System.out.print',
'System.err.print', or 'printStackTrace' in: LogFormatter.java.

result: 2716
'cpp' is not recognized as an internal or external command,
operable program or batch file.
Cannot close pipe to cpp: at D:\projects\edge\workspace\CVSROOT\javaFilter.pl line 40.

__END__
 
A. Sinan Unur

I want to use the pattern substitution if the cpp pipe fails, so I
tested your code on a Windows system. Maybe this is a waste of time but
I saw something I didn't understand.

sub pre_process {
my ($file) = @_;
my $result = open (my $in, '-|', "cpp -x c++ $file") or die "Cannot
open pipe to cpp for reading: $!";
print "result: $result\n";
my $content = do { local $/; <$in> };
close $in or die "Cannot close pipe to cpp: $!";
return $content;

}

the open call returns the integer '1264' and not 'undef' on windows
where cpp does not exist. The program then dies on the close with the
following output:

I had forgotten about this. Basically, the pipe open will fail only if
perl cannot fork. Of course, the exec might fail after the fork, but the
return value of open will not indicate this.

This is indeed discussed in

perldoc -f open

as well as in "Using open() for IPC" in perldoc perlipc.

See also perldoc -q STDERR
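
Since the return value of open cannot be relied on to report a command that failed to run, one option is to treat both open and close as possible failure points and to inspect $? after close. This is a sketch under assumptions, not a tested fix for the Windows case above: read_from_command is a hypothetical helper, it uses the list form of the pipe open (which bypasses the shell, and which on Windows needs a reasonably recent perl), and whether a missing command is reported at open or at close depends on the platform.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, return everything it printed, and die
# if it could not be started or exited with a non-zero status.
sub read_from_command {
    my (@cmd) = @_;

    # List form: the arguments are passed to the command as-is, so a file
    # name containing spaces or shell metacharacters is not re-parsed.
    open(my $in, '-|', @cmd)
        or die "Cannot start '@cmd': $!\n";

    my $output = do { local $/; <$in> };

    # For a piped open, close returns false if the command exited non-zero.
    # In that case $! is 0 and the exit status is in the high byte of $?.
    unless (close $in) {
        die $! ? "Error closing pipe to '@cmd': $!\n"
               : "'@cmd' exited with status " . ($? >> 8) . "\n";
    }
    return $output;
}
```

In pre_process this might be called as read_from_command('cpp', '-x', 'c++', $file) inside the existing eval, so that a failure falls through to whatever fallback is wanted.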

A kludge is to do the following:

#!/usr/bin/perl

use strict;
use warnings;

pre_process();

sub pre_process {
my ($file) = @_;
my $result = open (my $in, '-|', "nonexistent")
or die "Cannot open pipe to nonexistent for reading: $!";
if($result =~ /^\d+$/) {
die "Invalid pipe";
}
print "result: $result\n";
my $content = do { local $/; <$in> };
close $in or die "Cannot close pipe to cpp: $!";
return $content;

}

__END__

Sorry, can't be of more help here.

Sinan
 
Hike Mike

Sorry, can't be of more help here.

not a problem. the script works fine on linux where the CVS repository
is located (with the cpp pre-processor):
 
