language analysis to enforce code standards

Jason S. Friedman

Hello, I administer the Informatica ETL tool at my company. Part of
that role involves creating and enforcing standards. I want the
Informatica developers to add comments to certain key objects and I want
to be able to verify (in an automated fashion) that they have done so.

I cannot merely check for non-emptiness; that is trivial to circumvent.
On the other hand, I probably do not need to be able to catch
developers who are determined to not create comments. There are not too
many of them and perhaps they will find it is easier to write a (useful)
comment than to game the system.

Any thoughts on how I might proceed? Stated plainly, how can I tell
when a string more-or-less forms at least one phrase?
 
Steven D'Aprano

Define "phrase".


if len(s) > 0:
    print("at least one character")
if len(s.split()) > 0:
    print("at least one word")
if len(s.split('\n')) > 0:
    print("at least one line")
 
Bruce C. Baker

Well, you *could* try analyzing the comment text using NLTK:
http://www.nltk.org/
 
Jean-Michel Pichavant

Steven said:
Define "phrase".


if len(s) > 0:
    print("at least one character")
if len(s.split()) > 0:
    print("at least one word")
if len(s.split('\n')) > 0:
    print("at least one line")

You could also verify there are at least N different characters used in
the sentence:

N = 5  # must contain at least 5 different characters
record = []
for c in s:
    if c not in record:
        record += [c]
if len(record) >= N:
    print("at least %s different characters" % N)


Jean-Michel
 
Peter Otten

Don't be a fool. Have someone other than the author read the comment.

Peter
 
Aahz

Jean-Michel Pichavant said:
You could also verify there are at least N different characters used in
the sentence:

N = 5  # must contain at least 5 different characters
record = []
for c in s:
    if c not in record:
        record += [c]
if len(record) >= N:
    print("at least %s different characters" % N)

Much simpler and *way* more efficient with a set:

if len(set(s)) < N:
    print("Must have at least %s different characters" % N)
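
Wrapped as a reusable function, the set-based check might look like the sketch below. The name `has_enough_variety` and the decision to ignore whitespace and letter case are assumptions added for illustration; they are not part of the suggestion above.

```python
def has_enough_variety(s, n=5):
    """Return True if s uses at least n distinct non-whitespace characters."""
    # Assumption: whitespace and letter case are ignored,
    # so "aAaA  " counts as a single distinct character.
    distinct = set(s.lower()) - set(" \t\r\n")
    return len(distinct) >= n


print(has_enough_variety("aaaaaaaa"))                  # False
print(has_enough_variety("Loads the customer table"))  # True
```

Building the set is a single pass over the string, whereas the list version rescans `record` for every character.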
 
Tim Rowe

2009/7/10 Peter Otten said:
Don't be a fool. Have someone other than the author read the comment.

That's the winning answer as far as I'm concerned. Automated tools are
good for picking up some types of accidental mistakes, but for
checking that comments are meaningful (and variable names, for that
matter) you can't do without peer review.

Think about it. What's the purpose of "enforcing standards"? Just a
tick in some assurance box to say "we meet these standards"? Or to
ensure something about the product quality?

No automated tool -- not for a while yet, anyway -- is going to pick
up comments such as:

# increment x
x += 1

or

# You are not expected to understand this.

The former is the sort of thing that any programmer might produce when
against a deadline and forced to comment their code. The latter is a
classic from a programming guru of old. An automatic checker that just
checks that the comment exists without understanding its contents
simply is not adding value but is rather petty bureaucracy that will
annoy the programmers.
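
To make that concrete, here is a minimal sketch of the kind of shallow checker discussed earlier in the thread: a word count plus a distinct-character count. The function name `passes_shallow_checks` and both thresholds are hypothetical. Both of the comments quoted above pass it, while only obvious filler is rejected.

```python
def passes_shallow_checks(comment, min_words=2, min_distinct_chars=5):
    """Hypothetical checker: enough words and enough distinct characters."""
    # Drop the leading comment marker and surrounding whitespace.
    text = comment.lstrip("#").strip()
    enough_words = len(text.split()) >= min_words
    enough_chars = len(set(text.replace(" ", ""))) >= min_distinct_chars
    return enough_words and enough_chars


for comment in ("# increment x",
                "# You are not expected to understand this.",
                "# asdf"):
    print("%-45s %s" % (comment, passes_shallow_checks(comment)))
```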
 
greg

Aahz said:
Much simpler and *way* more efficient with a set:

if len(set(s)) < N:
    print("Must have at least %s different characters" % N)

Or you could do a dictionary lookup on the words.
I can just see the error message:

"Your comment must include at least one verb,
one noun, one non-cliched adjective and one
Monty Python reference."
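
Joking aside, a dictionary lookup is easy to sketch. The example below assumes a tiny hard-coded vocabulary (`KNOWN_WORDS`) and a hypothetical 50% threshold; a real check would load a proper word list, for instance a system dictionary file where one is available.

```python
import re

# Assumption: a tiny stand-in vocabulary. A real check would load a full word list.
KNOWN_WORDS = {"loads", "the", "customer", "table", "from",
               "staging", "into", "warehouse", "daily"}


def known_word_ratio(comment, vocabulary=KNOWN_WORDS):
    """Return the fraction of words in comment that appear in vocabulary."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in vocabulary) / float(len(words))


def looks_like_prose(comment, threshold=0.5):
    """Hypothetical rule: at least `threshold` of the words must be recognized."""
    return known_word_ratio(comment) >= threshold


print(looks_like_prose("Loads the customer table from staging"))  # True
print(looks_like_prose("asdf qwer zxcv"))                          # False
```

This only shows that the words exist, not that the comment says anything useful.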
 
