Any Checkstyle users?

S

slippymississippi

I'm curious, I was looking at using Checkstyle on my projects. For
some reason, it complains when I use hashmaps. What? Why wouldn't I
want to use hashmaps in my code? Heaven forbid I use one of the most
useful data structures in all of Java.
 
M

mikm

What problem does it report? I just installed the Checkstyle plugin
for Eclipse and it reported no problem when I created an new HashMap
aside from the "magic number" complaint in the constructor.

The "magic number" problem means that you should try to use constants
instead of using some number whenever possible.
 
M

mikm

Dah. I should clarify myself. For example, imagine you have a Hotel
class and want to keep track of your guests

import java.util.HashMap;
public class Hotel
{ private HashMap guests;
//...
public Hotel()
{ guests = new HashMap(50);
}
//...
}

In this example "50" is ambiguous. Why 50?

A slightly better way would be

public class Hotel
{ private HashMap guests;
private static final int MAX_GUESTS = 50;
//...
public Hotel()
{ guests = new HashMap(MAX_GUESTS);
}
//...
}

Now, anybody looking at your code knows that the maximum number of
guests is 50 and that the HashMap "guests" can hold that much data.
 
T

Tony Morris

mikm said:
Dah. I should clarify myself. For example, imagine you have a Hotel
class and want to keep track of your guests

import java.util.HashMap;
public class Hotel
{ private HashMap guests;

You should be using a Map reference.
All aside from the false assumption that HashMap is indeed useful.
 
R

Roedy Green

You should be using a Map reference.
All aside from the false assumption that HashMap is indeed useful.

Does that buy you anything other than it prevents you write locking
yourself into ArrayList by using non-Map methods?

The penalty is you are using slower interface references rather that
direct class ones which require a much more convoluted scheme to find
the appropriate method on each call.

Another penalty is you increase the complexity of the declare and I
figure make it less readable, at least to newbies.
 
T

Tony Morris

Roedy Green said:
Does that buy you anything other than it prevents you write locking
yourself into ArrayList by using non-Map methods?

The penalty is you are using slower interface references rather that
direct class ones which require a much more convoluted scheme to find
the appropriate method on each call.

Another penalty is you increase the complexity of the declare and I
figure make it less readable, at least to newbies.

Separating contract from implementation is something that Java has failed
miserably at, while pretending to do otherwise.
The advantages to a correct separation are widely misunderstood and not even
addressed in some of the most prominent literature.
The argument regarding the "performance hit" for lookup of virtual calls is
fall of nothing more than air - I'm sure (at least, I hope) you'll agree. In
fact, it has nothing at all to do with interfaces, and everything to do with
the level of depth in "virtuality" of the lookup, which class types can also
"suffer" from.

"Newbies" very rarely suffer from the apparant complexity because they have
no deeply held notions about how things "should" look, in contrast to the
many contrived ideas by "non-newbies". At least, this is my observation
according to experimentation - I can explain a somewhat trivial concept to a
vulnerable mind, such as my first year students, and I will be asked all the
right questions - to the junk-filled mind, I typically hear religious
refutation and 'reiteration of the guff', which is strongly guarded by ego.
 
D

Daniel Dyer

Does that buy you anything other than it prevents you write locking
yourself into ArrayList by using non-Map methods?

Not sure what you mean by "write locking yourself into ArrayList". I
would have thought the main advantages are that you can switch
implementations (for example change from HashMap to TreeMap) without
having to change any other code. The corrollary of that is that it
prevents your logic from becoming polluted by details of the current
implementation.

I'm interested in Tony's point about HashMap not being useful. What do
you mean by this?
The penalty is you are using slower interface references rather that
direct class ones which require a much more convoluted scheme to find
the appropriate method on each call.

You would have to have some convincing evidence of a significant
performance hit before I would take that into consideration. This is
another one of those micro-optimisations that is best left to the compiler
or VM.

Dan.
 
C

Chris Uppal

mikm said:
{ private HashMap guests;
private static final int MAX_GUESTS = 50;
//...
public Hotel()
{ guests = new HashMap(MAX_GUESTS);
[...]
Now, anybody looking at your code knows that the maximum number of
guests is 50 and that the HashMap "guests" can hold that much data.

Except that that isn't true. The name should be something like
private static final int INITIAL_ESTIMATE_OF_PROBABLE_MAX_GUESTS = 50;
which is unwieldy in the extreme, but calling it MAX_GUESTS is actively wrong
and so must be avoided.

The real problem here is not the use of a "magic number", but the use of a
checking tool which will blindly apply the same so-called rule to every
situation, regardless of whether it is applicable. If Checkstyle doesn't
provide a way to annotate your code to say "I have reviewed this 'problem' code
and it is fine as it is", then I'd suggest dropping Checkstyle.

-- chris
 
R

Roedy Green

I'm sure (at least, I hope) you'll agree. In
fact, it has nothing at all to do with interfaces, and everything to do with
the level of depth in "virtuality" of the lookup, which class types can also
"suffer" from.

Not at all. Calls to methods via interfaces are much trickier than
calls to virtual instance methods.. For classes, the method lives at
a fixed offset in a vtbl for the calls. Bjarne Stroustrup, the father
of C++, invented the vtbl. I am impressed all to heck since I tried
for years to invent it on my own for my own language, coming up with
nothing nearly as elegant. All you need is the method number and the
vtbl for the class this object ACTUALLY is and away you go , not that
much worse that a call to a static method.

But for a call to a method via an interface reference the method is at
a different offset in every implementing method. Various goofy
schemes are used from linearly scanning for it on every call, caching
the last hit or two, miniature HashMaps where you look up the
interface to get the offset of the method to implement each call...

I understand that much research has gone into efficiently implementing
calls to methods via interface references and the penalties are not
nearly so severe as in past.

People use interface references in preference to class references, to
the point making up dummy interfaces that have only one implementor
and always will. I think you should only use interface references
when there is a likely benefit.
 
R

Roedy Green

"write locking yourself into ArrayList"

You make it difficult to change to some other List implementation. If
you use an explicit ArrayList reference, you can inadvertently use
methods defined only for ArrayList but not for other Lists.

Using an List reference, forces you write vanilla code where you can
change the collection implementation later easily.
 
R

Rick Giles

Chris said:
mikm said:
{ private HashMap guests;
private static final int MAX_GUESTS = 50;
//...
public Hotel()
{ guests = new HashMap(MAX_GUESTS);
[...]
Now, anybody looking at your code knows that the maximum number of
guests is 50 and that the HashMap "guests" can hold that much data.

Except that that isn't true. The name should be something like
private static final int INITIAL_ESTIMATE_OF_PROBABLE_MAX_GUESTS = 50;
which is unwieldy in the extreme, but calling it MAX_GUESTS is actively wrong
and so must be avoided.

The real problem here is not the use of a "magic number", but the use of a
checking tool which will blindly apply the same so-called rule to every
situation, regardless of whether it is applicable. If Checkstyle doesn't
provide a way to annotate your code to say "I have reviewed this 'problem' code
and it is fine as it is", then I'd suggest dropping Checkstyle.
Yes, it does; see SuppressionCommentFilter at
http://checkstyle.sourceforge.net/config.html

/RG
 
S

slippymississippi

What problem does it report?

I tried turning on all of the optional checks, and it returned the
IllegalType violation. The description is:

"Checks that particular class are never used as types in variable
declarations, return values or parameters. Includes a pattern check
that by default disallows abstract classes. Rationale: Helps reduce
coupling on concrete classes."
 
S

slippymississippi

BTW, it appears (from more recent posts in the thread) that the issue wasn't
the use of a magic number at all, but of a variable of type HashMap.

Not just HashMap:

"java.util.GregorianCalendar, java.util.Hashtable, java.util.HashSet,
java.util.HashMap, java.util.ArrayList, java.util.LinkedList,
java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.TreeSet,
java.util.TreeMap, java.util.Vector"

I suppose the check is pushing me to pass/return custom data objects
that implement Java-specific data structures behind the scenes? I
can't see how you would write meaningful code without eventually using
one of the listed data structures, but I can see vaguely how it would
be beneficial to wrapper the implementation of these data structures.
 
R

Roedy Green

Except that that isn't true. The name should be something like
private static final int INITIAL_ESTIMATE_OF_PROBABLE_MAX_GUESTS = 50;
which is unwieldy in the extreme, but calling it MAX_GUESTS is actively wrong
and so must be avoided.

How let us say the capacity factor is .75 and you set the size at
100. How big is the array inside 100 or 133? Not knowing that could
give you poor performance with the beasts.

IIRC, I have it documented but I don't recall the answer. See
http://mindprod.com/jgloss/hashmap.html
http://mindprod.com/jgloss/hashtable.html

These things need a fair bit of slop to work. Putting 100 guests in
100 slots is like asking a ballerina to perform in a telephone booth.
 
D

Daniel Dyer

Not just HashMap:

"java.util.GregorianCalendar, java.util.Hashtable, java.util.HashSet,
java.util.HashMap, java.util.ArrayList, java.util.LinkedList,
java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.TreeSet,
java.util.TreeMap, java.util.Vector"

I suppose the check is pushing me to pass/return custom data objects
that implement Java-specific data structures behind the scenes? I
can't see how you would write meaningful code without eventually using
one of the listed data structures, but I can see vaguely how it would
be beneficial to wrapper the implementation of these data structures.

It seems to be advocating that you return java.util.Calendar,
java.util.Map, java.util.Set, java.util.List, etc. In other words, return
the interface type rather than the concrete implementation. No need to
write any wrappers.

Dan.
 
C

Chris Uppal

Not just HashMap:

"java.util.GregorianCalendar, java.util.Hashtable, java.util.HashSet,
java.util.HashMap, java.util.ArrayList, java.util.LinkedList,
java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.TreeSet,
java.util.TreeMap, java.util.Vector"

I suppose the check is pushing me to pass/return custom data objects
that implement Java-specific data structures behind the scenes? I
can't see how you would write meaningful code without eventually using
one of the listed data structures, but I can see vaguely how it would
be beneficial to wrapper the implementation of these data structures.

I think you may be reading too much into what Checkstyle is telling you.

Forgetting about Checkstyle for the moment, there are at least three different
aspects to "good" code that you touch on here. I think it would help to
separate them out from each other.

One is that when you use, say, a HashMap in some private bit of code, how do
you declare the variable that holds the reference. Do you write:
Map map = new HashMap();
or:
HashMap map = new HashMap();
The most popular school of thought is that the first form is to be preferred,
but there are dissenting opinions too. My own opinion is that it doesn't make
much difference in this case, since -- by hypothesis -- we are talking about
private code. If the scope of the variable is restricted, then so is the
"damage" that can possibly be caused by declaring it too narrowly. In short it
doesn't lead to hard-to-manage coupling or brittle code since nothing much can
depend on it.

The second is to ask the same question about externally visible fields and
APIs. For instance the return type of a public or protected method. In /this/
case I think the standard answer is completely correct, and that -- unless you
have some special requirement -- the declared type should be as general as
possible (ideally an interface).

The last issue is the use of "bare" collections in APIs at all. Is it "good"
to export an API which takes a List<MyObject>, or which returns a Map(String,
MyOtherObject>) ? My answer is that its not intrinsically wrong, and is quite
often exactly the right thing to do, but that you should stop and think
whenever you find yourself doing it. If you find yourself doing it much, or if
you find yourself expressing complicated structures as collections (in the
API), then you have quite probably gone too far, and have something of a code
mess, where the responsibility for managing and interpreting a particular raw
structure is scattered all over the code-base, rather than nicely centralised.
For instance, I wouldn't blink at:
List<MyObject>someMethod()
I would feel mildly uneasy about
Map<String, MyObject>someMethod()
and if I came across
List<List<MyObject>>
then my alarm bells would be going off like sky-rockets.

Coming back to Checkstyle. I don't know whether it distinguishes between my
first and second issues, but I think it is trying to warn you about one or the
other. I don't see an easy way that a tool like Checkstyle could provide much
help with the third issue since the cut-off between valid uses of raw
collections, and culpable failure to create a true object, is not easy to
recognise without a true understanding of the code. I suppose some sort of
heuristic could be defined, but I'd not expect it to be much use.

-- chris
 
C

Chris Uppal

Roedy Green wrote:

[me:]
How let us say the capacity factor is .75 and you set the size at
100. How big is the array inside 100 or 133? Not knowing that could
give you poor performance with the beasts.

Good question! I had assumed that the capacity was the number of
actual entries expected -- regardless of load factor. But it seems
that the code intereprets it as the size of the array to allocate, with
the load factor determining how many of the occupied slots are allowed
before it grows itself.

In point of fact, the implementation rounds the capacity up to a power
of two.

Anyway, none of that's clear from the documentation -- which is
unforgivable. I don't care for the choice they've made for the
interpretation of "capacity" (in fact I think it's stupid), but leaving
the matter open to guesswork is <pauses, and chooses a less loaded
word> regrettable.

That being the case, my recommended identifier name is also wrong. I
leave it to the reader to find a managable identifier that expresses
the concept
Initial estimate of the probable maximum number of
expected guests, adjusted to take account of the (undocumented)
relationship between the "load factor" and "capacity" of a
java.util.HashMap.

Personally, I find that just
50
is as clear as anything ;-)

-- chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,012
Latest member
RoxanneDzm

Latest Threads

Top