Eric said:
I apologized, and still apologize, for the tone, as I said.
And I stand by the content: Your arguments are nonsense
Apologies are not likely to be taken seriously when you follow them up
by immediately re-committing the same act for which you were asked to
apologize.
When my kids apologize after I catch them with their hands in the cookie
jar, then ten minutes later I catch them with their hands in the same
jar again, I ground the little rotters for a week!
founded on the numerological superstitions
Your unwanted, off-topic, and hostile speculations about me are 100%
wrong. If you'd bothered to actually get to know me before passing
judgment you'd have discovered that I'm one of the least superstitious
people in existence.
My remarks about some numbers occurring, in practice, more commonly than
others are based on statistics and experience, not on Kabbalah or the
Revelation of John* or whatever other nonsense you seem to incorrectly
think to have been the basis.
* This is apparently its correct name, not "Revelations".
Evidence? We don' need no steenkin' evidence!
Experience. Take a look at the integer constants defined in your own
code sometime. You'll find some sequential runs, but anywhere you have
bit fields, initial sizes/capacities, buffer lengths, and whatnot you'll
find a preponderance of even numbers, and often larger powers of two,
numbers divisible by large powers of two (1920, in a HD display pixel
width; 2560 in some other setting; 256 in a file-header size) or by
powers of ten (so, two and five) (private static final int INITIAL_SIZE
= 100, and so forth).
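To illustrate (the constants below are invented but typical-looking
examples of the kind I mean, not drawn from any particular codebase, and
the class and method names are mine):

```java
public class ConstantClumping {
    // Invented but typical-looking constants: flags, capacities, buffer
    // lengths, pixel widths -- the sorts of values that show up in practice.
    static final int[] TYPICAL = {1, 16, 100, 256, 512, 1000, 1024, 1920, 2560, 4096};

    // Largest power of two dividing n (for n != 0): isolate the lowest set bit.
    static int twoFactor(int n) {
        return n & -n;
    }

    public static void main(String[] args) {
        for (int n : TYPICAL) {
            System.out.println(n + " is divisible by " + twoFactor(n));
        }
        // Most of these are divisible by large powers of two (1920 by 128,
        // 2560 by 512), and every one of them but 1 is even.
    }
}
```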
The numbers that come up in actual practice are not uniformly
distributed; they statistically clump near some values, especially zero,
and prefer to have many factors and especially powers of two.
This in turn makes the distribution of any naive hash function of those
values clumpy as well.
A clumpy hash function remains somewhat clumpy under simpler
bit-twiddling massaging and may result in an elevated rate of collisions
when placed in any array of hash buckets smaller than 2147483647 or so.
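A sketch of the effect, assuming the usual power-of-two table with
index = hash & (buckets - 1) and an identity-style hash (names mine):

```java
public class BucketClumping {
    // Naive placement into a power-of-two table: keep the low bits only.
    static int bucket(int hash, int buckets) {
        return hash & (buckets - 1);
    }

    public static void main(String[] args) {
        int buckets = 16;
        boolean[] used = new boolean[buckets];
        // Feed in nothing but even keys -- the statistical clump argued above.
        for (int k = 0; k <= 200; k += 2) {
            used[bucket(k, buckets)] = true;
        }
        int hit = 0;
        for (boolean b : used) if (b) hit++;
        // Every odd-numbered bucket goes unused: only 8 of 16 buckets get
        // any keys, so the even buckets take double the collision load.
        System.out.println(hit + " of " + buckets + " buckets used");
    }
}
```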
There is nothing whatsoever "numerological" about this. It is
statistics, experience, and common bleeding sense.
You clearly know very little (not an insult, but a fact
subject to objective verification).
About Sun's licensing? My concern was more towards learning the language
itself. I do recall some big scary license document about not sharing
copies of the JDK's contents willy-nilly, which seemed silly since Sun
apparently lets any Tom, Dick, and Harry download it from their servers,
but legalities are legalities, however silly they might seem to a
non-lawyer such as myself.
Have you ever installed Sun's JDK?
Of course. Does anyone post here that hasn't? Aside from the damn
spammers, of course.
What's funny? Either Sun will sue, in which case there's
nothing funny about it, or you're just being silly,
which is funny.
Yes, the tone of this paragraph is antagonistic, but IMHO you
are begging for it.
Wrong again. I have learned something useful, however: you have no sense
of humor. So I won't bother trying to be funny here again, lest I be
insulted by you again for my efforts. Even though there remains at least
the theoretical possibility that one or more other people here might
actually appreciate my humor. I guess because of you they get to lose
out. Well, at least now they know who to blame for that.
*No* function of non-random data can eliminate the non-
randomness.
That's not the point. It's the non-uniformity, or clumpiness, that bears
smoothing out, since clumps are likely to be assigned to a relatively
small subset of hash buckets and thus have relatively many bucket
collisions internally. (Remember that even non-colliding hashCode values
may result in bucket collisions when placed in an actual hash table of
any size smaller than 2^32 -- so, probably, any size at all, since it's
likely the limit is half that. The risk grows the smaller the actual
hash table. The risk of avoidable collisions in clumps grows likewise.)
No, not even the multiplication by phi that catches your fancy.
That multiplication reduces clumpiness in the output to a minimum, given
a certain clumpiness of the input, short of using a cryptographically
secure hash that would probably be significantly slower to compute. I've
already given a sketch of the mathematical explanation for why.
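A sketch of that multiplication -- the classic Fibonacci-hashing trick,
using 2^32/phi rounded to the odd constant 0x9E3779B9 and keeping the
*high* bits of the product, where the mixing is strongest (class and
method names are mine):

```java
public class FibonacciHash {
    // 2^32 / phi, rounded to an odd integer: the Fibonacci-hashing multiplier.
    // As a signed Java int this literal equals -1640531527; the arithmetic
    // below wraps mod 2^32 either way.
    static final int PHI_MULT = 0x9E3779B9;

    // Map a hash into 2^tableBits buckets using the top bits of the product.
    static int bucket(int hash, int tableBits) {
        return (hash * PHI_MULT) >>> (32 - tableBits);
    }

    public static void main(String[] args) {
        int tableBits = 4; // 16 buckets
        boolean[] used = new boolean[1 << tableBits];
        // Feed in nothing but even numbers -- a maximally clumpy input.
        for (int k = 0; k <= 200; k += 2) {
            used[bucket(k, tableBits)] = true;
        }
        int hit = 0;
        for (boolean b : used) if (b) hit++;
        // The multiplication by ~phi^-1 rotates the clump around the table,
        // so all 16 buckets get used despite every input being even.
        System.out.println(hit + " of " + (1 << tableBits) + " buckets used");
    }
}
```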
"Would work better?" Sort of depends on (1) the internals
of the hash implementation, of which you profess ignorance,
Unless it's being massaged by a similar multiplication, or fed through a
cryptographic routine, clumpiness of the input will almost certainly
elevate the rate of bucket collisions.
and (2) the distribution of the inputs, for which you offer only
unsupported claims that you "stand by, nonetheless." Evidence?
We don' need no steenkin' evidence!
They are not unsupported. They should honestly be self-evident to anyone
with much experience working with numbers in any kind of actual applied
context. The numbers we actually use are not uniformly distributed.
Randomly pick an integer from those you've used today and odds are good
it will be much smaller than 2147483647, fairly good that it will be
positive, fairly good that it will be even, and fairly good that it will
be either single-digit or have relatively many factors.
You seem to be suggesting that you take seriously the notion that an
integer that actually arises in practice is equally likely to be any of
the 4294967296 possible int values in Java.
Since your hostility towards my claims does not make any sense otherwise.
Essentially, you say that my claim that an integer that actually arises
in practice is *not* equally likely to be any of the 4294967296 possible
int values in Java is "unsupported" and therefore we should not believe it.
I say that you are being ridiculously anal to demand some high standard
of evidence for what, begging your pardon, is the bleeding obvious!
So your fears of overloading the even-numbered buckets (if
we're to believe your no-steenkin'-evidence claim about the
prevalence of even numbers) are diminished by a factor of four
at the least, right?
That bit-twiddling will make the clumpy distribution different and
somewhat less clumpy, but it will not eliminate the clumpiness, or even
the tendency for many of the bits to be zeros (many more than half of
them). These will have their (statistical) effects. Those effects won't
be as bad as in the worst-case, but won't be zero either.
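To make that concrete, here's a spreader in the style of the XOR-the-
high-half-down twiddle some HashMap implementations use (the class name
is mine, and I'm not claiming this is any particular JDK's exact code):

```java
public class TwiddleLimits {
    // XOR the high half of the hash into the low half -- a cheap spreader
    // in the style of java.util.HashMap's supplemental hash.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // For small even inputs the high half is all zero bits, so the XOR
        // changes nothing: the result is still even, and still lands in an
        // even-numbered bucket of a power-of-two table. The twiddle helps,
        // but it does not manufacture entropy the input never had.
        for (int k = 0; k <= 200; k += 2) {
            int b = spread(k) & 15;
            assert b % 2 == 0;
        }
        System.out.println("all small even keys still map to even buckets");
    }
}
```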
I'm sorry, but I'm unable to extract sense from this
paragraph.
I'm sorry, but you're unable to extract sense period, as near as I have
been able to determine.
Seriously. If you honestly think there's a snowball's chance in hell
that all 4294967296 int values actually occur with equal frequency in
typical production software, then you have no business posting to this
newsgroup and probably have no business being put in charge of designing
any tool more important than a Nerf-branded one.
You seem to be saying that List and StringBuilder and Dimension
should not have .equals() methods
More that Dimension should be immutable, mutable-List and StringBuilder
equals should be ==, and there should be an immutable List with the
present List.equals for its equals.
Equality with X, and hash code, should really be lifetime-constant
properties of a thing, after all, for mathematical reasons and to get
rid of the mutable-keys-in-a-map-can-screw-it-up problem.
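That problem is easy to demonstrate (names mine; any mutable key type
with a content-based hashCode will do):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyTrap {
    // Put an entry under a mutable key, mutate the key, then look it up
    // again by the very same reference.
    static String lookupAfterMutation() {
        List<Integer> key = new ArrayList<>(List.of(1));
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(key, "found");

        key.add(2); // mutate the key after insertion: its hashCode changes

        // The entry is still physically in the map, but the lookup now
        // computes a different hash, probes the wrong bucket, and fails.
        return map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null -- entry stranded
    }
}
```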
Perhaps when I'm older and wiser I'll understand you, but for the
moment the logic of your contention eludes me.
That's okay. I am routinely in the position of not being understood by
those younger and less wise than I -- I have kids. I've had worse lip
from them, too, than I've so far received from you.
