Martin Gregorie
There are some gems in this list, particularly the last one:
http://www.devtopics.com/21-laws-of-computer-programming/
I must admit that I don't really get this code. What should be the
initial value of x? Obviously this is a typo, as x isn't even
initialized and this won't compile.
However, if x is initialized to hash
then x = x * 53 + o.hashCode() doesn't make any sense to me (because
hash==0). What am I missing here?
I learned '31' as the coefficient to use for 32-bit hashes, though,
back in my Knuth-reading days. Is '53' better?
No, it's still O(1). The depth of the nesting is specific to and constant
for a given program. For a nesting depth k = 6 and a number of (ultimate)
objects to find n, the algorithm requires k lookups to find a
value, giving O(k) => O(6) (as k is a constant), which is O(1).
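The fixed-depth argument can be sketched directly; the key and value names here are illustrative, not from the thread:

```java
import java.util.HashMap;
import java.util.Map;

public class NestedLookup {
    // Depth is fixed in the code (k = 2 here), so each lookup costs
    // exactly k HashMap probes: O(k) = O(1) for constant k, no matter
    // how many entries the maps hold.
    static String lookup(Map<String, Map<String, String>> table,
                         String k1, String k2) {
        Map<String, String> inner = table.get(k1);   // probe 1
        return inner == null ? null : inner.get(k2); // probe 2
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = new HashMap<>();
        table.put("a", new HashMap<>());
        table.get("a").put("b", "value");
        System.out.println(lookup(table, "a", "b"));
    }
}
```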
The additive term. 'o.hashCode()' is unlikely to be 0 (though if it
is 'null' the whole thing blows up). If it is, then it rightly
contributes zero to the result. So at each iteration through the loop
after the first, 'x' will also be non-zero, most likely.
You're right, I totally blew that. x should be initialized to 0;
Try it. Add a line after the x = x * 53 + o.hashCode(); to print out
the value of x in each loop.
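Here's the combining loop being discussed, written out with x initialized to 0 and the suggested print added (the list of sample strings is my own):

```java
import java.util.Arrays;
import java.util.List;

public class HashCombine {
    // Running hash: x starts at 0 and each element's hashCode is
    // folded in. Note this throws NullPointerException if an element
    // is null, as mentioned upthread.
    static int combine(List<?> items) {
        int x = 0;
        for (Object o : items) {
            x = x * 53 + o.hashCode();
            System.out.println("x = " + x); // print x in each loop
        }
        return x;
    }

    public static void main(String[] args) {
        combine(Arrays.asList("a", "b", "c"));
    }
}
```

After the first iteration x is just the first element's hashCode, and from then on the multiplication by 53 spreads earlier contributions into higher bits.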
I disagree. Recall that in a tree structure, the average runtime of
insertions and retrievals is O(log n). Clearly we have a hierarchy like a
tree here, so we really can't expect to get better than this. And for
un-balanced or degenerate or otherwise poorly constructed trees, the
"tree" becomes a list and we do get O(n) lookup times.
I think we are going to have to agree to disagree on this.
Apparently.
That's interesting. I'm vaguely familiar with message passing systems.
Message passing is not new, I did work briefly on one lo these many
years ago.
My understanding is that there are two types of synchronization --
shared memory, and message passing. Java's monitors are a shared memory
design. I'm not sure what it would take to get a monitor in Java to run
on a message passing system. And the rest of the Java memory model might
be impossible to port.
Anyway, good info, but unlikely to affect the Java world, which is
shared memory only afaict.
The Java memory model was specifically created to handle
systems without cache coherency, so it will survive fine.
Apps assuming that it will not be a problem on modern
systems will not.
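The JMM point can be made concrete: correctly synchronized code states its visibility requirements explicitly, so the runtime can emit whatever coherence or messaging traffic the hardware needs. A minimal sketch of safe publication via volatile (my own example, not from the thread):

```java
public class Publication {
    static int payload;            // plain data field
    static volatile boolean ready; // publication flag

    // The volatile store to 'ready' orders the plain store to
    // 'payload' before it, so any reader that observes ready == true
    // is guaranteed to see payload == 42 -- regardless of what the
    // cache hardware does underneath.
    static void writer() {
        payload = 42;
        ready = true;
    }

    static int reader() {
        while (!ready) { } // spin until the volatile flag is visible
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> System.out.println(reader()));
        t.start();
        writer();
        t.join();
    }
}
```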
Patricia said: On 11/22/2010 7:59 AM, ses wrote:
...
A higher level strategy I often follow in this sort of situation is to
isolate the data structure inside a class of its own that only offers
the methods you need. Those methods should be designed in terms of the
callers' needs, not the implementation. For example, if the key is
really a pair of ints, and Integer is only being used because you are
implementing it using collections, the class might have methods such
as:
public boolean insert(int key1, int key2, String data)
public String get(int key1, int key2)
public boolean delete(int key1, int key2)
Implement the class the simplest, most direct way that you can think
of, without spending a lot of time worrying about it.
Effectively, this kicks the problem of how best to implement the data
structure down the road to a time when some additional requirement,
such as a performance or memory usage requirement, has become clear.
At that time, knowing a specific requirement that your initial
implementation is not meeting will often make the correct course of
action obvious.
Patricia
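A sketch of the kind of wrapper Patricia describes, implemented the simplest direct way with a single HashMap; the IntPair value class and its hashCode constant are my own choices, not from the thread:

```java
import java.util.HashMap;
import java.util.Map;

public class PairKeyedTable {
    // Hypothetical value class combining the two int keys, so callers
    // never see Integer or the underlying collection.
    private static final class IntPair {
        final int a, b;
        IntPair(int a, int b) { this.a = a; this.b = b; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof IntPair)) return false;
            IntPair p = (IntPair) o;
            return p.a == a && p.b == b;
        }
        @Override public int hashCode() { return 31 * a + b; }
    }

    private final Map<IntPair, String> map = new HashMap<>();

    public boolean insert(int key1, int key2, String data) {
        return map.put(new IntPair(key1, key2), data) == null;
    }

    public String get(int key1, int key2) {
        return map.get(new IntPair(key1, key2));
    }

    public boolean delete(int key1, int key2) {
        return map.remove(new IntPair(key1, key2)) != null;
    }
}
```

If a performance or memory requirement shows up later, only this class has to change; callers keep the same three methods.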
Cache coherency isn't a concern. It's shared memory. Threads require it,
message passing systems generally eschew it. Message passing systems
generally use processes, not threads, to do their concurrency.
There's also the problem of simulating shared memory with a message
passing system. There's a fairly high overhead in messaging systems with
passing the message. Trying to copy a large block of memory -- or
multiple large blocks -- to simulate a Java synchronization action could
really bog down performance. Do-able, perhaps, but not a great idea.
Read the article you linked to again: Intel is saying they want to
connect 1000's of processors but not use cache coherency. So if you fill
an array with 10000 bytes of output and then have to make it visible by
releasing a shared lock, that memory has to be sent to all systems that
may need it.
One message, sent to N systems. In a shared memory design, this happens
automatically. In message passing, you have to pass a message to each
process. Each process has to wait for each message to receive it.
>
> It gets worse if the initial thread has updated memory hither and yon
> through out the system -- many messages to many processes.
They will not need to send any data - they will need to send an
invalidate cache message.
I believe this is at the HW level, so no processes involved.
markspace said: On the message passing system I worked on, you needed a sender and a
receiver. You won't want to broadcast to all processes, just the ones
that want to listen. If you have 50 threads that need to synchronize,
you don't want to dump the cache of the other 950 in the system. This is
the advantage of message systems, as I understand it. They have an
explicit target, so you can target only 50 instead of the entire 1000.
Message passing systems are typically implemented in the kernel, with
support from hardware, but not fully hardware themselves. In this case,
the kernel would probably need to look up the PID of the processes, get
which CPU they're running (if any, could be blocked and swapped out),
and send the message to those HW caches.
This could be received asynchronously by the CPUs, without an explicit
receive on the target process, I'll give that. And HW can assist in the
look-up, that's been done. But there's still a context switch to deal
with (these HW accesses are usually privileged and not available to user
processes). And the context switch is the overhead, as I understand it.
Fine-grained context switching is going to cause slowdowns.
When NetBeans auto-generates hashCode() methods, I've noticed it
randomizes the constants it uses, around a small range of values. I
don't think it matters much, but I'm just doing the same thing. Any
constant with some low order bits set as well as some bits set above bit
position 3 will do fine I think.
31, 33, 35, 51, 53, 55, etc. all work pretty much the same, and generate
slightly different hash codes, meaning the chance of collision is
reduced if you use different constants once in a while.
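A sketch of the general shape such a generated hashCode takes; the class, its fields, and the particular constants here are illustrative, not actual NetBeans output:

```java
public class Point3 {
    private final int x, y;
    private final String label;

    public Point3(int x, int y, String label) {
        this.x = x; this.y = y; this.label = label;
    }

    @Override
    public int hashCode() {
        // Any small odd multiplier with bits above position 3 works
        // about the same; 53 here, but 31, 33, 51, 55... behave
        // similarly, just yielding different code values.
        int hash = 7; // generated seed constant varies too
        hash = 53 * hash + x;
        hash = 53 * hash + y;
        hash = 53 * hash + (label == null ? 0 : label.hashCode());
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point3)) return false;
        Point3 p = (Point3) o;
        return p.x == x && p.y == y
            && (label == null ? p.label == null : label.equals(p.label));
    }
}
```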
Are you discussing message passing on top of shared memory or shared
memory on top of message passing? I got a little dizzy, but there are
ways to do either that play to the strengths of the underlying platform.
(There's also something to be said for the hybrid approach.)
Upthread, markspace alluded to alternative scenarios where message
passing or shared memory might be the natural paradigm. Different
layers, diverse modules, might call for different patterns.
What we have is a fixed number of
lookups in a HashMap.
I don't think we do. The OP's example gave a fixed look-up length, but
his description to me said "I'm going to continue adding HashMaps and
increasing the depth as long as I need to."
I'm too lazy to do the research right now, but I'm going to delve into
it later. Here's what I recall:
- There's a difference between coefficient choices for the pseudorandom
generator (PRG) wrt cycle length - how long the pseudorandom sequence
goes before repeating.
- 31 gives a good cycle length.
- I think I recall 53 also being among the small selection of pretty
decent coefficients long-cycle-wise for 32-bit PRGs.
- I don't recall the choice being as wide as your post conveys.
- Cycle length is not the only measure of PRGs for hash goodness.
- There are ways to reduce the likelihood of hash collision for a given
set of values regardless of the goodness of the pseudorandom sequence,
and the Java API uses them.
Thus I conclude that your advice is sound, though by preference I will
use a prime coefficient, and that will be 31 for 32-bit hashes at least
until I've done the research again.
I for one welcome our new brainy... naw, that's overdone.
I guess any pedantic post correcting someone has to contain at least one
really ridiculous and obvious spelling error, doesn't it?
It is not obvious to me that coefficients that give long cycles for
LCGs also give good distribution for hashes.