Why MULT 31 (hash function for string)?

G

gokkog

Eric said:
I still think it's mis-transcribed. Either that, or it's
not intended as a hash function for strings, but for groups of
k strings stored consecutively (first character of second string
right after the '\0' of the first, and so on).

Next time you post a code snippet, consider including a
brief description of what it's supposed to do. If you don't,
people will have to guess about the intent of the code and
may guess wrong. "//from programming pearls" does not qualify
as a specification ...

Thanks for your suggestion. I was careless when copying the codes
snip but it seems not an independent function for ordinary string
hasing:

http://netlib.bell-labs.com/cm/cs/pearls/markovhash.c
 
A

Ancient_Hacker

Logan said:
To put it another way, is there any modern processor (even an
embedded processor) where "mod" is actually more expensive than
"and"? I agree the former takes more silicon, but it seems
like most chips have gone ahead and devoted the silicon to it
to make it fast.

- Logan

most CPU's nowdays have at least two integer and fp divide units, and
can do a divide in two clock cycles, and overlapped with other
operations as well. So divide and mod are not the 48-cycle monsters
they were just a few silicon generations ago.
 
R

Rainer Weikusat

[...]
To put it another way, is there any modern processor (even an
embedded processor) where "mod" is actually more expensive than
"and"?

ARM (at least up to 9/ v5). The following code

unsigned x(unsigned y, unsigned z)
{
return y % z;
}

unsigned x1(unsigned y, unsigned z)
{
return y & z;
}

compiles to

x:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
str lr, [sp, #-4]!
bl __umodsi3
ldr pc, [sp], #4
..Lfe1:
.size x,.Lfe1-x
.align 2
.global x1
.type x1,function
x1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
and r0, r0, r1
@ lr needed for prologue
mov pc, lr

ie it calls a library routine for mod. With a constant second
operand, the compiler can eventually avoid the 'generic' division
routine, but that will still be several instructions instead of
one.
 
R

Rainer Weikusat

Ancient_Hacker said:
most CPU's nowdays have at least two integer and fp divide units,

"Most CPUs nowadays" *are* *not* 'recent' x86-compatible PC CPUs.
 
B

Ben Pfaff

Logan Shaw said:
To put it another way, is there any modern processor (even an
embedded processor) where "mod" is actually more expensive than
"and"?

Please name a processor on which "mod" and "and" have the same
cost.
 
W

websnarf

I would suggest that there is basically no processor anywhere where
that isn't true. From Cray vector processors to digital watches -- its
the same deal.
[...] I agree the former takes more silicon, but it seems
like most chips have gone ahead and devoted the silicon to it
to make it fast.

- Logan

most CPU's nowdays have at least two integer and fp divide units,

Excuse me? The Alpha processor had a single multi-staged integer
divide unit (i.e., it could compute about 4 bits of integer result per
clock), because those guys were macho. I'm not sure if there are any
others that you could classify as a high performance CPU.

In general CPUs reuse many of their other functional units
(specifically their multipliers and adders) to simulate a divide over
multiple clocks. In fact, most CPUs only provide a floating point
division state machine and use it to generate the integer divide. Its
just a matter of making the most use out of your transisters. The
moment you try to make a divider in your CPU microarchitecture, it
becomes a better idea to turn those transistors into extra multipliers
and adders (on average there will just be a bigger performance impact
for doing so). I.e., it never has been, and never will be a
particularly good idea to make a custom divider, and as a consequence
it will never come down to a single clock (or 2); which I don't think
is possible anyways.

In the Itanium, for example, rather than making a division unit, they
decided to make two parallel multiply and add units instead. Its
clearly better to do things that way.
[...] and
can do a divide in two clock cycles, and overlapped with other
operations as well.

CPU manufacturers have, at various times, done clever things with their
divide mechanisms -- but escaping their inevitable slowness is not one
of them. About the fastest dividers in existence are the SIMD
reciprocal dividers in the latest x86 processors that can compute 4
parallel 32-bit floating point approximate results in about 12 clocks.
But you can't shrink the 12 clocks, you can't get fully IEEE accurate
results, and you can't get fewer than 4 of them done at a time (except
by ignoring the extra ones you compute.) It certainly isn't going to
help you compute an isolated integer modulo.

The AMD Athlon did a neat thing where they would allow other FP
instructions to use the few dead slots in the FPU while it executed the
microcode for its division. But this does nothing for serial
calculations such as computing the offset into a hash (the software
just has to wait for the result before it can proceed.)
[....] So divide and mod are not the 48-cycle monsters
they were just a few silicon generations ago.

Well good ones are about 20 clocks (the Grahm-Smidt formula is an
ingeneous way of reducing a divide to a critical path of about 5 serial
floating point operations). But its *really* hard (and not worth the
effort) to get them to go any faster.
 
A

Ancient_Hacker

Excuse me?

You are correct, there's a scarcity of multiple divide units. Also the
times I gave can be complicated by a minimum latency.

The AMD processors do have separate floating add and multiply units.
They also have 128 bit SIMD instructions that do four FP ops at a gulp.
Also there's reciprocal divide, with three variations of precision,
and in very few cycles.

You're spot on-- it's usually better to devote the silicon to more
multipliers. Division is a hard nut to crack.
 
J

Joe Wright

Richard said:
Eric Sosman said:


<snorfl>

1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is prime...
No. A prime is evenly divisible only by itself or unity (1). 2 is prime.
9 isn't prime. Did I miss something?
 
C

CBFalconer

There's a classic hash function to hash strings, where MULT is
defined as "31":

//from programming pearls
unsigned int hash(char *ptr)
{ unsigned int h = 0;
unsigned char *p = ptr;
int n;
for (n = k; n > 0; p++) {
h = MULT * h + *p;
if (*p == 0)
n--;
}
return h % NHASH;
}

Why MULT defined as 31? ( How about 29? 24? or 26? )

The primes 31 and 37 have been experimentally found to be
efficacious. 31 has the advantage that it consists of a string of
1 bits, and multiplication can be easily effected by shifting in
systems without a fast integer multiply.

That function as written above is faulty. Try:

/* Hash a string quantity */
unsigned long hshstrhash(const char * string)
{
unsigned long h;

h = 0;
while (*string) {
h = h * 31UL + (unsigned char) *string++;
}
return h;
} /* hshstrhash */

and leave the reduction modulo NHASH to the caller.
 
R

Richard Heathfield

Joe Wright said:
Really?

A prime is evenly divisible only by itself or unity (1).

That definition is inadequate, since it fails to exclude negative numbers
and fractions from the universe of discourse. (3 is divisible by -1 and -3,
so by your definition it is not prime. If you then say "well, not
negatives, obviously", you still have to face 3/2, which fits your
definition nicely, and yet is not prime.)

Also, your definition is insufficient to the necessary task of excluding 1
from primality.
2 is prime. 9 isn't prime. Did I miss something?

Yes.
 
K

Keith Thompson

Joe Wright said:
No. A prime is evenly divisible only by itself or unity (1). 2 is
prime. 9 isn't prime. Did I miss something?

It's the punchline to a joke, if I'm not mistaken.
 
E

Eric Sosman

Joe said:
No. A prime is evenly divisible only by itself or unity (1). 2 is prime.
9 isn't prime. Did I miss something?

Mathematician: A prime is a positive integer with
two and only two distinct exact divisors, namely, itself
and unity.

Engineer: 3 is prime, 5 is prime, 7 is prime, 9 is an
observational error, 11 is prime, ...

Programmer: 3 is prime, 5 is prime, 7 is prime, 9 is
prime, 11 is prime, ...

(Oh, sorry: I forgot the part about "A mathematician,
an engineer, and a programmer walk into a bar ...")

Working the professions in the opposite direction, it seems
these three about-to-be-plastered savants had been asked to find
the aerodynamic drag on a new railway locomotive. The programmer
offered to write a gigantic non-linear partial differential
equation solver and run it for eight months (plus debugging time)
on a massive bank of linked supercomputers doing somewhat over
three petaflops. The engineer pointed out that this would be
hideously expensive, and instead proposed building a full-scale
model of the locomotive and testing it in a giant wind tunnel.
The mathematician tut-tutted both of them for their pathetic
naïveté and said "Consider a spherical train ..."
 
K

Keith Thompson

WARNING: This article contains no topical material.

Eric Sosman said:
Joe said:
Richard Heathfield wrote: [...]
1 is prime, 3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is
prime...
No. A prime is evenly divisible only by itself or unity (1). 2 is
prime. 9 isn't prime. Did I miss something?

Mathematician: A prime is a positive integer with
two and only two distinct exact divisors, namely, itself
and unity.

Engineer: 3 is prime, 5 is prime, 7 is prime, 9 is an
observational error, 11 is prime, ...

Programmer: 3 is prime, 5 is prime, 7 is prime, 9 is
prime, 11 is prime, ...

(Oh, sorry: I forgot the part about "A mathematician,
an engineer, and a programmer walk into a bar ...")

<WAY_WAY_OT>

Here's the version I'm familiar with (potentially offensive to
engineers):

A mathematician, a physicist, an engineer, and a computer scientist
are asked to test the hypothesis that all odd numbers greater than 1
are prime.

Mathematician:
3 is prime, 5 is prime, 7 is prime, 9 is composite.
The hypothesis is false.

Physicist:
3 is prime, 5 is prime, 7 is prime, 9 is composite, 11 is prime,
13 is prime, ...
The hypothesis is true within the bounds of experimental error.

Engineer:
3 is prime, 5 is prime, 7 is prime, 9 is prime, 11 is prime, 13 is
prime, 15 is prime, ...

Computer scientist:
3 is prime, 5 is prime, 5 is prime, 5 is prime, 5 is prime, 5 is
prime, 5 is prime, ...

(The computer scientist part is my own original contribution to the
joke.)

</WAY_WAY_OT>
 
R

Richard Harter

Mathematician: A prime is a positive integer with
two and only two distinct exact divisors, namely, itself
and unity.

Engineer: 3 is prime, 5 is prime, 7 is prime, 9 is an
observational error, 11 is prime, ...
(should be a Physicist instead of an Engineer)
Programmer: 3 is prime, 5 is prime, 7 is prime, 9 is
prime, 11 is prime, ...
This is the Engineer; Here's the programmer -
Programmer: 3 is prime, 5 is prime, 5 is prime, 5 is prime, 5 is
prime, ...
 
L

Logan Shaw

Ben said:
Please name a processor on which "mod" and "and" have the same
cost.

Having read much of this discussion, it would appear that I made
a bit of a leap in assuming the two were comparable in speed.

I guess this was all based on what I know about pipelined processors,
which is that many integer operations go through the pipeline without
stalling (otherwise the pipeline isn't that useful). Apparently it
wasn't valid to go from "many" to "most, including division".

Perhaps if we had been comparing left-shifts with multiplies, the
kind of leap I made would've been more reasonable, or at least closer
to reality. But I guess "and" and "mod" are more different than
left-shift and multiply are...

- Logan
 
W

websnarf

Keith said:
Mathematician:
3 is prime, 5 is prime, 7 is prime, 9 is composite.
The hypothesis is false.

My recollection of the Mathematicians take:
3 is prime, 5 is prime, 7 is prime, the rest follow by induction.
 
R

Richard Tobin

Keith Thompson said:
Computer scientist:
3 is prime, 5 is prime, 5 is prime, 5 is prime, 5 is prime, 5 is
prime, 5 is prime, ...

Typical programmer:

3 is prime, 5 is prime, 6.9999999998 is prime, ...

-- Richard
 
K

Keith Thompson

Typical programmer:

3 is prime, 5 is prime, 6.9999999998 is prime, ...

Student with slide rule:

"Two times three is five point nine nine nine ... aw heck, call it six."

(A very *old* joke.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,680
Members
48,796
Latest member
Greg L.

Latest Threads

Top