D.E. Knuth's strlen

jacob navia · Oct 18, 2008

Norbert said:
Best I know, Alan Mycroft first published this strlen algoritm
in about 1988. Many years ago, I used it for Solaris's strlen:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/sparcv9/gen/strlen.s

-- Norbert

If you read Knuth's paper, you will find that he says (page 20):
< quote >
Several fairly good solutions to this problem were found by Lamport and
others; but Alan Mycroft dsicovered in 1987 that three instructions
actually suffice
<end quote>

CBFalconer · Oct 18, 2008

James said:
.... snip ...

Since I've pointed out bugs in his hashlib before, I
hope it will cheer Chuck up to learn that FSF hsearch()
*still* has a design defect so laughable as to make Chuck's
seem almost well-written by comparison! The first probe
after a collision in their fancy "based-on-Knuth" reprobing
is *always* to the last table entry. (Uh, the whole idea
of fancy reprobing is to avoid secondary collisions.)

I cannot recall ever receiving a bug report from you on hashlib.

Ben Bacarisse · Oct 18, 2008

James Dow Allen said:
(BTW, The only reason I had glibc source on my laptop
is that I wanted to inspect their hsearch() and see
if they ever fixed its *very* egregious performance
bug.

[off-topic] I could not find your bug report, so I filed one.

The only file I really wanted was misc/hsearch_r.c,
but this is a *huge* file, 6410 bytes. Fortunately,
FSF is smart enough to use compression to reduce
download effort and doesn't allow access to this huge
uncompressed file. All I needed to download was
glibc-2.7.tar.gz, which is a mere 21,241,912 bytes.)

[off-topic] You can get just one file from the CVS browser. E.g.:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/misc/?cvsroot=glibc

Since I've pointed out bugs in his hashlib before, I
hope it will cheer Chuck up to learn that FSF hsearch()
*still* has a design defect so laughable as to make Chuck's
seem almost well-written by comparison! The first probe
after a collision in their fancy "based-on-Knuth" reprobing
is *always* to the last table entry. (Uh, the whole idea
of fancy reprobing is to avoid secondary collisions.)

I don't think it is a design defect. It looks to me to be a
common-all-garden bug. Someone thought they could reduce the hash mod
the table size so as to detect the wrap-around without another
variable, but that meant the secondary hash was not as per Knuth.

BTW, the probe is not always to the last entry, but is in all bar two
cases.

[This whole post is nearly off-topic but learners might like to look at
hsearch_r.c at the link above to see how easily one can make an error
with a hash table implementation.]

Antoninus Twink · Oct 19, 2008

I cannot recall ever receiving a bug report from you on hashlib.

When a house is in such a bad state that it's falling down about you,
some people probably think it's not worthwhile asking someone to paper
over the cracks.

(Ha, not a bad example of its genre, IMO - the strained clc metaphor
that doesn't really work...)

Mark L Pappin · Oct 19, 2008

Ben Bacarisse said:
common-all-garden bug.

ITYM "-or-". HTH. HAD.

mlp

Bartc · Oct 19, 2008

Mark L Pappin said:
ITYM "-or-". HTH. HAD.

ITYM HAND?

James Dow Allen · Oct 20, 2008

James Dow Allen said:
James Dow Allen said:

(BTW, The only reason I had glibc source on my laptop
is that I wanted to inspect their hsearch() and see
if they ever fixed its *very* egregious performance
bug.

Click to expand...

[off-topic] I could not find your bug report, so I filed one.

When I mentioned the bug some years ago, someone
directed me to a Usenet group for reporting gnu
bugs, but that group seemed to be defunct!
I saw no reason to pursue it; no one has come forward
who admits to actually *using* hsearch().

... learners might like to look at
hsearch_r.c at the link above to see how easily one
can make an error ...

I noticed its buggy reprobe behavior doing a "black
box" test some years ago, and didn't more than glance
at the source code until you posted. Now I see things
are much worse than I realized. Indeed, as you suggest,
hsearch()'s only utility may be as a how-not-to example.

The hsearch code includes a "frilly" feature (perhaps
added hurriedly to an already-working routine?) with
at least five flaws. The feature expanded a table
entry from (normally) 8 bytes to 12 bytes with the
extra field ("used") used (redundantly) in tests for
empty slots and key equality. (I say "redundant", because
if (hp->key == 0)
was already available to test for empty slots, and
if (!strcmp(hp->key, item.key))
is a necessary and sufficient test for key equality.)

(1) If one chooses to spend 4 (or sizeof(unsigned int))
bytes in this fashion, one should "get one's money
worth!" The field will be compared with "hval" (or
"ohval"!) which is derived via:
unsigned int hval;
hval = ... /* hashfunction(item.key) */ ;
/* ohval = hval + 1 */
hval %= htab->size;
if (hval == 0) ++hval; /* get rid of this */
...
if (...->used != hval) /* bypass strcmp() */

The value of hval *before* the "%= htab->size" should
have been preserved for the short-circuiting key equality
test. I've shown a way to do this with the commented-out
ohval assignment. (Note that as is, false equalities
(where the strcmp() ends up not being bypassed) will be
quite common.

(2) It's not very wrong to pack an empty-slot indicator
into the extra field (though one wonders why they bothered),
but they found a bad solution to an easy problem. Note,
for example, the "+ 1" in my "ohval = " fix. That's it!
You're done! This allows the inequality short-circuit
*and* the empty-slot test, with *no* further worry
about any issues of the "+ 1".

Instead, the coder chose to inject the "+ 1" into
code more intimately associated with the table indexing,
and this leads to the next two bugs. (To avoid a zero
index, he also uses a "+ 1" in the calloc() to allocate
the hash table, but let's not quibble over 12 wasted
bytes.)

(3) As implied in the above code and discussion, a
table of size p will have entries (1,2,3,...,p);
however no hash output maps to (p), while *two* hash
outputs map to 1! (see "if (hval == 0) ++hval;" above).

A nit, perhaps, but why deliberately degrade one's
hash function? The "if (hval == 0)" can be simply
snipped, I think, with no ill effect, but as
explained in (1) and (2), the whole concept is hardly
worth salvage.

(4) Also building on flaw (2), coder managed
to distort the "based-on-Knuth reprobing" and wastes
almost every initial reprobe. Had the coder bothered
to print *any* statistic about reprobes during his
testing, this bug would have become blindingly obvious,

(5) Finally, the whole "used" concept is unnecessary
and may hurt rather than help performance. It increases
the table size by 50% and *increases* instructions
executed (at least if strcmp is wholly or partly in-lined).
It *does* avoid cache-misses (on unequal
key strings) during reprobing, but if this were a goal,
why didn't they switch to quadratic reprobing, whose
virtues include good cache utilization?

I think this hsearch() teaches three useful lessons,
in order of increasing importance:

(1) Be careful when "squeezing" in special-values.

hsearch() bugs #2 and #3 stem from the
++hval; /* make room for special indicator */
Try to avoid such things, or find a way to make them
transparent to most of the code.

(2) Exercise special caution when implementing
tricky or poorly-understood code.

(3) Test your code.

When the only purpose of a method is improved performance,
test the performance! (A "hash table" implemented as
sequential search will pass functional tests.)

printf() is your debugging friend! (We've all needed to
set breakpoints or pore over coredumps back in the day,
but judicious use of "if (TEST) printf()" is the
straightforward way to address most testing and debugging.)
Developing and printing *any* statistic about reprobe
counts during hsearch() evaluation would have made its bug
blindingly obvious. That this code was placed in glibc
without any such test is a black mark indeed on someone.

Don't be afraid to do simple tests. Perhaps the
programmer didn't bother to print probing statistics
because he didn't know how to interpret them, but if he
*had* printed them *anyway*, very little expertise would
have been needed to see that something was wrong.

A few months ago, in a thread about gets(), I mentioned
the original Hubbell Telescope flaw as an example
where a cheap simple test was overlooked amid a debate
about whether to run an expensive thorough test.
In that thread, the self-appointed Deprecator-in-Chief
utterly failed to grasp this simple point, suggesting that
this blind spot affects even average-plus programmers.

James Dow Allen

James Dow Allen · Oct 20, 2008

I cannot recall ever receiving a bug report from you on hashlib.

I do apologize if I've kept hashlib's performance bug
hidden from the newsgroup; that certainly wasn't my intent!

I don't see a smiley-face on your message. If you *are*
sincere, send me e-mail with a valid "From" address
and I will again report the bug and suggested fix.

James Dow Allen

Joachim Schmitz · Oct 20, 2008

James said:
I do apologize if I've kept hashlib's performance bug
hidden from the newsgroup; that certainly wasn't my intent!

I don't see a smiley-face on your message. If you *are*
sincere, send me e-mail with a valid "From" address
and I will again report the bug and suggested fix.

What make you think that his signature is wrong? It claims his email to be
"cbfalconer at maineline dot net". Apparently that is also in the Reply-To:
of his articles.

Bye, Jojo

Nick Keighley · Oct 20, 2008

printf() is your debugging friend! (We've all needed to
set breakpoints or pore over coredumps back in the day,
but judicious use of "if (TEST) printf()" is the
straightforward way to address most testing and debugging.)

..sig material!

CBFalconer · Oct 20, 2008

James said:
I do apologize if I've kept hashlib's performance bug
hidden from the newsgroup; that certainly wasn't my intent!

I don't see a smiley-face on your message. If you *are* sincere,
send me e-mail with a valid "From" address and I will again
report the bug and suggested fix.

All my Usenet addresses are valid, although the From: reaches a
spam trap. My main address is in the reply-to and in my sig,
below. I have no objection to your reporting in the group - the
package is written in standard C, and thus is on-topic.

stdazi · Oct 21, 2008

[email protected] said:
[email protected] said:

[... seventy-eight quoted lines, including a signature, all in
support of one line of content ...]
I'd like to see some benchmarks!

Click to expand...

Then run some yourself. Are you helpless?

No, I'm just too lazy to try it out but still interested in the
results.

CBFalconer · Oct 21, 2008

Eric said:
CBFalconer wrote:

[...]

the package is written in standard C, and thus is on-topic.

Click to expand...

No. If I write a bread-machine controller in standard C,
questions about the proper temperature for baking whole-wheat
gingerbread are not topical here. The C is topical; the bugs
we all write in it are not. (Except insofar as they impinge
on C itself, rather than on an application of C.)

Disagree. Discussions would be about the alleged bugs, and since
written in standard C any such questions are topical.

CBFalconer · Oct 21, 2008

[email protected] said:
Eric Sosman said:

[email protected] said:

[... seventy-eight quoted lines, including a signature, all in
support of one line of content ...]
I'd like to see some benchmarks!

Click to expand...

Then run some yourself. Are you helpless?

Click to expand...

No, I'm just too lazy to try it out but still interested in the
results.

You are also too lazy to properly snip your quotes. I suspect you
will get very little help.

James Dow Allen · Oct 23, 2008

I have no objection to your reporting in the group - the
package is written in standard C, and thus is on-topic.

I've mentioned the performance bug at least twice
in this ng, most recently in
http://groups.google.com/group/comp.lang.c/msg/fe91fa28f0d34c89
.... that was why I assumed you wanted a personal e-mail message.

My first message, constructed after I first downloaded your code,
was:

I did download and inspect the libraries of Paul Hsieh
and Chuck Falconer. Both are excellent efforts; any critical
comments that might follow are not intended to disparage --
especially since you'd have plenty of cause to deprecate
my code if you download it!
... I was startled by the failure to make certain obvious
optimizations. (The bugaboo "premature optimization"
scarcely applies to Chuck's and Paul's modules, which are
stable and respected components.)

Chuck, why not replace
h = (h + h2) % master->currentsz;
with
h += h2;
if (h >= master->cursiz)
h -= master->cursiz;
This is allowed by the constraints you're already imposing
on h and h2. I think there are still machines out there with
slow division.

Some may object to the term "bug" in describing what is
only a performance flaw. But, as I mentioned elsewhere
in this thread, linear search would then qualify as an
unbuggy "hash table"!

James

Ike Naar · Oct 23, 2008

Wouldn't that be "buggy" on a machine with fast division but slow branching?

Phil Carmody · Oct 23, 2008

Wouldn't that be "buggy" on a machine with fast division but slow branching?

On such an architecture, I can imagine the compiler writers
simply replacing the conditional branch/subtract with an
always-performed subtraction of a value that's either 0 or
master->cursiz achieved by some bit twiddling (usually a
subtraction, arithmetic shift, and mask).

Phil

CBFalconer · Oct 23, 2008

James said:
.... snip ...

Some may object to the term "bug" in describing what is only a
performance flaw. But, as I mentioned elsewhere in this thread,
linear search would then qualify as an unbuggy "hash table"!

Note that that appears in a loop. The value of h and h2 may be
large, while the modulus may be using a small number. Thus the
subtraction would require a further loop, and possibly take
longer. In addition, in case I have missed something, I consider
the % operation clearer to the reader. In addition, these days,
division and modulus can be very fast, especially for integers.

It may be worthwhile if implemented on slow hardware.

Yes, I don't consider it a bug.

BTW, unacknowledged bug reports in the newsgroup may well have
never been read.

Lew Pitcher · Oct 27, 2008

<quote>
... one of the most common programming tasks is to search
through a long string of characters in order to find a
particular byte value. For example strings are often
represented as a sequence of nonzero bytes terminated
by 0. In order to locate the end of a string quickly,
we need a fast way to determine whether all eight bytes of a
given word x are nonzero (because theu usually are).
<end quote>

Hmmmmm..... I had to check this quote, because it doesn't entirely
ring "true". After much research (reading both Vol 4 Fascile 1A, and Vol 1,
Fascile 1) I've come to the conclusion that the quote is correct, but
you've taken it out of context.

I discovered that quote above in the fascicle 1a in

"Bitwise Tricks and Techniques"
http://www-cs-faculty.stanford.edu/~knuth/fasc1a.ps.gz

In that document, Knuth explains many boolean tricks, giving
the mathematical background for them too, what many other
books fail to do.

I adapted its algorithm to C. It took me a while because of a
stupid problem, but now it seems to work OK.

The idea is to read 8 bytes at a time and use some
boolean operations that can be done very fast in assembly
to skip over most of the chain, stopping only when a zero
byte appears in one of the eight bytes read.

[snip]

The quote from Vol 4, Fascile 1a, does not refer to real-world bytes and
words, and the count of 8 bytes to a word only represents the measurements
for Knuth's MMIX virtual machine. For a MMIX machine, where each memory
element consists of 64 bits (a "word"), it makes sense to parse a string 8
(8-bit) bytes at a time.

I haven't continued my reading in the fascile past the quote you found, but
I presume that Knuth will tie this comment back to the real world in some
way, either by pointing out the specific differences between MMIX and real
world systems (and thus the differences between the statement and the
real-world implementation), or by formalizing his statement into his usual
platform-agnostic language, that makes no assumptions about wordsize and
bytesize.

FWIW, translated into C, you get a 1:1 ratio of characters to "words".
Translated into 80x86 assembly, you probably would coax that ratio up to
4:1, and thus would code the assembly language to grab 32 bits (4 bytes) at
a time. On an IBM 360-style system, you could use either the 1:1 ratio, or
a 2:1 ratio, or a 4:1 ratio, or even an 8:1 ratio (360-style mainframes are
character addressable, but also offer "halfword" (16-bit), "word" (32-bit),
and "doubleword" (64-bit) data modes).

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

jacob navia · Oct 27, 2008

Lew said:
Hmmmmm..... I had to check this quote, because it doesn't entirely
ring "true". After much research (reading both Vol 4 Fascile 1A, and Vol 1,
Fascile 1) I've come to the conclusion that the quote is correct, but
you've taken it out of context.

No, not at all.

I discovered that quote above in the fascicle 1a in

"Bitwise Tricks and Techniques"
http://www-cs-faculty.stanford.edu/~knuth/fasc1a.ps.gz

In that document, Knuth explains many boolean tricks, giving
the mathematical background for them too, what many other
books fail to do.

I adapted its algorithm to C. It took me a while because of a
stupid problem, but now it seems to work OK.

The idea is to read 8 bytes at a time and use some
boolean operations that can be done very fast in assembly
to skip over most of the chain, stopping only when a zero
byte appears in one of the eight bytes read.

Click to expand...

[snip]

The quote from Vol 4, Fascile 1a, does not refer to real-world bytes and
words, and the count of 8 bytes to a word only represents the measurements
for Knuth's MMIX virtual machine. For a MMIX machine, where each memory
element consists of 64 bits (a "word"), it makes sense to parse a string 8
(8-bit) bytes at a time.

You are wrong. In the preceding page, Knuth makes it clear he is
speaking about any kind of 64 bit machines, and not only MMIX.

I haven't continued my reading in the fascile past the quote you found, but
I presume that Knuth will tie this comment back to the real world in some
way, either by pointing out the specific differences between MMIX and real
world systems (and thus the differences between the statement and the
real-world implementation), or by formalizing his statement into his usual
platform-agnostic language, that makes no assumptions about wordsize and
bytesize.

No. He doesn't, and the adapting of the algorithm was done by me.

FWIW, translated into C, you get a 1:1 ratio of characters to "words".
Translated into 80x86 assembly, you probably would coax that ratio up to
4:1, and thus would code the assembly language to grab 32 bits (4 bytes) at
a time.

As you (may) know, the 64 bit AMD x64 architecture is now several years
old... It seems you are not up to date what those processors are
concerned.

C pipe	1	Dec 9, 2021
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 18, 2023
'strlen' : Why valgrind reports invalid read ?	12	Jan 18, 2013
array-size/malloc limit and strlen() failure	26	Apr 2, 2014
Command Line Arguments	0	Mar 7, 2023
C language. work with text	3	Dec 9, 2021
strlen	5	Apr 24, 2006
For loop and strlen	8	Sep 20, 2009

D.E. Knuth's strlen

jacob navia

CBFalconer

Ben Bacarisse

Antoninus Twink

Mark L Pappin

Bartc

James Dow Allen

James Dow Allen

Joachim Schmitz

Nick Keighley

CBFalconer

stdazi

CBFalconer

CBFalconer

James Dow Allen

Ike Naar

Phil Carmody

CBFalconer

Lew Pitcher

jacob navia

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads