Expert Help needed with hash_multimap

J

JN

Hello. Sorry about the length of this post. I am trying to implement a hash
table for my chess program, and decided to try using the VC++ . NET
(Dinkumware) version of hash_map.
The role of the hashtable is to store HashEntry objects, which contain
information about chess positions.
A BitBoard is just an __int64, and this is used to determine the hash key.
I have not shown the definition of class ChessPosition, but all you need to
know is that it contains a number of BitBoards as members. The Member 'A' is
used to create the key.

All of this seems to work quite well (and fast!), but I have a few
questions:

1. I am still confused about the correct way to use hash_compare. The
documentation suggests that you either make your own, or derive your own
class from hash_compare and selectively override functions as required (eg
the hash function). I have gone for the first option, and called it
MyHashCompare . Can you please tell me if I have done anything wrong ?

2. As I insert items, the hash_multimap just gets bigger and BIGGER, and
BIGGER !! (it is eating about 3 megs/second on my system). Is there any way
to put a limit on the size of the hash_multimap, say limiting it 256 MB ? If
not, is it more appropriate to simply use a container with a fixed size. (A
static size would be totally acceptable)

3. When retrieving HashEntry's, another part of my program (the search()
function) expects to get a pointer to a HashEntry, which is why I have
really ugly code like:
pH=&((*it).second);
Is there a better way of interfacing with the rest of the program, and if
so, what function prototype would you recommend for LookupHash() ?


// hash.h:

#ifndef _HASH_H
#define _HASH_H 1

#include <hash_map>
#include "movegen.h"

using namespace std;

size_t Hash(BitBoard B);

enum HashEntryFlags
{
hashfVALUNKNOWN,
hashfEXACT,
hashfALPHA,
hashfBETA
};

class HashEntry
{
public:
HashEntry()
{
Flags=hashfVALUNKNOWN;
}
ChessPosition P;
int Depth;
HashEntryFlags Flags;
int Value;
Move bestMove;
};

class MyHashCompare
{
public:
enum
{
// parameters for hash table
bucket_size = 4, // 0 < bucket_size
min_buckets = 8 // min_buckets = 2 ^^ N, 0 < N
};

size_t operator()(const BitBoard& A) const
{

// hash function. Thanks to Thomas Wang for this
// (http://www.concentric.net/~Ttwang/tech/inthash.htm)

BitBoard B(A);
B += ~(B << 32);
B ^= (B >> 22);
B += ~(B << 13);
B ^= (B >> 8);
B += (B << 3);
B ^= (B >> 15);
B += ~(B << 27);
B ^= (B >> 31);
return static_cast<size_t>(B);
}

bool operator()(const BitBoard A, const BitBoard B) const
{
// Used for determining order when keys are the same
return (A<B);
}
};

typedef hash_multimap<BitBoard,HashEntry,MyHashCompare> HashTable;

bool InsertHash(HashTable& rT, const HashEntry& rH);
const HashEntry* LookupHash(const ChessPosition& rP, const HashTable& rT);

#endif


// hash.cpp:

#include "hash.h"

bool InsertHash(HashTable& rT, const HashEntry& rH)
{
rT.insert(HashTable::value_type((rH.P.A) /* BitBoard 'A' of Position used
as key */,rH));
return true;
}

const HashEntry* LookupHash(const ChessPosition& rP, const HashTable& rT)
{

// Define a pair of iterators, and get an equal range
pair<HashTable::const_iterator,HashTable::const_iterator> p=
rT.equal_range(rP.A);

// Return first entry with a matching Position
const HashEntry* pH=NULL;
for(HashTable::const_iterator it = p.first; it != p.second; ++it)
{

if( ((*it).second.P) == rP )
{
pH=&((*it).second);
break;
}
}
return pH;
}
 
B

Buster

JN said:
3. When retrieving HashEntry's, another part of my program (the search()
function) expects to get a pointer to a HashEntry, which is why I have
really ugly code like:
pH=&((*it).second);

It's not that ugly. If you translate it to "ph = & it->second", it's
crystal clear. Your LookupHash function provides a simple syntax for a
special case, to supplement the existing more general syntax. It's
appropriate for the little translation from iterator to pointer to be
part of this function.
Is there a better way of interfacing with the rest of the program, and if
so, what function prototype would you recommend for LookupHash() ?

It strikes me that what you have is equivalent to using a
hash_[multi]set of HashEntry objects, with hash function

class HashEntryCompare
{
public:
// ...
size_t operator () (const HashEntry & entry) const
{ return compare_bitboards (entry.P.A); }

size_t operator () (const HashEntry & a, const HashEntry & b) const
{ return compare_bitboards (a.P.A, b.P.A); }

private:
// ...
MyHashCompare compare_bitboards;
};

(I've tried to imitate your syntax, because I haven't got the same
hash_map implementation as you, or its documentation.)

If that's the case your code would be simpler with sets. You might need
fewer syntax-simplifying functions.

Regards,
Buster.
 
J

John Harrison

2. As I insert items, the hash_multimap just gets bigger and BIGGER, and
BIGGER !! (it is eating about 3 megs/second on my system). Is there any way
to put a limit on the size of the hash_multimap, say limiting it 256 MB ? If
not, is it more appropriate to simply use a container with a fixed size. (A
static size would be totally acceptable)

I believe that real chess programs handle this by selectively discarding
infrequently accessed parts of the hash table.

Since efficiency is obviously going to be a concern, I think you are
inevitably going to have to write your own custom data structure sooner or
later. Hash tables are actually pretty easy to program, suggest you get a
book on data structures.

john
 
P

P.J. Plauger

Hello. Sorry about the length of this post. I am trying to implement a hash
table for my chess program, and decided to try using the VC++ . NET
(Dinkumware) version of hash_map.
The role of the hashtable is to store HashEntry objects, which contain
information about chess positions.
A BitBoard is just an __int64, and this is used to determine the hash key.
I have not shown the definition of class ChessPosition, but all you need to
know is that it contains a number of BitBoards as members. The Member 'A' is
used to create the key.

All of this seems to work quite well (and fast!), but I have a few
questions:

1. I am still confused about the correct way to use hash_compare. The
documentation suggests that you either make your own, or derive your own
class from hash_compare and selectively override functions as required (eg
the hash function). I have gone for the first option, and called it
MyHashCompare . Can you please tell me if I have done anything wrong ?

Your compare function is:
return (A<B);

which is quite likely a strict weak ordering, as required. Looks fine
to me.
2. As I insert items, the hash_multimap just gets bigger and BIGGER, and
BIGGER !! (it is eating about 3 megs/second on my system). Is there any way
to put a limit on the size of the hash_multimap, say limiting it 256 MB ? If
not, is it more appropriate to simply use a container with a fixed size. (A
static size would be totally acceptable)

You can slow its growth by making bucket_size larger. Right now you
have the default:
bucket_size = 4, // 0 < bucket_size

But note that you'll trade off a smaller hash table against a
slightly slower lookup and insert.
3. When retrieving HashEntry's, another part of my program (the search()
function) expects to get a pointer to a HashEntry, which is why I have
really ugly code like:
pH=&((*it).second);
Is there a better way of interfacing with the rest of the program, and if
so, what function prototype would you recommend for LookupHash() ?

You can also write it as:

pH = &it->second;

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
J

JN

Thanks John,

I certainly have access to many examples of HashTable code, but I wanted to
try an STL-style implentation. However, I seem to already have wasted far
too much time going down this path, so I think I will revert back to a
traditional (C-Style) data structure. btw, when you referred to "real chess
programs", I'm sure you meant it in the nicest possible way :) ..

Regards,
JN
 
J

JN

Thanks Buster.

This is good.

Cheers,

Buster said:
JN said:
3. When retrieving HashEntry's, another part of my program (the search()
function) expects to get a pointer to a HashEntry, which is why I have
really ugly code like:
pH=&((*it).second);

It's not that ugly. If you translate it to "ph = & it->second", it's
crystal clear. Your LookupHash function provides a simple syntax for a
special case, to supplement the existing more general syntax. It's
appropriate for the little translation from iterator to pointer to be
part of this function.
Is there a better way of interfacing with the rest of the program, and if
so, what function prototype would you recommend for LookupHash() ?

It strikes me that what you have is equivalent to using a
hash_[multi]set of HashEntry objects, with hash function

class HashEntryCompare
{
public:
// ...
size_t operator () (const HashEntry & entry) const
{ return compare_bitboards (entry.P.A); }

size_t operator () (const HashEntry & a, const HashEntry & b) const
{ return compare_bitboards (a.P.A, b.P.A); }

private:
// ...
MyHashCompare compare_bitboards;
};

(I've tried to imitate your syntax, because I haven't got the same
hash_map implementation as you, or its documentation.)

If that's the case your code would be simpler with sets. You might need
fewer syntax-simplifying functions.

Regards,
Buster.
 
J

John Harrison

JN said:
Thanks John,

I certainly have access to many examples of HashTable code, but I wanted to
try an STL-style implentation. However, I seem to already have wasted far
too much time going down this path, so I think I will revert back to a
traditional (C-Style) data structure. btw, when you referred to "real chess
programs", I'm sure you meant it in the nicest possible way :) ..

Yes of course. Maybe I was think of my own attempts to write a chess
program. It ran on a batch system (i.e. no user interaction allowed) where I
was limited to 6 submissions a day. Therefore I could only play six moves a
day. It only ever played one game, which was unfinished due to the program
crashing when pawn promotion was a possibility.

I never wrote another chess playing program.

A serious point, I sometimes doubt the benefits of OO for algorithmically
complex situations like a chess playing engine. Because the things that OO
is good at like separating interface and implementation and assigning
responsibilities to well defined sections of code don't really address the
important issues in getting complex algorithm right. The user interface of
your chess program would be a completely different matter of course.

With the STL hash table you'll undoubtedly get a good implementation of a
general purpose hash table, but that doesn't mean that you can't do better
with your application specific knowledge.

John
 
J

JN

Yes of course. Maybe I was think of my own attempts to write a chess
program. It ran on a batch system (i.e. no user interaction allowed) where I
was limited to 6 submissions a day. Therefore I could only play six moves a
day. It only ever played one game, which was unfinished due to the program
crashing when pawn promotion was a possibility.

LOL, you must have been a nervous wreck. The anticipation must have been
unbearable.
A serious point, I sometimes doubt the benefits of OO for algorithmically
complex situations like a chess playing engine. Because the things that OO
is good at like separating interface and implementation and assigning
responsibilities to well defined sections of code don't really address the
important issues in getting complex algorithm right. The user interface of
your chess program would be a completely different matter of course.

I agree with you. I was one of those people who programmed in C before
learning C++, but I am trying to be a good C++ programmer.
I am trying to avoid using all the things that Marshall Cline says are
"Evil" eg arrays, macros, pointers etc, and adopt a modern "standard C++"
style.
However, every time I have tried to rewrite my code to use STL and/or OO, I
have been suffering HUGE performance hits (like 1/10 of the original speed).

As an example, My move generator function populates a fixed-length array of
class Move. I thought it would be a nice idea to use a deque<Move> instead.
That way, I could achieve some nice pre-ordering of moves by pushing
"interesting" moves (eg checks,captures etc) to the front, and "ordinary"
moves to the back. (The move-ordering is fairly important, as it improves
the efficiency of the minimax search algorithm.)
After spending way too much time rewriting my GenerateMoves() function to
use a deque instead of an array, I was horribly dissappointed to find that
my move-generation performance benchmark (which calls GenerateMoves() 1
million times) changed from 5 seconds to over 60 seconds !! Admittedly, the
deque is using dynamic allocation/deallocation, which is probably where most
of my CPU cycles were going.

Needless to say, I reverted back to the plain old array again. It ended up
being much cheaper to simply run a sort routine on the array at the end of
the function to achieve the move-ordering. I know that in theory, STL
containers should be very fast, but in my experience, I have always been
underwhelmed by the speed. Maybe it is the type of apps I write, or maybe I
just "don't get" modern OO paradigms. I don't know. I want to be a believer,
but I am struggling to keep the faith ...

I think you are right about using OO for algorithms. There is no question
that OO is ideal for designing interfaces, and I certainly don't have a
problem with that. If there are any STL gurus out there that want to comment
on this, then I'd love to hear from you ...

-Judd
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top