hash_map

A

aaragon

Hello everyone,

I have a VERY BIG set of double values that I want to map to intervals
so I thought a clever way to do this was using a hash table. Let's say
that I want to map all double values in the range 0-0.5 to a single
std::pair<double,double>.

This is what I've done so far:

#include <iostream>
#include <ext/hash_map>
#include <boost/functional/hash.hpp>

using namespace __gnu_cxx;
using namespace std;

struct eqstr
{
bool operator()(const double& o, const double& p) const
{
return (o == p);
}
};

namespace __gnu_cxx{

template<>
struct hash<double>
{
size_t operator()(double __x) const
{
boost::hash<double> double_hash;
return double_hash(__x);
}
};

}

void lookup(const hash_map<double, pair<double,double> , hash<double>,
eqstr>& Map,
const double number)
{
hash_map<double, pair<double,double> , hash<double>,
eqstr>::const_iterator it
= Map.find(number);
cout << number << ": "
<< (it != Map.end() ? "present" : "not present")
<< endl;
}

int main()
{
hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
lookup(HashMap,0.1);
lookup(HashMap,0.05);
}

aaragon@aaragon-laptop:~/Desktop$ ./a.out
0.1: present
0.05: not present

Now, the thing is that I can't map the value of 0.05 to the same pair
because my hashing function doesn't to this. Any ideas???

Thank you,

a^2
 
B

BobR

aaragon said:
Hello everyone,

I have a VERY BIG set of double values that I want to map to intervals
so I thought a clever way to do this was using a hash table. Let's say
that I want to map all double values in the range 0-0.5 to a single
std::pair<double,double>.

This is what I've done so far:

#include <iostream>
#include <ext/hash_map>

non-standard header. (AFAIK)
#include <boost/functional/hash.hpp>

non-standard header.
using namespace __gnu_cxx;
OUCH!!

using namespace std;
ouch!!


struct eqstr
{
bool operator()(const double& o, const double& p) const
{
return (o == p);

Comparing doubles for equality is most often bad [1]. Compare to a range.

if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
return false;

Output the numbers using a stream with 'fixed' and a big 'precision', and
see if they are (really/almost) equal (in the limited output (rounding
errors in stream output)).

Not sure this is your problem, but, it (==) should be fixed.
}
};

namespace __gnu_cxx{

Are you a GNU systems/compiler developer?
template<> struct hash<double> {
size_t operator()(double __x) const {
boost::hash<double> double_hash;
return double_hash(__x);
}
};

}

void lookup(const hash_map<double, pair<double,double> , hash<double>,
eqstr>& Map, const double number){
hash_map<double, pair<double,double> , hash<double>,
eqstr>::const_iterator it = Map.find(number);
cout << number << ": "
<< (it != Map.end() ? "present" : "not present")
<< endl;
}

int main(){
hash_map<double, pair<double,double> , hash<double>, eqstr> HashMap;
HashMap.insert(make_pair(0.1,make_pair(0.,0.5)));
lookup(HashMap,0.1);
lookup(HashMap,0.05);
}

aaragon@aaragon-laptop:~/Desktop$ ./a.out
0.1: present
0.05: not present

Now, the thing is that I can't map the value of 0.05 to the same pair
because my hashing function doesn't to this. Any ideas???
Thank you,
a^2

[1]
{
std::eek:stringstream sos;
sos.setf( std::ios_base::fixed, std::ios::floatfield );
sos.precision( 40 );

double num( 0.1 );

sos<<"double num( 0.1 )="<<num<<std::endl;
std::cout<<sos.str()<<std::endl;
}
// out: double num( 0.1 )=0.10000000000000001
 
?

=?iso-8859-1?q?Erik_Wikstr=F6m?=

Hello everyone,

I have a VERY BIG set of double values that I want to map to intervals
so I thought a clever way to do this was using a hash table. Let's say
that I want to map all double values in the range 0-0.5 to a single
std::pair<double,double>.
[snip]

Now, the thing is that I can't map the value of 0.05 to the same pair
because my hashing function doesn't to this. Any ideas???

I don't know how big your VERY BIG set of doubles is, but unless you
have performed some profiling using "standard solutions" I'd suggest
you do that before trying to optimize. In this case that means to use
std::map. Sure a good hash table is O(1), but if you don't have a good
hash-function you might end up with O(n) instead, a map is guaranteed
to be logarithmic, always.

There's of course the problem with comparing doubles like Bob R
pointed out, but a custom comparison functor would take care of that.
 
J

James Kanze

I have a VERY BIG set of double values that I want to map to intervals
so I thought a clever way to do this was using a hash table. Let's say
that I want to map all double values in the range 0-0.5 to a single
std::pair<double,double>.
Now, the thing is that I can't map the value of 0.05 to the same pair
because my hashing function doesn't to this. Any ideas???
I don't know how big your VERY BIG set of doubles is, but unless you
have performed some profiling using "standard solutions" I'd suggest
you do that before trying to optimize. In this case that means to use
std::map. Sure a good hash table is O(1), but if you don't have a good
hash-function you might end up with O(n) instead, a map is guaranteed
to be logarithmic, always.
There's of course the problem with comparing doubles like Bob R
pointed out, but a custom comparison functor would take care of that.

There are two problems: finding a hash function, and comparing.
For the first, I've yet to find a good solution to generate a
hash code from a double. It's far from trivial. For the
second... there's no guarantee that std::map won't have it as
well. There are two possible problems:

-- He has two double values which really aren't equal. In such
a case, std::map will not find the first using the second as
a key. If they're not equal, they're not equal.

-- He is calculating two values dynamically which should be
equal; the comparison function does something like:

bool operator()( Obj const& lhs, Obj const& rhs ) const
{
return lhs.f() < rhs.f() ;
}

Under certain conditions, with certain compilers, on certain
systems, this results in an unstable comparison functions,
because the intermediate results may be in extended
precision, and because the compiler spills one to memory
(thus reducing it to double precision), but keeps the other
in registers (in extended precision).

I'm very sceptical of the possibilities of using double for an
index, unless the doubles have a known source which ensures an
application correct rounding before they are used as an index.
 
J

James Kanze

aaragon <[email protected]> wrote in message ...

[...]
Comparing doubles for equality is most often bad [1].

Not really. Comparing doubles when you don't know what you're
doing is usually bad.
Compare to a range.

Rarely solves anything.
if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
return false;

Which isn't a legal comparison function for anything, since it
isn't transitive. To use double as a key in a hash table, you
need an equivalence relationship and a hash function which is
compatible with that equivalence relationship: a == b implies
hash(a) == hash(b).
Output the numbers using a stream with 'fixed' and a big
'precision', and see if they are (really/almost) equal (in the
limited output (rounding errors in stream output)).

And check how the hash function works. I've yet to find a good
portable way to hash doubles. (Non portably, you could probably
extract the exponent, sign and the mantissa as integer values,
and hash them as integer values. With special handling for 0,
so that +0.0 and -0.0 hash to the same value. And special
handling for infinities and NaN's---in some applications,
assertion failure would be an acceptable special handling in
such cases.)
Not sure this is your problem, but, it (==) should be fixed.
Are you a GNU systems/compiler developer?

GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.

The rules for using this namespace are exactly the same as for
using std. GNU authorizes users to explicitly specialize
templates which it contains on user defined types.

The question here is what double_hash does. (I'm not sure, but
Boost seems to adopt a strategy roughly like what I described
before, using frexp and ldexp to get at the integral values, and
ignoring the sign (to avoid problems due to +/- 0.0). So it
should be correct, except maybe for infinity and NaNs, but it
probably also is slow enough that an std::map would be faster.)

I'm not sure what you're trying to do, but maybe rounding or
whatever should be done on the values before comparing or
calling the boost hash function. Or just comparing, using
std::map. Basically, etablish a canonical representation for
each equivalence category, and use it.
 
B

BobR

James Kanze wrote in message...
On Jun 14, 5:27 am, "BobR" wrote:

/* """ quote
Are you a GNU systems/compiler developer?

GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.
""" */ unquote

I remember GNU haveing 'hashtable', then it disappeared (into the 'backward'
directory) due to standards. So, now it's (with revisions, I assume) comeing
back? Go figure. <G>

Thanks for the other info.
 
A

aaragon

[...]
struct eqstr
{
bool operator()(const double& o, const double& p) const
{
return (o == p);
Comparing doubles for equality is most often bad [1].

Not really. Comparing doubles when you don't know what you're
doing is usually bad.
Compare to a range.

Rarely solves anything.
if( ( o + 0.0001 > p ) && ( o - 0.0001 < p ) ){ return true;}
return false;

Which isn't a legal comparison function for anything, since it
isn't transitive. To use double as a key in a hash table, you
need an equivalence relationship and a hash function which is
compatible with that equivalence relationship: a == b implies
hash(a) == hash(b).
Output the numbers using a stream with 'fixed' and a big
'precision', and see if they are (really/almost) equal (in the
limited output (rounding errors in stream output)).

And check how the hash function works. I've yet to find a good
portable way to hash doubles. (Non portably, you could probably
extract the exponent, sign and the mantissa as integer values,
and hash them as integer values. With special handling for 0,
so that +0.0 and -0.0 hash to the same value. And special
handling for infinities and NaN's---in some applications,
assertion failure would be an acceptable special handling in
such cases.)
Not sure this is your problem, but, it (==) should be fixed.
Are you a GNU systems/compiler developer?

GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.

The rules for using this namespace are exactly the same as for
using std. GNU authorizes users to explicitly specialize
templates which it contains on user defined types.

The question here is what double_hash does. (I'm not sure, but
Boost seems to adopt a strategy roughly like what I described
before, using frexp and ldexp to get at the integral values, and
ignoring the sign (to avoid problems due to +/- 0.0). So it
should be correct, except maybe for infinity and NaNs, but it
probably also is slow enough that an std::map would be faster.)



I'm not sure what you're trying to do, but maybe rounding or
whatever should be done on the values before comparing or
calling the boost hash function. Or just comparing, using
std::map. Basically, etablish a canonical representation for
each equivalence category, and use it.

--
James Kanze (GABI Software, from CAI) email:[email protected]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Thanks for all your suggestions. It seems that the problem I'm dealing
with is harder than I thought. The problem is as follows:
I have a series of intervals that are obtained at run time:

|_________________|_______________________________|
__________________|_____________|
0.1 0.5 1.3
1.5 1.65

Then, I have large data set of doubles that I need to map into each of
these intervals. Therefore, for example the double 0.56 would enter
the interval (0.5,1.3] (note that the interval is open on the left). I
could just use a sign approach:
double testSign = (0.56 - Imin)*(0.56*Imax);
where Imim and Imax correspond to the extreme values of the interval.
Of course, testSign would only be negative in the right interval so I
could use an if statement to increment a counter for that interval
whenever this happens;
if(testSign < 0)
incrementCounter(interval(i));

However, this would take a lot of time if I have a very BIG data set
(which is my case). So I thought that using a hashing function that
maps any double to the correct interval would be a very clever way to
do this in constant time. So basically, what I need is a function that
maps a double within an interval to a unique value so I can use the
hash_map provided by the __gnu_cxx namespace.
 
J

James Kanze

James Kanze wrote in message...
On Jun 14, 5:27 am, "BobR" wrote:
/* """ quote
GNU currently provides a hash table similar to the one that will
be in the next version of the standard, and similar to the one
provided by most other compiler vendors. Since it isn't
standard (yet), they place it in a special namespace, to
indicate that it is a compiler specific extension. Obviously,
the name of this namespace must be in the implementation's
namespace. Thus the __.
""" */ unquote

You're using OE, too:)?

(This didn't used to be a problem. Since I'm sure that OE
hasn't changed, it must be something either at Google or in our
firewall that has changed. I'm still looking.)
I remember GNU haveing 'hashtable', then it disappeared (into
the 'backward' directory) due to standards. So, now it's (with
revisions, I assume) comeing back? Go figure. <G>

There was a proposal for a hash table right at the end of the
last standard's effort. It was too late to be really
considered, but many (most) implementations added it anyway.
Regretfully, they added it each with their own subtle variations
and differences.

It was quickly realized that adding things like this to the
standard namespace was not a good idea. Which left the library
writers in a crunch; they could move it to a private namespace
where it belonged, but this would break user code. G++ chose to
break user code, VC++ no.

When the standards committee got back to the question, they were
faced with the fact that many implementations had, or had had, a
variant already in std::, and that the specifications for that
hash table varied. Rather than create even more problems with
existing user code, they chose to rename it unordered_[set,map].

The last time I looked at it, the specification used separate
template arguments for the equivalence function and the hash
function, which seems like a sure recepe for incompatible
versions to me. But nothing is final yet.
 
J

James Kanze

Thanks for all your suggestions. It seems that the problem I'm
dealing with is harder than I thought.

That's a frequent case with floating point:).
The problem is as follows: I have a series of intervals that
are obtained at run time:
|_________________|_______________________________|__________________|_____________|
0.1 0.5 1.3 1.5 1.65
Then, I have large data set of doubles that I need to map into each of
these intervals. Therefore, for example the double 0.56 would enter
the interval (0.5,1.3] (note that the interval is open on the left). I
could just use a sign approach:
double testSign = (0.56 - Imin)*(0.56*Imax);
where Imim and Imax correspond to the extreme values of the interval.
Of course, testSign would only be negative in the right interval so I
could use an if statement to increment a counter for that interval
whenever this happens;
if(testSign < 0)
incrementCounter(interval(i));
However, this would take a lot of time if I have a very BIG data set
(which is my case). So I thought that using a hashing function that
maps any double to the correct interval would be a very clever way to
do this in constant time. So basically, what I need is a function that
maps a double within an interval to a unique value so I can use the
hash_map provided by the __gnu_cxx namespace.

To calculate the hash code, however, you need to normalize the
value in the interval. Hash codes require a very strong concept
of equivalence.

This can be done very easily with std::map, which uses an
ordering relationship. And while std::map is O(lg n), rather
than O(1), it typically takes a very big set for the difference
to be measurable.
 
A

aaragon

Thanks for all your suggestions. It seems that the problem I'm
dealing with is harder than I thought.

That's a frequent case with floating point:).


The problem is as follows: I have a series of intervals that
are obtained at run time:
|_________________|_______________________________|__________________|_____________|
0.1 0.5 1.3 1.5 1.65
Then, I have large data set of doubles that I need to map into each of
these intervals. Therefore, for example the double 0.56 would enter
the interval (0.5,1.3] (note that the interval is open on the left). I
could just use a sign approach:
double testSign = (0.56 - Imin)*(0.56*Imax);
where Imim and Imax correspond to the extreme values of the interval.
Of course, testSign would only be negative in the right interval so I
could use an if statement to increment a counter for that interval
whenever this happens;
if(testSign < 0)
incrementCounter(interval(i));
However, this would take a lot of time if I have a very BIG data set
(which is my case). So I thought that using a hashing function that
maps any double to the correct interval would be a very clever way to
do this in constant time. So basically, what I need is a function that
maps a double within an interval to a unique value so I can use the
hash_map provided by the __gnu_cxx namespace.

To calculate the hash code, however, you need to normalize the
value in the interval. Hash codes require a very strong concept
of equivalence.

This can be done very easily with std::map, which uses an
ordering relationship. And while std::map is O(lg n), rather
than O(1), it typically takes a very big set for the difference
to be measurable.

Well, with the map you still need a one to one correspondence between
keys and values. The different running time complexity relies in the
different data structures that are used by the map and the hash table
but they need the same mapping. Therefore, I just need to find a
function that maps several double values to a single std::pair (like a
mathematical function pair<double,double> = f(double)). Once I have
this, I could implement the mapping using either the std::map or the
hash table, it doesn't matter.
 
B

BobR

James Kanze wrote in message...
James Kanze wrote in message...
/* """ quote
""" */ unquote
You're using OE, too:)?
(This didn't used to be a problem. Since I'm sure that OE
hasn't changed, it must be something either at Google or in our
firewall that has changed. I'm still looking.)

Yep, OE. I am currently doing most reading on the win partition.
I'm still 'tweaking' my GNU Debian partition (rebuilding all after
an base HD went south on me [1]), so, I don't want to load the
newsgroups twice. Will eventially go GNU, and keep win for my
Adventure games (or use Wine when I can).

[1] Hard Drives used to last 10 years (mtbf), but seems like they only
last 2-3 years now. Hard to find small ( <100G) drives anymore
(salvage yard is my friend <G>).
Add one month to the warranty == fail. ;-{

Thanks for the 'hash table' info.
 
J

James Kanze

On Jun 15, 4:03 am, James Kanze <[email protected]> wrote:
Well, with the map you still need a one to one correspondence between
keys and values. The different running time complexity relies in the
different data structures that are used by the map and the hash table
but they need the same mapping.

I don't think so. A map or set is ordered, so you can e.g.
insert just the lower bounds of the ranges, and use lower_bound
to look it up.
Therefore, I just need to find a
function that maps several double values to a single std::pair (like a
mathematical function pair<double,double> = f(double)). Once I have
this, I could implement the mapping using either the std::map or the
hash table, it doesn't matter.

You can use std::map to implement that function. Or, even and
std::vector with the ranges, sorted, and then use
std::lower_bound for the lookup.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top