How should unordered_map be used correctly ... ?

M

mast4as

In the context of what I am trying to do. I am trying to get a
technique implemented where cells are created in a linear table and
indexed using a hash key. So after making a little bit of research on
the web I decided to try to use unordered_map which I thought would
save me time from having to lean how to write my own hash table ;-)
However I can't get it to work apparently.

* I wrote this small program but when I run it it tells me that the
table size of that the end is 1?

* I am guessing I am not using it properly because the hash function
which is recommended in the paper I am trying to implement is
computing an integer while the returned value from the hash function
is a size_t type. So right there I must do something wrong already.

* Now this where I am really confused about unordered map. If the
returned type is size_t for the hash function does that mean that the
map size can not be greater than 255 ? Sorry I am sure that will sound
like an idiotic question from someone who has not understood what they
are and how they work at all. But precisely I seek some light here and
would love to hear some comments back on my questions...

For what I am trying to do (using potentially more than a few million
points positions to index an array of cell in 3D space) unordered_map
are what I need ? Or do I need to write something of my own because
they can't handle table size that big ?

Thanks a lot for your help and sorry for being confused. If you have
also good reference on hash map technique I would be very grateful (of
course I am searching on my side in the meantime).

-coralie

#include <cstdio>
#include <cmath>
#include <cstdlib>

#include <tr1/unordered_map>

typedef struct point {
float x, y, z;
};

struct hashfunc {
size_t operator() (const point &pt) const { printf("in here\n");
return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
};

struct setequal {
bool operator() (const point &pa, const point &pb) const { return
(pa.x == pb.x); }
};

typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
tablegrid;

float signedrand() { (drand48()-0.5)*100; }

int main (int argc, const char * argv[])
{
int dim = 128;
int gridsize = dim * dim * dim;
printf("grid size %d\n", gridsize);
tablegrid grid;
point pt;
for (int i = 0; i < gridsize; ++i) {
if (i%64==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
float(gridsize-1)), '%');
pt.x = signedrand(); pt.y = signedrand(); pt.z = signedrand();
grid[pt] = int(drand48() * 100);
}
printf("\nsize of table %d\n", grid.size());
return 0;
}
 
Ö

Öö Tiib

In the context of what I am trying to do. I am trying to get a
technique implemented where cells are created in a linear table and
indexed using a hash key. So after making a little bit of research on
the web I decided to try to use unordered_map which I thought would
save me time from having to lean how to write my own hash table ;-)
However I can't get it to work apparently.

* I wrote this small program but when I run it it tells me that the
table size of that the end is 1?

* I am guessing I am not using it properly because the hash function
which is recommended in the paper I am trying to implement is
computing an integer while the returned value from the hash function
is a size_t type. So right there I must do something wrong already.

* Now this where I am really confused about unordered map. If the
returned type is size_t for the hash function does that mean that the
map size can not be greater than 255 ? Sorry I am sure that will sound
like an idiotic question from someone who has not understood what they
are and how they work at all. But precisely I seek some light here and
would love to hear some comments back on my questions...

For what I am trying to do (using potentially more than a few million
points positions to index an array of cell in 3D space) unordered_map
are what I need ? Or do I need to write something of my own because
they can't handle table size that big ?

Thanks a lot for your help and sorry for being confused. If you have
also good reference on hash map technique I would be very grateful (of
course I am searching on my side in the meantime).

-coralie

#include <cstdio>
#include <cmath>
#include <cstdlib>

#include <tr1/unordered_map>

typedef struct point {
        float x, y, z;

};

struct hashfunc {
    size_t operator() (const point &pt) const { printf("in here\n");
return 541 * pt.x + 79 * pt.y + 31 * pt.z; }

};

struct setequal {
    bool operator() (const point &pa, const point &pb) const { return
(pa.x == pb.x); }

};

Seems confusing name ... "hasSameX" or something? Seems you are
written it for strange reasons.
typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
tablegrid;


'point' is a key and 'int' is mapped type? Seems like it is not what
you described that you are trying to achieve.

float signedrand() { (drand48()-0.5)*100; }

int main (int argc, const char * argv[])
{
        int dim = 128;
        int gridsize = dim * dim * dim;
        printf("grid size %d\n", gridsize);
        tablegrid grid;
        point pt;
        for (int i = 0; i < gridsize; ++i) {
                if (i%64==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
float(gridsize-1)), '%');
                pt.x = signedrand(); pt.y = signedrand(); pt.z = signedrand();
                grid[pt] = int(drand48() * 100);
        }
        printf("\nsize of table %d\n", grid.size());
    return 0;



}
 
M

mast4as

Seems confusing name ... "hasSameX" or something? Seems you are
written it for strange reasons.

Sorry, I am not too sure to understand what you mean by that ;-)
'point' is a key and 'int' is mapped type? Seems like it is not what
you described that you are trying to achieve.

Hum yes the mapped type is an int indeed because I was trying to see
if I could unordered_map to work before going any further. But in
reality for what I want to do it would be a pointer to a cell
structure/class ...

class Cell {...};

typedef std::tr1::unordered_map<point, Cell *, hashfunc, setequal>
tablegrid;
 
Ö

Öö Tiib

Sorry, I am not too sure to understand what you mean by that ;-)

I meant that i am confused by that functior type name. ;) Should it be
for checking equality of hashes? That are points? 'set' prefix is
usually used for mutator function names.
Hum yes the mapped type is an int indeed because I was trying to see
if I could unordered_map to work before going any further. But in
reality for what I want to do it would be a pointer to a cell
structure/class ...

class Cell {...};

typedef std::tr1::unordered_map<point, Cell *, hashfunc, setequal>
tablegrid;

So that hashfunc should calculate hash (point) from a mapped type
(Cell*)? It is seemingly not what it is doing.
 
M

mast4as

Oops and there was a bug in my code ...

signedrand didn't return a value...

float signedrand() { return (drand48()-0.5)*100; }

With this corrected the map has a size = to the number of points
"inserted". So that's much better ;-)

grid size 2097152
size of table 2097147

How is that possible ? Considering that hash function is only
returning a value of size_t how can it be that the value is so big ? I
am confused... I would imagine that a lot of the points used in the
hash function would return the same value (and therefore I was
expected a table much smaller).

Any help ? thank you -c
 
Ö

Öö Tiib

So that hashfunc should calculate hash (point) from a mapped type
(Cell*)? It is seemingly not what it is doing.

No it should calculate int from key type (point).
 
M

mast4as

Sorry I will try to explain again what I am trying to do. I have a set
of points (particles in 3D space) and need to insert them in voxels.
Instead of creating a static 3D grid, I want to use a dynamic one. The
idea is to do something like that:

for (each point in point set)
find cell for point p (using a hash map)
if cell doesn't exist create it (insert it in hash map)
insert p in cell

so the paper recommends to use a hash map for this. What they do is
that they use this hash function

int hashfunc(const point &pt) { return 541 * pt.x + 79 * pt.y + 31 *
pt.z; }

where pt is the key indeed. The result of the function (hash) being
used to make a lookup in the table. If the bucket (cell) for that hash
exists then I return it otherwise I need to create it.

I hope it makes things clearer

Thank you very much -c
 
Ö

Öö Tiib

Oops and there was a bug in my code ...

signedrand didn't return a value...

float signedrand() { return (drand48()-0.5)*100; }

With this corrected the map has a size = to the number of points
"inserted". So that's much better ;-)

grid size 2097152
size of table 2097147

How is that possible ? Considering that hash function is only
returning a value of size_t how can it be that the value is so big ? I
am confused... I would imagine that a lot of the points used in the
hash function would return the same value (and therefore I was
expected a table much smaller).

Perhaps the point::x you compare in setequal collides 5 times.
 
Ö

Öö Tiib

Sorry I will try to explain again what I am trying to do. I have a set
of points (particles in 3D space) and need to insert them in voxels.
Instead of creating a static 3D grid, I want to use a dynamic one. The
idea is to do something like that:

for (each point in point set)
  find cell for point p (using a hash map)
  if cell doesn't exist create it (insert it in hash map)
  insert p in cell

so the paper recommends to use a hash map for this. What they do is
that they use this hash function

int hashfunc(const point &pt) { return 541 * pt.x + 79 * pt.y + 31 *
pt.z; }

where pt is the key indeed. The result of the function (hash) being
used to make a lookup in the table. If the bucket (cell) for that hash
exists then I return it otherwise I need to create it.

I hope it makes things clearer

OK. It is clearer. But like i understand unordered_map is only weakly
sorted by hash. Hash causes some sort of "buckets" of same order in
map. In buckets the equality is decided by your "setequal".
 
M

mast4as

OK. It is clearer. But like i understand unordered_map is only weakly
sorted by hash. Hash causes some sort of "buckets" of same order in
map. In buckets the equality is decided by your "setequal".

Ah thank you ... I guessed that already through your previous answer
so decided that maybe things would be easier for all of us ;-) if i
was actually finished my code ha ha ha... so here it is... it seems to
work. Let me know if you see anything that I am doing wrong (I mean I
know I use struct and all that which is not for the C++ purist but I
am just prototyping here to understand how unordered_map works. Thx a
lot for your feedbacks.

Thx Öö Tiib (not sure how to pronounce your name ;-)

-c


#include <cstdio>
#include <cmath>
#include <cstdlib>

#include <tr1/unordered_map>

//using namespace stdext;
//http://stackoverflow.com/questions/2099540/defining-custom-hash-
function-and-equality-function-for-unordered-map


template<typename T>
struct point {
T x, y, z;
};

struct hashfunc {
size_t operator() (const point<int> &pt) const { return 541 * pt.x
+ 79 * pt.y + 31 * pt.z; }
};

struct setequal {
bool operator() (const point<int> &pa, const point<int> &pb) const
{ return (pa.x == pb.x && pa.y == pb.y && pa.z == pb.z); }
};

typedef struct voxel {
float data;
};

typedef std::tr1::unordered_map<point<int>, voxel *, hashfunc,
setequal> tablegrid;

float signedrand() { return (drand48()-0.5)*2; }

int main (int argc, const char * argv[])
{
int dim = 1;
float cellsize = 0.01;
//int gridsize = dim * dim * dim;
//printf("grid size %d\n", gridsize);
int npts = 1e6;
tablegrid grid;
point<int> pt;
for (int i = 0; i < npts; ++i) {
if (i%512==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
float(npts-1)), '%');
pt.x = int(dim * signedrand() / cellsize);
pt.y = int(dim * signedrand() / cellsize);
pt.z = int(dim * signedrand() / cellsize);
tablegrid::const_iterator it = grid.find(pt);
if (it != grid.end()) {
// do nothing
}
else {
grid[pt] = new voxel;
}
//printf("val %d\n", hashfunc(pt));
}
printf("\nsize of table %d\n", grid.size());
// delete mem
tablegrid::const_iterator it = grid.begin();
for (; it != grid.end(); ++it) {
delete it->second;
}
printf("delete mem done\n", grid.size());
return 0;
}
 
J

James Kanze

In the context of what I am trying to do. I am trying to get
a technique implemented where cells are created in a linear
table and indexed using a hash key. So after making a little
bit of research on the web I decided to try to use
unordered_map which I thought would save me time from having
to lean how to write my own hash table ;-) However I can't get
it to work apparently.
* I wrote this small program but when I run it it tells me that the
table size of that the end is 1?
* I am guessing I am not using it properly because the hash
function which is recommended in the paper I am trying to
implement is computing an integer while the returned value
from the hash function is a size_t type. So right there I must
do something wrong already.

There's an implicit conversion of int to size_t, although in
general, calculating a hash function is something you do using
unsigned values.
* Now this where I am really confused about unordered map. If the
returned type is size_t for the hash function does that mean that the
map size can not be greater than 255 ?

From where do you get the value 255? And of course, the maximum
hash value doesn't limit the number of elements in an unordered
map.
Sorry I am sure that will sound like an idiotic question from
someone who has not understood what they are and how they work
at all. But precisely I seek some light here and would love to
hear some comments back on my questions...
For what I am trying to do (using potentially more than a few million
points positions to index an array of cell in 3D space) unordered_map
are what I need ? Or do I need to write something of my own because
they can't handle table size that big ?
Thanks a lot for your help and sorry for being confused. If you have
also good reference on hash map technique I would be very grateful (of
course I am searching on my side in the meantime).
#include <cstdio>
#include <cmath>
#include <cstdlib>
#include <tr1/unordered_map>
typedef struct point {
float x, y, z;
};
struct hashfunc {
size_t operator() (const point &pt) const { printf("in here\n");
return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
};

Not too sure that this is a good hash function. (In fact, I'm
fairly sure it isn't, in general.) But generating a good hash
function for floating point values is a bit tricky. What sort
of values do x, y and z take on?
struct setequal {
bool operator() (const point &pa, const point &pb) const { return
(pa.x == pb.x); }
};
typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
tablegrid;

There seems to be an inconsistency in these three definitions.
An essential constraint of unordered_map is that if Pred(a) ==
Pred(b), Hash(a) == Hash(b). In your case, Pred only takes into
account the x values, Hash takes all three into account. So
that something like:
point { 10, 20, 30 };
and
point { 10, 0, 0 };
will compare equal, but have different hash values.

Not meeting the constraints of the instantiation arguments of
a template results in undefined behavior. Anything can happen.
float signedrand() { (drand48()-0.5)*100; }

What's this? What do your return? (This is, I think, what is
actually causing your error: when I try it with g++, the
function systematically returns nan. But again, it's undefined
behavior.)
 
M

mast4as

Hey James

Thanks for your input. If you look at my latest post (well the one
before this one ;-) I have posted the complete code in which I have
made the changes you mentioned. For instance signedrand now return a
value. And the comparison is now between all the elements of the
point.

You said the hash function I was using isn't good but that's the one
they use in the paper i am following. I am converting the particles
positions (floating point) into discrete coordinates (grid cell
positions) so at the end my x, y, z values for the hash function are
integers.

As for the 256 sorry to show maybe a big hole in my knowledge here but
I assumed that a size_t could only encode 256 values (8 bits). I am
probably mistaken so if you can shred some light on this I would be
more than happy. I want to understand ;-)

Thanks a lot -c
 
M

mast4as

As for the 256 sorry to show maybe a big hole in my knowledge here but
I assumed that a size_t could only encode 256 values (8 bits). I am
probably mistaken so if you can shred some light on this I would be
more than happy. I want to understand ;-)

So I am complete fool okay okay I hear you whistles already

typedef unsigned int size_t;

I should better. Listen I have been using C++ for quite a few years
now and I always always had another meaning for size_t in mind.
That's good. Today I have learned something. That's what happened
where you are self taught I presume. Thanks a lot everyone for your
input.
-c
 
M

mast4as

Not too sure that this is a good hash function.  (In fact, I'm
fairly sure it isn't, in general.)  But generating a good hash
function for floating point values is a bit tricky.  What sort
of values do x, y and z take on?

Oh so last question then. I remap the float values to integer but they
are signed ones. So what happens in that case if the returned value is
an unsigned int. Will it work or do I need to get x, y z as unsigned
integers ? In the current version of the code I have it seems to work
with integer. So even when z y or z are negative they seem to be
properly inserted in the table. I don't know why (technically) but
practically it does ;-)))
-c
 
J

James Kanze

[...]
You said the hash function I was using isn't good but that's the one
they use in the paper i am following.

I said it's not good in general. It may be perfectly adequate
for the set of values you will probably encounter. (And
creating a truly good hash function for float or double, which
will be effective for almost any set of input, is extremely
difficult.)
I am converting the particles
positions (floating point) into discrete coordinates (grid cell
positions) so at the end my x, y, z values for the hash function are
integers.
As for the 256 sorry to show maybe a big hole in my knowledge here but
I assumed that a size_t could only encode 256 values (8 bits). I am
probably mistaken so if you can shred some light on this I would be
more than happy. I want to understand ;-)

According to the standard, size_t is a typedef to an unsigned
integral type large enough to contain the size of the largest
possible object. On most systems (probably all modern systems),
it is the same size as a pointer.
 
J

James Kanze

Oh so last question then. I remap the float values to integer but they
are signed ones. So what happens in that case if the returned value is
an unsigned int. Will it work or do I need to get x, y z as unsigned
integers ? In the current version of the code I have it seems to work
with integer. So even when z y or z are negative they seem to be
properly inserted in the table. I don't know why (technically) but
practically it does ;-)))

Interesting question; I had to look it up: when converting
a floating point type to an integral type, the conversion
truncates (rounds to zero); the behavior is undefined if the
resulting value cannot be represented in the target type. So
formally, if the destination type is unsigned, the behavior is
undefined.

Practically, converting a signed integral type to an unsigned is
always defined (the results are modulo 2^n, where n is the
number of bits), and I suspect that most implementations actually
applie the same rules, or something similar, for converting
float to integral type. If you want to be sure, however,
convert first to int, then to size_t.

If your floating point expression results in absolute values
larger than the max of a size_t, you will likely have problems.
(IIRC, your x, y and z were in the range of -50...50, so this
shouldn't be a problem in your case. It would be if they could
take on any real values.)
 
N

Nick Keighley

So I am complete fool okay okay I hear you whistles already

typedef unsigned int size_t;

I should better. Listen I have been using C++ for quite a few years
now and I always always had another meaning for size_t in mind.
That's good. Today I have learned something. That's what happened
where you are self taught I presume. Thanks a lot everyone for your
input.

size_t is an alias (a typedef) for an unsigned integral type large
enough to hold an array index. For instance C's malloc() (memory
allocating) function returns a size_t. A size_t cannot possibly be so
small that it will only hold 255 values.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,756
Messages
2,569,535
Members
45,008
Latest member
obedient dusk

Latest Threads

Top