How should unordered_map be used correctly ... ?

Discussion in 'C++' started by mast4as, Dec 19, 2010.

  1. mast4as

    mast4as Guest

    In the context of what I am trying to do. I am trying to get a
    technique implemented where cells are created in a linear table and
    indexed using a hash key. So after making a little bit of research on
    the web I decided to try to use unordered_map which I thought would
    save me time from having to lean how to write my own hash table ;-)
    However I can't get it to work apparently.

    * I wrote this small program but when I run it it tells me that the
    table size of that the end is 1?

    * I am guessing I am not using it properly because the hash function
    which is recommended in the paper I am trying to implement is
    computing an integer while the returned value from the hash function
    is a size_t type. So right there I must do something wrong already.

    * Now this where I am really confused about unordered map. If the
    returned type is size_t for the hash function does that mean that the
    map size can not be greater than 255 ? Sorry I am sure that will sound
    like an idiotic question from someone who has not understood what they
    are and how they work at all. But precisely I seek some light here and
    would love to hear some comments back on my questions...

    For what I am trying to do (using potentially more than a few million
    points positions to index an array of cell in 3D space) unordered_map
    are what I need ? Or do I need to write something of my own because
    they can't handle table size that big ?

    Thanks a lot for your help and sorry for being confused. If you have
    also good reference on hash map technique I would be very grateful (of
    course I am searching on my side in the meantime).

    -coralie

    #include <cstdio>
    #include <cmath>
    #include <cstdlib>

    #include <tr1/unordered_map>

    typedef struct point {
    float x, y, z;
    };

    struct hashfunc {
    size_t operator() (const point &pt) const { printf("in here\n");
    return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
    };

    struct setequal {
    bool operator() (const point &pa, const point &pb) const { return
    (pa.x == pb.x); }
    };

    typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
    tablegrid;

    float signedrand() { (drand48()-0.5)*100; }

    int main (int argc, const char * argv[])
    {
    int dim = 128;
    int gridsize = dim * dim * dim;
    printf("grid size %d\n", gridsize);
    tablegrid grid;
    point pt;
    for (int i = 0; i < gridsize; ++i) {
    if (i%64==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
    float(gridsize-1)), '%');
    pt.x = signedrand(); pt.y = signedrand(); pt.z = signedrand();
    grid[pt] = int(drand48() * 100);
    }
    printf("\nsize of table %d\n", grid.size());
    return 0;
    }
     
    mast4as, Dec 19, 2010
    #1
    1. Advertising

  2. mast4as

    Öö Tiib Guest

    On Dec 19, 5:33 pm, mast4as <> wrote:
    > In the context of what I am trying to do. I am trying to get a
    > technique implemented where cells are created in a linear table and
    > indexed using a hash key. So after making a little bit of research on
    > the web I decided to try to use unordered_map which I thought would
    > save me time from having to lean how to write my own hash table ;-)
    > However I can't get it to work apparently.
    >
    > * I wrote this small program but when I run it it tells me that the
    > table size of that the end is 1?
    >
    > * I am guessing I am not using it properly because the hash function
    > which is recommended in the paper I am trying to implement is
    > computing an integer while the returned value from the hash function
    > is a size_t type. So right there I must do something wrong already.
    >
    > * Now this where I am really confused about unordered map. If the
    > returned type is size_t for the hash function does that mean that the
    > map size can not be greater than 255 ? Sorry I am sure that will sound
    > like an idiotic question from someone who has not understood what they
    > are and how they work at all. But precisely I seek some light here and
    > would love to hear some comments back on my questions...
    >
    > For what I am trying to do (using potentially more than a few million
    > points positions to index an array of cell in 3D space) unordered_map
    > are what I need ? Or do I need to write something of my own because
    > they can't handle table size that big ?
    >
    > Thanks a lot for your help and sorry for being confused. If you have
    > also good reference on hash map technique I would be very grateful (of
    > course I am searching on my side in the meantime).
    >
    > -coralie
    >
    > #include <cstdio>
    > #include <cmath>
    > #include <cstdlib>
    >
    > #include <tr1/unordered_map>
    >
    > typedef struct point {
    >         float x, y, z;
    >
    > };
    >
    > struct hashfunc {
    >     size_t operator() (const point &pt) const { printf("in here\n");
    > return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
    >
    > };
    >
    > struct setequal {
    >     bool operator() (const point &pa, const point &pb) const { return
    > (pa.x == pb.x); }
    >
    > };


    Seems confusing name ... "hasSameX" or something? Seems you are
    written it for strange reasons.

    >
    > typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
    > tablegrid;



    'point' is a key and 'int' is mapped type? Seems like it is not what
    you described that you are trying to achieve.


    > > float signedrand() { (drand48()-0.5)*100; }

    >
    > int main (int argc, const char * argv[])
    > {
    >         int dim = 128;
    >         int gridsize = dim * dim * dim;
    >         printf("grid size %d\n", gridsize);
    >         tablegrid grid;
    >         point pt;
    >         for (int i = 0; i < gridsize; ++i) {
    >                 if (i%64==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
    > float(gridsize-1)), '%');
    >                 pt.x = signedrand(); pt.y = signedrand(); pt.z = signedrand();
    >                 grid[pt] = int(drand48() * 100);
    >         }
    >         printf("\nsize of table %d\n", grid.size());
    >     return 0;
    >
    >
    >
    > }
     
    Öö Tiib, Dec 19, 2010
    #2
    1. Advertising

  3. mast4as

    mast4as Guest

    > Seems confusing name ... "hasSameX" or something? Seems you are
    > written it for strange reasons.


    Sorry, I am not too sure to understand what you mean by that ;-)

    > 'point' is a key and 'int' is mapped type? Seems like it is not what
    > you described that you are trying to achieve.


    Hum yes the mapped type is an int indeed because I was trying to see
    if I could unordered_map to work before going any further. But in
    reality for what I want to do it would be a pointer to a cell
    structure/class ...

    class Cell {...};

    typedef std::tr1::unordered_map<point, Cell *, hashfunc, setequal>
    tablegrid;
     
    mast4as, Dec 19, 2010
    #3
  4. mast4as

    Öö Tiib Guest

    On Dec 19, 6:29 pm, mast4as <> wrote:
    > > Seems confusing name ... "hasSameX" or something? Seems you are
    > > written it for strange reasons.

    >
    > Sorry, I am not too sure to understand what you mean by that ;-)


    I meant that i am confused by that functior type name. ;) Should it be
    for checking equality of hashes? That are points? 'set' prefix is
    usually used for mutator function names.

    > > 'point' is a key and 'int' is mapped type? Seems like it is not what
    > > you described that you are trying to achieve.

    >
    > Hum yes the mapped type is an int indeed because I was trying to see
    > if I could unordered_map to work before going any further. But in
    > reality for what I want to do it would be a pointer to a cell
    > structure/class ...
    >
    > class Cell {...};
    >
    > typedef std::tr1::unordered_map<point, Cell *, hashfunc, setequal>
    > tablegrid;


    So that hashfunc should calculate hash (point) from a mapped type
    (Cell*)? It is seemingly not what it is doing.
     
    Öö Tiib, Dec 19, 2010
    #4
  5. mast4as

    mast4as Guest

    Oops and there was a bug in my code ...

    signedrand didn't return a value...

    float signedrand() { return (drand48()-0.5)*100; }

    With this corrected the map has a size = to the number of points
    "inserted". So that's much better ;-)

    grid size 2097152
    size of table 2097147

    How is that possible ? Considering that hash function is only
    returning a value of size_t how can it be that the value is so big ? I
    am confused... I would imagine that a lot of the points used in the
    hash function would return the same value (and therefore I was
    expected a table much smaller).

    Any help ? thank you -c
     
    mast4as, Dec 19, 2010
    #5
  6. mast4as

    Öö Tiib Guest

    On Dec 19, 6:49 pm, Öö Tiib <> wrote:
    >
    > So that hashfunc should calculate hash (point) from a mapped type
    > (Cell*)? It is seemingly not what it is doing.


    No it should calculate int from key type (point).
     
    Öö Tiib, Dec 19, 2010
    #6
  7. mast4as

    mast4as Guest

    Sorry I will try to explain again what I am trying to do. I have a set
    of points (particles in 3D space) and need to insert them in voxels.
    Instead of creating a static 3D grid, I want to use a dynamic one. The
    idea is to do something like that:

    for (each point in point set)
    find cell for point p (using a hash map)
    if cell doesn't exist create it (insert it in hash map)
    insert p in cell

    so the paper recommends to use a hash map for this. What they do is
    that they use this hash function

    int hashfunc(const point &pt) { return 541 * pt.x + 79 * pt.y + 31 *
    pt.z; }

    where pt is the key indeed. The result of the function (hash) being
    used to make a lookup in the table. If the bucket (cell) for that hash
    exists then I return it otherwise I need to create it.

    I hope it makes things clearer

    Thank you very much -c
     
    mast4as, Dec 19, 2010
    #7
  8. mast4as

    Öö Tiib Guest

    On Dec 19, 6:55 pm, mast4as <> wrote:
    > Oops and there was a bug in my code ...
    >
    > signedrand didn't return a value...
    >
    > float signedrand() { return (drand48()-0.5)*100; }
    >
    > With this corrected the map has a size = to the number of points
    > "inserted". So that's much better ;-)
    >
    > grid size 2097152
    > size of table 2097147
    >
    > How is that possible ? Considering that hash function is only
    > returning a value of size_t how can it be that the value is so big ? I
    > am confused... I would imagine that a lot of the points used in the
    > hash function would return the same value (and therefore I was
    > expected a table much smaller).


    Perhaps the point::x you compare in setequal collides 5 times.
     
    Öö Tiib, Dec 19, 2010
    #8
  9. mast4as

    Öö Tiib Guest

    On Dec 19, 7:04 pm, mast4as <> wrote:
    > Sorry I will try to explain again what I am trying to do. I have a set
    > of points (particles in 3D space) and need to insert them in voxels.
    > Instead of creating a static 3D grid, I want to use a dynamic one. The
    > idea is to do something like that:
    >
    > for (each point in point set)
    >   find cell for point p (using a hash map)
    >   if cell doesn't exist create it (insert it in hash map)
    >   insert p in cell
    >
    > so the paper recommends to use a hash map for this. What they do is
    > that they use this hash function
    >
    > int hashfunc(const point &pt) { return 541 * pt.x + 79 * pt.y + 31 *
    > pt.z; }
    >
    > where pt is the key indeed. The result of the function (hash) being
    > used to make a lookup in the table. If the bucket (cell) for that hash
    > exists then I return it otherwise I need to create it.
    >
    > I hope it makes things clearer


    OK. It is clearer. But like i understand unordered_map is only weakly
    sorted by hash. Hash causes some sort of "buckets" of same order in
    map. In buckets the equality is decided by your "setequal".
     
    Öö Tiib, Dec 19, 2010
    #9
  10. mast4as

    mast4as Guest


    > OK. It is clearer. But like i understand unordered_map is only weakly
    > sorted by hash. Hash causes some sort of "buckets" of same order in
    > map. In buckets the equality is decided by your "setequal".


    Ah thank you ... I guessed that already through your previous answer
    so decided that maybe things would be easier for all of us ;-) if i
    was actually finished my code ha ha ha... so here it is... it seems to
    work. Let me know if you see anything that I am doing wrong (I mean I
    know I use struct and all that which is not for the C++ purist but I
    am just prototyping here to understand how unordered_map works. Thx a
    lot for your feedbacks.

    Thx Öö Tiib (not sure how to pronounce your name ;-)

    -c


    #include <cstdio>
    #include <cmath>
    #include <cstdlib>

    #include <tr1/unordered_map>

    //using namespace stdext;
    //http://stackoverflow.com/questions/2099540/defining-custom-hash-
    function-and-equality-function-for-unordered-map


    template<typename T>
    struct point {
    T x, y, z;
    };

    struct hashfunc {
    size_t operator() (const point<int> &pt) const { return 541 * pt.x
    + 79 * pt.y + 31 * pt.z; }
    };

    struct setequal {
    bool operator() (const point<int> &pa, const point<int> &pb) const
    { return (pa.x == pb.x && pa.y == pb.y && pa.z == pb.z); }
    };

    typedef struct voxel {
    float data;
    };

    typedef std::tr1::unordered_map<point<int>, voxel *, hashfunc,
    setequal> tablegrid;

    float signedrand() { return (drand48()-0.5)*2; }

    int main (int argc, const char * argv[])
    {
    int dim = 1;
    float cellsize = 0.01;
    //int gridsize = dim * dim * dim;
    //printf("grid size %d\n", gridsize);
    int npts = 1e6;
    tablegrid grid;
    point<int> pt;
    for (int i = 0; i < npts; ++i) {
    if (i%512==0) fprintf(stderr, "\b\b\b\b%d%c", int(100*i/
    float(npts-1)), '%');
    pt.x = int(dim * signedrand() / cellsize);
    pt.y = int(dim * signedrand() / cellsize);
    pt.z = int(dim * signedrand() / cellsize);
    tablegrid::const_iterator it = grid.find(pt);
    if (it != grid.end()) {
    // do nothing
    }
    else {
    grid[pt] = new voxel;
    }
    //printf("val %d\n", hashfunc(pt));
    }
    printf("\nsize of table %d\n", grid.size());
    // delete mem
    tablegrid::const_iterator it = grid.begin();
    for (; it != grid.end(); ++it) {
    delete it->second;
    }
    printf("delete mem done\n", grid.size());
    return 0;
    }
     
    mast4as, Dec 19, 2010
    #10
  11. mast4as

    James Kanze Guest

    On Dec 19, 3:33 pm, mast4as <> wrote:
    > In the context of what I am trying to do. I am trying to get
    > a technique implemented where cells are created in a linear
    > table and indexed using a hash key. So after making a little
    > bit of research on the web I decided to try to use
    > unordered_map which I thought would save me time from having
    > to lean how to write my own hash table ;-) However I can't get
    > it to work apparently.


    > * I wrote this small program but when I run it it tells me that the
    > table size of that the end is 1?


    > * I am guessing I am not using it properly because the hash
    > function which is recommended in the paper I am trying to
    > implement is computing an integer while the returned value
    > from the hash function is a size_t type. So right there I must
    > do something wrong already.


    There's an implicit conversion of int to size_t, although in
    general, calculating a hash function is something you do using
    unsigned values.

    > * Now this where I am really confused about unordered map. If the
    > returned type is size_t for the hash function does that mean that the
    > map size can not be greater than 255 ?


    From where do you get the value 255? And of course, the maximum
    hash value doesn't limit the number of elements in an unordered
    map.

    > Sorry I am sure that will sound like an idiotic question from
    > someone who has not understood what they are and how they work
    > at all. But precisely I seek some light here and would love to
    > hear some comments back on my questions...


    > For what I am trying to do (using potentially more than a few million
    > points positions to index an array of cell in 3D space) unordered_map
    > are what I need ? Or do I need to write something of my own because
    > they can't handle table size that big ?


    > Thanks a lot for your help and sorry for being confused. If you have
    > also good reference on hash map technique I would be very grateful (of
    > course I am searching on my side in the meantime).


    > #include <cstdio>
    > #include <cmath>
    > #include <cstdlib>


    > #include <tr1/unordered_map>


    > typedef struct point {
    > float x, y, z;
    > };


    > struct hashfunc {
    > size_t operator() (const point &pt) const { printf("in here\n");
    > return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
    > };


    Not too sure that this is a good hash function. (In fact, I'm
    fairly sure it isn't, in general.) But generating a good hash
    function for floating point values is a bit tricky. What sort
    of values do x, y and z take on?

    > struct setequal {
    > bool operator() (const point &pa, const point &pb) const { return
    > (pa.x == pb.x); }
    > };


    > typedef std::tr1::unordered_map<point, int, hashfunc, setequal>
    > tablegrid;


    There seems to be an inconsistency in these three definitions.
    An essential constraint of unordered_map is that if Pred(a) ==
    Pred(b), Hash(a) == Hash(b). In your case, Pred only takes into
    account the x values, Hash takes all three into account. So
    that something like:
    point { 10, 20, 30 };
    and
    point { 10, 0, 0 };
    will compare equal, but have different hash values.

    Not meeting the constraints of the instantiation arguments of
    a template results in undefined behavior. Anything can happen.

    > float signedrand() { (drand48()-0.5)*100; }


    What's this? What do your return? (This is, I think, what is
    actually causing your error: when I try it with g++, the
    function systematically returns nan. But again, it's undefined
    behavior.)

    --
    James Kanze
     
    James Kanze, Dec 20, 2010
    #11
  12. mast4as

    mast4as Guest

    Hey James

    Thanks for your input. If you look at my latest post (well the one
    before this one ;-) I have posted the complete code in which I have
    made the changes you mentioned. For instance signedrand now return a
    value. And the comparison is now between all the elements of the
    point.

    You said the hash function I was using isn't good but that's the one
    they use in the paper i am following. I am converting the particles
    positions (floating point) into discrete coordinates (grid cell
    positions) so at the end my x, y, z values for the hash function are
    integers.

    As for the 256 sorry to show maybe a big hole in my knowledge here but
    I assumed that a size_t could only encode 256 values (8 bits). I am
    probably mistaken so if you can shred some light on this I would be
    more than happy. I want to understand ;-)

    Thanks a lot -c
     
    mast4as, Dec 20, 2010
    #12
  13. mast4as

    mast4as Guest


    >
    > As for the 256 sorry to show maybe a big hole in my knowledge here but
    > I assumed that a size_t could only encode 256 values (8 bits). I am
    > probably mistaken so if you can shred some light on this I would be
    > more than happy. I want to understand ;-)


    So I am complete fool okay okay I hear you whistles already

    typedef unsigned int size_t;

    I should better. Listen I have been using C++ for quite a few years
    now and I always always had another meaning for size_t in mind.
    That's good. Today I have learned something. That's what happened
    where you are self taught I presume. Thanks a lot everyone for your
    input.
    -c
     
    mast4as, Dec 20, 2010
    #13
  14. mast4as

    mast4as Guest


    > > struct hashfunc {
    > >     size_t operator() (const point &pt) const { printf("in here\n");
    > > return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
    > > };

    >
    > Not too sure that this is a good hash function.  (In fact, I'm
    > fairly sure it isn't, in general.)  But generating a good hash
    > function for floating point values is a bit tricky.  What sort
    > of values do x, y and z take on?


    Oh so last question then. I remap the float values to integer but they
    are signed ones. So what happens in that case if the returned value is
    an unsigned int. Will it work or do I need to get x, y z as unsigned
    integers ? In the current version of the code I have it seems to work
    with integer. So even when z y or z are negative they seem to be
    properly inserted in the table. I don't know why (technically) but
    practically it does ;-)))
    -c
     
    mast4as, Dec 20, 2010
    #14
  15. mast4as

    James Kanze Guest

    On Dec 20, 10:01 pm, mast4as <> wrote:

    [...]
    > You said the hash function I was using isn't good but that's the one
    > they use in the paper i am following.


    I said it's not good in general. It may be perfectly adequate
    for the set of values you will probably encounter. (And
    creating a truly good hash function for float or double, which
    will be effective for almost any set of input, is extremely
    difficult.)

    > I am converting the particles
    > positions (floating point) into discrete coordinates (grid cell
    > positions) so at the end my x, y, z values for the hash function are
    > integers.


    > As for the 256 sorry to show maybe a big hole in my knowledge here but
    > I assumed that a size_t could only encode 256 values (8 bits). I am
    > probably mistaken so if you can shred some light on this I would be
    > more than happy. I want to understand ;-)


    According to the standard, size_t is a typedef to an unsigned
    integral type large enough to contain the size of the largest
    possible object. On most systems (probably all modern systems),
    it is the same size as a pointer.

    --
    James Kanze
     
    James Kanze, Dec 21, 2010
    #15
  16. mast4as

    James Kanze Guest

    On Dec 20, 10:14 pm, mast4as <> wrote:
    > > > struct hashfunc {
    > > > size_t operator() (const point &pt) const { printf("in here\n");
    > > > return 541 * pt.x + 79 * pt.y + 31 * pt.z; }
    > > > };


    > > Not too sure that this is a good hash function. (In fact, I'm
    > > fairly sure it isn't, in general.) But generating a good hash
    > > function for floating point values is a bit tricky. What sort
    > > of values do x, y and z take on?


    > Oh so last question then. I remap the float values to integer but they
    > are signed ones. So what happens in that case if the returned value is
    > an unsigned int. Will it work or do I need to get x, y z as unsigned
    > integers ? In the current version of the code I have it seems to work
    > with integer. So even when z y or z are negative they seem to be
    > properly inserted in the table. I don't know why (technically) but
    > practically it does ;-)))


    Interesting question; I had to look it up: when converting
    a floating point type to an integral type, the conversion
    truncates (rounds to zero); the behavior is undefined if the
    resulting value cannot be represented in the target type. So
    formally, if the destination type is unsigned, the behavior is
    undefined.

    Practically, converting a signed integral type to an unsigned is
    always defined (the results are modulo 2^n, where n is the
    number of bits), and I suspect that most implementations actually
    applie the same rules, or something similar, for converting
    float to integral type. If you want to be sure, however,
    convert first to int, then to size_t.

    If your floating point expression results in absolute values
    larger than the max of a size_t, you will likely have problems.
    (IIRC, your x, y and z were in the range of -50...50, so this
    shouldn't be a problem in your case. It would be if they could
    take on any real values.)

    --
    James Kanze
     
    James Kanze, Dec 21, 2010
    #16
  17. On Dec 20, 10:06 pm, mast4as <> wrote:
    > > As for the 256 sorry to show maybe a big hole in my knowledge here but
    > > I assumed that a size_t could only encode 256 values (8 bits). I am
    > > probably mistaken so if you can shred some light on this I would be
    > > more than happy. I want to understand ;-)

    >
    > So I am complete fool okay okay I hear you whistles already
    >
    > typedef unsigned int size_t;
    >
    > I should better. Listen I have been using C++ for quite a few years
    > now and I always always had another meaning for size_t in mind.
    > That's good. Today I have learned something. That's what happened
    > where you are self taught I presume. Thanks a lot everyone for your
    > input.


    size_t is an alias (a typedef) for an unsigned integral type large
    enough to hold an array index. For instance C's malloc() (memory
    allocating) function returns a size_t. A size_t cannot possibly be so
    small that it will only hold 255 values.
     
    Nick Keighley, Dec 30, 2010
    #17
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Paulo Matos

    Template problem with unordered_map

    Paulo Matos, Aug 3, 2006, in forum: C++
    Replies:
    4
    Views:
    467
    Paulo Matos
    Aug 3, 2006
  2. Rares Vernica

    error with tr1 unordered_map iterator

    Rares Vernica, Feb 24, 2007, in forum: C++
    Replies:
    6
    Views:
    1,816
  3. abir
    Replies:
    6
    Views:
    878
    W Karas
    Jun 26, 2008
  4. abir
    Replies:
    3
    Views:
    1,146
  5. Replies:
    2
    Views:
    6,672
Loading...

Share This Page