Hash of None varies per-machine

Discussion in 'Python' started by ben.taylor@email.com, Apr 3, 2009.

  1. Guest

    Found this while trying to do something unrelated and was curious...

    If you hash an integer (eg. hash(3)) you get the same integer out. If
    you hash a string you also get an integer. If you hash None you get an
    integer again, but the integer you get varies depending on which
    machine you're running python on (which isn't true for numbers and
    strings).

    This raises the following questions:
    1. Is it correct that if you hash two things that are not equal they
    might give you the same hash value? Like, for instance, None and the
    number 261862182320 (which is what my machine gives me if I hash
    None). Note this is just an example, I'm aware hashing integers is
    probably daft. I'm guessing that's fine, since you can't hash
    something to a number without colliding with that number (or at least
    without hashing the number to something else, like hashing every
    number to itself * 2, which would then mean you couldn't hash very
    large numbers)
    2. Should the hash of None vary per-machine? I can't think why you'd
    write code that would rely on the value of the hash of None, but you
    might I guess.
    3. Given that presumably not all things can be hashed (since the
    documentation description of hash() says it gives you the hash of the
    object "if it can be hashed"), should None be hashable?

    Bit esoteric perhaps, but like I said, I'm curious. ;-)

    Ben
     
    , Apr 3, 2009
    #1
    1. Advertising

  2. Paul Rubin Guest

    writes:
    > 1. Is it correct that if you hash two things that are not equal they
    > might give you the same hash value?


    Yes, hashes are 32 bit numbers and there are far more than 2**32
    possible Python values (think of long ints), so obviously there must
    be multiple values that hash to the same slot.

    > 2. Should the hash of None vary per-machine?


    If the docs say this shouldn't happen, then it's a bug. Otherwise,
    it should probably be considered ok.

    > 3. Given that presumably not all things can be hashed (since the
    > documentation description of hash() says it gives you the hash of the
    > object "if it can be hashed"), should None be hashable?


    Yes, anything that can be used as a dict key (basically all immutable
    values with equality comparison) should be hashable.
     
    Paul Rubin, Apr 3, 2009
    #2
    1. Advertising

  3. Paul Rubin <http://> writes:
    >
    > writes:
    > > 1. Is it correct that if you hash two things that are not equal they
    > > might give you the same hash value?

    >
    > Yes, hashes are 32 bit numbers and there are far more than 2**32
    > possible Python values (think of long ints), so obviously there must
    > be multiple values that hash to the same slot.


    This is not true. CPython integers, at least up through the 2.x
    series, are implemented as C *long integers*; on some platforms, this
    means that they're 32 bits long. But on an increasing number of
    platforms, long integes are 64 bits long.

    But, more specifically, consider the following:

    > > 2. Should the hash of None vary per-machine?

    >
    > If the docs say this shouldn't happen, then it's a bug. Otherwise,
    > it should probably be considered ok.
    >
    > > 3. Given that presumably not all things can be hashed (since the
    > > documentation description of hash() says it gives you the hash of the
    > > object "if it can be hashed"), should None be hashable?

    >
    > Yes, anything that can be used as a dict key (basically all immutable
    > values with equality comparison) should be hashable.


    My recollection is that what you're seeing here is that, when hash()
    doesn't have any `proper value' to use other than object-identity, it
    just returns the result of id(). And id() is documented as:

    Return the "identity" of an object. This is an integer (or long
    integer) which is guaranteed to be unique and constant for this
    object during its lifetime. Two objects with non-overlapping
    lifetimes may have the same id() value. (Implementation note:
    this is the address of the object.)

    So, not only is the return-value from id() (and hash(), if there's not
    actually a __hash__ method defined) non-portable between different
    machines, it's not even necessarily portable between two *runs* on the
    *same* machine.

    In practice, your OS will probably start each new process with the
    same virtual memory-address range, and a given *build* of Python will
    probably initialise the portion of its memory-segment leading up to
    the None-object the same way each time, but....

    --
    Don't be afraid to ask (Lf.((Lx.xx) (Lr.f(rr)))).
     
    Joshua Judson Rosen, Apr 3, 2009
    #3
  4. Dave Angel Guest

    wrote:
    > Found this while trying to do something unrelated and was curious...
    >
    > If you hash an integer (eg. hash(3)) you get the same integer out. If
    > you hash a string you also get an integer. If you hash None you get an
    > integer again, but the integer you get varies depending on which
    > machine you're running python on (which isn't true for numbers and
    > strings).
    >
    > This raises the following questions:
    > 1. Is it correct that if you hash two things that are not equal they
    > might give you the same hash value? Like, for instance, None and the
    > number 261862182320 (which is what my machine gives me if I hash
    > None). Note this is just an example, I'm aware hashing integers is
    > probably daft. I'm guessing that's fine, since you can't hash
    > something to a number without colliding with that number (or at least
    > without hashing the number to something else, like hashing every
    > number to itself * 2, which would then mean you couldn't hash very
    > large numbers)
    > 2. Should the hash of None vary per-machine? I can't think why you'd
    > write code that would rely on the value of the hash of None, but you
    > might I guess.
    > 3. Given that presumably not all things can be hashed (since the
    > documentation description of hash() says it gives you the hash of the
    > object "if it can be hashed"), should None be hashable?
    >
    > Bit esoteric perhaps, but like I said, I'm curious. ;-)
    >
    > Ben
    >
    >

    1. Most definitely. Every definition of hash (except for "perfect
    hash") makes it a many-to-one mapping. Its only intent is to reduce the
    likelihood of collision between dissimilar objects. And Python's spec
    that says that integers, longs and floats that are equal are guaranteed
    the same hash value is a new one for me. Thanks for making me look it up.

    2. Nothing guarantees that the Python hash() will return the same value
    for the same object between implementations, or even between multiple
    runs with the same version on the same machine. In fact, the default
    hash for user-defined classes is the id() of the object, which will
    definitely vary between program runs. Currently, id() is implemented to
    just return the address of the object.

    3. Normally, it's just mutable objects that are unhashable. Since None
    is definitely immutable, it should have a hash. Besides, if it weren't
    hashable, it couldn't be usable as a key in a dictionary.

    All my opinions, of course.
    DaveA
     
    Dave Angel, Apr 3, 2009
    #4
  5. On Fri, 03 Apr 2009 10:50:08 -0700, ben.taylor wrote:

    > 1. Is it correct that if you hash two things that are not equal they
    > might give you the same hash value?


    Absolutely. From help(hash):

    hash(...)
    hash(object) -> integer

    Return a hash value for the object. Two objects with the same
    value have the same hash value. The reverse is not necessarily
    true, but likely.


    This is the pigeon-hole principle. On my PC, hash() returns a 32-bit
    integer between -2147483648 and 2147483647, so there are 2**32 unique
    hash values (the pigeon-holes). Presumably on 64-bit versions of Python,
    hash() will return a 64-bit result, giving 2**64 unique hash values. But
    there are an infinite number of possible objects which can be hashed (the
    pigeons), and so you have to have more than one pigeon per pigeon-hole.


    > Like, for instance, None and the
    > number 261862182320 (which is what my machine gives me if I hash None).
    > Note this is just an example, I'm aware hashing integers is probably
    > daft.


    Hashing has a very important role in Python, and you will be using it
    very often behind the scenes. You couldn't use integers as keys in
    dictionaries if they couldn't be hashed: another name for a dict is a
    "hash table".


    > I'm guessing that's fine, since you can't hash something to a
    > number without colliding with that number (or at least without hashing
    > the number to something else, like hashing every number to itself * 2,
    > which would then mean you couldn't hash very large numbers)


    You can hash numbers no matter how big they are.

    >>> hash(float('inf'))

    314159
    >>> hash(2**999)

    128



    > 2. Should the hash of None vary per-machine? I can't think why you'd
    > write code that would rely on the value of the hash of None, but you
    > might I guess.


    The value of hash(None) appears to be the value of id(None), which means
    it is the memory address that None happens to get, which means it will
    depend on the precise order that Python allocates things when it starts
    up, which will vary from platform to platform and version to version.



    > 3. Given that presumably not all things can be hashed (since the
    > documentation description of hash() says it gives you the hash of the
    > object "if it can be hashed"), should None be hashable?


    Any object can be hashed if it has a working __hash__ method. There's no
    reason not to have None hashable -- it costs nothing and allows you to
    use None as a dict key.



    --
    Steven
     
    Steven D'Aprano, Apr 4, 2009
    #5
  6. Steven D'Aprano <> wrote:

    > You can hash numbers no matter how big they are.


    > >>> hash(float('inf'))

    > 314159


    Cute. And hash(float('-inf')) is -271828...


    --
    Thomas Bellman, Lysator Computer Club, Linköping University, Sweden
    "God is real, but Jesus is an integer." ! bellman @ lysator.liu.se
    ! Make Love -- Nicht Wahr!
     
    Thomas Bellman, Apr 4, 2009
    #6
  7. "Steven D'Aprano" <steve@REMOVE..urce.com.au> wrote:

    On Fri, 03 Apr 2009 10:50:08 -0700, ben.taylor wrote:

    >> 2. Should the hash of None vary per-machine? I can't think why you'd
    >> write code that would rely on the value of the hash of None, but you
    >> might I guess.

    >
    >The value of hash(None) appears to be the value of id(None), which means
    >it is the memory address that None happens to get, which means it will
    >depend on the precise order that Python allocates things when it starts
    >up, which will vary from platform to platform and version to version.
    >
    >> 3. Given that presumably not all things can be hashed (since the
    >> documentation description of hash() says it gives you the hash of the
    >> object "if it can be hashed"), should None be hashable?

    >
    >Any object can be hashed if it has a working __hash__ method. There's no
    >reason not to have None hashable -- it costs nothing and allows you to
    >use None as a dict key.


    So what happens if I try to pickle the dict and keep it for next time?
    Will I be able to access whatever I have associated with None?
    (directly - mydict[None], not in a for loop.)
    And if I send the pickle to another machine and unpickle it,
    what then? - is unpickling smart enough to construct the dict
    with the local hash of None?

    - Seems to me that if it isn't, and you want to do this, there would
    have to be a fixed, well known value for the hash of None.

    - Hendrik
     
    Hendrik van Rooyen, Apr 4, 2009
    #7
  8. On Sat, 04 Apr 2009 13:09:06 +0200, Hendrik van Rooyen wrote:

    >>Any object can be hashed if it has a working __hash__ method. There's no
    >>reason not to have None hashable -- it costs nothing and allows you to
    >>use None as a dict key.

    >
    > So what happens if I try to pickle the dict and keep it for next time?


    You pickle the dict and keep it for next time.

    > Will I be able to access whatever I have associated with None?


    Yes.

    > (directly
    > - mydict[None], not in a for loop.) And if I send the pickle to another
    > machine and unpickle it, what then?


    It just works.

    > - is unpickling smart enough to
    > construct the dict with the local hash of None?


    Yes.

    > - Seems to me that if it isn't, and you want to do this, there would
    > have to be a fixed, well known value for the hash of None.


    Seems to me you have misunderstood the way pickling works.

    On one machine:

    >>> import pickle
    >>> pickle.dump({None: "hello world"}, open("pickled", 'w'))


    And then on another:

    >>> import pickle
    >>> pickle.load(open('pickled'))

    {None: 'hello world'}

    It just works.


    --
    Steven
     
    Steven D'Aprano, Apr 4, 2009
    #8
  9. "Steven D'Aprano" <steve@R..rsource.com.au> wrote:


    >Seems to me you have misunderstood the way pickling works.


    Yeah right - have you ever looked at the pickle code?

    Good to hear it "just works"

    :)

    - Hendrik
     
    Hendrik van Rooyen, Apr 4, 2009
    #9
  10. On 03 Apr 2009 10:57:05 -0700, Paul Rubin <http> wrote:
    > writes:
    >> 1. Is it correct that if you hash two things that are not equal they
    >> might give you the same hash value?

    >
    > Yes, hashes are 32 bit numbers and there are far more than 2**32
    > possible Python values (think of long ints), so obviously there must
    > be multiple values that hash to the same slot.


    For example, on this machine:

    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

    >>>> hash( "latox" ) == hash( "uzyky" )

    True

    YMMV.

    --
    To email me, substitute nowhere->spamcop, invalid->net.
     
    Peter Pearson, Apr 4, 2009
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Kevin Frey
    Replies:
    0
    Views:
    434
    Kevin Frey
    Nov 19, 2006
  2. length power
    Replies:
    2
    Views:
    109
    Rustom Mody
    Apr 10, 2014
  3. Skip Montanaro
    Replies:
    0
    Views:
    73
    Skip Montanaro
    Apr 10, 2014
  4. Johannes Schneider

    Re: why i have the output of [None, None, None]

    Johannes Schneider, Apr 10, 2014, in forum: Python
    Replies:
    0
    Views:
    63
    Johannes Schneider
    Apr 10, 2014
  5. Terry Reedy
    Replies:
    0
    Views:
    72
    Terry Reedy
    Apr 10, 2014
Loading...

Share This Page