Sets are not extensional

C

Csaba Henk

(FYI: the phenomena described in the sequel were seen produced by the following
ruby instances: a vanilla 1.8.1, a Debianish 1.8.2 and a cvs snapshot
from the 10th of May, identifying itself as 1.9.0.)

require 'set'
#the empty set:
Set[]
=> #<Set: {}>
#the singleton of the empty set:
Set[ [Set[]] ]
=> #<Set: {[#<Set: {}>]}>
#if sets were extensional as God ordered, this would give the same, but...
Set[ [Set[], Set[]] ]
=> #<Set: {[#<Set: {}>, #<Set: {}>]}>
#and also...
Set[Set[]].member? Set[]
=> false
[Set[], Set[]].uniq
=> [#<Set: {}>, #<Set: {}>]
Set[ [Set[]] ] - Set[ [Set[]] ]
=> #<Set: {[#<Set: {}>]}>
#anyway...
Set[] == Set[]
=> true
Set[ [Set[]] ] == Set[ [Set[]] ]
=> false

(Set.new behaves the same way as Set[])

Now I don't see what's the purpose of having a set class which is not
extensional (ie., two sets behave as the same one in all respect if their
elements are the same). But, given it's non-extensional, why isn't it
consequently non-extensional ("Set[] == Set[]" evaluates in an extensional
manner, that is, to "true")? And how does Array#uniq work? -- by what rule
does it dispose elements if not by getting "true" by a "==" comparison with
an other member of the array?

--
Csaba

"There's more to life, than books, you know but not much more..."
[The Smiths]
***
If you want to send me a mail, see it in the mailto link at
http://www.renyi.hu/~ekho/egyelore.html
 
K

Kristof Bastiaensen

Set[Set[]].member? Set[]
=> false
And how does Array#uniq work? -- by what rule does it dispose elements
if not by getting "true" by a "==" comparison with an other member of
the array?

I think you raise a valid question. Set is implemented using hashtables,
and compares hashes instead of objects. A different hashtable gets a
different hash value, which is the reason for the strange(?) behaviour:

{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on the
contents of the array).

KB
 
G

George Ogata

Kristof Bastiaensen said:
{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on the
contents of the array).

This seems like a blatant bug to me. `a.eql? b' must imply `a.hash ==
b.hash', yet:

irb(main):001:0> a = {}; b = {}
=> {}
irb(main):002:0> a.eql?(b)
=> true
irb(main):003:0> a.hash == b.hash
=> false
 
C

Charles Mills

Kristof Bastiaensen said:
{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on
the
contents of the array).

This seems like a blatant bug to me. `a.eql? b' must imply `a.hash ==
b.hash', yet:

I agree:
a = {} #=> {}
a.hash #=> 1650134
a.id #=> 1650134
b = {} #=> {}
b.hash #=> 1639814
# So hash on an empty hash returns the objects id....
a = {"a"=>"b"} #=> {"a"=>"b"}
a.hash #=> 1682094
a.clear #=> {}
a.hash #=> 1682094
# Looks like no two hashes hash to one another...
# Here is the reason
/* in object.c */
rb_mKernel = rb_define_module("Kernel");
rb_include_module(rb_cObject, rb_mKernel);
/* ..snip.. */
rb_define_method(rb_mKernel, "object_id", rb_obj_id, 0);
/* in hash.c the "hash" method is never overridden. So rb_obj_id() is
the "hash" method for hashes. */

-Charlie
 
R

Robert Klemme

Charles Mills said:
Kristof Bastiaensen said:
{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on
the
contents of the array).

This seems like a blatant bug to me. `a.eql? b' must imply `a.hash ==
b.hash', yet:

Yes, this issue has come up before IIRC. Maybe Matz did something about
this but it's not included in a release version.
I agree:
# So hash on an empty hash returns the objects id....
# Looks like no two hashes hash to one another...
# Here is the reason
/* in object.c */
rb_mKernel = rb_define_module("Kernel");
rb_include_module(rb_cObject, rb_mKernel);
/* ..snip.. */
rb_define_method(rb_mKernel, "object_id", rb_obj_id, 0);
/* in hash.c the "hash" method is never overridden. So rb_obj_id() is
the "hash" method for hashes. */

Where do you take that conclusion from? From the source quotes you name,
there is no indication of a connection between #hash and #id. And you're
showing code for Kernel which is unrelated to this thread. The only
conclusion so far is, that Hash#hash is Object#hash (because it's not
overridden as you state) and thus can't take into account the contents of
the Hash. Can you please clarify this?

Kind regards

robert
 
F

Florian Gross

Moin!
The only
conclusion so far is, that Hash#hash is Object#hash (because it's not
overridden as you state) and thus can't take into account the contents of
the Hash. Can you please clarify this?

irb(main):001:0> {1 => 2}.method:)hash)
=> #<Method: Hash(Kernel)#hash>

So it is indeed not overwritten.
Kind regards
robert

More regards,
Florian Gross
 
R

Robert Klemme

Florian Gross said:
Moin!


irb(main):001:0> {1 => 2}.method:)hash)
=> #<Method: Hash(Kernel)#hash>

So it is indeed not overwritten.

Thanks for confirming. But I was after something differnt: The statement
I was referring to is
/* in hash.c the "hash" method is never overridden. So rb_obj_id() is
the "hash" method for hashes. */

I can't see the connection between hash and id - at least not in the code
snippets posted.

Regards

robert
 
C

Charles Mills

Charles Mills said:
{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on
the
contents of the array).

This seems like a blatant bug to me. `a.eql? b' must imply `a.hash
==
b.hash', yet:

Yes, this issue has come up before IIRC. Maybe Matz did something
about
this but it's not included in a release version.
I agree:
# So hash on an empty hash returns the objects id....
# Looks like no two hashes hash to one another...
# Here is the reason
/* in object.c */
rb_mKernel = rb_define_module("Kernel");
rb_include_module(rb_cObject, rb_mKernel);
/* ..snip.. */
rb_define_method(rb_mKernel, "object_id", rb_obj_id, 0);

/* I intended to show this line: */
rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

Anyway, so the code above shows class Object including module Kernel.
Module Kernel defines "hash" as rb_obj_id(). So Object gets
rb_obj_id() as its "hash" method.
Where do you take that conclusion from? From the source quotes you
name,
there is no indication of a connection between #hash and #id. And
you're
showing code for Kernel which is unrelated to this thread. The only
conclusion so far is, that Hash#hash is Object#hash (because it's not
overridden as you state) and thus can't take into account the contents
of
the Hash. Can you please clarify this?

Don't forget you can always look at the source yourself :)
-Charlie
 
R

Robert Klemme

Charles Mills said:
Charles Mills said:
On Aug 5, 2004, at 6:41 PM, George Ogata wrote:


{}.hash == {}.hash
=> false
Set[].hash == Set[].hash
=> false
[].hash == [].hash
=> true

I have no idea for the reason why different hashes have different
hashvalues, while different arrays may get the same value (based on
the
contents of the array).

This seems like a blatant bug to me. `a.eql? b' must imply `a.hash
==
b.hash', yet:

Yes, this issue has come up before IIRC. Maybe Matz did something
about
this but it's not included in a release version.
I agree:
a = {} #=> {}
a.hash #=> 1650134
a.id #=> 1650134
b = {} #=> {}
b.hash #=> 1639814
# So hash on an empty hash returns the objects id....
a = {"a"=>"b"} #=> {"a"=>"b"}
a.hash #=> 1682094
a.clear #=> {}
a.hash #=> 1682094
# Looks like no two hashes hash to one another...
# Here is the reason
/* in object.c */
rb_mKernel = rb_define_module("Kernel");
rb_include_module(rb_cObject, rb_mKernel);
/* ..snip.. */
rb_define_method(rb_mKernel, "object_id", rb_obj_id, 0);

/* I intended to show this line: */
rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

Anyway, so the code above shows class Object including module Kernel.
Module Kernel defines "hash" as rb_obj_id(). So Object gets
rb_obj_id() as its "hash" method.

Ah, ok. A reasonable choice btw.
Don't forget you can always look at the source yourself :)

Oh, really? :) I do not have them here and was lacking a bit of time and
enthusiams. So I thought I convice you to post this yourself. :)

Kind regards

robert
 
C

Csaba Henk

require 'set'
#the empty set:
Set[]
=> #<Set: {}>
#the singleton of the empty set:
Set[ [Set[]] ]
=> #<Set: {[#<Set: {}>]}>
#if sets were extensional as God ordered, this would give the same, but...
Set[ [Set[], Set[]] ]
=> #<Set: {[#<Set: {}>, #<Set: {}>]}>
#and also...
Set[Set[]].member? Set[]
=> false
[Set[], Set[]].uniq
=> [#<Set: {}>, #<Set: {}>]
Set[ [Set[]] ] - Set[ [Set[]] ]
=> #<Set: {[#<Set: {}>]}>
#anyway...
Set[] == Set[]
=> true
Set[ [Set[]] ] == Set[ [Set[]] ]
=> false

(Set.new behaves the same way as Set[])

Now I don't see what's the purpose of having a set class which is not
extensional (ie., two sets behave as the same one in all respect if their
elements are the same). But, given it's non-extensional, why isn't it
consequently non-extensional ("Set[] == Set[]" evaluates in an extensional
manner, that is, to "true")? And how does Array#uniq work? -- by what rule
does it dispose elements if not by getting "true" by a "==" comparison with
an other member of the array?

I've read the thread spawned by this post, and now I summarize that part of
what was written there which is relevant to my original problem, and wasn't
stated there explicitly.

"Set" can be forced to behave extensionally by doing so:

class Set
def eql? other
subset? other and other.subset? self
end
def hash
to_a.hash
end
end

-- as it seems to me, Array#uniq uses *both* of eql? and hash methods, and
to get rid of the (IMHO pathological) behaviour illustrated by the above
examples, both methods need to be overwritten. Concerning the above
re-implementation of these methods, what I did with "hash" is just a hack, of
course, I don't claim that it clean or right to do so, it just something
which does the job.

--
Csaba

"There's more to life, than books, you know but not much more..."
[The Smiths]
***
If you want to send me a mail, see it in the mailto link at
http://www.renyi.hu/~ekho/egyelore.html
 
K

Kristof Bastiaensen

Hi,

[snip]
I've read the thread spawned by this post, and now I summarize that part
of what was written there which is relevant to my original problem, and
wasn't stated there explicitly.

I think the thread was relevant to your post, since it described
what was causing the bug. The following should solve the problems
(but can be very slow for large Hashes/Sets):

class Array
def hash
to_a.hash
end
end

Regards,
KB
 
C

Charles Mills

Hi,

[snip]
I've read the thread spawned by this post, and now I summarize that
part
of what was written there which is relevant to my original problem,
and
wasn't stated there explicitly.

I think the thread was relevant to your post, since it described
what was causing the bug. The following should solve the problems
(but can be very slow for large Hashes/Sets):

class Array
def hash
to_a.hash
end
end
I think you mean:

class Hash
def hash
to_a.hash
end
end

-Charlie
 
R

Reimer Behrends

Csaba Henk (csaba@phony_for_avoiding_spam.org) wrote:
[...]
"Set" can be forced to behave extensionally by doing so:

class Set
def eql? other
subset? other and other.subset? self
end
def hash
to_a.hash
end
end

Note that to_a.hash is effectively non-deterministic. If @hash == { 1 =>
true, 2 => true }, then to_a can be both [1, 2] and [2, 1], with
different hash values. To generate a correct hash function, you have to
use something like:

to_a.map{|el| el.hash}.sort.hash

or:

to_a.inject(0){|a,b| a ^ b.hash}

Reimer Behrends
 
K

Kristof Bastiaensen

Hi,

[snip]
I've read the thread spawned by this post, and now I summarize that
part
of what was written there which is relevant to my original problem, and
wasn't stated there explicitly.
I think the thread was relevant to your post, since it described what
was causing the bug. The following should solve the problems (but can
be very slow for large Hashes/Sets):

class Array
def hash
to_a.hash
end
end
I think you mean:

class Hash
def hash
to_a.hash
end
end

-Charlie

I did. Thanks!

KB
 
K

Kristof Bastiaensen

hi,

Note that to_a.hash is effectively non-deterministic. If @hash == { 1 =>
true, 2 => true }, then to_a can be both [1, 2] and [2, 1], with different
hash values.

As far as I know, hash.to_a always generates the same array.
The reason is that hashes are in fact unordered, so if you
give the same elements in a different order, it is still the
same Hash. That means that when hash1 == hash2, then
hash1.to_a == hash2.to_a. So fortunately there is no need to
use map or sort.

Regards,
KB
 
R

Reimer Behrends

Kristof Bastiaensen ([email protected]) wrote:
[...]
As far as I know, hash.to_a always generates the same array.

No. Hash#to_a will generate the same array for the same hash object,
but for different hash objects with equal keys and values, it will
not. Same for Hash#keys and Hash#values (Set#to_a is based on
Hash#keys).

$ cat hashtest.rb
h = { }; 20.times{|i| h = true };p h.keys
h2 = { }; 20.times{|i| h2[19-i] = true };p h2.keys
puts h == h2

$ ruby -v hashtest.rb
ruby 1.8.1 (2003-12-25) [i686-linux]
[16, 5, 11, 0, 17, 6, 12, 1, 18, 7, 13, 2, 19, 8, 14, 3, 9, 15, 4, 10]
[5, 16, 0, 11, 6, 17, 1, 12, 7, 18, 2, 13, 8, 19, 3, 14, 9, 4, 15, 10]
true

Reimer Behrends
 
R

Robert Klemme

Kristof Bastiaensen said:
Hi,

On Fri, 06 Aug 2004 18:08:57 +0000, Csaba Henk wrote:
[snip]
I've read the thread spawned by this post, and now I summarize that
part
of what was written there which is relevant to my original problem, and
wasn't stated there explicitly.


I think the thread was relevant to your post, since it described what
was causing the bug. The following should solve the problems (but can
be very slow for large Hashes/Sets):

class Array
def hash
to_a.hash
end
end
I think you mean:

class Hash
def hash
to_a.hash
end
end

-Charlie

I did. Thanks!

IMHO a solution that recalculates the hash value on each change is more
efficient. I imagine something like this (incomplete):

class FixedHash < Hash
def initialize(*a,&b)
super
calc_hash
end


def []=(key,val)
@hc ^= self[key].hash << 3 if has_key? key
@hc ^= val.hash << 3
super
end

def clear
super
calc_hash
end

# all other modifying methods must be
# changed, too

def hash() @hc end

private

HASH_MASK = (1 << 32) - 1

def calc_hash
@hc = 0
each {|k,v| @hc ^= (k.hash ^ (v.hash << 3)) }
@hc &= HASH_MASK
end
end

h = FixedHash.new => {}
h2 = FixedHash.new => {}
7.times{|i| h = true; h2[6-i] = true } => 7
h => {5=>true, 0=>true, 6=>true, 1=>true, 2=>true, 3=>true, 4=>true}
h2 => {5=>true, 0=>true, 6=>true, 1=>true, 2=>true, 3=>true, 4=>true}
h == h2 => true
h.sort

=> [[0, true], [1, true], [2, true], [3, true], [4, true], [5, true], [6,
true]]=> [[0, true], [1, true], [2, true], [3, true], [4, true], [5, true], [6,
true]]=> 16

Kind regards

robert
 
K

Kristof Bastiaensen

Kristof Bastiaensen ([email protected]) wrote: [...]
As far as I know, hash.to_a always generates the same array.

No. Hash#to_a will generate the same array for the same hash object, but
for different hash objects with equal keys and values, it will not. Same
for Hash#keys and Hash#values (Set#to_a is based on Hash#keys).

<snip>

You are correct. I had tested it with small Hashes ({1 => 2, 2 => 1}),
but of course in large Hashes objects which fall in the same bucket can
appear on different positions as you said. Perhaps the best solution is
to recalculate the Hashvalue each time (as Robert Klemme proposed).

Thanks,
KB
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,078
Latest member
MakersCBDBlood

Latest Threads

Top