Sets, uniqueness not unique.

Discussion in 'Ruby' started by Hugh Sasse, Sep 13, 2005.

  1. Hugh Sasse

    Hugh Sasse Guest

    I have been splitting a comma separated values file, and putting
    some of the values into a Student class, simply a collection of strings,
    so that I can build a database table from them:

    require 'set'

    class Student
    attr_accessor :forename, :surname, :birth_dt,
    :picture, :coll_status
    def initialize(forename0, surname0, birth_dt0,
    picture0, coll_status0)
    @forename = forename0
    @surname = surname0
    @birth_dt = birth_dt0
    @picture = picture0
    puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
    @coll_status = coll_status0
    end

    def eql?(other)
    # if self.forename == "John" and other.forename == "John"
    debug = true
    # end
    res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do |msg|
    print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg) == (other.send(msg))}" if debug
    self.send(msg) == (other.send(msg))
    end
    return res
    end

    def to_s
    "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
    end
    end

    And in the body of my program I read the records in from the csv and
    add the students if they are new. They tend to be clustered in the
    input, hence the last_student test.

    class TableMaker
    INPUT = "hugh.csv"

    ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/

    # Read in the database and populate the tables.
    def initialize(input=INPUT)

    @students = Set.new()
    # [...]
    open(input, 'r') do |infp|
    while record = infp.gets
    record.chomp!
    puts "record is #{record}"
    forename, surname, birth_dt, institution_id, aos_code,
    various, other, fields,
    picture, coll_status, full_desc = record.split(/\s*\|\s*/)

    next unless aos_code =~ ACCEPTED_MODULES

    puts "from record, picture is [#{picture.inspect}]." if $debug
    # Structures for student
    student = Student.new(forename, surname, birth_dt, picture, coll_status)
    if student == last_student
    student = last_student
    else
    student.freeze

    # Avoid duplicates
    unless @students.include? student
    @students.add student
    end
    last_student = student
    end
    # [...]
    end
    end
    end

    # [...]

    end


    This being a Set I don't really need the call to include? now, but
    it's there (from when I was using a hash for this).

    I find two things that seem odd to me:

    1. eql? is never getting called, despite include?.

    2. I end up with duplicate students.

    Sets *can't* hold duplicates, and include depends on eql? for Sets.
    So what's going on? I have checked, and the duplicate students seem to
    have identical strings, so I wrote the eql? to be sure.

    I bet this will be a self.kick(self) reason, but I can't see it yet.

    Thank you,
    Hugh
    Hugh Sasse, Sep 13, 2005
    #1

  2. Hi --

    On Wed, 14 Sep 2005, Hugh Sasse wrote:

    > I have been splitting a comma separated values file, and putting
    > some of the values into an Student class, simply a collection of strings,
    > so that I can build a database table from them:

    [...]
    > picture, coll_status, full_desc = record.split(/\s*\|\s*/)


    I notice you mentioned comma separation but you're splitting on a
    pipe. I don't know if this is related to the problem, but I thought
    I'd flag it just in case.

    Can you provide a couple of sample lines of data?


    David

    --
    David A. Black
    David A. Black, Sep 13, 2005
    #2

  3. Ara.T.Howard

    Ara.T.Howard Guest

    On Wed, 14 Sep 2005, Hugh Sasse wrote:

    > require 'set'
    >
    > class Student
    > attr_accessor :forename, :surname, :birth_dt,
    > :picture, :coll_status
    > def initialize(forename0, surname0, birth_dt0,
    > picture0, coll_status0)
    > @forename = forename0
    > @surname = surname0
    > @birth_dt = birth_dt0
    > @picture = picture0
    > puts "in student.new() picture is #{picture0.inspect}, @picture is
    > #{@picture.inspect} " if $debug
    > @coll_status = coll_status0
    > end
    >
    > def eql?(other)
    > # if self.forename == "John" and other.forename == "John"
    > debug = true
    > # end
    > res = [:forename, :surname, :birth_dt, :picture, :coll_status].all? do
    > |msg|
    > print "#{self.send(msg)} == #{(other.send(msg))} gives #{self.send(msg)
    > == (other.send(msg))}" if debug
    > self.send(msg) == (other.send(msg))
    > end
    > return res
    > end
    >
    > def to_s
    > "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
    > end
    > end


    well this works:

    s0 = Student::new 'a', 'b', 'c', 'd', 'e'
    s1 = Student::new 'a', 'b', 'c', 'd', 'e'
    p(s0.eql?(s1)) #=> true

    but this doesn't

    p s0 == s1 #=> false
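A note on why the two results differ: Object#== and Object#eql? both default to identity comparison, and overriding one leaves the other untouched. The sketch below uses a hypothetical two-field Point class (not from the thread) to show one way of keeping the two in step. Whether == should mean the same thing as eql? is a design choice that depends on the class.

```ruby
# Hypothetical class for illustration; only eql? and == matter here.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Field-by-field comparison; false for anything that is not a Point.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end

  # Without this line, == would remain Object#== (identity).
  alias_method :==, :eql?
end

# Same fields, but no overrides: both comparisons fall back to identity.
class PlainPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

overridden = Point.new(1, 2) == Point.new(1, 2)
default    = PlainPoint.new(1, 2) == PlainPoint.new(1, 2)
p overridden  # true
p default     # false
```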

    > And in the body of my program I read the records in from the csv and
    > add the students if they are new. They tend to be clustered in the
    > input, hence the last_student test.
    >
    > class TableMaker
    > INPUT = "hugh.csv"
    >
    > ACCEPTED_MODULES = /^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])/
    >
    > # Read in the database and populate the tables.
    > def initialize(input=INPUT)
    >
    > @students = Set.new()
    > # [...]
    > open(input, 'r') do |infp|
    > while record = infp.gets
    > record.chomp!


    try : record.strip!

    > puts "record is #{record}"
    > forename, surname, birth_dt, institution_id, aos_code,
    > various, other, fields,
    > picture, coll_status, full_desc = record.split(/\s*\|\s*/)


    or
    values = record.split(%r/\|/).map{|field| field.strip}
    forename, surname, birth_dt, institution_id, aos_code,
    various, other, fields,
    picture, coll_status, full_desc = values


    if you don't do one of these two things then either

    - forename may have leading space
    - full_desc may have trailing space

    that's because chomp! only blows away trailing newline - not extraneous
    spaces and leading space on record is never dealt with.
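To make the chomp/strip difference concrete, here is a small sketch. The sample record is invented for illustration (the real data cannot be shown); the split pattern is the one from the original post.

```ruby
# Invented sample line with padding at both ends and a trailing newline.
record = "  Ann | Smith | 1980-01-01  \n"

# chomp removes only the trailing newline, so the outer padding survives
# in the first and last fields.
fields_chomped = record.chomp.split(/\s*\|\s*/)

# strip removes leading and trailing whitespace before splitting.
fields_stripped = record.strip.split(/\s*\|\s*/)

p fields_chomped   # ["  Ann", "Smith", "1980-01-01  "]
p fields_stripped  # ["Ann", "Smith", "1980-01-01"]
```

The split pattern already absorbs whitespace around each pipe, which is why only the outermost fields are affected.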

    >
    > next unless aos_code =~ ACCEPTED_MODULES
    >
    > puts "from record, picture is [#{picture.inspect}]." if $debug
    > # Structures for student
    > student = Student.new(forename, surname, birth_dt, picture,
    > coll_status)
    > if student == last_student


    so, as shown above, this (==) does not work

    > student = last_student
    > else
    > student.freeze
    >
    > # Avoid duplicates
    > unless @students.include? student
    > @students.add student
    > end
    > last_student = student
    > end
    > # [...]
    > end
    > end
    > end
    >
    > # [...]
    >
    > end
    >
    >
    > This being a Set I don't really need the call to include? now, but
    > it's there (from when I was using a hash for this).
    >
    > I find two things that seem odd to me:
    >
    > 1. eql? is never getting called, despite include?.


    set uses Object#hash - so maybe something like (untested)

    class Student
    def hash
    %w( forename surname birth_dt picture coll_status).inject(0){|n,m| n += send(m).hash}
    end
    end

    i dunno if this will wrap and cause issues though...

    if so maybe something like

    class Student
    def hash
    %w( forename surname birth_dt picture coll_status).map{|m| send(m)}.join.hash
    end
    end

    or, perhaps simply something like:

    class Student < ::Hash
    FIELDS = %w( forename surname birth_dt picture coll_status )
    def initialize(*fs)
    FIELDS.each do |f|
    self[f] = (fs.shift || raise(ArgumentError, "no #{ f }!"))
    end
    end
    def eql? other
    values == other.values
    end
    alias == eql?
    def keys
    FIELDS
    end
    def values
    values_at(*FIELDS)
    end
    def hash
    FIELDS.map{|m| self[m]}.join.hash
    end
    end

    s0 = Student::new 'a', 'b', 'c', 'd', 'e'
    s1 = Student::new 'a', 'b', 'c', 'd', 'e'

    require 'set'
    set = Set::new
    set.add s0
    set.add s1
    p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c", "picture"=>"d", "surname"=>"b"}}>

    the FIELDS const can be used to do ordered prints, etc.

    it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
    it?
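For what it is worth, eql? does take part in the lookup, just not first: the hash-based lookup asks the key for its hash, and only consults eql? to confirm a candidate that was found that way. A minimal sketch, assuming a hypothetical Probe class (not from the thread) that logs which methods get called:

```ruby
require 'set'

# Hypothetical instrumented class: records every call to hash and eql?.
class Probe
  attr_reader :key

  @calls = []
  class << self
    attr_reader :calls
  end

  def initialize(key)
    @key = key
  end

  def hash
    Probe.calls << :hash
    key.hash
  end

  def eql?(other)
    Probe.calls << :eql?
    other.is_a?(Probe) && key == other.key
  end
end

set = Set.new
set.add(Probe.new("a"))

Probe.calls.clear
found = set.include?(Probe.new("a"))  # a distinct object with the same key
calls_on_hit = Probe.calls.dup

p found         # true
p calls_on_hit  # begins with :hash; :eql? appears afterwards
```

With only eql? overridden, two equal-looking objects normally report different hash values, so the lookup has no candidate to confirm and eql? is not reached.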

    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | Your life dwells amoung the causes of death
    | Like a lamp standing in a strong breeze. --Nagarjuna
    ===============================================================================
    Ara.T.Howard, Sep 13, 2005
    #3
  4. Hugh Sasse

    Hugh Sasse Guest

    On Wed, 14 Sep 2005, David A. Black wrote:

    > Hi --
    >
    > On Wed, 14 Sep 2005, Hugh Sasse wrote:
    >
    >> I have been splitting a comma separated values file, and putting
    >> some of the values into an Student class, simply a collection of strings,
    >> so that I can build a database table from them:

    > [...]
    >> picture, coll_status, full_desc = record.split(/\s*\|\s*/)

    >
    > I notice you mentioned comma separation but you're splitting on a
    > pipe. I don't know if this is related to the problem, but I thought
    > I'd flag it just in case.


    Yes, sorry, I was using the generic term for this, to facilitate
    explaining the concept of what I was doing. The data I
    get is pipe(|) separated.
    >
    > Can you provide a couple of sample lines of data?


    Not really, this is data about real people, and data protection law
    means I can't. But I can tell you that the splitting works fine, the
    selection of fields for the student works correctly, the students
    don't end up with data from other fields, and the nature of the
    split command means that we can be sure they are all Strings.

    Therefore I think it boils down to:

    How can two collections of strings appear to be the same and yet
    both of them end up in the Set structure? Whitespace is always
    white, be it tab or space, so that's one way, but I still think that
    should look different to == or to eql?
    >
    > David
    >

    Thank you,
    Hugh
    Hugh Sasse, Sep 13, 2005
    #4
  5. Hi --

    On Wed, 14 Sep 2005, Hugh Sasse wrote:

    > This being a Set I don't really need the call to include? now, but
    > it's there (from when I was using a hash for this).
    >
    > I find two things that seem odd to me:
    >
    > 1. eql? is never getting called, despite include?.
    >
    > 2. I end up with duplicate students.
    >
    > Sets *can't* hold duplicates, and include depends on eql? for Sets.


    Are you sure about that latter point? In set.rb:

    def include?(o)
    @hash.include?(o)
    end

    and in hash.c:

    if (st_lookup(RHASH(hash)->tbl, key, 0)) {
    return Qtrue;
    ... }

    I haven't followed the trail beyond that... but I think any two
    student objects will count as different hash keys, even if they have
    similar string data.


    David

    --
    David A. Black
    David A. Black, Sep 13, 2005
    #5
  6. Hugh Sasse

    Hugh Sasse Guest

    On Wed, 14 Sep 2005, Ara.T.Howard wrote:

    > On Wed, 14 Sep 2005, Hugh Sasse wrote:
    >
    >> require 'set'
    >>
    >> class Student
    >> attr_accessor :forename, :surname, :birth_dt,
    >> :picture, :coll_status
    >> def initialize(forename0, surname0, birth_dt0,

    [...]
    >> end
    >>
    >> def eql?(other)

    [...]
    >> end
    >>
    >> def to_s
    >> "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}"
    >> end
    >> end

    >
    > well this works:
    >
    > s0 = Student::new 'a', 'b', 'c', 'd', 'e'
    > s1 = Student::new 'a', 'b', 'c', 'd', 'e'
    > p(s0.eql?(s1)) #=> true
    >
    > but this doesn't
    >
    > p s0 == s1 #=> false


    Hmmm. Yes, I should have more unit tests!
    >
    >> And in the body of my program I read the records in from the csv and


    (well. pipe separated -- see other reply :))

    >> add the students if they are new. They tend to be clustered in the
    >> input, hence the last_student test.
    >>
    >> class TableMaker

    [...]
    >> def initialize(input=INPUT)

    [...]
    >> open(input, 'r') do |infp|
    >> while record = infp.gets
    >> record.chomp!

    >
    > try : record.strip!


    >
    >> puts "record is #{record}"
    >> forename, surname, birth_dt, institution_id, aos_code,
    >> various, other, fields,
    >> picture, coll_status, full_desc = record.split(/\s*\|\s*/)

    >
    > or
    > fields = record.split(%r/\|/).map{|field| field.strip}
    > forename, surname, birth_dt, institution_id, aos_code,
    > various, other, fields,
    > picture, coll_status, full_desc =


    I think the former may be faster, but I'll look into these, thanks.
    >
    >
    > if you don't do one of these two things the either
    >
    > - forname may have leading space
    > - full_desc may have trailing space


    Yes, I'd missed that.
    >
    > that's because chomp! only blows away trailing newline - not extraneous
    > spaces and leading space on record is never dealt with.
    >
    >>
    >> next unless aos_code =~ ACCEPTED_MODULES
    >>
    >> puts "from record, picture is [#{picture.inspect}]." if $debug
    >> # Structures for student
    >> student = Student.new(forename, surname, birth_dt, picture,
    >> coll_status)
    >> if student == last_student

    >
    > so, as shown above, this (==) does not work


    OK, I'll just lose optimisation, but thanks.
    >
    >> student = last_student
    >> else
    >> student.freeze
    >>
    >> # Avoid duplicates
    >> unless @students.include? student
    >> @students.add student
    >> end
    >> last_student = student

    [...]
    >>
    >> This being a Set I don't really need the call to include? now, but
    >> it's there (from when I was using a hash for this).
    >>
    >> I find two things that seem odd to me:
    >>
    >> 1. eql? is never getting called, despite include?.

    >
    > set uses Object#hash - so maybe something like (untested)
    >
    > class Student
    > def hash
    > %w( forename surname birth_dt picture coll_status).inject(0){|n,m| n +=
    > send(m).hash}
    > end
    > end
    >
    > i dunno if this will wrap and cause issues though...


    Nor me.
    >
    > if so maybe something like
    >
    > class Student
    > def hash
    > %w( forename surname birth_dt picture coll_status).map{|m| send
    > %m}.join.hash
    > end
    > end


    Yes, that seems safer
    >
    > or, perhaps simple something like:
    >
    > class Student < ::Hash
    > FIELDS = %w( forename surname birth_dt picture coll_status )

    [...]
    > end
    >
    > s0 = Student::new 'a', 'b', 'c', 'd', 'e'
    > s1 = Student::new 'a', 'b', 'c', 'd', 'e'
    >
    > require 'set'
    > set = Set::new
    > set.add s0
    > set.add s1
    > p set #=> #<Set: {{"forename"=>"a", "coll_status"=>"e", "birth_dt"=>"c",
    > "picture"=>"d", "surname"=>"b"}}>
    >
    > the FIELDS const can be used to do ordered prints, etc.


    Yes, I might factor that in to my current solution. I didn't want
    to allow just any keys, so that's why I didn't subclass Hash, but
    it's an interesting approach.
    >
    > it sure seems odd that set doesn't use 'eql?' or '==' up front though doesn't
    > it?


    Probably a reason I don't know about. The Pickaxe II says it uses
    eql? and hash (p731) but doesn't say where.
    >
    > -a
    > --


    Thank you for such a full response,
    Hugh.
    Hugh Sasse, Sep 13, 2005
    #6
  7. Hugh Sasse

    Hugh Sasse Guest

    On Wed, 14 Sep 2005, David A. Black wrote:

    > Hi --
    >
    > On Wed, 14 Sep 2005, Hugh Sasse wrote:
    >
    >>
    >> Sets *can't* hold duplicates, and include depends on eql? for Sets.

    >
    > Are you sure about that latter point? In set.rb:


    Yes I was, but it turns out that it was with the certainty that comes
    before falling flat on one's face. I remembered seeing it in the ri
    docs, and sure enough, it isn't there!

    [What was that fidonet .sig? "Open mouth, insert foot, echo
    internationally"? :)]

    >
    > def include?(o)
    > @hash.include?(o)
    > end
    >
    > and in hash.c:
    >
    > if (st_lookup(RHASH(hash)->tbl, key, 0)) {
    > return Qtrue;
    > ... }
    >
    > I haven't followed the trail beyond that... but I think any two
    > student objects will count as different hash keys, even if they have
    > similar string data.


    Which would explain a lot. Thank you. Ara's hash function should
    fix this for me.
    >
    >
    > David
    >

    Thank you,
    Hugh
    Hugh Sasse, Sep 13, 2005
    #7
  8. Hugh Sasse

    Hugh Sasse Guest

    On Wed, 14 Sep 2005, David A. Black wrote:

    > Hi --
    >
    > On Wed, 14 Sep 2005, Hugh Sasse wrote:
    >
    >> This being a Set I don't really need the call to include? now, but
    >> it's there (from when I was using a hash for this).
    >>
    >> I find two things that seem odd to me:
    >>
    >> 1. eql? is never getting called, despite include?.
    >>
    >> 2. I end up with duplicate students.
    >>
    >> Sets *can't* hold duplicates, and include depends on eql? for Sets.

    >
    > Are you sure about that latter point? In set.rb:
    >
    > def include?(o)
    > @hash.include?(o)
    > end
    >
    > and in hash.c:
    >
    > if (st_lookup(RHASH(hash)->tbl, key, 0)) {
    > return Qtrue;
    > ... }
    >
    > I haven't followed the trail beyond that... but I think any two
    > student objects will count as different hash keys, even if they have
    > similar string data.
    >
    >
    > David


    Right, there is some definite weirdness going on here. I removed
    the definition of eql? and set the hash to use MD5 sums. I still
    didn't get unique entries in my set. Now I have

    require 'md5'

    class Student
    # [...]
    FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
    def initialize(forename0, surname0, birth_dt0,
    picture0, coll_status0)
    # [...]
    @hash = FIELDS.inject(MD5.new()) do |d,m|
    d << send(m)
    end.hexdigest.hex
    end

    def hash
    @hash
    end

    def eql?(other)
    self.hash == other.hash
    end

    end

    And this works. Remove the definition of eql? and include? always
    gives untrue (I've not checked to see if it is nil or false).


    This is in accordance with the entry in Pickaxe2 (page 570,
    Object#hash) and ri, that:
    ------------------------------------------------------------ Object#hash
    obj.hash => fixnum
    ------------------------------------------------------------------------
    Generates a +Fixnum+ hash value for this object. This function must
    have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
    hash value is used by class +Hash+. Any hash value that exceeds the
    capacity of a +Fixnum+ will be truncated before being used.

    (I'm not sure if my digests are too big)

    What I don't really know is what the sufficient conditions are for
    this? Is it *necessary* to change hash and eql together? What are the
    defaults for Set?

    I suspect that my eql? ought to be

    def eql?(other)
    FIELDS.inject(true) do |b,m|
    b && (self.send(m) == other.send(m))
    end
    end

    for that matter

    Hugh
    Hugh Sasse, Sep 14, 2005
    #8
  9. On Wed, Sep 14, 2005 at 09:14:10PM +0900, Hugh Sasse wrote:
    > On Wed, 14 Sep 2005, David A. Black wrote:
    > >On Wed, 14 Sep 2005, Hugh Sasse wrote:
    > >>This being a Set I don't really need the call to include? now, but
    > >>it's there (from when I was using a hash for this).
    > >>
    > >>I find two things that seem odd to me:
    > >>1. eql? is never getting called, despite include?.
    > >>2. I end up with duplicate students.
    > >>
    > >>Sets *can't* hold duplicates, and include depends on eql? for Sets.

    > >
    > >Are you sure about that latter point? In set.rb:
    > >
    > > def include?(o)
    > > @hash.include?(o)
    > > end
    > >
    > >and in hash.c:
    > >
    > > if (st_lookup(RHASH(hash)->tbl, key, 0)) {
    > > return Qtrue;
    > > ... }
    > >
    > >I haven't followed the trail beyond that... but I think any two
    > >student objects will count as different hash keys, even if they have
    > >similar string data.


    object.c:

    VALUE
    rb_obj_id(VALUE obj)
    {
    if (SPECIAL_CONST_P(obj)) {
    return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
    }
    [...]
    rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);

    [...]
    > What i don't really know is what the sufficient conditions are for
    > this? Is it *necessary* to change hash and eql together? What are the
    > defaults for Set?


    The defaults are actually those of Hash. You can follow the call chain
    starting from

    static struct st_hash_type objhash = {
    rb_any_cmp,
    rb_any_hash,
    };

    in hash.c. For user-defined classes, it will end up using #hash and #eql?
    defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
    Symbol, Fixnum and String values, and some core classes redefine the
    associated methods].

    Given the above definition of Kernel#hash, if you redefine it, you'll
    most probably want to change #eql? too (see below). As far as Hash
    objects (and hence Sets) are concerned, modifying #eql? while keeping
    #hash unchanged would have no effect (unless you restrict it further so
    that obj.eql?(obj) is false, which doesn't seem quite right).


    static VALUE
    rb_obj_equal(VALUE obj1, VALUE obj2)
    {
    if (obj1 == obj2) return Qtrue;
    return Qfalse;
    }

    [...]
    rb_define_method(rb_mKernel, "eql?", rb_obj_equal, 1);
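The converse case can be checked directly: overriding #hash alone is not enough either, because the inherited #eql? still compares identity. A small sketch with hypothetical classes (Plain, OnlyHash and Both are illustrative names, not from the thread):

```ruby
require 'set'

# No overrides: identity-based hash and eql?.
class Plain
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Equal names give equal hash values, but eql? is still identity.
class OnlyHash < Plain
  def hash
    name.hash
  end
end

# Both halves of the contract are overridden consistently.
class Both < Plain
  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(Plain) && name == other.name
  end
end

# Put two equal-looking objects of each class into a Set and count them.
sizes = [Plain, OnlyHash, Both].map do |klass|
  [klass, Set.new([klass.new("x"), klass.new("x")]).size]
end.to_h

p sizes[Plain]     # 2
p sizes[OnlyHash]  # 2
p sizes[Both]      # 1
```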

    --
    Mauricio Fernandez
    Mauricio Fernández, Sep 14, 2005
    #9
  10. Hugh Sasse

    Hugh Sasse Guest


    On Wed, 14 Sep 2005, Mauricio Fernández wrote:

    > The defaults are actually those of Hash. You can follow the call chain
    > starting from
    >
    > static struct st_hash_type objhash = {
    > rb_any_cmp,
    > rb_any_hash,
    > };
    >
    > in hash.c. For user-defined classes, it will end up using #hash and #eql?
    > defined in Kernel. [rb_any_cmp and rb_any_hash have some extra logic for
    > Symbol, Fixnum and String values, and some core classes redefine the
    > associated methods].
    >

    OK, it seems I'm thinking along the right lines now. Here is what I
    did in the end:

    --- /tmp/T0oTa4V2 Wed Sep 14 15:05:35 2005
    +++ populate_tables.rb Wed Sep 14 15:01:24 2005
    @@ -9,10 +9,31 @@

    $debug = true

    +module StringCollection
    +
    + def hash
    + (self.class)::FIELDS.inject(MD5.new()) do |d,m|
    + d << send(m)
    + end.hexdigest.hex
    + end
    +
    + def eql?(other)
    + (self.class)::FIELDS.inject(true) do |b,v|
    + begin
    + b && (self.send(v) == other.send(v))
    + rescue
    + b = false
    + end
    + end
    + end
    +
    +end
    +
    class Student
    - attr_accessor :forename, :surname, :birth_dt,
    - :picture, :coll_status
    + include StringCollection
    +
    FIELDS = [:forename, :surname, :birth_dt, :picture, :coll_status]
    + FIELDS.each{|f| attr_accessor f }

    def initialize(forename0, surname0, birth_dt0,
    picture0, coll_status0)
    @@ -22,28 +43,22 @@
    @picture = picture0
    puts "in student.new() picture is #{picture0.inspect}, @picture is #{@picture.inspect} " if $debug
    @coll_status = coll_status0
    - @hash = FIELDS.inject(MD5.new()) do |d,m|
    - d << send(m)
    - end.hexdigest.hex
    end

    - def hash
    - @hash
    - end

    - def eql?(other)
    - self.hash == other.hash
    - end
    -
    def to_s
    - "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{@hash}"
    + "#{@surname}, #{@forename}, #{@birth_dt}, #{@picture}, #{@coll_status}, #{hash}"
    end

    end

    class CourseModule
    - attr_accessor :aos_code, :dept_code, :aos_type, :full_desc

    + include StringCollection
    +
    + FIELDS = [:aos_code, :dept_code, :aos_type, :full_desc]
    + FIELDS.each{|f| attr_accessor f }
    +
    def initialize( aos_code, dept_code, aos_type, full_desc)
    @aos_code = aos_code
    @dept_code = dept_code



    I was particularly pleased to be able not to repeat the FIELDS, by
    means of attr_accessor, and that the idea of doing
    (self.class)::FIELDS
    actually worked.
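For readers who want the gist of the diff in one runnable piece, here is a sketch of the same mixin idea. It assumes a Ruby where MD5 lives in digest/md5 rather than the old md5 library, and the names FieldEquality and Course (with its two fields) are illustrative stand-ins, not the code from the diff.

```ruby
require 'set'
require 'digest/md5'  # assumption: stands in for the old `require 'md5'`

# Illustrative mixin: the including class must define a FIELDS constant.
module FieldEquality
  # Feed every field into one MD5 digest and use it as an integer.
  # Ruby reduces oversized hash values to its native range before use.
  def hash
    digest = Digest::MD5.new
    self.class::FIELDS.each { |f| digest << send(f).to_s }
    digest.hexdigest.hex
  end

  # Equal only when the classes match and every field compares equal.
  def eql?(other)
    other.class == self.class &&
      self.class::FIELDS.all? { |f| send(f) == other.send(f) }
  end
end

class Course
  FIELDS = [:code, :title]
  attr_accessor(*FIELDS)
  include FieldEquality

  def initialize(code, title)
    @code = code
    @title = title
  end
end

a = Course.new("TECH1001", "Intro")
b = Course.new("TECH1001", "Intro")
c = Course.new("TECH2001", "Next")
courses = Set.new([a, b, c])
p courses.size  # 2
```

self.class::FIELDS resolves the constant on whichever class includes the module, which is what lets one module serve several classes with different field lists.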

    In the hope that this helps someone else, and thank you,
    Hugh

    Hugh Sasse, Sep 14, 2005
    #10
  11. Ara.T.Howard

    Ara.T.Howard Guest


    > OK, it seems I'm thinking along the right lines now. Here is what I did in
    > the end:


    < snip code >

    > I was particularly pleased to be able not to repeat the FIELDS, by means of
    > attr_accessor, and that the idea of doing (self.class)::FIELDS actually
    > worked.


    i do a lot of that type of thing and use my traits lib a lot for it - it can
    make it pretty compact. for instance:

    harp:~ > cat a.rb
    require 'md5'
    require 'traits'

    module TraitCollection
    def initialize(*list)
    list = [ list ].flatten
    wt.each_with_index do |t,i|
    v = list[i] or
    raise ArgumentError, "no <#{ t }> given in <#{ list.inspect }>!"
    send t, v
    end
    end
    def to_s
    (rt.map{|t| [t, send(t)].join '='}. << "hash=#{ hash }").inspect
    end
    alias inspect to_s
    def hash
    rt.inject(::MD5::new()){|d,m| d << send(m)}.hexdigest.hex
    end
    def eql?(other)
    rt.inject(true){|b,v| b && (send(v) == other.send(v)) rescue false}
    end
    def wt; self::class::writer_traits; end
    def rt; self::class::reader_traits; end
    def self::included other
    super; class << other; class << self; alias [] new; end; end
    end
    end

    class Student
    include TraitCollection
    traits *%w( forename surname birth_dt picture coll_status )
    end
    class Course
    include TraitCollection
    traits *%w( aos_code dept_code aos_type full_desc )
    end

    require 'set'

    sset = Set::new
    s0, s1 = Student[%w( a b c d e )], Student[%w( f g h i j )]
    sset.add s0
    42.times{ sset.add s1 }
    p sset

    cset = Set::new
    c0, c1 = Course[%w( a b c d )], Course[%w( e f g h )]
    cset.add c0
    42.times{ cset.add c1 }
    p cset

    harp:~ > ruby a.rb
    #<Set: {["forename=a", "coll_status=b", "birth_dt=c", "picture=d", "surname=e", "hash=227748192848680293725464448333830731654"], ["forename=f", "coll_status=g", "birth_dt=h", "picture=i", "surname=j", "hash=116663401890982171087417074910604104991"]}>

    #<Set: {["dept_code=a", "full_desc=b", "aos_code=c", "aos_type=d", "hash=301716283811389038011477436469853762335"], ["dept_code=e", "full_desc=f", "aos_code=g", "aos_type=h", "hash=41821698252824551223787888325781077799"]}>


    cheers.

    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | Your life dwells amoung the causes of death
    | Like a lamp standing in a strong breeze. --Nagarjuna
    ===============================================================================
    Ara.T.Howard, Sep 14, 2005
    #11
  12. On Wed, Sep 14, 2005 at 11:12:16PM +0900, Hugh Sasse wrote:
    > + def eql?(other)
    > + (self.class)::FIELDS.inject(true) do |b,v|
    > + begin
    > + b && (self.send(v) == other.send(v))
    > + rescue
    > + b = false
    > + end
    > + end
    > + end


    Just one minor comment:

    batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
    class Foo
    FIELDS = %w[name stuff foo bar]
    attr_reader(*FIELDS)

    def initialize(name, stuff, foo, bar)
    @name, @stuff, @foo, @bar = name, stuff, foo, bar
    end

    def eql1?(other)
    (self.class)::FIELDS.inject(true) do |b,v|
    begin
    b && (self.send(v) == other.send(v))
    rescue
    b = false
    end
    end
    end

    def eql2?(other)
    # maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?
    self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true
    rescue NoMethodError
    false
    end
    end

    require 'benchmark'
    a = Foo.new("a", "b", "c", "d")
    b = Foo.new("e", "b", "c", "d")
    c = Foo.new("a", "b", "c", "e")

    TIMES = 100000
    %w[a b c].each{|x| puts "#{x} = #{eval(x).inspect}"}
    Benchmark.bmbm do |x|
    %w[a b c].each do |o|
    %w[eql1? eql2?].each do |m|
    s = "a.#{m}(#{o})"
    x.report("#{s}: #{eval(s)}") { eval("TIMES.times{#{s}}") }
    end
    end
    end
    batsman@tux-chan:~$ ruby -v /tmp/fdsfdsdsd.rb
    ruby 1.8.3 (2005-05-22) [i686-linux]
    a = #<Foo:0xb7dc9c98 @name="a", @bar="d", @foo="c", @stuff="b">
    b = #<Foo:0xb7dc9c20 @name="e", @bar="d", @foo="c", @stuff="b">
    c = #<Foo:0xb7dc9ba8 @name="a", @bar="e", @foo="c", @stuff="b">
    Rehearsal -----------------------------------------------------
    a.eql1?(a): true 1.520000 0.000000 1.520000 ( 1.658224)
    a.eql2?(a): true 0.880000 0.000000 0.880000 ( 0.970675)
    a.eql1?(b): false 1.070000 0.000000 1.070000 ( 1.156081)
    a.eql2?(b): false 0.360000 0.010000 0.370000 ( 0.410011)
    a.eql1?(c): false 1.570000 0.000000 1.570000 ( 1.734145)
    a.eql2?(c): false 0.910000 0.000000 0.910000 ( 1.003833)
    -------------------------------------------- total: 6.320000sec

    user system total real
    a.eql1?(a): true 1.510000 0.010000 1.520000 ( 1.679369)
    a.eql2?(a): true 0.890000 0.000000 0.890000 ( 0.950153)
    a.eql1?(b): false 1.100000 0.010000 1.110000 ( 1.200057)
    a.eql2?(b): false 0.360000 0.000000 0.360000 ( 0.383755)
    a.eql1?(c): false 1.560000 0.010000 1.570000 ( 1.739114)
    a.eql2?(c): false 0.920000 0.000000 0.920000 ( 0.978109)
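The gap in the timings is consistent with early exit: inject always walks every field, while a loop that breaks (or all?) stops at the first mismatch. The sketch below counts field visits instead of timing them. The hashes and the COMPARED_FIELDS constant are illustrative stand-ins for the Foo objects above.

```ruby
COMPARED_FIELDS = [:name, :stuff, :foo, :bar]

# inject carries the running result along but still visits every field.
def inject_equal?(x, y, counter)
  COMPARED_FIELDS.inject(true) do |acc, f|
    counter[:visits] += 1
    acc && x[f] == y[f]
  end
end

# all? stops as soon as one block call returns a false value.
def all_equal?(x, y, counter)
  COMPARED_FIELDS.all? do |f|
    counter[:visits] += 1
    x[f] == y[f]
  end
end

a = { name: "a", stuff: "b", foo: "c", bar: "d" }
b = { name: "e", stuff: "b", foo: "c", bar: "d" }  # differs in the first field

inject_count = { visits: 0 }
all_count    = { visits: 0 }
inject_result = inject_equal?(a, b, inject_count)
all_result    = all_equal?(a, b, all_count)

p [inject_result, inject_count[:visits]]  # [false, 4]
p [all_result, all_count[:visits]]        # [false, 1]
```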

    --
    Mauricio Fernandez
    Mauricio Fernández, Sep 14, 2005
    #12
  13. Hugh Sasse

    Hugh Sasse Guest


    On Thu, 15 Sep 2005, Mauricio Fernández wrote:

    > Just one minor comment:
    >
    > batsman@tux-chan:~$ cat /tmp/fdsfdsdsd.rb
    > class Foo
    > FIELDS = %w[name stuff foo bar]
    > attr_reader(*FIELDS)


    That's rather nice :)
    [...]
    > def eql2?(other)
    > # maybe add self.class::FIELDS == other.class::FIELDS test plus rescue NameError ?

    Good point.

    > self.class::FIELDS.each{|m| break false if self.send(m) != other.send(m) } && true

    Nice optimisation! I was having enough of a job keeping my head
    around inject to think of that!
    [...]
    > Rehearsal -----------------------------------------------------
    > a.eql1?(a): true 1.520000 0.000000 1.520000 ( 1.658224)
    > a.eql2?(a): true 0.880000 0.000000 0.880000 ( 0.970675)


    [and similar]

    That makes quite a difference. Thank you.
    >
    > --
    > Mauricio Fernandez
    >


    Hugh
    >


    Hugh Sasse, Sep 14, 2005
    #13
  14. Hugh Sasse

    Hugh Sasse Guest

    On Thu, 15 Sep 2005, Christian Neukirchen wrote:

    > This seems to be the canonical way to define compound hashes:
    >
    > class Student
    > def hash
    > [@forename, @surname, @birth_dt, @picture, @coll_status].hash
    > end
    > end


    That does seem to preserve the properties I need for strings, and is
    probably cheaper than MD5sums.
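
    A minimal sketch of how that fits together with Set, assuming a cut-down Student with only two of the five fields. Since Set stores its members in a Hash, both hash and eql? have to agree, so the sketch builds both from one hypothetical helper, key.

    ```ruby
    require 'set'

    # Sketch only: a cut-down Student; `key` is a hypothetical helper name.
    class Student
      attr_reader :forename, :surname

      def initialize(forename, surname)
        @forename = forename
        @surname = surname
      end

      # Objects that are eql? must return the same hash, so both methods
      # are derived from the same list of fields.
      def eql?(other)
        other.is_a?(Student) && key.eql?(other.key)
      end

      def hash
        key.hash
      end

      protected

      def key
        [@forename, @surname]
      end
    end

    students = Set.new
    students << Student.new("John", "Smith")
    students << Student.new("John", "Smith")   # duplicate, not stored again
    students << Student.new("Jane", "Smith")
    puts students.size   # prints 2
    ```

    Defining eql? without hash (or the other way round) leaves the duplicate in the set, because the Hash underneath consults both.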
    >

    [...]
    >
    > Set uses a Hash to store the objects.
    >
    > That said, I think it would be nice to have something along this in
    > the stdlib:
    >
    > class Student
    > equal_compares :@forename, :@surname, :@birth_dt, :@picture, :@coll_status
    > end
    >
    > Above call should result in appropriate definitions of ==, eql? and


    I don't know how it could know how to create the different
    definitions correctly given a completely open spec as to what the
    vars are.

    > hash. (Something like "ordered_by" would be pretty useful too.)


    I think that could be tricky too.
    >

    Thank you.
    Hugh
    Hugh Sasse, Sep 14, 2005
    #14
  15. Christian Neukirchen wrote:
    > Hugh Sasse writes:
    >
    >> On Thu, 15 Sep 2005, Christian Neukirchen wrote:
    >>
    >>> This seems to be the canonical way to define compound hashes:
    >>>
    >>> class Student
    >>> def hash
    >>> [@forename, @surname, @birth_dt, @picture, @coll_status].hash
    >>> end
    >>> end

    >>
    >> That does seem to preserve the properties I need for strings, and is
    >> probably cheaper than MD5sums.
    >>>

    >> [...]
    >>>
    >>> Set uses a Hash to store the objects.
    >>>
    >>> That said, I think it would be nice to have something along this in
    >>> the stdlib:
    >>>
    >>> class Student
    >>> equal_compares :@forename, :@surname, :@birth_dt, :@picture,
    >>> :@coll_status
    >>> end
    >>>
    >>> Above call should result in appropriate definitions of ==, eql? and

    >>
    >> I don't know how it could know how to create the different
    >> definitions correctly given a completely open spec as to what the
    >> vars are.

    >
    > Well, you just list all instance variables that define the
    > object... if they are the same, the objects are eql?.
    >
    >>> hash. (Something like "ordered_by" would be pretty useful too.)

    >>
    >> I think that could be tricky too.

    >
    > In the end, [*fields] <=> [*other.fields] does the job.


    You can also steal the code from RCR 293 for a general solution:
    http://rcrchive.net/rcr/show/293
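
    As a rough illustration of the idea being discussed (this is not the RCR 293 code, and nothing like it is in the stdlib as far as this thread establishes), a class-level helper could look something like the sketch below. The module name EqualCompares and the helper comparison_fields are made up for the example.

    ```ruby
    # Sketch only: EqualCompares and comparison_fields are hypothetical names;
    # equal_compares is the wished-for call from the thread.
    module EqualCompares
      def equal_compares(*ivars)
        include Comparable

        # The values that define the object, in the order they were listed.
        define_method(:comparison_fields) do
          ivars.map { |ivar| instance_variable_get(ivar) }
        end
        protected :comparison_fields

        define_method(:eql?) do |other|
          other.class == self.class && comparison_fields.eql?(other.comparison_fields)
        end

        define_method(:hash) { comparison_fields.hash }

        # Comparable derives ==, <, > and friends from <=>.
        define_method(:<=>) do |other|
          return nil unless other.class == self.class
          comparison_fields <=> other.comparison_fields
        end
      end
    end

    class Student
      extend EqualCompares
      equal_compares :@surname, :@forename

      def initialize(forename, surname)
        @forename = forename
        @surname = surname
      end
    end

    john = Student.new("John", "Smith")
    twin = Student.new("John", "Smith")
    jane = Student.new("Jane", "Smith")
    p john == twin, john.eql?(twin), john.hash == twin.hash
    p [john, jane].min.equal?(jane)
    ```

    Listing the instance variables once and deriving eql?, hash and <=> from the same array keeps the three definitions from drifting apart.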

    Kind regards

    robert
    Robert Klemme, Sep 15, 2005
    #15
  16. Hugh Sasse

    Hugh Sasse Guest

    On Fri, 16 Sep 2005, Robert Klemme wrote:

    >
    > You can also steal the code from RCR 293 for a general solution:
    > http://rcrchive.net/rcr/show/293


    Hmm, that's interesting, but I don't get:

    code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"

    Shouldn't hash return a Fixnum?

    ------------------------------------------------------------ Object#hash
    obj.hash => fixnum
    ------------------------------------------------------------------------
    Generates a +Fixnum+ hash value for this object. This function must
    have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
    hash value is used by class +Hash+. Any hash value that exceeds the
    capacity of a +Fixnum+ will be truncated before being used.

    The function above appears to return a string with numbers separated
    by " ^ ".
    >


    > Kind regards
    >
    > robert
    >

    Thank you,
    Hugh
    >
    >
    Hugh Sasse, Sep 15, 2005
    #16
  17. Hugh Sasse wrote:
    > On Fri, 16 Sep 2005, Robert Klemme wrote:
    >
    >>
    >> You can also steal the code from RCR 293 for a general solution:
    >> http://rcrchive.net/rcr/show/293

    >
    > Hmm, that's interesting, but I don't get:
    >
    > code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
    >
    > Shouldn't hash return a Fixnum?


    Definitely!

    > ------------------------------------------------------------
    > Object#hash obj.hash => fixnum
    > ------------------------------------------------------------------------
    > Generates a +Fixnum+ hash value for this object. This function
    > must have the property that +a.eql?(b)+ implies +a.hash ==
    > b.hash+. The hash value is used by class +Hash+. Any hash value
    > that exceeds the capacity of a +Fixnum+ will be truncated
    > before being used.
    >
    > The function above appears to return a string with numbers separated
    > by " ^ ".


    Nope. The join appears during code generation and not during evaluation
    of the method. You can easily verify this by printing code after it's
    completed. :)
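
    To make that concrete, a small sketch that builds the source string and prints it before evaluating it. The field names and the Person struct are assumptions for the example; only the string-building line comes from the quoted RCR code.

    ```ruby
    # Sketch only: the field names and Person are made up for illustration.
    fields = %w[forename surname]

    code = String.new
    code << "def hash() " << fields.map { |f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
    puts code   # prints: def hash() self.forename.hash ^ self.surname.hash end

    Person = Struct.new(:forename, :surname)
    Person.class_eval(code)   # compile the generated method into the class

    person = Person.new("John", "Smith")
    p person.hash.is_a?(Integer)                          # true
    p person.hash == ("John".hash ^ "Smith".hash)         # true
    ```

    The " ^ " only ever appears in the generated source text; once that text is evaluated, ^ is the ordinary integer XOR operator and hash returns a number.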

    Kind regards

    robert
    Robert Klemme, Sep 15, 2005
    #17
  18. Hugh Sasse

    Hugh Sasse Guest

    On Fri, 16 Sep 2005, Robert Klemme wrote:

    > Hugh Sasse wrote:


    >> Hmm, that's interesting, but I don't get:
    >>
    >> code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
    >>
    >> Shouldn't hash return a Fixnum?

    >
    > Definitely!
    >

    [...]
    >> The function above appears to return a string with numbers separated
    >> by " ^ ".

    >
    > Nope. The join appears during code generation and not during evaluation
    > of the method. You can easily verify this by printing code after it's
    > completed. :)


    Oh, then it's exclusive or. I'm clearly being as sharp as a sponge
    today.

    While my brain is behaving like cottage cheese, it's probably not
    the time to ask how one might guarantee that you don't stomp on the
    hashes of other objects in the system. If you have an even number of
    elements, all the same Fixnum, like [1,1,1,1] then they would hash
    to 0, as would [2,2], I "think".
    irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
    => 0
    irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
    => 0
    irb(main):006:0>

    >
    > Kind regards
    >
    > robert
    >

    Hugh
    Hugh Sasse, Sep 15, 2005
    #18
  19. Hugh Sasse wrote:
    > On Fri, 16 Sep 2005, Robert Klemme wrote:
    >
    >> Hugh Sasse wrote:

    >
    >>> Hmm, that's interesting, but I don't get:
    >>>
    >>> code << "def hash() " << fields.map {|f| "self.#{f}.hash" }.join(" ^ ") << " end\n"
    >>>
    >>> Shouldn't hash return a Fixnum?

    >>
    >> Definitely!
    >>

    > [...]
    >>> The function above appears to return a string with numbers separated
    >>> by " ^ ".

    >>
    >> Nope. The join appears during code generation and not during
    >> evaluation of the method. You can easily verify this by printing
    >> code after it's completed. :)

    >
    > Oh, then it's exclusive or. I'm clearly being as sharp as a sponge
    > today.


    I'll have to remember that phrase - I could use it myself from time to
    time. :)

    > While my brain is behaving like cottage cheese, it's probably not
    > the time to ask how one might guarantee that you don't stomp on the
    > hashes of other objects in the system. If you have an even number of
    > elements, all the same Fixnum, like [1,1,1,1] then they would hash
    > to 0, as would [2,2], I "think".
    > irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
    > => 0
    > irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
    > => 0


    Btw, the assignment is superfluous. The result of a^b.hash is the next
    iteration's a.

    > irb(main):006:0>


    Yes. The algorithm can certainly be improved on. Typically you rather do
    something similar to

    (a.hash ^ (b.hash << 3) ^ (c.hash << 7)) & MAX_HASH

    09:53:59 [~]: irbs
    >> [2,1,1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}

    => 2781
    >> [1,2, 1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}

    => 1885
    >>


    i.e. by shifting you make sure that order matters etc.
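
    A side-by-side sketch of plain XOR against the shift-and-XOR variant. One assumption to flag: code_for is a stand-in for the per-element hash and uses 2n + 1, chosen because it reproduces the numbers in the irb output above; a current Ruby may return different, per-process values from Integer#hash, so the literal results would not carry over.

    ```ruby
    # Sketch only: code_for is a hypothetical stand-in for the element hash
    # (2n + 1 reproduces the figures shown in the irb session above).
    code_for = ->(n) { 2 * n + 1 }

    xor_only  = ->(list) { list.inject(0) { |a, b| a ^ code_for.(b) } }
    shift_xor = ->(list) { list.inject(0) { |a, b| ((a << 3) ^ code_for.(b)) & 0xFFFF_FFFF } }

    p xor_only.([1, 1])          # => 0
    p xor_only.([2, 1, 1, 2])    # => 0, each code cancels against its twin
    p shift_xor.([2, 1, 1, 2])   # => 2781
    p shift_xor.([1, 2, 1, 2])   # => 1885
    ```

    Plain XOR is commutative and self-cancelling, so reordering or pairing up elements cannot change the result; shifting the accumulator first makes each position contribute differently.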

    Kind regards

    robert
    Robert Klemme, Sep 16, 2005
    #19
  20. Hugh Sasse

    Hugh Sasse Guest

    On Fri, 16 Sep 2005, Robert Klemme wrote:

    > Hugh Sasse wrote:
    >> While my brain is behaving like cottage cheese, it's probably not
    >> the time to ask how one might guarantee that you don't stomp on the
    >> hashes of other objects in the system. If you have an even number of
    >> elements, all the same Fixnum, like [1,1,1,1] then they would hash
    >> to 0, as would [2,2], I "think".
    >> irb(main):004:0> [1,1].inject(0) { |a,b| a ^= b.hash}
    >> => 0
    >> irb(main):005:0> [2,1,1,2].inject(0) { |a,b| a ^= b.hash}
    >> => 0

    >
    > Btw, the assignment is superfluous. The result of a^b.hash is the next
    > iteration's a.


    Yes, good point, the result of the block....

    ------------------------------------------------------ Enumerable#inject
    enum.inject(initial) {| memo, obj | block } => obj
    enum.inject {| memo, obj | block } => obj
    ------------------------------------------------------------------------
    Combines the elements of _enum_ by applying the block to an
    accumulator value (_memo_) and each element in turn. At each step,
    _memo_ is set to the value returned by the block. The first form
    ================================================
    [...]
    >
    >> irb(main):006:0>

    >
    > Yes. The algorithm can certainly be improved on. Typically you rather do
    > something similar to
    >
    > (a.hash ^ (b.hash << 3) ^ (c.hash << 7)) & MAX_HASH
    >
    > 09:53:59 [~]: irbs
    >>> [2,1,1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}
    > => 2781
    >>> [1,2, 1,2].inject(0) { |a,b| ((a << 3) ^ b.hash) & 0xFFFF_FFFF}
    > => 1885


    Ah, so that's what MAX_HASH is, I couldn't remember how big Fixnums
    were.

    I was thinking about something like a linear congruential
    random number generator like:
    brains hgs 22 %> irb
    irb(main):001:0> [2,1,1,2].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
    => 151936
    irb(main):002:0> [2,2,1,1].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
    => 153856
    irb(main):003:0> [2,1,2,1].inject(0) {|a,b| ((a * 31)+b.hash) % 4093082899}
    => 151996
    irb(main):004:0>

    like
    http://www.cs.bell-labs.com/cm/cs/pearls/markovhash.c

    from "Programming Pearls".

    with a largish prime grabbed from
    http://primes.utm.edu/lists/small/small.html#10

    being the biggest I could see less than 0XFFFF_FFFF (4294967295)


    ------------------------------------------------ Class: Fixnum < Integer
    A +Fixnum+ holds +Integer+ values that can be represented in a
    native machine word (minus 1 bit). If any operation on a +Fixnum+

    Should that be 0x7FFF_FFFF? (2147483647)
    According to
    http://www.rsok.com/~jrm/printprimes.html
    this would seem to be a prime number, so could be used as the
    modulus anyway.
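
    The multiply-and-add scheme from the irb session above, written out as a small sketch. MULTIPLIER and MODULUS are just names for the 31 and 4093082899 used there, and code_for is an assumed stand-in for the per-element hash: 2n + 1 reproduces the figures shown, whereas Integer#hash on a current Ruby may give different values.

    ```ruby
    # Sketch only: MULTIPLIER, MODULUS and code_for are names introduced here;
    # the numbers 31 and 4093082899 are the ones from the irb session.
    MULTIPLIER = 31
    MODULUS = 4093082899

    code_for = ->(n) { 2 * n + 1 }
    combine = ->(list) { list.inject(0) { |a, b| (a * MULTIPLIER + code_for.(b)) % MODULUS } }

    p combine.([2, 1, 1, 2])   # => 151936
    p combine.([2, 2, 1, 1])   # => 153856
    p combine.([2, 1, 2, 1])   # => 151996
    ```

    As with the shifting version, multiplying the accumulator before adding the next code is what makes the result depend on element order.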

    > i.e. by shifting you make sure that order matters etc.
    >
    > Kind regards
    >
    > robert
    >

    Thank you,
    Hugh
    Hugh Sasse, Sep 16, 2005
    #20