Removing Duplicate Objects from Object List

Jeff Nyman

Greetings all.

Does anyone have a good idea of how to write a loop that checks if two
objects are equal? By "equal" here I refer to the 'eql?' method, to test if
the objects have the same value.

I have a set of Rule objects that will be stored in a RuleList object. I know
how to cycle through the RuleList. I'm just doing this:

$ruleList.selection.each { |rule|
...
}

The problem is that I need to go through each rule and check if it is equal
to *any* of the other rules that are in the list. If a duplicate is found,
one of the duplicate rules should be removed.

Every solution I've tried has ended up either removing objects incorrectly
or not finding the duplicates in the first place.

Here's an example of some Rule objects:

<Tendent::Rule:0x2d5f7a8 @filter="first after 202G_OrdAdd",
@value="203G_OrdUpdateFirst", @point="203G_OrdUpdateFirst">
<Tendent::Rule:0x2d5f71c @filter="last", @value="203G_OrdUpdateLast",
@point="203G_OrdUpdateLast">
<Tendent::Rule:0x2d5f6a4 @filter="first after 202G_OrdAdd",
@value="203G_OrdUpdateFirst", @point="203G_OrdUpdateFirst">
<Tendent::Rule:0x2d5f62c @filter="last", @value="203G_OrdUpdateLast",
@point="203G_OrdUpdateLast">

Here is what they look like as strings:

203G_OrdUpdateFirst, first after 202G_OrdAdd, 203G_OrdUpdateFirst
203G_OrdUpdateLast, last, 203G_OrdUpdateLast
203G_OrdUpdateFirst, first after 202G_OrdAdd, 203G_OrdUpdateFirst
203G_OrdUpdateLast, last, 203G_OrdUpdateLast

So as you can see, I have four rules, but actually only two are unique.
(That just happens to be the case here. In other cases, perhaps there will
be six rules, and two will be unique.)

Can anyone see an efficient way to do this?

Is it better to just convert these into an array? I know the Array class has
the 'uniq' method. The problem is that I would still need the rules to then
be objects as well. In other words, even if I put all the objects in an
array and modify the array, I would need to reflect the changes in the
object list itself, such that the duplicate objects no longer exist.

- Jeff
 
gaspode

Jeff,

How are you storing the Rules in your RuleList at the moment? Personally
I'd use an Array (or simply subclass Array) and then you get to use
Array#uniq without shifting objects back and forth.

Stephen
 
Jeff Nyman

> How are you storing the Rules in your RuleList at the moment? Personally
> I'd use an Array (or simply subclass Array) and then you get to use
> Array#uniq without shifting objects back and forth.

Essentially, I have a RuleList class like this:

<code>
class RuleList
  def initialize
    @rules = Array.new
  end

  def append(this_rule)
    @rules.push(this_rule)
  end

  def selection
    @rules.find_all { |rule| rule }
  end
end
</code>

Then I have a Rule class like this:

<code>
class Rule
  attr_accessor :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end

  def to_s
    "#@point, #@filter, #@value"
  end
end
</code>

When a rule object needs to be added to the list, I do this:

$ruleList.append(Rule.new(step.point2, rule, value))

Does that give enough detail?

In playing around a bit more, I tried this:

rules_array = $ruleList.selection.collect { |rule| rule }

Then I tried:

rules_array.uniq!

The problem is that this finds nothing as a duplicate. But that makes sense
(I think) because the object ID is probably being considered as part of the
test and those will, of course, not be duplicates.
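A minimal sketch of that behaviour, assuming a class that does not override eql? or hash. PlainRule is a hypothetical stand-in for the Rule class above. With the defaults inherited from Object, two instances carrying the same values are still treated as distinct, so uniq has nothing to remove:

```ruby
# Hypothetical stand-in for Rule, with no eql?/hash overrides.
class PlainRule
  attr_reader :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end
end

first  = PlainRule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast")
second = PlainRule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast")

# The default eql? inherited from Object compares identity, not instance variables.
same_by_eql = first.eql?(second)

# uniq relies on hash and eql?, so both objects survive.
unique_count = [first, second].uniq.size

puts same_by_eql
puts unique_count
```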

It sounds like you're saying it would be better to not use a Rule class in
the first place. Is that accurate?

- Jeff
 
Daniel Harple

You should implement #eql? and #hash methods on your class, and store
all instances in a [Set](http://ruby-doc.org/stdlib/libdoc/set/rdoc/classes/Set.html).

require "set"

class Rule
  attr_accessor :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end

  def eql?(rule)
    rule.point.eql?(@point) &&
      rule.filter.eql?(@filter) &&
      rule.value.eql?(@value)
  end

  def hash
    @point.hash + @filter.hash + @value.hash
  end
end

rules_set = Set.new
rules_set << Rule.new(1, 1, 1)
rules_set << Rule.new(1, 1, 1) # duplicate rule
rules_set << Rule.new(1, 1, 2)
rules_set.size # => 2
rules_set # => #<Set: {#<Rule:0x89d78 @value=2, @filter=1, @point=1>,
          #     #<Rule:0x89db4 @value=1, @filter=1, @point=1>}>

-- Daniel
 
gaspode

> Essentially, I have a RuleList class like this:
> When a rule object needs to be added to the list, I do this:
>
> $ruleList.append(Rule.new(step.point2, rule, value))
>
> Does that give enough detail?

Plenty

> In playing around a bit more, I tried this:
>
> rules_array = $ruleList.selection.collect { |rule| rule }
>
> Then I tried:
>
> rules_array.uniq!
>
> The problem is that this finds nothing as a duplicate. But that makes sense
> (I think) because the object ID is probably being considered as part of the
> test and those will, of course, not be duplicates.

The reason that it isn't working as you expect is that the uniq method
uses eql?, which in turn uses the hash method (I think, somebody
correct me if I'm full of it). If you implement the hash method (to
return the same value for identical Rules) in your Rule class, this
should work fine.

> It sounds like you're saying it would be better to not use a Rule class in
> the first place. Is that accurate?

No, your current Rule class is good. Just implement hash!

Rather than doing:
rules_array = $ruleList.selection.collect { |rule| rule }
rules_array.uniq!

you could add a uniq and uniq! method to your RuleList that just
delegates the work to the underlying Array

<code>
def uniq
  @rules.uniq
end

def uniq!
  @rules.uniq!
end
</code>

If it is the case that you NEVER want the same Rule in there twice,
just do the check in the append method (also after implementing the
hash method)

<code>
def append(this_rule)
  @rules.push(this_rule) unless @rules.include?(this_rule)
end
</code>
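One caveat worth sketching: Array#include? compares elements with ==, not with hash or eql?, so the guarded append only rejects duplicates if Rule also defines ==. The version below is a sketch under that assumption. The `fields` helper and the `selection` body are illustrative choices, not code from the thread:

```ruby
class Rule
  attr_reader :point, :filter, :value

  def initialize(point, filter, value)
    @point = point
    @filter = filter
    @value = value
  end

  # Array#include? calls ==, so value equality has to be defined here.
  def ==(other)
    other.is_a?(Rule) && fields == other.fields
  end

  # Array#uniq and Set rely on eql? together with hash.
  def eql?(other)
    other.is_a?(Rule) && fields.eql?(other.fields)
  end

  def hash
    fields.hash
  end

  protected

  # Hypothetical helper: the attributes that define a rule's identity.
  def fields
    [point, filter, value]
  end
end

class RuleList
  def initialize
    @rules = []
  end

  # Skips the push when an equal rule is already stored.
  def append(this_rule)
    @rules.push(this_rule) unless @rules.include?(this_rule)
    self
  end

  def selection
    @rules.dup
  end
end

list = RuleList.new
list.append(Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst"))
list.append(Rule.new("203G_OrdUpdateLast", "last", "203G_OrdUpdateLast"))
list.append(Rule.new("203G_OrdUpdateFirst", "first after 202G_OrdAdd", "203G_OrdUpdateFirst"))

puts list.selection.size
```

Because include? scans the array on every call, this costs a linear search per append. That is probably fine for a handful of rules, but a Set may be a better fit for large lists.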
 
Simon Kröger

Daniel said:
> If instead of declaring your @rules as an array, you declare it as a set
> you will get no duplicates for free (I think)

Yes, but you will lose order (if that is important).

> However, you need to incorporate the <=> operator in your Rule class to
> tell ruby how your objects relate to each other.
> ie are they <, >, or =

That won't do the trick for uniq, as uniq is using a hash internally to
find duplicates. You have to define #hash and #eql? for this to work.
(or was it #hash and #== ?)
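A small sketch that may settle the question, using two hypothetical classes: one defines only ==, the other defines hash and eql?. Only the second is deduplicated by Array#uniq:

```ruby
# Hypothetical class that defines value equality through == only.
class OnlyDoubleEquals
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def ==(other)
    other.is_a?(OnlyDoubleEquals) && value == other.value
  end
end

# Hypothetical class that defines hash and eql? instead.
class HashAndEql
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def eql?(other)
    other.is_a?(HashAndEql) && value.eql?(other.value)
  end

  def hash
    value.hash
  end
end

only_eq_count  = [OnlyDoubleEquals.new(1), OnlyDoubleEquals.new(1)].uniq.size
hash_eql_count = [HashAndEql.new(1), HashAndEql.new(1)].uniq.size

puts only_eq_count
puts hash_eql_count
```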

cheers

Simon
 
Jeff Nyman

Thank you to all of you!

With everything said here, I definitely have this working now. Not only
that, but I learned a lot more about hash and Set.

(Just when you think you have a grasp of Ruby, you find you were only at the
tip of the iceberg ...)

- Jeff
 
Mike

uniq is failing because, even though the attributes of each instance of
the rule is 'eq' to the other, the compared instances are different.

class Foo
  def initialize(a, b)
    @a = a
    @b = b
  end
end

x = [Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b)]
p x
p x.uniq

ruby tst.rb
[#<Foo:0x3b6128 @a=:a, @b=:b>, #<Foo:0x3b6114 @a=:c, @b=:d>,
#<Foo:0x3b6100 @a=:a, @b=:b>]
[#<Foo:0x3b6128 @a=:a, @b=:b>, #<Foo:0x3b6114 @a=:c, @b=:d>,
#<Foo:0x3b6100 @a=:a, @b=:b>]

I tried defining eq? and hash and uniq still fails. hash returns
identical values for objects with identical content and eq? returns
true in this case, but uniq does not remove them.

Probably the right thing to do is to write a couple of loops.
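For reference, the loop-based approach could look roughly like the sketch below. It assumes Foo exposes readers for its attributes, and `dedupe` is a hypothetical helper name. It keeps the first occurrence of each value combination and compares every candidate against the items already kept, so it is quadratic in the list length:

```ruby
class Foo
  attr_reader :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end
end

# Hypothetical helper: returns a new array without value-duplicates,
# keeping the first occurrence of each (a, b) combination.
def dedupe(items)
  kept = []
  items.each do |item|
    duplicate = kept.any? { |k| k.a == item.a && k.b == item.b }
    kept << item unless duplicate
  end
  kept
end

x = [Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b)]
unique = dedupe(x)

puts unique.size
```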
 
ara.t.howard

harp:~ > cat a.rb
class Foo
  ATTRIBUTES = %w( a b )
  ATTRIBUTES.each{|at| attr at}

  def initialize(a,b) @a, @b = a, b end
  def parts() ATTRIBUTES.map{|at| send at} end
  def eql?(other) parts == other.parts end
  def hash() parts.hash end
end

p [ Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b) ]
p [ Foo.new(:a, :b), Foo.new(:c, :d), Foo.new(:a, :b) ].uniq


harp:~ > ruby a.rb
[#<Foo:0xb75d137c @a=:a, @b=:b>, #<Foo:0xb75d1368 @a=:c, @b=:d>, #<Foo:0xb75d1340 @a=:a, @b=:b>]
[#<Foo:0xb75d1214 @a=:a, @b=:b>, #<Foo:0xb75d1200 @a=:c, @b=:d>]



-a
 
