Performance of Hash.inject

Daniel Sheppard · Apr 16, 2004

I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Actually running the code however, I ended up with the following
timings:

keysCollect
equal 0.18
not equal 0.11

directCollect
equal 0.221
not equal 0.24

directInject
equal 0.31
not equal 0.221

keysInject
equal 0.24
not equal 0.15

Can anybody shed any light on why it's not performing as I would assume?

-----

def keysCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.keys.collect{|k| !hash2[k].nil? && hash1[k] == hash2[k]
}.include?(false))
end
def directCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.collect{|k,v| hash2[k] == v }.include?(false))
end
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }
end

def time(p, *args)
time = Time.now
1000.times { p.call *args}
Time.now - time
end

hash1 = Hash.new
hash2 = Hash.new
xx = 'a'
50.times { hash1[xx] = xx; xx = xx.next }
50.times { hash2[xx] = xx; xx = xx.next }

[:keysCollect, :directCollect, :directInject, :keysInject].each { |s|
puts s
puts "equal #{time(method(s), hash1, hash1)}"
puts "not equal #{time(method(s), hash1, hash2)}"
}
#####################################################################################
This email has been scanned by MailMarshal, an email content filter.
#####################################################################################

George Ogata · Apr 16, 2004

Daniel Sheppard said:
I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Your *Inject methods don't actually quit early; they just stop
evaluating the second expression once val is false. An explicit break
speeds up the quick-exit case considerably.

def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }

hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }

end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }

hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }

end

Unless you're using a really old ruby, though, Enumerable#all? should
outperform all of the above.

---- revised code ----

def keysCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.keys.collect{|k| !hash2[k].nil? && hash1[k] == hash2[k]
}.include?(false))
end
def directCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.collect{|k,v| hash2[k] == v }.include?(false))
end
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }
end
def using_all hash1, hash2
hash1.size == hash2.size &&
hash1.all?{|k, v| hash2[k] == v}
end

def time(p, *args)
time = Time.now
1000.times { p.call *args}
Time.now - time
end

hash1 = Hash.new
hash2 = Hash.new
xx = 'a'
2000.times { hash1[xx] = xx; xx = xx.next }
2000.times { hash2[xx] = xx; xx = xx.next }

[:keysCollect, :directCollect, :directInject, :keysInject,
:using_all].each { |s|
puts
puts s
puts "equal #{time(method(s), hash1, hash1)}"
puts "not equal #{time(method(s), hash1, hash2)}"
}

---- output ----

keysCollect
equal 3.071011
not equal 1.992086

directCollect
equal 3.211872
not equal 3.447227

directInject
equal 4.202059
not equal 0.004962

keysInject
equal 3.343774
not equal 0.147189

using_all
equal 2.84662
not equal 0.003133

----

Incidentally, if you're using this to compare
equality-of-Hash-sans-default-proc, none of these are adequate in the
general case since keys may exist that map to the default value.

Perhaps:

def using_all hash1, hash2
hash1.size == hash2.size &&
hash1.all?{|k, v| hash2.has_key?(k) && hash2[k] == v}
end

Or, as mentioned elsewhere, the shorter:

{}.update(h1) == {}.update(h2)

.... which seems to have better worst-case, but worse best-case
performance.

HTH

Robert Klemme · Apr 16, 2004

George Ogata said:
Daniel Sheppard said:

I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Click to expand...

Your *Inject methods don't actually quit early; they just stop
evaluating the second expression once val is false. An explicit break
speeds up the quick-exit case considerably.

def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }

Click to expand...

hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }

end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }

Click to expand...

hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }

end

Click to expand...

I would not use inject in this case anyway. I'd prefer returns for fast
exit:

def directInject(hash1, hash2)
return false unless hash1.size == hash2.size
hash1.each { |k, v| return false unless hash2[k] == v }
true
end

Unless you're using a really old ruby, though, Enumerable#all? should
outperform all of the above.

You mean

def directInject(hash1, hash2)
hash1.size == hash2.size and
hash1.all? { |k, v| hash2[k] == v }
end

Yeah, that's even better.

robert

Ruby Golf: Object Diff	5	Jul 19, 2003
Nuby - help with Ruby object references	1	Feb 24, 2006
Need for speed -> a C extension?	27	Apr 18, 2011
Adventures in Optimization.. When a Ruby primitive is the Hog.	0	Dec 2, 2008
performance and style advice requested	12	Sep 14, 2003
sqlite3 import performance	1	Sep 5, 2008
subclass of hash, that times last read	2	May 28, 2008
questions of idiom	3	Jun 7, 2010

Performance of Hash.inject

Daniel Sheppard

George Ogata

Robert Klemme

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads