Performance of Hash.inject

D

Daniel Sheppard

I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Actually running the code however, I ended up with the following
timings:

keysCollect
equal 0.18
not equal 0.11

directCollect
equal 0.221
not equal 0.24

directInject
equal 0.31
not equal 0.221

keysInject
equal 0.24
not equal 0.15

Can anybody shed any light on why it's not performing as I would assume?


-----

def keysCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.keys.collect{|k| !hash2[k].nil? && hash1[k] == hash2[k]
}.include?(false))
end
def directCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.collect{|k,v| hash2[k] == v }.include?(false))
end
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }
end

def time(p, *args)
time = Time.now
1000.times { p.call *args}
Time.now - time
end

hash1 = Hash.new
hash2 = Hash.new
xx = 'a'
50.times { hash1[xx] = xx; xx = xx.next }
50.times { hash2[xx] = xx; xx = xx.next }

[:keysCollect, :directCollect, :directInject, :keysInject].each { |s|
puts s
puts "equal #{time(method(s), hash1, hash1)}"
puts "not equal #{time(method(s), hash1, hash2)}"
}
#####################################################################################
This email has been scanned by MailMarshal, an email content filter.
#####################################################################################
 
G

George Ogata

Daniel Sheppard said:
I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Your *Inject methods don't actually quit early; they just stop
evaluating the second expression once val is false. An explicit break
speeds up the quick-exit case considerably.
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }
hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }
hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }

Unless you're using a really old ruby, though, Enumerable#all? should
outperform all of the above.

---- revised code ----

def keysCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.keys.collect{|k| !hash2[k].nil? && hash1[k] == hash2[k]
}.include?(false))
end
def directCollect(hash1, hash2)
hash1.size == hash2.size &&
!(hash1.collect{|k,v| hash2[k] == v }.include?(false))
end
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }
end
def using_all hash1, hash2
hash1.size == hash2.size &&
hash1.all?{|k, v| hash2[k] == v}
end

def time(p, *args)
time = Time.now
1000.times { p.call *args}
Time.now - time
end

hash1 = Hash.new
hash2 = Hash.new
xx = 'a'
2000.times { hash1[xx] = xx; xx = xx.next }
2000.times { hash2[xx] = xx; xx = xx.next }

[:keysCollect, :directCollect, :directInject, :keysInject,
:using_all].each { |s|
puts
puts s
puts "equal #{time(method(s), hash1, hash1)}"
puts "not equal #{time(method(s), hash1, hash2)}"
}

---- output ----

keysCollect
equal 3.071011
not equal 1.992086

directCollect
equal 3.211872
not equal 3.447227

directInject
equal 4.202059
not equal 0.004962

keysInject
equal 3.343774
not equal 0.147189

using_all
equal 2.84662
not equal 0.003133

----

Incidentally, if you're using this to compare
equality-of-Hash-sans-default-proc, none of these are adequate in the
general case since keys may exist that map to the default value.

Perhaps:

def using_all hash1, hash2
hash1.size == hash2.size &&
hash1.all?{|k, v| hash2.has_key?(k) && hash2[k] == v}
end

Or, as mentioned elsewhere, the shorter:

{}.update(h1) == {}.update(h2)

.... which seems to have better worst-case, but worse best-case
performance.

HTH
 
R

Robert Klemme

George Ogata said:
Daniel Sheppard said:
I was doing some playing with some implementations for Hash#== and ended
up with some very counterintuitive timings.

I have four equals methods, and I would have expected their performance
to run (from best to worst):

directInject - stops evaluating == after first bad match, no
intermediate arrays
keysInject - as above, but must create keys array
directCollect - evaluated == for all items all the time, intermediate
array of results
keysCollect - as above, but must create keys array.

Your *Inject methods don't actually quit early; they just stop
evaluating the second expression once val is false. An explicit break
speeds up the quick-exit case considerably.
def directInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.inject(true) { |val, kv| val && hash2[kv[0]] == kv[1] }
hash1.inject(true) { |val, kv| val or break; hash2[kv[0]] == kv[1] }
end
def keysInject(hash1, hash2)
hash1.size == hash2.size &&
hash1.keys.inject(true) { |val, k| val && hash1[k] == hash2[k] }
hash1.keys.inject(true) { |val, k| val or break; hash1[k] == hash2[k] }

I would not use inject in this case anyway. I'd prefer returns for fast
exit:

def directInject(hash1, hash2)
return false unless hash1.size == hash2.size
hash1.each { |k, v| return false unless hash2[k] == v }
true
end
Unless you're using a really old ruby, though, Enumerable#all? should
outperform all of the above.

You mean

def directInject(hash1, hash2)
hash1.size == hash2.size and
hash1.all? { |k, v| hash2[k] == v }
end

Yeah, that's even better.

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top