kryglik said:
Well, insertion order will be same, won't be?
Not necessarily. If Marshal.dump uses the current order in the hash for
writing then the insertion order during loading is likely to be
different from the original insertion order.
h = {}
h_slovo = []
if not true #HERE YOU SWITCH IF GENERATE/LOAD
for x in 1..40_000
if x % 2000 == 0
puts x
end
slovo = ""
for z in 1..32
slovo += (rand(25)+65).chr
end
if rand(10) == 3
h_slovo << slovo
end
h.update slovo => rand(1_000_000)
slovo = nil
end
f = File.new("hash.marshal","w+")
f.puts(Marshal.dump(h))
f.close
f = File.new("hash1.marshal","w+")
f.puts(Marshal.dump(h_slovo))
f.close
else
puts "loading hash"
h = Marshal.load(File.open("hash.marshal","r"))
puts "loading hash1"
h_slovo = Marshal.load(File.open("hash1.marshal","r"))
puts "done"
end
t1 = Time.now
for slovo1 in h_slovo
h.has_key? h_slovo
end
t2 = Time.now
puts "hash: " + h.size.to_s
puts "hled: " + h_slovo.size.to_s
puts t2-t1
Your time measurement is far too unprecise on a modern system. You'll
have to repeat lookups to get more accurate results. I get a difference
between 5 and 10 percent with the attached code - regardless of GC
enabled or disabled. Strange though that the copy is always slower.
This is an interesting issue.
Note, if you freeze keys lookups in the original hash are much faster
because then hash keys are not copied on insertion (an optimisation of
unfrozen Strings as hash keys), i.e., during lookup if the key is the
same it's also the same object which will be the first test - and of
course this test is much faster than sequential comparison of characters.
Another note, a more realistic scenario would be if not 10% of all keys
were used for lookup but if there also were keys that are not present in
the hash.
Kind regards
robert