strange speeds

Discussion in 'Ruby' started by kryglik, Jan 2, 2006.

  1. kryglik

    kryglik Guest

    Hello ruby people,
    I've tried to make some speed test in ruby to find out how fast is
    searching in hash (using has_key?)

    The results were OK but I discovered curious thing -- when i generate
    big hash the speed is different when i unserialize (marshal.load) the
    same hash.

    Unserialized Hash is 1.5x faster in has_key? method. Tried it with
    GC.disable as well, results were same.

    What am I missing?

    Thanx for any ideas
     
    kryglik, Jan 2, 2006
    #1
    1. Advertising

  2. kryglik <> wrote:
    > Hello ruby people,
    > I've tried to make some speed test in ruby to find out how fast is
    > searching in hash (using has_key?)
    >
    > The results were OK but I discovered curious thing -- when i generate
    > big hash the speed is different when i unserialize (marshal.load) the
    > same hash.
    >
    > Unserialized Hash is 1.5x faster in has_key? method. Tried it with
    > GC.disable as well, results were same.
    >
    > What am I missing?
    >
    > Thanx for any ideas


    The first reason that comes to mind is different insertion order. But I
    hardly believe it could make such a big difference. Care to post your
    testing code?

    Kind regards

    robert
     
    Robert Klemme, Jan 2, 2006
    #2
    1. Advertising

  3. kryglik

    kryglik Guest

    Robert Klemme wrote:
    > kryglik <> wrote:
    >> Hello ruby people,
    >> I've tried to make some speed test in ruby to find out how fast is
    >> searching in hash (using has_key?)
    >>
    >> The results were OK but I discovered curious thing -- when i generate
    >> big hash the speed is different when i unserialize (marshal.load) the
    >> same hash.
    >>
    >> Unserialized Hash is 1.5x faster in has_key? method. Tried it with
    >> GC.disable as well, results were same.
    >>
    >> What am I missing?
    >>
    >> Thanx for any ideas

    >
    > The first reason that comes to mind is different insertion order. But I
    > hardly believe it could make such a big difference. Care to post your
    > testing code?
    >
    > Kind regards
    >
    > robert
    >


    Well, insertion order will be same, won't be?

    h = {}
    h_slovo = []

    if not true #HERE YOU SWITCH IF GENERATE/LOAD
    for x in 1..40_000
    if x % 2000 == 0
    puts x
    end

    slovo = ""

    for z in 1..32
    slovo += (rand(25)+65).chr
    end

    if rand(10) == 3
    h_slovo << slovo
    end

    h.update slovo => rand(1_000_000)
    slovo = nil
    end

    f = File.new("hash.marshal","w+")
    f.puts(Marshal.dump(h))
    f.close

    f = File.new("hash1.marshal","w+")
    f.puts(Marshal.dump(h_slovo))
    f.close
    else

    puts "loading hash"
    h = Marshal.load(File.open("hash.marshal","r"))
    puts "loading hash1"
    h_slovo = Marshal.load(File.open("hash1.marshal","r"))
    puts "done"
    end

    t1 = Time.now

    for slovo1 in h_slovo
    h.has_key? h_slovo
    end

    t2 = Time.now

    puts "hash: " + h.size.to_s
    puts "hled: " + h_slovo.size.to_s
    puts t2-t1
     
    kryglik, Jan 2, 2006
    #3
  4. kryglik <> wrote:
    > Robert Klemme wrote:
    >> kryglik <> wrote:
    >>> Hello ruby people,
    >>> I've tried to make some speed test in ruby to find out how fast is
    >>> searching in hash (using has_key?)
    >>>
    >>> The results were OK but I discovered curious thing -- when i
    >>> generate big hash the speed is different when i unserialize
    >>> (marshal.load) the same hash.
    >>>
    >>> Unserialized Hash is 1.5x faster in has_key? method. Tried it with
    >>> GC.disable as well, results were same.
    >>>
    >>> What am I missing?
    >>>
    >>> Thanx for any ideas

    >>
    >> The first reason that comes to mind is different insertion order.
    >> But I hardly believe it could make such a big difference. Care to
    >> post your testing code?
    >>
    >> Kind regards
    >>
    >> robert
    >>

    >
    > Well, insertion order will be same, won't be?


    Not necessarily. If Marshal.dump uses the current order in the hash for
    writing then the insertion order during loading is likely to be different
    from the original insertion order.

    > h = {}
    > h_slovo = []
    >
    > if not true #HERE YOU SWITCH IF GENERATE/LOAD
    > for x in 1..40_000
    > if x % 2000 == 0
    > puts x
    > end
    >
    > slovo = ""
    >
    > for z in 1..32
    > slovo += (rand(25)+65).chr
    > end
    >
    > if rand(10) == 3
    > h_slovo << slovo
    > end
    >
    > h.update slovo => rand(1_000_000)
    > slovo = nil
    > end
    >
    > f = File.new("hash.marshal","w+")
    > f.puts(Marshal.dump(h))
    > f.close
    >
    > f = File.new("hash1.marshal","w+")
    > f.puts(Marshal.dump(h_slovo))
    > f.close
    > else
    >
    > puts "loading hash"
    > h = Marshal.load(File.open("hash.marshal","r"))
    > puts "loading hash1"
    > h_slovo = Marshal.load(File.open("hash1.marshal","r"))
    > puts "done"
    > end
    >
    > t1 = Time.now
    >
    > for slovo1 in h_slovo
    > h.has_key? h_slovo
    > end
    >
    > t2 = Time.now
    >
    > puts "hash: " + h.size.to_s
    > puts "hled: " + h_slovo.size.to_s
    > puts t2-t1


    Your time measurement is far too unprecise on a modern system. You'll have
    to repeat lookups to get more accurate results. I get a difference between 5
    and 10 percent with the attached code - regardless of GC enabled or
    disabled. Strange though that the copy is always slower. This is an
    interesting issue.

    Note, if you freeze keys lookups in the original hash are much faster
    because then hash keys are not copied on insertion (an optimisation of
    unfrozen Strings as hash keys), i.e., during lookup if the key is the same
    it's also the same object which will be the first test - and of course this
    test is much faster than sequential comparison of characters.

    Another note, a more realistic scenario would be if not 10% of all keys were
    used for lookup but if there also were keys that are not present in the
    hash.

    Kind regards

    robert
     
    Robert Klemme, Jan 2, 2006
    #4
  5. kryglik

    kryglik Guest

    Hello
    thanx for nice script!

    I find out that i got a little error :eek:) [see the 'for' cycle in
    benchmark part:
    for slovo1 in h_slovo
    h.has_key? h_slovo (SHOULD NOT BE h_slovo BUT slovo1)
    end

    so when I run your script i got confused about speeds like 1-3s instead
    of my 60s. The difference between marshaled and generated was like 35s
    and 60s (10000 keys in 100.000 hash).

    Still the mysteria continues -- why the different speeds.

    And also thanx for lecture of ruby optimization - not quite a guru (yet
    :-D).

    Benchmark is a nice module :)
    Robert Klemme wrote:
    > kryglik <> wrote:
    >
    >> Robert Klemme wrote:
    >>
    >>> kryglik <> wrote:
    >>>
    >>>> Hello ruby people,
    >>>> I've tried to make some speed test in ruby to find out how fast is
    >>>> searching in hash (using has_key?)
    >>>>
    >>>> The results were OK but I discovered curious thing -- when i
    >>>> generate big hash the speed is different when i unserialize
    >>>> (marshal.load) the same hash.
    >>>>
    >>>> Unserialized Hash is 1.5x faster in has_key? method. Tried it with
    >>>> GC.disable as well, results were same.
    >>>>
    >>>> What am I missing?
    >>>>
    >>>> Thanx for any ideas
    >>>
    >>>
    >>> The first reason that comes to mind is different insertion order.
    >>> But I hardly believe it could make such a big difference. Care to
    >>> post your testing code?
    >>>
    >>> Kind regards
    >>>
    >>> robert
    >>>

    >>
    >> Well, insertion order will be same, won't be?

    >
    >
    > Not necessarily. If Marshal.dump uses the current order in the hash for
    > writing then the insertion order during loading is likely to be
    > different from the original insertion order.
    >
    >> h = {}
    >> h_slovo = []
    >>
    >> if not true #HERE YOU SWITCH IF GENERATE/LOAD
    >> for x in 1..40_000
    >> if x % 2000 == 0
    >> puts x
    >> end
    >>
    >> slovo = ""
    >>
    >> for z in 1..32
    >> slovo += (rand(25)+65).chr
    >> end
    >>
    >> if rand(10) == 3
    >> h_slovo << slovo
    >> end
    >>
    >> h.update slovo => rand(1_000_000)
    >> slovo = nil
    >> end
    >>
    >> f = File.new("hash.marshal","w+")
    >> f.puts(Marshal.dump(h))
    >> f.close
    >>
    >> f = File.new("hash1.marshal","w+")
    >> f.puts(Marshal.dump(h_slovo))
    >> f.close
    >> else
    >>
    >> puts "loading hash"
    >> h = Marshal.load(File.open("hash.marshal","r"))
    >> puts "loading hash1"
    >> h_slovo = Marshal.load(File.open("hash1.marshal","r"))
    >> puts "done"
    >> end
    >>
    >> t1 = Time.now
    >>
    >> for slovo1 in h_slovo
    >> h.has_key? h_slovo
    >> end
    >>
    >> t2 = Time.now
    >>
    >> puts "hash: " + h.size.to_s
    >> puts "hled: " + h_slovo.size.to_s
    >> puts t2-t1

    >
    >
    > Your time measurement is far too unprecise on a modern system. You'll
    > have to repeat lookups to get more accurate results. I get a difference
    > between 5 and 10 percent with the attached code - regardless of GC
    > enabled or disabled. Strange though that the copy is always slower.
    > This is an interesting issue.
    >
    > Note, if you freeze keys lookups in the original hash are much faster
    > because then hash keys are not copied on insertion (an optimisation of
    > unfrozen Strings as hash keys), i.e., during lookup if the key is the
    > same it's also the same object which will be the first test - and of
    > course this test is much faster than sequential comparison of characters.
    >
    > Another note, a more realistic scenario would be if not 10% of all keys
    > were used for lookup but if there also were keys that are not present in
    > the hash.
    >
    > Kind regards
    >
    > robert
     
    kryglik, Jan 2, 2006
    #5
  6. kryglik <> wrote:
    > Hello
    > thanx for nice script!
    >
    > I find out that i got a little error :eek:) [see the 'for' cycle in
    > benchmark part:
    > for slovo1 in h_slovo
    > h.has_key? h_slovo (SHOULD NOT BE h_slovo BUT slovo1)
    > end
    >
    > so when I run your script i got confused about speeds like 1-3s
    > instead of my 60s. The difference between marshaled and generated was
    > like 35s and 60s (10000 keys in 100.000 hash).


    In this case it's probably due to hash calculation for the hash you
    accidentally used as key.

    > Still the mysteria continues -- why the different speeds.


    Yeah, I guess we would have to dive into the sources to find out...

    > And also thanx for lecture of ruby optimization - not quite a guru
    > (yet :-D).


    Take your time. :)

    > Benchmark is a nice module :)


    It really is!

    Kind regards

    robert
     
    Robert Klemme, Jan 3, 2006
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Paul

    testing algorithm speeds

    Paul, Oct 19, 2004, in forum: C++
    Replies:
    2
    Views:
    341
  2. Joshua Jung
    Replies:
    2
    Views:
    5,078
    Joshua Jung
    Jun 30, 2006
  3. Chaos

    Urllib2/Pycurl/HTTP Speeds?

    Chaos, Aug 8, 2006, in forum: Python
    Replies:
    0
    Views:
    783
    Chaos
    Aug 8, 2006
  4. Q about ASP speeds

    , Sep 6, 2006, in forum: ASP .Net
    Replies:
    4
    Views:
    495
  5. Steve Hawkwind

    execution speeds

    Steve Hawkwind, Sep 11, 2006, in forum: Java
    Replies:
    3
    Views:
    311
    Daniel Dyer
    Sep 11, 2006
Loading...

Share This Page