Adventures in Optimization... or why CONST frozen is Good

Discussion in 'Ruby' started by John Carter, Dec 1, 2008.

  1. John Carter

    ...or when a language design level optimization is a pessimization.

    Ruby allows destructive string operations: the String instance methods
    whose names end in a "Bang!" (for example sub!).

    Consider this code.

    a = ['froot']
    b=a.first
    c = {"d"=>b}

    Now a[0], b and c["d"] all refer to _exactly_ the same string instance:

    > a[0].object_id
    => -605300798
    > b.object_id
    => -605300798
    > c["d"].object_id
    => -605300798

    So if I do a destructive operation on any of them, all are clobbered.

    a.last.sub!(/oo/,"ui")
    => "fruit"
    irb(main):009:0> b
    => "fruit"
    irb(main):010:0> c
    => {"d"=>"fruit"}
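    A minimal, self-contained sketch of the aliasing above, using equal? (object identity) instead of comparing object_id values by eye. The variable names follow the post; String.new is used only so the example behaves the same whether or not string literals are frozen on the Ruby version you run it on.

```ruby
a = [String.new('froot')]   # unfrozen String on any Ruby version
b = a.first                 # copies the reference, not the string
c = { 'd' => b }            # Hash values are stored by reference as well

puts a[0].equal?(b)         # true
puts c['d'].equal?(b)       # true

a.last.sub!(/oo/, 'ui')     # destructive: mutates the one shared String
puts b                      # fruit
puts c['d']                 # fruit
```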

    Traditionally, destructive ops have been allowed in languages such as
    Lisp as an optimization: you don't have to allocate a new object
    instance if you don't want to.
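    The trade-off is easy to see by comparing String#sub (non-destructive: returns a new String and leaves the receiver alone) with String#sub! (destructive: modifies the receiver in place and returns it, or nil when nothing matched). A small sketch; the variable names are illustrative only.

```ruby
s = String.new('froot')

t = s.sub(/oo/, 'ui')     # builds a new String; s is unchanged
puts s                    # froot
puts t.equal?(s)          # false

u = s.sub!(/oo/, 'ui')    # mutates s in place and returns s itself
puts s                    # fruit
puts u.equal?(s)          # true

puts s.sub!(/zz/, 'x').inspect  # nil: no match, s left as is
```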

    The other day I was optimizing my code and decided to hunt down
    unnecessary object allocations.

    I used my MemoryProfiler snippet to find that Strings were by far the
    most common objects I was generating.

    http://rubyforge.org/snippet/detail.php?type=snippet&id=70

    So I extended that to find _which_ was the most common string I was
    generating.

    def MemoryProfile::string_duplicates
      Dir.chdir "/tmp"
      ObjectSpace::garbage_collect
      sleep 10 # Give the GC thread a chance

      # Count live String objects by content.
      tally = Hash.new(0)
      ObjectSpace.each_object do |obj|
        next if obj.class != String
        tally[obj] += 1
      end

      # LOG_FILE is assumed to be defined elsewhere in MemoryProfile.
      open(LOG_FILE, 'a') do |outf|
        outf.puts '=' * 70
        outf.puts "\nString Duplicates report for #{$0}\n\n\n"
        # Report only strings that occur more than once, least common first.
        tally.keys.find_all { |s| tally[s] > 1 }.sort_by { |s| tally[s] }.each do |s|
          outf.puts "#{s}\t#{tally[s]}"
        end
      end
    end
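    On a current MRI the same report can be sketched more compactly, because ObjectSpace.each_object accepts a class to filter on. This version returns the counts instead of appending to a log file; the method name is reused from the post, while the return format ([string, count] pairs, most common first) is an assumption made for illustration.

```ruby
# Count live Strings by content and return [string, count] pairs for
# values that occur more than once, most common first.
def string_duplicates
  GC.start  # collect garbage first so only live strings are counted
  tally = Hash.new(0)
  ObjectSpace.each_object(String) { |s| tally[s] += 1 }
  tally.select { |_, n| n > 1 }.sort_by { |_, n| -n }
end

# Usage: print the ten most duplicated strings.
string_duplicates.first(10).each { |s, n| puts "#{n}\t#{s.inspect}" }
```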


    The answer, by a long shot, was "U".

    Somewhere in my code I had the line:
    symbols_needed[symbol_name] = 'U'

    I could replace that with the symbol :U, but other places that had
    Good Reasons for using strings would break.

    Now I have a class CONSTANT...
    UNDEFINED = 'U'.freeze

    and
    symbols_needed[symbol_name] = UNDEFINED
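    A minimal sketch of that change, assuming a hypothetical SymbolTracker class (only UNDEFINED and symbols_needed come from the post). Every entry ends up pointing at the one frozen String, so the marker costs one object in total rather than one per entry.

```ruby
# Hypothetical stand-in for the author's class.
class SymbolTracker
  UNDEFINED = 'U'.freeze   # a single String object shared by every entry

  attr_reader :symbols_needed

  def initialize
    @symbols_needed = {}
  end

  def mark_undefined(symbol_name)
    @symbols_needed[symbol_name] = UNDEFINED
  end
end

tracker = SymbolTracker.new
%w[foo bar baz].each { |name| tracker.mark_undefined(name) }

ids = tracker.symbols_needed.values.map(&:object_id).uniq
puts ids.size   # 1: three entries, one String
```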

    Of course, if anywhere I apply a destructive op to one of those
    thousands of references, my code will die.

    But at least the "freeze" will cause a loud and messy death, not a
    subtle and hidden bug.
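    What that "loud and messy death" looks like, as a small sketch. The helper try_clobber is hypothetical. On Ruby 2.5 and later the exception is FrozenError, a subclass of RuntimeError; older versions raised RuntimeError (1.9 to 2.4) or TypeError (1.8) instead.

```ruby
UNDEFINED = 'U'.freeze

# Hypothetical helper: attempt a destructive op and report what happened
# instead of letting the exception propagate.
def try_clobber(str)
  str << '!'
  :mutated
rescue RuntimeError => e   # FrozenError (Ruby 2.5+) is a RuntimeError
  e.class
end

puts try_clobber(UNDEFINED)         # FrozenError
puts try_clobber(String.new('U'))   # mutated
puts UNDEFINED                      # U (unchanged)
```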

    So, as I said at the start, the optimization of allowing the occasional
    destructive op on a string can be a pessimization: because any holder of
    a string might mutate it, every assignment of a string literal has to
    create a new String instance.

    a= "froot"
    => "froot"
    irb(main):002:0> a.object_id
    => -605331808
    irb(main):003:0> a= "froot"
    => "froot"
    irb(main):004:0> a.object_id
    => -605352198
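    A rough way to measure this on MRI is to count allocations with GC.stat(:total_allocated_objects), which is MRI-specific and available from Ruby 2.1. The helper name allocations is hypothetical. Unless frozen string literals are enabled (by magic comment or command-line option), each evaluation of the literal is expected to allocate a fresh String, while assigning the frozen constant reuses one object.

```ruby
UNDEFINED = 'U'.freeze

# Hypothetical helper: number of objects MRI allocated while the block ran.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

sink = nil
literal  = allocations { 1_000.times { sink = 'U' } }        # a new String per iteration
constant = allocations { 1_000.times { sink = UNDEFINED } }  # the same String every time
puts "literal: #{literal}, constant: #{constant}"
```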


    John Carter Phone : (64)(3) 358 6639
    Tait Electronics Fax : (64)(3) 359 4632
    PO Box 1645 Christchurch Email :
    New Zealand
    John Carter, Dec 1, 2008
    #1

  2. ara.t.howard

    On Dec 1, 2008, at 4:53 PM, John Carter wrote:

    > ...or when a language design level optimization is a pessimization.

    thanks john. interesting.

    a @ http://codeforpeople.com/
    --
    we can deny everything, except that we have the possibility of being
    better. simply reflect on that.
    h.h. the 14th dalai lama
    ara.t.howard, Dec 1, 2008
    #2

  3. > Now I have a class CONSTANT...
    > UNDEFINED = 'U'.freeze
    >
    > and
    > symbols_needed[symbol_name] = UNDEFINED


    Yes, that's the "right" solution with Ruby today, and you'll see this
    done in a lot of Ruby libraries. (Perhaps it would be nice if there were
    some syntax to define an inline frozen string literal.)
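    For readers on later versions of Ruby than this thread: since MRI 2.1, calling .freeze directly on a string literal is optimized so that it evaluates to one shared, frozen String rather than a new object each time, which comes close to the inline syntax wished for here. The helper undefined_marker is hypothetical.

```ruby
# Hypothetical helper; only the 'U' marker comes from the thread.
def undefined_marker
  'U'.freeze   # MRI 2.1+: the same frozen String on every call
end

a = undefined_marker
b = undefined_marker
puts a.frozen?     # true
puts a.equal?(b)   # true
```

    Ruby 2.3 also added the "# frozen_string_literal: true" magic comment which, placed at the top of a file, makes every plain (non-interpolated) string literal in that file frozen, so each literal evaluates to a single shared object.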

    I don't consider this any sort of "optimisation" though. It's
    fundamental to the nature of Ruby that there is only one kind of value,
    which is a reference to an object. An assignment always copies only the
    reference.

    This is a breath of fresh air when compared to, say, Perl. Is this value
    a scalar? Is it a scalar number or string, or a reference to an Array or
    a Hash, or a typeglob, or a filehandle, or ...?

    However, you could argue that string literals should have been immutable
    (like Symbol). The language would end up being somewhat different to
    use:

    a = "hello" # maybe Symbol or StringLiteral
    b = String.new(a) # mutable String
    b << " world"

    You'd also have to have a load of rules to work out. Should a.dup return
    the same Symbol, or a new mutable String? Should (a + "world") return a
    new Symbol, or a new mutable String?
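    Present-day Ruby has had to settle similar questions for frozen Strings (not for a separate literal class as sketched above). A small illustration on a recent MRI, using a frozen String to stand in for the hypothetical immutable a: String.new and dup give a mutable copy, clone keeps the frozen state, and + builds a new, unfrozen String.

```ruby
a = 'hello'.freeze   # stands in for the hypothetical immutable literal

b = String.new(a)    # an explicit mutable copy
b << ' world'

puts a                          # hello
puts b                          # hello world
puts a.dup.frozen?              # false: dup does not copy the frozen state
puts a.clone.frozen?            # true: clone does
puts((a + ' world').frozen?)    # false: + builds a new String
```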

    From this point of view, just having String keeps things simple, even if
    it does end up creating a load of garbage objects. In those cases where
    this matters, your approach (of profiling and zapping) is a good one.
    Brian Candler, Dec 2, 2008
    #3
  4. 2008/12/2 Brian Candler <>:
    >> Now I have a class CONSTANT...
    >> UNDEFINED = 'U'.freeze
    >>
    >> and
    >> symbols_needed[symbol_name] = UNDEFINED

    >
    > Yes, that's the "right" solution with Ruby today, and you'll see this
    > done in a lot of Ruby libraries. (Perhaps it would be nice if there were
    > some syntax to define an inline frozen string literal)
    >
    > I don't consider this any sort of "optimisation" though. It's
    > fundamental to the nature of Ruby that there is only one kind of value,
    > which is a reference to an object. An assignment always copies only the
    > reference.


    Completely agree, this comes as no surprise. Actually, this is an
    obvious design decision: if you want to use the same value to denote a
    particular state, then just use one object.

    Another, probably more subtle, issue is this:

    irb(main):001:0> s="foo"
    => "foo"
    irb(main):002:0> h={s=>1}
    => {"foo"=>1}
    irb(main):003:0> s.equal? h.keys.first
    => false
    irb(main):004:0> [s.object_id, h.keys.first.object_id]
    => [1073539250, 1073539280]
    irb(main):005:0> s.freeze
    => "foo"
    irb(main):006:0> h={s=>1}
    => {"foo"=>1}
    irb(main):007:0> s.equal? h.keys.first
    => true
    irb(main):008:0> [s.object_id, h.keys.first.object_id]
    => [1073539250, 1073539250]

    In other words, there is a hidden dup going on if the Hash key is a
    String which is not frozen.
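    The same behaviour as a self-contained sketch, using Hash#[]= explicitly: an unfrozen String key is stored as a frozen copy (the caller's String is left unfrozen), while an already-frozen key is stored as-is. String.new is used so the result does not depend on whether literals are frozen.

```ruby
mutable_key = String.new('foo')
h1 = {}
h1[mutable_key] = 1
stored1 = h1.keys.first
puts stored1.equal?(mutable_key)   # false: Hash#[]= stored a copy
puts stored1.frozen?               # true
puts mutable_key.frozen?           # false: the caller's String is untouched

frozen_key = String.new('foo').freeze
h2 = {}
h2[frozen_key] = 1
puts h2.keys.first.equal?(frozen_key)  # true: no copy needed
```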

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    Robert Klemme, Dec 2, 2008
    #4
  5. F. Senault

    On 2 December at 16:33, Robert Klemme wrote:

    > In other words, there is a hidden dup going on if the Hash key is a
    > String which is not frozen.


    For some values of "hidden":

    16:41 grappa:~> qri 'Hash#[]='
    --------------------------------------------------------------- Hash#[]=
    hsh[key] = value => value
    hsh.store(key, value) => value
    ------------------------------------------------------------------------
    Element Assignment---Associates the value given by value with the
    key given by key. key should not have its value changed while it
    is in use as a key (a String passed as a key will be duplicated
    and frozen).

    Fred
    --
    Assignments: telling a variable what it stands for, and/or what value(s)
    it should have is coercive and paternalistic: variables should be free
    to choose their own names and value-sets from a range of non-sexist,
    non-racist options. (Tanuki in the SDM, on politically-correct coding)
    F. Senault, Dec 2, 2008
    #5