Adventures in Optimization... or why CONST frozen is Good

John Carter

...or when a language design level optimization is a pessimization.

Ruby allows destructive string operations: the String instance methods
with a "Bang!" at the end.

Consider this code.

a = ['froot']
b=a.first
c = {"d"=>b}

Now a[0], b, c["d"] refer to _exactly_ the same string instance
a[0].object_id => -605300798
b.object_id => -605300798
c["d"].object_id => -605300798

So if I do a destructive operation on any of them, all are clobbered.

a.last.sub!(/oo/,"ui")
=> "fruit"
irb(main):009:0> b
=> "fruit"
irb(main):010:0> c
=> {"d"=>"fruit"}

Traditionally destructive ops have been allowed in languages such as
Lisp etc. as an optimization. You don't have to "new" a new object
instance if you don't want to.
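
A minimal sketch of that trade-off, using only core String methods: sub
builds a new instance, while sub! edits the receiver in place and hands
the same instance back. The variable names are only illustrative.

```ruby
# Start from a mutable copy so the example does not depend on how
# string literals are configured.
original = 'froot'.dup

# Non-destructive: allocates and returns a new String.
copy = original.sub(/oo/, 'ui')

# Destructive: rewrites the receiver and returns that same object.
result = original.sub!(/oo/, 'ui')

puts copy.equal?(original)    # false: a separate instance
puts result.equal?(original)  # true: no new instance was created
puts original                 # fruit
```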

The other day I was optimizing my code, when I decided to hunt
unnecessary object allocation.

I used my MemoryProfiler snippet to find that Strings were by far the
most common object I was generating.

http://rubyforge.org/snippet/detail.php?type=snippet&id=70

So I extended that to find _which_ was the most common string I was
generating.

def MemoryProfile::string_duplicates
  Dir.chdir "/tmp"
  ObjectSpace::garbage_collect
  sleep 10 # Give the GC thread a chance

  # Count how many live String instances share each value.
  tally = Hash.new(0)
  ObjectSpace.each_object do |obj|
    next if obj.class != String
    tally[obj]+=1
  end

  open( LOG_FILE, 'a') do |outf|
    outf.puts '='*70
    outf.puts "
String Duplicates report for #{$0}


"
    # Report only values that occur more than once, rarest first.
    tally.keys.find_all{|s| tally[s] > 1}.sort_by{|s| tally[s]}.each do |s|
      outf.puts "#{s}\t#{tally[s]}"
    end
  end
end


The answer, by a long shot, was "U".

Somewhere in my code I had the line
symbols_needed[symbol_name] = 'U'

I could replace that with the symbol :U, but other places that had
Good Reasons for using strings would break.

Now I have a class CONSTANT...
UNDEFINED = 'U'.freeze

and
symbols_needed[symbol_name] = UNDEFINED

Of course, if anywhere I apply a destructive op to one of those
thousands of references, my code will die.

But at least the "freeze" will cause a loud and messy death, not a
subtle and hidden bug.
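
Here is a small sketch of that loud death. The symbols_needed hash and
its keys are hypothetical stand-ins for the real table, and the rescue
is deliberately broad because the exact exception class raised for a
frozen object differs between Ruby versions.

```ruby
UNDEFINED = 'U'.freeze

# Hypothetical stand-in for the real symbol table.
symbols_needed = { 'printf' => UNDEFINED, 'malloc' => UNDEFINED }

outcome =
  begin
    # A destructive op on any one of the shared references...
    symbols_needed['printf'].sub!(/U/, 'D')
    :mutated
  rescue StandardError => e
    # ...is refused because the shared string is frozen.
    e.class
  end

puts "destructive op ended with: #{outcome}"
puts symbols_needed['malloc']   # still U
```

Without the freeze, the sub! would have succeeded and silently changed
every entry that shares the instance.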

So as I said at the start, the optimization to allow the occasional
destructive op to a string... can be a pessimization in every case where
you assign a string literal.

irb(main):001:0> a= "froot"
=> "froot"
irb(main):002:0> a.object_id
=> -605331808
irb(main):003:0> a= "froot"
=> "froot"
irb(main):004:0> a.object_id
=> -605352198
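
The same effect at scale, as a rough sketch. It assumes the default of
mutable string literals, and UNDEFINED_MARK is a made-up constant name
used only for this illustration.

```ruby
UNDEFINED_MARK = 'U'.freeze   # hypothetical constant name

# Every evaluation of the literal builds a fresh String; the constant
# always hands back the one frozen instance. Both arrays keep their
# strings alive, so the object_ids can be compared safely.
from_literal  = Array.new(1000) { 'U' }
from_constant = Array.new(1000) { UNDEFINED_MARK }

literal_instances  = from_literal.map { |s| s.object_id }.uniq.size
constant_instances = from_constant.map { |s| s.object_id }.uniq.size

puts "literal:  #{literal_instances} distinct instances"
puts "constant: #{constant_instances} distinct instance"
```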


John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand
 
ara.t.howard

thanks john. interesting.

a @ http://codeforpeople.com/
 
Brian Candler

Now I have a class CONSTANT...
UNDEFINED = 'U'.freeze

and
symbols_needed[symbol_name] = UNDEFINED

Yes, that's the "right" solution with Ruby today, and you'll see this
done in a lot of Ruby libraries. (Perhaps it would be nice if there were
some syntax to define an inline frozen string literal.)

I don't consider this any sort of "optimisation" though. It's
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.

This is a breath of fresh air when compared to, say, Perl. Is this value
a scalar? Is it a scalar number or string, or a reference to an Array or
a Hash, or a typeglob, or a filehandle, or ...?

However, you could argue that string literals should have been immutable
(like Symbol). The language would end up being somewhat different to
use:

a = "hello" # maybe Symbol or StringLiteral
b = String.new(a) # mutable String
b << " world"

You'd also have to have a load of rules to work out. Should a.dup return
the same Symbol, or a new mutable String? Should (a + "world") return a
new Symbol, or a new mutable String?

From this point of view, just having String keeps things simple, even if
it does end up creating a load of garbage objects. In those cases where
this matters, your approach (of profiling and zapping) is a good one.
 
Robert Klemme

2008/12/2 Brian Candler said:
Now I have a class CONSTANT...
UNDEFINED = 'U'.freeze

and
symbols_needed[symbol_name] = UNDEFINED

Yes, that's the "right" solution with Ruby today, and you'll see this
done in a lot of Ruby libraries. (Perhaps it would be nice if there were
some syntax to define an inline frozen string literal)

I don't consider this any sort of "optimisation" though. It's
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.

Completely agree, this comes as no surprise. Actually, this is an
obvious design decision: if you want to use the same value to denote a
particular state, then just use one object.

Another, probably more subtle issue is this:

irb(main):001:0> s="foo"
=> "foo"
irb(main):002:0> h={s=>1}
=> {"foo"=>1}
irb(main):003:0> s.equal? h.keys.first
=> false
irb(main):004:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539280]
irb(main):005:0> s.freeze
=> "foo"
irb(main):006:0> h={s=>1}
=> {"foo"=>1}
irb(main):007:0> s.equal? h.keys.first
=> true
irb(main):008:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539250]

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

Kind regards

robert
 
F. Senault

On 2 December at 16:33, Robert Klemme wrote:
In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

For some values of "hidden":

16:41 grappa:~> qri 'Hash#[]='
--------------------------------------------------------------- Hash#[]=
hsh[key] = value => value
hsh.store(key, value) => value
------------------------------------------------------------------------
Element Assignment---Associates the value given by value with the
key given by key. key should not have its value changed while it
is in use as a key (a String passed as a key will be duplicated
and frozen).
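
A short sketch of what that documented behaviour means in practice. The
variable names are illustrative only, and it uses Hash#[]= directly,
since that is the method the documentation above describes.

```ruby
s = 'foo'.dup            # an unfrozen String
h = {}
h[s] = 1
stored = h.keys.first

puts stored.equal?(s)    # false: the Hash kept its own copy
puts stored.frozen?      # true

s << 'bar'               # mutating the original leaves the key alone
puts h.key?('foo')       # true

frozen = 'baz'.freeze
h2 = {}
h2[frozen] = 1
puts h2.keys.first.equal?(frozen)   # true: already frozen, so no copy
```

So freezing a string before using it as a key avoids one extra String
per insertion.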

Fred
 
