Some oddball optimization tricks...

John Carter

[Note: parts of this message were removed to make it a legal post.]

Here are two idioms which you may find useful, or at least curious, when
trying to optimize the heck out of a Ruby script....

require 'pp'

# Similar to the "autosequence" facility in SQL.
# Useful for replacing a complex key with a very simple POD proxy object.
file_sequence = 0
file_index = Hash.new{|hash,key| file_sequence+=1;hash[key] = file_sequence}


p file_index["foo"]
p file_index["foo"]
p file_index["bah"]
p file_index["foo"]
p file_index["bah"]
pp file_index


# Similar to "to_sym", but can cope with spaces and weird characters...
# So if tom1, tom2, tom3... go out of scope they can be garbage collected...
# so if you end up just holding tomtom, you have only one copy of "tom" and
# you can test for equality with .object_id == .object_id!
one_true_string = Hash.new{|hash,string| hash[string] = string}

tom1 = "tom"
tom2 = "tom"
tom3 = "#{tom1}"
tom4 = tom2.clone
tom5 = "t"+"o"+"m"

toms = [tom1, tom2, tom3, tom4, tom5]

pp toms
pp toms.collect{|tom| tom.object_id }
tomtom = toms.collect{|tom| one_true_string[tom] }
pp tomtom
pp tomtom.collect{|tom| tom.object_id }

Running the above results in...
1
1
2
1
2
{"foo"=>1, "bah"=>2}
["tom", "tom", "tom", "tom", "tom"]
[-609503848, -609503908, -609503888, -609503918, -609503978]
["tom", "tom", "tom", "tom", "tom"]
[-609503848, -609503848, -609503848, -609503848, -609503848]
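
To make the "proxy object" idea concrete, here is a minimal sketch, assuming the complex keys are [directory, filename] arrays. That choice, and the names path_id, records and totals, are hypothetical and not from the original script. Each distinct pair is handed a small integer the first time it is seen, and later bookkeeping keys off that integer instead of the array.

```ruby
# Hypothetical composite keys: [directory, filename] pairs.
sequence = 0
path_id = Hash.new { |hash, key| hash[key] = (sequence += 1) }

# Illustrative data: [composite key, size] records.
records = [
  [["/usr/lib", "a.rb"], 10],
  [["/usr/lib", "b.rb"], 20],
  [["/usr/lib", "a.rb"], 30],
]

# Accumulate sizes per small integer id rather than per array key.
totals = Hash.new(0)
records.each { |key, size| totals[path_id[key]] += size }

p totals
```

The composite key is hashed once per lookup in path_id. Everything downstream then works with plain integers.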

--
John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand

 
David Masover

# Similar to the "autosequence" facility in SQL.
# Useful for replacing a complex key with a very simple POD proxy object.
file_sequence = 0
file_index = Hash.new{|hash,key| file_sequence+=1;hash[key] = file_sequence}

p file_index["foo"]
p file_index["foo"]
p file_index["bah"]
p file_index["foo"]
p file_index["bah"]
pp file_index

I guess I can think of a few rare instances this might be useful, but I've
never had the keys be the bottleneck, or re-used keys enough in a script for
this to matter. Still, interesting.
# Similar to "to_sym", but can cope with spaces and weird characters...

...what?

irb> 'foo bar'.to_sym
=> :"foo bar"
irb> "a@$bc☃!d\t".to_sym
=> :"a@$bc☃!d\t"

Worst thing that happens is 1.8 turns my beautiful Unicode into ugly hex
escapes when pretty-printing, since it's all just a binary string to 1.8. But
if I print it straight out with puts, I get the original Unicode stuff back,
and 1.9.1 handles this gracefully.

I mean, I've got a friggin' SNOWMAN in there. Just what characters have you
discovered that you can't make a symbol out of?

And, as it suggests, you can use :"foo" or :'foo' as shorthand for the above.
# So if tom1, tom2, tom3... go out of scope they can be garbage collected...

Same is true of any string you call to_sym on.
# you can test for equality with .object_id == .object_id!

I don't know if String is smart enough to do that (I'd hope so), but Symbol
certainly would be.

So what does this solve over just using symbols, other than the fact that you
could manually prune the one_true_string hash?
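
For comparison, a small sketch of the Symbol behaviour described above: to_sym accepts spaces and punctuation, and equal strings always map to the same Symbol object. One point worth stating as background rather than as part of the thread: on the 1.8 and 1.9 interpreters discussed here symbols are never garbage collected (dynamically created symbols only became collectable in Ruby 2.2), so a hash you can prune yourself does behave differently in that respect.

```ruby
a = "foo bar".to_sym
b = ["foo", "bar"].join(" ").to_sym   # built at run time from separate pieces

p a.equal?(b)       # => true, one Symbol object per distinct name
p a == :"foo bar"   # => true, the quoted literal names the same symbol

# Punctuation and control characters survive the round trip.
odd = "a@!d\t"
p odd.to_sym.to_s == odd   # => true
```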
 
Robert Klemme

2010/8/26 John Carter said:
Here are two idioms which you may find useful, or at least curious, when
trying to optimize the heck out of a Ruby script....

require 'pp'

# Similar to the "autosequence" facility in SQL.
# Useful for replacing a complex key with a very simple POD proxy object.
file_sequence = 0
file_index = Hash.new{|hash,key| file_sequence+=1;hash[key] = file_sequence}


p file_index["foo"]
p file_index["foo"]
p file_index["bah"]
p file_index["foo"]
p file_index["bah"]
pp file_index

I prefer

irb(main):001:0> file_index = Hash.new {|h,k| h[k] = h.size}
=> {}
irb(main):002:0> file_index["foo"]
=> 0
irb(main):003:0> file_index["foo"]
=> 0
irb(main):004:0> file_index["bah"]
=> 1
irb(main):005:0> file_index["foo"]
=> 0
irb(main):006:0> file_index["bah"]
=> 1
irb(main):007:0> file_index
=> {"foo"=>0, "bah"=>1}
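
The same idea as a plain script, with one caveat that is not raised in the thread: because each id comes from h.size, ids stay unique only as long as nothing is ever deleted from the hash. The variable ids is illustrative.

```ruby
file_index = Hash.new { |h, k| h[k] = h.size }

ids = %w[foo foo bah foo bah].map { |name| file_index[name] }
p ids                  # => [0, 0, 1, 0, 1]

# Deleting an entry shrinks the hash, so the next new key reuses an id.
file_index.delete("foo")
p file_index["baz"]    # => 1, the id "bah" already holds
```

If entries can be removed, the explicit counter from the original post is the safer choice, since it never goes backwards.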
# Similar to "to_sym", but can cope with spaces and weird characters...
# So if tom1, tom2, tom3... go out of scope they can be garbage collected...
# so if you end up just holding tomtom, you have only one copy of "tom" and
# you can test for equality with .object_id == .object_id!

We have #equal? for that.
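
A quick sketch of the difference, using String.new so the two strings are guaranteed to be separate objects: == compares contents, while equal? is true only for the very same object, which is what comparing object_id values checks.

```ruby
a = String.new("tom")
b = String.new("tom")

p a == b                        # => true  (same contents)
p a.equal?(b)                   # => false (two distinct objects)
p a.equal?(a)                   # => true
p a.object_id == b.object_id    # => false, agrees with equal?
```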
one_true_string = Hash.new{|hash,string| hash[string] = string}

Note that this wastes a bit of memory: Hash#[]= stores a frozen copy of any
unfrozen String key, so the key and the value end up as two separate objects:

irb(main):009:0> one_true_string = Hash.new{|hash,string| hash[string] = string}
=> {}
irb(main):010:0> one_true_string["a"]
=> "a"
irb(main):011:0> one_true_string.each {|k,v| puts k.equal? v}
false
=> {"a"=>"a"}

Better freeze the string:

irb(main):012:0> one_true_string = Hash.new{|hash,string| hash[string.freeze] = string}
=> {}
irb(main):013:0> one_true_string["a"]
=> "a"
irb(main):014:0> one_true_string.each {|k,v| puts k.equal? v}
true
=> {"a"=>"a"}
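
Both variants side by side as a script, for anyone who wants to check this outside irb. The names copying and sharing are illustrative. One side effect to be aware of: freezing inside the block also freezes the caller's string, so this sketch assumes callers never mutate a string after interning it.

```ruby
copying = Hash.new { |hash, string| hash[string] = string }
sharing = Hash.new { |hash, string| hash[string.freeze] = string }

s1 = String.new("a")
s2 = String.new("a")

copying[s1]
sharing[s2]

# Unfrozen key: the hash stored its own frozen copy, so key and value differ.
p copying.keys.first.equal?(copying.values.first)   # => false
# Frozen key: stored as is, so key and value are one object.
p sharing.keys.first.equal?(sharing.values.first)   # => true
# The caller's string is now frozen as well.
p s2.frozen?                                        # => true
```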

Cheers

robert
 
