Why doesn't Ruby "compile" strings?

  • Thread starter Iñaki Baz Castillo

Iñaki Baz Castillo

Hi, the following code:

-------------------
#!/usr/bin/ruby

require "benchmark"

HELLO_WORLD = "hello world"

1.upto(4) do

  print "Benchmark using HELLO_WORLD: "
  puts Benchmark.realtime { 1.upto(500000) {|i| HELLO_WORLD.upcase } }

  print "Benchmark using \"hello_world\": "
  puts Benchmark.realtime { 1.upto(500000) {|i| "hello_world".upcase } }

  print "Benchmark using 'hello_world': "
  puts Benchmark.realtime { 1.upto(500000) {|i| 'hello_world'.upcase } }

end
-------------------


gives these results:

--------------------
Benchmark using HELLO_WORLD: 1.1907217502594
Benchmark using "hello_world": 1.53604388237
Benchmark using 'hello_world': 0.816991806030273
Benchmark using HELLO_WORLD: 0.599252462387085
Benchmark using "hello_world": 0.814466714859009
Benchmark using 'hello_world': 0.812573194503784
Benchmark using HELLO_WORLD: 0.595503330230713
Benchmark using "hello_world": 0.813859701156616
Benchmark using 'hello_world': 0.813681602478027
Benchmark using HELLO_WORLD: 0.594272136688232
Benchmark using "hello_world": 0.815742254257202
Benchmark using 'hello_world': 0.811828136444092
--------------------


Let's take the last result so Ruby is "entirely loaded":

--------------------
Benchmark using HELLO_WORLD: 0.594272136688232
Benchmark using "hello_world": 0.815742254257202
Benchmark using 'hello_world': 0.811828136444092
--------------------


This clearly shows that using a constant string is faster than using a string
written into the script. So I wonder: why doesn't Ruby "precompile" internally
the strings appearing in the script?

That is, when the Ruby interpreter is parsing the script and finds "hello_world",
couldn't it create the string *just once* and keep it in memory forever, so that
the next time the same string is accessed Ruby doesn't need to instantiate it?
Is it impossible due to the design of Ruby?

PS: I don't know if other languages (Python, PHP, Perl...) do it or not.


-- 
Iñaki Baz Castillo <[email protected]>
 

Marnen Laibow-Koser

Iñaki Baz Castillo wrote:
[...]
> This clearly shows that using a constant string is faster than using a
> string written into the script. So I wonder: why doesn't Ruby "precompile"
> internally the strings appearing in the script?

Well, it would make garbage collection difficult.

> That is, when the Ruby interpreter is parsing the script and finds
> "hello_world", couldn't it create the string *just once* and keep it in
> memory forever, so that the next time the same string is accessed Ruby
> doesn't need to instantiate it?
> Is it impossible due to the design of Ruby?

It's not impossible at all: just use symbols instead of strings.

> PS: I don't know if other languages (Python, PHP, Perl...) do it or not.

I don't think PHP interns strings.

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)
 

Kirk Haines

> This clearly shows that using a constant string is faster than using a
> string written into the script. So I wonder: why doesn't Ruby "precompile"
> internally the strings appearing in the script?

This is because when you are using the constant, you are referring to the
same object every time.

When you are using the string literals, the interpreter doesn't know what
you are going to do with that string literal, so it's not really safe for it
to assume that it can use a single Ruby object to represent all instances of
it. Consequently, it creates a new object each time.

So 500000.times { 'foo' } creates 500000 objects. That's obviously going to
take more time than FOO = 'foo'; 500000.times { FOO } as that code just
looks up a constant 500000 times.
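
The difference between the two loops can be checked by comparing object identities. What follows is only a minimal sketch: it assumes string literals are not frozen (the `frozen_string_literal` magic comment is set to false explicitly), and the variable names are illustrative rather than taken from the thread.

```ruby
# frozen_string_literal: false
# Illustrative sketch: each evaluation of a string literal allocates a new
# String, while a constant always refers to the same object.

FOO = 'foo'

# Evaluate the same literal three times and keep the resulting objects alive.
from_literals = Array.new(3) { 'foo' }
# Look the constant up three times.
from_constant = Array.new(3) { FOO }

# Count how many distinct objects each approach produced.
literal_objects  = from_literals.map(&:object_id).uniq.size
constant_objects = from_constant.map(&:object_id).uniq.size

puts "distinct objects from the literal:  #{literal_objects}"
puts "distinct objects from the constant: #{constant_objects}"
```

Under the stated assumption, the literal should yield one object per evaluation and the constant a single shared object.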


Kirk Haines
 

Iñaki Baz Castillo

On Sunday, 6 December 2009, Kirk Haines wrote:
> This is because when you are using the constant, you are referring to the
> same object every time.
>
> When you are using the string literals, the interpreter doesn't know what
> you are going to do with that string literal, so it's not really safe for
> it to assume that it can use a single ruby object to represent all
> instances of it.

Why not? It's obviously a string written in the script, with no variables in
it and so...




-- 
Iñaki Baz Castillo <[email protected]>
 

Iñaki Baz Castillo

On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:
> It's not impossible at all: just use symbols instead of strings.

What do you mean by symbols? Do you mean using the following?

puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }

I expect the same results, since when converting the symbol to a string (to_s)
Ruby would generate a new string for each iteration. Am I wrong?

Thanks.


-- 
Iñaki Baz Castillo <[email protected]>
 

Marnen Laibow-Koser

Iñaki Baz Castillo said:
> On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:
>
> What do you mean by symbols? Do you mean using the following?
>
> puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }
>
> I expect the same results, since when converting the symbol to a string
> (to_s) Ruby would generate a new string for each iteration. Am I wrong?

Only one per iteration -- 'HELLO WORLD'. Your original implementation
would generate two new String objects for each iteration.

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)
 

Marnen Laibow-Koser

Iñaki Baz Castillo said:
> On Sunday, 6 December 2009, Kirk Haines wrote:
>
> Why not? It's obviously a string written in the script, with no variables
> in it and so...

Irrelevant. Ruby strings are mutable, remember?

In other words, if I do
a = 'hello'
b = 'hello'
a.upcase!

then a is 'HELLO' while b is 'hello'. This would not work as expected
if both 'hello' strings were the same object. It works with symbols
largely because symbols are immutable.
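
The contrast between strings and symbols can be sketched as follows. This is only an illustration, assuming string literals are not frozen (the magic comment is set to false explicitly); the result variable names are hypothetical.

```ruby
# frozen_string_literal: false
# Illustrative sketch: two equal string literals are distinct, mutable
# objects, whereas two equal symbol literals are the very same frozen object.

a = 'hello'
b = 'hello'
a.upcase!                                     # mutates a only

strings_share_object = a.equal?(b)            # identity check, not ==
symbols_share_object = :hello.equal?(:hello)  # same Symbol object each time
symbol_is_frozen     = :hello.frozen?

puts "a = #{a.inspect}, b = #{b.inspect}"
puts "strings share one object? #{strings_share_object}"
puts "symbols share one object? #{symbols_share_object}"
puts "symbol frozen? #{symbol_is_frozen}"
```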

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)
 

pharrington

> Only one per iteration -- 'HELLO WORLD'.  Your original implementation
> would generate two new String objects for each iteration.

No, two Strings get created here: one with .to_s and the other
with .upcase
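
This can be checked with identity comparisons. The snippet is a minimal sketch, assuming the usual MRI behaviour in which Symbol#to_s and String#upcase each return a newly allocated String; the variable names are illustrative.

```ruby
# frozen_string_literal: false
# Illustrative sketch: Symbol#to_s and String#upcase each return a new String,
# so :"hello_world".to_s.upcase creates two String objects per call.

first  = :hello_world.to_s
second = :hello_world.to_s
upper  = first.upcase

# equal? compares object identity rather than content.
to_s_returns_new_string   = !first.equal?(second)
upcase_returns_new_string = !upper.equal?(first)

puts "to_s returns a new String each time? #{to_s_returns_new_string}"
puts "upcase returns a new String?         #{upcase_returns_new_string}"
```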
 

Brian Candler

Iñaki Baz Castillo said:
> Why not? It's obviously a string written in the script, with no variables
> in it and so...

Suppose you do:

10.times do
  puts "hello"
end

The interpreter has no way of knowing that puts does not mutate the
argument passed to it. This is a silly example, but you might have done:

alias :old_puts :puts
def puts(x)
  old_puts x
  x.replace "rubbish"
end

So it is forced to create a new string object each time round the loop.

It's a shame that Ruby doesn't have immutable strings. Symbols are the
closest, but they have different semantics to strings.

If the literal syntax "xxx" gave a *frozen* String, then it would be
safe to re-use it. But then if you wanted to append to a string, you'd
have to write:

a = "".dup
a << "stuff"

or perhaps

a = String.new
a << "stuff"
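
The trade-off can be tried out today by freezing a string explicitly. The following is only a sketch of the idea, not a description of how Ruby treats literals by default; GREETING is a hypothetical name. (Later Ruby versions, from 2.3 on, added an opt-in `# frozen_string_literal: true` magic comment that makes literals in a file frozen.)

```ruby
# frozen_string_literal: false
# Illustrative sketch: a frozen string can be shared safely but cannot be
# appended to; an unfrozen copy is needed for that.

GREETING = "xxx".freeze          # hypothetical shared, immutable string

# Appending to the frozen object is rejected with an exception.
append_rejected =
  begin
    GREETING << "stuff"
    false
  rescue RuntimeError            # FrozenError is a RuntimeError subclass
    true
  end

a = GREETING.dup                 # dup returns an unfrozen copy
a << "stuff"

b = String.new                   # or start from a fresh empty String
b << "stuff"

puts "append to frozen string rejected? #{append_rejected}"
puts "GREETING = #{GREETING.inspect}, a = #{a.inspect}, b = #{b.inspect}"
```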

Regards,

Brian.
 

Robert Klemme


Another thing you could not do with the auto interning (as with Java
String constants):

...each do |whatever|
  s = "intro " << whatever << " outro"
  store_away(s)
end

Ruby leaves the decision up to you where you want to optimize while
still keeping things nice for other use cases. The code above would
have to look like this if "" and '' did not construct new objects:

...each do |whatever|
  s = "intro ".dup << whatever << " outro"
  store_away(s)
end

Now, that looks worse IMHO.
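
To see why a shared object would break this pattern, here is a small sketch that simulates it. It assumes mutable string literals (the magic comment is set to false explicitly); INTRO and the word list are hypothetical stand-ins for the elided collection and for a literal that would be created only once.

```ruby
# frozen_string_literal: false
# Illustrative sketch: appending to a fresh literal vs. appending to one
# shared object.

words = %w[foo bar]

# A new "intro " String is created on every iteration, so << is harmless.
fresh = words.map { |w| "intro " << w << " outro" }

# With a single shared object, << keeps growing the same String.
INTRO = String.new("intro ")
shared = words.map { |w| (INTRO << w << " outro").dup }

puts fresh.inspect
puts shared.inspect
```

In the second loop every append lands on the same String, so later results carry the earlier iterations' text with them.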

Kind regards

robert


PS: Note also that all strings created via "" and '' do share the
internal character buffer until one of them is modified (copy on write)
so it could be more inefficient than it actually is. :)
 

Iñaki Baz Castillo

On Sunday, 6 December 2009, Robert Klemme wrote:
> [...]
> PS: Note also that all strings created via "" and '' do share the
> internal character buffer until one of them is modified (copy on write)
> so it could be more inefficient than it actually is. :)

OK, thanks a lot to all for such good explanations. 100% understood now :)


-- 
Iñaki Baz Castillo <[email protected]>
 
