Why doesn't Ruby "compile" strings?

Discussion in 'Ruby' started by Iñaki Baz Castillo, Dec 6, 2009.

  1. Hi, the following code:

    -------------------
    #!/usr/bin/ruby

    require "benchmark"

    HELLO_WORLD = "hello world"

    1.upto(4) do
      print "Benchmark using HELLO_WORLD: "
      puts Benchmark.realtime { 1.upto(500000) {|i| HELLO_WORLD.upcase } }

      print "Benchmark using \"hello_world\": "
      puts Benchmark.realtime { 1.upto(500000) {|i| "hello_world".upcase } }

      print "Benchmark using 'hello_world': "
      puts Benchmark.realtime { 1.upto(500000) {|i| 'hello_world'.upcase } }
    end
    -------------------


    gives these results:

    --------------------
    Benchmark using HELLO_WORLD: 1.1907217502594
    Benchmark using "hello_world": 1.53604388237
    Benchmark using 'hello_world': 0.816991806030273
    Benchmark using HELLO_WORLD: 0.599252462387085
    Benchmark using "hello_world": 0.814466714859009
    Benchmark using 'hello_world': 0.812573194503784
    Benchmark using HELLO_WORLD: 0.595503330230713
    Benchmark using "hello_world": 0.813859701156616
    Benchmark using 'hello_world': 0.813681602478027
    Benchmark using HELLO_WORLD: 0.594272136688232
    Benchmark using "hello_world": 0.815742254257202
    Benchmark using 'hello_world': 0.811828136444092
    --------------------


    Let's take the last round of results, once Ruby is "entirely loaded":

    --------------------
    Benchmark using HELLO_WORLD: 0.594272136688232
    Benchmark using "hello_world": 0.815742254257202
    Benchmark using 'hello_world': 0.811828136444092
    --------------------


    This clearly shows that using a constant string is faster than using a
    string written into the script. So I wonder: why doesn't Ruby "precompile"
    internally the strings appearing in the script?

    That is, when the Ruby interpreter parses the script and finds
    "hello_world", couldn't it create the string *just once* and keep it in
    memory forever, so that the next time the same string is accessed Ruby
    doesn't need to instantiate it again? Is it impossible due to the design
    of Ruby?

    PS: I don't know if other languages (Python, PHP, Perl...) do it or not.


    --
    Iñaki Baz Castillo <>
     
    Iñaki Baz Castillo, Dec 6, 2009
    #1

  2. Iñaki Baz Castillo wrote:
    [...]
    > This clearly shows that using a constant string is faster than using a
    > string written into the script. So I wonder: why doesn't Ruby "precompile"
    > internally the strings appearing in the script?


    Well, it would make garbage collection difficult.

    >
    > This is, when Ruby interpreter is parsing the script and finds
    > "hello_world", couldn't it create the string *just once* and keep in
    > memory forever so next time same string is accessed Ruby doesn't need
    > to initiate it? Is it impossible due to the design of Ruby?


    It's not impossible at all: just use symbols instead of strings.
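
A minimal sketch of what "interning" means here, assuming a standard MRI Ruby; the variable names are illustrative only. Every evaluation of a symbol literal yields the same object, while every evaluation of an ordinary string literal yields a new String with equal contents. The `# frozen_string_literal: false` magic comment (Ruby 2.3+, a plain comment in older versions) pins literals to the mutable behaviour they had when this thread was written.

```ruby
# frozen_string_literal: false

# Evaluate each literal three times and keep the results alive.
syms = Array.new(3) { :hello_world }
strs = Array.new(3) { "hello_world" }

puts syms.map(&:object_id).uniq.size  # 1: a single interned Symbol
puts strs.map(&:object_id).uniq.size  # 3: a new String per evaluation
puts strs[0] == strs[1]               # true: same contents...
puts strs[0].equal?(strs[1])          # false: ...but different objects
```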

    >
    > PS: I don't know if other languages (Python, PHP, Perl...) do it or not.


    I don't think PHP interns strings.

    Best,
    -- 
    Marnen Laibow-Koser
    http://www.marnen.org

    --
    Posted via http://www.ruby-forum.com/.
     
    Marnen Laibow-Koser, Dec 6, 2009
    #2

  3. Kirk Haines

    On Sat, Dec 5, 2009 at 6:48 PM, Iñaki Baz Castillo <> wrote:

    > Hi, the following code:
    >
    > [...]
    >
    > This clearly shows that using a constant string is faster than using a
    > string written into the script. So I wonder: why doesn't Ruby "precompile"
    > internally the strings appearing in the script?

    This is because when you are using the constant, you are referring to the
    same object every time.

    When you are using the string literals, the interpreter doesn't know what
    you are going to do with that string literal, so it's not really safe for it
    to assume that it can use a single Ruby object to represent all instances of
    it. Consequently, it creates a new object each time.

    So 500000.times { 'foo' } creates 500000 objects. That's obviously going to
    take more time than FOO = 'foo'; 500000.times { FOO }, as that code just
    looks up a constant 500000 times.

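
The claim that the literal version allocates one object per iteration can be checked by counting allocations instead of timing. This is a sketch, assuming Ruby 2.2 or later (where `GC.stat(:total_allocated_objects)` is available); the helper name `allocations` and the smaller iteration count `N` are illustrative choices, not part of the original code.

```ruby
# frozen_string_literal: false

FOO = 'foo'
N = 100_000

# Hypothetical helper: number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

literal_allocs  = allocations { N.times { 'foo' } }  # one new String per pass
constant_allocs = allocations { N.times { FOO } }    # same object every pass

puts "literal:  #{literal_allocs}"
puts "constant: #{constant_allocs}"
```

The exact numbers vary by Ruby version, but the literal loop should report at least `N` allocations while the constant loop stays near zero.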

    Kirk Haines
     
    Kirk Haines, Dec 6, 2009
    #3
  4. On Sunday, 6 December 2009, Kirk Haines wrote:
    > On Sat, Dec 5, 2009 at 6:48 PM, Iñaki Baz Castillo <> wrote:
    > > Hi, the following code:
    > >
    > > [...]
    > >
    > > This clearly shows that using a constant string is faster than using a
    > > string written into the script. So I wonder: why doesn't Ruby "precompile"
    > > internally the strings appearing in the script?

    >
    > This is because when you are using the constant, you are referring to the
    > same object every time.
    >
    > When you are using the string literals, the interpreter doesn't know what
    > you are going to do with that string literal, so it's not really safe for
    > it to assume that it can use a single ruby object to represent all
    > instances of it.


    Why not? It's obviously a string written in the script, with no variables
    in it, so...




    --
    Iñaki Baz Castillo <>
     
    Iñaki Baz Castillo, Dec 6, 2009
    #4
  5. On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:
    > > Is it impossible due to the design of Ruby?
    >
    > It's not impossible at all: just use symbols instead of strings.


    What do you mean by symbols? Do you mean using the following?

    puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }

    I expect the same results, since when converting the symbol to a string
    (to_s) Ruby would generate a new string for each iteration. Am I wrong?

    Thanks.


    --
    Iñaki Baz Castillo <>
     
    Iñaki Baz Castillo, Dec 6, 2009
    #5
  6. Iñaki Baz Castillo wrote:
    > On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:
    >> > Is it impossible due to the design of Ruby?
    >>
    >> It's not impossible at all: just use symbols instead of strings.
    >
    > What do you mean with symbols? do you mean using the following?:
    >
    > puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }
    >
    > I expect the same results as when converting the symbol to string (to_s)
    > Ruby would generate a new string for each iteration, am I wrong?


    Only one per iteration -- 'HELLO WORLD'. Your original implementation
    would generate two new String objects for each iteration.

    >
    > Thanks.


    Best,
    -- 
    Marnen Laibow-Koser
    http://www.marnen.org

    --
    Posted via http://www.ruby-forum.com/.
     
    Marnen Laibow-Koser, Dec 6, 2009
    #6
  7. Iñaki Baz Castillo wrote:
    > On Sunday, 6 December 2009, Kirk Haines wrote:
    >> > 1.upto(4) do
    >> > puts Benchmark.realtime { 1.upto(500000) {|i| 'hello_world'.upcase
    >> > the strings appearing in the script?
    >>
    >> This is because when you are using the constant, you are referring to the
    >> same object every time.
    >>
    >> When you are using the string literals, the interpreter doesn't know what
    >> you are going to do with that string literal, so it's not really safe for
    >> it to assume that it can use a single ruby object to represent all
    >> instances of it.
    >
    > Why not? It's obviously a string written in the script, with no variables
    > in it and so...


    Irrelevant. Ruby strings are mutable, remember?

    In other words, if I do
    a = 'hello'
    b = 'hello'
    a.upcase!

    then a is 'HELLO' while b is 'hello'. This would not work as expected
    if both 'hello' strings were the same object. It works with symbols
    largely because symbols are immutable.
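
A small runnable version of the example above, plus a second part that emulates what would go wrong if both literals were one shared object. The sharing is simulated with an ordinary variable, `shared` (an illustrative name); Ruby itself does not share string literals this way. The magic comment keeps literals mutable on newer Rubies, as they were in 2009.

```ruby
# frozen_string_literal: false

# Two literals evaluate to two independent String objects.
a = 'hello'
b = 'hello'
a.upcase!
puts a  # HELLO
puts b  # hello

# Emulate "interned" literals: both names refer to one object.
shared = 'hello'
c = shared
d = shared
c.upcase!
puts d  # HELLO: the mutation through c is visible through d
```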

    Best,
    -- 
    Marnen Laibow-Koser
    http://www.marnen.org

    --
    Posted via http://www.ruby-forum.com/.
     
    Marnen Laibow-Koser, Dec 6, 2009
    #7
  8. pharrington

    On Dec 5, 9:29 pm, Marnen Laibow-Koser <> wrote:
    > Iñaki Baz Castillo wrote:
    > > On Sunday, 6 December 2009, Marnen Laibow-Koser wrote:
    > >> > Is it impossible due to the design of Ruby?
    > >>
    > >> It's not impossible at all: just use symbols instead of strings.
    > >
    > > What do you mean with symbols? do you mean using the following?:
    > >
    > >   puts Benchmark.realtime { 1.upto(500000) {|i| :"hello_world".to_s.upcase } }
    > >
    > > I expect the same results as when converting the symbol to string (to_s)
    > > Ruby would generate a new string for each iteration, am I wrong?
    >
    > Only one per iteration -- 'HELLO WORLD'. Your original implementation
    > would generate two new String objects for each iteration.



    No, two Strings get created here: one with .to_s and the other
    with .upcase
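
The correction can be checked directly through object identity. A sketch, assuming MRI's usual behaviour that `Symbol#to_s` returns a new String on every call and that the non-bang `String#upcase` always returns a new String; the variable names are illustrative.

```ruby
sym = :hello_world

s1 = sym.to_s   # first String, created by to_s
s2 = sym.to_s   # calling to_s again creates another String
up = s1.upcase  # upcase (without !) creates yet another String

puts s1.equal?(s2)  # false
puts s1.equal?(up)  # false
puts up             # HELLO_WORLD
```

So each pass of `:"hello_world".to_s.upcase` produces two Strings: one from `to_s` and one from `upcase`.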
     
    pharrington, Dec 6, 2009
    #8
  9. Iñaki Baz Castillo wrote:
    >> When you are using the string literals, the interpreter doesn't know what
    >> you are going to do with that string literal, so it's not really safe for
    >> it to assume that it can use a single ruby object to represent all
    >> instances of it.
    >
    > Why not? It's obviously a string written in the script, with no variables
    > in it and so...


    Suppose you do:

    10.times do
      puts "hello"
    end

    The interpreter has no way of knowing that puts does not mutate the
    argument passed to it. This is a silly example, but you might have done:

    alias :old_puts :puts
    def puts(x)
      old_puts x
      x.replace "rubbish"
    end

    So it is forced to create a new string object each time round the loop.

    It's a shame that Ruby doesn't have immutable strings. Symbols are the
    closest, but they have different semantics to strings.

    If the literal syntax "xxx" gave a *frozen* String, then it would be
    safe to re-use it. But then if you wanted to append to a string, you'd
    have to write:

    a = "".dup
    a << "stuff"

    or perhaps

    a = String.new
    a << "stuff"
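
A sketch of how a frozen String behaves, to make the trade-off above concrete; it assumes Ruby 2.1 or later. Appending to a frozen String raises an error (`FrozenError` since Ruby 2.5, a subclass of `RuntimeError`; plain `RuntimeError` before that), while `"".dup` and `String.new` both give a fresh, mutable String. The last line relies on an optimization added in Ruby 2.1, after this thread: the compiler returns one shared, deduplicated object for `"literal".freeze`, which is exactly the "safe to re-use" case.

```ruby
frozen = "stuff".freeze

begin
  frozen << "more"
  appended = true
rescue RuntimeError => e    # FrozenError on Ruby 2.5+
  appended = false
  puts "#{e.class}: #{e.message}"
end

# Both workarounds from the post yield a new, mutable String.
a = "".dup
a << "stuff"
b = String.new
b << "stuff"
puts a, b

# A frozen literal is safe to share, so Ruby 2.1+ reuses one object for it.
puts "stuff".freeze.equal?("stuff".freeze)  # true
```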

    Regards,

    Brian.
    --
    Posted via http://www.ruby-forum.com/.
     
    Brian Candler, Dec 6, 2009
    #9
  10. On 06.12.2009 11:17, Brian Candler wrote:
    > Iñaki Baz Castillo wrote:
    > [...]
    >
    > If the literal syntax "xxx" gave a *frozen* String, then it would be
    > safe to re-use it. But then if you wanted to append to a string, you'd
    > have to write:
    >
    > a = "".dup
    > a << "stuff"
    >
    > or perhaps
    >
    > a = String.new
    > a << "stuff"


    Another thing you could not do with the auto interning (as with Java
    String constants):

    ...each do |whatever|
      s = "intro " << whatever << " outro"
      store_away(s)
    end

    Ruby leaves the decision up to you where you want to optimize while
    still keeping things nice for other use cases. The code above would
    have to look like this if "" and '' did not construct new objects:

    ...each do |whatever|
      s = "intro ".dup << whatever << " outro"
      store_away(s)
    end

    Now, that looks worse IMHO.
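
A runnable sketch of the first loop above: the elided receiver of `...each` is replaced by a hypothetical `words` array, and `store_away` by appending to a hypothetical `stored` array. The second loop emulates auto-interning with a shared constant, `INTRO` (also hypothetical), to show why appending in place would then corrupt every stored value. The magic comment keeps literals mutable, as in the Ruby of this thread.

```ruby
# frozen_string_literal: false

words  = %w[alpha beta]   # hypothetical stand-in for the elided receiver
stored = []               # hypothetical stand-in for store_away

words.each do |whatever|
  # "intro " is a fresh String on every pass, so << may grow it in place.
  s = "intro " << whatever << " outro"
  stored << s
end
puts stored  # intro alpha outro / intro beta outro

# Emulated interning: every pass appends to the same shared object.
INTRO = "intro "
shared = words.map { |whatever| INTRO << whatever << " outro" }
puts shared.first  # intro alpha outrobeta outro
```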

    Kind regards

    robert


    PS: Note also that all strings created via "" and '' do share the
    internal character buffer until one of them is modified (copy on write),
    so it could be more inefficient than it actually is. :)

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Dec 6, 2009
    #10
  11. pharrington wrote:
    > On Dec 5, 9:29 pm, Marnen Laibow-Koser <> wrote:
    >> > I expect the same results as when converting the symbol to string (to_s)
    >
    > No, two Strings get created here: one with .to_s and the other
    > with .upcase


    Quite right. What the hell was I thinking? :)

    Best,
    -- 
    Marnen Laibow-Koser
    http://www.marnen.org

    --
    Posted via http://www.ruby-forum.com/.
     
    Marnen Laibow-Koser, Dec 6, 2009
    #11
  12. On Sunday, 6 December 2009, Robert Klemme wrote:
    > On 06.12.2009 11:17, Brian Candler wrote:
    > [...]


    OK, thanks a lot to everyone for such good explanations. 100% understood now :)


    --
    Iñaki Baz Castillo <>
     
    Iñaki Baz Castillo, Dec 6, 2009
    #12
