Regexp, String, Symbol literals' object_ids

Discussion in 'Ruby' started by Pavel R., Dec 19, 2010.

  1. Pavel R.

    Pavel R. Guest

    Regexp literals:
    5.times { p /abcdasdf/.object_id } -> same!

    String literals:
    5.times { p 'asdasdf'.object_id } -> different

    Symbols:
    5.times { p :asdfsf.object_id } -> same!

    Symbols with to_s:
    5.times { p :asdsdfsdf.to_s.object_id } -> different

    Predefined string as a constant
    CONS = 'asdfsdf'
    5.times { p CONS.object_id } -> same! (sure)
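The five experiments above can be rerun as a self-checking script. This is a sketch rather than the original code. It keeps every object in an array, so garbage collection cannot recycle an object_id mid-run, and it counts distinct objects by identity. On a current Ruby without a `# frozen_string_literal: true` magic comment, the counts should follow the pattern reported in this thread for 1.9: 1, 5, 1, 5, 1.

```ruby
# Rerun each experiment from the post, keeping every resulting object alive.
regexps     = 5.times.map { /abcdasdf/ }
strings     = 5.times.map { 'asdasdf' }
symbols     = 5.times.map { :asdfsf }
symbol_strs = 5.times.map { :asdsdfsdf.to_s }

CONS = 'asdfsdf'
constants   = 5.times.map { CONS }

# Number of distinct objects (by identity, not by ==) in a list.
def distinct_objects(list)
  list.map(&:object_id).uniq.size
end

{
  'Regexp literal' => regexps,
  'String literal' => strings,
  'Symbol'         => symbols,
  'Symbol#to_s'    => symbol_strs,
  'Constant'       => constants
}.each do |label, list|
  puts "#{label}: #{distinct_objects(list)} distinct object(s)"
end
```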

    Question:
    Is there some special syntax for string literals ("asdfasdf") to behave
    like /sadfsdf/ as in the examples above? Without predefining a string as
    a constant's value. Or another elegant way to achieve the same goal?

    --
    Posted via http://www.ruby-forum.com/.
    Pavel R., Dec 19, 2010
    #1

  2. Ruby has both mutable and immutable strings. A mutable string is
    declared as "string". An immutable string is declared as :string and in
    ruby is called a 'symbol'. So, no, there is no way for "string" to
    behave as :string, since that's by design. Well there is a way but I'd
    not go there :)

    If you want two equivalent string literals to point at the same
    instance, use the symbol notation, as in:

    :test.object_id == :test.object_id #true

    --
    Andrea Dallera
    http://github.com/bolthar/freightrain
    http://usingimho.wordpress.com



    Andrea Dallera, Dec 19, 2010
    #2

  3. Pavel R.

    Pavel R. Guest

    Andrea Dallera wrote in post #969438:
    > Well there is a way but I'd
    > not go there :)


    Digging into parse.y and other ruby core files?

    Pavel R., Dec 19, 2010
    #3
  4. Quintus

    Quintus Guest

    Am 19.12.2010 21:07, schrieb Pavel R.:
    > Regexp literals:
    > 5.times { p /abcdasdf/.object_id } -> same!
    >


    How is this possible? Every time the loop body is executed, a new
    regexp should be created... Have a look at this, which seems confusing to me:

    #ruby -v: ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
    irb(main):001:0> 5.times { p /abcdasdf/.object_id }
    8030280
    8030280
    8030280
    8030280
    8030280
    => 5
    irb(main):002:0> p /abcdasdf/.object_id
    8049600
    => 8049600
    irb(main):003:0> p /abcdasdf/.object_id
    8063560
    => 8063560
    irb(main):004:0>

    The regexp in the loop always stays the same, but if I create some
    others outside the loop, they get different object IDs? Can anybody
    shed some light on this?

    Valete,
    Marvin
    Quintus, Dec 19, 2010
    #4
  5. Pavel R.

    Pavel R. Guest

    How about this? Here is something I have just discovered:

    pavel@pavel-laptop:~/dev/binexp$ ~/usr/ruby19/bin/irb
    irb(main):001:0> /(?<digits>\d+)/ =~ 'abc123def'
    => 3
    irb(main):002:0> digits
    => "123"
    irb(main):003:0>

    According to `ri Regexp#=~`

    If =~ is used with a regexp literal with named captures, captured
    strings (or nil) is assigned to local variables named by the capture
    names.

    /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = y "
    p lhs #=> "x"
    p rhs #=> "y"

    If it is not matched, nil is assigned for the variables.

    /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = "
    p lhs #=> nil
    p rhs #=> nil

    This assignment is implemented in the Ruby parser. The parser detects
    'regexp-literal =~ expression' for the assignment. The regexp must be a
    literal without interpolation and placed at left hand side.

    =======>

    It seems the Ruby parser can do some magic things!
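The parser rule quoted from `ri Regexp#=~` can be checked directly. In this sketch (the method names are hypothetical), only the form with the regexp literal on the left of `=~` creates the local variables. Once the same pattern is stored in a variable first, no locals appear and the captures have to be read from the `MatchData` instead.

```ruby
# Literal on the left of =~: the parser creates the locals lhs and rhs.
def split_assignment(text)
  /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ text
  [lhs, rhs]
end

# Pattern held in a variable: no locals are created, so read the MatchData.
def split_with_matchdata(text)
  pattern = /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/
  m = pattern.match(text)
  m ? [m[:lhs], m[:rhs]] : [nil, nil]
end

# Returns nil: `pattern =~ text` does not define a local variable named lhs.
def local_created_by_variable_match(text)
  pattern = /(?<lhs>\w+)/
  pattern =~ text
  defined?(lhs)
end

p split_assignment(" x = y ")                 # => ["x", "y"]
p split_assignment(" x = ")                   # => [nil, nil]
p split_with_matchdata(" x = y ")             # => ["x", "y"]
p local_created_by_variable_match("x")        # => nil
```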

    Pavel R., Dec 19, 2010
    #5
  6. Abinoam Jr.

    Abinoam Jr. Guest

    Try

    ruby-1.9.2-head > 5.times { p /thesame/.object_id.to_s + ' ' + /thesame/.object_id.to_s}
    "21522620 21522400"
    "21522620 21522400"
    "21522620 21522400"
    "21522620 21522400"
    "21522620 21522400"
    => 5
    ruby-1.9.2-head > 5.times { p 'thesame'.object_id.to_s + ' ' + 'thesame'.object_id.to_s}
    "21553480 21553400"
    "21553320 21553240"
    "21553160 21553080"
    "21553000 21552920"
    "21552840 21552760"
    => 5
    ruby-1.9.2-head >

    What is the logic behind this?

    Abinoam Jr., Dec 20, 2010
    #6
  7. On Sun, Dec 19, 2010 at 8:01 PM, Abinoam Jr. <> wrote:
    > What is the logic behind this?


    A regular expression literal like /thesameobject/ causes a Regexp
    object to be instantiated at PARSE time.

    Since regular expressions are immutable (as contrasted with strings or
    arrays, which also have a literal representation), it really doesn't
    matter whether a new regexp is created each time the expression
    containing the literal is evaluated. The fact that two occurrences of
    an apparently 'equal' regular expression generate two different
    instances simply reflects the fact that the parser doesn't attempt to
    consolidate equal literals.

    In the case of a string literal, since strings are mutable, a new
    string instance is created each time the expression containing the
    string literal is evaluated.
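Rick's explanation can be checked by wrapping each literal in a method, so the same literal site is evaluated on every call. A sketch with hypothetical method names: the plain regexp literal returns one object every time, the string literal a fresh one, and a regexp with interpolation is rebuilt on each evaluation unless the `o` flag tells Ruby to interpolate it only once.

```ruby
def regexp_literal
  /thesame/        # one Regexp object, created when the code is parsed
end

def string_literal
  'thesame'        # a new String each time this line is evaluated
end

def interpolated_regexp(word)
  /#{word}/        # built again on every evaluation
end

def interpolated_once(word)
  /#{word}/o       # interpolated on the first evaluation only
end

p regexp_literal.equal?(regexp_literal)                        # => true
p string_literal.equal?(string_literal)                        # => false
p interpolated_regexp('a').equal?(interpolated_regexp('a'))    # => false
p interpolated_once('a').source                                # => "a"
p interpolated_once('b').source                                # => "a"
```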


    --
    Rick DeNatale

    Blog: http://talklikeaduck.denhaven2.com/
    Github: http://github.com/rubyredrick
    Twitter: @RickDeNatale
    WWR: http://www.workingwithrails.com/person/9021-rick-denatale
    LinkedIn: http://www.linkedin.com/in/rickdenatale
    Rick DeNatale, Dec 20, 2010
    #7
  8. Andrea Dallera wrote in post #969438:
    > Ruby has both mutable and immutable strings. A mutable string is
    > declared as "string". An immutable string is declared as :string and in
    > ruby is called a 'symbol'. So, no, there is no way for "string" to
    > behave as :string, since that's by design.


    This is a very misleading description, so I'll bite.

    Strings and Symbols are two completely different things in Ruby. In Ruby
    1.9, Symbols have gained some more string-like behaviour(*), but they're
    still fundamentally different.

    Symbols are objects intended for labelling things (e.g. method names,
    hash keys). The main property of Symbols is that there only ever exists
    one Symbol object which represents the same label, i.e. the same
    sequence of characters.(**)

    So when your program loads, and it uses the symbol :foo, which hasn't
    been used before, then a new symbol called :foo is created in the symbol
    table. But every other future use of :foo always returns the same
    object.

    This makes symbols very cheap to test for equality, because:

    * Two symbols are the same iff they have the same object_id
    * Two symbols are different iff they have different object_id

    So testing equality between :a_very_long_symbol_like_this and
    :another_very_long_symbol is only comparing their object_ids, basically
    two integers.

    The property that any future :foo must return the same object_id means
    that the Symbol table is never garbage-collected. A Symbol is for life,
    not just for Christmas.

    Strings are collections of bytes/characters. They can be mutated. There
    can be many String objects in the system which contain the same sequence
    of bytes/characters. Therefore, comparing two Strings always has to be
    done byte-by-byte.(***)
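The difference can be seen by comparing `equal?` (object identity) with `==` (contents). In this sketch, a Symbol obtained from a String via `to_sym` is the very same object as the literal with that label, while two equal Strings are two separate objects that can change independently.

```ruby
a = :a_very_long_symbol_like_this
b = 'a_very_long_symbol_like_this'.to_sym   # same label, so the same Symbol

s1 = 'a_very_long_string'
s2 = 'a_very_long_' + 'string'              # equal contents, separate object

p a.equal?(b)    # => true
p s1 == s2       # => true  (contents compared)
p s1.equal?(s2)  # => false (two objects)

s2 << '!'        # Strings are mutable; s1 is a different object and is unaffected
p s1             # => "a_very_long_string"
```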

    In general, what you want is a String. If you're reading data from a
    user (e.g. on STDIN or a web-page POST) then it comes in as a String.
    You can convert a String into a Symbol represented by the same set of
    characters:

    a = "foo"
    b = a.intern # b = :foo

    but this can be a dangerous thing to do if the string you are converting
    came from an untrusted source, because it can lead to a simple
    denial-of-service attack as the user floods your symbol table with
    garbage.
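One way to avoid interning arbitrary input is to convert only the labels you expect and leave everything else as a String. A minimal sketch with a hypothetical whitelist of parameter names. (Since Ruby 2.2, Symbols created dynamically, e.g. by `to_sym`, can be garbage-collected, which softens this attack on modern versions, but a whitelist still keeps unexpected keys out.)

```ruby
# Hypothetical whitelist of the parameter names this program understands.
ALLOWED_KEYS = %w[foo bar].map { |name| [name, name.to_sym] }.to_h.freeze

# Return the Symbol for a known key, or nil for anything else,
# without ever calling to_sym/intern on untrusted input.
def key_for(untrusted)
  ALLOWED_KEYS[untrusted]
end

p key_for('foo')       # => :foo
p key_for('x' * 1000)  # => nil
```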

    So to summarise, Symbols are used as method names:

    a = 1
    b = a.send(:+, 2) # b = a + 2

    and are often used as hash keys, because the lookup operations are
    cheaper.

    def doit(params)
      puts params[:foo]
      puts params[:bar]
    end

    doit(:foo=>123, :bar=>456)

    If coming from a language like C, think of symbols more as enums rather
    than strings, where the programmer is using an easy-to-read label like
    :foo, but the underlying value is actually a number.
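To make the enum analogy concrete, here is a sketch with hypothetical message types: the program logic works with readable labels, and a single frozen Hash supplies the numeric value only where one is needed, for example when writing a binary message.

```ruby
# Hypothetical message types, labelled with symbols much like a C enum.
MESSAGE_TYPES = { login: 1, message: 2, logout: 3 }.freeze

# Program logic works with the readable labels...
def describe(type)
  case type
  when :login  then 'user logged in'
  when :logout then 'user logged out'
  else "other message type (#{type})"
  end
end

# ...and the numeric value is looked up only where one is needed.
def type_code(type)
  MESSAGE_TYPES.fetch(type)  # raises KeyError for an unknown label
end

p describe(:login)     # => "user logged in"
p type_code(:logout)   # => 3
```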

    HTH,

    Brian.


    (*) Example from ruby 1.8:
    >> :foo.size

    NoMethodError: undefined method `size' for :foo:Symbol
    from (irb):2
    from :0

    But:

    1.9.2-p0 > :foo.size
    => 3

    (**) Everything you say about Strings or Symbols in 1.9 has to be
    qualified, because it's such a complex area. Suffice to say, in 1.9 it's
    possible to have two distinct Symbols which are labelled by the same
    series of bytes but with different encodings.

    Things are far simpler in ruby 1.8, where bytes are real bytes, and
    small furry creatures from Alpha Centauri are real small furry creatures
    from Alpha Centauri.

    (***) There are in fact some optimisations whereby two distinct string
    objects can share the same underlying data buffer, with copy-on-write.
    But in general comparing strings needs to compare the buffers.

    And even though ruby 1.9 has strings of characters, the comparisons
    *are* done byte-by-byte, not character by character.
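Both footnotes can be reproduced with a single non-ASCII character. In this sketch, `"\u00e9"` is a UTF-8 string and `String#b` (Ruby 2.0+) returns a copy with the same bytes tagged ASCII-8BIT (binary). String `==` compares bytes only after checking that the two encodings are compatible, and Symbols built from the two copies are distinct, as footnote (**) says.

```ruby
utf8   = "\u00e9"   # U+00E9 (e with acute accent) in UTF-8: bytes 0xC3 0xA9
binary = utf8.b     # the same two bytes, tagged ASCII-8BIT

# Footnote (***): String#== compares bytes, but only for compatible encodings.
p utf8.bytes == binary.bytes          # => true
p utf8 == binary                      # => false
p 'abc' == 'abc'.b                    # => true (ASCII-only, so compatible)
p [utf8.length, utf8.bytesize]        # => [1, 2] (characters vs bytes)

# Footnote (**): same bytes, different encodings, two distinct Symbols.
p utf8.to_sym.equal?(binary.to_sym)                # => false
p binary.to_sym.encoding == Encoding::ASCII_8BIT   # => true
```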

    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Dec 20, 2010
    #8
  9. Pavel R.

    Pavel R. Guest

    OK, but my initial question was slightly different.

    Can I write something like

    %c(string i do not want to be created again and again, and i do not want
    to define it as a constant because it is used once in a code)

    ?

    If it is impossible in Ruby, what is the reason? It seems like it would be useful!
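Ruby has no `%c` literal, but later releases added two ways to get a single shared String from a literal without naming a constant: calling `freeze` directly on the literal (optimised since Ruby 2.1) and unary minus, `String#-@` (added in 2.3, deduplicating since 2.5). A `# frozen_string_literal: true` magic comment (2.3) does the same for every literal in a file. A sketch with hypothetical method names, each evaluating one literal site per call, assuming no such magic comment is in effect:

```ruby
def plain_literal
  'string i do not want to be created again and again'
end

def frozen_literal
  'string i do not want to be created again and again'.freeze
end

def deduplicated_literal
  -'string i do not want to be created again and again'
end

p plain_literal.equal?(plain_literal)                 # => false: a new String per call
p frozen_literal.equal?(frozen_literal)               # => true:  one frozen String
p deduplicated_literal.equal?(deduplicated_literal)   # => true:  one deduplicated String
p frozen_literal.frozen?                              # => true
```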

    Pavel R., Dec 20, 2010
    #9
  10. Abinoam Jr.

    Abinoam Jr. Guest

    [Advice... I'm new at this... be patient]

    Isn't it "%q" ?

    I couldn't think of a "need" for this that wouldn't be covered by something like...

    holding_var = %q(string i do not want to be created again and again, and i do not want to define it as a constant because it is used once in a code)

    5.times { puts "holding_var = #{holding_var} and its object_id is #{holding_var.object_id}" }

    If the string is short, you could even use a variable name that resembles
    the string.

    created_once = "created_once"

    Could you show an example? ("used once in a code" vs. "created again and again")

    Abinoam Jr.

    Abinoam Jr., Dec 21, 2010
    #10
  11. Pavel R.

    Pavel R. Guest

    Example:

    class A
      def m(data)
        a, b, c, d = data.unpack('NNvv')
        e, f, g, h = data.unpack('vNNv')
        # and so on ...
        # do something with data
      end
    end

    To avoid creating the 'NNvv' string again and again, I can assign it to a constant:

    class A
      Format_NNvv = 'NNvv'
    end

    and use it.

    But there are many different formats ('NNvv', 'vNNv', 'a*c', ...) in my
    code, so I would need to assign all of them to constants. This approach
    implies a large section of constant assignments.
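Applied to the `unpack` example, the format strings can stay inline and still be allocated only once by calling `freeze` on each literal (Ruby 2.1+, so not available when this was posted). A sketch, assuming the two formats apply to two separate 12-byte records; the `header`/`trailer` method names and sample data are hypothetical.

```ruby
class A
  # Calling freeze directly on a literal lets Ruby (2.1+) reuse one frozen
  # String, so the format is not allocated again on every call.
  def header(data)
    data.unpack('NNvv'.freeze)
  end

  def trailer(data)
    data.unpack('vNNv'.freeze)
  end
end

# 'NNvv': two 32-bit big-endian, then two 16-bit little-endian unsigned integers.
packet = [1, 2, 3, 4].pack('NNvv')
p A.new.header(packet)                        # => [1, 2, 3, 4]
p A.new.trailer([5, 6, 7, 8].pack('vNNv'))    # => [5, 6, 7, 8]
```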

    Pavel R., Dec 21, 2010
    #11
  12. Pavel R.

    Pavel R. Guest

    > What kind of "problem" is going to need so many packs/unpacks that
    > this part (constant or string) will make much difference?


    Working with binary protocols. Something like
    https://github.com/pavelrosputko/em-oscar/blob/master/em-oscar/icbm.rb

    Much difference? Actually, not that much in the icbm.rb source above.

    But someone wrote the following at http://redmine.ruby-lang.org/issues/show/4184#note-3

    I've been able to get 2-3% improvements in Rails apps by simply
    rewriting some 'constant's and inline Arrays as CONSTANTs.

    I have patches to MRI that use cached, immutable Strings for the
    internal #to_s messages on immutable objects; e.g. changing Symbol#to_s,
    Float#to_s, Bignum#to_s, Rational#to_s, etc. to return the same frozen
    String instance. I measured 1-6% performance improvement in the
    standard MRI tests.

    Pavel R., Dec 22, 2010
    #12
  13. On Wed, Dec 22, 2010 at 10:56 AM, Pavel R. <> wrote:
    >
    > I've been able to get 2-3% improvements in Rails apps by simply
    > rewriting some 'constant's and inline Arrays as CONSTANTs.


    2-3% of what? If it's 200ms, the gains are much less impressive when
    compared to 2000ms.
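One way to answer "2-3% of what?" is to measure absolute time per call instead of a percentage. A rough sketch using the standard `benchmark` library, comparing a plain literal, a frozen literal and a constant as the `unpack` format. The iteration count is arbitrary, no results are shown because they depend on the Ruby version and machine, and a single run like this is not a rigorous benchmark.

```ruby
require 'benchmark'

FORMAT     = 'NNvv'.freeze
PACKET     = [1, 2, 3, 4].pack('NNvv')
ITERATIONS = 100_000

timings = {
  'plain literal'  => Benchmark.realtime { ITERATIONS.times { PACKET.unpack('NNvv') } },
  'frozen literal' => Benchmark.realtime { ITERATIONS.times { PACKET.unpack('NNvv'.freeze) } },
  'constant'       => Benchmark.realtime { ITERATIONS.times { PACKET.unpack(FORMAT) } }
}

timings.each do |label, seconds|
  # Report absolute time per call: a percentage means little without it.
  printf("%-15s %8.1f ns/call\n", label, seconds / ITERATIONS * 1e9)
end
```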

    --
    Phillip Gawlowski

    Though the folk I have met,
    (Ah, how soon!) they forget
    When I've moved on to some other place,
    There may be one or two,
    When I've played and passed through,
    Who'll remember my song or my face.
    Phillip Gawlowski, Dec 22, 2010
    #13
