What is the difference between :foo and "foo" ?

Yohanes Santoso

James Britt said:
Question: Would using a constant be equally suitable for expressing
intention, and (possibly) less error-prone?

I would say that if your language does not provide a means for you to
define your own identifier, then it would be acceptable.

But this seems pointless in ruby.

# Assume ConstUtils.next_value
# ensures unique values
HOST = ConstUtils.next_value
PORT = ConstUtils.next_value

For simplicity's sake, I'll just assume next_value returns
integers.
foo1 = {
HOST => 'localhost',
PORT => 80
}

p foo1 ==> {32=>"localhost", 238=>80}

That makes debugging difficult. Since the value of HOST/PORT is
volatile (it could change depending on how next_value generates the
integers, and also if you, say, insert 'DOMAIN =
ConstUtils.next_value' between HOST and PORT assignments),
understanding past debugging output (e.g., in a log file) would be
harder as well.
A downside to using symbols as constants is that this will not raise
any exceptions:


foo1 = {
:hots => 'localhost',
:prt => 80
}

True, typos are such a hard error to prevent. Fortunately, there are
ways around that.


<example>
module ConfigIdentifiers
[:host, :port].each{|x| const_set(x.to_s.upcase, x)}
end

include ConfigIdentifiers
foo1 = { HOST => 'localhost', PORT => 80}
</example>

is a solution.

But my experience, as someone who has made many such typos (and is still
making them), is that my test cases usually catch them, and those that
slip through are easily identified (and corrected) manually.
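
Another way around the typo problem, offered as a sketch rather than something proposed in this thread: look keys up with Hash#fetch instead of Hash#[], so a misspelled symbol raises instead of quietly returning nil. The config hash below is a made-up example.

```ruby
# Hypothetical configuration hash keyed by symbols.
config = { :host => 'localhost', :port => 80 }

# A correctly spelled key behaves exactly like config[:host].
host = config.fetch(:host)

# A misspelled key raises. Ruby 1.8 raises IndexError; later versions
# raise KeyError, which is a subclass of IndexError.
fetch_error =
  begin
    config.fetch(:hots)
  rescue IndexError => e
    e.class
  end

p host
p fetch_error
```

This only catches typos on the reading side; a typo made while building the hash still goes unnoticed until something fetches the correctly spelled key.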


YS.
 
ara.t.howard

NORTH = :NORTH
SOUTH = :SOUTH
EAST = :EAST
WEST = :WEST


%w( NORTH SOUTH EAST WEST ).each{|c| Object.const_set c, c.to_sym}

;-)


-a
--
===============================================================================
| ara [dot] t [dot] howard [at] noaa [dot] gov
| all happiness comes from the desire for others to be happy. all misery
| comes from the desire for oneself to be happy.
| -- bodhicaryavatara
===============================================================================
 
Jim Weirich

Jim said:
True, but when used properly, this is rarely a concern. If they are
used as programmer names for things, then the number of symbols is
finite and not likely to grow and consume memory as the program runs.

Here I am replying to my own posting, but I think this point could use
some elaboration.

Why are symbols not garbage collected? Because a symbol represents a
mapping from a string name to a unique object. Anytime in the execution
of the program, if that name is used for a symbol, the original symbol
object must be returned. If the symbol is garbage collected, then a
later reference to the symbol name will return a different object.
That's generally frowned upon (although I don't really see the harm. If
the original symbol was GC'ed, nobody cared what the original object was
anyways. But that's the way it works).

This might be one area where the "Symbol isa immutable string" meme
might be doing some real harm. If we think of symbols as strings, then
we tend to build symbols dynamically like we do strings. This is when
the "memory leak" problem of symbols becomes a problem.

Here's a rule of thumb ... if a programmer never sees the symbol name in
the code base, then you probably should be using a string rather than a
symbol.
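
A minimal sketch of that rule of thumb, with made-up data: names the programmer writes in the source become symbols, while names that only arrive as data at run time stay strings. The `options` and `visitor_names` values are assumptions for illustration only.

```ruby
# Names that appear literally in the code base: a fixed, finite set,
# so symbols are a good fit.
options = { :host => 'localhost', :port => 80 }

# Names that arrive as data at run time (hypothetical input). Nobody ever
# typed these into the source, so they stay strings and never get interned.
visitor_names = ["alice", "bob", "alice"]
visits = Hash.new(0)
visitor_names.each { |name| visits[name] += 1 }

p options[:port]
p visits
```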

I'm not sure if this helped, or just muddied the water more.
 
James Britt

Johannes Friestad wrote:
...
HOST = :host
PORT = :port
foo1 = {
HOST => 'localhost',
PORT => 80
}

which is equally safe, simpler to read, doesn't need the ConstUtil,
and automagically gives a constant with a readable string value.

My hack was a quicky to flesh out the example. After sending it I
thought about assigning symbols. My main point was that if one mistypes
a symbol name, Ruby doesn't care. Unit tests should catch this, but
using constants might just help things along because of the immediate
error. And it might more clearly express intent.
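
To make the difference concrete, here is a small sketch. The module name ConfigKeys is made up for the example; the constants are plain aliases for symbols, as in Johannes's version.

```ruby
# Hypothetical module; each constant simply names a symbol.
module ConfigKeys
  HOST = :host
  PORT = :port
end

config = { ConfigKeys::HOST => 'localhost', ConfigKeys::PORT => 80 }

# A mistyped symbol is still a perfectly valid symbol,
# so the lookup quietly returns nil.
typo_lookup = config[:hots]

# A mistyped constant fails immediately with NameError.
constant_error =
  begin
    config[ConfigKeys::HOTS]
  rescue NameError => e
    e.class
  end

p typo_lookup
p constant_error
```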


James

--

http://www.ruby-doc.org - Ruby Help & Documentation
http://www.artima.com/rubycs/ - Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools
 
gwtmp01

An object with a name seems a good way to put it - maybe 'an object
that is a name, by which it can be referenced anywhere'.

At the new_haven.rb December meeting, I gave a short presentation on
Symbols versus Strings. I described Symbols as:

Named numbers, you pick the name, Ruby picks the number

The number that Ruby picks is effectively an index into an internal
table, with some bit shifting and masking to encode the index into a
32-bit Ruby reference value.

The internal table gets new entries when a symbol literal is parsed
or when String#to_sym is called.
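
A rough way to see the "named numbers" idea from inside Ruby is to compare object ids. This is only a sketch: the actual id values depend on the interpreter and are not shown, and the names :north and :south are arbitrary.

```ruby
# A symbol literal and the result of String#to_sym for the same name
# refer to the same table entry, so they report the same id.
literal_id  = :north.object_id
interned_id = "north".to_sym.object_id

# A different name gets a different number.
other_id = :south.object_id

p literal_id == interned_id
p literal_id == other_id
```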


Gary Wright
 
Mauricio Fernandez

Jim Weirich wrote:
Why are symbols not garbage collected? Because a symbol represents a
mapping from a string name to a unique object. Anytime in the execution
of the program, if that name is used for a symbol, the original symbol
object must be returned. If the symbol is garbage collected, then a
later reference to the symbol name will return a different object.
That's generally frowned upon (although I don't really see the harm. If
the original symbol was GC'ed, nobody cared what the original object was
anyways. But that's the way it works).

Keep in mind that symbols are immediate values backed by
the global_symbols table (actually global_symbols.tbl and
global_symbols.rev, for the name => id and id => name associations
respectively). Since the lower bits encode information like ID_LOCAL,
ID_INSTANCE, etc., symbol values cannot point to the appropriate entry
in global_symbols the same way VALUEs for normal objects point to RVALUE
slots in the object heaps. [1]

During the mark phase, the stack must be scanned for references to live
objects. It's easy to verify if an unsigned (long) long seems to point
to an object, by seeing if the address falls into the area occupied by
some heap and actually corresponds to a slot. In order to mark symbol entries
in global_symbols, a lookup in global_symbols.rev would be needed for each
word in the stack. I conjecture that this would be relatively expensive, but
there are probably better reasons for the current behavior (it's too late to
read the sources in detail though :)...

[1] Even if those bits were not used, another level of indirection would
be needed due to the way the hash table works.
 
Mauricio Fernandez

i never considered that as an impl - i bet you're right though... time for me to
read the source.

I think :sym.hash uses the method definition from Kernel (inherited through
Object) and :sym.eql?(:foo) would also reuse the existing definition.

[see C code at the bottom]

So there's seemingly no reason for symbol hashing/comparison not to be
faster than for strings. The benchmarks you linked to, as well as Jim's,
use relatively short strings, but one can exaggerate the effect:


# >> Strings Filling
# >> 5.800000 0.000000 5.800000 ( 6.302649)
# >> Strings Fetching
# >> 3.120000 0.010000 3.130000 ( 3.404679)
# >>
# >> Symbols Filling
# >> 2.120000 0.000000 2.120000 ( 2.326393)
# >> Symbols Fetching
# >> 0.640000 0.000000 0.640000 ( 0.700178)


#!/usr/bin/env ruby

require 'benchmark'

SIZE = 100
N = 10000

RUBY_VERSION # => "1.8.4"

def make_str_keys
  (1..SIZE).collect { |i| "long key" * i}
end

def make_sym_keys(strs)
  strs.collect { |s| s.intern }
end

def populate(keys)
  result = {}
  keys.each_with_index do |k, i|
    result[k] = i
  end
  result
end

def fetch(keys, hash)
  keys.each do |key| hash[key] end
end

strs = make_str_keys
syms = make_sym_keys(strs)

str_hash = populate(strs)
sym_hash = populate(syms)

puts "Strings Filling"
puts Benchmark.measure {
  N.times do
    populate(strs)
  end
}

puts "Strings Fetching"
puts Benchmark.measure {
  N.times do
    fetch(strs, str_hash)
  end
}

puts
puts "Symbols Filling"
puts Benchmark.measure {
  N.times do
    populate(syms)
  end
}

puts "Symbols Fetching"
puts Benchmark.measure {
  N.times do
    fetch(syms, sym_hash)
  end
}


rb_define_method(rb_mKernel, "hash", rb_obj_id, 0);
...

VALUE
rb_obj_id(VALUE obj)
{
    if (SPECIAL_CONST_P(obj)) {
        return LONG2NUM((long)obj);
    }
    return (VALUE)((long)obj|FIXNUM_FLAG);
}


rb_define_method(rb_mKernel, "eql?", rb_obj_equal, 1);
...

static VALUE
rb_obj_equal(VALUE obj1, VALUE obj2)
{
    if (obj1 == obj2) return Qtrue;
    return Qfalse;
}
 
Eero Saynatkari

Yohanes said:
What a coincidence. Seems like Jim and I finally had enough of people
conflating symbols and immutable strings on the same day.

http://microjet.ath.cx/WebWiki/2005.12.27_UsingSymbolsForTheWrongReason.html

While, technically speaking, both of you are absolutely and
undeniably correct (with one correction: Symbols are not Strings),
such a conflation is the best way to get someone over the
initial confusion. As this thread quite well demonstrates,
a definition for Symbols is quite difficult to come up with.

To paraphrase fifteen thousand forty-three mediocre
comedians over the last three centuries:

"A Symbol is like a word, a sentence, a phrase, a
description or, perhaps, a name. Except sometimes."


E
 
Johannes Friestad

i see this claim all the time but never data supporting it, all my test
programs have shown the opposite to be true.

Along with Jim and Mauricio, my tests indicate that symbols are
consistently quicker, even on short strings.

Here's my benchmark
-------
def bmark_string_symb
  require 'benchmark'
  strings, symbols = [], []
  n, m = 100, 1000
  hash = {}
  n.times {|x| strings << x.to_s+"key"}
  strings.each {|s| symbols << s.to_sym}
  # initialize hash
  strings.each {|s| hash[s] = 1}
  symbols.each {|s| hash[s] = 1}
  Benchmark.bm(10) do |b|
    b.report("string set") { m.times {|x| strings.each {|s| hash[s] = x}} }
    b.report("symbol set") { m.times {|x| symbols.each {|s| hash[s] = x}} }
    b.report("string get") { m.times {|x| strings.each {|s| hash[s]}} }
    b.report("symbol get") { m.times {|x| symbols.each {|s| hash[s]}} }
  end
end
-------

and here are some results:
-------
irb(main):080:0> bmark_string_symb
user system total real
string set 0.219000 0.016000 0.235000 ( 0.235000)
symbol set 0.141000 0.000000 0.141000 ( 0.141000)
string get 0.078000 0.000000 0.078000 ( 0.078000)
symbol get 0.047000 0.000000 0.047000 ( 0.047000)
=> true
=> true
irb(main):083:0> bmark_string_symb
user system total real
string set 0.234000 0.000000 0.234000 ( 0.235000)
symbol set 0.063000 0.000000 0.063000 ( 0.062000)
string get 0.078000 0.000000 0.078000 ( 0.078000)
symbol get 0.047000 0.000000 0.047000 ( 0.047000)
=> true
-------


There's a fair amount of variation, but symbols appear to behave as
expected (quicker on average), meaning that my guess that symbol
lookup in hashes was done on the basis of their string value was
wrong.
I guess I should learn to refrain from speculating until I've checked closer :)

jf
 
Devin Mullins

Jim said:
I'm not sure if this helped, or just muddied the water more.
When I first was trying to learn about symbols, attempts to explain
their "intentions" (as names of things, for example), rather than to
explain what they are, just muddied the water for me. Sure, give me some
advice on when and when not to use them, but also, tell me what they
are, so I can decide for myself:
- Like a string, but:
- Object-level equality, so :foo.equal?(:foo) and not 'foo'.equal?('foo')
- That means, with strings, if you say 'foo' 5 times in the code, you're
creating 5 string objects, whereas with symbols, you're only creating one.
- Share many properties with Fixnum (both being "immediate")--
immutable, no singleton classes, object equality, not picked up by
ObjectSpace.each_object...
- Not garbage collected.
- Looks cooler -- by using a different syntax, you can give some visual
distinction between your keys and your values, for instance.
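
A few of the points above can be checked directly. This is a sketch rather than a complete tour: the strings are built with String.new so that each one is certainly a separate object, and the singleton class check relies on Ruby raising TypeError for immediates such as symbols.

```ruby
# Object-level equality: every mention of :foo is the same object.
sym_identical = :foo.equal?(:foo)

# Two strings with the same contents are equal in value
# but are two distinct objects.
first  = String.new("foo")
second = String.new("foo")
str_same_value = (first == second)
str_identical  = first.equal?(second)

# No singleton classes: opening one on a symbol raises TypeError.
singleton_error =
  begin
    sym = :foo
    class << sym; end
    nil
  rescue TypeError => e
    e.class
  end

p sym_identical
p str_same_value
p str_identical
p singleton_error
```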

(Also, Johannes Friestad's explanation was pretty good, IMO.)

Devin
 
Rich Morin

- That means, with strings, if you say 'foo' 5 times in the code, you're
creating 5 string objects, whereas with symbols, you're only creating one.

I can understand this as a possible implementation, but I
really wonder about the follow-on statement that I've seen,
saying that this can cause memory leaks. If I define a
string and then discard it, shouldn't it be GCed?

-r
 
ara.t.howard

Along with Jim and Mauricio, my tests indicate that symbols are
consistently quicker, even on short strings.

<snip test>

that may well be true now. however, if you look at my test it goes to some
lengths to make the test a little more 'real-world':

- creates a large (2 ** 13) hash
- populates it using a semi-random distribution
- selects keys for lookup in a semi-random distribution
- forks for each test to isolate tests somewhat
- disables GC for each test
- runs each test 4 times

in any case, all i'm driving at is that a pretty heavy duty test (not saying
mine is that test) is required to eliminate the differences data distribution,
gc, and system load have on the results. in particular i can see how a linear
distribution might have a huge effect - seeing as symbols are essentially
numbers and hash more predictably whereas making the jump from '9999' to
'10000' is likely to land in quite a different bucket.

it's nonetheless very interesting to see some tests though.

i use both btw. ;-)

-a
 
ara.t.howard

BTW: Ruby version 1.8.2, Win XP Pro, Pentium M 2.0 GHz

your test did show symbols being faster on my (linux - 2g cpu, 2g ram) machine
too btw...


but this slightly modified version shows strings being a tiny bit faster:

harp:~ > cat a.rb
require 'benchmark'

n = 2 ** 16
string_hash, symbol_hash = {}, {}

Benchmark.bm(10) do |b|
b.report("string set"){ n.times{|x| string_hash[rand.to_s.freeze] = rand}}
end
Benchmark.bm(10) do |b|
b.report("symbol set"){ n.times{|x| symbol_hash[rand.to_s.intern] = rand}}
end

string_keys = string_hash.keys.sort_by{ rand }
symbol_keys = symbol_hash.keys.sort_by{ rand }

Benchmark.bm(10) do |b|
b.report("string get"){ string_keys.each{|k| string_hash[k]}}
end
Benchmark.bm(10) do |b|
b.report("symbol get"){ symbol_keys.each{|k| symbol_hash[k]}}
end



harp:~ > ./build/ruby-1.8.4/ruby ./a.rb
user system total real
string set 0.470000 0.000000 0.470000 ( 0.471459)
user system total real
symbol set 0.640000 0.020000 0.660000 ( 0.661556)
user system total real
string get 0.100000 0.000000 0.100000 ( 0.101498)
user system total real
symbol get 0.080000 0.000000 0.080000 ( 0.077205)


i think all we are showing here is that there aren't good reasons for one over
the other. but that's good too i suppose - since people certainly seem to have
preferences.

cheers.

-a
 
James Edward Gray II

How about devoting the next Ruby Quiz to coming up with the best-of-
class examples, self paced-tutorial and documentation to settle
the :symbol vs "string" issue?

Hmm, smells like work and documentation combined. Two evils in one
quiz. ;)

I suspect that would make for an unpopular topic.

James Edward Gray II
 
Devin Mullins

Devin said:
When I first was trying to learn about symbols, attempts to explain
their "intentions" (as names of things, for example), rather than to
explain what they are, just muddied the water for me. Sure, give me
some advice on when and when not to use them, but also, tell me what
they are, so I can decide for myself:
- Like a string, but:
- Object-level equality, so :foo.equal?(:foo) and not 'foo'.equal?('foo')
- That means, with strings, if you say 'foo' 5 times in the code,
you're creating 5 string objects, whereas with symbols, you're only
creating one.
- Share many properties with Fixnum (both being "immediate")--
immutable, no singleton classes, object equality, not picked up by
ObjectSpace.each_object...
- Not garbage collected.
- Looks cooler -- by using a different syntax, you can give some
visual distinction between your keys and your values, for instance.

- Lacking all those cool String methods like #gsub and #[]

so... nothing like a string.

Yeah, I dunno, the "some named thing" was just a little iffy. The
PickAxe was especially annoying in this respect by trying to imply that
a symbol was the name of a method, variable, or class, specifically.
Maybe I'm just ranting about that.

Sorry for the mess,
Devin
 
Jim Weirich

ara said:
but this slightly modified version shows strings being a tiny bit
faster:

The difference is that your version measures more than just hash access
speed. It also includes string and symbol creation times. In
particular, to create a symbol, you must first create a string, so you
have twice as many object creations when using symbols.
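
One way to take creation cost out of the picture is to build all the keys before the clock starts and time only the lookups. The following is a sketch under assumed sizes, not a rigorous benchmark; it makes no attempt to control for GC, key distribution, or system load, and no timings are claimed here.

```ruby
require 'benchmark'

n = 10_000  # assumed number of keys, chosen arbitrarily

# Build the keys first so that string and symbol creation
# stay outside the timed blocks.
string_keys = Array.new(n) { |i| "key#{i}" }
symbol_keys = string_keys.map { |s| s.to_sym }

string_hash = {}
symbol_hash = {}
string_keys.each { |k| string_hash[k] = true }
symbol_keys.each { |k| symbol_hash[k] = true }

# Only hash access is measured here.
string_time = Benchmark.realtime { string_keys.each { |k| string_hash[k] } }
symbol_time = Benchmark.realtime { symbol_keys.each { |k| symbol_hash[k] } }

puts "string get: #{string_time}"
puts "symbol get: #{symbol_time}"
```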
 
Ron M

also, don't forget that symbols are __never__ freed.
this is a severe memory leak:

loop{ Time::now.to_f.to_s.intern }

I still wonder if the lack of garbage collection of symbols
is merely a weakness in the current implementation, or something
that's part of the Ruby language.

Note that most lisps and javas can garbage collect
symbols and interned strings.


If it's just an implementation misfeature I'd love to
see that limitation go away in YARV.





Notes:
*1 http://community.schemewiki.org/?scheme-faq-language "Most Schemes
do perform garbage-collection of symbols, since otherwise programs
using string->symbol to dynamically create symbols would consume
ever increasing amounts of memory even if the created symbols are
no longer being used."

*2 http://mindprod.com/jgloss/interned.html "In the early JDKs, any
string you interned could never be garbage collected because the
JVM had to keep a reference to in its Hashtable so it could
check each incoming string to see if it already had it in the
pool. With JDK 1.2 came weak references. Now unused interned
strings will be garbage collected."
 
Steve Litt

Oh, so many replies to my poor question! What a wonderful community!

I think it's because A LOT of us were wondering the same thing. In general,
RUBY conforms beautifully to Eric Raymond's "Rule of Least Surprise". IMHO
symbols are an exception.

To tell you the truth, I still don't understand, but at least now I have some
theories thanks to the thread you started. Thank you for that.

SteveT

Steve Litt
http://www.troubleshooters.com
 
