Symbols vs Strings


matt

Two quick questions:

1) Could someone expand on what a symbol is?
The Programming Ruby book seems to outline that a symbol is a string
with a colon in front of it.
The Agile Rails book calls it "the thing named .. "

2) If a Symbol is string substitution, why use symbols?

I feel I'm missing something obvious.

I suspect that getting #1 fully answered will indirectly answer #2.

Thanks

Matt

Devin Mullins

matt said:
> 1) Could someone expand on what a symbol is?

Ridiculously long explanation follows. It's composed of all the answers
from a previous symbols-vs-strings thread.

h1. On Symbols

*So, I've been blindly typing things like "attr_accessor :elephant" and
"link_to :action => 'hemorrhage'" for a while, and I've started to get
annoyed with that colon syntax. Somebody told me that :elephant and
:hemorrhage are Symbols, but I've no clue what that is or means. WTF, mate?*

Symbol. That's the name of the class. We can see that this way:

a = :foo
a.class #=> Symbol

Yes, see, Symbols are objects, and can be treated like objects. You can
point variables to them, you can pass them as parameters, you can return
them from blocks and methods, and you can invoke methods on them.
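To see those four claims in action, here is a small irb-style sketch (written for a modern Ruby; the method name `echo` is made up for illustration):

```ruby
# A Symbol is an ordinary object: it can live in a variable...
a = :foo
puts a.class        # prints "Symbol"

# ...be passed to and returned from a method (echo is a hypothetical helper)...
def echo(sym)
  sym               # hands back exactly the object it received
end

b = echo(a)
puts b.equal?(a)    # prints "true": the very same object came back

# ...be returned from a block...
c = [:foo].map { |s| s }.first
puts c.equal?(a)    # prints "true"

# ...and receive method calls.
puts b.to_s         # prints "foo"
```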

They're quite simple, really.

*Uhhh.... I was, sort of, looking for more information than that.*

Well, there's a veritable cornucopia of ways in which I can attempt to
share the essence of Symbol with you. First, there is the source code to
the Ruby interpreter, which is, of course, the authoritative source on
this matter. And /as/ the One True Source is the Ruby source code, what
follows is a list of descriptions, analogies, observations, and
generalizations of symbols that attempt to communicate as much of their
being as you desire to know, without communicating their being in whole.

Note: Nobody can learn you but yourself. These words attempt to provide
facts, explanations and perspectives that may help you on your journey
to understanding Symbols -- but you /will/ have to do some work on your
own, whether that be experimentation in irb, or deep, pensive
introspection of the meaning of programming. To quote [a translation of]
<a href="http://en.wikiquote.org/wiki/Plutarch">Plutarch</a>:

<blockquote>We must encourage [each other] -- once we have grasped the
basic points -- to interconnecting everything else on our own, to use
memory to guide our original thinking, and to accept what someone else
says as a starting point, a seed to be nourished and grow. For the
correct analogy for the mind is not a vessel that needs filling but wood
that needs igniting -- no more -- and then it motivates one towards
originality and instills the desire for truth. Suppose someone were to
go and ask his neighbors for fire and find a substantial blaze there,
and just stay there continually warming himself: that is no different
from someone who goes to someone else to get to some of his rationality,
and fails to realize that he ought to ignite his own flame, his own
intellect, but is happy to sit entranced by the lecture, and the words
trigger only associative thinking and bring, as it were, only a flush to
his cheeks and a glow to his limbs; but he has not dispelled or
dispersed, in the warm light of philosophy, the internal dank gloom of
his mind.</blockquote>

*Gawsh, that sounds dangerous.*

It's really not. I just put that disclaimer in there to weed out the
slackers. I'll also add that, to properly glean information from this
page, you should already understand:

* Object-oriented programming
* The "variables are references" way of programming that infuses Ruby
* Many other aspects of Ruby, such as what attr_accessor does.

*All right, then. So... what _are_ these "many ways" to teach me about
Symbols?*

Yeah, right. Thanks. They are:

1. A list of the Symbol's basic properties.
1. Example code for their common usages, a discussion of the
similarities and differences between symbols and their substitutes, and
why symbols exist.
1. An analogy to concepts from other programming languages.
1. A list of some important implementation details behind symbols.
1. When you might *not* want to use Symbols (I know, blasphemy).
1. The gory details of their implementation.
1. Links to other explanations.

You can pick and choose from this menu as you like. And away we go!

*Wow, you know what I just realized? The scrollbox on the right is
frikkin' tiny. This document is huge! I don't want to read all this.*

Well, you should've thought of that before you decided not to understand
Symbols. Also, you can stop reading as soon as you understand Symbols
(but not a second earlier).

h2. A list of the Symbol's basic properties.

A Symbol literal, in code, is a colon followed by a bare word (/\w+/, in
general, though the regex is in fact more complex than this -- see the
gory implementation details if you really care).
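Here are a few literal forms that go beyond a plain bare word. This is a non-exhaustive sample, not a statement of the full grammar, and it was checked against a modern Ruby whose lexer may differ in corner cases from the 1.8 one discussed here:

```ruby
# Each of these is a valid Symbol literal, though none is a plain /\w+/ word.
samples = [
  :empty?,   # predicate-method names may end in ?
  :save!,    # ...or in !
  :name=,    # setter names end in =
  :+,        # operator method names
  :[],       # the element-reference method
  :@ivar,    # instance variable names
  :$stdout,  # global variable names
  :String    # constant names
]

samples.each { |s| puts s.inspect }
```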

A Symbol's properties can be summed up thusly:

:apple == :apple &&
:apple.to_s == 'apple' &&
:apple.to_i == 23417

:apple is a literal reference to a Symbol object, just as 5 is a literal
reference to the number 5, and "garden" is a literal reference to the
eponymous String object. I wouldn't worry too much about asking what
:apple is "wrapping" or whatnot -- :apple is :apple, as is evident from
line 1.

Line 1 says that we can compare Symbols using the == operator (aka
Symbol#==). The == operator returns true whenever the literal references
look the same (that is, in the source code, :apple is :apple is :apple,
but not :donkey or even :APPLE).
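Here is a minimal sketch of those equality rules, runnable in irb or as a script:

```ruby
# Symbol#== is true only when the two literals are spelled identically.
puts :apple == :apple    # prints "true"
puts :apple == :APPLE    # prints "false": case matters
puts :apple == :donkey   # prints "false"

# A Symbol never equals a String, even one with the same characters...
puts :apple == "apple"   # prints "false"

# ...so convert one side explicitly if you need to compare across classes.
puts :apple.to_s == "apple"   # prints "true"
```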

*So a Symbol literal is no different from a String literal, eh?*

False. Yeah, it looks like that, but there are a lot of ways in which
it isn't. For one, Strings come with methods galore, like gsub and slice
and capitalize. Symbol comes with:

===, id2name, inspect, to_i, to_int, to_s, to_sym

You really can't do a whole lot with Symbols, and you're not supposed to.

*So Symbols are just Strings with all the useful methods removed?*

No. Take away Symbol#id2name, Symbol#to_s, and Symbol#to_i and you still
have something useful to Rubyists -- the ability to test for equality.
Here, Symbols look like strings only to the programmer. To the program,
they look like boring, ineffectual objects with which you can do nothing
but test for equality.

*So is that _all_ you can do with a Symbol?*

Well, equality is a big one, and covers many standard usages of Symbols
in Ruby. See the Examples section for details.

You can also convert a Symbol to a String. While Symbols aren't Strings,
they have a close bond with the String class, partly due to Symbol#to_s.
This returns a new String containing whatever you typed following the
colon. (It, of course, does not affect the original object. Short of <a
href="http://rubyforge.org/cgi-bin/viewcvs.cgi/evil/lib/evil.rb?root=evil&view=markup">evil.rb</a>,
Ruby code cannot change the class of a given object.)
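Here is a small sketch of round-tripping between the two classes (the variable names are arbitrary):

```ruby
sym = :goatee
str = sym.to_s         # a String holding the characters after the colon
longer = str + "s"     # builds yet another String; nothing is modified in place

puts str               # prints "goatee"
puts longer            # prints "goatees"
puts sym.inspect       # prints ":goatee": the Symbol itself is untouched

# Going the other way, String#to_sym (alias String#intern) finds the Symbol.
puts "goatee".to_sym.equal?(sym)   # prints "true"
```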

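You can also convert a Symbol to an Integer. (To be honest, I'm not
quite sure if it's useful to anybody. I've never used it, certainly.)
According to the rdoc on Symbol#to_i, each Symbol maps to a unique
integer: :apple.to_i gave me 23417, and it will keep giving 23417 for
the life of my Ruby program, no matter how many times I type it. (The
exact number depends on the interpreter and will likely differ on yours;
note also that Symbol#to_i was dropped in later Ruby versions.) Go
ahead, pop open an irb window and try it out.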

As a matter of fact, try all of these things out in irb. Set variables
equal to symbols, pass them to methods and blocks, invoke some methods,
go crazy!

So, you should see, by now, that you can't do a lot with Symbols. You
can reference them through the funny :goatee syntax, you can compare
them for equality, and you can convert them to Strings and Integers.

(Truth be told, you can do a few other things, but they fall out of the
99.9% of use cases for Symbols. After understanding Symbols, peruse the
section on [non-gory] implementation details for more.)

*Okay, that made no sense.*

Well, give it a think some more, or try one of the other sections. I
don't mind; I'm just an HTML document after all.

*No, I mean, if symbols don't /do/ anything, then why the <expletive
deleted> have them in the language in the first place?*

Well, read the next section.

h2. Example code for their common usages, a discussion of the
similarities and differences between symbols and their substitutes, and
why symbols exist.

Symbols are typically used where identity is concerned. Yeah, I know
that's vague. Here are some specific cases:

* Referring to variable or method names
* As keys to a Hash (often when doing that named parameter trick, as
in Rails)
* To refer to a specific set of things, such as :up, :right, :left, :down.

Now let's get down and dirty with real raw code, for each of these in turn:

# Referring to variable or method names
class MyJob
  attr_writer :frustration_level # refers to the name of the method to be created
  def print_var(sym)
    puts "#{sym} = #{instance_variable_get(sym)}" # refers to the variable name to be accessed
  end
end
java_code = MyJob.new
java_code.frustration_level = "bordering on suicide"
java_code.print_var(:@frustration_level)

# As keys to a Hash (often when doing that named parameter trick, as in Rails)
connection = { :host => 'eat.mcdonalds.com', :port => 443 } # as keys to a hash
link_to :action => 'free_willy' # that named parameter trick

def link_to(hashy_thing) # implementing that named parameter trick
  do_something_with(hashy_thing[:action]) # do_something_with stands in for the real work
end

# To refer to a specific set of things, such as :up, :right, :left, :down.
class Pos
  attr_accessor :x, :y # looks familiar...
  def initialize(x, y)
    @x, @y = x, y
  end
  def move(dir)
    case dir
    when :up    then @y += 1
    when :down  then @y -= 1
    when :left  then @x -= 1
    when :right then @x += 1
    end
    self # return self to make irb sessions friendlier, and to allow chaining, I suppose
  end
end
pos = Pos.new(0, 0) # x = 0, y = 0
pos.move :up        # x = 0, y = 1
pos.move :left      # x = -1, y = 1

Take some time. Read through the code slowly. Swish it around in your
mouth. Change some things and see what happens. Employ irb. Now pause.

Okay. Understand?

*Wait, why do attr_accessor and instance_variable_get require colons in
front of your identifiers, while alias and defined? do not?*

alias and defined? are reserved Ruby keywords. The Ruby parser notices
the keyword, and knows that the next token better be a method/variable
name, or else.
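Here is a sketch contrasting the keyword forms with method calls. `Greeter` and its methods are hypothetical. `alias_method` (the method-based cousin of `alias`, not mentioned above) is brought in only to show that, being a method, it takes Symbols as arguments:

```ruby
class Greeter
  def hello
    "hi"
  end

  # `alias` is a keyword: the parser reads the bare names directly.
  alias greet hello

  # `alias_method` is an ordinary method call, so the names are passed as Symbols.
  alias_method :salute, :hello
end

g = Greeter.new
puts g.greet     # prints "hi"
puts g.salute    # prints "hi"

# `defined?` is also a keyword; it inspects its operand without evaluating it.
p defined?(g.greet)         # => "method"
p defined?(no_such_thing)   # => nil
```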

attr_accessor and instance_variable_get, however, are not reserved Ruby
keywords. They are ordinary built-in methods (attr_accessor is defined
on Module, instance_variable_get on Object). Because they are methods,
the syntax for invoking them is the same as for invoking any other
method. If you were to do:

attr_writer frustration_level

Ruby would first look for a local variable named 'frustration_level',
and then, failing that, invoke the 'frustration_level' method on self,
passing the *return value* to attr_writer (or, more likely, a NameError
will whizz by). We don't want that. Instead, we're using Symbols as a
pretty way to pass in the _name_ of the method we want to create. We
pass [a reference to] the Symbol into attr_writer, and then attr_writer
reads the Symbol's name (the same text #to_s would give you) to find out
what you typed after the colon.

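Here is a sketch of both halves of that argument, assuming a modern Ruby. It reuses the `MyJob` name from the example above, adds a hypothetical `mood` reader to show that attr_reader accepts a String too, and uses an anonymous class so the failing call doesn't touch `MyJob`:

```ruby
class MyJob
  attr_writer :frustration_level   # a Symbol naming the method to generate
  attr_reader "mood"               # a String is accepted as well (hypothetical reader)
end

p MyJob.instance_methods(false).sort   # => [:frustration_level=, :mood]

# Without the colon, Ruby evaluates `frustration_level` as an expression first.
# No local variable or method by that name exists in the class body, so the
# lookup fails before attr_writer ever runs.
error = begin
  Class.new { attr_writer frustration_level }
  nil
rescue NameError => e
  e
end
p error.class   # => NameError
```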
Okay, do you understand /now/?

*Well, I think so, but -- couldn't you have just used Strings everywhere
for the same purpose? And for the move up/down thing, you could have
just created some UP = 1, DOWN = 2 style constants, or, heck, make four
different methods -- move_up, move_down, etc.!*

Yes, I could have.

*Uh... ?*

For all of the above cases (and all of the ways in which I've seen
Symbols applied), you could use Strings in their place. This is because,
well, what are we doing? We're comparing for equality (as with the case
statement, or the Hash access), or we're calling #to_s to find out its
name (as with the attr_accessor thing). These are both things we can do
with Strings.
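To make the point concrete, here is a hypothetical String-based version of the #move case logic (the method name `delta` is made up). It behaves the same way, but note the one trap when you switch: a Symbol and a String are distinct Hash keys.

```ruby
# String-based twin of the Pos#move case statement: returns an [dx, dy] pair.
def delta(dir)
  case dir
  when "up"    then [0, 1]
  when "down"  then [0, -1]
  when "left"  then [-1, 0]
  when "right" then [1, 0]
  end
end

p delta("up")   # => [0, 1]
p delta(:up)    # => nil: a Symbol never matches a String `when` clause

# The same split applies to Hash keys, so don't mix the two styles.
config = { :host => "eat.mcdonalds.com" }
p config[:host]    # => "eat.mcdonalds.com"
p config["host"]   # => nil
```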

So why use Symbols instead?

1. To signal intent. By sticking colons in front of these bare words,
you're saying, "These are the absolutes in my code. These are the things
that do not change. In my application, these are not messages to the
user, tokens to parse for, or anything else that's String-like. These
are *concepts*."
2. (On a related note...) For readability's sake. If you're using a
text editor with syntax highlighting, the advantage of saying <%=
link_to :controller => 'dingleberry', :action => 'pick' %> over <%=
link_to 'controller' => 'dingleberry', 'action' => 'pick' %>, amidst a
sea of RHTML, is IMMEDIATELY obvious (no pun intended, Ruby veterans).
Even if you're not, using symbols in the right places can still aid your
eyes in knowing where to look, and reduce line noise.
3. Slight performance improvement. In cases like the above, where a
fixed number of symbols are used over and over, you can get a slight
performance improvement using Symbols rather than Strings. In a typical
application, it's more likely to be negligible. Also, there seem to be
cases where using Symbols is the slightly *less* performant thing to do,
so I wouldn't dwell on this bullet much.
4. Because they're cool, and all the cool Ruby cats are doing it. Be
careful about this one, too, as there are many cases where Symbols don't
make sense. No Golden Hammer For You.

As for the UP = 1 / DOWN = 2, move_up / move_down suggestions: well,
that's just icky.

h2. An analogy to concepts from other programming languages.

So your brain doesn't think in pure Ruby, yet? That's a shame. It's
really a fun experience.

Ruby's symbols are most analogous to Lisp's symbols, I'm told. They are
also comparable to Java's interned Strings (available through the
String.intern() method), but chances are you haven't heard of
String.intern(), so I doubt that'll gain you much insight.

What might be more useful, rather, is to compare their usages to similar
things you'd be doing in other languages, like, oh, C or Java.

*Java:*

Whereas the attr_accessor method can take a String or a Symbol, the
Reflection API in Java uses Strings as parameters for method names,
so... yeah, that's that one.

The equivalent of the above Hash example would likely be Strings as keys
to Map (or Map-like) objects. This is the case for Properties and
ServletRequest.getParameterMap(), for example, and using the Properties
class is often a trick you might employ to pass freeform configuration
lists into your *own* methods.

The up/down/left/right thing generally has quite an odious parallel in
Java 1.4:
class Pos {
    public static final int UP = 1;
    public static final int DOWN = 2;
    public static final int LEFT = 3;
    public static final int RIGHT = 4;
    ...

Ouch. Java 5 added enums, so, you know, less pain.

*C:*

The Reflection API in C -- ha! Just kidding. Had you for a second, though.

As for the Hash thing... It's been so long since I've coded C, I just
don't know... what would the replacement for a Hash be? (C++, I'd
imagine, has some STL map class.) Named parameters just don't get used
-- ever -- leading to potentially cryptic code.

The up/down/left/right thing would most likely be accomplished using an
enum. This isn't too bad, but it's anti-Ruby. Why? Because it's
contractual. It requires a static and unchanging list of enum values to
be declared before an enum can be used. Ruby's free-flowing -- the above
#move method would not blink an eye if passed :sideways or
:out_of_the_way or :to_the_beat as a parameter.

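If you do want enum-like strictness in Ruby, one option (a suggestion of my own, not something from the thread) is to whitelist the Symbols you accept. The constant `DIRECTIONS` and the method `step` below are hypothetical:

```ruby
# Hypothetical table of the only directions this sketch accepts.
DIRECTIONS = {
  :up    => [0, 1],
  :down  => [0, -1],
  :left  => [-1, 0],
  :right => [1, 0]
}.freeze

# Returns a new [x, y] pair; raises ArgumentError for any direction not in the table.
def step(pos, dir)
  dx, dy = DIRECTIONS.fetch(dir) { raise ArgumentError, "unknown direction: #{dir.inspect}" }
  [pos[0] + dx, pos[1] + dy]
end

p step([0, 0], :up)   # => [0, 1]

begin
  step([0, 0], :sideways)
rescue ArgumentError => e
  puts e.message      # prints "unknown direction: :sideways"
end
```

The trade-off is the one described above: you gain a contract, and you lose the free-flowing acceptance of :to_the_beat.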
h2. A list of some important implementation details behind symbols.

Yeah, so there are some things you might want to know about Ruby's
symbols before you go applying them willy-nilly. In no particular order:

* Symbols are immutable. Ha! Actually, you should already know that,
by virtue of the fact that Symbol's method list doesn't have any
mutating methods on it. (Compare String#replace and String#gsub!, for
example.)
* Symbols are immediate. They share this property with Fixnums, true,
false, and nil. In Ruby terms:
3.times { puts :streisand.object_id }   #=> 6625550, 6625550, 6625550
3.times { puts "yogi berra".object_id } #=> 23531092, 23531068, 23531044
puts :streisand.object_id               #=> 6625550
See? Each time you reference a String literal, you're creating a new
one, while each time you reference a Symbol (or any other immediate
object), you're referring to the same one that was created the *first*
time you referenced it.
* You can reference symbols in a couple of other ways. If you want
more than just variable name syntax for your symbols, you can reference
a symbol using :'single quotes' or :"double quotes" as such.
* You can also get access to Symbols _dynamically_. As an
extension of the last bullet, you can actually /interpolate/ the
double-quoted symbols in the :"normal #{fashion}". You can also get a
reference to a Symbol from its given String representation using
String#intern or String#to_sym. These should both be used with strong
caution because...
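* Symbols are never garbage collected (true of the Ruby 1.8 this was written against; since Ruby 2.2, dynamically created Symbols can be collected). For most cases, this isn't a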
problem. You'll have maybe a hundred or so tiny little symbols floating
around in memory (thanks to their immediacy), and getting touched quite
often. However, if you're pulling Symbols out of your hat dynamically,
then you're juggling gas-torched batons. This, for example, leaks a
thousand symbols:
1000.times {|i| :"number #{i}" }
* At runtime, you can see a list of all the Symbols that have been
sprung into existence, by typing Symbol.all_symbols (returns an Array of
Symbols).
* :bananorama.to_yaml produces a different result from
'bananorama'.to_yaml.
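The following script exercises several of the bullets above. It was written for a modern Ruby and assumes `frozen_string_literal` is not enabled (with it on, identical String literals can share one object). Raw object IDs aren't printed, since they vary between versions and runs, and garbage collection isn't demonstrated:

```ruby
require "yaml"

# Every evaluation of :streisand yields the same object...
syms = Array.new(3) { :streisand }
puts syms.all? { |s| s.equal?(:streisand) }    # prints "true"

# ...whereas every evaluation of a String literal builds a fresh String.
strs = Array.new(3) { "yogi berra" }
puts strs[0].equal?(strs[1])                   # prints "false"

# Quoted forms allow characters a bare word can't contain.
spaced = :"hello world"
puts spaced == :'hello world'                  # prints "true"

# Interpolation, String#to_sym and String#intern all reach the same Symbols.
i = 7
puts :"number #{i}".equal?("number 7".to_sym)  # prints "true"
puts "apple".intern.equal?(:apple)             # prints "true"

# Symbol.all_symbols lists every Symbol the interpreter currently knows about.
puts Symbol.all_symbols.include?(:streisand)   # prints "true"

# YAML serializes a Symbol differently from the equivalent String.
puts :bananorama.to_yaml == "bananorama".to_yaml   # prints "false"
```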

h2. When you might *not* want to use Symbols.

As pointed out earlier, the principal benefit of using Symbols over
Strings is to give your mind and eyes a little less work to do in
figuring out the intent of a given piece of code. So, if what
you're really doing is preparing a message for the user, or doing
something else String-like, maybe you want to stick with Strings.

Bad use of symbols:
num = [:one, :two][rand(2)]
puts "Your number is: #{num}"

Better to use Strings, instead:
num = ['one', 'two'][rand(2)] #or %w{one two} if you'd like
puts "Your number is: #{num}"

h2. The gory details of their implementation.

For now, don your flame-retardant suit, and visit <a
href="http://ruby-talk.org/cgi-bin/vframe.rb/ruby/ruby-talk/172818?172638-173519+split-mode-vertical">this
thread</a>. I'm too lazy/incompetent to type up a summary of this wizardry.

h2. Links to other explanations.

If none of my descriptions helped, well, then, too bad. Or, you can
click some links.

The following explanations are not necessarily universally condoned by
the Ruby community, but may fit your fancy (for what it's worth):

"Symbols as light-weight
Strings":http://moonbase.rydia.net/mental/blog/programming/ruby-symbols-explained.html
"Symbols as Integers with human faces":[ruby-talk:173442]
"Symbols as ever-present bubbles floating in an imperceptible
ether":[ruby-talk:173076]

Devin

matt

Excellent work.

There was a reference to this being an HTML document. Is there an online
version of this that I can reference?

Two questions came out of this:

1) For rails apps that use link_to :blah

where is :blah being made a symbol? Is it in the base controller?

How do I know that I need to use :blah, and not some other symbol?

(There are other methods that do this; I'm arbitrarily choosing link_to.
It could be paginate, or link_to_remote, or many others; I hope the
concept still remains.)

2) Is there any relation between a Ruby Symbol and a C++ pointer or
reference? It sounded like that to me as I was reading through, but I
could be wrong.


Thanks

Matt



On Thu, 2006-12-21 at 14:24 +0900, Devin Mullins wrote:

Devin Mullins

matt said:
> 1) For rails apps that use link_to :blah
>
> where is :blah being made a symbol? Is it in the base controller?

:blah is made a symbol the first time that the interpreter comes across
the :blah token, either in your source code or in the Rails source code.
Each subsequent time that the interpreter finds :blah, it points it to
the preexisting symbol.

> How do I know that I need to use :blah, and not some other symbol?

Convention. You're just passing a Hash, and Rails cares about certain
key/value pairs in that Hash.
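To illustrate "convention", here is a hypothetical helper in the spirit of link_to. It is not Rails' implementation; `my_link_to` and its defaults are made up. It simply reads the keys it knows about and ignores the rest:

```ruby
# Hypothetical helper: builds a path from the :controller and :action keys,
# falling back to made-up defaults and silently ignoring any other keys.
def my_link_to(options)
  controller = options[:controller] || "home"
  action     = options[:action] || "index"
  "/#{controller}/#{action}"
end

# The braces can be omitted when a Hash is the last argument.
p my_link_to(:controller => "dingleberry", :action => "pick")  # => "/dingleberry/pick"
p my_link_to(:action => "pick", :colour => "red")              # => "/home/pick"
```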

> 2) Is there any relation between a Ruby Symbol and a C++ pointer or
> reference? It sounded like that to me as I was reading through, but I
> could be wrong.

Not really. All variables in Ruby are references.

HTH,
Devin

Jeremy Wells

Devin said:
> Convention. You're just passing a Hash, and Rails cares about certain
> key/value pairs in that Hash.

I think that Rails basically converts all hash keys into symbols
automatically using Hash#symbolize_keys!, so you could pass a string or
a symbol, but passing a symbol would be faster.
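I can't confirm exactly where Rails performs this conversion, but the idea can be sketched in core Ruby with Hash#transform_keys (Ruby 2.5 and later), which stands in here for ActiveSupport's symbolize_keys:

```ruby
params = { "controller" => "dingleberry", "action" => "pick" }

# Build a new Hash whose keys are Symbols; the original Hash is left as-is.
symbolized = params.transform_keys(&:to_sym)

p symbolized[:action]   # => "pick"
p params[:action]       # => nil: the original still has String keys
```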