David Masover wrote:
> I would have thought number of cities over time would be finite and
> predictable. Granted, the number of cities is probably in the tens or
> hundreds of thousands.
>
> So symbols would be appropriate if instead of cities, adad was reading a
> text file of state names?
Nope. A typo or an error in the file, and you've got a problem again -- MRI
never garbage-collects symbols, so every unexpected value leaks memory. It's
similar to when you've got any sort of external input which you want to
compare to a finite list of values. It might be tempting to do this:
Values = [:one, :two, :three, :four]
...
if Values.include? input_value.to_sym
What you should be doing is this:
Values = ['one','two','three','four'].map(&:freeze).freeze
...
if Values.include? input_value
If it's still not efficient enough (if there are hundreds of values), put them
in a Set or a Hash, but that's even more reason not to convert input to syms.
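For example, a Set version might look like this (just a sketch, reusing the
names from the snippet above):

require 'set'

Values = Set.new(['one', 'two', 'three', 'four'].map(&:freeze)).freeze
...
if Values.include? input_value

Set gives you constant-time lookup instead of scanning the whole array.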
I'd only do it the way you're suggesting if the file in question was part of
the source distribution, but it sounds like it's coming from an external
service.
> How about country names (currently under 200)?
>
> I ask, in an attempt to gauge what is typically considered the accepted
> threshold for using symbols.
In my opinion, the threshold for using symbols is whenever you have a finite
set of values generated only from trusted sources -- generally, stuff inside
your application source code.
It also matters how it's being used -- as David Black and Robert Klemme point
out, symbols are generally for labels. They're what's used to refer to
functions and variables by "name" in Ruby. Their other major use is for hashes
of options passed around -- essentially, keyword arguments.
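For instance (a hypothetical method of my own, not from any library):

# The keys are symbols typed right into the source -- labels, not data.
def connect(host, options = {})
  port    = options[:port]    || 80
  timeout = options[:timeout] || 30
  # ... open the connection here ...
end

connect('example.com', :port => 8080, :timeout => 5)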
It might also help to think of Symbols as Enum values. Let me put it this way
-- in other languages, like C and Java, you might have a fixed set of values
to work with. For example, suppose I want to open a file for reading, writing,
or both. It's inefficient to actually pass the strings "read",
"write", or "read/write" with every file open, so instead, I might pass an
integer, 1, 2, or 3. But that's annoying to work with, so instead, I'd define
a constant:
#define READ 1
#define WRITE 2
#define READWRITE 3
Now I can do something like:
open("foo", READ)
Then, inside the open function, you'd have something like:
case mode
when READ
when WRITE
...
All of which is just shorthand for:
open("foo", 1)
and
case mode
when 1
when 2
...
This is vastly oversimplified, and not how it's actually done, but it works.
This is also such a common pattern that languages have shortcuts for it. I'm
working from memory here, so the syntax is probably wrong, but the idea holds:
enum { READ, WRITE, READWRITE };
open("foo", READ)
The enum will automatically assign a unique integer value to each of READ,
WRITE, and READWRITE. As long as that same enum is visible to the code of the
open function, and to the code calling it, the number assigned to READ, WRITE,
and READWRITE will be the same each time.
Note that at this point, you really don't have to care what number it is, just
that it's unique, and that doing it this way is just as efficient as manually
specifying a number.
And since it doesn't matter, there's no reason passing 1 should be more
efficient than passing 3085, or anything, as long as it's still a 32-bit
integer. (Or 64-bit, if you're on a 64-bit platform.) If you were really
strapped for space, you could use a single byte value, but there is actually a
real possibility that won't be enough, depending on your application, and you
want to be backwards compatible. So an int makes sense, and besides, enum is
doing all the work for you.
So, symbols -- a concept from Lisp, actually -- take this just a step further.
Rather than assigning a number that's unique only within that function (READ=1,
WRITE=2, etc.), you get a number that's globally unique. When you type
:foo
...what you're really doing is getting a unique integer, which Ruby will
replace any occurrence of the symbol :foo with, anywhere in your source code.
Again, it's oversimplifying -- it's probably implemented as an integer, but
you'll see it as a Symbol object. The entire point of this is so that you can
guarantee that the following two things will be true:
:foo == :foo
:foo != :bar
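You can see that guarantee in irb (at least in MRI) -- every :foo literal is
the same object, while every "foo" literal builds a brand-new String:

:foo.object_id == :foo.object_id     # => true, always the same object
"foo".object_id == "foo".object_id   # => false, two separate String objects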
And of course, you can do case structures using symbols, you can use them in
hash values, and so on.
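So the C example above might look like this in Ruby (my sketch, nothing
standard):

def open_file(name, mode)
  case mode
  when :read      then # ... open for reading ...
  when :write     then # ... open for writing ...
  when :readwrite then # ... open for both ...
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

open_file("foo", :read)

No enum declaration needed -- the symbols are their own unique values.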
And because Ruby is reflective, you can do things like:
"foo".to_sym
But scroll back up and look at the "enum" example. This kind of monkeying is
properly metaprogramming -- it would be like writing a program that generates
enum statements for a header.
Sometimes, it might actually be appropriate. An obvious example is an ORM --
an intelligent ORM can read the database schema and create methods named after
database columns. You'll get those column names as strings, and you'll turn
them into symbols. You might even get fancy, like Rails, and do some string
manipulation (pluralize them, etc.) and create some more methods.
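A stripped-down sketch of that idea -- the column list is hard-coded here,
hypothetically, where a real ORM would read it from the schema:

class Record
  columns = ['id', 'name', 'email']   # a real ORM would query the database

  # One reader and one writer method per column, named after the column.
  columns.each do |col|
    define_method(col.to_sym) { @attributes[col] }
    define_method("#{col}=".to_sym) { |value| @attributes[col] = value }
  end

  def initialize(attributes = {})
    @attributes = attributes
  end
end

r = Record.new('name' => 'Dave')
r.name   # => "Dave"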
That's not entirely a new idea, either. If this were a compiled language, you'd
probably have a tool that took some specification (maybe XML... ugh) and
converted it into both SQL statements to create that database, and source code
to access it. The main difference is that Ruby is dynamic enough to do this at
runtime, just-in-time, rather than having to actually generate source code.
But the concepts are the same.
Here's a quick rule of thumb:
- Am I metaprogramming?
- Are these keyword arguments, or some sort of options hash?
- Are the symbols created with the colon notation (:foo)?
If you answered yes to any of those, symbols are fine. If you answered no to
all of them, probably not.
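Or, in code (hypothetical examples on both sides):

# Fine: symbols typed right into the source, as labels
config = { :mode => :readwrite, :verbose => true }

# Probably not: symbols minted from untrusted input
mode = gets.chomp.to_sym   # every new line mints another permanent symbol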
phew! I think I need a blog.