[Note: parts of this message were removed to make it a legal post.]
On Wed, Feb 3, 2010 at 12:27 PM, Jerome David Sallinger <
Can someone please explain how you would go about deciding whether to
use a Hash or and Array for a given requirement. Can someone give an
idiot proof simple explanation including any pros versus cons.
Understanding what happens under the covers will help you know what is right
for your situation, so first a quick explanation of arrays and hashes.
An array is a section of consecutive memory (Ruby's array class is more
abstract, but generally you can expect something like this). Since the array
doesn't actually store the object, but instead stores the reference to the
object, and all references are the same size, the array can always know
where the nth item is (for some number n). It also means that the items are
inherently ordered, ie the first element you put in will be at index zero,
the second will be at index 1, and so on. If you want to see that order
again, you look at index 0, then index 1, and so on. So if you want order,
then an array makes sense. If you want to be able to call the sixth item,
then you know right were it is, it is in index 5 (not 5, because we start
counting at zero). And we know where index 5 is, because the references to
the objects have a fixed width. So it's in memory location:
location_of_array + size_of_reference * desired_index, and bam, we are
there. So it is very efficient.
But what happens if you don't know where the item you are interested in is
located? Well, then you have to search through the whole array (there are
also searching algorithms, if you can guarantee certain properties about the
array)
However, sometimes you are not so interested in the order things went in, as
you are with relating two pieces of information together.
Enter the hash table. A hash table has a "key" and a "value" it uses the key
in the same way that an array uses an index, to retrieve the value.
Underneath the hash table, is an array, and every key is mapped to an array
index. To do this, it looks at the object's contents, and comes up with a
number (the algorithm used will depend on the object, and how the creator
decided to implement it). That index is then likely to be where the object
is within the array. The key implies an index, but within the array, there
is no ordering of the contents.
You figure out what index it should be located at by deriving some number
from the object, then translating that number onto the array holding the
items. Perhaps the array has ten indexes, and six items in it, then when it
is asked to look for the index containing some object, it figures out that
object's hash number, and translates it to an array index (probably mods it
by ten).
You can see these hash numbers by calling #hash on them here is an example:
x = "abc"
y = "abc"
z = "def"
x.object_id # => 75900
y.object_id # => 75890
x.hash # => 833038373
y.hash # => 833038373
z.hash # => 858800354
x.hash % 10 # => 3
y.hash % 10 # => 3
z.hash % 10 # => 4
Notice that x and y are both different objects (they have different object
IDs), but they contain the same information, two different strings of "abc"
and look at their hash values, they are the same. When String defines #hash,
it somehow looks at the character array underneath the string, and comes up
with a number (in this case, 833038373). This is why two different objects
with the same contents have the same hash value, and if we assume an array
of size ten, then we would expect the string "abc" to be in index number 3.
The string "def" has different contents, and thus a different hash value,
and it maps to a different index.
So by looking at a key, we determine which index the object will map to, and
go see if it is there (though there can be some complications when things
collide).
So, generally, if you are interested simply storing a bunch of items, or in
storing a bunch of items with an ordering, then an array makes sense. If you
are interested in mapping one object to another, then a hash makes sense.
(note that in 1.9, hashes have ordering also, but you still can't say "give
me the 8th item"). If you have ten strings and I just want to keep them
somewhere for later, use an array. If you have four database records, and
want to display them in order, use an array. If you want to pass a bunch of
fields from a form that was submitted, where the form input will be accessed
based on the name of the field, use an array.
These are not hard rules, you need to think about what you are doing, and
which data structure fits your needs. Also, depending on what you do, arrays
can ultimately be most other data structures, you saw underneath of it, a
hash table holds an array, it just defines different rules for how to access
elements. There are lots of examples like this, as thunk pointed out, using
the array methods push and pop will give you a stack (place an item into the
array, and remove it again, where the first one you put in is the last one
you get back out, think of a stack of plates, they put newly cleaned plates
on top of the stack, and you pull the plate you want from the top of the
stack). Using push and shift will give you a queue (same as a stack, except
the first one you put in will be the first one you get out, think of a line
of people waiting for food in a caffeteria). But what I said earlier should
probably give you a fairly good idea which of the two you should be
considering first.
I think that the Hash container (may I use that term?)
I think the common terms are "hash" or "hash table".
BTW, camelCase is considered poor style in Ruby; use underscore_case
instead.
I've always heard underscore_case called snake_case, Textmate, for example,
calls them: camelCase / snake_case / PascalCase
(you can toggle between the three with ^_ as defined in the "Source" bundle)