Hashes versus Arrays

Jerome David Sallinger · Feb 3, 2010

Hello,

Can someone please explain how you would go about deciding whether to
use a Hash or an Array for a given requirement. Can someone give an
idiot proof simple explanation including any pros versus cons.

Sal

Iñaki Baz Castillo · Feb 3, 2010

On Wednesday, February 3, 2010, Jerome David Sallinger wrote:

Hello,

Can someone please explain how you would go about deciding whether to
use a Hash or an Array for a given requirement. Can someone give an
idiot proof simple explanation including any pros versus cons.

An array is indexed by an integer (myarray[0]) while a hash is indexed by any
Ruby object (myhash["drinks"]).

In Ruby 1.9, Hashes also preserve insertion order (in Ruby 1.8 they did not,
which was an important difference between a typical Hash and an Array).

If you need to access the entries by a numeric index, an Array is the right
fit for you. If not, Hashes are great.
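
A minimal sketch of the difference described above. The variable names and
values are made up for illustration; the last line assumes Ruby 1.9 or later.

```ruby
# Array: entries are reached by an integer position, starting at 0.
drinks_list = ["water", "tea", "coffee"]
first_drink = drinks_list[0]     # => "water"
last_drink  = drinks_list[-1]    # => "coffee" (negative indexes count from the end)

# Hash: entries are reached by a key, and the key can be any Ruby object.
stock = { "drinks" => 12, :snacks => 5, [2010, 2] => "February" }
drinks_count = stock["drinks"]   # => 12
month_name   = stock[[2010, 2]]  # => "February"
missing      = stock["cake"]     # => nil (no such key)

# In Ruby 1.9 and later, keys come back in insertion order.
key_order = stock.keys           # => ["drinks", :snacks, [2010, 2]]
```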

--
Iñaki Baz Castillo <[email protected]>

thunk · Feb 3, 2010

Hello,

Can someone please explain how you would go about deciding whether to
use a Hash or an Array for a given requirement. Can someone give an
idiot proof simple explanation including any pros versus cons.

Sal

I think that the Hash container (may I use that term?) can be thought
of primarily as a "Dictionary" - for (fast) random access. What is
so darned "cool" about Hashes is that at any time the data can be: 1.
an Array, 2 Another Hash and so on.... until you can give yourself a
headache.

Arrays are useful for "Stack" operations. I love the "pop" and "push"
- reminds me of registers in a calculator. The Wee library comes with
a "HP" reverse polish "calculator" implemented in 10 lines of code or
so - amazing and so powerful - but not the stuff of most apps except
maybe sometimes inside Hash. That's when Arrays are useful when the
ORDER is all you need. If you want the last element you can fetch it
and remove it with the thingyAr.pop command. This is a pretty common
situation. (Think keyboard commands and such)
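
To make the stack idea concrete, here is a small reverse Polish calculator
that uses an Array as its stack. This is only a sketch: it is not the
calculator that ships with Wee, and the method name rpn is made up for this
example.

```ruby
# Evaluate a reverse Polish expression such as "3 4 + 2 *".
# Numbers are pushed onto the stack; an operator pops the top two
# values, applies itself, and pushes the result back.
def rpn(expression)
  stack = []
  expression.split.each do |token|
    case token
    when "+", "-", "*", "/"
      right = stack.pop
      left  = stack.pop
      stack.push(left.send(token, right))
    else
      stack.push(Float(token))
    end
  end
  stack.pop
end

rpn("3 4 + 2 *")  # => 14.0
rpn("10 2 /")     # => 5.0
```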

Of course we are entirely OO in Ruby so any element can be anything -
a Web component or - oh no!!! a Hash!

Not sure it is "best practice" and I'm waiting to see how it works out
BUT I have been edging toward fewer containers and using them for more
and more. Many of my methods in my web related work return Hashes now
so I can put names to the data. Using an array and trying to remember
position for the same purpose would be really error prone.

Hashes have some really nice features also!

Like this one: myValue = someHash.fetch( :thingyAr, [] )

What's that about? if the symbol :thingyAr is not found an empty
array is returned - thus you can fall right into your code to do
something with the Array without testing for that annoying "nil".
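
A short sketch of that fetch call in action. The hash contents are made up,
and the key is written in snake_case here.

```ruby
settings = { :colors => ["red", "green"] }

found    = settings.fetch(:colors, [])      # => ["red", "green"]
fallback = settings.fetch(:thingy_ar, [])   # => [] (key missing, default returned)
plain    = settings[:thingy_ar]             # => nil (plain lookup, no default)

# The default lets you iterate straight away without a nil check.
visited = []
settings.fetch(:thingy_ar, []).each { |item| visited << item }

# fetch only returns the default; it does not store it in the hash.
stored = settings.key?(:thingy_ar)          # => false
```

Without a second argument, fetch raises an error for a missing key (KeyError
in Ruby 1.9 and later) instead of returning nil.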

Marnen Laibow-Koser · Feb 4, 2010

thunk said:
I think that the Hash container (may I use that term?) can be thought
of primarily as a "Dictionary" - for (fast) random access.

So can an Array. The only difference is that array indices have to be
numeric.

What is
so darned "cool" about Hashes is that at any time the data can be: 1.
an Array, 2 Another Hash and so on.... until you can give yourself a
headache.

The same is true of arrays.

Arrays are useful for "Stack" operations.

But that's not really their primary purpose.

I love the "pop" and "push"
- reminds me of registers in a calculator. The Wee library comes with
a "HP" reverse polish "calculator" implemented in 10 lines of code or
so - amazing and so powerful - but not the stuff of most apps except
maybe sometimes inside Hash.

Uh, what? That last phrase appears not to make sense.

That's when Arrays are useful when the
ORDER is all you need. If you want the last element you can fetch it
and remove it with the thingyAr.pop command. This is a pretty common
situation. (Think keyboard commands and such)

Right, although you'd probably want a more sophisticated Stack object...

Of course we are entirely OO in Ruby so any element can be anything -
a Web component or - oh no!!! a Hash!

Right. Or anything else.

Not sure it is "best practice" and I'm waiting to see how it works out
BUT I have been edging toward fewer containers and using them for more
and more. Many of my methods in my web related work return Hashes now
so I can put names to the data.

If you want to put names to the data, you should be using value objects,
not hashes.

Using an array and trying to remember
position for the same purpose would be really error prone.

Yup. And using a Hash is also error-prone. Define a custom value
object.
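
One way to sketch a value object is with Struct from the core library. The
name PageResult and its fields are hypothetical, chosen only to show why a
misspelled name is caught here and silently ignored by a Hash.

```ruby
# A tiny value object with two named fields.
PageResult = Struct.new(:title, :status)

as_hash   = { :title => "Home", :status => 200 }
as_struct = PageResult.new("Home", 200)

page_title = as_struct.title      # => "Home"

# A misspelled key on a Hash quietly returns nil...
hash_typo = as_hash[:titel]       # => nil

# ...while a misspelled accessor on the Struct raises NoMethodError.
typo_caught = false
begin
  as_struct.titel
rescue NoMethodError
  typo_caught = true
end
```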

Hashes have some really nice features also!

Like this one: myValue = someHash.fetch( :thingyAr, [] )

What's that about? if the symbol :thingyAr is not found an empty
array is returned - thus you can fall right into your code to do
something with the Array without testing for that annoying "nil".

Pretty useless, considering that you can use || for the same thing.
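
For comparison, a small sketch of both spellings. They agree when the key is
absent; they differ only when the key is present with a false or nil value.
The hash and its keys are made up for illustration.

```ruby
flags = { :verbose => false }

# Key absent: both forms give the fallback.
with_or    = flags[:missing] || []         # => []
with_fetch = flags.fetch(:missing, [])     # => []

# Key present but false: || replaces the stored value, fetch keeps it.
or_result    = flags[:verbose] || true     # => true
fetch_result = flags.fetch(:verbose, true) # => false
```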

BTW, camelCase is considered poor style in Ruby; use underscore_case
instead.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)

Josh Cheek · Feb 4, 2010


On Wed, Feb 3, 2010 at 12:27 PM, Jerome David Sallinger wrote:

Can someone please explain how you would go about deciding whether to
use a Hash or an Array for a given requirement. Can someone give an
idiot proof simple explanation including any pros versus cons.

Understanding what happens under the covers will help you know what is right
for your situation, so first a quick explanation of arrays and hashes.

An array is a section of consecutive memory (Ruby's array class is more
abstract, but generally you can expect something like this). Since the array
doesn't actually store the object, but instead stores the reference to the
object, and all references are the same size, the array can always know
where the nth item is (for some number n). It also means that the items are
inherently ordered, ie the first element you put in will be at index zero,
the second will be at index 1, and so on. If you want to see that order
again, you look at index 0, then index 1, and so on. So if you want order,
then an array makes sense. If you want to be able to call the sixth item,
then you know right where it is: it is in index 5 (not 6, because we start
counting at zero). And we know where index 5 is, because the references to
the objects have a fixed width. So it's in memory location:
location_of_array + size_of_reference * desired_index, and bam, we are
there. So it is very efficient.
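
The arithmetic above, worked through with made-up numbers. The start address
and the 8-byte reference size are assumptions for illustration only; Ruby does
not expose real memory addresses this way.

```ruby
location_of_array = 1000  # hypothetical start address of the array
size_of_reference = 8     # assuming 8-byte references (a 64-bit machine)
desired_index     = 5     # the sixth item

# One multiplication and one addition, no matter how long the array is.
address = location_of_array + size_of_reference * desired_index  # => 1040
```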

But what happens if you don't know where the item you are interested in is
located? Well, then you have to search through the whole array (there are
also searching algorithms, if you can guarantee certain properties about the
array)
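
A sketch of both cases, with made-up data. Array#index walks the array from
the front; Array#bsearch (available since Ruby 2.0) does a binary search, but
only gives a meaningful answer when the array is sorted.

```ruby
names = ["Susie", "Tom", "Yosef", "Ada"]

# Linear search: checks each element in turn until it finds a match.
position = names.index("Yosef")       # => 2
has_bob  = names.include?("Bob")      # => false

# Binary search: needs a sorted array, then halves the range at each step.
sorted = names.sort                   # => ["Ada", "Susie", "Tom", "Yosef"]
match  = sorted.bsearch { |name| name >= "Tom" }  # => "Tom"
```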

However, sometimes you are not so interested in the order things went in, as
you are with relating two pieces of information together.

Enter the hash table. A hash table has a "key" and a "value"; it uses the key
in the same way that an array uses an index, to retrieve the value.
Underneath the hash table, is an array, and every key is mapped to an array
index. To do this, it looks at the object's contents, and comes up with a
number (the algorithm used will depend on the object, and how the creator
decided to implement it). That index is then likely to be where the object
is within the array. The key implies an index, but within the array, there
is no ordering of the contents.

You figure out what index it should be located at by deriving some number
from the object, then translating that number onto the array holding the
items. Perhaps the array has ten indexes, and six items in it, then when it
is asked to look for the index containing some object, it figures out that
object's hash number, and translates it to an array index (probably mods it
by ten).

You can see these hash numbers by calling #hash on the objects. Here is an
example (the exact numbers will differ between Ruby versions and between runs):

x = "abc"
y = "abc"
z = "def"

x.object_id # => 75900
y.object_id # => 75890

x.hash # => 833038373
y.hash # => 833038373

z.hash # => 858800354

x.hash % 10 # => 3
y.hash % 10 # => 3
z.hash % 10 # => 4

Notice that x and y are different objects (they have different object IDs),
but they contain the same information: two different strings of "abc". Look
at their hash values: they are the same. When String defines #hash, it somehow
looks at the character array underneath the string and comes up with a number
(in this case, 833038373). This is why two different objects with the same
contents have the same hash value, and if we assume an array of size ten, then
we would expect the string "abc" to be in index number 3. The string "def" has
different contents, and thus a different hash value, and it maps to a
different index.

So by looking at a key, we determine which index the object will map to, and
go see if it is there (though there can be some complications when things
collide).
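
Here is a toy version of that idea, as a sketch only. The class name
ToyHashTable is made up, and this is not how Ruby's own Hash is implemented.
Collisions are handled by keeping a small list of key/value pairs in each
array slot, which is one common approach among several.

```ruby
class ToyHashTable
  def initialize(bucket_count = 10)
    # The array underneath: each slot holds a list of [key, value] pairs.
    @buckets = Array.new(bucket_count) { [] }
  end

  # Store a value, replacing any existing pair with an equal key.
  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |stored_key, _| stored_key.eql?(key) }
    if pair
      pair[1] = value
    else
      bucket << [key, value]
    end
  end

  # Look a value up; returns nil when the key is not present.
  def [](key)
    pair = bucket_for(key).find { |stored_key, _| stored_key.eql?(key) }
    pair && pair[1]
  end

  private

  # Translate the key's hash number into an array index by taking it
  # modulo the number of slots.
  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end

table = ToyHashTable.new
table["abc"] = 1
table["def"] = 2
table["abc"] = 3   # same contents, same hash number, same slot
table["abc"]       # => 3
table["xyz"]       # => nil
```

Only the slot that the key maps to is searched, which is why a lookup stays
fast as long as the pairs are spread fairly evenly across the slots.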

So, generally, if you are interested simply in storing a bunch of items, or in
storing a bunch of items with an ordering, then an array makes sense. If you
are interested in mapping one object to another, then a hash makes sense.
(Note that in 1.9, hashes have ordering also, but you still can't say "give
me the 8th item".) If you have ten strings and you just want to keep them
somewhere for later, use an array. If you have four database records, and
want to display them in order, use an array. If you want to pass a bunch of
fields from a form that was submitted, where the form input will be accessed
based on the name of the field, use a hash.
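
A sketch of the form case, with made-up field names and values. It also shows
what "ordered, but no 8th item" means in Ruby 1.9 and later: the pairs keep
their insertion order, yet square brackets always look up a key, never a
position.

```ruby
form = { "name" => "Sal", "email" => "sal@example.com", "age" => "30" }

email       = form["email"]   # => "sal@example.com" (lookup by field name)
field_order = form.keys       # => ["name", "email", "age"]

# form[1] looks for a key equal to the integer 1, not the second pair.
by_number = form[1]           # => nil

# To reach a pair by position, convert to an array of pairs first.
second_pair = form.to_a[1]    # => ["email", "sal@example.com"]
```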

These are not hard rules; you need to think about what you are doing and
which data structure fits your needs. Also, depending on what you do, an array
can stand in for most other data structures. As you saw, underneath a hash
table sits an array; the hash table just defines different rules for how to
access elements. There are lots of examples like this. As thunk pointed out,
using the array methods push and pop will give you a stack: you place an item
into the array and remove it again, where the first one you put in is the last
one you get back out. Think of a stack of plates: newly cleaned plates go on
top of the stack, and you pull the plate you want from the top of the stack.
Using push and shift will give you a queue: the same as a stack, except the
first one you put in will be the first one you get out. Think of a line of
people waiting for food in a cafeteria. But what I said earlier should
probably give you a fairly good idea which of the two you should be
considering first.
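
The plates and the cafeteria line, sketched with plain arrays. The item names
are made up.

```ruby
# Stack: push adds to the end, pop removes from the end.
plates = []
plates.push("first plate")
plates.push("second plate")
plates.push("third plate")
top_plate = plates.pop      # => "third plate" (last in, first out)

# Queue: push adds to the end, shift removes from the front.
line = []
line.push("first person")
line.push("second person")
line.push("third person")
served = line.shift         # => "first person" (first in, first out)
```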

I think that the Hash container (may I use that term?)

I think the common terms are "hash" or "hash table".

BTW, camelCase is considered poor style in Ruby; use underscore_case
instead.

I've always heard underscore_case called snake_case, Textmate, for example,
calls them: camelCase / snake_case / PascalCase
(you can toggle between the three with ^_ as defined in the "Source" bundle)

David Masover · Feb 6, 2010

People may be over-complicating things a bit.

Can someone please explain how you would go about deciding whether to
use a Hash or an Array for a given requirement.

Is it just a list of values, ordered or not? Put it in an Array. Example:

friends_list = ['Tom', 'Susie', 'Yosef']

A hash is an associative array, not necessarily sorted. Do you have things you
need to associate?

people_list = {'Tom' => :friend, 'Joe' => :enemy}
people_list['Tom'] # should return :friend
people_list['Joe'] = :friend #guess we made up

These are entirely different patterns. You almost never have a requirement
that would make sense for either.

People are complicating this by the fact that arrays can be indexed by
position. For example:

friends_list[1] # who's my second friend?

But in high-level languages like Ruby, especially when you're starting out,
you probably don't need to be doing that.

Can someone give an
idiot proof simple explanation

No. You're a programmer. If you're also an idiot, you end up on thedailywtf.
"Idiot-proof programming" is neither.

Jerome David Sallinger · Feb 6, 2010

Thank you David Masover, an explanation and a patronising comment, what
more could one ask of a fellow human on a Saturday night.

Yossef Mendelssohn · Feb 6, 2010

friends_list = ['Tom', 'Susie', 'Yosef']

I am touched to be included in this example.

David Masover · Feb 7, 2010

Thank you David Masover, an explanation and a patronising comment, what
more could one ask of a fellow human on a Saturday night.

If by this you mean my "idiot-proof" comment... my point here wasn't to accuse
you of being an idiot, but rather, that attempts to make this kind of thing
"idiot-proof" don't work. I'd much rather start by assuming you're
intelligent.

Jerome David Sallinger · Feb 7, 2010

I must admit to you that yes I am an idiot and proud of it

Robert Klemme · Feb 8, 2010

2010/2/7 Jerome David Sallinger said:
I must admit to you that yes I am an idiot and proud of it

Wrong group, please repost at alt.soc.confessions. ;-)

Cheers

robert
