Hash order bug?

J

Javier Valencia

I have this piece of simple code:

----------------------------------------------
def foo
return 5
end

a = {:gr => false, :und => false, :det => true, :sta => false, :inv =>
false}
puts a.inspect
 
D

dblack

Hi --

I have this piece of simple code:

----------------------------------------------
def foo
return 5
end

a = {:gr => false, :und => false, :det => true, :sta => false, :inv => false}
puts a.inspect

Hashes are unordered. If you need an ordered collection, you'll need
to use an array.


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS (reviewed on
Slashdot, 7/12/2006!)
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
N

Nicolas Desprès

I have this piece of simple code:

----------------------------------------------
def foo
return 5
end

a =3D {:gr =3D> false, :und =3D> false, :det =3D> true, :sta =3D> false, = :inv =3D>
false}
puts a.inspect

by definition a hash table is not ordered.

[...]

--=20
Nicolas Despr=E8s
 
J

Javier Valencia

Hi --




Hashes are unordered. If you need an ordered collection, you'll need
to use an array.


David

Oh my god, i didn't know it, sorry. Is that a missing feature?
 
D

dblack

Hi --

Oh my god, i didn't know it, sorry. Is that a missing feature?

Debate rages on this point :) It's mostly a speed issue, I think.
At least my memory is that Matz's latest statement was to the effect
that he would do it if it could be done efficiently.

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I'm
afraid I don't remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It's a case where the feature would be purely additive.


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS (reviewed on
Slashdot, 7/12/2006!)
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
J

Javier Valencia

Nicolas said:
I have this piece of simple code:

----------------------------------------------
def foo
return 5
end

a = {:gr => false, :und => false, :det => true, :sta => false, :inv =>
false}
puts a.inspect

by definition a hash table is not ordered.

[...]

Well, i don't mind ordering alphabetically, just to maintain their
construction order, like an array.
Anyway, thanks all for quick responses, hope to see that implemented in
core-ruby soon :)

Thanks
 
K

Kero

[snip]
Debate rages on this point :) It's mostly a speed issue, I think.
At least my memory is that Matz's latest statement was to the effect
that he would do it if it could be done efficiently.

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I'm
afraid I don't remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It's a case where the feature would be purely additive.

I guess there won't be many ppl using algorithms that use the *un*sorted
aspect of a Hash. Using Kernel#rand instead of Object#hash seems
better. printing a few hash-values suggest they're not that random anyway...

Other than that, would you sort the hash (or iterator) on its keys, its
values or insertion sequence?

NB: why did I read "addictive" at first pass? The question has been raised
more often; and I admit I have overridden Hash#keys (and Hash each_pair,
iirc) to achieve a (partially) sorted effect... The problem is that you can
not generally sort keys (specifically not symbols).
 
P

Paul Battley

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I'm
afraid I don't remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It's a case where the feature would be purely additive.

I think that would depend on whether {:a => 1, :b => 2} == {:b => 2,
:a => 1}. If that's the same, everything should be fine. If it's
different, I know that it would break a number of my tests, if nothing
else.

Paul.
 
W

William James

Javier said:
Oh my god, i didn't know it, sorry. Is that a missing feature?

No. What language has ordered hashes?

When you say
a = {:gr => false, :und => false, :det => true, :sta => false, :inv
=> false}
you are saying that you want to access the values in this fashion:
a[:und]
So the order in which they are stored is irrelevant.
If you want to access them this way
a[1]
then you set them up like this
[ false, false, true, false, false ]

If you want to get a sequence of values in a certain order:
a.values_at( :und, :det, :sta )

If you want to have your cake and eat it too, you could
use an association list:
as = [[:foo,22], [:bar,33], [:baz,44]] => [[:foo, 22], [:bar, 33], [:baz, 44]]
as.assoc:)bar) => [:bar, 33]
as.rassoc( 44 ) => [:baz, 44]
as[0]
=> [:foo, 22]

If the list has a large number of elements, access will be slow.
 
K

Keith Gaughan

What language has ordered hashes?

Java, in the form of the TreeMap collection. The usefulness of them is
limited though, As anecdotal evidence, I've used it twice in the past
four years.

K.
 
A

Alex Young

Kero said:
[snip]
Debate rages on this point :) It's mostly a speed issue, I think.
At least my memory is that Matz's latest statement was to the effect
that he would do it if it could be done efficiently.

The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I'm
afraid I don't remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It's a case where the feature would be purely additive.

I guess there won't be many ppl using algorithms that use the *un*sorted
aspect of a Hash.
It's not that it's unordered, it's that it's anordered. The concept
doesn't apply. It's a bucket, not a queue. There are many, many
applications that don't care about entry order.
Using Kernel#rand instead of Object#hash seems
better. printing a few hash-values suggest they're not that random anyway...
The order is unpredictable and implementation dependent.
 
P

Phillip Hutchings

No. What language has ordered hashes?

PHP (*shudder*). Since PHP only has a single array type that supports
both sequential and keyed accessing any string keys remain in order.
It can be handy as PHP's array construction is needlessly verbose, for
example in Ruby I'd do this:

[['item1', 1],['item2',2]]

In PHP it'd look like:
array(array('item1', 1), array('item2', 2));
or
array('item1'=>1, 'item2'=>2);

Which is easier to read?
 
K

Keith Gaughan

And even in the TreeMap case, you sacrifice performance for features.

Very true. While a peek at the source code will reveal that the
Red-Black tree it uses is well-implemented, nothing comes for free.
An unordered hash will be the fastest for lookup and insertion.

That depends, though it *is* generally true. However, your
datastructures should suit the problem domain and the constraints you're
working under.
Adding any
additional features will slow it down out of the necessity of imposing an
order on either keys or values. With TreeMap, I think it's something like
O(log n) lookup times rather than the theoretical O(1) with a straight hash,
and that's not counting any tree rebalancing needed after future insertions.

Theoretically. However, while looking at speed through the lens of time
complexity can be a good rule of thumb, it's sometimes somewhat
misleading.

For instance, say you had a dataset that you had to do lookups upon, and
you'd two choices as to the method to use for lookups. The first is a
linear scan of the unordered data, and the other is some keyed lookup.
The former would give you O(n) performance, while the latter would give
O(1) performance.

If you didn't have any more information, you'd be forgiven for choosing
the latter method, but say scanning each element takes only .01ms,
whereas the calculation required in the latter method takes 2s per
element, it's a no-brainer to pick the the O(n) method over the O(1)
method if your dataset is of the right size.

In the case of hashes, you only get O(1) if (a) the hash can be
calculated in constant time and (b) you're using a perfect hash
function. Strictly speaking, just as Quicksort's performance is O(n^2),
insertion and lookup for hashtables is O(n+m) where the hash degrades to
something resembling a bunch of keyed linked lists and _n_ is the mean
time to convert the hash key to an index, and _m_ is the mean time
required to do a linear scan of the appropriate linked list. But such
cases are rare, but inherent in the way hashes are implemented.

Big-O notation is only meant as a guide to algorithm choice and is no
substitute for and understanding of the tradeoffs involved with each
potential algorithm and how they actually perform when used with your
datasets.

Trees are good when you (a) want to guarantee how long it will take to add,
lookup, or remove an element or (b) need the keys to be ordered by some
property. Most of the time, we don't need these properties.
At minimum, you'll double up memory storing an array and a hash together, or
you'll sacrifice performance maintaining and traversing a balanced tree.
...

Leave hash alone; custom data structures are fine when the peculiar beast
known as an "ordered hash" is actually useful.

Agreed. There's no sense in adding additional complexity to the language
to cope with something like that which most of us only use rarely, if
ever. That is, after all, what libraries are for.

K.
 
K

Keith Gaughan

No. What language has ordered hashes?

PHP (*shudder*). Since PHP only has a single array type that supports
both sequential and keyed accessing any string keys remain in order.
It can be handy as PHP's array construction is needlessly verbose, for
example in Ruby I'd do this:

[['item1', 1],['item2',2]]

In PHP it'd look like:
array(array('item1', 1), array('item2', 2));
or
array('item1'=>1, 'item2'=>2);

Which is easier to read?

That's not really fair on either Ruby or PHP. You're taking a
syntactical element and using it to bash an unrelated part of the
language. If PHP was to introduce the use of square brackets as
syntactic sugar for array(), this argument would go away in a puff of
smoke.

K.

P.S. My .sig generator seems to think I'm talking about ASP.NET :)
 
P

Phillip Hutchings

That's not really fair on either Ruby or PHP. You're taking a
syntactical element and using it to bash an unrelated part of the
language. If PHP was to introduce the use of square brackets as
syntactic sugar for array(), this argument would go away in a puff of
smoke.

Disclaimer: My day job is working with a large PHP webapp. Maybe I'm just jaded.

I believe it's fair. It's one of my reasons for liking Ruby (and
Python, and JavaScript for that matter). Arrays and hashes are one of
the most widely used structures in my development, and to have such a
clumsy constructor is a pain.

However, there are much deeper issues in PHP's implementation. There
are a few array operations (eg array_merge) that will preserve key
mappings on string keys, but not integer keys. Guess what happens if
you want a hash with integer keys?
 
H

Hal Fulton

Debate rages on this point :) It's mostly a speed issue, I think.
At least my memory is that Matz's latest statement was to the effect
that he would do it if it could be done efficiently.

I don't know if it really rages. ;)

I personally favor the idea of a so-called ordered hash,
but that's just me. There's no way to do it in current
Ruby (while still allowing hash literals). Before arguing
with me, read the previous parenthetical piece and understand
it.

I once talked with Matz about it, suggesting a new class (e.g.,
Map or AssociativeArray or whatever). I had assumed a new name
would be needed because a "hash" is inherently unordered, thus
interfering with the class name "Hash." But Matz surprised me
by saying that if it were ordered, the name Hash could be kept,
as the underlying data structure was still a hash -- just with
a little extra data preserved.

I've heard that Nobu made a patch for this, and I heard that
there were some questions about performance (not actual
measurements, just concerns). But I never heard anything beyond
that.
The question came up during a lunch here at OSCON (with me, Jim
Weirich, Pat Eyler, and a fellow named Nicolas whose last name I'm
afraid I don't remember) as to whether making hashes ordered in a
future Ruby would break any existing code. I think the answer is no,
which is kind of interesting considering what a major change it would
be. It's a case where the feature would be purely additive.

It couldn't break code as far as I can see.

I confess to being amused at one person who was so vehemently
against this that he said, "If Ruby added that, I would quit
using Ruby." Truthfully I laughed out loud, because I thought
that was an extreme reaction.

But of course, he had a point in a way. People sometimes say, "If
you don't like Feature X, then don't use it." But you still may
find yourself reading someone else's code that does.

I'll try to make a post soon about a use case for an ordered hash.
I've been meaning to collect a bunch of use cases, but haven't
done so. Right now I can only think of one.


Hal
 
H

Hal Fulton

Paul said:
I think that would depend on whether {:a => 1, :b => 2} == {:b => 2,
:a => 1}. If that's the same, everything should be fine. If it's
different, I know that it would break a number of my tests, if nothing
else.

In previous discussions, Matz said these would still be equal.

That suits me: I could still do h1.to_a == h2.to_a if needed.


Hal
 
H

Hal Fulton

William said:
No. What language has ordered hashes?

When you say
a = {:gr => false, :und => false, :det => true, :sta => false, :inv
=> false}
you are saying that you want to access the values in this fashion:
a[:und]
So the order in which they are stored is irrelevant.
If you want to access them this way
a[1]
then you set them up like this
[ false, false, true, false, false ]

If you want to get a sequence of values in a certain order:
a.values_at( :und, :det, :sta )

Your point is well taken. David Alan Black also argues eloquently
against the concept of an ordered hash.

But to me there is psychological weight in seeing a literal of this form:

{a=>b, c=>d, e=>f}

There is in fact a syntactic order (which turns out to be meaningless
internally).
If you want to have your cake and eat it too, you could
use an association list:

as = [[:foo,22], [:bar,33], [:baz,44]]

=> [[:foo, 22], [:bar, 33], [:baz, 44]]

[snip]

That works fine, but I dislike the ugly syntax.


Hal
 
W

William James

Hal said:
William said:
No. What language has ordered hashes?

When you say
a = {:gr => false, :und => false, :det => true, :sta => false, :inv
=> false}
you are saying that you want to access the values in this fashion:
a[:und]
So the order in which they are stored is irrelevant.
If you want to access them this way
a[1]
then you set them up like this
[ false, false, true, false, false ]

If you want to get a sequence of values in a certain order:
a.values_at( :und, :det, :sta )

Your point is well taken. David Alan Black also argues eloquently
against the concept of an ordered hash.

But to me there is psychological weight in seeing a literal of this form:

{a=>b, c=>d, e=>f}

There is in fact a syntactic order (which turns out to be meaningless
internally).

In awk, such hash literals don't exisit; so when using that language,
one is less tempted to imagine that hashes are ordered. Perhaps
we should think of the above as 'syntactic sugar' for

a = Hash.new; a[a]=b; a[c]=d; a[e]=f

Lisp also uses these.
as = [[:foo,22], [:bar,33], [:baz,44]]

=> [[:foo, 22], [:bar, 33], [:baz, 44]]

[snip]

That works fine, but I dislike the ugly syntax.

Maybe this is better.

a = [ :foo,22, :bar,33, :baz,44 ].to_assoc

a.assoc_set :bar, 99
a.assoc_set :yes, -1
a.assoc_del :baz

The prerequisite:

class Array
def to_assoc
f=nil ; partition{f=!f}.transpose
end
def assoc_set k, v
(pair = assoc(k)) ? self[ index(pair) ] = [k,v] : push( [k,v] )
end
def assoc_del k
(pair = assoc( k )) && slice!( index( pair ) )
end
end
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,770
Messages
2,569,583
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top