Why Does Hash Apparently Reorder Its Internal Representation AndOther Associated Ponderings

M

Martin DeMello

Is there a reason why you don't simply inheret from Hash, or is this
just another one of those There's More Than One Way To Do It scenarios?

I'm always a bit reluctant to inherit from core C-based classes,
though I couldn't really say why.

martin
 
D

dblack

Hi --

Martin DeMello said:
Martin DeMello wrote:

You can do this automatically (if you aren't already) by creating an
object that holds a hash and an array, defines []=, each and delete to
do the right thing, and delegates missing methods to the hash.

Hmmm... good idea. I've largely missed out on the
whole method_missing idiom. Sounds like a good use.

I'll try to look into it after I return from JPL next
week. However, if you'd like to throw down an example,
I might be able to work it in now...

Quick proof of concept:

require 'enumerator'

class OHash
include Enumerable

...

def method_missing(*args)
@h.send(*args)
end
end

Is there a reason why you don't simply inheret from Hash, or is this
just another one of those There's More Than One Way To Do It scenarios?

You mean TMTOTMTOWTDIS? :)


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
----> SEE SPECIAL DEAL FOR RUBY/RAILS USERS GROUPS! <-----
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
http://www.manning.com/black => book, Ruby for Rails
http://www.rubycentral.org => Ruby Central, Inc.
 
A

Austin Ziegler

Sort of, but I liked the interface of Hash too much
to abandon it. So far, I am carrying along an array
of keys in order of creation for the one place that
I need it. Otherwise, I have the beauty of the stock
Hash interface at my disposal.

Indeed. There is a legitimate use case of having both fast random
access on an arbitrary object (in my case a Page object) and insertion
ordered processing when iterating. Speed is an issue for me, but I
think people are trying to prematurely optimize. Where PDF::Writer is
slow, it is not because of an ordered hash.

-austin
 
A

Austin Ziegler

Quick proof of concept:

I'm not sure that everything is as you said, but look at the OHash in
PDF::Writer (PDF::Writer::OHash) and it should implement most of what
you've got here.

It'd still be nice if there were a literal for this.

-austin
 
A

Austin Ziegler

I think what you are thinking of is a red black tree (or just a binary
tree, in general) and not a hash...

No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.
Honestly, complaining that hashes aren't ordered is like complaining
that rand() doesn't return the same number every time. Pray tell, what
made you think it should be ordered? That was an honest question, by the
way! You know enough about programming to come here and decree that this
is very weird yet you didn't already know what a hash is or how one works.
Very strange...

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

-austin
 
P

Phrogz

Austin said:
PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

And, for what it's worth, JavaScript's Object primitive is also an
associative array that retains insertion order for iterating keys,
while exhibiting performance characteristics of a hash.
 
J

Just Another Victim of the Ambient Morality

Austin Ziegler said:
No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

Ah, of course. I get the two (sorted order and insertion order) mixed
up since they are both rather popular in terms of what people
(surprisingly) expect of a hash...

PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

I see... This explains a lot, thank you...
 
T

Timothy Hunter

Austin said:
Which is ... questionable advice in Ruby. Since mixins are a *form* of
inheritance.

-austin
Interesting. And I've read on this list many times that inheriting from
a builtin class like Hash is A Bad Thing. Are you saying that I should
stop feeling guilty for deriving Magick::Image from Array? :)
 
I

Isak

Austin said:
No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.


PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

-austin

I don't think 'ordered' is a good name for something sorted by
_insertion order_.

I am probably 'tainted' by my exposure to the Java API, but just like I
expect an ordered tree to be sorted by value, I'd expect an ordered map
to be sorted by key.

Insertion order maps are very useful too, and I'd love to see both added
to the Ruby stdlib. I(o?)Hash and OHash? Let's keep Hash as lean and
mean as possible.


Isak
 
A

Austin Ziegler

Interesting. And I've read on this list many times that inheriting from
a builtin class like Hash is A Bad Thing. Are you saying that I should
stop feeling guilty for deriving Magick::Image from Array? :)

Is derivation the best choice, or is delegation the best choice? I
don't know, personally. It's always a tough choice, because an Image
is just an array of bytes with special interpretation (but one could
say that of *any* object, I guess).

-austin
 
J

Joel VanderWerf

Austin said:
No, people are thinking of an associative array or an association list
as someone else called it in this thread. I'm not sure about r/b
trees, but binary trees are most *definitely* not what is wanted since
what is wanted is insertion order in most cases, not an arbitrary
sorted order.

Red-black trees are (like binary trees) sorted by keys, rather than
insertion order (but you could use the insertion ordinal as the key to
get insertion order behavior). But they don't scale as well as Hash.
 
H

Hal Fulton

Austin said:
PHP's "array" is an associative array allowing "hash"-like handling
with an ordered iteration. That's probably the source of 99% of the
reasons that people want it.

That probably speaks for me. I've never used PHP, but
an associative array is what I want.

I tend to think in terms of interface rather than
implementation. A hash has the interface I want,
*except* that it doesn't preserve order.


Hal
 
H

Hal Fulton

Phrogz said:
And, for what it's worth, JavaScript's Object primitive is also an
associative array that retains insertion order for iterating keys,
while exhibiting performance characteristics of a hash.

I find that interesting, since so many appear convinced that
a so-called ordered hash would be "too slow."

Do you know anything about the internal implementation?


Hal
 
H

Hal Fulton

Isak said:
I don't think 'ordered' is a good name for something sorted by
_insertion order_.

The fact that something can be "sorted" at all is only because it
has an order, i.e., is sequential.

In fact, I would say an "ordered hash" would be subject to being
sorted just as an array is. (We can sort an array because it has
an order -- first element, second element, and so on. We can "sort"
a regular hash, but we get an array back.)
I am probably 'tainted' by my exposure to the Java API, but just like I
expect an ordered tree to be sorted by value, I'd expect an ordered map
to be sorted by key.

Insertion order maps are very useful too, and I'd love to see both added
to the Ruby stdlib. I(o?)Hash and OHash? Let's keep Hash as lean and
mean as possible.

That seems reasonable to me.

I once proposed the name "Map" for such a class -- there was some reason
this wasn't considered good, but I can't recall why.


Hal
 
J

Just Another Victim of the Ambient Morality

And would eval %{ instance_of_hash[instance_of_fixnum] } really be so
evil? Perhaps that was a little obscure... What's wrong with ordered
access, using a numeric element reference as with Array, to Hash?
Excepting that the order can't be relied upon, but assume that it could.
Even simpler might have been to give an example:

h = {:a => 'a', :b => 'b'}
h[0]
=> 'a'

Similarly one might be able to treat an Array like a Hash as in eval %{
instance_of_array['instance_of_string_representation_of_a_fixnum'] },
such as with:

a = [ 1, 2 ]
a['0']
=> 1

There's no great call for the immediately above I would think, but if I
implemented one then I would implement the other also, simply for the
purposes of symmetry. I'm not even sure there is any need for either,
such that I may be trying to achieve the unnecessary?...

How would this even work?
For instance:


hash = { 0 => 'a', 1 => 'b', 'c' => 'd' }

# for me, this prints 'a', 'd', 'b', in that order...
hash.each { |k,e| puts e }

puts hash[1] # what will this print?


There's no way to tell whether the last line of this code is trying to
access the second element of the hash or the one that maps to the Fixnum
1...
 
P

Phrogz

Hal said:
I find that interesting, since so many appear convinced that
a so-called ordered hash would be "too slow."

Do you know anything about the internal implementation?

I don't. I wouldn't call most JavaScript engines 'fast', but I suspect
that has to do with aspects other than the native Object type.

Without having tried it, I would think that preserving insertion order
would be a (small) memory size addition to each hash (as you've
mentioned before), a *tiny* slowdown in the speed to store a new entry,
and no performance change in accessing an entry.

Anyhow, because I'm too lazy to do so, you (or someone else) can get
source code for the SpiderMonkey[1] and Rhino[2] and check out how
insertion order is preserved on Object.

[1] http://www.mozilla.org/js/spidermonkey/ - the C-based JS engine
[2] http://www.mozilla.org/rhino/ - the Java-based JS engine
 
P

Phrogz

Phrogz said:
And, for what it's worth, JavaScript's Object primitive is also an
associative array that retains insertion order for iterating keys,
while exhibiting performance characteristics of a hash.

Clarification: apparently the ECMAScript spec[1] don't actually require
that the insertion order be preserved; it's just that the major JS
engine implementations happen to do so. On the order of properties, the
spec actually says:

"Step 5: Get name of the next property [...] that doesn't have the
DontEnum attribute.
[...]
The mechanics of enumerating the properties (step 5 in the first
algorithm, step 6 in the second) is implementation dependent. The order
of enumeration is defined by the object. Properties of the object being
enumerated may be deleted during enumeration. If a property that has
not yet been visited during enumeration is deleted, then it will not be
visited. If new properties are added to the object being enumerated
during enumeration, the newly added properties are not guaranteed to be
visited in the active enumeration."

[1]
http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf
 
I

Isak

Hal said:
The fact that something can be "sorted" at all is only because it
has an order, i.e., is sequential.

In fact, I would say an "ordered hash" would be subject to being
sorted just as an array is. (We can sort an array because it has
an order -- first element, second element, and so on. We can "sort"
a regular hash, but we get an array back.)

You're right. Ordered merely means that elements have a position, I had
it confused with 'sorted'.

That seems reasonable to me.

I once proposed the name "Map" for such a class -- there was some reason
this wasn't considered good, but I can't recall why.

Yup, associated arrays are maps, not hashes.

After reading the nutter's on google groups (hate the broken
nntp<->mailing list concept), I realize that calling them hashes isn't
appropriate. Once you change their caracteristics they aren't really
hashes any more.

SortedMap (or TreeMap) and (Insertion)OrderedMap (or perhaps
LinkedHashMap; doubly linked list + hash) are probably better names..?


Isak
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,904
Latest member
HealthyVisionsCBDPrice

Latest Threads

Top