A use case for an ordered hash

C

Chad Perrin

I'm in the habit of automagically writing...

hash.keys.sort

. . but if you want them in a *specific* order (other than that
provided by .sort), you'll have to write your own method, of course.
 
H

Hal Fulton

Phillip said:
Unfortunately for you some of us do things that involve searching
through 30,000 sales records to pull out Joe Bob's statistics.
Sometimes you just don't want to do that on the production database,
it just uses far too many resources. Using an ordered hash for one
person's benefit would kill the speed and jack up memory usage.

Your point is well taken.

However, it remains to be seen whether it would kill the speed.
Personally I doubt it would be significant, but I've seen no
numbers one way or the other.

Memory I would consider a more serious issue. Every pair in the
hash would occupy an extra N bytes (addmitedly a low value of N,
like 4). In the case of 30,000 pairs, that's an additional (say)
120K of RAM used.

As for "one person's benefit" -- it's true that the people
wanting this functionality are in the minority. But we are
far more than one person. :)
I know an ordered associative array (which I think is a better
description of the functionality) would make some things very easy,
but replacing the Hash is not the way to go.

That is definitely a better description of the finctionality.

I would even say that "associative array" alone would be a sufficient
term, since "array" implies order. (But I may be wrong there.)

I'd be happy with a separate class for that functionality *IF* there
were a convenient notation for literals. (For example, the combination
of square brackets and arrows that I mentioned previously, which Ruby
currently interprets as an array with a single element which is a hash.)

Without the literal notation, it would be to me just another kludge.


Hal
 
T

Trans

Hal said:
Your point is well taken.

However, it remains to be seen whether it would kill the speed.
Personally I doubt it would be significant, but I've seen no
numbers one way or the other.

Memory I would consider a more serious issue. Every pair in the
hash would occupy an extra N bytes (addmitedly a low value of N,
like 4). In the case of 30,000 pairs, that's an additional (say)
120K of RAM used.

As for "one person's benefit" -- it's true that the people
wanting this functionality are in the minority. But we are
far more than one person. :)


That is definitely a better description of the finctionality.

I would even say that "associative array" alone would be a sufficient
term, since "array" implies order. (But I may be wrong there.)

I'd be happy with a separate class for that functionality *IF* there
were a convenient notation for literals. (For example, the combination
of square brackets and arrows that I mentioned previously, which Ruby
currently interprets as an array with a single element which is a hash.)

Without the literal notation, it would be to me just another kludge.

+1 for Hal

Also, while not a full fledged class in it's own right, you might find
Facets' association.rb useful. Ex.

c = [ :a >> 1, :b >> 2 ]
c.each { |k,v| puts "#{k} associated with #{v} }

T.
 
P

Phillip Hutchings

I'd be happy with a separate class for that functionality *IF* there
were a convenient notation for literals. (For example, the combination
of square brackets and arrows that I mentioned previously, which Ruby
currently interprets as an array with a single element which is a hash.)

Without the literal notation, it would be to me just another kludge.

I would support this as there are uses for it.
 
H

Hal Fulton

Trans said:
Also, while not a full fledged class in it's own right, you might find
Facets' association.rb useful. Ex.

c = [ :a >> 1, :b >> 2 ]
c.each { |k,v| puts "#{k} associated with #{v} }

I've seen that, and it's kind of cool.

But you can't use an object as a key if it already defines >> for
something else (e.g. Fixnum).


Hal
 
L

Logan Capaldo

Trans said:
Also, while not a full fledged class in it's own right, you might
find
Facets' association.rb useful. Ex.
c = [ :a >> 1, :b >> 2 ]
c.each { |k,v| puts "#{k} associated with #{v} }

I've seen that, and it's kind of cool.

But you can't use an object as a key if it already defines >> for
something else (e.g. Fixnum).


Hal

This is probably too verbose, even for a stop gap solution but:

class AssocList
def initialize(first, second)
@array = [ [ first.keys.first, first.values.first ],
[ second.keys.first, second.values.first ] ]
end

def <(other)
@array << [ other.keys.first, other.values.first ]
self
end
end

class Hash
def <(other)
AssocList.new(self, other)
end
end


{:a=>1} < {:b=>2} < {:c=>3}
=> #<AssocList:0x328564 @array=[[:a, 1], [:b, 2], [:c, 3]]>

I was going for f:)a => 1 < :b => 2) but the operator precedence
isn't right :(
 
A

Austin Ziegler

I'm in the habit of automagically writing...

hash.keys.sort

or sometimes

hash.keys.sort_by{|k| hash[k]}.each{|k| value=hash[k]; ... }

In the case that Hal is talking about, and in the case where
PDF::Writer uses an ordered hash, this would be inappropriate.
Insertion-ordered associations are generally better. I will probably
be modifying PDF::Writer a bit to make it clear that it's using an
ordered association, not an ordered "hash", but it would be useful to
have an ordered association. Maybe we *could* take advantage of a
literal for this:

ordered_assoc = ( 1 => 2, 2 => 3, 'a' => 5 )

Maybe. For Ruby 2.0.

-austin
 
H

Hal Fulton

Austin said:
In the case that Hal is talking about, and in the case where
PDF::Writer uses an ordered hash, this would be inappropriate.
Insertion-ordered associations are generally better. I will probably
be modifying PDF::Writer a bit to make it clear that it's using an
ordered association, not an ordered "hash", but it would be useful to
have an ordered association. Maybe we *could* take advantage of a
literal for this:

ordered_assoc = ( 1 => 2, 2 => 3, 'a' => 5 )

Maybe. For Ruby 2.0.

Either (=>) or [=>] would be fine with me. I'd slightly prefer
the latter.

But the big question is, what would the class be called? I dislike
long names such as Association or AssociativeArray or even
OrderedHash. (I suppose the last might be best of those.)

I once suggested Map -- a nice short name -- but there was some
reason this wasn't acceptable.

I'd even suggest (gasp) letting it inherit from Hash.


Hal
 
M

Martin DeMello

Either (=>) or [=>] would be fine with me. I'd slightly prefer
the latter.

Or we could use another %?{ } constructor, which are, after all, being
reserved for future expansion.
But the big question is, what would the class be called? I dislike
long names such as Association or AssociativeArray or even
OrderedHash. (I suppose the last might be best of those.)

OHash or Assoc or even Dict (though the latter at least implies an O(1) lookup).

Also, it would probably include http://www.rcrchive.net/rcr/show/306
along the way :)

m.
 
A

Austin Ziegler

OHash or Assoc or even Dict (though the latter at least implies an O(1) lookup).

Wouldn't it be the same as Hash lookup? It's still got every other
characteristic of a Hash; it just guarantees an order that defaults to
insertion order.

-austin
 
M

Martin DeMello

Wouldn't it be the same as Hash lookup? It's still got every other
characteristic of a Hash; it just guarantees an order that defaults to
insertion order.

True, I suppose you could do it with a doubly linked list overlaid on
a hash. Was thinking of sorted order.

martin
 
G

gwtmp01

True, I suppose you could do it with a doubly linked list overlaid on
a hash. Was thinking of sorted order.

Can't you just maintain an internal array representing the key insertion
order? Something like:

class OHash < Hash
# ...
def each
keyseq.each { |k| yield k, self[k] }
end
end

Where keyseq is just an array that is updated whenever keys are
added/deleted?


Gary Wright
 
A

Austin Ziegler

Can't you just maintain an internal array representing the key insertion
order? Something like:

class OHash < Hash
# ...
def each
keyseq.each { |k| yield k, self[k] }
end
end

Where keyseq is just an array that is updated whenever keys are
added/deleted?

It's a little more complex than that, but that's pretty much what all
existing implementations do. We're talking about wanting a way of
having an ordered associative list (that defaults to insert ordering,
but can be changed to a sorted ordering, perhaps) in the Ruby core
language *with a literal constructor*, because there is value for it.

-austin
 
G

gwtmp01

We're talking about wanting a way of
having an ordered associative list (that defaults to insert ordering,
but can be changed to a sorted ordering, perhaps) in the Ruby core
language *with a literal constructor*, because there is value for it.

Yep, I understand the need for the literal constructor. It is a bit
awkward to do:

OHash.new [ [1,2], [3,4], [4,5] ]

or something similar. Random ideas:

%o{ 1 => 2, 3 => 4 }
%o{ 1,2,3,4 }
{1,2,3,4}! # the ! indicates that order is important!
!{1,2,3,4}
%!{1,2,3,4}

Gary Wright
 
B

Bil Kleb

Hal said:
There have been numerous occasions when I wanted an
ordered hash, but usually I can't remember to write
them down.

Here's my latest one[1]:

I had a string with a bunch of tagged "tolerance fields",
e.g., >>1+/-10% 'tag'<< (a mini DSL?). Based on some sort
of distribution function, e.g., uniform or Gaussian,
I generated an array of samples for each tagged field.
A regular hash was a perfect fit,

samples[tag] = [ sample1, sample2, .. ]

*until* another requirement came up: to present the tags
in order of appearance.

Later,
 
F

Francis Cianfrocca

Here's my latest one[1]:

I had a string with a bunch of tagged "tolerance fields",
e.g., >>1+/-10% 'tag'<< (a mini DSL?). Based on some sort
of distribution function, e.g., uniform or Gaussian,
I generated an array of samples for each tagged field.
A regular hash was a perfect fit,

samples[tag] = [ sample1, sample2, .. ]

*until* another requirement came up: to present the tags
in order of appearance.


I've had to deal with this (interestingly, in something that was
*intended* as a DSL). Just add another field to the structure you
store in the hash and have it increment a global sequence number when
you create it. A little clunky but later you can say:

hash.values.sort {|a,b| a.seq <=> b.seq}.each {|v|
# whatever
}
 
H

Hal Fulton

Austin said:
Wouldn't it be the same as Hash lookup? It's still got every other
characteristic of a Hash; it just guarantees an order that defaults to
insertion order.

OHash lookup should be as fast as Hash lookup.

Insertion would be slightly slower, and iteration (and to_a) would
be slowed down more.


Hal
 
H

Hal Fulton

Bil said:
Here's my latest one[1]:

I had a string with a bunch of tagged "tolerance fields",
e.g., >>1+/-10% 'tag'<< (a mini DSL?). Based on some sort
of distribution function, e.g., uniform or Gaussian,
I generated an array of samples for each tagged field.
A regular hash was a perfect fit,

samples[tag] = [ sample1, sample2, .. ]

*until* another requirement came up: to present the tags
in order of appearance.

Later,
--
Bil
http://fun3d.larc.nasa.gov

[1] Part of a CEV (Crew Exploration Vehicle) task

See, guys? An ordered hash in Ruby would help support
human space exploration! :)

/me ponders and RCR and a Dyson sphere design


Hal
 
H

Hal Fulton

Christian said:
Fun history fact:

#524<5|>lilith:~/src$ grep Dict ruby-0.49/dict.c
...
C_Dict = rb_define_class("Dict", C_Object);
rb_name_class(C_Dict, rb_intern("Hash")); /* alias */
...

Now, that's funny.

At least that seems to show that the name Dict isn't outside
the realm of possibility.

Although it adds to the bad joke potential. "I Dict around
and made a Hash of everything. Array!!"

I'm trying to remember: What is wrong with the name Map?



Hal
 
W

William James

Phillip said:
I think you mean "wouldn't care"?

Anyway: I'm happy that lookups are fast. But 99% of the time,
my hashes are small enough that the time savings is probably
very small in terms of program execution time.

Unfortunately for you some of us do things that involve searching
through 30,000 sales records to pull out Joe Bob's statistics.
Sometimes you just don't want to do that on the production database,
it just uses far too many resources. Using an ordered hash for one
person's benefit would kill the speed and jack up memory usage.

I know an ordered associative array (which I think is a better
description of the functionality) would make some things very easy,
but replacing the Hash is not the way to go.

Something like this would probably sit better, and is still shorter
than the PHP equiv.:

# a = [['key 1','1'],['key 2', '2'],['key 3',3]].toAssoc

# a['key 1']
=> 1

# p a
=> [['key 1','1'],['key 2', '2'],['key 3',3]]

# a = ['key 1','1', 'key 2', '2', 'key 3',3].to_assoc
=> [["key 1", "1"], ["key 2", "2"], ["key 3", 3]]
# a.assoc('key 2')
=> ["key 2", "2"]
# a.set 'key 2', 99
=> ["key 2", 99]
# a
=> [["key 1", "1"], ["key 2", 99], ["key 3", 3]]

class Array
def to_assoc
f=nil ; partition{f=!f}.transpose
end
def set k, v
(pair = assoc(k)) ? self[ index(pair) ] = [k,v] : push( [k,v] )
end
def rm k
(pair = assoc( k )) && slice!( index( pair ) )
end
end
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top