Proposal: Array#to_h, to simplify hash generation

Gavin Sinclair

Hi -talk,

Ruby has wonderful support for chewing and spitting arrays. For
instance, it's easy to produce an array from any Enumerable using
#map. With hashes, however, it's a bit more cumbersome.

For example, the following method is typical of my code:

  # return { filename -> size }
  def get_local_gz_files
    files = {}
    Dir["*.gz"].each do |filename|
      files[filename] = File.stat(filename).size
    end
    files
  end

The pattern is: create an empty hash, populate it, and return it. Now
Ruby is a wonderfully expressive and terse language. Accordingly, the
two lines devoted to initialising and returning the hash in the above
code seem wasted.

If Ruby had Array#to_h, then I could rewrite it as:

  # return { filename -> size }
  def get_local_gz_files
    Dir["*.gz"].map { |filename|
      [ filename, File.stat(filename).size ]
    }.to_h
  end


The proposed implementation of Array#to_h is as follows:

  class Array
    def to_h
      hash = {}
      self.each do |elt|
        raise TypeError unless elt.is_a? Array
        key, value = elt[0..1]
        hash[key] = value
      end
      hash
    end
  end


For the final justification, note that this is the logical reverse of
Hash#to_a:

  h = { :x => 5, :y => 10, :z => -1 }
  a = h.to_a       # => [[:z, -1], [:x, 5], [:y, 10]]

  # And now, for my next trick...
  a.to_h == h      # => true (gosh, that actually worked)


Thoughts?

Gavin
 
ts

G> def get_local_gz_files
G>   files = {}
G>   Dir["*.gz"].each do |filename|
G>     files[filename] = File.stat(filename).size
G>   end
G>   files
G> end

svg% cat b.rb
#!/usr/bin/ruby
def get_local_c_files
  Hash[*Dir["*.c"].map do |filename|
    [filename, File.stat(filename).size]
  end.flatten]
end
p get_local_c_files
svg%

svg% b.rb
{"st.c"=>10714, "range.c"=>10706, "enum.c"=>11250, "util.c"=>22676,
"sprintf.c"=>12332, "re.c"=>38877, "version.c"=>1094, "random.c"=>6485,
"object.c"=>34530, "class.c"=>17870, "main.c"=>988, "compar.c"=>2720,
"array.c"=>43170, "process.c"=>30792, "io.c"=>82748, "dln.c"=>39614,
"variable.c"=>35056, "time.c"=>32796, "string.c"=>69845, "regex.c"=>123352,
"numeric.c"=>36979, "inits.c"=>1765, "dmyext.c"=>20, "dir.c"=>21761,
"signal.c"=>13318, "pack.c"=>39965, "math.c"=>6199, "hash.c"=>39087,
"error.c"=>25114, "parse.c"=>348857, "ruby.c"=>22725, "marshal.c"=>27620,
"lex.c"=>4480, "bignum.c"=>34051, "struct.c"=>15141, "prec.c"=>1677,
"gc.c"=>34935, "file.c"=>58392, "eval.c"=>219839}
svg%


Guy Decoux
 
Brian Candler

> If Ruby had Array#to_h, then I could rewrite it as:

It does, almost:

irb(main):001:0> a = ["cat","one","dog","two"]
=> ["cat", "one", "dog", "two"]
irb(main):002:0> Hash[*a]
=> {"cat"=>"one", "dog"=>"two"}

I don't remember seeing an exact inverse of Hash#to_a though, i.e. one which
converts [[a,b],[c,d]] to {a=>b, c=>d}

You can always 'flatten' your array, as long as the elements of the hash
you're creating aren't themselves arrays.

Regards,

Brian.
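
The flatten caveat above can be seen in a small sketch. The sample data is made up for illustration, and flatten(1), which flattens only one level, is assumed to be available (it is in later Ruby versions).

```ruby
# Hypothetical assoc array whose values are themselves arrays.
pairs = [["evens", [2, 4]], ["odds", [1, 3]]]

# A full flatten also unpacks the nested values, so keys and
# values no longer line up.
broken = Hash[*pairs.flatten]

# Flattening a single level keeps each value array intact.
intact = Hash[*pairs.flatten(1)]

p broken
p intact
```

With an odd number of flattened elements, Hash[] raises an error instead of silently building the wrong hash.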
 
Yukihiro Matsumoto

Hi,

In message "Proposal: Array#to_h, to simplify hash generation"

|If Ruby had Array#to_h, then I could rewrite it as:
|
|  # return { filename -> size }
|  def get_local_gz_files
|    Dir["*.gz"].map { |filename|
|      [ filename, File.stat(filename).size ]
|    }.to_h
|  end

It has been proposed several times. The issues are:

* whether the name "to_h" is a good name or not. Somebody came up
  with the name "hashify". I'm not excited by either name.

* what if the original array is not an assoc array (an array of
  two-element arrays)? Raise an error? Ignore it?

matz.
 
Gavin Sinclair

> In message "Proposal: Array#to_h, to simplify hash generation"
> on 03/07/19, Gavin Sinclair writes:
> |If Ruby had Array#to_h, then I could rewrite it as:
> |
> |  # return { filename -> size }
> |  def get_local_gz_files
> |    Dir["*.gz"].map { |filename|
> |      [ filename, File.stat(filename).size ]
> |    }.to_h
> |  end
>
> It has been proposed several times.

I thought it sounded familiar, but didn't see an RCR.

> The issues are:
> * whether the name "to_h" is a good name or not. Somebody came up
>   with the name "hashify". I'm not excited by either name.

#to_h sounds good to me - we already have to_s, to_a, to_i, etc. It's
just too sweet that Hash#to_a and Array#to_h should be the inverse of
each other.

What don't you like about #to_h?

#to_hash is fine by me too, but I don't really know the nuances of
to_s/to_str, to_a/to_ary, ...

> * what if the original array is not an assoc array (an array of
>   two-element arrays)? Raise an error? Ignore it?

Raise error. #to_h is clearly a method to be used with care. People
are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
could be the equivalent to Hash[1,2,3,4]. But then there's the corner
case: [ [1,2], "x", [7,8], "g" ].to_h.

I think I would insist on the input being an assoc array.

Gavin
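
As a rough sketch of the "insist on an assoc array" position, the helper below rejects the corner case mentioned above. The name strict_pairs_to_hash is hypothetical, it is written as a plain method rather than as Array#to_h, and raising TypeError is only one of the possible choices.

```ruby
# Hypothetical helper (not part of Ruby): builds a hash from an assoc
# array and raises if any element is not a two-element array.
def strict_pairs_to_hash(pairs)
  hash = {}
  pairs.each do |elt|
    unless elt.is_a?(Array) && elt.size == 2
      raise TypeError, "expected a [key, value] pair, got #{elt.inspect}"
    end
    key, value = elt
    hash[key] = value
  end
  hash
end

p strict_pairs_to_hash([[1, 2], [7, 8]])

begin
  strict_pairs_to_hash([[1, 2], "x", [7, 8], "g"])
rescue TypeError => e
  puts "rejected: #{e.message}"
end
```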
 
Yukihiro Matsumoto

Hi,

In message "Re: Proposal: Array#to_h, to simplify hash generation"

|I thought it sounded familiar, but didn't see an RCR.

I don't remember the RCR number. Search for "hashify".

|What don't you like about #to_h?

I just didn't feel we had consensus. Besides, the "to_h" you've proposed
works only for arrays with a specific structure (assoc-like).

|#to_hash is fine by me too, but I don't really know the nuances of
|to_s/to_str, to_a/to_ary, ...

Longer versions are for implicit conversion. An object that has
"to_str" works like a string if it's given as an argument.

Note we have "to_hash" already. But this would not be the reason for
"to_h". We have "to_io" without the shorter version, for example.

|> * what if the original array is not an assoc array (array of arrays
|> of two elements). raise error? ignore?
|
|Raise error. #to_h is clearly a method to be used with care. People
|are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
|could be the equivalent to Hash[1,2,3,4]. But then there's the corner
|case: [ [1,2], "x", [7,8], "g" ].to_h.
|
|I think I would insist on the input being an assoc array.

TypeError? or ArgumentError?

I just remembered that I thought Hash[ary] might be the better
solution. I'm not sure why I didn't implement it. My memory is very
loose.
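
For illustration, here is what the Hash[ary] form looks like. Later Ruby versions accept a single assoc array here; the file names and sizes below are made up.

```ruby
# Hypothetical sample data: [filename, size] pairs.
pairs = [["a.gz", 120], ["b.gz", 3400]]

# Hash[] given one array of pairs builds the hash directly,
# with no flatten and no splat.
files = Hash[pairs]
p files

# Round trip: converting back with to_a and rebuilding gives an equal hash.
p Hash[files.to_a] == files
```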

matz.
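
The to_s/to_str distinction described above can be sketched as follows. The Label class is hypothetical and exists only for this example.

```ruby
# Hypothetical class used only for illustration.
class Label
  def initialize(text)
    @text = text
  end

  # Explicit conversion: used when the caller asks for a string.
  def to_s
    @text
  end

  # Implicit conversion: lets methods that require a String
  # accept a Label in its place.
  def to_str
    @text
  end
end

# String#+ requires a string argument and converts via to_str.
greeting = "Hello, " + Label.new("world")
puts greeting
```

An object that defines only to_s is not accepted by String#+ in the same way.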
 
Gavin Sinclair

> In message "Re: Proposal: Array#to_h, to simplify hash generation"
> on 03/07/20, Gavin Sinclair writes:
> |I thought it sounded familiar, but didn't see an RCR.
>
> I don't remember the RCR number. Search for "hashify".

It's #12. Interesting: I like the #hashify idea better than my proposal.

My original code could be written

  # return { filename -> size }
  def get_local_gz_files
    Dir["*.gz"].to_hash { |filename| File.stat(filename).size }
  end

That does away with the intermediate assoc array, and is overall very
elegant. Best of all, it can be used with any Enumerable type, and it
doesn't have any requirement on the structure of the receiver.

  module Enumerable
    def to_hash
      result = {}
      each do |elt|
        result[elt] = yield(elt)
      end
      result
    end
  end


That is capturing the very idiom I have repeated so many times.

Alternatives to #to_hash are:
  hashify   (the original and the worst :)
  map_hash
  hash_map  (it is, after all, mapping a collection into a hash)

I think I like "map_hash" the best.

  ["cat", "dog", "mouse"].map { |s| s.length }
    # -> [3, 3, 5]

  ["cat", "dog", "mouse"].map_hash { |s| s.length }
    # -> {"cat"=>3, "mouse"=>5, "dog"=>3}

Gavin
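
To try the map_hash idea without reopening Enumerable, a standalone sketch could look like the following. The free-standing map_hash method taking the collection as an argument is an assumption made for this example, not the proposed API.

```ruby
# Hypothetical standalone version of the proposed map_hash; it takes
# any Enumerable as an argument instead of reopening the module.
def map_hash(enum)
  result = {}
  enum.each { |elt| result[elt] = yield(elt) }
  result
end

lengths = map_hash(["cat", "dog", "mouse"]) { |s| s.length }
p lengths

# Works for non-Array Enumerables too, such as a Range.
squares = map_hash(1..3) { |n| n * n }
p squares
```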
 
Kurt M. Dresner

> I just didn't feel we had consensus. Besides, the "to_h" you've proposed
> works only for arrays with a specific structure (assoc-like).

Far be it from me to say anything of much value, but I definitely think
that an instance function of Class Array should have a defined behavior
for all Arrays. Is there any argument to the contrary?

-Kurt
 
Kurt M. Dresner

> The main problem here is that Array#to_s calls join with the default
> field separator, which for some reason is "". To me, this isn't
> intuitive. Is there some historical reason why this behavior exists?
> Even less intuitive to me is Hash#to_s, because the way the conversion
> is done you lose any concept it was a hash.

It's intuitive because it's the opposite of taking a string and putting
each character as an element of an array.

"foobar" -> ['f','o','o','b','a','r'] -> "foobar"

If you want a different .to_s you can just join with something else.
It's pretty easy to just do foobararray.join(',') if you want
"f,o,o,b,a,r", and additionally it's a little easier to read.

-Kurt
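
The round trip described above, written out as a short sketch using only split and join:

```ruby
chars = "foobar".split("")   # one element per character
p chars
p chars.join                 # no separator given
p chars.join(",")            # explicit separator
```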
 
Martin DeMello

Gavin Sinclair said:
> Raise error. #to_h is clearly a method to be used with care. People
> are unlikely to call it on random objects. Of course, [1,2,3,4].to_h
> could be the equivalent to Hash[1,2,3,4]. But then there's the corner
> case: [ [1,2], "x", [7,8], "g" ].to_h.
>
> I think I would insist on the input being an assoc array.

And we already have Array methods that assume an associative array.

m.
 
Tim Hunter

> Far be it from me to say anything of much value, but I definitely think
> that an instance function of Class Array should have a defined behavior
> for all Arrays. Is there any argument to the contrary?
>
> -Kurt

pack, assoc, and rassoc
 
Yukihiro Matsumoto

Hi,

In message "Re: Proposal: Array#to_h, to simplify hash generation"

|> I think I would insist on the input being an assoc array.
|
|And we already have Array methods that assume an associative array.

I think you mean assoc and rassoc. But they are look-up methods. No
harm would come from non-assoc input to them. I feel like Hash
creation is a little bit different.

matz.
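
A small sketch of why assoc and rassoc are harmless on mixed input: elements that are not arrays are simply skipped during the look-up. The sample array reuses the corner case from earlier in the thread.

```ruby
mixed = [[1, 2], "x", [7, 8], "g"]

p mixed.assoc(7)     # finds the pair whose first element is 7
p mixed.rassoc(2)    # finds the pair whose second element is 2
p mixed.assoc("x")   # a bare "x" is not a pair, so nothing matches
```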
 
Martin DeMello

Yukihiro Matsumoto said:
> In message "Re: Proposal: Array#to_h, to simplify hash generation"
> |
> |And we already have Array methods that assume an associative array.
>
> I think you mean assoc and rassoc. But they are look-up methods. No
> harm would come from non-assoc input to them. I feel like Hash
> creation is a little bit different.

Actually, I've always felt those were out of place in Array too. And if
they were factored out into an AssocArray mixin, we could conveniently
put hashify there.

martin
 
Jason Creighton

> Hi -talk,
>
> Ruby has wonderful support for chewing and spitting arrays. For
> instance, it's easy to produce an array from any Enumerable using
> #map. With hashes, however, it's a bit more cumbersome.
>
> For example, the following method is typical of my code:
>
>   # return { filename -> size }
>   def get_local_gz_files
>     files = {}
>     Dir["*.gz"].each do |filename|
>       files[filename] = File.stat(filename).size
>     end
>     files
>   end

One option, in this case, is to hijack the block form of Hash.new:

files = Hash.new { |hash, key| hash[key] = File.stat(key).size }

files is now a "magic" hash that will stat any file that's used as a key.
If you're not just doing random access, you could fill it like so:

Dir["*.gz"].each { |f| files[f] }

The block is called to supply a value, instead of nil, when a key is missing.
We assign to the hash so that the value is saved as well. You can do all
*sorts* of weird stuff using this feature:

  e = Hash.new { |h, k| eval(k) }
  => {}
  e["Time.now"]
  => Tue Jul 22 13:32:56 MDT 2003
  e["Time.now"]
  => Tue Jul 22 13:33:03 MDT 2003
  e = Hash.new { |h, k| h[k] = eval(k) }
  => {}
  e["Time.now"]
  => Tue Jul 22 13:35:09 MDT 2003
  e["Time.now"]
  => Tue Jul 22 13:35:09 MDT 2003

Jason Creighton
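
The difference between the two block forms (returning a value versus also storing it) can be checked with a counter. In this sketch the squaring is only a stand-in for an expensive look-up such as File.stat, and the variable names are made up.

```ruby
# Count how often the default block runs.
calls = 0
squares = Hash.new do |hash, key|
  calls += 1
  hash[key] = key * key   # store the result so the block runs once per key
end

squares[4]   # key is missing: the block runs and saves 16
squares[4]   # key is now present: the block is not called again

p squares
p calls
```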
 
