Get the real object in a Hash key

  • Thread starter Iñaki Baz Castillo

Iñaki Baz Castillo

Hi, let's suppose this simple code in which I add internal attributes
to String instances and use such String objects as Hash keys:

------------------------------------------------
h = {}

k1 = "aaa"
k1.instance_variable_set :@name, "Aaa-011"

k2 = "bbb"
k2.instance_variable_set :@name, "Bbb-268"

h[k1] = "Hello"
h[k2] = "Bye"
------------------------------------------------

Now I want to lookup in the hash the element whose key matches "aaa"
(using String#eql?):

h["aaa"]
=> "Hello"

But I don't want just to get the key associated value ("Hello"), but
also the key object itself (not the "aaa" I passed but k1 object) so I
can check its @name attribute. And I need it in a very efficient way.

However I've realized right now that it's not possible. The hash key
doesn't store the given key as a reference to such object:

-------------------------------------------
puts k1.object_id
=> 18140060

puts k2.object_id
=> 16245980

h.keys.each {|k| puts k.object_id}
=> 16182220
=> 20359940
-------------------------------------------


I realized this while writing this mail, so forget the previous
question. Now I have another question:

--------------------
myobject = MyCustomClass.new

@h = {}

@h[myobject] = "lalalala"
--------------------

In this case, will Ruby's GC collect myobject? Or will it stay alive
because it is used as a key of a hash (assuming the hash itself is not
GC'd anywhere in the code)?


Thanks a lot.
--
Iñaki Baz Castillo
 

Robert Klemme

However I've realized right now that it's not possible. The hash key
doesn't store the given key as a reference to such object:

This is a special optimization for unfrozen Strings as Hash keys: the
Hash stores a frozen copy of the String rather than the object you
passed in.

Now I have another question:

--------------------
myobject = MyCustomClass.new

@h = {}

@h[myobject] = "lalalala"
--------------------

In this case, will Ruby GC delete myobject? or will it remain alive as
it has been used as a key of a hash (which is not GC'd in a supposed
code)?

The key stays alive at least as long as the Hash instance.

Cheers

robert


--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 

Iñaki Baz Castillo

2011/4/15 Robert Klemme said:
This is a special optimization for unfrozen Strings as Hash keys.

Oops, if I freeze the string before inserting it as a Hash key, this
doesn't happen (I get the same object_id) :)
The same goes if I use a class inheriting from String. Good to know!
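
The three cases described here can be checked with a short script. This is a minimal sketch assuming CRuby; Label is a hypothetical String subclass introduced only for the illustration, and "aaa".dup is used so that the first key is certainly unfrozen:

```ruby
# Hypothetical String subclass, used only for this illustration.
class Label < String; end

h = {}
plain  = "aaa".dup          # ordinary, unfrozen String
frozen = "bbb".dup.freeze   # frozen before insertion
sub    = Label.new("ccc")   # instance of a String subclass

[plain, frozen, sub].each { |k| h[k] = true }
stored_plain, stored_frozen, stored_sub = h.keys

puts stored_plain.equal?(plain)    # false: the Hash stored a copy
puts stored_plain.frozen?          # true: the copy is frozen
puts stored_frozen.equal?(frozen)  # true: frozen Strings are stored as-is
puts stored_sub.equal?(sub)        # true: subclass instances are stored as-is
```

Only the plain, unfrozen String is replaced by a copy, which is why instance variables set on it cannot be reached through the stored key.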


Then I come back to my original question:

----------------
k1 = "aaa"
k1.freeze

h = {}

h[k1] = "HELLO"
----------------

Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing each key with the string "aaa" using String#eql?)
Unfortunately I think the Hash class does not provide a method for it.

Thanks a lot.

--
Iñaki Baz Castillo
 

Robert Klemme

Iñaki Baz Castillo said:
Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing each key with the string "aaa" using String#eql?)
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key. The simplest would be to define a Struct, e.g.

Value = Struct.new :name, :val

Then put this into the Hash as values

h[k1] = Value["a name", "HELLO"]
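
The lookup side of this suggestion might look as follows. It is only a sketch that reuses the Value struct from above together with the sample names from the first post; the point is that a single ordinary lookup yields both the extra name and the original value:

```ruby
# Same Struct as suggested above: the extra name travels with the value.
Value = Struct.new(:name, :val)

h = {}
h["aaa"] = Value["Aaa-011", "Hello"]
h["bbb"] = Value["Bbb-268", "Bye"]

entry = h["aaa"]   # one ordinary hash lookup
puts entry.name    # Aaa-011
puts entry.val     # Hello
```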

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 

Iñaki Baz Castillo

2011/4/15 Robert Klemme said:
Given a string "aaa", how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string "aaa")
Unfortunatelly I think Hash class does not provide a method for it.

Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key. The simplest would be to define a Struct, e.g.

Value = Struct.new :name, :val

Then put this into the Hash as values

h[k1] = Value["a name", "HELLO"]


Yes, that seems a good solution.

Thanks.

--
Iñaki Baz Castillo
 

Kevin Mahler

Robert K. wrote in post #993000:
Exactly. And you don't want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key....

Well you may want to do it -- that's why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
nonsimple keys.

The reason there's no constant-time equivalent of Hash#assoc is
because hashing, by its very nature, cannot be reversed. There's no
method for it because one cannot possibly exist. It's not because one
should never be interested in the key object. Hash#assoc is there for
a reason.

Lispers will recognize assoc as relating to the Lisp function of the
same name which has exactly that use case: key/value pairs where the
key and the value matter as objects in their own right, apart from the
hashing function result.
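
For illustration, recovering the stored key object by scanning the entries could look like the sketch below. It uses a plain linear search with Enumerable#find (so it is O(n), not a hash lookup), and it assumes the key was frozen before insertion so that the Hash keeps the original object:

```ruby
k1 = "aaa".dup
k1.instance_variable_set(:@name, "Aaa-011")
k1.freeze                      # frozen Strings are stored as-is

h = {}
h[k1] = "Hello"

# Linear scan: compare every stored key with the lookup string.
stored_key, value = h.find { |key, _value| key.eql?("aaa") }

puts stored_key.equal?(k1)                      # true
puts stored_key.instance_variable_get(:@name)   # Aaa-011
puts value                                      # Hello
```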
 

Robert Klemme

Robert K. wrote in post #993000:

Well you may want to do it -- that's why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
nonsimple keys.

I did not argue against complex keys. The issue is with *mutable*
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 

Kevin Mahler

Robert K. wrote in post #993026:
I did not argue against complex keys. The issue is with *mutable*
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

You said "And you don't want to do it." In fact doing it has its uses.
Mutable keys or not is totally irrelevant, especially when the data
was there before the hash was introduced, as in the original example.

*Of course* making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It's his key data. If it were a simple value then the
example wouldn't make sense in the first place. Gee, thanks for
telling us that we shouldn't stuff random shit into a simple value and
then use that as a hash key, whereupon we can't look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I'm sure we would also agree that cats would be a poor building
material for helicopters.
 

Robert Klemme

Robert K. wrote in post #993026:

You said "And you don't want to do it." In fact doing it has its uses.

Please do not quote out of context: that was referring to the example
with a String instance used as a Hash key and stuffed with additional
instance variables.

Mutable keys or not is totally irrelevant, especially when the data
was there before the hash was introduced, as in the original example.

The topic of key mutability is especially relevant for keys stored in a
Hash. Of course mutations before storing are irrelevant. But if you
change fields of an object which are part of the key (i.e. included in
#hash and #eql?) you need to rehash in order for the Hash to do lookups
properly.

Basically you can have two types of fields in an object used as a Hash key:

1. key properties (used in #hash and #eql?)

2. non key properties (neither used in #hash nor #eql?)

Type 1 properties need of course be part of the key and of course you
need to know them to make any lookups.

Type 2 properties are irrelevant for lookups; you can merely consider
them being "associated with the key". This leads to a situation where
you have one instance (per key) with the associated data and potentially
many other instances which might or might not have these properties. If
they are actually defined to be properties (either through attr_accessor
or manually) you end up carrying around baggage which is not used most
of the time.

Type 2 properties should rather go into another instance which should be
stored as value. This also makes it much clearer what's going on.
Splitting up associated data into properties of key objects and an
instance stored in the Hash doesn't really make sense. Then we could as
well store everything in the key instance and don't need the Hash at all.
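
The difference between the two kinds of properties can be made concrete with a small sketch. HeaderKey is a hypothetical class written only for this illustration: name is a key property (type 1), note is a non-key property (type 2).

```ruby
# Hypothetical key class: `name` is a key property, `note` is not.
class HeaderKey
  attr_accessor :name, :note

  def initialize(name, note = nil)
    @name = name
    @note = note
  end

  # Only `name` takes part in hashing and equality.
  def hash
    name.hash
  end

  def eql?(other)
    other.is_a?(HeaderKey) && name == other.name
  end
end

key = HeaderKey.new("from", "first seen")
h = { key => "value" }

# Changing a non-key property does not disturb lookups.
key.note = "edited"
puts h[HeaderKey.new("from")]   # value

# Changing a key property leaves the entry filed under the old hash code,
# so lookups are unreliable until the Hash is rehashed.
key.name = "to"
h.rehash
puts h[HeaderKey.new("to")]     # value
```
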
*Of course* making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Why is the point vacuous? Apparently OP has / had some questions about
these topics and what may look obvious to you might not to others.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

Well, but we cannot read other people's minds. We have to take the
example at face value. Stuffing additional data into a String is not a
good idea and I am not sure whether that occurred to OP or not. So this
might really be what he is attempting. In this case "stuffing the data
into the key" was part of the example and it was nowhere expressed that
this is a fact that could not be changed.

And btw, I did not recommend to duplicate that data in the mapped-to
value. I specifically suggested to place it there exclusively.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It's his key data. If it were a simple value then the
example wouldn't make sense in the first place. Gee, thanks for
telling us that we shouldn't stuff random shit into a simple value and
then use that as a hash key, whereupon we can't look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I'm sure we would also agree that cats would be a poor building
material for helicopters.

As is rudeness for a community.

robert
 

Iñaki Baz Castillo

2011/4/15 Kevin Mahler said:
He has a key. It contains some
data. It's not necessarily true that he should duplicate that data in
the mapped-to values.

To clarify, my exact case is the following:

I've coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash, and each SIP request header
(e.g. "From: sip:[email protected]") becomes an entry of the hash
(Request object) as follows:

- The key is "FROM" (upper-cased).
- The value is an Array of strings (a SIP header can have multiple values).

I need to store the key upper-cased for fastest lookup, but I also
want to store the original header name (which could be "from", "From",
"frOM" and so on).
So my parser adds an instance variable @real_name within the header
name string ("FROM").

When I do the lookup of a header in the Request object, I would like
also to retrieve the key's @real_name, but I've already understood
that this is only possible if I freeze the key string before inserting it
in the hash and use Hash#assoc. This solution is not good for
performance.

The solution suggested by Robert is adding such information (the
header original name) as a field in the hash entry value, so instead
of having:

request["FROM"]
=> [ "sip:[email protected]" ]

I would end with something like:

request["FROM"]
=> Struct ( "From", [ "sip:[email protected]" ] )

The problem this last suggestion introduces is that it breaks the
existing API and makes it more complex for a developer to handle the
Request class (which should be as easy as handling a Hash).


Thanks to both for your comments.

--
Iñaki Baz Castillo
 

jake kaiden

hi Iñaki,

i may well not understand exactly what you need to do, and so be
oversimplifying, but could you do something similar to what Robert
suggested (but a bit simpler,) and just use an array as each key's
value? the header's original name could be added as the first element
of the array - something like this:

request = Hash.new { |hash, key| hash[key] = [] }

request["FROM"] = ["fRoM", "sip:[email protected]"]

p request["FROM"][0]

#=> "fRoM"


- j

--
Posted via http://www.ruby-forum.com/.
 

Robert Klemme

To clarify, my exact case is the following:

Now it gets interesting. :)

I've coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash,

Usually it's better to use composition instead of inheritance to achieve
this. Now your SipRequest inherits *all* methods from Hash including
some that you might not want users to be able to invoke.

and each SIP request header
(i.e. "From: sip:[email protected]") becomes an entry of the hash
(Request object) as follows:

- The key is "FROM" (capitalized).
- The value is an Array of strings (a s header can have multiple values).

I need to store the key capitalized for fastest lookup, but I also
want to store the original header name (which could be "from", "From",
"frOM" and so).

So, to sum it up: you want to have a class for SIP request which allows
(efficient) header field access through [] using header name in any case
spelling.

So my parser adds an instance variable @real_name within the header
name string ("FROM").

When I do the lookup of a header in the Request object, I would like
also to retrieve the key's @real_name, but I've already understood
that this is only possible if taint the key string before inserting it
in the hash and use Hash#assoc. This solution is not good for
performance.

The solution suggested by Robert is adding such information (the
header original name) as a field in the hash entry value, so instead
of having:

request["FROM"]
=> [ "sip:[email protected] ]

I would end with something like:

request["FROM"]
=> Struct ( "From", [ "sip:[email protected] ] )

The problem this last suggestion introduces is that it breaks the
existing API and makes more complext for a developer to handle the
Request class (which should be as easy as handling a Hash).

Here's how I'd do it. First, I would start with the interface, maybe
something like this

module SIP
  class Request
    def self.parse(io)
      # ...
    end

    # get a header field by symbol
    def [](header_name_sym)
    end

    # return the real name used
    def header_name(header_name_sym)
    end
  end
end

Then I'd think how I could make that API work properly. For example two
variants, error and default value:

module SIP
  class Request
    # original header name plus its parsed values
    HdrInfo = Struct.new :name, :values
    DUMMY = HdrInfo[nil, [].freeze].freeze
    LT = "\r\n".freeze

    def self.parse(io)
      hdr = {}

      # chomp: true strips the line terminator, so an empty line is ""
      io.each_line(LT, chomp: true) do |l|
        case l
        when /^([^:]+):\s*(.*)$/
          # too simplistic parsing!
          hdr[$1] = $2.split(/,/).each(&:strip!)
        when /^$/
          break
        else
          raise "Not a header line: %p" % l
        end
      end

      new(hdr)
    end

    def initialize(headers)
      @hdr = {}

      # assume hdr is a String and values is already parsed
      headers.each do |hdr, values|
        @hdr[normalize(hdr)] = HdrInfo[hdr, values]
      end
    end

    # get a header field by symbol (default value variant)
    def [](header_name_sym)
      @hdr.fetch(normalize(header_name_sym)) do |k|
        DUMMY
      end.values
    end

    # return the real name used (error variant)
    def header_name(header_name_sym)
      @hdr.fetch(normalize(header_name_sym)) do |k|
        raise ArgumentError,
          "Header not found %p" % header_name_sym
      end.name
    end

    private

    # map any spelling ("From", :FROM, "from") to a lower-case Symbol
    def normalize(h)
      (/[A-Z]/ =~ h ? h.downcase : h).to_sym
    end
  end
end

Of course we could build the internal hash straight away during parsing.
The main focus of the example was how to use the header once parsed.

Thanks to both for your comments.

You're welcome.

Kind regards

robert
 

Josh Cheek

You don't have to have a hash to implement a hash interface. How about
simply creating your own class that supports the interface you want, but
also the functionality you want. Something like this:



class Request
  Header = Struct.new :key, :value

  def self.parse(headers)
    request = Request.new
    headers.each_line do |header|
      key, value = header.split ": "
      request.add_header key, value.chomp
    end
    request
  end

  def initialize
    @headers = Hash.new
  end

  def add_header(key, value)
    @headers[key.upcase] = Header[key, value]
  end

  def [](key)
    @headers[key][:value]
  end

  def original(key)
    @headers[key][:key]
  end
end


headers = <<HEADER
frOM: sip:[email protected]
To: sip:[email protected]
HEADER

request = Request.parse headers

request["FROM"] # => "sip:[email protected]"
request.original "FROM" # => "frOM"

request["TO"] # => "sip:[email protected]"
request.original "TO" # => "To"
 

Iñaki Baz Castillo

2011/4/16 Robert Klemme said:
Usually it's better to use composition instead of inheritance to achieve
this. Now your SipRequest inherits *all* methods from Hash including some
that you might not want users to be able to invoke.

Thanks to both. However the SIP parser is already done. I've coded it
at C level as a Ruby extension (similar to Mongrel HTTP parser which
returns a Hash instance). I can change it to generate a Hash object
rather than a custom SipRequest object, and then do as both of you
suggest:

class SipRequest
  def initialize(headers = {})
    @headers = headers
  end
end

I will consider it and also the suggested methods to handle header
names and values.
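
One possible shape for such a wrapper, sketched here with the standard library's Forwardable module. The selection of delegated methods and the sample address are assumptions made for the illustration, not part of the actual parser:

```ruby
require "forwardable"

class SipRequest
  extend Forwardable

  # Expose only a read-only part of the Hash interface (an assumed selection).
  def_delegators :@headers, :[], :key?, :each, :size

  def initialize(headers = {})
    @headers = headers
  end
end

# Hypothetical sample data standing in for the parser's output.
request = SipRequest.new({ "FROM" => ["sip:alice@example.org"] })

p request["FROM"]                 # ["sip:alice@example.org"]
p request.key?("TO")              # false
p request.respond_to?(:delete)    # false: mutating methods are not exposed
```

With composition, users of SipRequest only see the methods that were delegated on purpose, while the C extension can keep returning a plain Hash.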

Thanks a lot.



--
Iñaki Baz Castillo
 
