[RCR] subclasses of string as hash keys

Matthias Georgi · May 13, 2004

Current behaviour in Ruby is irritating.

example:

class A < String; end

hash = {}

hash[ A.new('x') ] = nil

p hash.keys.first.class

=> String

Strings get copied when they are inserted as keys;
this is intended, I think.

But if I choose to insert a subclassed object,
I expect to get the same class back, copied or not.

So I suggest this change in rb_hash_aset:

-    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
-        st_insert(RHASH(hash)->tbl, key, val);
-    }

+    if (RBASIC(key)->klass != rb_cString || st_lookup(RHASH(hash)->tbl, key, 0)) {
+        st_insert(RHASH(hash)->tbl, key, val);
+    }

This was discussed a long time ago in [ruby-talk:8050].

nobu.nokada · May 14, 2004

Hi,

At Fri, 14 May 2004 06:50:49 +0900,
Matthias Georgi wrote in [ruby-talk:100193]:

example:

class A < String; end

hash = {}

hash[ A.new('x').freeze ] = nil
p hash.keys.first.class
=> A
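For comparison, here is a minimal sketch that puts the three cases from this thread side by side. It reflects what a current Ruby is expected to do, not the 1.8 behaviour this RCR is about, and the helper name stored_key is made up for the example.

```ruby
# Hypothetical helper: insert +key+ into a fresh Hash and return the key
# object that the Hash actually stored.
def stored_key(key)
  hash = {}
  hash[key] = nil
  hash.keys.first
end

class A < String; end

plain  = String.new("x")          # unfrozen plain String
frozen = String.new("x").freeze   # frozen plain String
sub    = A.new("x")               # unfrozen instance of a String subclass

p stored_key(plain).frozen?          # an unfrozen plain String is stored as a frozen copy
p stored_key(plain).equal?(plain)    # ... so the stored key is not the original object
p stored_key(frozen).equal?(frozen)  # an already frozen key is stored as-is
p stored_key(sub).class              # class of the stored subclass key
```

The freeze variant avoids the copy altogether, which is why the class survives in the example above.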

Robert Klemme · May 14, 2004

Hi,

At Fri, 14 May 2004 06:50:49 +0900,
Matthias Georgi wrote in [ruby-talk:100193]:

example:

class A < String; end

hash = {}

hash[ A.new('x').freeze ] = nil
p hash.keys.first.class
=> A

Although that is a fix, I'd rather expect the key to be dup'ed. Does this
pose problems for the implementation or performance?

OTOH, you usually don't subclass String because there are so many places
that create strings that you don't control (e.g. #to_s etc.).

Regards

robert

Matthias Georgi · May 14, 2004

Hi,

At Fri, 14 May 2004 06:50:49 +0900,
Matthias Georgi wrote in [ruby-talk:100193]:

example:

class A < String; end

hash = {}

hash[ A.new('x').freeze ] = nil
p hash.keys.first.class
=> A

Oh yes, didn't know that.

Matthias Georgi · May 14, 2004

OTOH, you usually don't subclass String because there are so many places
that create strings that you don't control (e.g. #to_s etc.).

If you have some string representing something special, it's more OOP
to subclass, but the easiest way is to extend class String.
Namespaces would be a really good solution for that.

Robert Klemme · May 14, 2004

Matthias Georgi said:
If you have some string representing something special, it's more OOP
to subclass,

This is debatable - and there *have* been lengthy debates about that. For
example, Java's class String is declared 'final', i.e. it can't be
inherited from. I have only a slight tendency to not do it but I agree
that there might be situations where subclassing is the more appropriate
option. Though it seems to me that they are rather rare than common.

Note also, that with Delegator you can quite easily wrap a String and add
functionality as you see fit.
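A minimal sketch of that wrapping approach, assuming SimpleDelegator from the standard delegate library; the class name Label and the method bracketed are invented for the example.

```ruby
require 'delegate'

# Hypothetical wrapper: a String with one extra method, built on
# SimpleDelegator instead of inheritance.
class Label < SimpleDelegator
  def bracketed
    "[#{__getobj__}]"
  end
end

label = Label.new("abc")
p label.bracketed         # extra behaviour defined on the wrapper
p label.length            # forwarded to the wrapped String
p label.is_a?(String)     # the wrapper itself is not a String
p label.__getobj__.class  # the wrapped object is still a plain String
```

The wrapper answers String methods, but code that insists on a real String still needs the wrapped object (or to_s).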

but the easiest way is to extend class String.
Namespaces would be a really good solution for that.

How exactly would namespaces solve this problem?

Regards

robert

Matthias Georgi · May 14, 2004

This is debatable - and there *have* been lengthy debates about that. For
example, Java's class String is declared 'final', i.e. it can't be
inherited from. I have only a slight tendency to not do it but I agree
that there might be situations where subclassing is the more appropriate
option. Though it seems to me that they are rather rare than common.

e.g. a class which represents an object id. The object id has to be
a string, because it's stored in a database. So to get a part of the id,
you define an accessor which scans the string.
But from a practical view, I'd rather go with some module_functions,
because conversions like String#to_oid all over the code are too messy.

Note also, that with Delegator you can quite easily wrap a String and add
functionality as you see fit.

Yes, but when it comes to binary representation, I have to call to_s
everywhere, which is also not beautiful.

How exactly would namespaces solve this problem?

The term namespace is possibly wrong, but my idea is:

namespace MyNames

class String # makes a copy of String class
end

s = "xxx" # constructor of MyNames::String is called

end

Same thing for every string creation.
But that would require a lot of change in the interpreter,
every rb_str_new() must know the current namespace.

Recently I attended a lecture by the creator (Erik Ernst) of the
language gbeta, which solves this problem nicely, but
with a very different approach.

Robert Klemme · May 14, 2004

Matthias Georgi said:
e.g. a class which represents an object id. The object id has to be
a string, because it's stored in a database.

It has to be convertible to a string, but that does not mean OID *isA*
String.

So to get a part of the id,
you define an accessor, which scans the string.

... and by using String methods I can easily break the OID:

class OID < String
  RX = /^(\d+)-(\d+)$/

  def initialize(str)
    raise ArgumentError unless RX =~ str
    super
  end

  def major; RX =~ self; $1; end
  def minor; RX =~ self; $2; end
end

oid = OID.new("123456") # wrong format => error, ok!
oid = OID.new("123-456")
oid.gsub!(/.*/, 'X')
oid.major # oops!

I'd say an OID is not a string. It has a String representation, but it is
conceptually something completely different. Especially if your OIDs must
conform to a certain format as shown above. Also, you might want to
change the internal representation if that is more appropriate at some
point in time (e.g. because performance of #major and #minor is too bad).
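A minimal sketch of that alternative, where the OID has a String representation without being a String. The class name Oid, the constant FORMAT and the equality methods are assumptions made for the example, not anything from the code above.

```ruby
# Hypothetical value class: an OID that *has* a String form but is not a String.
class Oid
  FORMAT = /\A(\d+)-(\d+)\z/

  attr_reader :major, :minor

  def initialize(str)
    match = FORMAT.match(str)
    raise ArgumentError, "bad OID: #{str.inspect}" unless match
    @major = match[1].freeze
    @minor = match[2].freeze
    freeze # the parts and the object itself can no longer be modified
  end

  def to_s
    "#{major}-#{minor}"
  end

  # Value equality, so that an Oid can serve as a Hash key.
  def ==(other)
    other.is_a?(Oid) && to_s == other.to_s
  end
  alias eql? ==

  def hash
    to_s.hash
  end
end

oid = Oid.new("123-456")
p oid.major
p oid.to_s
p oid.respond_to?(:gsub!) # String mutators are simply not part of the interface
```

Because only to_s exposes the text form, the internal representation could later change without touching callers.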

But from a practical view, I'd rather go with some module_functions,
because the conversions all over the code like String#to_oid are too messy.

Yes, but when it comes to binary representation, I have to call to_s
everywhere, which is also not beautiful.

The term namespace is possibly wrong, but my idea is:

namespace MyNames

class String # makes a copy of String class
end

s = "xxx" # constructor of MyNames::String is called

end

Same thing for every string creation.
But that would require a lot of change in the interpreter,
every rb_str_new() must know the current namespace.

That's not how namespaces work (at least in C++). First, class String in
MyNames is a totally new String class that has nothing to do with any
other class with the same name unless you inherit that class. Second,
"xxx" is a literal that is bound to the standard type String (char* in
C/C++), similarly as 1.234 is bound to represent a float. You don't
change that by introducing namespaces. Third, places in code that create
strings one way or the other will always create standard strings. If the
declaration of a new class with the same name as another class in a
different namespace had the side effect that *all* places in code now use
the new class, type safety would be completely down the drain - apart from
all other sorts of problems (how should a compiler handle this in C++?).

What you'd rather want is a mechanism that replaces the binding of certain
literals to types ("xxx" => String, 1.234 => float etc.). But then, in
Ruby it's far easier to simply extend String. Still I think in most cases
it is not a good idea to use a sub class of String as an application
class, since that brings all sorts of problems with it.
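The point about literals can be checked in Ruby itself. A minimal sketch, where the module name Scoped is invented for the example: a class that merely shares the name String shadows the constant inside the module, but a literal is still an instance of the standard class.

```ruby
module Scoped
  # A class that shares the name; it shadows the constant String
  # inside this module only.
  class String < ::String; end

  def self.constant_lookup
    String       # resolves lexically to Scoped::String
  end

  def self.literal_class
    "xxx".class  # a literal is not looked up through the constant
  end
end

p Scoped.constant_lookup
p Scoped.literal_class
```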

Recently I attended a lecture by the creator (Erik Ernst) of the
language gbeta, which solves this problem nicely, but
with a very different approach.

How do they do it there?

robert

Matthias Georgi · May 14, 2004

That's not how namespaces work (at least in C++). First, class String in
MyNames is a totally new String class that has nothing to do with any
other class with the same name unless you inherit that class. Second,
"xxx" is a literal that is bound to the standard type String (char* in
C/C++), similarly as 1.234 is bound to represent a float. You don't
change that by introducing namespaces. Third, places in code that create
strings one way or the other will always create standard strings. If the
declaration of a new class with the same name as another class in a
different namespace had the side effect that *all* places in code now use
the new class, type safety would be completely down the drain - apart from
all other sorts of problems (how should a compiler handle this in C++?).

What you'd rather want is a mechanism that replaces the binding of certain
literals to types ("xxx" => String, 1.234 => float etc.). But then, in
Ruby it's far easier to simply extend String. Still I think in most cases
it is not a good idea to use a sub class of String as an application
class, since that brings all sorts of problems with it.

I don't think so.
Even if you statically type, you have

namespace A

class String
end

x = "xxx" # syntactic sugar for String.new("xxx")
# statically looking up constant String resolves to A::String
# => A::String.new("xxx")

end

In C++ the OO is broken IMHO, because "" is not an object but rather
a char* pointer, which doesn't fit with namespaces.

Extending the String class messes up the namespace:
imagine a 100k-line project where everyone extends the String
class. Either you get long method names or name clashes will be likely.

One problem is passing the A::String outside the namespace;
I think in this case the string object shouldn't respond
to A::String methods.

Besides, A::String is not simply a subclass of String.

How do they do it there?

A:
(# String:<
(# somemethod: ... #);
#)

B: A
(# String::< (# othermethod: ... #);
#)

So here A is a class and B is a subclass of A.

In B a fresh copy of String is created,
extending String with a new method.

Robert Klemme · May 14, 2004

Matthias Georgi said:
I don't think so.
Even if you statically type, you have

namespace A

class String
end

x = "xxx" # syntactic sugar for String.new("xxx")
# statically looking up constant String resolves to A::String
# => A::String.new("xxx")

Not quite: it's completely up to the language spec what "xxx" stands for.
Even if there were namespaces in Ruby it would not mean that the lookup you
describe would happen. And if asked, I'd strongly vote against it because
this easily messes up a lot of code.

end

In C++ the OO is broken IMHO, because "" is not an object but rather
a char* pointer, which doesn't fit with namespaces.

Extending the String class messes up the namespace:
imagine a 100k-line project where everyone extends the String
class. Either you get long method names or name clashes will be likely.

That's the exact reason why I recommend to not do it.

One problem is passing the A::String outside the namespace;
I think in this case the string object shouldn't respond
to A::String methods.

This will create lots of errors. Assume someone has any instance from
namespace A and invokes to_s on that instance. He expects a String but he
gets an A::String which doesn't even support the same interface. Bang!
This automated changing of string literals to something else is complete
nonsense. Plus, you'd have to explain how you implement A::String if
"xxx" is already an A::String. You would have to write a C extension to
handle memory management etc., since you don't have access to the standard
String here. And so on.

Besides, A::String is not simply a subclass of String.

A:
(# String:<
(# somemethod: ... #);
#)

B: A
(# String::< (# othermethod: ... #);
#)

So here A is a class and B is a subclass of A.

In B a fresh copy of String is created,
extending String with a new method.

That's possible in all OO languages - even with Ruby. The crucial point
is how usage of this type is regulated.

It seems to me you're stuck in some dead end street. Better sleep on this
for a night.

robert

Yukihiro Matsumoto · May 14, 2004

Hi,

In message "[RCR] subclasses of string as hash keys"

|Current behaviour in Ruby is irritating.
|
|example:
|
|class A < String; end
|
|hash = {}
|
|hash[ A.new('x') ] = nil
|
|p hash.keys.first.class
|
|=> String

It's a bug. It should be A. Thank you.

matz.

Matthias Georgi · May 14, 2004

This will create lots of errors. Assume someone has any instance from
namespace A and invokes to_s on that instance. He expects a String but
he gets an A::String which doesn't even support the same interface. Bang!

This automated changing of string literals to something else is complete
nonsense.

OK, I was wrong. Forget about the literals. That's not solving the
problem.
Let me put it in one sentence:
I just want to extend the String class with my own methods, but only if the
methods get called inside my namespace.

There is currently no solution to this, and there are a lot of people who
are extending standard classes.

I can think of a dynamic approach.

I'm looking up the method table of my object at call time:
1. inside current namespace
2. if not found, then proceed with outer namespace
3. lookup method in table, if not found => proceed with outer namespace

The question is how to define the method table (modules?) and also
how to make it efficient.
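For readers coming to this thread later: Ruby versions released well after this discussion (2.0 and up) added refinements, which scope such extensions lexically rather than through a call-time lookup. A minimal sketch under that assumption; the names OidParts, oid_parts and MyNames are invented for the example.

```ruby
# Hypothetical extension module: adds String#oid_parts, but only where activated.
module OidParts
  refine String do
    def oid_parts
      split("-")
    end
  end
end

module MyNames
  using OidParts # active from here to the end of this module body

  def self.parts_of(str)
    str.oid_parts # the refined method is visible here
  end
end

p MyNames.parts_of("123-456")

# Outside MyNames the refinement is not active:
begin
  "123-456".oid_parts
rescue NoMethodError => e
  puts "outside: #{e.class}"
end
```

The string passed in and out stays a plain String; only the code inside MyNames sees the extra method.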

Plus, you'd have to explain how you implement A::String if
"xxx" is already an A::String. You would have to write a C extension to
handle memory management etc., since you don't have access to the standard
String here. And so on.

Sorry, I didn't want an object with a different class, only one with an
extended interface.

That's possible in all OO languages - even with Ruby. The crucial point
is how usage of this type is regulated.

It was possibly not a good comparison, because gbeta is statically typed.

It seems to me you're stuck in some dead end street. Better sleep on this
for a night.

Oh yes, I will, and I'll dream about namespaces.

gabriele renzi · May 14, 2004

namespace A

class String
end

x = "xxx" # syntactic sugar for String.new("xxx")
# statically looking up constant String resolves to A::String
# => A::String.new("xxx")

end

Remove namespace and name it module.
Then make the class declaration semantics different so that it is like:

module M
class String < ::String
end
end

Now add the magic that include M should merge M::String and String.
Then post an RCR and I'll vote for it :)

Matthias Georgi · May 15, 2004

Remove namespace and name it module.
Then make the class declaration semantics different so that it is like:

module M
class String < ::String
end
end

Now add the magic that include M should merge M::String and String.
Then post an RCR and I'll vote for it :)

It's difficult to define the merging.
There should be some interceptor which catches missing methods
and looks up the extension in the caller's binding.

I assume there is Kernel#caller_binding
(something similar was posted recently as an RCR).

class String
  alias _method_missing method_missing

  def method_missing(meth, *args, &block)
    s = <<-EOF
      names = self.class.name.split('::')
      mod = Kernel
      names.map do |name|
        mod = mod.const_get(name)
        mod.const_get(:String) and mod
      end.compact
    EOF
    exts = eval(s, caller_binding) # collecting the extension modules
    exts.each {|ext| extend ext } # extending with my own modules
    if respond_to? meth
      send(meth, *args, &block) # meth was found in extension
    else
      _method_missing(meth, *args, &block) # meth is still missing
    end
  end
end

module A
  module String
    def to_yaml
      ..
    end
  end

  class X
    def initialize(s)
      @s = s.to_yaml # I need the binding from here !!!!
      p s.singleton_methods # => ["to_yaml"]
    end
  end
end

A::X.new "astring"

Every missing method for a normal String object will be intercepted,
and the object gets extended with a module in the calling environment.
That's actually what we need.
My own extensions are only used in my modules.
The code above is only a demonstration of the algorithm.
There is one problem left: I cannot pass the strings outside
my module, because other people are expecting the standard
String interface. So the whole thing should be integrated in the
language core without singleton classes, extending only for the
actual method call.
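The remaining problem can be seen with plain Object#extend, which is what the hook above ends up calling. A minimal sketch; the module name Tagging and its method are invented and stand in for A::String.

```ruby
# Hypothetical extension module, standing in for A::String above.
module Tagging
  def tagged
    "<#{self}>"
  end
end

inside = String.new("astring")
inside.extend(Tagging)         # what the method_missing hook would do

other = String.new("astring")

p inside.tagged                # works on the extended object
p other.respond_to?(:tagged)   # other strings are untouched
p inside.respond_to?(:tagged)  # this object keeps the method wherever it is passed
```

The extension sticks to the object through its singleton class, so it travels with the string once that leaves the module.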

Robert Klemme · May 17, 2004

Every missing method for a normal String object will be intercepted,
and the object gets extended with a module in the calling environment.
That's actually what we need.
My own extensions are only used in my modules.
The code above is only a demonstration of the algorithm.
There is one problem left: I cannot pass the strings outside
my module, because other people are expecting the standard
String interface. So the whole thing should be integrated in the
language core without singleton classes, extending only for the
actual method call.

If that functionality is only needed inside the module then IMHO there is
a much easier way to accomplish this: define functions in a namespace aka
define module methods:

module Foo
  def reformat(str)
    str.gsub!(/^(.*)$/, '[\\1]')
    str
  end

  class Test
    include Foo

    def initialize(s);@s=s;end

    def do_something
      puts reformat( @s ).length
    end
  end
end

module Bar
  def reformat(str)
    str.gsub!(/^(.*)$/, '<<\\1>>')
    str
  end

  class Test
    include Bar

    def initialize(s);@s=s;end

    def do_something
      puts reformat( @s ).length
    end
  end
end

4
=> nil
6
=> nil
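A variation on the same idea, as a sketch only: with module_function the helpers can be called on the module directly, and returning a new string leaves the argument unmodified. The module names SquareStyle and AngleStyle are invented for the example.

```ruby
# Hypothetical names; each module carries its own reformat, callable as a
# module function, and neither touches String or the argument.
module SquareStyle
  module_function

  def reformat(str)
    "[#{str}]"
  end
end

module AngleStyle
  module_function

  def reformat(str)
    "<<#{str}>>"
  end
end

s = "ab"
p SquareStyle.reformat(s).length
p AngleStyle.reformat(s).length
p s # the original string is unchanged
```

A class that wants the short form can still include one of the modules and call reformat without the prefix.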

Regards

robert
