My Experiences Subclassing String

Fuzzyman

I recently went through a bit of a headache trying to subclass
string. This is because a string is immutable and is created by the
mysterious __new__ method rather than by __init__.
To those who are new to subclassing the built-in types, my experiences
might prove helpful. Hopefully not too many inaccuracies :)

I've just spent ages trying to subclass string.... and I'm very proud
to say I finally managed it !

The trouble is that the string type (str) is immutable - which means
that new instances are created using the mysterious __new__ method
rather than __init__ !! :) Still following me.... ?

SO :

class newstring(str):
    def __init__(self, value, othervalue):
        str.__init__(self, value)
        self.othervalue = othervalue

astring = newstring('hello', 'othervalue')

fails miserably. This is because the __new__ method of str is
called *before* __init__, with the same arguments, and it raises a
TypeError because it doesn't know what to do with the extra argument.
What the __new__ method does is actually return the new
instance - for a string the __init__ method is just a dummy.
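
To make the failure concrete, here is a minimal sketch of the same mistake. The class name BrokenString and the variable failure are made up for this illustration, and the exact error message depends on the Python version; only the TypeError itself is what the example relies on.

```python
# Overriding only __init__ is not enough: str.__new__ still receives
# both arguments and rejects the call before __init__ ever runs.
class BrokenString(str):
    def __init__(self, value, othervalue):
        self.othervalue = othervalue

failure = None
try:
    BrokenString('hello', 'othervalue')
except TypeError as exc:
    failure = exc
    print('TypeError:', exc)
```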

The bit I couldn't get (and I didn't have access to a python manual at
the time) - if the __new__ method is responsible for returning the new
instance of the string, surely it wouldn't have a reference to self;
since the 'self' wouldn't be created until after __new__ has been
called......

Actually that's wrong - __new__ does receive a first argument, so a
simple string type might look something like this :

class newstring(str):
    def __new__(self, value):
        return str.__new__(self, value)
    def __init__(self, value):
        pass

See how the __new__ method returns the instance and the __init__ is
just a dummy.
If we want to add the extra attribute we can do this :


class newstring(str):
    def __new__(self, value, othervalue):
        return str.__new__(self, value)
    def __init__(self, value, othervalue):
        self.othervalue = othervalue

The order of creation is that the __new__ method is called, which
returns the object, *then* __init__ is called with the same arguments.
Although the __new__ method receives 'othervalue', it ignores it - and
__init__ uses it. In practice __new__ could probably do all of this -
but I prefer to mess around with __new__ as little as possible ! I was
just glad I got it working..... What it means is that I can create my
own class of objects that in most situations will behave like strings,
but have their own attributes. The only restriction is that the string
value is immutable and must be set when the object is created. See the
excellent path module by Jason Orendorff for another example of an
object that behaves like a string but also has other attributes -
although it doesn't use the __new__ method, or the __init__ method, I
think.
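
As a rough sketch of the "__new__ could probably do all of this" idea, the variant below sets the extra attribute inside __new__ and has no __init__ at all. The name TaggedString is hypothetical. It also shows one limit of "behaves like a string": the built-in string methods hand back plain str objects, so the extra attribute does not travel with their results.

```python
class TaggedString(str):
    def __new__(cls, value, othervalue):
        obj = str.__new__(cls, value)   # the string value is fixed here
        obj.othervalue = othervalue     # extra state lives in the instance dict
        return obj

s = TaggedString('hello', 'extra')
print(s + ' world')           # hello world
print(s.othervalue)           # extra
upper = s.upper()
print(type(upper) is str)     # True: a plain str, without othervalue
```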

Regards,

Fuzzy

Posted to Voidspace - Techie Blog :
http://www.voidspace.org.uk/voidspace/index.shtml
Experiences used in the python modules at :
http://www.voidspace.org.uk/atlantibots/pythonutils.html
 
Paul McGuire


Fuzzy -

I recently went down this rabbit hole while trying to optimize Literal
handling in pyparsing. You are close in your description, but there is one
basic concept that I think still needs to be sorted out for you.

Think of __new__ as a class-level factory method, not an instance method.
That first argument that you passed to your example as 'self' is not the
self instance, it is the class being new'ed. By luck, even though you
called it 'self', you passed it to str.__new__ where the class argument is
supposed to go, so everything still worked.
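
One way to check this for yourself is to record whatever arrives as the first argument of __new__. This is only an illustrative sketch; the names RecordingString and seen are invented for it.

```python
seen = []

class RecordingString(str):
    def __new__(cls, value):
        seen.append(cls)                 # remember what was passed first
        return str.__new__(cls, value)

s = RecordingString('hello')
print(seen[0] is RecordingString)    # True: the class, not an instance
print(isinstance(seen[0], type))     # True
```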

The canonical/do-nothing __new__ method looks like this:

class A(object):
    def __new__(cls, *args):
        return object.__new__(cls)

There's nothing stopping you from looking at the args tuple to see if you
want to do more than this, but in truth that's what __init__ is for.

Here's a sample of using __new__ to return a different class of object,
depending on the initialization arguments:

class SpecialA(object):
    pass

class A(object):
    def __new__(cls, *args):
        print(cls, ":", args)
        # return a different class when the first argument is 2
        if len(args) > 0 and args[0] == 2:
            return object.__new__(SpecialA)
        return object.__new__(cls)

obj = A()
print(type(obj))
obj = A(1)
print(type(obj))
obj = A(1, "test")
print(type(obj))
obj = A(2, "test")
print(type(obj))

gives the following output:

<class '__main__.A'> : ()
<class '__main__.A'>
<class '__main__.A'> : (1,)
<class '__main__.A'>
<class '__main__.A'> : (1, 'test')
<class '__main__.A'>
<class '__main__.A'> : (2, 'test')
<class '__main__.SpecialA'>
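
One consequence worth knowing when using this trick: Python only calls __init__ if the object returned by __new__ is an instance of the class being called. The sketch below adds an __init__ that logs its arguments; the calls list is an assumption added for the demonstration and is not part of the example above.

```python
calls = []

class SpecialA(object):
    pass

class A(object):
    def __new__(cls, *args):
        if args and args[0] == 2:
            return object.__new__(SpecialA)
        return object.__new__(cls)

    def __init__(self, *args):
        calls.append(args)           # log every time __init__ actually runs

a = A(1)
b = A(2, "test")
print(type(b).__name__)   # SpecialA
print(calls)              # [(1,)]: __init__ was skipped for the SpecialA object
```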


HTH,
-- Paul
 
Fuzzyman

Thanks Paul, that was helpful and interesting.
I've posted the following correction to my blog :

Ok... so this is a correction to my post a couple of days ago about
subclassing the built-in types (in python).

I *nearly* got it right. Because __new__ is the 'factory method' for
creating new instances it is actually a static method and *doesn't*
receive a reference to self as the first argument... it receives a
reference to the class as the first argument. By convention in python
this is a variable named cls rather than self (which refers to the
instance itself). What it means is that the example I gave *works*
fine, but the terminology is slightly wrong...

See the docs on the new style classes unifying types and classes. Also
thanks to Paul McGuire on comp.lang.python for helping me with this.

My example ought to read :

class newstring(str):
    def __new__(cls, value, *args, **keywargs):
        return str.__new__(cls, value)
    def __init__(self, value, othervalue):
        self.othervalue = othervalue

See how the __new__ method collects all the other arguments (using the
*args and **keywargs collectors) but ignores them - they are rightly
dealt with by __init__. You *could* examine these other arguments in
__new__ and even return an object that is an instance of a different
class depending on the parameters - see the example Paul gives...
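
A short usage sketch of this corrected class, assuming nothing beyond the definition above. Because __new__ swallows the extra arguments, othervalue can be passed positionally or by keyword; the variable names a and b are just for the demonstration.

```python
class newstring(str):
    def __new__(cls, value, *args, **keywargs):
        return str.__new__(cls, value)      # extra arguments are ignored here
    def __init__(self, value, othervalue):
        self.othervalue = othervalue        # ...and handled here

a = newstring('hello', 'positional')
b = newstring('hello', othervalue='keyword')
print(a == b)           # True: both compare equal to 'hello'
print(a.othervalue)     # positional
print(b.othervalue)     # keyword
```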

Get all that then ? :)
 
