trouble subclassing str


Brent

I'd like to subclass the built-in str type. For example:

--

class MyString(str):

    def __init__(self, txt, data):
        super(MyString,self).__init__(txt)
        self.data = data

if __name__ == '__main__':

    s1 = MyString("some text", 100)

--

but I get the error:

Traceback (most recent call last):
  File "MyString.py", line 27, in ?
    s1 = MyString("some text", 12)
TypeError: str() takes at most 1 argument (2 given)

I am using Python 2.3 on OS X. Ideas?
 

Paul McGuire

My first thought is "make sure that subclassing str is really what you
want to do." Here is a place where I have a subclass of str that
really is a special kind of str:

class PaddedStr(str):
    def __new__(cls,s,l,padc=' '):
        if l > len(s):
            s2 = "%s%s" % (s,padc*(l-len(s)))
            return str.__new__(str,s2)
        else:
            return str.__new__(str,s)

print ">%s<" % PaddedStr("aaa",10)
print ">%s<" % PaddedStr("aaa",8,".")


(When subclassing str, you have to call str.__new__ from your
subclass's __new__ method, since strs are immutable. Don't forget
that __new__ requires a first parameter, which is the class being
instantiated. I think the rest of my example is pretty self-explanatory.)

But if you are subclassing str just so that you can easily print your
objects, look at implementing the __str__ instance method on your
class. Reserve inheritance for true "is-a" relationships. Often,
inheritance is misapplied when the designer really means "has-a" or
"is-implemented-using-a", and in these cases, the supposed superclass
is better referenced using a member variable, and delegating to it.

-- Paul
 

Steven D'Aprano

Paul McGuire said:
But if you are subclassing str just so that you can easily print your
objects, look at implementing the __str__ instance method on your
class. Reserve inheritance for true "is-a" relationships. Often,
inheritance is misapplied when the designer really means "has-a" or
"is-implemented-using-a", and in these cases, the supposed superclass
is better referenced using a member variable, and delegating to it.

Since we've just been talking about buzzwords in another thread, and the
difficulty self-taught folks have in knowing what they are, I don't
suppose somebody would like to give a simple, practical example of what
Paul means?

I'm going to take a punt here and guess. Instead of creating a sub-class
of str, Paul suggests you simply create a class:

class MyClass:
    def __init__(self, value):
        # value is expected to be a string
        self.value = self.mangle(value)
    def mangle(self, s):
        # do work on s to make sure it looks the way you want it to look
        return "*** " + s + " ***"
    def __str__(self):
        return self.value

(only with error checking etc. for production code).

Then you use it like this:

py> myprintablestr = MyClass("Lovely Spam!")
py> print myprintablestr
*** Lovely Spam! ***

Am I close?
 

Donn Cave

Steven D'Aprano said:
Since we've just been talking about buzzwords in another thread, and the
difficulty self-taught folks have in knowing what they are, I don't
suppose somebody would like to give a simple, practical example of what
Paul means?

I'm going to take a punt here and guess. Instead of creating a sub-class
of str, Paul suggests you simply create a class:

class MyClass:
    def __init__(self, value):
        # value is expected to be a string
        self.value = self.mangle(value)
    def mangle(self, s):
        # do work on s to make sure it looks the way you want it to look
        return "*** " + s + " ***"
    def __str__(self):
        return self.value

(only with error checking etc. for production code).

Then you use it like this:

py> myprintablestr = MyClass("Lovely Spam!")
py> print myprintablestr
*** Lovely Spam! ***

Am I close?

That's how I read it, with "value" as the member variable
that you delegate to.
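A minimal sketch of fuller delegation, for anyone who wants to see it in code (this is not from the thread, and the class name StarredText is made up). The wrapper keeps the string in a member variable and forwards any attribute lookup it can't satisfy itself to that string through __getattr__. It uses print() calls so it runs on current Python:

```python
class StarredText(object):
    """Has-a str: keeps the text in self.value and delegates to it."""

    def __init__(self, value):
        # value is expected to be a string
        self.value = "*** " + value + " ***"

    def __str__(self):
        return self.value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so str methods
        # such as upper() or startswith() are forwarded to the wrapped string.
        return getattr(self.value, name)


s = StarredText("Lovely Spam!")
print(str(s))               # *** Lovely Spam! ***
print(s.upper())            # *** LOVELY SPAM! ***
print(s.startswith("***"))  # True
```

One cost of delegating: special methods such as __len__ or __add__ are looked up on the class rather than through __getattr__, so len(s) or s + "x" would each need an explicit method.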

Left unexplained is ``true "is-a" relationships''. Sounds
like an implicit contradiction -- you can't implement
something that truly is something else. Without that, and
maybe a more nuanced replacement for "is-implemented-using-a",
I don't see how you could really be sure of the point.

Donn Cave
 

John Machin

Brent said:
I'd like to subclass the built-in str type. For example:

You'd like to build this weird-looking semi-mutable object as a
perceived solution to what problem? Perhaps an alternative is a class of
objects which have a "key" (your current string value) and some data
attributes? Maybe simply a dict ... adict["some text"] = 100?

class MyString(str):

    def __init__(self, txt, data):
        super(MyString,self).__init__(txt)
        self.data = data

if __name__ == '__main__':

    s1 = MyString("some text", 100)


but I get the error:

Traceback (most recent call last):
  File "MyString.py", line 27, in ?
    s1 = MyString("some text", 12)
TypeError: str() takes at most 1 argument (2 given)

I am using Python 2.3 on OS X. Ideas?

__init__ is not what you want.

If you had done some basic debugging before posting (like putting a
print statement in your __init__), you would have found out that it is
not even being called.

Suggestions:

1. Read the manual section on __new__
2. Read & run the following:

class MyString(str):

    def __new__(cls, txt, data):
        print "MyString.__new__:"
        print "cls is", repr(cls)
        theboss = super(MyString, cls)
        print "theboss:", repr(theboss)
        new_instance = theboss.__new__(cls, txt)
        print "new_instance:", repr(new_instance)
        new_instance.data = data
        return new_instance

if __name__ == '__main__':

    s1 = MyString("some text", 100)
    print "s1:", type(s1), repr(s1)
    print "s1.data:", s1.data

3. Note, *if* you provide an __init__ method, it will be called
[seemingly redundantly???] after __new__ has returned.
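A small sketch, building on John's example, that makes point 3 visible: with both methods defined, __new__ builds the immutable string value and __init__ is then called with the same arguments. The init_args attribute is a hypothetical name that exists only to record that call:

```python
class MyString(str):
    def __new__(cls, txt, data):
        # str is immutable, so the text must be fixed here, in __new__.
        self = super(MyString, cls).__new__(cls, txt)
        self.data = data
        return self

    def __init__(self, txt, data):
        # Called afterwards with the same arguments. The string value is
        # already set, so this only records that the call happened.
        self.init_args = (txt, data)


s1 = MyString("some text", 100)
print(repr(s1))       # 'some text'
print(s1.data)        # 100
print(s1.init_args)   # ('some text', 100)
```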

HTH,
John
 

Paul McGuire

Dang, that class should be:

class PaddedStr(str):
    def __new__(cls,s,l,padc=' '):
        if l > len(s):
            s2 = "%s%s" % (s,padc*(l-len(s)))
            return str.__new__(cls,s2)
        else:
            return str.__new__(cls,s)

-- Paul
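To see what that correction changes, this sketch runs both versions side by side. The padding logic follows Paul's code, condensed slightly, and BuggyPaddedStr is just a label for the first version. Both produce the same padded text, which is why the slip is easy to miss; only the type of the result differs:

```python
class BuggyPaddedStr(str):
    """First version: passes str, not cls, to str.__new__."""
    def __new__(cls, s, l, padc=' '):
        if l > len(s):
            s = s + padc * (l - len(s))
        return str.__new__(str, s)   # builds a plain str, not a subclass instance


class PaddedStr(str):
    """Corrected version: passes cls through to str.__new__."""
    def __new__(cls, s, l, padc=' '):
        if l > len(s):
            s = s + padc * (l - len(s))
        return str.__new__(cls, s)   # result is an instance of PaddedStr


print(type(BuggyPaddedStr("aaa", 5)).__name__)   # str
print(type(PaddedStr("aaa", 5)).__name__)        # PaddedStr
```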
 

Kent Johnson

Donn said:
Left unexplained is ``true "is-a" relationships''. Sounds
like an implicit contradiction -- you can't implement
something that truly is something else. Without that, and
maybe a more nuanced replacement for "is-implemented-using-a",
I don't see how you could really be sure of the point.

Try this article for an explanation of is-a:
http://www.objectmentor.com/resources/articles/lsp.pdf

IMO Robert Martin explains good OO design better than anyone else. His book "Agile Software Development" is excellent.

Kent
 

Paul McGuire

In purely Python terms, there is a distinction that one of these
classes (PaddedStr) is immutable, while the other is not. Python only
permits hashable objects to act as dictionary keys (among the built-in
types, that means the immutable ones), so this would be one thing to
differentiate these two approaches.
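A short sketch of the practical difference (not from the thread; MyClass is Steven's class trimmed down and made new-style). A str subclass hashes and compares by its text, so an equal plain string finds it in a dict. An ordinary class instance can still be a key, but it hashes by identity, so its text is no help in looking it up:

```python
class PaddedStr(str):
    def __new__(cls, s, l, padc=' '):
        if l > len(s):
            s = s + padc * (l - len(s))
        return str.__new__(cls, s)


class MyClass(object):
    def __init__(self, value):
        self.value = "*** " + value + " ***"
    def __str__(self):
        return self.value


d = {PaddedStr("ab", 4, "."): "padded", MyClass("ab"): "wrapped"}
print(d["ab.."])            # padded: found via an equal plain string
print("*** ab ***" in d)    # False: the MyClass key hashes by identity
```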

But on a more abstract, implementation-independent level, this is a
distinction of inheritance vs. composition and delegation. Inheritance
was one of the darling concepts in the early days of O-O programming,
with promises of reusability and development speed. But before long,
it turned out that inheritance comes with some unfriendly baggage -
dependencies between subclasses and superclasses made refactoring more
difficult, and modifications to supertypes had unwanted effects on
subclasses. Sometimes subclasses would use some backdoor knowledge of
the supertype data, thereby limiting flexibility in the superclass -
this phenomenon is often cited as "inheritance breaks encapsulation."

One check for good inheritance design is the Liskov Substitution
Principle (LSP) (thanks for the Robert Martin link, Kent - you beat me
to it). Borrowing from Wikipedia:
"In general, the principle mandates that at all times objects from a
class can be swapped with objects from an inheriting class, without the
user noticing any other new behaviour. It has effects on the paradigms
of design by contract, especially regarding to specification:
- postconditions for methods in the subclass should be more strict than
those in the superclass
- preconditions for methods in the subclass should be less strict than
those in the superclass
- no new exceptions should be introduced in the subclass"
(http://en.wikipedia.org/wiki/Liskov_substitution_principle)

One thing I like about this concept is that it is fairly independent of
language or implementation features. I get the feeling that many such
rules/guidelines seem to be inspired by limitations or gimmicks that
are found in programming language X (usually C++ or Java), and then
mistakenly generalized to be universal O-O truths.

Looking back to PaddedStr vs. MyString, you can see that PaddedStr will
substitute for str, and for that matter, the MyString behavior that is
given could be a reasonable subclass of str, although maybe better
named StarredStr. But let's take a slightly different MyString, one
like this, where we subclass str to represent a person's name:

class Person(str):
    def __new__(cls,s,data):
        self = str.__new__(cls,s)
        self.age = data
        return self

p = Person("Bob",10)
print p,p.age

This is handy enough for printing out a Person and getting their name.
But consider a father and son, both named "Bob".

p1 = Person("Bob",10)
p2 = Person("Bob",35) # p1's dad, also named Bob
print p1 == p2 # prints True - should it?
print p1 is p2 # prints False
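For contrast, here is a composition-based sketch (my own, not Paul's) that keeps the name as an attribute. Printing still works through __str__, but == now falls back to identity, so the two Bobs are different people unless you deliberately compare their names:

```python
class Person(object):
    """Has-a name, rather than is-a str."""
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __str__(self):
        return self.name


p1 = Person("Bob", 10)
p2 = Person("Bob", 35)   # p1's dad, also named Bob
print(str(p1) + " " + str(p1.age))   # Bob 10
print(p1 == p2)                      # False: default equality is identity
print(p1.name == p2.name)            # True: compare names when that is what you mean
```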


Most often, I see "is-a" confused with "is-implemented-using-a". A
developer decides that there is some benefit (reduced storage, perhaps)
of modeling a zip code using an integer, and feels the need to define
some class like:

class ZipCode(int):
    def lookupState(self):
        ...

But zip codes *aren't* integers, they just happen to be numeric - there
is no sense in supporting zip code arithmetic, nor in using zip codes
as slice indices, etc. And there are other warts, such as printing zip
codes with leading zeroes (like they have in Maine).
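One possible sketch of the "is-implemented-using-a" alternative, assuming plain five-digit codes (ZIP+4 is ignored) and leaving lookupState out, since a real version would need postal data. Storing the digits as text keeps the leading zero, and because the class does not inherit from int there is no accidental zip code arithmetic:

```python
class ZipCode(object):
    """Implemented using a str: the five digits are stored as text."""
    def __init__(self, code):
        code = str(code)
        if len(code) != 5 or not code.isdigit():
            raise ValueError("expected a 5-digit zip code: %r" % (code,))
        self._code = code

    def __str__(self):
        return self._code

    # Value-based equality and hashing, so zip codes work as dict keys
    def __eq__(self, other):
        return isinstance(other, ZipCode) and self._code == other._code

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self._code)


print(str(ZipCode("04101")))   # 04101: the leading zero survives
print(int("04101"))            # 4101: an int loses it
```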

So when, about once a month, we see on c.l.py "I'm having trouble
sub-classing <built-in class XYZ>," I can't help but wonder if telling
the poster how to sub-class an XYZ is really doing the right thing.

In this thread, the OP wanted to extend str with something that was
constructible with two arguments, a string and an integer, as in s1 =
MyString("some text", 100). I tried to propose a case that would be a
good example of inheritance, where the integer would be used to define
and/or constrain some str attribute. A *bad* example of inheritance
would have been one where the 100 had some independent characteristic,
like a font size, or an age value to be associated with a string that
happens to contain a person's name. In fact, looking at the proposed
MyClass, this seems to be the direction he was headed.

When *should* you use inheritance? Well, for a while, there was such
backlash that the response was "Never". Personally, I use inheritance
in cases where I have adopted a design pattern that incorporates it,
such as Strategy; otherwise, I tend not to use it. (For those of you
who use my pyparsing package, it is loaded with the Strategy pattern.
The base class ParserElement defines an abstract do-nothing parsing
implementation, which is overridden in subclasses such as Literal,
Word, and Group. All derived instances are treated like the base
ParserElement, with each subclass providing its own specialized
parseImpl or postParse behavior, so any subclass can be substituted for
the base ParserElement, satisfying LSP.)
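The general shape described above, a base class with a do-nothing parseImpl that subclasses override, can be sketched like this. The class names and the "(new_loc, token) or None" convention are invented for illustration; they are not pyparsing's actual classes or signatures. Failure is signalled through the base class's own return convention rather than a new exception, in keeping with the LSP rules quoted earlier:

```python
class ParserElementSketch(object):
    """Base contract: parseImpl(text, loc) returns (new_loc, token) or None."""
    def parse(self, text, loc):
        return self.parseImpl(text, loc)

    def parseImpl(self, text, loc):
        # Abstract do-nothing behavior: match the empty string
        return loc, ""


class LiteralSketch(ParserElementSketch):
    def __init__(self, match):
        self.match = match

    def parseImpl(self, text, loc):
        if text.startswith(self.match, loc):
            return loc + len(self.match), self.match
        return None


class DigitsSketch(ParserElementSketch):
    def parseImpl(self, text, loc):
        end = loc
        while end < len(text) and text[end].isdigit():
            end += 1
        if end == loc:
            return None
        return end, text[loc:end]


def parse_sequence(elements, text):
    """Only uses the base-class interface, so any subclass can be passed in."""
    loc, tokens = 0, []
    for element in elements:
        result = element.parse(text, loc)
        if result is None:
            return None
        loc, token = result
        tokens.append(token)
    return tokens


print(parse_sequence([LiteralSketch("x="), DigitsSketch()], "x=42"))  # ['x=', '42']
print(parse_sequence([LiteralSketch("y="), DigitsSketch()], "x=42"))  # None
```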

I think the current conventional wisdom is "prefer composition over
inheritance" - never say "never"! :)

-- Paul
 

Donn Cave

Paul McGuire said:
Most often, I see "is-a" confused with "is-implemented-using-a". A
developer decides that there is some benefit (reduced storage, perhaps)
of modeling a zip code using an integer, and feels the need to define
some class like:

class ZipCode(int):
    def lookupState(self):
        ...

But zip codes *aren't* integers, they just happen to be numeric - there
is no sense in supporting zip code arithmetic, nor in using zip codes
as slice indices, etc. And there are other warts, such as printing zip
codes with leading zeroes (like they have in Maine).

I agree, but I'm not sure how easily this kind of reasoning
can be applied more generally to objects we write. Take for
example an indexed data structure, that's generally similar
to a dictionary but may compute some values. I think it's
common practice in Python to implement this just as I'm sure
you would propose, with composition. But is that because it
fails your "is-a" test? What is-a dictionary, or is-not-a
dictionary? If you ask me, there isn't any obvious principle,
it's just a question of how we arrive at a sound implementation --
and that almost always militates against inheritance, because
of liabilities you mentioned elsewhere in your post, but in the
end it depends on the details of the implementation.
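For concreteness, here is a minimal sketch of the kind of structure Donn describes, done by composition: a dict-like wrapper that computes missing values on demand. The class name and the compute callable are hypothetical; only the parts of the dict interface that are needed are exposed:

```python
class ComputedMapping(object):
    """Has-a dict; computes and caches values for missing keys.

    compute is a caller-supplied function mapping key -> value.
    """
    def __init__(self, compute):
        self._data = {}
        self._compute = compute

    def __getitem__(self, key):
        if key not in self._data:
            self._data[key] = self._compute(key)
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

    def keys(self):
        return list(self._data.keys())


squares = ComputedMapping(lambda n: n * n)
print(squares[7])     # 49, computed and cached
print(7 in squares)   # True
print(len(squares))   # 1
```

The inheritance route also exists: since Python 2.5, a dict subclass can define __missing__ to supply values for absent keys. Which one is sounder depends, as Donn says, on the details.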

Donn Cave
 

Paul McGuire

Look at the related post, on keeping key-key pairs in a dictionary.
Based on our discussion in this thread, I created a subclass of dict
called SymmetricDict, that, when storing symDict["A"] = 1, implicitly
saves the backward looking symDict[1] = "A".
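Paul's SymmetricDict itself isn't shown in this thread, so the following is only a guess at one way to write such a dict subclass. It was chosen to reproduce the behavior Paul describes next, where a second assignment to the same value wipes out the earlier pairing:

```python
class SymmetricDict(dict):
    """Sketch: storing d[k] = v also stores d[v] = k."""

    def _unlink(self, k):
        # Remove k and, if its partner still points back at k, the partner too.
        if k in self:
            old = dict.__getitem__(self, k)
            dict.__delitem__(self, k)
            if old in self and dict.__getitem__(self, old) == k:
                dict.__delitem__(self, old)

    def __setitem__(self, key, value):
        # Break existing pairings for key and value first, so every
        # entry always has a matching reverse entry.
        self._unlink(key)
        self._unlink(value)
        dict.__setitem__(self, key, value)
        dict.__setitem__(self, value, key)


sd = SymmetricDict()
sd["A"] = 1
sd["B"] = 1
print(sd[1])        # B
print("A" in sd)    # False: the "A" <-> 1 pairing is gone
```

A further wrinkle: in CPython, dict.update() and the dict(...) constructor do not go through an overridden __setitem__, so a fuller version would have to override those too. That is one more place where a dict subclass can quietly behave unlike a dict.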

I chose to inherit from dict, in part just to see what it would look
like. In doing so, SymmetricDict automagically gets methods such as
keys(), values(), items(), __contains__(), and support for len, "key in
dict", etc. However, I think SymmetricDict breaks (or at least bends)
LSP, in that there are some cases where SymmetricDict has some
surprising non-dict behavior. For instance, if I do:

d = dict()
d["A"] = 1
d["B"] = 1
print d.keys()

I get ["A", "B"]. But a SymmetricDict is rather strange.

sd = SymmetricDict()
sd["A"] = 1
sd["B"] = 1
print sd.keys()

gives ["B",1]. The second assignment wiped out the association of "A"
to 1. (This reminds me of some maddening O-O discussions I used to
have at a former place of employment, in which one developer cited
similar behavior for not having Square inherit from Rectangle - calling
Square.setWidth() would have to implicitly call setHeight() and vice
versa, in order to maintain its squarishness, and thereby broke Liskov.
I withdrew from the debate, citing lack of context that would have
helped resolve how things should go. At best, you can *probably* say
that both inherit from Shape, and can be drawn, have an area, a
bounding rectangle, etc., but neither inherits from the other.
Unless I'm mistaken, I think Robert Martin has some discussion on this
example also.)
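The Square/Rectangle argument is easy to reproduce in a few lines. In this sketch the setWidth/setHeight names follow the anecdote and everything else is assumed: a function written against Rectangle's contract gets a different answer when handed a Square, which is exactly the substitution failure LSP warns about:

```python
class Rectangle(object):
    def __init__(self, width, height):
        self.width, self.height = width, height
    def setWidth(self, w):
        self.width = w
    def setHeight(self, h):
        self.height = h
    def area(self):
        return self.width * self.height


class Square(Rectangle):
    def __init__(self, side):
        Rectangle.__init__(self, side, side)
    def setWidth(self, w):
        # Keep the square square: changing one side changes both
        self.width = self.height = w
    def setHeight(self, h):
        self.width = self.height = h


def resize(rect):
    # Assumes Rectangle's contract: width and height vary independently
    rect.setWidth(5)
    rect.setHeight(4)
    return rect.area()


print(resize(Rectangle(1, 1)))  # 20
print(resize(Square(1)))        # 16, not the 20 the caller expected
```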

So in sum, I'd say that I would be comfortable having SymmetricDict
extend dict *in my own code*, but that such a beast probably should
*not* be part of the standard Python distribution, in whose scope the
non-dictishness of SymmetricDict cannot be predicted. (And maybe this
gives us some clue about the difficulty of deciding what should and
should not go into the Python language and libs.)

-- Paul
 

Donn Cave

Paul McGuire said:
This reminds me of some maddening O-O discussions I used to
have at a former place of employment, in which one developer cited
similar behavior for not having Square inherit from Rectangle - calling
Square.setWidth() would have to implicitly call setHeight() and vice
versa, in order to maintain its squarishness, and thereby broke Liskov.
I withdrew from the debate, citing lack of context that would have
helped resolve how things should go. At best, you can *probably* say
that both inherit from Shape, and can be drawn, have an area, a
bounding rectangle, etc., but neither inherits from the other.

This Squares and Rectangles issue sounds debatable in a
language like C++ or Java, where it's important because
of subtype polymorphism. In Python, does it matter?
As a user of Square, I'm not supposed to ask about its
parentage, I just try to be clear what's expected of it.
There's no static typing to notice whether Square is a
subclass of Rectangle, and if it gets out that I tried
to discover this issubclass() relationship, I'll get a
lecture from folks on comp.lang.python who suspect I'm
confused about polymorphism in Python.

This is a good thing, because as you can see it relieves
us of the need to debate abstract principles out of context.
It doesn't change the real issues - Square is still a lot
like Rectangle, it still has a couple of differences, and
the difference could be a problem in some contexts designed
for Rectangle - but no one can fix that. If you need Square,
you'll implement it, and whether you choose to inherit from
Rectangle is left as a matter of implementation convenience.
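A minimal illustration of Donn's point, using hypothetical classes: the caller depends only on the behavior it actually uses (an area() method), so whether Square inherits from Rectangle never comes up:

```python
class Rectangle(object):
    def __init__(self, width, height):
        self.width, self.height = width, height
    def area(self):
        return self.width * self.height


class Square(object):  # deliberately *not* a Rectangle subclass
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * self.side


def total_area(shapes):
    # Asks only for area(); no isinstance() or issubclass() checks
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2, 3), Square(4)]))  # 22
```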

Donn Cave
 

Bengt Richter

Paul McGuire said:
Dang, that class should be:

class PaddedStr(str):
    def __new__(cls,s,l,padc=' '):
        if l > len(s):
            s2 = "%s%s" % (s,padc*(l-len(s)))
            return str.__new__(cls,s2)
        else:
            return str.__new__(cls,s)

Or you could write

>>> class PaddedStr(str):
...     def __new__(cls,s,l,padc=' '):
...         return str.__new__(cls, s+padc*(l-len(s)))
...

Which gives

(Taking advantage of multipliers <=0 working like 0 for strings):
...
-3: >xxx<
-2: >xxx<
-1: >xxx<
0: >xxx<
1: >xxx.<
2: >xxx..<
3: >xxx...<
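The one-liner relies on the fact that multiplying a string by zero or a negative integer gives the empty string, so no if/else is needed. A quick self-contained check, with arbitrarily chosen lengths:

```python
class PaddedStr(str):
    def __new__(cls, s, l, padc=' '):
        # padc * n is '' whenever n <= 0, so short targets leave s unchanged
        return str.__new__(cls, s + padc * (l - len(s)))


print(repr("." * -2))                       # ''
print(">%s<" % PaddedStr("xxx", 1, "."))    # >xxx<
print(">%s<" % PaddedStr("xxx", 5, "."))    # >xxx..<
```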

Regards,
Bengt Richter
 
