subclassing str

  • Thread starter not1xor1 (Alessandro)
  • Start date
N

not1xor1 (Alessandro)

Hi,

I'd like to know what is the best way to subclass str
I need to add some new methods and that each method (both new and str
ones) return my new type

For instance I've seen I can do:


class mystr(str):

def between(self, start, end):
i = self.index(start) + len(start)
j = self.index(end, i) + len(end)
return self[i:j], self[j:]

def replace(self, old, new='', count=-1):
return mystr(str.replace(self, old, new, count))

if __name__ == '__main__':

s = mystr('one two <three> four')
print s
print type(s)
b,r = s.between('<', '>')
print b
print r
print type(b)
print type(r)
c = s.replace('three', 'five')
print c
print type(c)


when I ran it I get:

one two <three> four
<class '__main__.mystr'>
three>
four
<type 'str'>
<type 'str'>
one two <five> four
<class '__main__.mystr'>


I guess that if I had redefined the slice method even 'between' would
have returned <class '__main__.mystr'>

I wonder if I have to redefine all methods one by one or if there is a
sort of hook to intercept all methods calls and just change the return
type

thanks
 
C

Chris Rebert

Hi,

I'd like to know what is the best way to subclass str
I need to add some new methods and that each method (both new and str ones)
return my new type

For instance I've seen I can do:

class mystr(str):

  def between(self, start, end):
     i = self.index(start) + len(start)
     j = self.index(end, i) + len(end)
     return self[i:j], self[j:]

  def replace(self, old, new='', count=-1):
     return mystr(str.replace(self, old, new, count))
I wonder if I have to redefine all methods one by one or if there is a sort
of hook to intercept all methods calls and just change the return type

You could subclass UserString instead of str; all of UserString's
methods seem to ensure that instances of the subclass rather than just
plain strs or UserStrings are returned. See
http://docs.python.org/library/userdict.html#UserString.UserString

But you should also consider whether your additions absolutely *must*
be methods. Merely instead defining some functions that take strings
as parameters is obviously a simpler, and probably more performant,
approach.

If you insist on subclassing str, there's no such hook; you'll have to
override all the methods yourself.*

Cheers,
Chris
 
N

not1xor1 (Alessandro)

You could subclass UserString instead of str; all of UserString's
methods seem to ensure that instances of the subclass rather than just
plain strs or UserStrings are returned. See
http://docs.python.org/library/userdict.html#UserString.UserString

I'll have a look at it, thanks
But you should also consider whether your additions absolutely *must*
be methods. Merely instead defining some functions that take strings
as parameters is obviously a simpler, and probably more performant,
approach.

I regularly save web pages (mostly scientific research abstracts) from
various web sites and use a python script to strip them of ads and
unneeded informations, embedding the images directly in the html file
(as base64 encoded data) and at times joining multiple pages into just one

since those sites often change the format of their files I've to
modify my script accordingly

I'm already using plain functions, but thought that wrapping most of
them in a str subclass would let me save some time and yield cleaner
and more manageable code
If you insist on subclassing str, there's no such hook; you'll have to
override all the methods yourself.*

I'll try this route too (at least for the methods I need)
thanks for your help
 
L

Lawrence D'Oliveiro

not1xor1 (Alessandro) said:
I'm already using plain functions, but thought that wrapping most of
them in a str subclass would let me save some time and yield cleaner
and more manageable code

How exactly does

a.f(b, c)

save time over

f(a, b, c)

?
 
R

rantingrick

How exactly does

   a.f(b, c)

save time over

    f(a, b, c)

?

I think if you'll re-read the OP's statements (specifically the ones
you quoted!) then you will see it's not an issue of time, its an issue
of form and structure. Its an issue of paradigm. The OP wishes to keep
his code as true to OOP as possible and what is wrong with that?
Nothing in my book!

One thing i love about Python is the fact that it can please almost
all the "religious paradigm zealots" with it's multiple choice
approach to programming. However some of the features that OOP
fundamentalists hold dear in their heart are not always achievable in
a clean manner with Python. Yes you could use a function but that is
not the OOP way. Yes you could use UserString but that also seems too
excessive. Ruby allows redefining everything. However this can open a
REAL can of worms in group environment greater than one!
 
M

Mark Wooding

rantingrick said:
One thing i love about Python is the fact that it can please almost
all the "religious paradigm zealots" with it's multiple choice
approach to programming. However some of the features that OOP
fundamentalists hold dear in their heart are not always achievable in
a clean manner with Python. Yes you could use a function but that is
not the OOP way.

It can be more than just an aesthetic difference. Sometimes it can be
nice to extend existing classes to understand additional protocols. For
example, suppose you're implementing some object serialization format:
it'd be really nice if you could add `serialize' methods to `str',
`int', `list' and friends. This is certainly the Smalltalk way.
Unfortunately, you can't do it in Python.

One option is to implement a subclass which implements the additional
protocol. This causes fragmentation: now each protocol wants its own
subclass: if you want an object which understands both protocols, you
need to mess with multiple inheritance[1] (and, in Python, at least,
hope that everyone is using `super' properly to implement upwards
delegation).

Otherwise you have to do the dispatch yourself, either by hand or by
using one of the generic function/multimethod frameworks.

[1] At least Python gives you a fighting chance of getting this to work.
Java and C# won't countenance multiple inheritance at all, and C++
will botch it all by giving you a separate copy of the common
superclass's state for each protocol implementation (what are the
chances that everyone decided to use a virtual base class?) /and/
hideously breaking upwards delegation.

-- [mdw]
 
L

Lawrence D'Oliveiro

Mark Wooding said:
One option is to implement a subclass which implements the additional
protocol.

This is why I think object orientation ruins your ability to think properly.
For “protocol†read “functionâ€. If you want to implement a new function,
then implement a new function, why do you have to go through this
“subclassing†malarkey just to do something so simple.
 
M

Mark Wooding

Lawrence D'Oliveiro said:
This is why I think object orientation ruins your ability to think
properly. For “protocol†read “functionâ€. If you want to implement a
new function, then implement a new function, why do you have to go
through this “subclassing†malarkey just to do something so simple.

Functions don't do type-dependent dispatch, which was, of course, the
whole point.

A `protocol' is a collection of messages understood by a number of
different kinds of objects, causing potentially object-specific
behaviour. I gave the explicit example of serialization: string
probably serializes differently from an integer or a list. The
`__str__' and `__repr__' methods form a simple protocol for object
printing, as an additional example. The `str' and `repr' functions
provides a potentially useful external entry points into the protocol,
but they work by doing a type-dependent dispatch on the argument --
calling the appropriate method.

Object orientation isn't useless or an impediment to clear thinking. It
does seem to have turned into a bizarre kind of religion by some, and
many `mainstream' languages provide very poor object orientation
features (yes, Java and C#, I'm looking at you). But OO can be useful.

I think the notion of `protocol' is central to coherent OO design, but
this seems largely overlooked in much of the literature I've read. This
may mean that many people are muddled about what OO is actually for.

-- [mdw]
 
N

not1xor1 (Alessandro)

Il 09/11/2010 03:18, Lawrence D'Oliveiro ha scritto:
How exactly does

a.f(b, c)

save time over

f(a, b, c)

unfortunately in real world you have:

objId = objId.method(args)

vs.

objId = moduleName.method(objId, args)

I know you can use "from moduleName import *", but IMHO that produces
code much harder to manage and extend
 
C

Chris Rebert

Il 09/11/2010 03:18, Lawrence D'Oliveiro ha scritto:


unfortunately in real world you have:

objId = objId.method(args)

vs.

objId = moduleName.method(objId, args)

I know you can use "from moduleName import *", but IMHO that produces code
much harder to manage and extend

So just don't use the form with the asterisk. Explicitly list the
members you wish to import from the module (`from moduleName import
meth1, meth2`). That way, you don't have to specify the module name on
every invocation, but the origin of the functions is still made
explicit.

Cheers,
Chris
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top