Where does str class represent its data?

ChrisEdgemon

I'd like to implement a subclass of string that works like this:

>>> m = MyString('mail')
>>> m == 'fail'
True
>>> m == 'mail'
False
>>> m in ['fail', 'hail']
True

My best attempt for something like this is:

class MyString(str):
    def __init__(self, seq):
        if self == self.clean(seq): pass
        else: self = MyString(self.clean(seq))

    def clean(self, seq):
        seq = seq.replace("m", "f")

but this doesn't work. Nothing gets changed.

I understand that I could just remove the clean function from the
class and call it every time, but I use this class in several
locations, and I think it would be much safer to have it do the
cleaning itself.
 
Miles

Since strings are immutable, you need to override the __new__ method.
See http://www.python.org/download/releases/2.2.3/descrintro/#__new__
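A minimal sketch of why this matters, written for Python 3. The class names InitOnly and NewBased are made up for illustration. Assigning to self inside __init__ only rebinds a local name, whereas __new__ picks the value before the immutable string object exists:

```python
class InitOnly(str):
    """Tries to change the value in __init__; this has no effect."""

    def __init__(self, seq):
        # Rebinds the local name only; the object already holds `seq`.
        self = seq.replace("m", "f")


class NewBased(str):
    """Chooses the value in __new__, before the object is created."""

    def __new__(cls, seq):
        return super().__new__(cls, seq.replace("m", "f"))


print(repr(InitOnly("mail")))   # value is unchanged
print(repr(NewBased("mail")))   # value has been cleaned
print(NewBased("mail") in ["fail", "hail"])
```

Because NewBased instances are ordinary strings once created, comparisons and membership tests behave the way the original question asked for.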
 
James Stroud

The "flat is better than nested" philosophy suggests that clean should
be module level and that you should initialize a MyString like this:

m = MyString(clean(s))

Where clean is

def clean(astr):
    return astr.replace('m', 'f')

Although it appears compulsory to call clean each time you instantiate
MyString, note that you do it anyway when you check in your __init__.
Here, you are explicit. Such an approach also eliminates the obligation
to clean the string under conditions where you know it will already be
clean--such as deserialization.

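Also, clean above returns nothing, so self.clean(seq) evaluates to None,
and None is what you pass to MyString here:

self = MyString(self.clean(seq))

Even with a return value, assigning to self only rebinds a local name
inside __init__; the string that was already created keeps its value.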

Additionally, it has been suggested that you use __new__. E.g.:

py> class MyString(str):
...     def __new__(cls, astr):
...         astr = astr.replace('m', 'f')
...         return super(MyString, cls).__new__(cls, astr)
...
py> MyString('mail')
'fail'

But this is an abuse of the str class if you intend to populate your
subclasses with self-modifying methods such as your clean method. In
that case, you might consider composition, wherein you access an
instance of str as an attribute of class instances. The Python standard
library makes this easy with the UserString class and the ability to add
custom methods to its subclasses:

py> # Python 2 module; in Python 3: from collections import UserString
py> from UserString import UserString
py> class MyClass(UserString):
...     def __init__(self, astr):
...         self.data = self.clean(astr)
...     def clean(self, astr):
...         return astr.replace('m', 'f')
...
py> MyClass('mail')
'fail'
py> type(_)
<type 'instance'>

This class is much slower than str, but you can always access an
instance's data attribute directly if you want fast read-only behavior.

py> astr = MyClass('mail').data
py> astr
'fail'

But now you are back to a built-in type, which is actually the
point--not everything needs to be in a class. This isn't Java.
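For readers on Python 3, the same composition idea can be sketched with collections.UserString, which is where the class now lives. This is only an illustration under that assumption, reusing the MyClass and clean names from the session above:

```python
from collections import UserString


class MyClass(UserString):
    """Wraps a plain str in the `data` attribute instead of subclassing str."""

    def __init__(self, astr):
        # Clean once, then let UserString store the result in self.data.
        super().__init__(self.clean(astr))

    def clean(self, astr):
        return astr.replace("m", "f")


s = MyClass("mail")
print(s)             # the wrapped, cleaned text
print(type(s.data))  # the underlying built-in str
```

The wrapper still compares equal to ordinary strings, but it is not itself a str instance, so code that checks isinstance(x, str) will not accept it.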

James


--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
attn.steven.kuo

What about subclassing str and redefining __eq__:
>>> class MyString(str):
...     def __eq__(self, other):
...         return not str.__eq__(self, other)
...
>>> m = MyString('mail')
>>> m == 'fail'
True
>>> m == 'mail'
False
>>> m in ['fail', 'hail']
True
 
attn.steven.kuo

Um, nevermind -- I *completely* misunderstood the question...
 
ChrisEdgemon

Initially, I tried simply calling a clean function on a regular
string, without any of this messy subclassing. However, I would end
up accidentally cleaning it more than once, and transforming the
string was just very messy. I thought that it would be much easier to
just clean the string once, and then add methods that would give me
the various transformations that I wanted from the cleaned string.
Using __new__ seems to be the solution I was looking for.

What constitutes an abuse of the str class? Is there some performance
decrement that results from subclassing str like this? (Unfortunately
my implementation seems to have a pretty large memory footprint, 400 MB
for about 400,000 files.) Or do you just mean from a philosophical
standpoint? I guess I don't understand what benefits come from using
UserString instead of just str.

Thanks for the help,
Chris
 
James Stroud

Initially, I tried simply calling a clean function on a regular
string, without any of this messy subclassing. However, I would end
up accidentally cleaning it more than once, and transforming the
string was just very messy.

It's not clear what you mean here. A code snippet might help. In theory,
you can encapsulate any amount of cleaning inside a single function, so
it shouldn't be messy. You need only to return the result.


import string

def fix_whitespace(astr):
    # Strip first; after the replacement there is no whitespace left to strip.
    astr = astr.strip()
    return ''.join(c if c not in string.whitespace else '-' for c in astr)

def fix_m(astr):
    return astr.replace('m', 'f')

def clean_up(astr):
    return fix_m(fix_whitespace(astr))


In theory, if you didn't want a custom string to actually change its own
value, this could be semantically equivalent to:

new_str = astr.fix_whitespace().fix_m()

The latter might be a little more readable than the former.

I thought that it would be much easier to
just clean the string once, and then add methods that would give me
the various transformations that I wanted from the cleaned string.

If you intended these transformations to be new instances of MyString,
then this would probably not be abuse of the str built-in type.

Using __new__ seems to be the solution I was looking for.

What constitutes an abuse of the str class?

Changing its value as if it were mutable. This might be ok (though not
recommended) during instantiation, but you wouldn't want something like
this:

py> class MyString(str):
...     [etc.]
...
py> s = MyString('mail man')
py> s
'fail fan'
py> # kind-of ok up till now, but...
py> s.fix_whitespace()
py> s
'fail-fan'
py> # abusive to str

Probably better, if subclassing str, would be something more explicit:

py> s = MyString('mail man')
py> s
'mail man'
py> s = s.fix_ms()
py> s
'fail fan'
py> s = s.fix_whitespace()
py> s
'fail-fan'
py> s = MyString('mail man')
py> s
'mail man'
py> s.clean_up()
'fail-fan'
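The explicit style in the session above could be implemented along these lines. This is a sketch for Python 3, not a definitive version: the method names fix_ms, fix_whitespace and clean_up come from the session, while the method bodies are assumptions based on the module-level functions earlier in this post.

```python
import string


class MyString(str):
    """A str subclass whose methods return new instances and never mutate."""

    def fix_ms(self):
        return type(self)(self.replace("m", "f"))

    def fix_whitespace(self):
        # Trim the ends, then turn any remaining whitespace into dashes.
        stripped = self.strip()
        return type(self)(
            "".join("-" if c in string.whitespace else c for c in stripped)
        )

    def clean_up(self):
        return self.fix_ms().fix_whitespace()


s = MyString("mail man")
print(s.clean_up())  # a new, cleaned MyString
print(s)             # the original is untouched
```

Wrapping each result in type(self)(...) matters because the inherited str methods such as replace return plain str objects, which would otherwise lose the added methods.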

In this way, users would not need to understand the implementation of
MyString (i.e. that it gets cleaned by default), and its behavior more
intuitively resembles the built-in str class--except that MyString has
added functionality.

Is there some performance
decrement that results from subclassing str like this? (Unfortunately
my implementation seems to have a pretty large memory footprint, 400 MB
for about 400,000 files.) Or do you just mean from a philosophical
standpoint?

Philosophical, from the standpoint of wanting intuitively usable,
reusable, and maintainable code.

I guess I don't understand what benefits come from using
UserString instead of just str.

Probably not many if you think of MyString as I suggest above. But if
you want it to be magic, as you described originally, then you might
think about UserString.

James

 
James Stroud

I guess I don't understand what benefits come from using
UserString instead of just str.

I hit send without remembering to include this kind of example:

import string
# Python 2 module; in Python 3 use: from collections import UserString
from UserString import UserString

class MyString(UserString):
    def __init__(self, astr):
        UserString.__init__(self, astr)
        self.history = [astr]
        self.fix_ms()
    def update(self):
        # Record the new value only if it actually changed.
        if self.history[-1] != self.data:
            self.history.append(self.data)
    def fix_ms(self):
        self.data = self.data.replace('m', 'f')
        self.update()
    def fix_whitespace(self):
        self.data = "".join(c if c not in string.whitespace else '-'
                            for c in self.data.strip())
        self.update()

Now you have a record of the history of the string, which may help you
later:

py> s = MyString('mail man')
py> s
'fail fan'
py> s.data
'fail fan'
py> s.history
['mail man', 'fail fan']
py> s.fix_whitespace()
py> s
'fail-fan'
py> s.history
['mail man', 'fail fan', 'fail-fan']

A history for a str, or for instances of its subclasses, makes no sense
because str is immutable. You may or may not want a history (I just made
up this use case), but hopefully you see the utility in using regular
classes for complex behavior instead of forcing an immutable built-in
type to do magic.

James

 
Klaas

Since strings are immutable, you need to override the __new__ method.
See http://www.python.org/download/releases/2.2.3/descrintro/#__new__

In case this isn't clear, here is how to do it:

In [1]: class MyString(str):
   ...:     def __new__(cls, value):
   ...:         return str.__new__(cls, value.lower())

In [2]: s = MyString('Hello World')

In [3]: s
Out[3]: 'hello world'

Note that this will not do fancy stuff like automatically call
__str__() methods. If you want that, call str() first:

In [5]: class MyString(str):
   ...:     def __new__(cls, value):
   ...:         return str.__new__(cls, str(value).lower())
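To make the difference between the two versions concrete, here is a small Python 3 sketch. The names Strict, Coercing and Greeting are hypothetical and exist only for this illustration; Greeting stands in for any object that defines __str__ but is not itself a string.

```python
class Strict(str):
    """Assumes `value` is already a string (first version above)."""

    def __new__(cls, value):
        return super().__new__(cls, value.lower())


class Coercing(str):
    """Converts `value` with str() first (second version above)."""

    def __new__(cls, value):
        return super().__new__(cls, str(value).lower())


class Greeting:
    def __str__(self):
        return "Hello World"


print(Coercing(Greeting()))  # goes through Greeting.__str__
try:
    Strict(Greeting())
except AttributeError as exc:
    # Greeting has no .lower() method, so the first version fails here.
    print("Strict failed:", exc)
```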

-Mike
 
