Where does str class represent its data?

Discussion in 'Python' started by ChrisEdgemon@gmail.com, Jul 12, 2007.

  1. Guest

    I'd like to implement a subclass of string that works like this:

    >>> m = MyString('mail')
    >>> m == 'fail'
    True
    >>> m == 'mail'
    False
    >>> m in ['fail', 'hail']
    True

    My best attempt for something like this is:

    class MyString(str):
        def __init__(self, seq):
            if self == self.clean(seq): pass
            else: self = MyString(self.clean(seq))

        def clean(self, seq):
            seq = seq.replace("m", "f")

    but this doesn't work. Nothing gets changed.

    I understand that I could just remove the clean function from the
    class and call it every time, but I use this class in several
    locations, and I think it would be much safer to have it do the
    cleaning itself.
     
    , Jul 12, 2007
    #1

  2. Miles Guest

    Since strings are immutable, you need to override the __new__ method.
    See http://www.python.org/download/releases/2.2.3/descrintro/#__new__
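
A minimal sketch of the difference, written for Python 3 as an assumption (the thread itself is from the Python 2 era). The class names InitOnly and WithNew are made up for illustration. By the time __init__ runs, the string value has already been fixed by __new__, so only an override of __new__ can change it:

```python
class InitOnly(str):
    def __init__(self, seq):
        # Rebinding the local name `self` does not affect the object
        # that str.__new__ has already created.
        self = seq.replace("m", "f")


class WithNew(str):
    def __new__(cls, seq):
        # The value is chosen here, before the immutable object exists.
        return super().__new__(cls, seq.replace("m", "f"))


print(InitOnly("mail"))  # mail
print(WithNew("mail"))   # fail
```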
     
    Miles, Jul 12, 2007
    #2

  3. James Stroud Guest

    The "flat is better than nested" philosophy suggests that clean should
    be module level and that you should initialize a MyString like this:

    m = MyString(clean(s))

    Where clean is

    def clean(astr):
        return astr.replace('m', 'f')

    Although it appears compulsory to call clean each time you instantiate
    MyString, note that you do it anyway when you check in your __init__.
    Here, you are explicit. Such an approach also eliminates the obligation
    to clean the string under conditions where you know it will already be
    clean--such as deserialization.

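    Also, clean above doesn't return anything, so the call returns None and
    None is what you pass to MyString here (and rebinding self inside
    __init__ would not change the instance in any case):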

    self = MyString(self.clean(seq))
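
To illustrate the missing return, here is a small sketch with two hypothetical module-level functions (the names are invented for this example). Assigning to the parameter only rebinds a local name; without a return statement the caller gets None:

```python
def clean_without_return(seq):
    # Rebinds the local name only; the function implicitly returns None.
    seq = seq.replace("m", "f")


def clean_with_return(seq):
    # Hands the new string back to the caller.
    return seq.replace("m", "f")


print(clean_without_return("mail"))  # None
print(clean_with_return("mail"))     # fail
```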

    Additionally, it has been suggested that you use __new__. E.g.:

    py> class MyString(str):
    ...     def __new__(cls, astr):
    ...         astr = astr.replace('m', 'f')
    ...         return super(MyString, cls).__new__(cls, astr)
    ...
    py> MyString('mail')
    'fail'

    But this is an abuse of the str class if you intend to populate your
    subclasses with self-modifying methods such as your clean method. In
    this case, you might consider composition, wherein you access an
    instance of str as an attribute of class instances. The Python standard
    library makes this easy with the UserString class and the ability to add
    custom methods to its subclasses:

    py> from UserString import UserString
    py> class MyClass(UserString):
    ...     def __init__(self, astr):
    ...         self.data = self.clean(astr)
    ...     def clean(self, astr):
    ...         return astr.replace('m', 'f')
    ...
    py> MyClass('mail')
    'fail'
    py> type(_)
    <type 'instance'>

    This class is much slower than str, but you can always access an
    instance's data attribute directly if you want fast read-only behavior.

    py> astr = MyClass('mail').data
    py> astr
    'fail'

    But now you are back to a built-in type, which is actually the
    point--not everything needs to be in a class. This isn't Java.
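
For readers on Python 3, where UserString lives in the collections module, a rough equivalent of the composition approach might look like the sketch below. It reuses the MyClass and clean names from the example above; calling super().__init__ is a choice made for this sketch rather than something the original post does:

```python
from collections import UserString


class MyClass(UserString):
    def __init__(self, astr):
        # UserString stores the wrapped str in the `data` attribute.
        super().__init__(self.clean(astr))

    def clean(self, astr):
        return astr.replace("m", "f")


m = MyClass("mail")
print(m)             # fail
print(m == "fail")   # True
print(type(m.data))  # <class 'str'>
```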

    James


    --
    James Stroud
    UCLA-DOE Institute for Genomics and Proteomics
    Box 951570
    Los Angeles, CA 90095

    http://www.jamesstroud.com/
     
    James Stroud, Jul 12, 2007
    #3
  4. Guest

    What about subclassing str and redefining __eq__?

    >>> class MyString(str):
    ...     def __eq__(self, other):
    ...         return not str.__eq__(self, other)
    ...
    >>> m = MyString('mail')
    >>> m == 'fail'
    True
    >>> m == 'mail'
    False
    >>> m in ['fail', 'hail']
    True


    --
    Hope this helps,
    Steven
     
    , Jul 12, 2007
    #4
  5. Guest

    Um, never mind -- I *completely* misunderstood the question...

    --
    Regards,
    Steven
     
    , Jul 12, 2007
    #5
  6. Guest

    On Jul 11, 9:49 pm, James Stroud wrote:
    > The "flat is better than nested" philosophy suggests that clean should
    > be module level and you should initialize a MyString like such:
    >
    > m = MyString(clean(s))
    >
    > Where clean is
    >
    > def clean(astr):
    > return astr.replace('m', 'f')
    >
    > Although it appears compulsory to call clean each time you instantiate
    > MyString, note that you do it anyway when you check in your __init__.
    > Here, you are explicit. Such an approach also eliminates the obligation
    > to clean the string under conditions where you know it will already be
    > clean--such as deserialization.


    Initially, I tried simply calling a clean function on a regular
    string, without any of this messy subclassing. However, I would end
    up accidentally cleaning it more than once, and transforming the
    string was just very messy. I thought that it would be much easier to
    just clean the string once, and then add methods that would give me
    the various transformations that I wanted from the cleaned string.
    Using __new__ seems to be the solution I was looking for.
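
A sketch of how the "clean once, then derive transformations" idea could look with __new__, assuming Python 3. This is not the poster's actual code: CleanString and dashed are hypothetical names, and returning an already-cleaned instance unchanged is just one way to avoid cleaning twice:

```python
class CleanString(str):
    """Hypothetical str subclass that cleans its value exactly once."""

    def __new__(cls, value):
        if isinstance(value, cls):
            # Already cleaned: hand back the same object, do not clean again.
            return value
        return super().__new__(cls, value.replace("m", "f"))

    def dashed(self):
        # A transformation derived from the cleaned value; returns a plain str.
        return self.replace(" ", "-")


s = CleanString("mail man")
print(s)                    # fail fan
print(CleanString(s) is s)  # True
print(s.dashed())           # fail-fan
```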

    >
    > Also, you don't return anything from clean above, so you assign None to
    > self here:
    >
    > self = MyString(self.clean(seq))
    >
    > Additionally, it has been suggested that you use __new__. E.g.:
    >
    > py> class MyString(str):
    > ... def __new__(cls, astr):
    > ... astr = astr.replace('m', 'f')
    > ... return super(MyString, cls).__new__(cls, astr)
    > ...
    > py> MyString('mail')
    > 'fail'
    >
    > But this is an abuse of the str class if you intend to populate your
    > subclasses with self-modifying methods such as your clean method. In
    > this case, you might consider composition, wherein you access an
    > instance of str as an attribute of class instances. The python standard
    > library make this easy with the UserString class and the ability to add
    > custom methods to its subclasses:


    What constitutes an abuse of the str class? Is there some performance
    decrement that results from subclassing str like this? (Unfortunately
    my implementation seems to have a pretty large memory footprint, 400 MB
    for about 400,000 files.) Or do you just mean from a philosophical
    standpoint? I guess I don't understand what benefits come from using
    UserString instead of just str.

    Thanks for the help,
    Chris

     
    , Jul 12, 2007
    #6
  7. James Stroud Guest

    wrote:
    > On Jul 11, 9:49 pm, James Stroud <> wrote:


    >>The "flat is better than nested" philosophy suggests that clean should
    >>be module level and you should initialize a MyString like such:
    >>
    >> m = MyString(clean(s))
    >>
    >>Where clean is
    >>
    >> def clean(astr):
    >> return astr.replace('m', 'f')
    >>
    >>Although it appears compulsory to call clean each time you instantiate
    >>MyString, note that you do it anyway when you check in your __init__.
    >>Here, you are explicit.

    >
    > Initially, I tried simply calling a clean function on a regular
    > string, without any of this messy subclassing. However, I would end
    > up accidentally cleaning it more than once, and transforming the
    > string was just very messy.


    It's not clear what you mean here. A code snippet might help. In theory,
    you can encapsulate any amount of cleaning inside a single function, so
    it shouldn't be messy. You need only to return the result.


    import string

    def fix_whitespace(astr):
        # Strip first so leading/trailing whitespace is removed, not dashed.
        return ''.join('-' if c in string.whitespace else c
                       for c in astr.strip())

    def fix_m(astr):
        return astr.replace('m', 'f')

    def clean_up(astr):
        return fix_m(fix_whitespace(astr))


    In theory, if you didn't want a custom string to actually change its own
    value, this could be semantically equivalent to:

    new_str = astr.fix_whitespace().fix_m()

    The latter might be a little more readable than the former.
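
The chained form could be supported by a str subclass whose methods return new instances instead of changing the existing one. The following is only a sketch under that assumption, written for Python 3; the method bodies mirror the module-level functions above:

```python
import string


class MyString(str):
    def fix_whitespace(self):
        # Build a new value; the original instance is left untouched.
        fixed = "".join("-" if c in string.whitespace else c
                        for c in self.strip())
        return type(self)(fixed)

    def fix_m(self):
        return type(self)(self.replace("m", "f"))


astr = MyString(" mail man ")
new_str = astr.fix_whitespace().fix_m()
print(new_str)     # fail-fan
print(repr(astr))  # ' mail man '
```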

    > I thought that it would be much easier to
    > just clean the string once, and then add methods that would give me
    > the various transformations that I wanted from the cleaned string.


    If you intended these transformations to be new instances of MyString,
    then this would probably not be abuse of the str built-in type.

    > Using __new__ seems to be the solution I was looking for.


    >>Additionally, it has been suggested that you use __new__. E.g.:
    >>
    >>py> class MyString(str):
    >>... def __new__(cls, astr):
    >>... astr = astr.replace('m', 'f')
    >>... return super(MyString, cls).__new__(cls, astr)
    >>...
    >>py> MyString('mail')
    >>'fail'
    >>
    >>But this is an abuse of the str class if you intend to populate your
    >>subclasses with self-modifying methods such as your clean method. In
    >>this case, you might consider composition, wherein you access an
    >>instance of str as an attribute of class instances. The python standard
    >>library make this easy with the UserString class and the ability to add
    >>custom methods to its subclasses:



    > What constitutes an abuse of the str class?


    Changing its value as if it were mutable. This might be ok (though not
    recommended) during instantiation, but you wouldn't want something like
    this:

    py> class MyString(str):
    ...     [etc.]
    ...
    py> s = MyString('mail man')
    py> s
    'fail fan'
    py> # kind-of ok up till now, but...
    py> s.fix_whitespace()
    py> s
    'fail-fan'
    py> # abusive to str

    Probably better, if subclassing str, would be something more explicit:

    py> s = MyString('mail man')
    py> s
    'mail man'
    py> s = s.fix_ms()
    py> s
    'fail fan'
    py> s = s.fix_whitespace()
    py> s
    'fail-fan'
    py> s = MyString('mail man')
    py> s
    'mail man'
    py> s.clean_up()
    'fail-fan'

    In this way, users would not need to understand the implementation of
    MyString (i.e. that it gets cleaned by default), and its behavior more
    intuitively resembles the built-in str class--except that MyString has
    added functionality.

    > Is there some performance
    > decrement that results from subclassing str like this? (Unfortunately
    > my implementation seems to have a pretty large memory footprint, 400 MB
    > for about 400,000 files.) Or do you just mean from a philosophical
    > standpoint?


    Philosophical, from the standpoint of wanting intuitively usable,
    reusable, and maintainable code.

    > I guess I don't understand what benefits come from using
    > UserString instead of just str.


    Probably not many if you think of MyString as I suggest above. But if
    you want it to be magic, as you described originally, then you might
    think about UserString.

    James

     
    James Stroud, Jul 13, 2007
    #7
  8. James Stroud Guest

    wrote:
    > I guess I don't understand what benefits come from using
    > UserString instead of just str.


    I hit send without remembering to include this kind of example:

    import string
    from UserString import UserString

    class MyString(UserString):
        def __init__(self, astr):
            UserString.__init__(self, astr)
            self.history = [astr]
            self.fix_ms()

        def update(self):
            # Record the new value only if it actually changed.
            if self.history[-1] != self.data:
                self.history.append(self.data)

        def fix_ms(self):
            self.data = self.data.replace('m', 'f')
            self.update()

        def fix_whitespace(self):
            self.data = "".join(c if c not in string.whitespace else '-'
                                for c in self.data.strip())
            self.update()

    Now you have a record of the history of the string, which may help you
    later:

    py> s = MyString('mail man')
    py> s
    'fail fan'
    py> s.data
    'fail fan'
    py> s.history
    ['mail man', 'fail fan']
    py> s.fix_whitespace()
    py> s
    'fail-fan'
    py> s.history
    ['mail man', 'fail fan', 'fail-fan']

    A history of a str instance, or of an instance of one of its subclasses,
    makes no sense because str is immutable. You may or may not want a
    history (I just made up this use case), but hopefully you see the utility
    in using regular classes for complex behavior instead of forcing an
    immutable built-in type to do magic.
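
On Python 3 the same idea can be sketched with collections.UserString. The version below is an adaptation rather than the code above verbatim: the class is renamed HistoryString and the bookkeeping method is called _record, both names chosen only for this example:

```python
import string
from collections import UserString


class HistoryString(UserString):
    def __init__(self, astr):
        super().__init__(astr)
        self.history = [self.data]
        self.fix_ms()

    def _record(self):
        # Append the current value only if it differs from the last one.
        if self.history[-1] != self.data:
            self.history.append(self.data)

    def fix_ms(self):
        self.data = self.data.replace("m", "f")
        self._record()

    def fix_whitespace(self):
        self.data = "".join("-" if c in string.whitespace else c
                            for c in self.data.strip())
        self._record()


s = HistoryString("mail man")
s.fix_whitespace()
print(s)          # fail-fan
print(s.history)  # ['mail man', 'fail fan', 'fail-fan']
```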

    James

     
    James Stroud, Jul 13, 2007
    #8
  9. Klaas Guest

    On Jul 11, 4:37 pm, Miles <> wrote:

    > Since strings are immutable, you need to override the __new__ method.
    > See http://www.python.org/download/releases/2.2.3/descrintro/#__new__


    In case this isn't clear, here is how to do it:

    In [1]: class MyString(str):
       ...:     def __new__(cls, value):
       ...:         return str.__new__(cls, value.lower())

    In [2]: s = MyString('Hello World')

    In [3]: s
    Out[3]: 'hello world'

    Note that this will not do fancy stuff like automatically call
    __str__() methods. If you want that, call str() first:

    In [5]: class MyString(str):
       ...:     def __new__(cls, value):
       ...:         return str.__new__(cls, str(value).lower())
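
A short sketch of what the str() call buys you, assuming Python 3. The Greeting class is invented for this example. Without str(value), passing a Greeting would fail with AttributeError, because the object has no lower() method:

```python
class MyString(str):
    def __new__(cls, value):
        # str(value) invokes value.__str__() for non-string arguments.
        return str.__new__(cls, str(value).lower())


class Greeting:
    def __str__(self):
        return "Hello World"


print(MyString(Greeting()))     # hello world
print(MyString("Hello World"))  # hello world
```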

    -Mike
     
    Klaas, Jul 14, 2007
    #9
