Sub-classing unicode: getting the unicode value

T

Torsten Bronger

Hallöchen!

I sub-classed unicode in an own class called "Excerpt", and now I
try to implement a __unicode__ method. In this method, I want to
get the actual value of the instance, i.e. the unicode string:

def __unicode__(self):
"""Returns the Unicode representation of Excerpt. Note that this is buffered,
so don't be afraid of calling it many times.

:Return:
- Unicode representation of ``self``

:rtype: unicode
"""
if not hasattr(self, "__unicode"):
self.__unicode = super(Excerpt, self).__unicode__()
return self.__unicode

Unfortunately, unicode objects don't have a __unicode__ method.
However, unicode(super(Excerpt, self)) is also forbidden because
super() allows attribute access only (why by the way?).

How does my object get its own value?

Tschö,
Torsten.
 
G

Gabriel Genellina

I sub-classed unicode in an own class called "Excerpt", and now I
try to implement a __unicode__ method.  In this method, I want to
get the actual value of the instance, i.e. the unicode string:

The "actual value of the instance", given that it inherits from
unicode, is... self.
Are you sure you *really* want to inherit from unicode? Don't you want
to store an unicode object as an instance attribute?
Unfortunately, unicode objects don't have a __unicode__ method.

Because it's not needed; unicode objects are already unicode objects,
and since they're immutable, there is no need to duplicate them. If it
existed, would always return self.
However, unicode(super(Excerpt, self)) is also forbidden because
super() allows attribute access only (why by the way?).

(because its purpose is to allow cooperative methods in a multiple
inheritance hierarchy)
How does my object get its own value?

"its own value" is the instance itself
 
T

Torsten Bronger

Hallöchen!

Gabriel said:
The "actual value of the instance", given that it inherits from
unicode, is... self.

But then it is not unicode but Excerpt which I don't want. The idea
is to buffer the unicode representation in order to gain efficiency.
Otherwise, a lot of unicode conversion would take place.
Are you sure you *really* want to inherit from unicode? Don't you
want to store an unicode object as an instance attribute?

No, I need many unicode operations (concatenating, slicing, ...).
[...]
However, unicode(super(Excerpt, self)) is also forbidden because
super() allows attribute access only (why by the way?).

(because its purpose is to allow cooperative methods in a multiple
inheritance hierarchy)

It would be more useful, however, if it returned full-fledged
objects. Or if there was another way to get a full-fledged mother
object.

Tschö,
Torsten.
 
G

Gabriel Genellina

But then it is not unicode but Excerpt which I don't want.  The idea
is to buffer the unicode representation in order to gain efficiency.
Otherwise, a lot of unicode conversion would take place.

Still I don't see why you want to inherit from unicode.
No, I need many unicode operations (concatenating, slicing, ...).

If you don't redefine __add__, __iadd__, __getitem__ etc. you'll end
up with bare unicode objects anyway; Excerpt + unicode = unicode. So
you'll have to redefine all required operators; then, why inherit from
unicode at all?
An example may help:

class Excerpt(object):
def __init__(self, value):
self.value = value # anything
def __str__(self):
return "%s(%r)" % (self.__class__.__name__, self.value)
def __unicode__(self):
if not hasattr(self, "_unicode"):
self._unicode = unicode(self.value)
return self._unicode
def __add__(self, other):
return Excerpt(unicode(self)+unicode(other))
def __getitem__(self, index):
return Excerpt(unicode(self)[index])

py> e1 = Excerpt((1,2,3))
py> e2 = Excerpt("Hello")
py> print e1
Excerpt((1, 2, 3))
py> print unicode(e1)
(1, 2, 3)
py> e3 = e1+e2
py> print e3
Excerpt(u'(1, 2, 3)Hello')
py> e3._unicode
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Excerpt' object has no attribute '_unicode'
py> print e3[7:10]
Excerpt(u'3)H')
py> e3._unicode
u'(1, 2, 3)Hello'
It would be more useful, however, if it returned full-fledged
objects.  Or if there was another way to get a full-fledged mother
object.

There is no such "mother object", in Python an instance is usually a
whole unique object, not an onion-like structure.
 
T

Torsten Bronger

Hallöchen!

Gabriel said:
[...]

But then it is not unicode but Excerpt which I don't want.  The
idea is to buffer the unicode representation in order to gain
efficiency. Otherwise, a lot of unicode conversion would take
place.

Still I don't see why you want to inherit from unicode.

Because I want to use some of its functionality and override the
rest. And now, all I want to do is to convert the current Excerpt
object into a real unicode object that represents the same string.
Outside the object, I'd simply say unicode(my_excerpt), but this
doesn't work within the __unicode__ method of Excerpt.
[...]
It would be more useful, however, if it returned full-fledged
objects. Or if there was another way to get a full-fledged
mother object.

There is no such "mother object", in Python an instance is usually
a whole unique object, not an onion-like structure.

But the mother object could be created.

I simply see no reason why "super(MotherClass, self)[key]" must
fail, except maybe because it's harder to realise in the underlying
implementation of Python.

Tschö,
Torsten.
 
M

Martin v. Löwis

How does my object get its own value?

def __unicode__(self):
return unicode(self)

Regards,
Martin
 
J

John Machin

Hallöchen!




But then it is not unicode but Excerpt which I don't want. The idea
is to buffer the unicode representation in order to gain efficiency.
Otherwise, a lot of unicode conversion would take place.

I'm confused: you are subclassing unicode, and want to use several
unicode methods, but the object's value appears to be an 8-bit
string!?!?

Care to divulge the code for your __init__ method?
 
T

Torsten Bronger

Hallöchen!
def __unicode__(self):
return unicode(self)

I get an endless recursion with this.

I must admit, though, that I probably overestimate the costs
connected with unicode(my_excerpt) because Gabriel is probably right
that no real conversion takes place. A mere attribute lookup may
still be cheaper, but only very slightly.

Tschö,
Torsten.
 
T

Torsten Bronger

Hallöchen!

John said:
[...]

But then it is not unicode but Excerpt which I don't want. The
idea is to buffer the unicode representation in order to gain
efficiency. Otherwise, a lot of unicode conversion would take
place.

I'm confused: you are subclassing unicode, and want to use several
unicode methods, but the object's value appears to be an 8-bit
string!?!?

No, a unicode. Never said something else ...
Care to divulge the code for your __init__ method?

It has no __init__, only a __new__. But I doubt that it is of much
use here:

def __new__(cls, excerpt, mode, url=None,
pre_substitutions=None, post_substitutions=None):
if mode == "NONE":
instance = unicode.__new__(cls, excerpt)
elif mode == "PRE":
preprocessed_text, original_positions, escaped_positions = \
cls.apply_pre_input_method(excerpt, url, pre_substitutions)
instance = unicode.__new__(cls, preprocessed_text)
instance.original_text = unicode(excerpt)
instance.original_positions = original_positions
instance.escaped_positions = escaped_positions
instance.post_substitutions = post_substitutions
elif mode == "POST":
postprocessed_text, original_positions, escaped_positions = \
cls.apply_post_input_method(excerpt)
instance = unicode.__new__(cls, postprocessed_text)
instance.original_positions = original_positions
instance.escaped_positions = escaped_positions
instance.original_text = excerpt.original_text
instance.post_substitutions = post_substitutions
instance.__escaped_text = None
instance.__unicode = None
return instance


Tschö,
Torsten.
 
M

Martin v. Löwis

How does my object get its own value?
I get an endless recursion with this.

I see. That worked fine in Python 2.4, but give a stack overflow
in Python 2.5.

Depending on your exact class definition, something like

return super(unicode, self).__getitem__(slice(0,len(self)))

should work.
I must admit, though, that I probably overestimate the costs
connected with unicode(my_excerpt) because Gabriel is probably right
that no real conversion takes place. A mere attribute lookup may
still be cheaper, but only very slightly.

I don't understand that remark. To implement the conversion in the
way you want it to be, it definitely needs to produce a copy of
self.

Regards,
Martin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top