Can I make unicode in a repr() print readably?

Terry Hancock · Sep 9, 2006

I still run into my own ignorance a lot with unicode in Python.

Is it possible to define some combination of __repr__, __str__,
and/or __unicode__ so that the unicode() wrapper isn't necessary
in this statement:

>>> print unicode(jp.concepts['adjectives']['BLUE'][0])

<GLOSS: 青い, cl=None, {'wd': u'\u9752\u3044'}>

(i.e. can I make it so that the object that print gets is already
unicode, so that the label '青い' will print readably?)

Or, put another way, what exactly does 'print' do when it gets
a class instance to print? It seems to do the right thing if
given a unicode or string object, but I can't figure out how to
make it do the same thing for a class instance.

I guess it would've seemed more intuitive to me if print attempted
to use __unicode__() first, then __str__(), and then __repr__(). But
it apparently skips straight to __str__(), unless the object is already
a unicode object. (?)

The following doesn't bother me:

>>> jp.concepts['adjectives']['BLUE'][0]

<GLOSS: \u9752\u3044, cl=None, {'wd': u'\u9752\u3044'}>

And I understand that I might want that if I'm working in
an ASCII-only terminal. But it's a big help to be able to
read/recognize the labels when I'm working with localized
encodings, and I'd like to save the extra typing if I'm
going to be looking at a lot of these.

So far, I've tried overriding the __unicode__ method to return
the unicode representation (doesn't seem like print calls it,
though), and I've tried returning the same thing from __repr__,
but the latter causes this unpleasant result:

>>> print jp.concepts['adjectives']['BLUE'][0]

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 8-9: ordinal not in range(128)

so I don't think I want to do that.
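
The failure can be reproduced without print at all, which suggests the unicode result is being pushed through the ASCII codec somewhere along the way. The snippet below is only a sketch: the text is a shortened stand-in for the GLOSS repr above, and it is written to run on Python 2.6+ as well as Python 3.

```python
# Shortened stand-in for the unicode text returned from __repr__ (assumption).
text = u'<GLOSS: \u9752\u3044, cl=None>'

try:
    text.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc

# '<GLOSS: ' is 8 characters long, so the two Japanese characters sit at
# indexes 8 and 9, which matches "position 8-9" in the traceback.
# error.end is exclusive, hence the "- 1".
failed_span = (error.start, error.end - 1)
```

The reported position lines up with the label itself, so it is the non-ASCII label, not the rest of the repr, that the codec rejects.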

Advice?

Terry

Guest · Sep 10, 2006

Terry said:
Is it possible to define some combination of __repr__, __str__,
and/or __unicode__ so that the unicode() wrapper isn't necessary
in this statement:

I'm not aware of a way of doing so.

Or, put another way, what exactly does 'print' do when it gets
a class instance to print? It seems to do the right thing if
given a unicode or string object, but I can't figure out how to
make it do the same thing for a class instance.

It won't. PyFile_WriteObject checks for Unicode objects, and whether
the file has an encoding attribute set, and if so, encodes the
Unicode object.

If it is not a Unicode object, it falls through to PyObject_Print,
which first checks for the tp_print slot (which can't be set in
Python), then uses PyObject_Str (which requires that the __str__
result is a true byte string), or PyObject_Repr (if the RAW
flag isn't set - it is when printing). PyObject_Str first checks
for tp_str; if that isn't set, it falls back to PyObject_Repr.
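
As a rough illustration of the lookup order described above, here is a simplified pure-Python model. It is not the real C code: the function name, its arguments and the returned labels are made up for this sketch, and the tp_print check is left out because it cannot be set from Python.

```python
def print_conversion_step(is_unicode, stream_encoding, defines_str):
    """Name the conversion step chosen, following the order described above.

    Hypothetical helper for illustration only; it models the description
    in this thread, not the actual interpreter source.
    """
    if is_unicode and stream_encoding:
        # PyFile_WriteObject encodes Unicode objects itself.
        return 'encode with stream encoding'
    if defines_str:
        # PyObject_Print -> PyObject_Str -> tp_str (__str__).
        return '__str__'
    # No tp_str: PyObject_Str falls back to PyObject_Repr.
    return '__repr__'


# A class instance is not a Unicode object, so only the last two outcomes
# are reachable for it; __unicode__ is not consulted anywhere in this order.
steps = [
    print_conversion_step(True, 'utf-8', True),    # a unicode object
    print_conversion_step(False, 'utf-8', True),   # instance with __str__
    print_conversion_step(False, 'utf-8', False),  # instance without __str__
]
```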

And I understand that I might want that if I'm working in
an ASCII-only terminal. But it's a big help to be able to
read/recognize the labels when I'm working with localized
encodings, and I'd like to save the extra typing if I'm
going to be looking at a lot of these.

You can save some typing, of course, with a helper function:

def p(o):
    print unicode(o)

I agree that this is not optimal; contributions are welcome.
It would probably be easiest to drop the guarantee that
PyObject_Str returns a true string, or use _PyObject_Str
(which does not make this guarantee) in PyObject_Print.
One would have to think what the effect on backwards
compatibility is of such a change.

Regards,
Martin

Terry Hancock · Sep 11, 2006

Martin said:
It won't. PyFile_WriteObject checks for Unicode objects, and whether
the file has an encoding attribute set, and if so, encodes the
Unicode object.

If it is not a Unicode object, it falls through to PyObject_Print,
which first checks for the tp_print slot (which can't be set in
Python), then uses PyObject_Str (which requires that the __str__
result is a true byte string), or PyObject_Repr (if the RAW flag
isn't set - it is when printing). PyObject_Str first checks for
tp_str; if that isn't set, it falls back to PyObject_Repr.

You can save some typing, of course, with a helper function:

def p(o): print unicode(o)

Yeah, that's what I've done as it stands. I think it's actually fewer
keystrokes that way, but it is still inconsistent* with other objects,
of course.

I agree that this is not optimal; contributions are welcome. It would
probably be easiest to drop the guarantee that PyObject_Str returns a
true string, or use _PyObject_Str (which does not make this
guarantee) in PyObject_Print. One would have to think what the effect
on backwards compatibility is of such a change.

Ah, contribute to Python itself. I'll have to think about it -- I don't do
a lot of C programming these days, but it sounds like an idea.

I don't know about the backwards compatibility issue. I'm not sure
what would be affected. But "print" frequently generates encoded
Unicode output if the stream supports it, so there is no guarantee
whether it produces unicode or string output now. I think it's clear
that str() *must* return an ordinary Python string.

I think what would make sense is for the "print" statement to attempt
to call __unicode__ on an instance before attempting to call __str__
(just as it currently falls back from __str__ to __repr__). That seems like
it would be pretty consistent, right?

Cheers,
Terry

*Okay, actually it is perfectly consistent in a technical sense, but not in
the utility, "this is what you do to examine the object", sense.

Martin v. Löwis · Sep 11, 2006

Terry said:
I don't know about the backwards compatibility issue. I'm not sure
what would be affected. But "print" frequently generates encoded
Unicode output if the stream supports it, so there is no guarantee
whether it produces unicode or string output now.

I'm not worried about the code path that print takes - it is obvious
that Unicode objects are allowed to show up, and will cause
UnicodeErrors if encoding them with the stream encoding fails.

I'm (slightly) worried about other code paths that may be affected.

I think it's clear
that str() *must* return an ordinary Python string.

Notice, however, that __str__ may return Unicode objects; those
get silently converted with the system encoding.

I think what would make sense is for the "print" statement to attempt
to call __unicode__ on an instance before attempting to call __str__
(just as it currently falls back from __str__ to __repr__). That seems
like it would be pretty consistent, right?

This is one option; the other option is that print does not
convert unicode strings returned from __str__ with the system
encoding, but with the stream's encoding. But yes; your approach
might work as well (with the then-incompatibility that __unicode__
will get called in contexts where it wasn't called before).
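
To make the difference between the two encodings concrete, here is a small sketch. It assumes the system default encoding is ASCII (the usual Python 2 default) and that the output stream reports UTF-8; the helper name try_encode is made up for illustration.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Stand-in for a unicode string returned from __str__ (assumption).
text = u'<GLOSS: \u9752\u3044>'

# Assumption: system default encoding is ASCII, so the conversion fails.
with_system_encoding = try_encode(text, 'ascii')

# Assumption: the stream's encoding is UTF-8, so the same text converts.
with_stream_encoding = try_encode(text, 'utf-8')
```

Under these assumptions only the stream encoding can represent the label, which is why converting with the stream's encoding would let such objects print readably.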

It will probably be necessary to collect a third and fourth
opinion from python-dev; the actual implementation of whatever
approach gets chosen should be easy. And there should be
documentation changes, of course.

Regards,
Martin
