treating str as unicode in legacy code?

Ben · Apr 12, 2007

I'm left with some legacy code using plain old str, and I need to make
sure it works with unicode input/output. I have a simple plan to do
this:

- Run the code with "python -U" so all the string literals become
unicode literals.
- Add this statement

str = unicode

to all .py files so the type comparison (e.g., type('123') == str)
would work.

Did I miss anything? Does this sound like a workable plan?

Thanks!

Steve Holden · Apr 13, 2007

Ben said:
I'm left with some legacy code using plain old str, and I need to make
sure it works with unicode input/output. I have a simple plan to do
this:

- Run the code with "python -U" so all the string literals become
unicode literals.
- Add this statement

str = unicode

to all .py files so the type comparison (e.g., type('123') == str)
would work.

Did I miss anything? Does this sound like a workable plan?

Thanks!

Well, don't forget that the assignment to str *shadows* the built-in
rather than replacing it, so there may be places (imported modules being
the example that most readily springs to mind) where that replacement
won't be effective.
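
A minimal sketch of that shadowing effect, under stated assumptions: `patched` and `untouched` are hypothetical modules built on the fly, and `Text` is a placeholder class standing in for `unicode` so the snippet does not depend on the interpreter version. Only the module that contains the assignment sees the new binding.

```python
import types


class Text(object):
    """Hypothetical stand-in for the unicode type."""


# Two hypothetical modules: one carries the rebinding, the other does not.
patched = types.ModuleType("patched")
untouched = types.ModuleType("untouched")

# Make the stand-in visible inside the patched module, then rebind `str` there.
patched.Text = Text
exec("str = Text\ndef current_str():\n    return str\n", patched.__dict__)

# The other module never assigns to `str`, so lookups fall through to the built-in.
exec("def current_str():\n    return str\n", untouched.__dict__)

print(patched.current_str() is Text)   # the module-level name shadows the built-in
print(untouched.current_str() is str)  # the built-in is unchanged elsewhere
```

Any code living in a module without the assignment, including everything in the standard library, keeps using the original type.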

Plus which in CPython the C parts of the code may well be creating and
expecting objects of type str but they won't use the Python naming
mechanism at all, so you will have no way to effect changes in those
behaviors.

This will probably account for about 95% of any strangeness you see, but
it's probably a good first step in the conversion process.

regards
Steve

John Machin · Apr 14, 2007

Ben said:
I'm left with some legacy code using plain old str, and I need to make
sure it works with unicode input/output. I have a simple plan to do
this:

- Run the code with "python -U" so all the string literals become
unicode literals.

Requiring that the code is always run with a non-default argument
doesn't seem very robust/portable to me.

Ben said:
- Add this statement

str = unicode

to all .py files so the type comparison (e.g., type('123') == str)
would work.

IMVHO (1) doing that merely changes "legacy code" to "kludged legacy
code" (2) there is no substitute for reading the code and trying to
nut out what it is doing.

Do you mean that those two things are the ONLY changes you plan to
make?

Ben said:
Did I miss anything? Does this sound like a workable plan?

Do you need to make sure it still works with ASCII input? With input
in some other encoding e.g. cp1252?

What do you mean by "unicode input"? Bear in mind that if you want to
work with Python unicode objects internally, input from a file /
socket / whatever will need to be decoded i.e. you will have to read
the code and make appropriate changes. Data stored in (say) utf_16_le
encoding is not "unicode" in the sense that you need; it still has to
be decoded.
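
A small sketch of what decoding at the input boundary might look like. The byte strings are made-up sample data, not taken from the legacy code, and the `u''` and `b''` prefixes are used only so the snippet behaves the same on newer interpreters.

```python
# Bytes as they might arrive from a file or socket (sample data, assumed).
raw_utf16 = b"\xe9\x00t\x00\xe9\x00"  # three characters stored as utf_16_le
raw_cp1252 = b"\x80 5"                # euro sign followed by " 5" in cp1252

# Decoding turns encoded bytes into unicode text; the encoding must be known.
text_utf16 = raw_utf16.decode("utf_16_le")
text_cp1252 = raw_cp1252.decode("cp1252")

print(len(raw_utf16), len(text_utf16))  # six bytes, three characters
print(repr(text_cp1252))
```

The point is that the encoding has to be chosen per data source; neither "python -U" nor rebinding str decides that for you.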

What do you mean by "unicode output"? You are going to need to encode
your output.
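
One way the output side could be handled, as a sketch only: the sample text and the choice of UTF-8 are assumptions, and `io.open` is just one of several ways to get an encoding-aware file object.

```python
import io
import os
import tempfile

text = u"abcde\xff"  # sample text with one non-ASCII character (assumed)

# Relying on the ascii codec fails as soon as a non-ASCII character appears.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Naming the encoding explicitly when opening the file avoids the guesswork.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with io.open(path, "rb") as handle:
        written = handle.read()
finally:
    os.remove(path)

print(ascii_ok)
print(repr(written))
```

Reading the file back in binary mode shows the encoded bytes that actually reached the disk.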

This doesn't work; the output is not "unicode" in any meaningful
sense:

### Warning: you need to hope that all builtins etc that you are
### calling cope with unicode arguments as well as the above one does.
'abcde\r\n'

This doesn't work; it crashes.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 5: ordinal not in range(128)

Some object methods work differently with unicode; e.g. (1)
str.translate and unicode.translate.

(2)

>>> 'abc\xA0def'.split()
['abc\xa0def']
>>> u'abc\xA0def'.split()
[u'abc', u'def']
NameError: name 'isspace' is not defined
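
For point (1), the difference between the two translate methods can be sketched roughly as follows: the 8-bit version takes a 256-entry table, while the unicode version takes a mapping keyed by code point. The sample word and the replacement are made up for illustration.

```python
# 8-bit strings: translate() expects a table with exactly 256 entries.
table = bytearray(range(256))  # identity table
table[ord("a")] = ord("A")     # map "a" to "A"
byte_result = b"banana".translate(bytes(table))

# unicode strings: translate() expects a mapping from ordinals to replacements.
text_result = u"banana".translate({ord(u"a"): u"A"})

print(repr(byte_result))
print(repr(text_result))
```

Legacy code that builds a 256-character table for translate will therefore need changing once its strings become unicode.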
HTH,
John
