string.encode on HP-UX


Richard Townsend

Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8


[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

Similar code is used by wxGlade and this exception prevents it from
running.

Does anybody know how to fix this on HP-UX?
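
One defensive way to handle a codeset name that Python may not recognise is to check it before use. The following is a minimal sketch, written for a current Python 3 rather than the 2.3 used in this thread; the helper names `usable_codec_name` and `preferred_codeset` and the `ascii` fallback are assumptions made for illustration, not anything wxGlade is known to do.

```python
import codecs
import locale


def usable_codec_name(name, fallback="ascii"):
    """Return name if Python has a codec for it, otherwise fallback."""
    try:
        codecs.lookup(name)
    except LookupError:
        return fallback
    return name


def preferred_codeset(fallback="ascii"):
    """Ask the OS for the locale's codeset and check that Python knows it."""
    # nl_langinfo is not available on every platform, hence the guard.
    if not hasattr(locale, "nl_langinfo"):
        return fallback
    return usable_codec_name(locale.nl_langinfo(locale.CODESET), fallback)
```

Falling back silently hides the mismatch between the OS and Python, so a real application might prefer to log the unknown name as well.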
 

Michael Hudson

Richard Townsend said:
Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8


[The same code on SUSE 9.1 doesn't raise an exception].

Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

What is roman8? If it's some hp-ux specific thingy, I guess the
solution is to teach Python what to do with it. If it's just HP's
name for iso-8859-1 or something then this is easy (mucking with
encodings.aliases).

If it's some custom encoding, finding out what unicode codepoint each
octet maps to and writing a codec a la macroman can't be impossibly
hard.
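
As a rough illustration of what such a table-driven codec involves, here is a minimal sketch for a current Python 3. The codec name `demo8` is hypothetical, and the two high-byte entries are placeholders chosen only for the example; they are not claimed to be the real roman8 assignments, which would have to come from HP's documentation.

```python
import codecs

# Decoding table: index = byte value, entry = Unicode character.
# Bytes 0x00-0x7F map to ASCII. Bytes 0x80-0xFF are marked undefined
# (U+FFFE) except for two PLACEHOLDER entries used for illustration.
_table = [chr(i) for i in range(128)] + ["\ufffe"] * 128
_table[0xA1] = "\u00c0"  # placeholder mapping, not verified against roman8
_table[0xA2] = "\u00c2"  # placeholder mapping, not verified against roman8
DECODING_TABLE = "".join(_table)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def demo8_encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def demo8_decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def demo8_search(name):
    """Search function: return a CodecInfo for 'demo8', else None."""
    if name == "demo8":
        return codecs.CodecInfo(demo8_encode, demo8_decode, name="demo8")
    return None


codecs.register(demo8_search)
```

With this registered, `"A\u00c0".encode("demo8")` goes through the table, and bytes left undefined raise an error under the default `strict` handler.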

I guess a patch would be welcome either way.

Cheers,
mwh
 

Christopher T King

Richard Townsend said:
Using Python-2.3.4 on HP-UX11i, the following code:

import locale
loc = locale.setlocale(locale.LC_ALL)
print 'locale =', loc
loc = locale.nl_langinfo(locale.CODESET)
print 'locale =', loc
print 'hello'.encode(loc, 'replace')

produces:

locale = C C C C C C
locale = roman8
Traceback (most recent call last):
  File "test_locale.py", line 13, in ?
    print 'hello'.encode(loc, 'replace')
LookupError: unknown encoding: roman8

[The same code on SUSE 9.1 doesn't raise an exception].

My guess is roman8 is HP-UX's version of latin_1. Setting an alias fixes
this:
>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')
'hello'

You can add those first two lines to a sitecustomize.py file, located
somewhere in your Python path (generally ~/site-packages/ or
/usr/local/lib/python2.X/ should work).
 

Richard Townsend

Hi Christopher,

Thanks for your suggestion, however it produces two problems for me.

1. If I execute the code in the interpreter, it still fails like this:
>>> import encodings
>>> encodings.aliases.aliases['roman8']='latin_1'
>>> 'hello'.encode('roman8')

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in -toplevel-
    'hello'.encode('roman8')
LookupError: unknown encoding: roman8


2. If I put the code in site-packages/sitecustomize.py, it fails like
this:

capulet:home/richardt > python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /opt/python/lib/python2.3/site.pyc matches /opt/python/lib/python2.3/site.py
import site # precompiled from /opt/python/lib/python2.3/site.pyc
# /opt/python/lib/python2.3/os.pyc matches /opt/python/lib/python2.3/os.py
import os # precompiled from /opt/python/lib/python2.3/os.pyc
import posix # builtin
# /opt/python/lib/python2.3/posixpath.pyc matches /opt/python/lib/python2.3/posixpath.py
import posixpath # precompiled from /opt/python/lib/python2.3/posixpath.pyc
# /opt/python/lib/python2.3/stat.pyc matches /opt/python/lib/python2.3/stat.py
import stat # precompiled from /opt/python/lib/python2.3/stat.pyc
# /opt/python/lib/python2.3/UserDict.pyc matches /opt/python/lib/python2.3/UserDict.py
import UserDict # precompiled from /opt/python/lib/python2.3/UserDict.pyc
# /opt/python/lib/python2.3/copy_reg.pyc matches /opt/python/lib/python2.3/copy_reg.py
import copy_reg # precompiled from /opt/python/lib/python2.3/copy_reg.pyc
# /opt/python/lib/python2.3/types.pyc matches /opt/python/lib/python2.3/types.py
import types # precompiled from /opt/python/lib/python2.3/types.pyc
# /opt/python/lib/python2.3/site-packages/sitecustomize.pyc matches /opt/python/lib/python2.3/site-packages/sitecustomize.py
import sitecustomize # precompiled from /opt/python/lib/python2.3/site-packages/sitecustomize.pyc
import encodings # directory /opt/python/lib/python2.3/encodings
# /opt/python/lib/python2.3/encodings/__init__.pyc matches /opt/python/lib/python2.3/encodings/__init__.py
import encodings # precompiled from /opt/python/lib/python2.3/encodings/__init__.pyc
# /opt/python/lib/python2.3/codecs.pyc matches /opt/python/lib/python2.3/codecs.py
import codecs # precompiled from /opt/python/lib/python2.3/codecs.pyc
import _codecs # builtin
'import site' failed; traceback:
Traceback (most recent call last):
  File "/opt/python/lib/python2.3/site.py", line 355, in ?
    import sitecustomize
  File "/opt/python/lib/python2.3/site-packages/sitecustomize.py", line 7, in ?
    encodings.aliases.aliases['roman8']='latin_1'
AttributeError: 'module' object has no attribute 'aliases'
# /opt/python/lib/python2.3/warnings.pyc matches /opt/python/lib/python2.3/warnings.py
import warnings # precompiled from /opt/python/lib/python2.3/warnings.pyc
# /opt/python/lib/python2.3/linecache.pyc matches /opt/python/lib/python2.3/linecache.py
import linecache # precompiled from /opt/python/lib/python2.3/linecache.pyc
# /opt/python/lib/python2.3/encodings/aliases.pyc matches /opt/python/lib/python2.3/encodings/aliases.py
import encodings.aliases # precompiled from /opt/python/lib/python2.3/encodings/aliases.pyc
Python 2.3.4 (#3, May 28 2004, 13:24:19) [C] on hp-ux11
Type "help", "copyright", "credits" or "license" for more information.
 

Richard Townsend

Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.

But if I then run the same two lines interactively, I get:
['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'aliases', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

then the 'aliases' attribute is there.
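
The difference between the two listings suggests that `import encodings` on its own does not guarantee the `aliases` submodule has been loaded yet. A minimal sketch of the usual remedy, importing the submodule explicitly, is below. It is written for a current Python 3, and `vendor8_alias` is a hypothetical codeset name used so the example does not depend on what any particular Python version already knows about roman8.

```python
# Import the submodule itself; "import encodings" alone does not promise
# that the "aliases" attribute has been bound yet.
import encodings.aliases

# "vendor8_alias" is a hypothetical name invented for this sketch.
# Alias keys are expected in normalised form (lowercase, underscores).
encodings.aliases.aliases["vendor8_alias"] = "latin_1"

encoded = "hello".encode("vendor8_alias")
```

The alias has to be in place before the name is first looked up. The encodings package may remember a failed lookup, which could explain the failure at the IDLE prompt if roman8 had already been tried in that session; that is a guess, not something the thread confirms.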
 

Christopher T King

Richard Townsend said:
Further, if I put the following in sitecustomize.py:

import encodings
print dir(encodings)

I get:

['CodecRegistryError', '__builtins__', '__doc__', '__file__',
'__name__', '__path__', '_cache', '_import_tail',
'_norm_encoding_map', '_unknown', 'codecs', 'exceptions',
'normalize_encoding', 'search_function', 'types']

Notice there is no 'aliases' attribute.

Oops, I had only tested it at the prompt :p I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

import codecs

def roman8(n):
    # Search function: resolve the name 'roman8' to the ascii codec.
    if n=='roman8':
        return codecs.lookup('ascii')

codecs.register(roman8)

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.
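
The same search-function approach extends to more than one name. The sketch below, for a current Python 3, keeps the OS-reported names in a dictionary; the table `OS_NAME_TO_CODEC`, the function name and the codeset name `vendor8` are all hypothetical and stand in for whatever names a given platform actually reports.

```python
import codecs

# Hypothetical table of OS-reported codeset names and the Python codec
# each should resolve to. "vendor8" is invented for this sketch.
OS_NAME_TO_CODEC = {
    "vendor8": "ascii",
}


def os_codeset_search(name):
    """Search function for codecs.register(): a CodecInfo or None."""
    target = OS_NAME_TO_CODEC.get(name)
    if target is None:
        return None  # let other registered search functions try
    return codecs.lookup(target)


codecs.register(os_codeset_search)
```

Returning `None` for names the function does not handle matters, because the registry then carries on to the next search function instead of stopping.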
 

Richard Brodie

Christopher T King said:
This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.

I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....
 

Richard Townsend

Christopher T King said:
Oops, I had only tested it at the prompt :p I had assumed sitecustomize.py
was run after everything was set up.

This code uses a more defined interface for altering the codecs registry,
uses 'ascii' instead of 'latin_1' (to prevent some confusion), and
I've actually tested it in sitecustomize.py:

import codecs

def roman8(n):
    # Search function: resolve the name 'roman8' to the ascii codec.
    if n=='roman8':
        return codecs.lookup('ascii')

codecs.register(roman8)

This achieves the same effect as the aliases trick (which I'm surprised
didn't work for you at the prompt), but is less tricksy and should
therefore work better.

Hi Christopher,

Thanks for your new suggestion. I have tested it on HP-UX and it doesn't
raise the exception anymore.

regards,
Richard
 

Richard Townsend

Richard Brodie said:
I've put what should be a proper codec as a patch on SF. Really needs
testing on HP-UX though....

Hi Richard,

I copied your hp_roman.py file to ../lib/python2.3/encodings and added
the line

'roman8' : 'hp_roman'

to aliases.py and string.encode('roman8') now runs without raising an
exception.

I called string.printable.encode('roman8') and the returned string
matches string.printable.

Are there any other tests you want me to do with this on HP-UX?
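
Since string.printable contains only ASCII characters, that check does not touch the upper half of the table, where a codec like this would differ from ASCII. One possible further test is a round trip over all 256 byte values. The sketch below is for a current Python 3, and the helper name `round_trip_failures` is an assumption made for illustration.

```python
def round_trip_failures(codec_name):
    """Return the byte values that do not survive decode followed by encode."""
    failures = []
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(codec_name).encode(codec_name) != raw:
                failures.append(value)
        except UnicodeError:
            failures.append(value)  # byte is undefined in this codec
    return failures
```

Calling it with the name of the new codec on HP-UX would list any bytes that are unmapped or mapped inconsistently; an empty list means every byte round-trips.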
 

Martin v. Löwis

Richard said:
Should I be able to pass the value returned by nl_langinfo() to the
string.encode call?

I believe all of "yes", "no", and "perhaps not" are valid answers. Yes,
it is intentional that the strings returned by nl_langinfo are
understood as codec names. However, the string is returned from the OS,
and the codec is provided by Python, so it is perhaps not accepted.

But no, you should never ever invoke string.encode with a character
encoding. Instead, you should use string.decode to use encodings in
a meaningful way. It is an unfortunate "feature" that string.encode
is available and does "something".
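
The direction of each operation can be sketched as follows, using a current Python 3, where byte strings and text are separate types and the distinction is enforced. The sample bytes and the choice of latin_1 and utf-8 are assumptions made for the example only.

```python
# Bytes as they might arrive from a latin_1 source.
raw = b"caf\xe9"

# decode: bytes in a known character encoding -> text
text = raw.decode("latin_1")

# encode: text -> bytes in a chosen character encoding
utf8 = text.encode("utf-8")
```

In the Python 2 of this thread, a plain string already held bytes, so decode was the step that gave a character encoding something meaningful to do.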

Regards,
Martin
 
