Faulty encoding settings

Neil Cerutti

How do I cope with faulty encoding settings?

I'm writing an application that needs all internal character data
to be stored in iso-8859-1. It also must allow input and output
using stdin and stdout.

This works just fine with the Windows binary of Python.
sys.stdin.encoding is correctly set to the encoding of the
current terminal ('cp437').

import sys

s = sys.stdin.readline()
# Decode from the console's encoding, then re-encode as iso-8859-1.
s = s.decode(sys.stdin.encoding).encode('iso-8859-1')

Granted, users are constrained to entering characters in the
cp437 charset, but that's better than the following.
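For comparison, here is a minimal Python 3 sketch of the same decode/encode step working on raw bytes (the thread itself concerns Python 2.4, where `str.decode` played this role). It assumes the input arrives as cp437 bytes, and the helper name `cp437_to_latin1` is hypothetical. It shows the constraint mentioned above: only characters present in both cp437 and ISO-8859-1 survive the conversion.

```python
def cp437_to_latin1(raw: bytes) -> bytes:
    """Re-encode bytes typed at a cp437 console as ISO-8859-1.

    Raises UnicodeEncodeError for characters that exist in cp437
    but have no ISO-8859-1 equivalent.
    """
    text = raw.decode("cp437")        # console bytes -> str
    return text.encode("iso-8859-1")  # str -> internal ISO-8859-1 bytes


# 0x82 is e-acute in cp437; ISO-8859-1 stores the same character as 0xE9.
print(cp437_to_latin1(b"caf\x82"))    # b'caf\xe9'

# 0xC4 is a box-drawing character (U+2500) with no Latin-1 equivalent.
try:
    cp437_to_latin1(b"\xc4")
except UnicodeEncodeError as exc:
    print("not representable:", ascii(exc.object))
```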

The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
as I can tell, iso-8859-1. This renders the above construction
useless if the user enters any character code above 127.
Using raw_input instead of readline addresses the problem by making
it impossible to enter non-ASCII text.
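One defensive pattern, sketched here in Python 3 under the assumption that you have the raw input bytes, is to treat the reported encoding as a hint and fall back to ISO-8859-1 when it fails. The function name `decode_input` is hypothetical.

```python
from typing import Optional


def decode_input(raw: bytes, reported: Optional[str],
                 fallback: str = "iso-8859-1") -> str:
    """Decode terminal input, distrusting an unhelpful reported encoding.

    Tries the reported encoding first; if it is missing, unknown, or
    cannot decode the bytes, falls back to ISO-8859-1, which maps every
    byte 0x00-0xFF to a character and therefore never fails.
    """
    if reported:
        try:
            return raw.decode(reported)
        except (UnicodeDecodeError, LookupError):
            pass  # e.g. 'US-ASCII' rejects any byte >= 0x80
    return raw.decode(fallback)


# e-acute typed in a Latin-1 terminal arrives as the single byte 0xE9.
data = b"caf\xe9"
try:
    data.decode("US-ASCII")
except UnicodeDecodeError as exc:
    print("US-ASCII fails at byte", exc.start)   # US-ASCII fails at byte 3
print(ascii(decode_input(data, "US-ASCII")))      # 'caf\xe9'
```

The fallback can never raise, because ISO-8859-1 assigns a character to every byte value; the flip side is that it silently mis-decodes input that was really in some other encoding, such as UTF-8.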

Please advise.

This is only a temporary problem, as eventually this application
will use Tkinter as an interface instead. But of course then I'll
probably have a bunch of new problems. ;)
 
Marc 'BlackJack' Rintsch

Give the user the ability to specify an encoding explicitly. Relying on the
encoding attribute of files is quite fragile. For example, if you redirect
stdin or stdout, the encoding is set to None, because the interpreter
can't tell what encoding the "other side" of the redirection produces or
expects.
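A minimal Python 3 sketch of that advice, assuming command-line options are an acceptable way to let the user choose. The option names `--input-encoding`/`--output-encoding` and the helper `choose_encoding` are hypothetical, and a `SimpleNamespace` with `encoding=None` stands in for a redirected Python 2 stream. (In Python 3 a redirected stream reports the locale's encoding instead of None, which is no more trustworthy.)

```python
import argparse
import codecs
from types import SimpleNamespace
from typing import Optional, Sequence


def choose_encoding(explicit: Optional[str], stream: object,
                    default: str = "iso-8859-1") -> str:
    """Return the user's choice, else the stream's encoding, else a default.

    The stream's ``encoding`` attribute is only a hint: Python 2 set it to
    None for redirected stdin/stdout, and some file-like objects lack it.
    """
    for candidate in (explicit, getattr(stream, "encoding", None), default):
        if candidate:
            codecs.lookup(candidate)  # raises LookupError for an unknown name
            return candidate
    raise ValueError("no encoding available")  # only if default is empty


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Explicit I/O encodings")
    parser.add_argument("--input-encoding",
                        help="encoding of the bytes read from stdin")
    parser.add_argument("--output-encoding",
                        help="encoding of the bytes written to stdout")
    return parser.parse_args(argv)


# Simulate a redirected stream whose encoding is unknown.
redirected = SimpleNamespace(encoding=None)
args = parse_args(["--input-encoding", "cp437"])
print(choose_encoding(args.input_encoding, redirected))   # cp437
print(choose_encoding(args.output_encoding, redirected))  # iso-8859-1
```

In a real program you would pass `sys.stdin` and `sys.stdout` instead of the simulated stream. Validating the name with `codecs.lookup` up front turns a typo in the option into an immediate error rather than a failure on the first non-ASCII character.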

BTW, US-ASCII isn't wrong, just limiting: everything in the ASCII
range is the same in ISO-8859-1.
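A quick Python 3 check of that claim: ISO-8859-1 is a superset of US-ASCII, so the two agree on bytes 0x00-0x7F, and US-ASCII only differs by refusing everything above that range.

```python
# Bytes 0x00-0x7F decode identically under US-ASCII and ISO-8859-1.
ascii_range = bytes(range(0x80))
assert ascii_range.decode("ascii") == ascii_range.decode("iso-8859-1")

# ISO-8859-1 also maps 0x80-0xFF directly to U+0080-U+00FF,
# so every code point equals its byte value.
latin1_text = bytes(range(256)).decode("iso-8859-1")
assert [ord(ch) for ch in latin1_text] == list(range(256))

# US-ASCII simply stops at 0x7F: higher bytes are rejected, not misread.
try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError as exc:
    print("US-ASCII rejects byte", hex(exc.object[exc.start]))  # US-ASCII rejects byte 0xe9
```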

Ciao,
Marc 'BlackJack' Rintsch
 
Martin v. Löwis

Neil said:
The Cygwin binary I have (2.4.3) reports sys.stdin.encoding as
'US-ASCII', which is quite wrong. A Cygwin terminal uses, as far
as I can tell, iso-8859-1. This renders the above construction
useless if the user enters any character code above 127.
Using raw_input instead of readline addresses the problem by making
it impossible to enter non-ASCII text.

Please advise.

In principle, setting the LANG environment variable should help.
Unfortunately, Cygwin doesn't implement locales correctly (neither
in the Unix way, nor in the Windows way), hence Python's machinery
fails.
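To see what Python has to work with when it guesses, a small Python 3 diagnostic like the following can help. It assumes a POSIX-like environment such as Cygwin; the exact precedence of these settings varies between Python versions (for example, `PYTHONIOENCODING` overrides the locale for the standard streams). The helper name `encoding_report` is hypothetical. Running it with and without `LANG` set shows whether the locale setting has any effect.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect the settings Python consults when guessing a terminal encoding."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
        # getattr guards against sys.stdin/sys.stdout being None.
        "sys.stdin.encoding": getattr(sys.stdin, "encoding", None),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print(f"{key:28} {value!r}")
```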

If you believe that a Cygwin terminal always uses Latin-1 (try
entering €, though; it could be windows-1252 instead), you should
be able to hard-code that after determining that you are running a Cygwin
Python, or that you are running in a Cygwin terminal.
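A sketch of the hard-coding suggestion in Python 3. It assumes that `sys.platform` starting with `'cygwin'` identifies a Cygwin build of Python; detecting the terminal itself is not attempted. The helpers `terminal_encoding` and `looks_like_cp1252` are hypothetical, and the second is only a heuristic for the windows-1252 question: the euro sign (U+20AC) is byte 0x80 in windows-1252, a byte that ISO-8859-1 reserves for a C1 control code.

```python
import sys
from typing import Optional


def terminal_encoding(reported: Optional[str],
                      platform: str = sys.platform) -> str:
    """Return the encoding to use for console I/O.

    On a Cygwin build of Python the reported value is ignored and
    Latin-1 is hard-coded; elsewhere the reported value is trusted,
    with Latin-1 as a last resort.
    """
    if platform.startswith("cygwin"):
        return "iso-8859-1"
    return reported or "iso-8859-1"


def looks_like_cp1252(raw: bytes) -> bool:
    """Heuristic: bytes 0x80-0x9F are control codes in ISO-8859-1 but
    printable characters (e.g. the euro sign at 0x80) in windows-1252."""
    return any(0x80 <= b <= 0x9F for b in raw)


print("\u20ac".encode("cp1252"))                         # b'\x80'
print(looks_like_cp1252(b"price: \x80 5"))               # True
print(terminal_encoding("US-ASCII", platform="cygwin"))  # iso-8859-1
print(terminal_encoding("cp437", platform="win32"))      # cp437
```

If a user types the euro sign and the application receives 0x80, the terminal is almost certainly using windows-1252 rather than Latin-1.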

Regards,
Martin
 
Neil Cerutti

Marc said:
Give the user the ability to specify an encoding explicitly.
Relying on the encoding attribute of files is quite fragile. For
example, if you redirect stdin or stdout, the encoding is set to
None, because the interpreter can't tell what encoding the
"other side" of the redirection produces or expects.

Thanks for that sensible idea.

On the other hand, if Python's implementors couldn't figure out
what the encoding is, I doubt the average user has a prayer. ;-)
 
