How to print a unicode string?

damonwischik

I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.

From what I've googled, I think I need to set my locale. I don't
understand how.

import locale
print locale.getlocale()
--> (None,None)
print locale.getdefaultlocale()
--> ('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--> en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
--> locale.Error: unsupported locale setting

I'd be grateful for advice.
Damon.
 
damonwischik

Just because the locale library knows the normalised name for it
doesn't mean it's available on your OS. Have you confirmed that your
OS (independent of Python) supports the locale you're trying to set?

No. How do I find out which locales my OS supports? (I'm running
Windows XP.) Why does it matter what locales my OS supports, when all
I want is to set the encoding to be used for the output, and the output
is all going to Emacs, and I know that Emacs supports utf8?
(Emacs 22.2.1 i386-mingw-nt5.1.2600.)

Damon.
 
Martin v. Löwis

From what I've googled, I think I need to set my locale.

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.
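As a minimal sketch of the file case, assuming a throwaway temporary file (the path and the sample text are illustrative, not from the thread), codecs.open encodes unicode text as it is written. On current Python versions the built-in open with an encoding argument is the usual replacement.

```python
import codecs
import os
import tempfile

text = u"La Pe\xf1a\n"

# Hypothetical scratch file; any writable path would do.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# codecs.open encodes the unicode text to UTF-8 as it is written.
with codecs.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

# Read the file back as raw bytes to see what actually landed on disk.
with open(path, "rb") as handle:
    raw_bytes = handle.read()

os.remove(path)
```

The n-tilde (U+00F1) ends up as the two bytes C3 B1 in the file.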

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
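To see what that wrapper does without touching the real terminal, here is a small sketch that wraps an in-memory byte stream instead of sys.stdout. The io.BytesIO stand-in and the sample text are assumptions for illustration only.

```python
import codecs
import io

# Stand-in for the real output stream, so the bytes can be inspected.
raw = io.BytesIO()

# codecs.getwriter("UTF-8") returns a StreamWriter class; calling it
# wraps a byte stream so that unicode text is encoded on write.
writer = codecs.getwriter("UTF-8")(raw)
writer.write(u"La Pe\xf1a\n")

written = raw.getvalue()
```

The wrapped stream receives UTF-8 bytes regardless of what encoding the terminal claims to use, which is why the result only looks right if the terminal really does decode UTF-8.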

HTH,
Martin
 
7stud

I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.

From what I've googled, I think I need to set my locale. I don't
understand how.

import locale
print locale.getlocale()
--> (None,None)
print locale.getdefaultlocale()
--> ('en_GB','cp1252')
print locale.normalize('en_GB.utf-8')
--> en_GB.UTF8
locale.setlocale(locale.LC_ALL,'en_GB.UTF8')
-->  locale.Error: unsupported locale setting

I'd be grateful for advice.
Damon.

u_str = u'hell\u00F6 w\u00F6rld' #o's with umlauts

print u_str.encode('utf-8')

--output:--
hellö wörld
 
7stud

u_str = u'hell\u00F6 w\u00F6rld'  #o's with umlauts

print u_str.encode('utf-8')

--output:--
hellö wörld

Or maybe you want this:

u_str = u'hell\u00F6 w\u00F6rld'
regular_str = u_str.encode('utf-8')
print repr(regular_str)

--output:--
'hell\_x_c3\_x_b6 w\_x_c3\_x_b6rld'
#underscores added to keep your browser from rendering the utf-8
characters
 
damonwischik

u_str = u'hell\u00F6 w\u00F6rld' #o's with umlauts
print u_str.encode('utf-8')

--output:--
hellö wörld

Maybe on your system. On my system, those same commands produce
hell\303\266 w\303\266rld

Those \303\266 symbols are single characters -- when I move around
with cursor keys, the cursor jumps across them with a single
key-press.
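The \303\266 pairs are the UTF-8 bytes of each o-umlaut written as octal escapes, which suggests the bytes arrive correctly but the buffer is not decoding them as UTF-8. A short sketch that reproduces the same display from the encoded string (the helper expression is illustrative, not something Emacs or Python provides):

```python
u_str = u"hell\u00f6 w\u00f6rld"
encoded = u_str.encode("utf-8")

# Render every non-ASCII byte as a backslash-octal escape, leaving
# plain ASCII bytes as ordinary characters.
octal_view = "".join(
    "\\%o" % byte if byte > 127 else chr(byte)
    for byte in bytearray(encoded)
)
```

Here 0xC3 is octal 303 and 0xB6 is octal 266, matching the output reported above.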

As I wrote, I'm running Python inside Emacs 22.2.1 (using
python-mode).

Damon.
 
damonwischik

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)

Thank you for the suggestion. As I said, I am running Python through
Emacs 22.2.1, so I doubt it is a cmd.exe-style console, and it most
certainly is not IDLE. I want to output to the Emacs buffer, via the
python-mode plugin for Emacs, not to a file.

I tried your suggestion of setting sys.stdout, and it works perfectly.
As I said, the output is going to Emacs, and Emacs _does_ know how to
render UTF-8.

How can I make this a global setting? Is it possible to change an
environment variable, so that Python uses this coding automatically?
Or pass a command-line argument when Emacs python-mode invokes the
Python interpreter? Or execute this line of Python in a startup script
which is invoked whenever a new Python session is started?

Thank you again for your help,
Damon.
 
damonwischik

Because the Python 'locale' module is all about using the OS's
(actually, the underlying C library's) locale support.

The locale you request with 'locale.setlocale' needs to be supported
by the locale database, which is independent of any specific
application, be it Python, Emacs, or otherwise.
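One way to check this from Python is simply to try the locale and see whether the C library accepts it. The following is a rough sketch; the function name locale_is_supported is hypothetical, and the answer depends entirely on the operating system and its C library.

```python
import locale


def locale_is_supported(name):
    """Return True if the C library accepts `name` for LC_ALL."""
    # Calling setlocale with no second argument only queries the
    # current setting, so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_ALL)
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        locale.setlocale(locale.LC_ALL, saved)


c_locale_ok = locale_is_supported("C")
gb_utf8_ok = locale_is_supported("en_GB.UTF8")
```

On the Windows XP system described in this thread, the second call would be expected to return False, since that is the name setlocale rejected.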

Let me try to ask a better question. It seems that the logical choice
of locale (en_GB.utf8) is not supported by my operating system.
Nonetheless, I want Python to output in utf-8, because I know for
certain that the terminal I am using (Emacs 22.2.1 with python-mode)
will display utf-8 correctly. It therefore seems that I cannot use the
locale mechanism to indicate to Python the encoding I want for
sys.stdout. What other mechanisms are there for me to indicate what I
want to Python?

Another poster pointed me to
sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
and this works great. All I want now is some reassurance that this is
the most appropriate way for me to achieve what I want (e.g. least
likely to break with future versions of Python, most in keeping with
the design of Python, easiest for me to maintain, etc.).

Damon.
 
damonwischik

I'd like to print out a unicode string.

I'm running Python inside Emacs, which understands utf-8, so I want to
force Python to send utf-8 to sys.stdout.

Thank you to everyone who sent suggestions. Here is my solution (for
making Python output utf-8, and persuading Emacs 22.2.1 with
python-mode to print it).

1. Set the registry key HKEY_CURRENT_USER\Software\GNU\Emacs\Home to
have value "d:\documents\home". This makes Emacs look for a .emacs
file in this directory (the home directory).

2. Put a file called .emacs in the home directory. It should
include these lines:
(setenv "PYTHONPATH" "d:/documents/home")
(prefer-coding-system 'utf-8)
The first line means that python will look in my home directory for
libraries etc. The second line tells Emacs to default to utf-8 for its
buffers. Without the second line, Emacs may default to a different
coding, and it will not know what to do when it receives utf-8.

3. Put a file called sitecustomize.py in the home directory. This file
should contain these lines:
import codecs
import sys
sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)

4. Now it should all work. If I enter
print u'La Pe\xf1a'
then it comes out with an n-tilde.

NB. An alternative solution is to edit site.py in the Python install
directory, and replace the line
encoding = "ascii" # Default value set by _PyUnicode_Init()
with
encoding = 'utf8'
But the trouble with this is that it will be overwritten if I install
a new version of Python.


NB. I also have these lines in my .emacs file, to load python-mode,
and to make it so that ctrl+enter executes the current paragraph:
; Python file association
(load "c:/program files/emacs-plugins/python-mode-1.0/python-mode.el")
(setq auto-mode-alist
      (cons '("\\.py$" . python-mode) auto-mode-alist))
(setq interpreter-mode-alist
      (cons '("python" . python-mode)
            interpreter-mode-alist))
(autoload 'python-mode "python-mode" "Python editing mode." t)
; Note: the command for invoking Python is specified at the end,
; as a custom variable.
;; DJW's command to select the current paragraph, then execute-region.
(defun py-execute-paragraph (vis)
  "Send the current paragraph to Python
Don't know what vis does."
  (interactive "P")
  (save-excursion
    (forward-paragraph)
    (let ((end (point)))
      (backward-paragraph)
      (py-execute-region (point) end ))))
(setq py-shell-switch-buffers-on-execute nil)
(global-set-key [(ctrl return)] 'py-execute-paragraph)

(custom-set-variables
 ;; custom-set-variables was added by Custom -- don't edit or cut/paste it!
 ;; Your init file should contain only one such instance.
 '(py-python-command "c:/program files/Python25/python.exe"))


Damon.
 
Benjamin

Not on this operating system. On Windows, you need to change
your console. If it is a cmd.exe-style console, use chcp.
For IDLE, changing the output encoding is not supported.

If you want to output into a file, use codecs.open.

If you absolutely want to output UTF-8 to the terminal even
though the terminal will not be able to render it correctly,
use

sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)

HTH,
Martin

And in Py3k?
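On the Python 3 question: there sys.stdout is a text stream layered over a byte stream (sys.stdout.buffer), so the usual approach is to re-wrap the byte layer with io.TextIOWrapper, or on Python 3.7 and later to call sys.stdout.reconfigure(encoding="utf-8"). The sketch below uses an in-memory io.BytesIO as a stand-in for sys.stdout.buffer so the result can be inspected; that substitution is an assumption for illustration.

```python
import io

# Stand-in for sys.stdout.buffer, the byte layer under sys.stdout.
raw = io.BytesIO()

# newline="\n" switches off platform newline translation so the
# bytes are the same on every operating system.
text_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
text_out.write(u"La Pe\xf1a\n")
text_out.flush()

py3_bytes = raw.getvalue()
```

In a real session the equivalent would be rebinding sys.stdout to a wrapper around sys.stdout.buffer.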
 
Martin v. Löwis

Is it possible to change an
environment variable, so that Python uses this coding automatically?
No.

Or pass a command-line argument when Emacs python-mode invokes the
Python interpreter?
No.

Or execute this line of Python in a startup script
which is invoked whenever a new Python session is started?

Yes, you can add the code I suggested to sitecustomize.py.
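The "No" on environment variables reflects the Python versions available when this was written. Python 2.6 and all 3.x releases added the PYTHONIOENCODING environment variable, which sets the encoding of the standard streams at interpreter startup. A small sketch that checks this by launching a child interpreter (the helper name child_stdout_encoding is hypothetical):

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set and report
    the encoding it chose for sys.stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


reported = child_stdout_encoding("utf-8")
```

With Emacs, the same variable could be set from the init file with setenv, in the same way PYTHONPATH is set elsewhere in this thread.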

Regards,
Martin
 
M.-A. Lemburg

Another poster pointed me to
sys.stdout = codecs.getwriter("UTF-8")(sys.stdout)
and this works great. All I want now is some reassurance that this is
the most appropriate way for me to achieve what I want (e.g. least
likely to break with future versions of Python, most in keeping with
the design of Python, easiest for me to maintain, etc.).

While the above works nicely for Unicode objects you write
to sys.stdout, you are going to have problems with non-ASCII
8-bit strings, e.g. binary data.

Python will have to convert these to Unicode before applying
the UTF-8 codec and uses the default encoding for this, which
is ASCII.
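The implicit step described here can be made explicit. This sketch performs by hand the ASCII decode that Python 2 would attempt on an 8-bit string before handing it to the UTF-8 writer; the sample bytes are illustrative.

```python
# An 8-bit string that already holds UTF-8 encoded text.
data = b"La Pe\xc3\xb1a"

# Decoding it as ASCII fails at the first byte above 127.
try:
    data.decode("ascii")
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start

# Decoding with the encoding the bytes are really in avoids the error.
as_text = data.decode("utf-8")
```

The way around it is to decode such strings explicitly, with their real encoding, before writing them to the wrapped stream.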

You could wrap sys.stdout using a codecs.EncodedFile() which provides
transparent recoding, but then you have problems with Unicode objects,
since the recoder assumes that it has to work with strings on input
(to e.g. the .write() method).
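A rough sketch of that recoding approach, assuming the incoming 8-bit strings are Latin-1 and again using an in-memory stream in place of sys.stdout (both are assumptions for illustration):

```python
import codecs
import io

# Stand-in for the real output stream.
raw = io.BytesIO()

# Bytes written to the recoder are decoded as Latin-1 (the data
# encoding) and re-encoded as UTF-8 (the file encoding).
recoder = codecs.EncodedFile(raw, data_encoding="latin-1",
                             file_encoding="utf-8")
recoder.write(b"La Pe\xf1a")

recoded = raw.getvalue()
```

As noted above, this handles byte strings only; passing Unicode text to the same write method is where the trouble starts.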

There's no ideal solution - it really depends a lot on what
your application does and how it uses strings and Unicode.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Apr 19 2008)