Print formatted Strings with Umlauts

Joerg Lehmann · Feb 11, 2004

I am using Python 2.2.3 (Fedora Core 1). The problem is that strings containing
umlauts do not behave as I would expect. Here is my example:
äöü äöü
123 123

I would expect the displayed width of a and b to be the same: 5 characters.
I also see that len(a) is 6 (2 bytes per umlaut), whereas len(b) is 3:
6 3
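Joerg's original code is not shown, so here is a Python 3 sketch of the same effect (not his Python 2.2 program). It assumes the strings are encoded as UTF-8, as on Fedora Core 1: `%`-formatting a byte string pads by byte count, not by displayed characters.

```python
# Python 3 sketch: str holds text, bytes holds its encoded form.
a = "\u00e4\u00f6\u00fc"  # "äöü"
b = "123"

# In UTF-8 each umlaut needs two bytes; ASCII digits need one.
a_bytes = a.encode("utf-8")
b_bytes = b.encode("utf-8")
print(len(a_bytes), len(b_bytes))  # 6 3

# Byte formatting pads by byte count: a_bytes is already 6 bytes long,
# so "%5s" adds no padding although it displays as only 3 characters.
padded_a = (b"%5s" % a_bytes).decode("utf-8")
padded_b = (b"%5s" % b_bytes).decode("utf-8")
print(repr(padded_a), repr(padded_b))  # 'äöü' '  123'
```

The two fields end up 3 and 5 columns wide, which is the misalignment described above.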

I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte??? What is the right way
to print strings containing umlauts in a tabular way (same field width)?

Thanks!

Amy G · Feb 11, 2004

Upgrading to 2.3 will probably solve this problem. I am using 2.3, and here
is what I get when I try it:

3

äöü äöü
123 123

Jeff Epler · Feb 11, 2004

If you work with Unicode strings instead of byte strings in the utf-8
encoding, you'll get the desired results for characters in the German
character set:
äöü äöü
123 123
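In Python 3, Jeff's Unicode strings correspond to `str` (Python 2's `unicode` type). A minimal sketch of the approach, assuming UTF-8 input: decode the bytes once, then format the decoded text so that field widths count characters. The encoded literal only stands in for real input such as a file or terminal.

```python
# Python 3 sketch: decode bytes to text once, then format the text.
raw = "\u00e4\u00f6\u00fc".encode("utf-8")  # stands in for UTF-8 input bytes
text = raw.decode("utf-8")                   # 3 code points: "äöü"

row_a = "%5s %5s" % (text, text)
row_b = "%5s %5s" % ("123", "123")
print(row_a)
print(row_b)

# Both rows have the same length in characters, so the columns line up.
print(len(row_a), len(row_b))  # 11 11
```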

However, this isn't good enough in general. For instance, in the
presence of Unicode combining characters, you won't get what you want:
äöü äöü
123 123
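One partial remedy for the combining-character case, not mentioned in the thread, is Unicode normalization: NFC merges a base letter and a following combining mark into one precomposed code point when such a code point exists. A Python 3 sketch using the standard `unicodedata` module:

```python
import unicodedata

# "äöü" spelled as base letters followed by U+0308 COMBINING DIAERESIS.
decomposed = "a\u0308o\u0308u\u0308"
print(len(decomposed))  # 6 code points, but only 3 visible columns

# NFC composes each letter/mark pair into a single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed), composed == "\u00e4\u00f6\u00fc")  # 3 True
```

This only helps where a precomposed character exists; other combining sequences stay several code points long, so the general problem Jeff describes remains.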

You'll also run into problems with characters that have "Wide" or
"Ambiguous" East Asian Width properties in Unicode. For example:
ｕｕｕ ｕｕｕ
123 123
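The East Asian Width property Jeff mentions can be queried with `unicodedata.east_asian_width`, which returns one of `"N"`, `"Na"`, `"H"`, `"A"`, `"W"` or `"F"`. A small Python 3 sketch; the sample characters are illustrative choices, not from the thread:

```python
import unicodedata

# ASCII "u", FULLWIDTH LATIN SMALL LETTER U, and a CJK ideograph.
samples = ["u", "\uff55", "\u4e00"]
for ch in samples:
    print(repr(ch), unicodedata.name(ch), unicodedata.east_asian_width(ch))
# 'u' LATIN SMALL LETTER U Na
# 'ｕ' FULLWIDTH LATIN SMALL LETTER U F
# '一' CJK UNIFIED IDEOGRAPH-4E00 W
```

All three have `len() == 1`, yet the last two usually occupy two columns in a monospaced terminal.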

Jeff

Martin v. Löwis · Feb 12, 2004

Joerg said:
I am using Python 2.2.3 (Fedora Core 1). ...
I have tried to set the encoding in site.py to 'latin-1', but it did not change
my results. Is there no way to store umlauts in 1 byte???

There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.
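Martin's point about encodings can be checked directly. A Python 3 sketch encoding the same three characters with Latin-1 (one byte per umlaut) and UTF-8 (two bytes per umlaut):

```python
text = "\u00e4\u00f6\u00fc"  # "äöü"

# The same characters need a different number of bytes in each encoding.
for encoding in ("latin-1", "utf-8"):
    data = text.encode(encoding)
    print(encoding, len(data), data)
# latin-1 3 b'\xe4\xf6\xfc'
# utf-8 6 b'\xc3\xa4\xc3\xb6\xc3\xbc'
```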

What is the right way
to print strings containing umlauts in a tabular way (same field width)?

As Jeff explains: In the specific case, using Unicode strings would
help. He is also right that, in general, it is very difficult to find
out how many columns a single character uses, as some characters have
width 0, and other characters have width 2 (in a mono-spaced terminal;
for variable-spaced output, adding space characters to achieve
formatting will never work reliably).
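For the mono-spaced terminal case Martin describes, padding can be computed from an estimated column width instead of `len()`. A Python 3 sketch with hypothetical helpers `display_width` and `rjust_columns`; it assumes nonspacing and enclosing marks take 0 columns, Wide and Fullwidth characters take 2, and everything else (including Ambiguous) takes 1. It ignores control characters and terminal-specific behaviour, so it is an approximation.

```python
import unicodedata

def display_width(text):
    """Estimate the monospaced terminal columns needed for text (heuristic)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # nonspacing/enclosing marks add no column of their own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # Wide and Fullwidth characters take two columns
        else:
            width += 1  # includes "A" (Ambiguous), assumed narrow here
    return width

def rjust_columns(text, columns):
    """Right-justify text to the given number of display columns."""
    return " " * max(0, columns - display_width(text)) + text

# Precomposed umlauts, combining umlauts, fullwidth letters, ASCII digits.
for cell in ["\u00e4\u00f6\u00fc", "a\u0308o\u0308u\u0308", "\uff55\uff55\uff55", "123"]:
    print("[" + rjust_columns(cell, 8) + "]")
```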

Regards,
Martin

Joerg Lehmann · Feb 12, 2004

Martin v. Löwis said:
There is, but Fedora Core 1 does not use it. Instead, it uses an
encoding where an umlaut character needs two bytes (namely, UTF-8).
Changing site.py does not change the way your system represents
these characters.

I have found a fix myself. I'm not sure whether this is "the right way",
but it solves my problem:

I changed the settings in /etc/sysconfig/i18n from UTF-8 to
ISO-8859-1:

LANG="en_US.ISO-8859-1"
SUPPORTED="en_US.ISO-8859-1:en_US:en"
SYSFONT="latarcyrheb-sun16"

This fixed my problem: umlauts are now stored in one byte.
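The fix works because the codeset named in `LANG` determines how the terminal, and programs that follow the locale, encode text. A Python 3 sketch with a hypothetical `codeset_from_lang` helper, assuming the usual `language_TERRITORY.codeset[@modifier]` form of a locale name, that encodes the same word with each codeset:

```python
def codeset_from_lang(lang):
    """Hypothetical helper: return the codeset of a LANG value, or None."""
    _, _, rest = lang.partition(".")
    return rest.split("@")[0] or None  # drop an optional @modifier

for lang in ("en_US.UTF-8", "en_US.ISO-8859-1"):
    codeset = codeset_from_lang(lang)
    encoded = "\u00e4\u00f6\u00fc".encode(codeset)
    print(lang, len(encoded))
# en_US.UTF-8 6
# en_US.ISO-8859-1 3
```

At run time, Python 3 reports the encoding it derives from the locale via `locale.getpreferredencoding()`.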

Thanks for your inspirations.

PS: Installing Python 2.3 (rpm for Fedora from www.python.org) did not
help.
