String formatting for complex writing systems

Andy · Jun 27, 2007

Hi guys,

I'm writing a piece of software for a Thai friend. In the end it
is supposed to print reports on paper, with tables of text and
numbers. When I test it in English, the columns are aligned nicely,
but when he tests it with Thai data, the columns are all crooked.

The problem here is that in the Thai writing system sometimes two or
more characters together take up a single space, for example งิ
(u"\u0E07\u0E34"). This is why something like u"%10s" % ... doesn't
work as expected.
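
To illustrate the mismatch, here is a minimal sketch (assuming Python 3, where every string is Unicode; the variable names are only for illustration). The %-formatting pads by counting code points, not display columns, so a cell containing a combining vowel sign ends up one column short in a fixed-width font:

```python
# NGO NGU (U+0E07) followed by the vowel sign SARA I (U+0E34),
# which is drawn above the consonant and takes no column of its own.
thai = "\u0E07\u0E34"
padded = "%10s" % thai

print(len(thai))     # 2 code points, although it displays as one cell
print(len(padded))   # 10 code points: 8 spaces plus the 2 above
print(repr(padded))
```

On a fixed-width display the padded string therefore occupies 9 columns rather than 10, which is what makes the table look crooked.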

Is anybody aware of an alternative string format function that can
deal with this kind of writing properly?

Any suggestion is highly appreciated. Thanks!
Andy

Gabriel Genellina · Jun 27, 2007

The same thing happens even in English if you print using a
proportional-width font: a "W" is usually wider than an "i" or an "l".
You could use a reporting library or program (like ReportLab, generating
PDF files), but perhaps the simplest approach is to generate an HTML page
containing a table, and display and print it using your favorite browser.
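
A minimal sketch of the HTML approach, assuming Python 3 and its standard html module. The function name html_table and the sample data are made up for illustration; the browser does the column alignment, so no character counting is needed:

```python
from html import escape

def html_table(headers, rows):
    """Build a small HTML page holding one table (hypothetical helper)."""
    head = "".join("<th>%s</th>" % escape(str(h)) for h in headers)
    body = "".join(
        "<tr>%s</tr>" % "".join("<td>%s</td>" % escape(str(cell)) for cell in row)
        for row in rows
    )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        '<table border="1"><tr>%s</tr>%s</table></body></html>' % (head, body)
    )

# Sample data: one Thai cell, one cell that needs escaping.
page = html_table(["Item", "Qty"], [("\u0E07\u0E34", 3), ("a<b", 12)])
print(page)
```

Saving the result to a file opened with encoding="utf-8" and loading that file in a browser should give a printable table.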

Leo Kislov · Jun 27, 2007

In the general case it's impossible to write such a function for many
Unicode characters without feedback from the rendering library.
Assuming you use a *fixed*-width font for both English and Thai, the
following function will return how many columns your text will use:

    from unicodedata import category
    def columns(self, s):
        return sum(1 for c in s if category(c) != 'Mn')

-- Leo

Leo Kislov · Jun 27, 2007

The function should of course be written as def columns(s), with no
self argument. I need to learn to proofread before posting.

-- Leo
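
As a rough sketch of how the corrected function could replace u"%10s" (assuming Python 3 and a fixed-width font; pad_left and pad_right are hypothetical helpers, not part of any library), pad by display columns instead of by code points:

```python
from unicodedata import category

def columns(s):
    """Count display columns, skipping nonspacing marks (category 'Mn')."""
    return sum(1 for c in s if category(c) != 'Mn')

def pad_left(s, width):
    """Right-align s in a field of `width` columns, like "%10s"."""
    return " " * max(0, width - columns(s)) + s

def pad_right(s, width):
    """Left-align s in a field of `width` columns, like "%-10s"."""
    return s + " " * max(0, width - columns(s))

# Sample rows: the Thai cell holds two code points but one column.
rows = [("Name", "Qty"), ("\u0E07\u0E34", "3"), ("abc", "12")]
for name, qty in rows:
    print(pad_right(name, 10) + pad_left(qty, 5))
```

This only accounts for nonspacing marks. Double-width East Asian characters and other zero-width categories would need extra handling, and a proportional font still needs a real layout engine.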

Andy · Jul 2, 2007

Thanks guys!

I've used the HTML and the unicodedata suggestions, each on a
different report. These worked nicely!

Andy
