hex dump w/ or w/out utf-8 chars

wxjmfauth · Jul 24, 2013

I cannot find the thread where a Python core dev spoke
about French, so I'm putting this here.

This stupid Flexible String Representation splits Unicode
into chunks, and one of these chunks is latin-1 (iso-8859-1).

If we consider that latin-1 is unusable for 17 (seventeen)
European languages based on the Latin alphabet, one cannot
say Python is really well prepared.
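
To make the latin-1 limit concrete, here is a minimal sketch that checks whether a character can be encoded as latin-1. The helper name `fits_latin1` and the sample characters are picked for illustration only; they are not taken from the thread mentioned above.

```python
def fits_latin1(text):
    """Return True if every character of text can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

# Sample characters chosen for illustration only.
samples = {
    "e acute (French)": "\u00e9",
    "oe ligature (French)": "\u0153",
    "l with stroke (Polish)": "\u0142",
    "s with cedilla (Turkish)": "\u015f",
}

results = {name: fits_latin1(ch) for name, ch in samples.items()}
for name, ok in results.items():
    print(name, "->", "latin-1" if ok else "not in latin-1")
```

Only the first sample fits; the other three lie above U+00FF.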

Most of the problems come from the extensive use of
diacritics in these languages. Thanks to the FSR again,
working with normalized forms does not work very well. At
least, there is some consistency.
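
A small sketch of what normalization does to the code points, using the standard `unicodedata` module. The helper `max_code_point` is a hypothetical name for illustration. As far as I know, the FSR (PEP 393) picks the storage width from the widest code point in the string, so the decomposed form no longer fits the latin-1 chunk even though it renders the same.

```python
import unicodedata

# NFC: 'e' + COMBINING ACUTE ACCENT composes to the single code point U+00E9.
composed = unicodedata.normalize("NFC", "e\u0301")
# NFD: U+00E9 decomposes to 'e' followed by U+0301.
decomposed = unicodedata.normalize("NFD", "\u00e9")

def max_code_point(text):
    """Return the highest code point found in text."""
    return max(ord(ch) for ch in text)

print(len(composed), hex(max_code_point(composed)))
print(len(decomposed), hex(max_code_point(decomposed)))
```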

Now, if we consider that most of the new characters will
be part of the BMP (chars in "daily" use), it is hard to
present Python as a modern language. It sticks more
to the past and is not really prepared for the future,
for the acceptance of new chars like ẞ or the new Turkish lira
sign (U+20BA).
26

14 bytes to encode a non-latin-1 char is not so bad.
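
A sketch of how such a per-string cost might be measured, assuming the figure comes from `sys.getsizeof`. The exact numbers depend on the Python version and on a 32-bit or 64-bit build, so none are shown here.

```python
import sys

ascii_char = "a"
lira_sign = "\u20ba"  # TURKISH LIRA SIGN, outside latin-1

# getsizeof reports the object header plus the character storage.
size_ascii = sys.getsizeof(ascii_char)
size_lira = sys.getsizeof(lira_sign)

print(size_ascii, size_lira, size_lira - size_ascii)
```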

jmf

