Converting between Unicode and default locale

Gianni Mariani · Sep 27, 2003

Keith said:
Well, my question has certainly generated a lot of responses, but not the
kind I was hoping for. Clearly, I was being completely naive to expect the
standard library to include this facility, but I am completely disheartened
not to have found a single working example of how to do code conversion in
streams using 3rd party libraries, such as iconv. Presumably this is
because nobody does it that way.

[RANT] It seems crazy that after a decade of Unicode use, C++ still requires
everyone to reinvent the wheel and do it their own way. I think that the
standards committee is being too precious about this. I know that Unicode
is a moving target, but UCS-2 would suffice for 95% of my requirements - and
100% for those who don't know the difference between it and UTF-16. After
all, the C++ char type doesn't even support the full British English
character set (never mind those of the rest of Europe), without using
non-standard compiler options to make char unsigned. Please, anything is
better than nothing! [/RANT]

Your RANT is mostly justified.

However, there are a number of libraries that provide the support you
are asking for.

If you have the energy to propose a revision to the C++ standard then do
so, but it's a very complex problem to get right. As for supporting just
UCS-2, you would probably not get anyone on the standards committee to
agree to that.

Ron Natalie · Sep 27, 2003

Gianni Mariani said:
However, there are a number of libraries that provide the support you
are asking for.

The problem is that you can't even implement this without redefining/extending
the C++ standard library classes. The problem is that wchar_t is incompletely
supported in the C++ library, so even if you were to fix up everything in your
implementation, you'd still have to add non-conforming extensions.

Gianni Mariani · Sep 27, 2003

Ron said:
The problem is that you can't even implement this without redefining/extending
the C++ standard library classes. The problem is that wchar_t is incompletely
supported in the C++ library, so even if you were to fix up everything in your
implementation, you'd still have to add non-conforming extensions.

An option is not to use wchar_t at all. Stick to multibyte. Perform all
the processing in utf-8 multibyte. (You need to make sure you provide
support to convert any incoming strings to utf-8.)
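To make the "convert incoming strings to utf-8" step concrete, here is a minimal sketch. It assumes the incoming text is ISO-8859-1 (Latin-1), which is my assumption for illustration and not something from the original question; the helper name latin1_to_utf8 is hypothetical. Other source encodings need a real conversion library such as iconv.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to utf-8.
// Latin-1 maps each byte directly to code points U+0000..U+00FF, so
// every input byte becomes either one or two utf-8 bytes.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not negative,
        // whatever the signedness of plain char on this compiler.
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII: unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte 110xxxxx
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation 10xxxxxx
        }
    }
    return out;
}
```

For example, the pound sign is the single byte 0xA3 in Latin-1 and becomes the two bytes 0xC2 0xA3 in utf-8, regardless of whether char is signed.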

Even for UTF-32 you need to deal with multi-"unit" issues because of
combining characters. I don't remember specifically what the 10646
standard says, but processing text with combining characters has many of
the same restrictions as multibyte characters (the units have to be kept
together).
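A rough sketch of what "keeping them together" means for UTF-32 follows. It is deliberately simplified: it only treats the Combining Diacritical Marks block (U+0300 to U+036F) as combining, which is my assumption for illustration, and the function names are hypothetical. Real segmentation follows the Unicode text segmentation rules and covers far more cases.

```cpp
#include <string>
#include <vector>

// Simplifying assumption: only U+0300..U+036F counts as a combining mark.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Splits UTF-32 text into pieces that must not be separated: each base
// code point together with the combining marks that follow it.
std::vector<std::u32string> split_keeping_marks(const std::u32string& text) {
    std::vector<std::u32string> clusters;
    for (char32_t cp : text) {
        if (clusters.empty() || !is_combining_mark(cp)) {
            clusters.push_back(std::u32string(1, cp));  // start a new piece
        } else {
            clusters.back() += cp;  // attach the mark to the preceding base
        }
    }
    return clusters;
}
```

With the input 'e', U+0301 (combining acute accent), 'x', the string holds three UTF-32 units but only two pieces come back, so indexing by unit can still cut a visible character in half.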

Processing utf-16 or utf-32, you have issues with endianness or managing
the byte-order-mark, which makes it a stateful encoding. This breaks a
whole bunch of subtle assumptions about the indexability of files. No
such problem exists with utf-8.
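The indexability point can be sketched like this: in utf-8 every continuation byte has the bit pattern 10xxxxxx, so from any byte offset you can find the start of the enclosing character without a byte-order-mark or any state carried from the start of the file. The function names are hypothetical, and the sketch assumes the input is valid utf-8 and that pos is less than the string's size.

```cpp
#include <cstddef>
#include <string>

// True for utf-8 continuation bytes (bit pattern 10xxxxxx).
bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Given any byte offset into valid utf-8 (pos < utf8.size()), returns the
// offset of the first byte of the character that contains that offset.
std::size_t code_point_start(const std::string& utf8, std::size_t pos) {
    while (pos > 0 && is_continuation(static_cast<unsigned char>(utf8[pos]))) {
        --pos;  // step back over continuation bytes to the lead byte
    }
    return pos;
}
```

For the bytes of "a", the euro sign (0xE2 0x82 0xAC) and "z", offsets 1, 2 and 3 all resolve to offset 1, the lead byte of the euro sign.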

It just makes a whole lot of sense to use utf-8 everywhere when possible.
