how to convert narrow string to wide string and vice versa?

thinktwice · Sep 7, 2006

I'm using the VC++ 6 IDE.
I know I could use macros like A2T and T2A,
but is there a more decent way to do this?

Bart · Sep 7, 2006

thinktwice said:
I'm using the VC++ 6 IDE.
I know I could use macros like A2T and T2A,
but is there a more decent way to do this?

Look up std::ctype::widen and std::ctype::narrow in the <locale>
header.
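
A minimal sketch of what that could look like, assuming a C++11
compiler rather than VC++ 6. The helper names widen_with_locale and
narrow_with_locale are made up for illustration; only std::use_facet
and the std::ctype<wchar_t> member functions come from the standard
library. Each character is converted on its own, so this only makes
sense for single-byte narrow encodings.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: widen every char individually with the ctype facet
// of `loc`. Multibyte sequences are NOT decoded, each byte is widened alone.
std::wstring widen_with_locale(const std::string& s,
                               const std::locale& loc = std::locale()) {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out(s.size(), L'\0');
    if (!s.empty())
        ct.widen(s.data(), s.data() + s.size(), &out[0]);
    return out;
}

// Hypothetical helper: narrow every wchar_t individually. Characters with
// no single-char equivalent in `loc` are replaced by `fallback`.
std::string narrow_with_locale(const std::wstring& w, char fallback = '?',
                               const std::locale& loc = std::locale()) {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::string out(w.size(), '\0');
    if (!w.empty())
        ct.narrow(w.data(), w.data() + w.size(), fallback, &out[0]);
    return out;
}
```

Calling widen_with_locale("abc") with the default locale should give
L"abc", and narrow_with_locale reverses it for plain ASCII text.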

Regards,
Bart.

Kirit Sælensminde · Sep 7, 2006

Bart said:
Look up std::ctype::widen and std::ctype::narrow in the <locale>
header.

These may not be much good for Unicode or other variable width
encodings - depends on how you use the resultant strings.

It's a tricky thing to deal with. If you properly understand what you
mean by 'narrow' and 'wide' strings the solution should present itself.
If you're not sure what the string content means then you're unlikely
to find the right solution in a library because you won't know how to
use the functions or their results properly.

K

Arne 'deice' Pajunen · Sep 7, 2006

Kirit said:
These may not be much good for Unicode or other variable width
encodings - depends on how you use the resultant strings.

It's a tricky thing to deal with. If you properly understand what you
mean by 'narrow' and 'wide' strings the solution should present itself.
If you're not sure what the string content means then you're unlikely
to find the right solution in a library because you won't know how to
use the functions or their results properly.

well, if you just want a quick ugly hack, then personally i've sometimes
used:

wstring wide(L"some wide character string");
string narrow(wide.begin(), wide.end());

But this is a cleaving axe for microsurgery: it depends on wide having
code points equivalent to the charset in string, which is only really
true if the wstring is Unicode, contains only ISO-8859-1 characters
(0-255), and the narrow character encoding is ISO-8859-1 or similar
(the encoding of char depends on the platform).
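
A slightly safer variant of the same hack, as a sketch only: it assumes
the wchar_t values are Unicode code points and that the target is
ISO-8859-1, and it refuses anything above 0xFF instead of silently
truncating it. The function name narrow_latin1_checked is invented for
this example, and the range-for loop needs C++11.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: copy a wide string into a narrow one, assuming
// Unicode code points in wchar_t and ISO-8859-1 in char.
// Throws std::range_error for any character that does not fit in one byte.
std::string narrow_latin1_checked(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // Go through unsigned long so a signed wchar_t cannot slip past the check.
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0xFFul)
            throw std::range_error("character outside ISO-8859-1");
        out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
    }
    return out;
}
```

Whether throwing is the right reaction depends on the application; a
replacement character would be another reasonable choice.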

I would actually be interested in seeing what the "clean" solution for
converting is when you have, say, Unicode in wchar_t's and whatever
encoding the locale specifies in char's (ISO-8859-1, or maybe
windows-1252)
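
One candidate that stays inside the standard library is std::wcstombs,
which converts to whatever multibyte encoding the current C locale
(selected with std::setlocale) uses. The sketch below assumes that
locale really describes the encoding you want, and the wrapper name
to_locale_encoding is hypothetical. Conversion stops at the first
embedded L'\0'.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper around std::wcstombs: convert a wide string to the
// multibyte encoding of the current C locale.
std::string to_locale_encoding(const std::wstring& wide) {
    // MB_CUR_MAX is the worst-case number of bytes per character in the
    // current locale; one extra byte leaves room for the terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buf[0], wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");
    return std::string(&buf[0], n);
}
```

In the default "C" locale this only handles basic characters; calling
std::setlocale(LC_ALL, "") first picks up the user's environment.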

//deice [deice at deice dot cjb dot net]
//Arne Pajunen

Kirit Sælensminde · Sep 7, 2006

Arne said:
well, if you just want a quick ugly hack, then personally i've sometimes
used:

wstring wide(L"some wide character string");
string narrow(wide.begin(), wide.end());

But this is a cleaving axe for microsurgery: it depends on wide having
code points equivalent to the charset in string, which is only really
true if the wstring is Unicode, contains only ISO-8859-1 characters
(0-255), and the narrow character encoding is ISO-8859-1 or similar
(the encoding of char depends on the platform).

I would actually be interested in seeing what the "clean" solution for
converting is when you have, say, Unicode in wchar_t's and whatever
encoding the locale specifies in char's (ISO-8859-1, or maybe
windows-1252)

The first step is to convert the UTF-16 (which is normal for wchar_t on
Windows; platforms with a 32-bit wchar_t, such as most Unix-like
systems, typically hold UTF-32 already) to UTF-32. Then convert that
down (often with a code table, but sometimes algorithmically). Of
course there's the open question of what to do with characters that
don't/can't map. In some applications you can use a variety of
character encodings (as distinct from character sets). For example, if
you're using ISO-8859-1 in XML/HTML you can use the character reference
forms XML/HTML defines for this.
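
The two steps described above can be sketched as follows, assuming
C++11 string types (std::u16string, std::u32string) and ISO-8859-1 as
the target. The function names are invented for the example, and
replacing unmappable characters with '?' is just one possible policy.
ISO-8859-1 is the algorithmic case, since its 256 values coincide with
the first 256 Unicode code points; other charsets need a table.

```cpp
#include <cstddef>
#include <string>

// Step 1: decode UTF-16 code units into UTF-32 code points.
// Unpaired surrogates are replaced with U+FFFD.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by a low surrogate: combine the pair.
            char32_t lo = in[i + 1];
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
            ++i;
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            out.push_back(0xFFFD);  // lone surrogate
        } else {
            out.push_back(u);
        }
    }
    return out;
}

// Step 2: map code points down to ISO-8859-1. Anything above U+00FF
// cannot be represented and becomes `fallback`.
std::string utf32_to_latin1(const std::u32string& in, char fallback = '?') {
    std::string out;
    for (char32_t cp : in)
        out.push_back(cp <= 0xFF
                          ? static_cast<char>(static_cast<unsigned char>(cp))
                          : fallback);
    return out;
}
```

For XML/HTML output, the fallback branch is where a numeric character
reference could be emitted instead of '?'.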

A full answer depends on what you are using the string for which is why
it's so hard to answer. For some things your solution is perfectly
valid - it's fine for the many parts of internet protocols which are
defined to use ASCII characters only.

For our framework we're looking at using ICU to do the conversions, but
haven't had much of a chance to play with it yet. Since nearly 100% of
our interactions go through HTTP, we just use UTF-8 and that solves
nearly the whole problem. We have found it useful to define our own
std::wstring-like class that uses UTF-32 at the single-character
interface points (operator[], at() etc.) but uses UTF-16 for character
sequences. Things like substr() use the correct position and count
based on the number of UTF-32 characters _not_ the number of UTF-16
code units, so applications can't chop some characters in half.
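
A rough illustration of that substr() idea, not the framework's actual
code: the sketch assumes a C++11 std::u16string as storage and counts
positions in code points, stepping over surrogate pairs as a unit. All
three function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// True if a high surrogate at index i is followed by a low surrogate.
// The caller guarantees i < s.size().
static bool is_pair_at(const std::u16string& s, std::size_t i) {
    return s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size()
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
}

// Move forward from code-unit index i by up to n code points and return
// the resulting code-unit index (clamped to s.size()).
static std::size_t advance_code_points(const std::u16string& s,
                                       std::size_t i, std::size_t n) {
    while (n > 0 && i < s.size()) {
        i += is_pair_at(s, i) ? 2 : 1;
        --n;
    }
    return i;
}

// substr() where pos and count are measured in code points, so a
// surrogate pair is never split.
std::u16string substr_code_points(const std::u16string& s,
                                  std::size_t pos, std::size_t count) {
    std::size_t first = advance_code_points(s, 0, pos);
    std::size_t last = advance_code_points(s, first, count);
    return s.substr(first, last - first);
}
```

Each call walks the string from the start, so it is linear in pos; a
real class would probably cache offsets or accept that cost.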

K
