Boris Dušek
Hi,
I have an API that returns UTF-8 encoded strings. I have a UTF-8 codecvt
facet available to do the conversion from UTF-8 to the wchar_t encoding
defined by the platform. I have no trouble converting when a UTF-8
encoded string comes from a file: I just create a std::wifstream and
imbue it with a locale that uses the UTF-8 facet for
std::locale::ctype. Then I just use operator>> to get a wstring properly
decoded from UTF-8. I thought I could create something similar for
std::wstringstream or std::wstringbuf, but I am having a hard time with it.
I imagine that if a std::wstringstream is imbued with UTF-8, then it
stores an array of char (not wchar_t) which is encoded in UTF-8. I can
push wide strings into it or extract them from it as I like, and the
contents are kept encoded in UTF-8 in some internal buffer.
What I need now is to supply my own buffer, prefilled with the UTF-8
data, to act as the internal UTF-8 encoded buffer of the
std::wstringbuf, and then call operator>>(..., std::wstring &)
to get the wide-string representation converted from UTF-8 to the
proper wide encoding. While I am at it, I would also like to know the
reverse: how to get at this internal UTF-8 encoded buffer (so I can push
wstrings into it as I like and get a "char *" encoded in UTF-8).
Sample code (how I would imagine it):
char name[] = "Boris Du" "\xc5\xa1" "ek"; // my name - Boris Dušek
std::wstringstream conv;
// pubsetcharbuf is imaginary; the real pubsetbuf only accepts
// "wchar_t *", not "char *"
conv.rdbuf()->pubsetcharbuf(name, sizeof(name) - 1); // 12 UTF-8 bytes
std::wstring wname;
conv >> wname; // now my name should be properly decoded from UTF-8
Thanks for any suggestions,
Boris
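
For reference, std::basic_stringbuf never consults the codecvt facet of its locale (only std::basic_filebuf does), so one possible route is to call the facet's in() and out() members directly on the buffers. The following is only a sketch under assumptions: std::codecvt_utf8<wchar_t> stands in for the poster's own UTF-8 facet (it is deprecated since C++17, so some compilers warn about it), and the helper names utf8_to_wide and wide_to_utf8 are made up for illustration.

```cpp
#include <codecvt>    // std::codecvt_utf8 (deprecated since C++17)
#include <cstddef>
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Assumption: stand-in for the UTF-8 facet mentioned in the post.
typedef std::codecvt_utf8<wchar_t> utf8_facet;

// Hypothetical helper: decode a UTF-8 byte buffer into a wide string.
std::wstring utf8_to_wide(const char* bytes, std::size_t len)
{
    if (len == 0) return std::wstring();
    utf8_facet cvt;
    std::mbstate_t state = std::mbstate_t();
    // Each wide character consumes at least one input byte,
    // so len wide characters is always enough room.
    std::wstring out(len, L'\0');
    const char* from_next = bytes;
    wchar_t* to_next = &out[0];
    std::codecvt_base::result r =
        cvt.in(state, bytes, bytes + len, from_next,
               &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("invalid or truncated UTF-8 input");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}

// Hypothetical helper: encode a wide string as UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    utf8_facet cvt;
    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of bytes one wide character can need.
    std::string out(wide.size() * static_cast<std::size_t>(cvt.max_length()), '\0');
    const wchar_t* from_next = wide.data();
    char* to_next = &out[0];
    std::codecvt_base::result r =
        cvt.out(state, wide.data(), wide.data() + wide.size(), from_next,
                &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("wide string not representable in UTF-8");
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With the array from the sample above, utf8_to_wide(name, sizeof(name) - 1) would take the place of the imagined pubsetcharbuf call, and wide_to_utf8(wname).c_str() would give the "char *" for the reverse direction. A facet obtained through std::use_facet from an existing locale could presumably be used the same way instead of the stand-in.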