Support for std::wstring

Divick

Hi all,
can somebody tell me how well std::wstring is supported across
different compilers and platforms? AFAIK std::string is
supported by almost all C++ compilers on almost all platforms. Is that
also the case with wstring?

Another related question: is it advisable to use
wstring rather than string for Unicode support? To support a Unicode
build, do all occurrences of std::string need to be
changed to std::wstring?

PS: I am fairly new to Unicode, so please elaborate or point me to
external links for reference if need be. I am not sure whether this is the
right forum for this question, but I guess it is related.

Thanks,
Divick
 
peter koch

Divick said:
Hi all,
can somebody tell me how well std::wstring is supported across
different compilers and platforms? AFAIK std::string is
supported by almost all C++ compilers on almost all platforms. Is that
also the case with wstring?

Yes - all conforming compilers will support std::wstring. I doubt you
can find a compiler that supports std::string but not std::wstring.

Another related question: is it advisable to use
wstring rather than string for Unicode support? To support a Unicode
build, do all occurrences of std::string need to be
changed to std::wstring?

Hold on. std::wstring is not necessarily Unicode. On Windows it will
likely be the 16-bit Unicode encoding also used by Java, but there are no
guarantees.

PS: I am fairly new to Unicode, so please elaborate or point me to
external links for reference if need be. I am not sure whether this is the
right forum for this question, but I guess it is related.

A Google search for Unicode should take you to the official Unicode page
with minimal effort.

Thanks,
Divick

/Peter
 
Bronek Kozicki

peter koch said:
Yes - all conforming compilers will support std::wstring. I doubt you
can find a compiler that supports std::string but not std::wstring.

The libstdc++ port for Windows (as bundled with the MinGW compiler) does not
support std::wstring, because its implementation depends on a
POSIX-style locale. But one can always use STLport, which does support
std::wstring with this compiler.

Hold on. std::wstring is not necessarily Unicode.

Indeed, but on platforms that directly support Unicode at the operating
system level, wchar_t usually is some Unicode encoding (on Windows 2000
or newer it's UTF-16). I'd say that it's OK to use std::wstring and
wchar_t to handle Unicode strings if both are true:
- you do not care which encoding is used
- you do not target exotic platforms where wchar_t is not Unicode at all
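
To make the two conditions above concrete, here is a minimal sketch of how one might check what wchar_t looks like on a given platform. The helper names `wchar_bits` and `wide_units_for` are made up for illustration, and the mapping "16 bits means UTF-16, wider means UTF-32" is only an assumption about common platforms; the standard guarantees neither.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in one wchar_t on this platform. Commonly 16 on Windows
// and 32 on Linux and macOS, but the standard does not fix the width.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// How many wchar_t units one code point needs, ASSUMING wchar_t holds
// UTF-16 when it is 16 bits wide and UTF-32 when it is wider.
// Code points above U+FFFF need a surrogate pair in UTF-16.
constexpr std::size_t wide_units_for(char32_t code_point) {
    return (wchar_bits() == 16 && code_point > 0xFFFF) ? 2 : 1;
}
```

For example, `wide_units_for(0x1F600)` yields 2 where wchar_t is 16 bits wide and 1 where it is 32 bits wide, which is exactly the kind of difference you have to not care about if you treat std::wstring as "the Unicode string".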


B.
 
Davlet Panech

Divick said:
Hi all,
can somebody tell me how well std::wstring is supported across
different compilers and platforms? AFAIK std::string is
supported by almost all C++ compilers on almost all platforms. Is that
also the case with wstring?

IIRC the old gcc 2.95 shipped with a pre-standard STL that didn't
support wide strings (among other irregularities). I had to write code
for that as recently as 2 years ago, and although I ended up upgrading
its libraries to STLport, wchar_t support was so broken on that platform
(an old SCO Unix from the early or mid '90s) that STLport had to be
configured without wchar_t support, hence no std::wstring.

Another related question: is it advisable to use
wstring rather than string for Unicode support? To support a Unicode
build, do all occurrences of std::string need to be
changed to std::wstring?

Yes; plus you may have to convert to/from something like UTF-8 when
interfacing with some libraries (like system functions that expect
filenames on UNIX and their C++ "equivalents", like std::fstream::open).
Most of the time you can't really get rid of *all* "narrow" strings;
you'll end up with code that uses both types depending on the
situation. I've found this to be too painful in practice most of the
time. The alternative is to store Unicode strings in char-based
strings using a variable-length encoding supported on your system (UTF-8,
usually). Of course in this case your strings will be "non-linear" (no
simple mapping between byte offsets and Unicode character offsets), some
bit patterns are forbidden, etc., so it may not be sufficient for what
you need.

D.
 
Davlet Panech

Davlet said:
Yes; plus you may have to convert to/from something like UTF-8 when
interfacing with some libraries (like system functions that expect
filenames on UNIX and their C++ "equivalents", like std::fstream::open).
Most of the time you can't really get rid of *all* "narrow" strings;
you'll end up with code that uses both types depending on the
situation. I've found this to be too painful in practice most of the
time. The alternative is to store Unicode strings in char-based
strings using a variable-length encoding supported on your system (UTF-8,
usually). Of course in this case your strings will be "non-linear" (no
simple mapping between byte offsets and Unicode character offsets), some
bit patterns are forbidden, etc., so it may not be sufficient for what
you need.

D.

Just to clarify: I'm not saying you shouldn't use wchar_t, just that
you'll probably have to use encoded (UTF-8) forms regardless, especially
on UNIX-like systems. In my experience it's easier to convert to/from
wchar_t *only* when necessary, rather than blindly throughout.
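
As an illustration of converting only at the boundary, here is a rough sketch of a UTF-8 decoder that turns a char-based string into fixed-width code points. The function name `utf8_to_utf32` is hypothetical, and the sketch is deliberately incomplete: it rejects truncated or malformed sequences but does not reject overlong encodings or encoded surrogates, which a production decoder should.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a UTF-8 byte string into UTF-32 code points.
// Throws std::runtime_error on truncated or malformed sequences.
// Sketch only: overlong encodings and surrogates are NOT rejected.
std::u32string utf8_to_utf32(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes expected after the lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append the low 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

The idea is that the rest of the program keeps passing UTF-8 in std::string to file and system APIs, and only the code that really needs per-character access pays for the conversion.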

D.
 
Kirit Sælensminde

Divick said:
Another related question: is it advisable to use
wstring rather than string for Unicode support? To support a Unicode
build, do all occurrences of std::string need to be
changed to std::wstring?

It's actually a fantastically complicated area. The standard string
implementations are not designed for handling variable-length encodings.
This means they're not designed for UTF-8 in the 8-bit char-based
std::string, nor for UTF-16 in a 16-bit wchar_t-based
std::wstring. The only safe way is to create your own char
traits to use 32-bit integers in std::basic_string<> and then convert
to UTF-16 and UTF-8 as needed.

Having said that, you can use std::string and std::wstring
(assuming char is 8 bits and wchar_t is 16 bits) so long as you're
careful. Remember that length() returns the number of UTF-8 or
UTF-16 code units, not the number of characters, and that functions like
substr() are liable to cut in the middle of a code point, since a single
Unicode code point can take up to four UTF-8 chars or two UTF-16 wchar_ts.
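
A small sketch of what "being careful" can look like with UTF-8 in a plain std::string. Both helper names (`utf8_code_points`, `utf8_safe_cut`) are invented for this example, and both assume the input is already valid UTF-8; they rely only on the fact that continuation bytes have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte
// that is not a continuation byte (10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Largest position <= pos at which the string can be cut without
// splitting a multi-byte sequence. Assumes valid UTF-8.
std::size_t utf8_safe_cut(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return s.size();
    while (pos > 0 && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80) {
        --pos;  // step back off a continuation byte
    }
    return pos;
}
```

With the string "naïve" stored as UTF-8, size() is 6 while `utf8_code_points` gives 5, and `s.substr(0, utf8_safe_cut(s, 3))` yields "na" instead of a string ending in half of the "ï".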

You can get around some of this if you write your own string iterators
and use those, but by then you may as well write your own string
classes that do handle them correctly. If you don't do any string
manipulation then you'll probably be ok.

By the time you understand the encodings well enough to know which
string manipulations are safe and which aren't you won't need to ask
here about them :)


K
 
Divick

Hi all,
That definitely helps. I think I need to look more into Unicode
from a programming perspective to understand more of what you all
are saying.

Thanks for all the help,
Divick
 
peter koch

Bronek Kozicki said:
The libstdc++ port for Windows (as bundled with the MinGW compiler) does not
support std::wstring, because its implementation depends on a
POSIX-style locale. But one can always use STLport, which does support
std::wstring with this compiler.

I forgot that - thanks for reminding me.

Indeed, but on platforms that directly support Unicode at the operating
system level, wchar_t usually is some Unicode encoding (on Windows 2000
or newer it's UTF-16). I'd say that it's OK to use std::wstring and
wchar_t to handle Unicode strings if both are true:
- you do not care which encoding is used
- you do not target exotic platforms where wchar_t is not Unicode at all

I partly agree here. You cannot use std::wstring as a generic Unicode
string on Windows, as the representation is encoded using the same
principle as UTF-8 (a variable number of units per character). Thus,
s[i] might not necessarily return the i'th character of s. If you are
aware of this (or only process characters in the Basic Multilingual
Plane), you are safe.
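
To show why indexing by unit and indexing by character differ in UTF-16, here is a sketch that walks a string and combines surrogate pairs. It uses std::u16string rather than std::wstring so that it behaves the same on every platform; the function name `utf16_code_point_at` and the 0xFFFFFFFF "not found" value are choices made for this example only.

```cpp
#include <cstddef>
#include <string>

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Return the index-th code point (0-based) of a UTF-16 string, or
// 0xFFFFFFFF if index is out of range. Unpaired surrogates are
// returned unchanged as their own code unit value.
char32_t utf16_code_point_at(const std::u16string& s, std::size_t index) {
    std::size_t i = 0;
    while (i < s.size()) {
        char32_t cp = s[i];
        std::size_t len = 1;
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            // Combine the pair into one code point above U+FFFF.
            cp = 0x10000
               + ((static_cast<char32_t>(s[i]) - 0xD800) << 10)
               + (static_cast<char32_t>(s[i + 1]) - 0xDC00);
            len = 2;
        }
        if (index == 0) return cp;
        --index;
        i += len;
    }
    return 0xFFFFFFFF;
}
```

For the string u"a\U0001F600b", size() is 4 and s[1] is only the high surrogate 0xD83D, whereas `utf16_code_point_at(s, 1)` gives the full code point U+1F600. Note that this lookup is linear in the string length, which is the price of a variable-length encoding.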

/Peter
 