string to wstring problems

v4vijayakumar · Apr 25, 2007

1. Why is the following program not working as expected?

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string t("test");
    wcout << (wchar_t *) t.c_str() << endl;
    wcout << t.c_str() << endl;

    wstring t2 = (wchar_t *) t.c_str();
    wcout << t2.c_str() << endl;

    return 0;
}

2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?

Bjoern Doebel · Apr 25, 2007

v4vijayakumar said:
1. Why is the following program not working as expected?

Depends on what your expectations are.

#include <iostream>
using namespace std;

int main()
{
string t("test");
wcout << (wchar_t *) t.c_str() << endl;
wcout << t.c_str() << endl;

wstring t2 = (wchar_t *) t.c_str();
wcout << t2.c_str() << endl;

return 0;
}

2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?

Try to #include <cstdlib> and use mbstowcs(). (That's what Google tells me...)

Bjoern
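
For reference, the mbstowcs() suggestion can be sketched roughly as follows. The helper name widen_mbs is hypothetical, not a standard function. The result depends on the current C locale (LC_CTYPE), so a real program would normally call std::setlocale first; in the default "C" locale only the basic character set is sure to convert.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string
// using the encoding of the current C locale.
std::wstring widen_mbs(const std::string& narrow)
{
    // A multibyte string never yields more wide characters than it has
    // bytes, so size() + 1 elements leave room for the terminator too.
    std::vector<wchar_t> buffer(narrow.size() + 1);

    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());

    // mbstowcs reports an invalid multibyte sequence as (size_t)-1.
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");

    return std::wstring(buffer.data(), written);
}
```

Note that conversion stops at the first '\0' in the input, so a std::string with embedded nulls would be cut short by this sketch.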

red floyd · Apr 25, 2007

v4vijayakumar said:
1. Why is the following program not working as expected?

[program redacted]

Define "as expected". What were you expecting, and what did you get?

See FAQ 5.8, http://www.parashift.com/c++-faq-lite/how-to-post.html#faq-5.8

James Kanze · Apr 26, 2007

1. Why is the following program not working as expected?

What do you expect?

#include <iostream>
using namespace std;

int main()
{
string t("test");
wcout << (wchar_t *) t.c_str() << endl;

You're lying to the compiler. That's generally a good way of
getting into trouble. The address returned by t.c_str() does
NOT point to wchar_t objects.

wcout << t.c_str() << endl;

wstring t2 = (wchar_t *) t.c_str();

More lies.

wcout << t2.c_str() << endl;

return 0;

}
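
To make the point about the casts concrete: a cast only relabels the pointer, while the bytes behind it are still one char each, so code that reads them as wchar_t combines several chars into one value and looks for a wide terminator that is not there. A minimal sketch of converting element by element instead, assuming the text is plain 7-bit ASCII and that wchar_t uses the same values for those characters (widen_ascii is a hypothetical helper, not a standard function):

```cpp
#include <string>

// Hypothetical helper: copies each char into its own wchar_t.
// Only meaningful when every char is a 7-bit ASCII character.
std::wstring widen_ascii(const std::string& narrow)
{
    std::wstring wide;
    wide.reserve(narrow.size());
    for (char c : narrow) {
        // Go through unsigned char so a negative char value is not
        // sign-extended into an unrelated wide character.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return wide;
}
```

With this, wstring t2 = widen_ascii(t); gives a genuine wide string. The middle line of the original program, wcout << t.c_str(), needs no cast at all, because wide streams have an inserter for narrow C strings that widens each character.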

2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?

Because for some stupid reason, they're both instantiations of a
template, and there is no generic solution.

Also, of course, because the conversion would have to be locale
specific. (But that doesn't explain why there is no
string::toWString( locale const& ) function. That has to be
chalked up to the design error of making std::string a
template.)

Gennaro Prota · Apr 26, 2007

Because for some stupid reason, they're both instantiations of a
template, and there is no generic solution.

Also, of course, because the conversion would have to be locale
specific. (But that doesn't explain why there is no
string::toWString( locale const& ) function. That has to be
chalked up to the design error of making std::string a
template.)

Do you really mean that it should be a member of std::string? I'd
rather go for a namespace scope function:

std::wstring widen( const std::string &,
const std::locale & = std::locale() );

Kirit Sælensminde · Apr 27, 2007

std::wstring widen( const std::string &,
const std::locale & = std::locale() );

If you were generalising this I think it would need two locales. One
the std::string is in and one the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.

K

James Kanze · Apr 27, 2007

Gennaro Prota said:

Do you really mean that it should be a member of std::string? I'd
rather go for a namespace scope function:

std::wstring widen( const std::string &,
const std::locale & = std::locale() );

Both ways can be made to work. I don't think it really changes
the issues. We have two classes, std::string and std::wstring,
which really require a slightly different interface. And of
course, in practice, you can't really instantiate basic_string
for anything else, and expect it to work.

There's also an interesting question: if std::string is supposed
to represent text, shouldn't it know in what encoding it is?
(This would very strongly argue for a member, of course.) But
of course, if std::string is supposed to represent text, we also
get a number of awkward questions with regards to multi-byte
characters.
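
One of those awkward questions shows up as soon as a std::string holds multi-byte text: size() reports bytes, not characters. A small sketch, assuming the string contains well-formed UTF-8 (the function name is hypothetical, and the std::string itself carries no record of its encoding):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string that is assumed
// to contain well-formed UTF-8. No validation is performed.
std::size_t count_utf8_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        // Continuation bytes have the form 10xxxxxx; every other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the two-byte UTF-8 encoding of "æ" ("\xC3\xA6"), size() is 2 while this function counts one code point.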

James Kanze · Apr 27, 2007

If you were generalising this I think it would need two locales. One
the std::string is in and one the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.

The design of locale doesn't allow for this, and to be truthful,
I don't really see how it could. You'd need some way of
creating a codecvt facet on the fly, from the two different
locales.

The design of locale does permit different encodings for
wchar_t, of course. But you'll need n×m different locales, for
n encodings of wchar_t and m encodings of char, in order to make
it work.

If you look at the specification of codecvt, you'll see as
well that it is designed to always go to or from char. There is
a very pervasive underlying assumption that char is the only
external representation, and that conversion is between external
and internal.
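
The external/internal split is visible in the facet's interface: std::codecvt<wchar_t, char, std::mbstate_t>::in() reads external char data and writes internal wchar_t data. A rough sketch of using it directly, assuming the locale passed in describes the encoding of the narrow string (widen_with_codecvt is a hypothetical name):

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts external (char) text to internal
// (wchar_t) text with the codecvt facet of the given locale.
std::wstring widen_with_codecvt(const std::string& external,
                                const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (external.empty())
        return std::wstring();

    // Assumption: the encoding never produces more wide characters
    // than input bytes, which holds for the usual encodings.
    std::wstring internal(external.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from = external.data();
    const char* from_next = 0;
    wchar_t* to = &internal[0];
    wchar_t* to_next = 0;

    const std::codecvt_base::result result =
        cvt.in(state, from, from + external.size(), from_next,
               to, to + internal.size(), to_next);

    if (result != std::codecvt_base::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    // Keep only the wide characters that were actually written.
    internal.resize(static_cast<std::size_t>(to_next - to));
    return internal;
}
```

Note that there is only one locale in the picture: the facet fixes both the char encoding and the wchar_t encoding together, which is why a conversion between two independently chosen locales has no natural place in this design.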

(Note that I think your point is well taken. I just don't think
that there is a good practical answer to it at present.)

Gennaro Prota · Apr 27, 2007

Both ways can be made to work. I don't think it really changes
the issues. We have two classes, std::string and std::wstring,
which really require a slightly different interface. And of
course, in practice, you can't really instantiate basic_string
for anything else, and expect it to work.

There's also an interesting question: if std::string is supposed
to represent text, shouldn't it know in what encoding it is?

Yep, I guess so :-( So we should ask: what does std::string really
represent? It seems to me that the answer is: a sequence of small

(This would very strongly argue for a member, of course.) But
of course, if std::string is supposed to represent text, we also
get a number of awkward questions with regards to multi-byte
characters.

IIUC, the issue is that we have no abstractions for "character" and
"encoding". The expression "multi-byte character" is a misnomer too:
in fact there's a character, and several possible encodings of it,
some of which require multiple bytes. Now, is this arguing for a
generic CharT again?

Gennaro Prota · Apr 27, 2007

If you were generalising this I think it would need two locales. One
the std::string is in and one the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.

Not that I disagree with your general point but... the locale
parameter was actually for the wstring (from what you say I'm under
the impression you are assuming it is for the source string, and that
another one would be needed for the destination). Basically the idea
was:

// warning: uncompiled code
#include <locale>
#include <string>

std::wstring to_wstring( const std::string & source,
                         const std::locale & loc = std::locale() )
{
    typedef std::ctype< wchar_t > ctype;
    typedef std::string::size_type size_type;

    const size_type len( source.length() );
    std::wstring dest( len, wchar_t() );

    // ctype::widen maps each char to exactly one wchar_t
    const ctype & ct( std::use_facet< ctype >( loc ) );
    for( size_type i( 0 ); i < len; ++i ) {
        std::wstring::traits_type
            ::assign( dest[ i ], ct.widen( source[ i ] ) );
    }

    return dest;
}

Not terribly useful, I'm afraid.
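
Part of the limitation is that ctype::widen maps one char to exactly one wchar_t, so it cannot decode a multibyte narrow encoding. For what it does do, the facet also has a range overload that replaces the explicit loop. A sketch along the same lines (widen_range is a hypothetical name):

```cpp
#include <locale>
#include <string>

// Hypothetical helper: same idea as the loop version, but using the
// range overload widen(begin, end, destination) of std::ctype<wchar_t>.
std::wstring widen_range(const std::string& source,
                         const std::locale& loc = std::locale())
{
    std::wstring dest(source.size(), wchar_t());
    if (!source.empty()) {
        const std::ctype<wchar_t>& ct =
            std::use_facet<std::ctype<wchar_t> >(loc);
        // Widens every char in [begin, end) into the destination buffer,
        // one wchar_t per char.
        ct.widen(source.data(), source.data() + source.size(), &dest[0]);
    }
    return dest;
}
```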

Kirit Sælensminde · Apr 27, 2007

Not that I disagree with your general point but... the locale
parameter was actually for the wstring (from what you say I'm under
the impression you are assuming it is for the source string, and that
another one would be needed for the destination).

That was kind of what I was thinking and you and James have both
pointed out it doesn't work with the current locale system in C++.

I don't know the locale stuff well enough, but conceptually it needs
to go through some all-encompassing encoding and then back out. Or at
least that's a little more practical than a full conversion matrix. I
think for many people Unicode could be the all-encompassing encoding,
but I'm aware that it won't do so for everybody.
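
The pivot idea can be sketched with Unicode code points in the middle: each encoding needs only one decoder and one encoder, instead of a converter for every pair. The two functions below are hypothetical illustrations covering ISO-8859-1 input and UTF-8 output, not part of any library discussed here.

```cpp
#include <string>

// Decoder: ISO-8859-1 bytes map directly to the first 256 code points.
std::u32string decode_latin1(const std::string& latin1)
{
    std::u32string code_points;
    code_points.reserve(latin1.size());
    for (unsigned char byte : latin1)
        code_points.push_back(static_cast<char32_t>(byte));
    return code_points;
}

// Encoder: code points to UTF-8. For brevity this sketch does not
// reject surrogates or values above U+10FFFF.
std::string encode_utf8(const std::u32string& code_points)
{
    std::string out;
    for (char32_t cp : code_points) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Converting Latin-1 to UTF-8 is then encode_utf8(decode_latin1(text)). Supporting another source encoding means adding one more decoder, not one converter per target.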

It's a tricky problem all right. It's one reason that I generally
refuse to work with systems which don't have at least UTF-16. There
are still subtle problems with most of them (they tend to count UTF-16
code units rather than actual characters), but it's less likely to
cause a problem in practice than UTF-8.
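
The counting problem can be shown in a few lines: a code point outside the Basic Multilingual Plane occupies two UTF-16 code units (a surrogate pair), so the length of the string and the number of code points differ. A sketch, assuming well-formed UTF-16 (the function name is hypothetical, and a code point is still not always what a user would call a character):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in well-formed UTF-16.
std::size_t count_utf16_code_points(const std::u16string& utf16)
{
    std::size_t count = 0;
    for (char16_t unit : utf16) {
        // A low (trailing) surrogate is the second half of a pair that
        // was already counted, so only other code units are counted.
        if (unit < 0xDC00 || unit > 0xDFFF)
            ++count;
    }
    return count;
}
```

For the string u"a\U0001F600", size() returns 3 code units while this function counts 2 code points.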

I can't see it as anything other than a desperately hard problem with
no universal solution. The one thing with that, though, is that it
seems to have slowed progress on Unicode which, although not perfect,
does at least give a pretty practical solution for most uses.

K
