string to wstring problems

v4vijayakumar

1. Why is the following program not working as expected?

#include <iostream>
using namespace std;

int main()
{
string t("test");
wcout << (wchar_t *) t.c_str() << endl;
wcout << t.c_str() << endl;

wstring t2 = (wchar_t *) t.c_str();
wcout << t2.c_str() << endl;

return 0;
}


2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?
 
Bjoern Doebel

v4vijayakumar said:
1. Why is the following program not working as expected?

Depends on what your expectations are.
#include <iostream>
using namespace std;

int main()
{
string t("test");
wcout << (wchar_t *) t.c_str() << endl;
wcout << t.c_str() << endl;

wstring t2 = (wchar_t *) t.c_str();
wcout << t2.c_str() << endl;

return 0;
}


2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?

Try to #include <cstdlib> and use mbstowcs(). (That's what Google tells me...)

Bjoern
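
For reference, the mbstowcs() suggestion could look roughly like the sketch below. The helper name widen_with_mbstowcs is made up for illustration and is not from the thread; the conversion depends on the current C locale (as set with std::setlocale), and the function stops at the first embedded null character.

```cpp
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::mbstowcs
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): converts a multibyte string to
// a wide string according to the current C locale.
// Returns an empty string if the input contains an invalid sequence.
std::wstring widen_with_mbstowcs(const std::string& source)
{
    // The wide result never has more elements than the source has bytes;
    // one extra slot leaves room for the terminating null.
    std::vector<wchar_t> buffer(source.size() + 1);

    const std::size_t written =
        std::mbstowcs(buffer.data(), source.c_str(), buffer.size());

    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();   // invalid multibyte sequence
    }
    return std::wstring(buffer.data(), written);
}
```

With that in place, the original program could print the text with something like wcout << widen_with_mbstowcs(t) << endl; instead of casting the pointer.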
 
James Kanze

1. Why is the following program not working as expected?

What do you expect?
#include <iostream>
using namespace std;
int main()
{
string t("test");
wcout << (wchar_t *) t.c_str() << endl;

You're lying to the compiler. That's generally a good way of
getting into trouble. The address returned by t.c_str() does
NOT point to wchar_t objects.
wcout << t.c_str() << endl;
wstring t2 = (wchar_t *) t.c_str();

More lies.
wcout << t2.c_str() << endl;

return 0;

}
2. It is acceptable that there is no conversion from wstring to
string, but why is there no conversion (wstring::wstring(string)) from
string to wstring?

Because for some stupid reason, they're both instantiations of a
template, and there is no generic solution.

Also, of course, because the conversion would have to be locale
specific. (But that doesn't explain why there is no
string::toWString( locale const& ) function. That has to be
chalked up to the design error of making std::string a
template.)
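
To make the point about the casts concrete: the cast only reinterprets the bytes of the char buffer, whereas what is presumably wanted is to convert each char by value. A minimal sketch follows, under the loud assumption that the input holds only 7-bit ASCII; for anything else the result depends on the encodings involved, which is exactly the locale problem described above. The name widen_ascii is hypothetical.

```cpp
#include <string>

// Hypothetical helper (not from the thread): widens each char by value.
// Assumption: source contains only 7-bit ASCII characters.
std::wstring widen_ascii(const std::string& source)
{
    std::wstring result;
    result.reserve(source.size());
    for (char c : source) {
        // Go through unsigned char so that bytes >= 0x80 are not
        // sign-extended on platforms where plain char is signed.
        result.push_back(
            static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return result;
}
```

Each element of the result is a real wchar_t object, so passing result.c_str() to wcout is no longer a lie to the compiler.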
 
Gennaro Prota

Because for some stupid reason, they're both instantiations of a
template, and there is no generic solution.

Also, of course, because the conversion would have to be locale
specific. (But that doesn't explain why there is no
string::toWString( locale const& ) function. That has to be
chalked up to the design error of making std::string a
template.)

Do you really mean that it should be a member of std::string? I'd
rather go for a namespace scope function:

std::wstring widen( const std::string &,
const std::locale & = std::locale() );
 
Kirit Sælensminde

std::wstring widen( const std::string &,
const std::locale & = std::locale() );

If you were generalising this, I think it would need two locales: one
that the std::string is in and one that the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.


K
 
James Kanze

Gennaro said:
On 26 Apr 2007 06:12:41 -0700, James Kanze wrote:
Do you really mean that it should be a member of std::string? I'd
rather go for a namespace scope function:
std::wstring widen( const std::string &,
const std::locale & = std::locale() );

Both ways can be made to work. I don't think it really changes
the issues. We have two classes, std::string and std::wstring,
which really require a slightly different interface. And of
course, in practice, you can't really instantiate basic_string
for anything else, and expect it to work.

There's also an interesting question: if std::string is supposed
to represent text, shouldn't it know in what encoding it is?
(This would very strongly argue for a member, of course.) But
of course, if std::string is supposed to represent text, we also
get a number of awkward questions with regards to multi-byte
characters.
 
James Kanze

If you were generalising this, I think it would need two locales: one
that the std::string is in and one that the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.

The design of locale doesn't allow for this, and to be truthful,
I don't really see how it could. You'd need some way of
creating a codecvt facet on the fly, from the two different
locales.

The design of locale does permit different encodings for
wchar_t, of course. But you'll need n x m different locales, for
n encodings of wchar_t and m encodings of char, in order to make
it work.

If you'll look at the specifications of codecvt, you'll see as
well that it is designed to always go to or from char. There is
a very pervasive underlying assumption that char is the only
external representation, and that conversion is between external
and internal.
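
As an illustration of that external-to-internal direction, here is a rough sketch that drives the codecvt<wchar_t, char, mbstate_t> facet of a locale directly through its in() member. The helper name widen_with_codecvt is invented for this example, and the error handling (returning an empty string on anything but ok) is a simplification, not a recommendation.

```cpp
#include <cstddef>   // std::size_t
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper (not from the thread): converts external char data to
// internal wchar_t data using the codecvt facet of the given locale.
std::wstring widen_with_codecvt(const std::string& source,
                                const std::locale& loc = std::locale())
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    if (source.empty()) {
        return std::wstring();
    }

    // The wide result never needs more elements than the source has bytes.
    std::wstring dest(source.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const std::codecvt_base::result result = facet.in(
        state,
        source.data(), source.data() + source.size(), from_next,
        &dest[0], &dest[0] + dest.size(), to_next);

    if (result != std::codecvt_base::ok) {
        return std::wstring();   // simplification: treat anything else as failure
    }

    // Trim to the number of wide characters actually produced.
    dest.resize(static_cast<std::size_t>(to_next - &dest[0]));
    return dest;
}
```

Note that there is only one locale argument: the facet fixes both the char encoding and the wchar_t encoding at once, which is the n x m situation described above.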

(Note that I think your point is well taken. I just don't think
that there is a good practical answer to it at present.)
 
Gennaro Prota

Both ways can be made to work. I don't think it really changes
the issues. We have two classes, std::string and std::wstring,
which really require a slightly different interface. And of
course, in practice, you can't really instantiate basic_string
for anything else, and expect it to work.

There's also an interesting question: if std::string is supposed
to represent text, shouldn't it know in what encoding it is?

Yep, I guess so :-( So we should ask: what does std::string really
represent? It seems to me that the answer is: a sequence of small
except said:
(This would very strongly argue for a member, of course.) But
of course, if std::string is supposed to represent text, we also
get a number of awkward questions with regards to multi-byte
characters.

IIUC, the issue is that we have no abstractions for "character" and
"encoding". The expression "multi-byte character" is a misnomer too:
in fact there's a character, and several possible encodings of it,
some of which require multiple bytes. Now, is this arguing for a
generic CharT again? :)
 
Gennaro Prota

If you were generalising this, I think it would need two locales: one
that the std::string is in and one that the std::wstring should be in.
There's no guarantee that the std::wstring is going to be some form of
Unicode string.

Not that I disagree with your general point but... the locale
parameter was actually for the wstring (from what you say I'm under
the impression you are assuming it is for the source string, and that
another one would be needed for the destination). Basically the idea
was:

// warning: uncompiled code
#include <locale>
#include <string>

// Widens each char of source using the ctype<wchar_t> facet of loc.
std::wstring to_wstring( const std::string & source,
                         const std::locale & loc = std::locale() )
{
    typedef std::ctype< wchar_t > ctype;
    typedef std::string::size_type size_type;

    const size_type len( source.length() );
    std::wstring dest( len, wchar_t() );

    // widen() maps one char to one wchar_t; it does not decode
    // multi-byte sequences.
    const ctype & ct( std::use_facet< ctype >( loc ) );
    for( size_type i( 0 ); i < len; ++i ) {
        std::wstring::traits_type
            ::assign( dest[ i ], ct.widen( source[ i ] ) );
    }

    return dest;
}

Not terribly useful, I'm afraid.
 
Kirit Sælensminde

Not that I disagree with your general point but... the locale
parameter was actually for the wstring (from what you say I'm under
the impression you are assuming it is for the source string, and that
another one would be needed for the destination).

That was kind of what I was thinking and you and James have both
pointed out it doesn't work with the current locale system in C++.

I don't know the locale stuff well enough, but conceptually it needs
to go through some all-encompassing encoding and then back out. Or at
least that's a little more practical than a full conversion matrix. I
think for many people Unicode could be the all-encompassing encoding,
but I'm aware that it won't do so for everybody.

It's a tricky problem all right. It's one reason that I generally
refuse to work with systems which don't have at least UTF-16. There
are still subtle problems with most of them (they tend to count UTF-16
code units rather than actual characters), but it's less likely to
cause a problem in practice than UTF-8.
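
To show what the counting problem looks like, here is a small sketch using std::u16string (C++11, so newer than this thread). The helper name count_code_points is made up, it assumes well-formed UTF-16, and a code point is still not always what a user would call a character.

```cpp
#include <cstddef>   // std::size_t
#include <string>

// Hypothetical helper (not from the thread): counts Unicode code points in a
// UTF-16 string. Assumption: the input is well-formed UTF-16.
std::size_t count_code_points(const std::u16string& text)
{
    std::size_t count = 0;
    for (char16_t unit : text) {
        // A low (trailing) surrogate, 0xDC00..0xDFFF, completes the code
        // point started by the preceding high surrogate, so skip it.
        const bool is_low_surrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        if (!is_low_surrogate) {
            ++count;
        }
    }
    return count;
}
```

For the string u"a\U0001F600", size() reports 3 code units while count_code_points reports 2, because U+1F600 is stored as a surrogate pair.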

I can't see it as any other than a desperately hard problem with no
universal solution. The one thing with that though is it seems to have
slowed progress on Unicode which, although not perfect, does at least
give a pretty practical solution for most uses.


K
 
