Some basic questions

slot

Is there any problem using "std::string" with Unicode strings?

When using std::string, does it have to be initialized? Is the following
code OK?

string s;
S = "This is a test string";


Is it OK to have a container (for example, a list) as a global object whose
size and elements increase at run time?

Thanks!
 
Victor Bazarov

slot said:
Is there any problem using "std::string" with Unicode strings?

There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.
When using std::string, does it have to be initialized?

If you don't provide an initialiser, the string will be default-
initialised, which means that it will be empty.
Is the following
code OK?

string s;
S = "This is a test string";

No, it's not OK. C++ is case-sensitive. If you declare a variable
called 's', assigning to 'S' later doesn't make sense (unless you also
declared a variable with the name 'S').
Is it OK to have a container (for example, a list) as a global object whose
size and elements increase at run time?

Yes, although in many cases global objects can and should be avoided.

Victor
 
Pete Becker

Victor said:
There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.

Well, yes and no. (Sorry, Victor, I seem to be in the mood for quibbling
with you today <g>). A std::basic_string<wchar_t> works fine for Unicode
if wchar_t is large enough to hold individual Unicode characters. In
that case you only get zeroes if you put them there. If a wchar_t isn't
large enough you typically use one of the UTF-x encodings. The easiest
one to use portably is UTF-8, because you can always hold 8-bit
characters in a std::basic_string<char> (also known as std::string). A
UTF-8 encoded Unicode string only has zero values when you put them
there; the encoding doesn't generate spurious zeroes.

On the other hand, if you try to force 16-bit Unicode values into a
std::string by simply splitting each Unicode value into two 8-bit
pieces, you'll indeed get lots of zeroes. UTF-8 is designed to avoid
that, at the cost of sometimes requiring more than two bytes to encode a
16-bit value.
 
Victor Bazarov

Pete said:
Victor said:
There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.


Well, yes and no. (Sorry, Victor, [...]

Don't worry 'bout it. I am in no sense an expert on Unicode and
sincerely hoped that somebody would correct me and expand on my so
amateurish attempt at an explanation...

V
 
Ioannis Vranos

slot said:
Is there any problem using "std::string" with Unicode strings?


Yes. std::string is for chars. Try std::wstring instead (which is for
wchar_t); on most platforms it will work (on Windows it does).


When using std::string, does it have to be initialized? Is the following
code OK?



It is not required if I understand your question correctly.

string s;
S = "This is a test string";


s="This is a test string";



and for wstring:


wstring s=L"This is a test string";



The L prefix makes it a wide string literal, that is, an array of wchar_t.


Is it OK to have a container (for example, a list) as a global object whose
size and elements increase at run time?


Yes, but why make it global?






Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys
 
Default User

Pete said:
Well, yes and no. (Sorry, Victor, I seem to be in the mood for quibbling
with you today <g>). A std::basic_string<wchar_t> works fine for Unicode
if wchar_t is large enough to hold individual Unicode characters. In
that case you only get zeroes if you put them there. If a wchar_t isn't
large enough you typically use one of the UTF-x encodings. The easiest
one to use portably is UTF-8, because you can always hold 8-bit
characters in a std::basic_string<char> (also known as std::string). A
UTF-8 encoded Unicode string only has zero values when you put them
there; the encoding doesn't generate spurious zeroes.


I've never used wide strings of any sort. If you have a wstring, what
does the call to c_str() do when the string uses an encoding like the
ones you mention? Does it convert each wide character into some
equivalent that a char can hold?




Brian Rodenborn
 
Pete Becker

Default said:
I've never used wide strings of any sort. If you have a wstring, what
does the call to c_str() do when the string uses an encoding like the
ones you mention? Does it convert each wide character into some
equivalent that a char can hold?

basic_string<wchar_t>::c_str() returns a const wchar_t* that points to a
null-terminated sequence of wchar_t. No conversions.
 
Default User

Pete said:
basic_string<wchar_t>::c_str() returns a const wchar_t* that points to a
null-terminated sequence of wchar_t. No conversions.


Gotcha, thanks. Just curiosity on my part, as I said I've never used
wide chars.




Brian Rodenborn
 
