Some basic questions

slot · Aug 3, 2004

Is there any problem with using "std::string" for Unicode strings?

When using std::string, does it have to be initialized? Is the following
code OK?

string s;
S = "This is a test string";

Is it OK to have a container (for example, a list) as a global object
whose size grows at run time?

Thanks!

Victor Bazarov · Aug 3, 2004

slot said:
Is there any problem with using "std::string" for Unicode strings?

There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.

When using std::string, does it have to be initialized?

If you don't provide an initialiser, the string will be default-
initialised, which means that it will be empty.

Is the following
code OK?

string s;
S = "This is a test string";

No, it's not OK. C++ is case-sensitive. If you declare a variable
called 's', assigning to 'S' later doesn't make sense (unless you also
declared a variable with the name 'S').
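
A minimal sketch of the two points above, assuming the usual <string> header; the function names are hypothetical and exist only for illustration. The first function shows that a string declared without an initialiser starts out empty, the second shows the assignment written against the name that was actually declared.

```cpp
#include <string>

// Hypothetical helper: returns a string declared without an initialiser.
std::string make_default() {
    std::string s;                  // default-initialised, so it is empty
    return s;
}

// Hypothetical helper: declares 's' and assigns to the same lower-case name.
std::string make_assigned() {
    std::string s;
    s = "This is a test string";    // 's', not 'S': C++ is case-sensitive
    return s;
}
```

Calling make_default().empty() should yield true, and make_assigned() should compare equal to the literal it was assigned.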

Is it OK to have a container (for example, a list) as a global object
whose size grows at run time?

Yes, although in many cases global objects can and should be avoided.
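
As a rough illustration of a global container that grows while the program runs (the names g_values and grow are made up for this sketch, not taken from the question):

```cpp
#include <cstddef>
#include <list>

// Hypothetical namespace-scope (global) container; it starts out empty.
std::list<int> g_values;

// Hypothetical helper: appends n elements at run time and returns the new size.
std::size_t grow(int n) {
    for (int i = 0; i < n; ++i) {
        g_values.push_back(i);
    }
    return g_values.size();
}
```

The list is constructed before main() starts and manages its own memory, so repeated calls to grow() simply enlarge it. Passing the container to the functions that need it is usually the easier design to test.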

Victor

Alex Vinokur · Aug 3, 2004

Victor Bazarov said:
There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.

Here is a link to relevant discussion:
http://groups.google.com/[email protected]

[snip]

Pete Becker · Aug 3, 2004

Victor said:
There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.

Well, yes and no. (Sorry, Victor, I seem to be in the mood for quibbling
with you today <g>). A std::basic_string<wchar_t> works fine for Unicode
if wchar_t is large enough to hold individual Unicode characters. In
that case you only get zeroes if you put them there. If a wchar_t isn't
large enough you typically use one of the UTF-x encodings. The easiest
one to use portably is UTF-8, because you can always hold 8-bit
characters in a std::basic_string<char> (also known as std::string). A
UTF-8 encoded Unicode string only has zero values when you put them
there; the encoding doesn't generate spurious zeroes.

On the other hand, if you try to force 16-bit Unicode values into a
std::string by simply splitting each Unicode value into two 8-bit
pieces, you'll indeed get lots of zeroes. UTF-8 is designed to avoid
that, at the cost of sometimes requiring more than two bytes to encode a
16-bit value.
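
The difference can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be 16-bit code units held in unsigned short, surrogate pairs are not handled, and all three function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Naively splits each 16-bit value into two bytes, high byte first.
std::string split_bytes(const unsigned short* units, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        out += static_cast<char>(units[i] >> 8);
        out += static_cast<char>(units[i] & 0xFF);
    }
    return out;
}

// Encodes each 16-bit value as UTF-8 (one to three bytes per value).
// Surrogate pairs are deliberately not handled in this sketch.
std::string to_utf8(const unsigned short* units, std::size_t n) {
    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned c = units[i];
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Length as seen by C functions that stop at the first zero byte.
std::size_t visible_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For the three values 0x0041, 0x00E9 and 0x20AC, split_bytes() produces six bytes of which the very first is zero, so visible_length() reports 0. to_utf8() also produces six bytes (one, two and three bytes respectively), none of them zero, so the whole string survives a trip through c_str().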

Victor Bazarov · Aug 3, 2004

Pete said:
Victor said:

There may be. Unicode character sequences can contain many 0 chars,
and that can present a problem in using the 'c_str()' member function
if you expect it to return a zero-terminated c-string.

Well, yes and no. (Sorry, Victor, [...]

Don't worry 'bout it. I am in no sense an expert on Unicode and
sincerely hoped that somebody would correct me and expand on my so
amateurish attempt at an explanation...

V

Ioannis Vranos · Aug 3, 2004

slot said:
Is there any problem with using "std::string" for Unicode strings?

Yes, std::string is for chars. Try std::wstring instead (which is for
wchar_t); on most platforms it will work (on Windows it does).

When using std::string, does it have to be initialized? Is the following
code OK?

It is not required, if I understand your question correctly.

string s;
S = "This is a test string";

s="This is a test string";

and for wstring:

wstring s=L"This is a test string";

The L prefix marks a wide (wchar_t) string literal.

Is it OK to have a container (for example, a list) as a global object
whose size grows at run time?

Yes, but why make it global?

Regards,

Ioannis Vranos

http://www23.brinkster.com/noicys

Default User · Aug 3, 2004

Pete said:
Well, yes and no. (Sorry, Victor, I seem to be in the mood for quibbling
with you today <g>). A std::basic_string<wchar_t> works fine for Unicode
if wchar_t is large enough to hold individual Unicode characters. In
that case you only get zeroes if you put them there. If a wchar_t isn't
large enough you typically use one of the UTF-x encodings. The easiest
one to use portably is UTF-8, because you can always hold 8-bit
characters in a std::basic_string<char> (also known as std::string). A
UTF-8 encoded Unicode string only has zero values when you put them
there; the encoding doesn't generate spurious zeroes.

I've never used wide strings of any sort. If you have a wstring, what
does the call to c_str() do in the case where the string uses some
encoding like the ones you mention? Does it convert each wide character
into some equivalent that a char can hold?

Brian Rodenborn

Pete Becker · Aug 3, 2004

Default said:
I've never used wide strings of any sort. If you have a wstring, what
does the call to c_str() do in the case where the string uses some
encoding like the ones you mention? Does it convert each wide character
into some equivalent that a char can hold?

basic_string<wchar_t>::c_str() returns a const wchar_t* that points to a
null-terminated sequence of wchar_t. No conversions.
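
A small sketch of that, using std::wcslen from <cwchar>; the function name is hypothetical:

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: measures a wide string through its c_str() pointer.
std::size_t length_via_c_str(const std::wstring& s) {
    const wchar_t* p = s.c_str();   // wide and null-terminated; no conversion to char
    return std::wcslen(p);          // counts wchar_t elements up to the terminator
}
```

The pointer type is const wchar_t*, so it has to go to wide-character functions such as std::wcslen rather than to their char counterparts.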

Default User · Aug 3, 2004

Pete said:
basic_string<wchar_t>::c_str() returns a const wchar_t* that points to a
null-terminated sequence of wchar_t. No conversions.

Gotcha, thanks. Just curiosity on my part, as I said I've never used
wide chars.

Brian Rodenborn
