unicode

Guest · Feb 14, 2004

1. How can I have a Unicode string?
I see wstring, but I don't know whether it is Unicode:
typedef basic_string<wchar_t> wstring;
If not, can I use something like this?
typedef basic_string<short> unicode_string;

2. How can I load a unicode_string (as defined above) from a stream?
Which stream must I use instead of ifstream?
Can I use this?
typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

<off-topic>
3. Unicode text files (saved from Notepad, Word, etc.) have 2 bytes before any text:
the bytes FE and FF, which form the number FFFE or FEFF (I don't know the endianness).
What is this? Does C++ recognize these bytes?
</off-topic>

Thank you

Alf P. Steinbach · Feb 14, 2004

1. How can I have a Unicode string?
I see wstring, but I don't know whether it is Unicode:
typedef basic_string<wchar_t> wstring;
If not, can I use something like this?
typedef basic_string<short> unicode_string;

In theory C++ doesn't support Unicode.

In practice the wchar_t type can always be used for the old 16-bit Unicode
(as in Java and C#), that is, UCS-2.

For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
have portable code.
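
One way to "roll your own" can be sketched as follows, assuming a C++11 or later compiler (char32_t and std::u32string postdate this thread). It stores one code point per element and checks whether a value also fits the 16-bit UCS-2 range. The helper names are hypothetical. Note that basic_string<short> from the question is not portable, because the standard only guarantees char_traits specializations for the built-in character types.

```cpp
#include <string>

// One code point per element (assumes C++11 or later).
using unicode_string = std::u32string;

// Hypothetical helper: true if cp is a Unicode scalar value, that is,
// in the range 0..0x10FFFF and outside the surrogate range 0xD800..0xDFFF.
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical helper: true if cp can be stored in a single 16-bit unit,
// which is all that UCS-2 can represent.
bool fits_in_ucs2(char32_t cp) {
    return is_scalar_value(cp) && cp <= 0xFFFF;
}
```

For example, fits_in_ucs2(U'A') is true, while fits_in_ucs2(0x1F600) is false because that code point lies above 0xFFFF.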

2. How can I load a unicode_string (as defined above) from a stream?
Which stream must I use instead of ifstream?
Can I use this?
typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

Just try it out.

Be aware that some (all?) implementations convert wide characters to narrow
characters in their wide character stream implementations.

This is probably the part of C++ that you can rely on least with respect to
Unicode handling, so I'd recommend using binary input and output.
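
A minimal sketch of the binary-input approach suggested above: read the file as raw bytes, so that no wide-to-narrow conversion or newline translation takes place, and decode the bytes yourself afterwards. read_bytes is a hypothetical helper name.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes.
// std::ios::binary disables newline translation; reading through
// istreambuf_iterator bypasses formatted (locale-aware) extraction.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::vector<unsigned char>(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>());
}
```

The resulting vector holds exactly the bytes stored on disk, including any leading marker bytes.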

<off-topic>
3. Unicode text files (saved from Notepad, Word, etc.) have 2 bytes before any text:
the bytes FE and FF, which form the number FFFE or FEFF (I don't know the endianness).
What is this? Does C++ recognize these bytes?
</off-topic>

It indicates both that it is a Unicode file and the endianness used in that file.
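
These two bytes are the byte order mark (BOM), the code point U+FEFF written at the start of the file. C++ itself does nothing special with them, so a program has to check for them. A sketch, assuming the file's bytes are already in memory and the encoding is UTF-16; all names here are hypothetical.

```cpp
#include <cstddef>
#include <vector>

enum class ByteOrder { Unknown, LittleEndian, BigEndian };

// Hypothetical helper: inspect the first two bytes for a UTF-16 BOM.
// FF FE means little-endian, FE FF means big-endian.
// (A UTF-32 little-endian BOM also starts with FF FE; this sketch
// assumes the data is UTF-16 and does not check for that case.)
ByteOrder detect_utf16_bom(const std::vector<unsigned char>& bytes) {
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) return ByteOrder::LittleEndian;
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) return ByteOrder::BigEndian;
    }
    return ByteOrder::Unknown;
}

// Hypothetical helper: combine the two bytes at offset i into one
// 16-bit code unit. The caller must ensure i + 1 < bytes.size().
// Any order other than BigEndian is treated as little-endian.
unsigned read_unit(const std::vector<unsigned char>& bytes,
                   std::size_t i, ByteOrder order) {
    return order == ByteOrder::BigEndian
        ? (bytes[i] << 8) | bytes[i + 1]
        : (bytes[i + 1] << 8) | bytes[i];
}
```

After detecting the byte order, decoding would start at offset 2 so that the BOM itself is skipped.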

P.J. Plauger · Feb 14, 2004

See the on-line manual for our CoreX package. It describes the software
you need to read and write files of this sort and process them internally
as UNICODE-encoded wchar_t strings.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

John Harrison · Feb 14, 2004

Alf P. Steinbach said:
For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
have portable code.

Huh? How is 32 bit Unicode only 21 bit? Just curious.

john

Alf P. Steinbach · Feb 14, 2004

John Harrison said:
Huh? How is 32 bit Unicode only 21 bit? Just curious.

Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
encodings 32 bits is the nearest "de facto standard variable size".

Alf P. Steinbach · Feb 14, 2004

Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
encodings 32 bits is the nearest "de facto standard variable size".

Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.
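
To illustrate the 21-bit point: the largest code point is U+10FFFF, which needs 21 bits, and UTF-8 spends between one and four bytes on each code point. A sketch of such an encoder, assuming C++11 for char32_t; encode_utf8 is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF-8.
// Returns an empty string for surrogates and for values above 0x10FFFF.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not scalar values
    if (cp <= 0x7F) {                               // 7 bits: 1 byte
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                       // 11 bits: 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {                      // 16 bits: 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                    // 21 bits: 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A fixed 32-bit element wastes the upper 11 bits of every code point, which is the trade-off against the variable-length UTF-8 and UTF-16 forms.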

John Harrison · Feb 14, 2004

Alf P. Steinbach said:
Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.

So the Unicode organization have only defined codes up to 21 bits. Have they
committed themselves to this, or is this just how far they've got so far?
How many more of the world's scripts have they got to go?

john

P.J. Plauger · Feb 14, 2004

So the Unicode organization have only defined codes up to 21 bits. Have they
committed themselves to this,

Yes.

or is this just how far they've got so far?

Yes.

How many more of the world's scripts have they got to go?

Lots.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Jon Willeke · Feb 15, 2004

P.J. Plauger said:
See the on-line manual for our CoreX package. It describes the software
you need to read and write files of this sort and process them internally
as UNICODE-encoded wchar_t strings.

It would be useful for the standard to specify some codecvt facets,
especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
happen?

P.J. Plauger · Feb 15, 2004

It would be useful for the standard to specify some codecvt facets,
especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
happen?

The C and C++ Standards have so far been scrupulously character-set neutral.
This sort of thing tends to fall through the cracks, which is why we
produced CoreX.
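
For illustration, the conversion that a char UTF-8 to UCS-4 facet would perform can be sketched as a free function. This is a simplified assumption about what such a facet does, not CoreX's or any other library's actual code. It assumes C++11 (std::u32string), the name decode_utf8 is hypothetical, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into 32-bit code points.
// A bad lead byte or a truncated/invalid sequence yields U+FFFD.
// Simplified: overlong forms and encoded surrogates are not rejected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out += char32_t(0xFFFD); ++i; continue; }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out += ok ? cp : char32_t(0xFFFD);
    }
    return out;
}
```

A real facet would also have to handle input that arrives in pieces, with a multibyte sequence split across buffer boundaries, which this sketch ignores.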

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
