unicode

Guest

1. How can I have a Unicode string?
I see wstring, but I don't know whether it is Unicode:
typedef basic_string<wchar_t> wstring;
If not, can I use something like this?
typedef basic_string<short> unicode_string;

2. How can I load such a unicode_string from a stream?
Which stream must I use instead of ifstream?
Can I use this?
typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

<off-topic>
3. Unicode text files (saved from Notepad, Word, etc.) have 2 bytes before any text:
the bytes FE and FF, which form the number FFFE or FEFF (I don't know the endianness).
What is this? Does C++ recognize these characters?
</off-topic>


Thank you
 
Alf P. Steinbach

1. How can I have a Unicode string?
I see wstring, but I don't know whether it is Unicode:
typedef basic_string<wchar_t> wstring;
If not, can I use something like this?
typedef basic_string<short> unicode_string;

In theory C++ doesn't support Unicode.

In practice the wchar_t type can always be used for the old 16-bit Unicode
(as in Java and C#), that is, UCS-2.

For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
have portable code.


2. How can I load such a unicode_string from a stream?
Which stream must I use instead of ifstream?
Can I use this?
typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

Just try it out.

Be aware that some (all?) implementations convert wide characters to narrow
characters in their wide-character stream implementations.

This is probably the part of C++ that you can rely on the least with respect
to Unicode handling, so I'd recommend using binary input and output.
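
To illustrate the binary-input suggestion, here is a minimal sketch, not a definitive implementation. It assumes a C++11 or later compiler (for <cstdint>) and a file that holds 16-bit code units. The helper names read_bytes and to_code_units are made up for this example, and the caller has to say which byte order the file uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes. Binary mode keeps
// the stream from applying any character or newline conversion.
std::vector<unsigned char> read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::vector<unsigned char> bytes;
    if (!in)
        return bytes;  // empty result if the file cannot be opened
    std::istreambuf_iterator<char> it(in), end;
    for (; it != end; ++it)
        bytes.push_back(static_cast<unsigned char>(*it));
    return bytes;
}

// Hypothetical helper: combine byte pairs into 16-bit code units.
// A trailing odd byte, if any, is ignored.
std::vector<std::uint16_t> to_code_units(const std::vector<unsigned char>& bytes,
                                         bool big_endian)
{
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned hi = big_endian ? bytes[i] : bytes[i + 1];
        unsigned lo = big_endian ? bytes[i + 1] : bytes[i];
        units.push_back(static_cast<std::uint16_t>((hi << 8) | lo));
    }
    assert(units.size() == bytes.size() / 2);  // one unit per byte pair
    return units;
}
```

Calling to_code_units(read_bytes(path), false) would treat the file as little-endian. The result is a sequence of 16-bit code units, not yet a wstring, since the size of wchar_t differs between platforms.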


<off-topic>
3. Unicode text files (saved from Notepad, Word, etc.) have 2 bytes before any text:
the bytes FE and FF, which form the number FFFE or FEFF (I don't know the endianness).
What is this? Does C++ recognize these characters?
</off-topic>

It is the byte order mark (the character U+FEFF). It indicates both that it is
a Unicode file and the endianness used in that file.
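
As a rough sketch of how a program could check for those two bytes itself: U+FEFF written little-endian appears as the bytes FF FE, and written big-endian as FE FF. The enum and the function name detect_utf16_bom are invented for this example, and the sketch only looks at 16-bit files (a 32-bit little-endian file also starts with FF FE, which this code does not distinguish).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this example.
enum Bom { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE };

// Inspect the first two bytes of a buffer read in binary mode.
Bom detect_utf16_bom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);  // a null buffer must be empty
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE)
            return BOM_UTF16_LE;  // U+FEFF stored low byte first
        if (data[0] == 0xFE && data[1] == 0xFF)
            return BOM_UTF16_BE;  // U+FEFF stored high byte first
    }
    return BOM_NONE;  // no mark: byte order must be known some other way
}
```

If a mark is found, the text proper starts at offset 2.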
 
P.J. Plauger

See the on-line manual for our CoreX package. It describes the software
you need to read and write files of this sort and process them internally
as UNICODE-encoded wchar_t strings.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Alf P. Steinbach

Alf P. Steinbach said:
Huh? How is 32 bit Unicode only 21 bit? Just curious.

Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
encodings 32 bits is the nearest "de facto standard variable size".
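
To make the 21-bit point concrete, here is a small sketch of the variable-length 16-bit encoding (properly called UTF-16). The largest code point is 0x10FFFF, which needs 21 bits; anything above 0xFFFF is split into two 16-bit units, a surrogate pair. The names kMaxCodePoint and encode_utf16 are made up for this example, and a C++11 or later compiler is assumed for <cstdint>.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest code point Unicode defines; it fits in 21 bits.
const std::uint32_t kMaxCodePoint = 0x10FFFF;

// Hypothetical helper: encode one code point as one or two 16-bit units.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    assert(cp <= kMaxCodePoint);
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogate values are not characters
    std::vector<std::uint16_t> out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit unit.
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Subtracting 0x10000 leaves 20 bits: 10 go in each surrogate.
        std::uint32_t v = cp - 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, encode_utf16(0x41) gives the single unit 0x0041, while encode_utf16(0x10FFFF) gives the pair 0xDBFF 0xDFFF.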
 
John Harrison

Alf P. Steinbach said:
Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.

So the Unicode organization have only defined codes up to 21 bits. Have they
committed themselves to this, or is this just how far they've got so far?
How many more of the world's scripts have they got to go?

john
 
P.J. Plauger

So the Unicode organization have only defined codes up to 21 bits. Have they
committed themselves to this,
Yes.

or is this just how far they've got so far?
Yes.

How many more of the world's scripts have they got to go?

Lots.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
Jon Willeke

P.J. Plauger said:
See the on-line manual for our CoreX package. It describes the software
you need to read and write files of this sort and process them internally
as UNICODE-encoded wchar_t strings.

It would be useful for the standard to specify some codecvt facets,
especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
happen?
 
P.J. Plauger

It would be useful for the standard to specify some codecvt facets,
especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
happen?

The C and C++ Standards have so far been scrupulously character-set neutral.
This sort of thing tends to fall through the cracks, which is why we
produced CoreX.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
