unicode

Discussion in 'C++' started by Guest, Feb 14, 2004.

  1. Guest

    1. How can I have a Unicode string?
    I see wstring but I don't know if it is Unicode:
    typedef basic_string<wchar_t> wstring;
    If not, can I use something like this?
    typedef basic_string<short> unicode_string;

    2. How can I load a unicode_string (as defined above) from a stream?
    Which stream must I use instead of ifstream?
    Can I use this?
    typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

    <off-topic>
    3. Unicode text files (saved from Notepad, Word, etc.) have 2 bytes before any text:
    the bytes FE and FF, which form the number FFFE or FEFF (I don't know the endianness).
    What is this? Does C++ recognize these characters?
    </off-topic>


    Thank you
     
    Guest, Feb 14, 2004
    #1

  2. On Sat, 14 Feb 2004 17:25:55 +0200, "<- Chameleon ->" <> wrote:

    >1. how can I have a unicode string?
    >I see wstring but I dont know if it is unicode
    >typedef basic_string<wchar_t> wstring;
    >if not can I use something like: ?
    >typedef basic_string<short> unicode_string;


    In theory C++ doesn't support Unicode.

    In practice the wchar_t type can always be used for old 16-bit Unicode
    (as in Java and C#), that is, UCS-2.

    For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
    have portable code.
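
    For readers arriving later: C++11, which postdates this thread, added
    char16_t, char32_t, std::u16string and std::u32string, so a 32-bit string
    type no longer has to be hand-rolled. The sketch below shows why 16 bits
    per character is not enough for all of Unicode. The helper name to_utf16
    is made up for illustration, and it assumes its input is a valid code point.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-16. Assumes cp is a valid
// scalar value (at most 0x10FFFF and not in the surrogate range D800..DFFF).
inline std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);                    // one 16-bit unit
    } else {
        const char32_t v = cp - 0x10000;                     // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}

// Usage sketch: U+1F600 is one element of a u32string but two UTF-16 units.
inline void utf16_example() {
    const std::u32string s(1, char32_t(0x1F600));
    assert(s.size() == 1);
    const std::u16string w = to_utf16(s[0]);
    assert(w.size() == 2 && w[0] == 0xD83D && w[1] == 0xDE00);
}
```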



    >2. how can I load a (previous) unicode_string from a stream?
    >What stream I must use instead of ifstream?
    >Can I use this: ?
    >typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;


    Just try it out.

    Be aware that some (all?) implementations convert wide characters to narrow
    characters in their wide-character stream implementations.

    This is probably the part of C++ that you can rely on least for Unicode
    handling, so I'd recommend using binary input and output.
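
    A minimal sketch of the binary approach, assuming the file holds 16-bit
    code units and the caller already knows the byte order. The names
    read_all_bytes and to_code_units are hypothetical, and any leading
    byte order mark is left in the result for the caller to deal with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes. Binary mode keeps the
// stream from translating newlines; an unreadable file yields an empty string.
inline std::string read_all_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: pair up bytes into 16-bit code units using the byte
// order the caller states. A trailing odd byte is ignored.
inline std::vector<std::uint16_t> to_code_units(const std::string& bytes,
                                                bool big_endian) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned b0 = static_cast<unsigned char>(bytes[i]);
        const unsigned b1 = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = big_endian ? ((b0 << 8) | b1) : ((b1 << 8) | b0);
        units.push_back(static_cast<std::uint16_t>(unit));
    }
    return units;
}

// Usage sketch: "AB" stored as little-endian 16-bit units.
inline void code_units_example() {
    const std::vector<std::uint16_t> u =
        to_code_units(std::string("\x41\x00\x42\x00", 4), false);
    assert(u.size() == 2 && u[0] == 0x41 && u[1] == 0x42);
}
```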



    ><off-topic>
    >3. unicode text files (saved from notepad, word, etc) have 2 bytes before any text:
    >characters FE && FF which indicates 2 numbers FFFE or FEFF (I dont know endianess)
    >What is this? C++ recognize these characters?
    ></off-topic>


    It is the byte order mark (BOM), the code point U+FEFF written at the start
    of the file. It indicates both that it is a Unicode file and the endianness
    used in that file.
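
    The C++ streams of the time do nothing special with these two bytes, so
    the program has to check for them itself. A small sketch, assuming the
    start of the file has been read in binary mode; Utf16Bom and
    detect_utf16_bom are names invented for this example. Note that FF FE can
    also be the start of a UTF-32 little-endian mark (FF FE 00 00), which this
    sketch does not distinguish.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration; not taken from any library.
enum class Utf16Bom { None, BigEndian, LittleEndian };

// Inspect the first two bytes of a buffer read in binary mode.
// U+FEFF stored big-endian is FE FF; stored little-endian it is FF FE.
inline Utf16Bom detect_utf16_bom(const std::string& bytes) {
    if (bytes.size() < 2) return Utf16Bom::None;
    const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
    const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
    if (b0 == 0xFE && b1 == 0xFF) return Utf16Bom::BigEndian;
    if (b0 == 0xFF && b1 == 0xFE) return Utf16Bom::LittleEndian;
    return Utf16Bom::None;
}

// Usage sketch: a file that starts with FF FE is little-endian.
inline void bom_example() {
    assert(detect_utf16_bom(std::string("\xFF\xFE", 2)) == Utf16Bom::LittleEndian);
}
```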
     
    Alf P. Steinbach, Feb 14, 2004
    #2

  3. P.J. Plauger

    "<- Chameleon ->" <> wrote in message
    news:c0leno$sdl$...

    > 1. how can I have a unicode string?
    > I see wstring but I dont know if it is unicode
    > typedef basic_string<wchar_t> wstring;
    > if not can I use something like: ?
    > typedef basic_string<short> unicode_string;
    >
    > 2. how can I load a (previous) unicode_string from a stream?
    > What stream I must use instead of ifstream?
    > Can I use this: ?
    > typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;
    >
    > <off-topic>
    > 3. unicode text files (saved from notepad, word, etc) have 2 bytes before any text:
    > characters FE && FF which indicates 2 numbers FFFE or FEFF (I dont know endianess)
    > What is this? C++ recognize these characters?
    > </off-topic>


    See the on-line manual for our CoreX package. It describes the software
    you need to read and write files of this sort and process them internally
    as UNICODE-encoded wchar_t strings.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
     
    P.J. Plauger, Feb 14, 2004
    #3
  4. [OT] Re: unicode

    "Alf P. Steinbach" <> wrote in message
    news:...
    > On Sat, 14 Feb 2004 17:25:55 +0200, "<- Chameleon ->" <> wrote:
    >
    >
    > For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
    > have portable code.
    >


    Huh? How is 32 bit Unicode only 21 bit? Just curious.

    john
     
    John Harrison, Feb 14, 2004
    #4
  5. Re: [OT] Re: unicode

    On Sat, 14 Feb 2004 16:38:02 -0000, "John Harrison" <> wrote:

    >
    >"Alf P. Steinbach" <> wrote in message
    >news:...
    >> On Sat, 14 Feb 2004 17:25:55 +0200, "<- Chameleon ->" <> wrote:
    >>
    >>
    >> For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
    >> have portable code.
    >>

    >
    >Huh? How is 32 bit Unicode only 21 bit? Just curious.


    Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
    encodings 32 bits is the nearest "de facto standard variable size".
     
    Alf P. Steinbach, Feb 14, 2004
    #5
  6. Re: [OT] Re: unicode

    On Sat, 14 Feb 2004 16:52:40 GMT, (Alf P. Steinbach) wrote:

    >On Sat, 14 Feb 2004 16:38:02 -0000, "John Harrison" <> wrote:
    >
    >>
    >>"Alf P. Steinbach" <> wrote in message
    >>news:...
    >>> On Sat, 14 Feb 2004 17:25:55 +0200, "<- Chameleon ->" <> wrote:
    >>>
    >>>
    >>> For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
    >>> have portable code.
    >>>

    >>
    >>Huh? How is 32 bit Unicode only 21 bit? Just curious.

    >
    >Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
    >encodings 32 bits is the nearest "de facto standard variable size".


    Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.
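
    To make the numbers concrete: the highest Unicode code point is U+10FFFF,
    which needs 21 bits, and UTF-8 spends between one and four bytes per code
    point to cover that range. The following is only a sketch; encode_utf8 is
    a made-up helper name, not a standard function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-8. Returns an empty string
// for surrogates (D800..DFFF) and for values above 0x10FFFF.
inline std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;
    if (cp < 0x80) {                                   // 7 bits, 1 byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                           // 11 bits, 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                         // 16 bits, 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                       // 21 bits, 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The highest code point is below 2^21 but not below 2^20, hence "21-bit".
static_assert(0x10FFFF < (1L << 21) && 0x10FFFF >= (1L << 20), "21 bits");

// Usage sketch with two known encodings.
inline void utf8_example() {
    assert(encode_utf8(0x20AC) == "\xE2\x82\xAC");        // euro sign, 3 bytes
    assert(encode_utf8(0x10FFFF) == "\xF4\x8F\xBF\xBF");  // highest code point
}
```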
     
    Alf P. Steinbach, Feb 14, 2004
    #6
  7. Re: [OT] Re: unicode

    "Alf P. Steinbach" <> wrote in message
    news:...
    > On Sat, 14 Feb 2004 16:52:40 GMT, (Alf P. Steinbach) wrote:
    >
    > >On Sat, 14 Feb 2004 16:38:02 -0000, "John Harrison" <> wrote:
    > >
    > >>
    > >>"Alf P. Steinbach" <> wrote in message
    > >>news:...
    > >>> On Sat, 14 Feb 2004 17:25:55 +0200, "<- Chameleon ->" <> wrote:
    > >>>
    > >>>
    > >>> For 32-bit Unicode (that is, 21-bit...) you'll have to roll your own to
    > >>> have portable code.
    > >>>
    > >>
    > >>Huh? How is 32 bit Unicode only 21 bit? Just curious.

    > >
    > >Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
    > >encodings 32 bits is the nearest "de facto standard variable size".

    >
    > Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.
    >


    So the Unicode organization have only defined codes up to 21 bits. Have they
    committed themselves to this, or is this just how far they've got so far?
    How many more of the world's scripts have they got to go?

    john
     
    John Harrison, Feb 14, 2004
    #7
  8. P.J. Plauger

    Re: [OT] Re: unicode

    "John Harrison" <> wrote in message
    news:c0lmvu$1957ui$-berlin.de...

    > > >>Huh? How is 32 bit Unicode only 21 bit? Just curious.
    > > >
    > > >Well, it's 21-bit, but unless you go for UCS-16 or UCS-8 variable length
    > > >encodings 32 bits is the nearest "de facto standard variable size".

    > >
    > > Sorry. I'm sick and so not thinking clearly. Should be _UTF_, not UCS.
    > >

    >
    > So the Unicode organization have only defined codes up to 21 bits. Have they
    > committed themselves to this,


    Yes.

    > or is this just how far they've got so far?


    Yes.

    > How many more of the world's scripts have they got to go?


    Lots.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
     
    P.J. Plauger, Feb 14, 2004
    #8
  9. Jon Willeke

    P.J. Plauger wrote:
    > "<- Chameleon ->" <> wrote in message
    > news:c0leno$sdl$...
    >
    >>2. how can I load a (previous) unicode_string from a stream?
    >>What stream I must use instead of ifstream?
    >>Can I use this: ?
    >>typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

    >
    > See the on-line manual for our CoreX package. It describes the software
    > you need to read and write files of this sort and process them internally
    > as UNICODE-encoded wchar_t strings.


    It would be useful for the standard to specify some codecvt facets,
    especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
    happen?
     
    Jon Willeke, Feb 15, 2004
    #9
  10. P.J. Plauger

    "Jon Willeke" <> wrote in message
    news:89NXb.51053$...

    > P.J. Plauger wrote:
    > > "<- Chameleon ->" <> wrote in message
    > > news:c0leno$sdl$...
    > >
    > >>2. how can I load a (previous) unicode_string from a stream?
    > >>What stream I must use instead of ifstream?
    > >>Can I use this: ?
    > >>typedef basic_ifstream<wchar_t, char_traits<wchar_t> > unicode_ifstream;

    > >
    > > See the on-line manual for our CoreX package. It describes the software
    > > you need to read and write files of this sort and process them internally
    > > as UNICODE-encoded wchar_t strings.

    >
    > It would be useful for the standard to specify some codecvt facets,
    > especially for wchar_t UCS-2 / UCS-4 and char UTF-8. Is this likely to
    > happen?


    The C and C++ Standards have so far been scrupulously character-set neutral.
    This sort of thing tends to fall through the cracks, which is why we
    produced CoreX.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
     
    P.J. Plauger, Feb 15, 2004
    #10