basic_string with unsigned short

Discussion in 'C++' started by wolverine, Oct 28, 2006.

  1. wolverine

    wolverine Guest

    Hi
    I want to know how to use basic_string with unsigned short (I have
    explained below why I have to do this). Could anyone point me to some
    good references on this topic? I am new to instantiating basic_string
    with a new character type.

    #include <string>
    #include<iostream>
    using namespace std;

    struct unsigned_short_traits
    {
    typedef unsigned short _E;
    typedef _E char_type;
    typedef int int_type;
    typedef std::streampos pos_type;
    typedef std::streamoff off_type;
    typedef std::mbstate_t state_type;
    static void assign(_E& _X, const _E& _Y)
    {_X = _Y; }
    static bool eq(const _E& _X, const _E& _Y)
    {return (_X == _Y); }
    static bool lt(const _E& _X, const _E& _Y)
    {return (_X < _Y); }
    static int compare(const _E *_U, const _E *_V, size_t _N)
    {return (memcmp(_U, _V, _N)); }
    static size_t length(const _E *_U)
    {return (strlen((const char *)_U)); }
    static _E * copy(_E *_U, const _E *_V, size_t _N)
    {return ((_E *)memcpy(_U, _V, _N)); }
    static const _E * find(const _E *_U, size_t _N, const _E& _C)
    {return ((const _E *)memchr(_U, _C, _N)); }
    static _E * move(_E *_U, const _E *_V, size_t _N)
    {return ((_E *)memmove(_U, _V, _N)); }
    static _E * assign(_E *_U, size_t _N, const _E& _C)
    {return ((_E *)memset(_U, _C, _N)); }
    static _E to_char_type(const int_type& _C)
    {return ((_E)_C); }
    static int_type to_int_type(const _E& _C)
    {return ((int_type)(_C)); }
    static bool eq_int_type(const int_type& _X, const int_type& _Y)
    {return (_X == _Y); }
    static int_type eof()
    {return (EOF); }
    static int_type not_eof(const int_type& _C)
    {return (_C != eof() ? _C : !eof()); }
    };

    typedef std::basic_string<unsigned short, unsigned_short_traits>
    utf16string;

    int main()
    {
    char *a = "abc";
    utf16string str(reinterpret_cast<unsigned short*>(a));
    cout<<str<<endl;
    return 0;
    }

    REASON TO CREATE THIS utf16string

    I am using the Xerces parser, which uses XMLCh (typedef unsigned short
    XMLCh) as its basic character type. Most Xerces functions take XMLCh
    pointers as input. My application has to support Unicode, but I cannot
    use std::wstring, since wchar_t is 32 bits on Linux and XMLCh is 16
    bits, so conversion between std::wstring and XMLCh will not work. So I
    thought of defining basic_string with unsigned short.


    I know this group is not for solving platform-specific (Linux) C++
    issues. I am just asking how to use basic_string with unsigned short.

    Thanks in Advance
    Kiran.
     
    wolverine, Oct 28, 2006
    #1

  2. On 27 Oct 2006 22:52:28 -0700, "wolverine" <>
    wrote:
    > I want to know how to use basic_string with unsigned short (I have
    >mentioned below why i have to do this). Could any tell me some good
    >references in this topic. I am new to creating a new basic_string
    >class.
    >
    >#include <string>
    >#include<iostream>
    >using namespace std;


    There should be no 'using namespace ...' in header files.

    >struct unsigned_short_traits


    Specialize the char_traits template

    template<>
    struct char_traits<XMLCh>

    >{
    > typedef unsigned short _E;
    > typedef _E char_type;
    > typedef int int_type;
    > typedef std::streampos pos_type;
    > typedef std::streamoff off_type;


    // may not be correct

    > typedef std::mbstate_t state_type;
    > static void assign(_E& _X, const _E& _Y)
    > {_X = _Y; }
    > static bool eq(const _E& _X, const _E& _Y)
    > {return (_X == _Y); }
    > static bool lt(const _E& _X, const _E& _Y)
    > {return (_X < _Y); }
    > static int compare(const _E *_U, const _E *_V, size_t _N)
    > {return (memcmp(_U, _V, _N)); }
    > static size_t length(const _E *_U)
    > {return (strlen((const char *)_U)); }


    // strlen doesn't work for unsigned short: it stops at the first zero byte

    > static _E * copy(_E *_U, const _E *_V, size_t _N)
    > {return ((_E *)memcpy(_U, _V, _N)); }
    > static const _E * find(const _E *_U, size_t _N, const _E& _C)
    > {return ((const _E *)memchr(_U, _C, _N)); }
    > static _E * move(_E *_U, const _E *_V, size_t _N)
    > {return ((_E *)memmove(_U, _V, _N)); }
    > static _E * assign(_E *_U, size_t _N, const _E& _C)
    > {return ((_E *)memset(_U, _C, _N)); }
    > static _E to_char_type(const int_type& _C)
    > {return ((_E)_C); }
    > static int_type to_int_type(const _E& _C)
    > {return ((int_type)(_C)); }
    > static bool eq_int_type(const int_type& _X, const int_type& _Y)
    > {return (_X == _Y); }
    > static int_type eof()
    > {return (EOF); }
    > static int_type not_eof(const int_type& _C)
    > {return (_C != eof() ? _C : !eof()); }
    >};


    You need to test those traits. Probably there are some more issues
    with them.
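
    The remaining issues are mostly of one kind: memcmp, memcpy, memmove, memchr and memset all work in bytes, while the traits functions receive a count of characters. A minimal sketch of traits that count in 16-bit units follows. The names u16_traits and u16_string are made up for illustration, and unsigned long is chosen as int_type on the assumption that eof() must be a value no 16-bit unit can take.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy, std::memmove
#include <cwchar>    // std::mbstate_t
#include <ios>       // std::streampos, std::streamoff
#include <string>    // std::basic_string

// Hypothetical traits for a 16-bit code unit type; all counts are in units.
struct u16_traits {
    typedef unsigned short char_type;
    typedef unsigned long  int_type;   // at least 32 bits, so eof() is no valid unit
    typedef std::streampos pos_type;
    typedef std::streamoff off_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(const char_type& a, const char_type& b) { return a == b; }
    static bool lt(const char_type& a, const char_type& b) { return a < b; }

    // Element-wise comparison; memcmp would compare bytes, not units.
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    // Count units up to the zero unit; strlen stops at the first zero byte.
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
    // The byte-oriented functions are fine once n is scaled by the unit size.
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n * sizeof(char_type));
        return dst;
    }
    // memset could only replicate a single byte, so fill unit by unit.
    static char_type* assign(char_type* s, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) s[i] = c;
        return s;
    }
    static char_type to_char_type(const int_type& c) { return static_cast<char_type>(c); }
    static int_type to_int_type(const char_type& c) { return static_cast<int_type>(c); }
    static bool eq_int_type(const int_type& a, const int_type& b) { return a == b; }
    static int_type eof() { return static_cast<int_type>(-1); }
    static int_type not_eof(const int_type& c) { return eq_int_type(c, eof()) ? 0 : c; }
};

typedef std::basic_string<unsigned short, u16_traits> u16_string;
```

    With these traits, constructing a u16_string from a zero-terminated array of 16-bit units gives a size() counted in units rather than bytes.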

    >typedef std::basic_string<unsigned short, unsigned_short_traits>
    >utf16string;


    typedef std::basic_string<XMLCh> XMLstring;

    It's not a UTF-16 string because characters in UTF-16 can be longer
    than one unsigned short.

    >int main()
    >{
    > char *a = "abc";
    > utf16string str(reinterpret_cast<unsigned short*>(a));


    Oops, you need to write a function that converts char* into XMLCh*
    (probably available in Xerces).
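
    The reinterpret_cast cannot work because "abc" is four bytes, which read as two 16-bit units with no 16-bit terminator after them. Xerces-C++ provides XMLString::transcode for real conversions. The dependency-free sketch below only shows the idea, under the assumption that the input is plain 7-bit ASCII. The name widen_ascii is made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: widen a 7-bit ASCII C string into zero-terminated
// 16-bit units. Bytes outside ASCII are rejected, because converting them
// needs a transcoder that knows the source encoding.
std::vector<unsigned short> widen_ascii(const char* text) {
    std::vector<unsigned short> units;
    for (std::size_t i = 0; text[i] != '\0'; ++i) {
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        if (byte > 0x7F)
            throw std::invalid_argument("widen_ascii: non-ASCII byte");
        units.push_back(byte);
    }
    units.push_back(0);  // terminator, so &units[0] can be passed as a 16-bit C string
    return units;
}
```

    The returned vector owns the buffer, so the pointer obtained from it stays valid only as long as the vector does.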

    > cout<<str<<endl;


    cout is for char. It doesn't work with XMLCh.
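
    The stream inserter for strings is only defined when the stream and the string share the same character type and traits, so a 16-bit string has to be converted before it can go to cout. A minimal sketch for debugging output follows. It assumes that replacing every non-ASCII unit with '?' is acceptable, and the name to_printable is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: turn a zero-terminated 16-bit string into something
// a char stream can print. ASCII units are copied, anything else becomes '?'.
std::string to_printable(const unsigned short* units) {
    std::string out;
    for (; *units != 0; ++units)
        out += (*units < 0x80) ? static_cast<char>(*units) : '?';
    return out;
}
```

    For output that preserves non-ASCII text, a real transcoder to the terminal's encoding is needed instead.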

    > return 0;
    >}
    >
    >REASON TO CREATE THIS utf16string
    >
    >I am using xerces parser which uses a XMLCh ( typedef unsigned short
    >XMLCh) as the basic character.


    Best wishes,
    Roland Pibinger
     
    Roland Pibinger, Oct 28, 2006
    #2

  3. wolverine

    wolverine Guest

    Hi
    Thanks for your reply. It helped. Thanks for spending your valuable
    time in helping novices like me.

    Thanks
    Kiran Pradeep

     
    wolverine, Oct 28, 2006
    #3
  4. On 28 Oct 2006 04:51:08 -0700, "wolverine" <>
    wrote:
    > Thanks for your reply. It helped. Thanks for spending your valuable
    >time in helping novices like me.


    I forgot to mention that, since the char_traits template is in
    namespace std, the specialization also needs to be in namespace std.
    This looks strange because normally you are not allowed to put
    anything into std. Alternatively, you can define your traits as 'struct
    unsigned_short_traits' and provide it as the second template parameter,
    as you have done in your code. Actually, I'd prefer your solution.

    Best wishes,
    Roland Pibinger
     
    Roland Pibinger, Oct 28, 2006
    #4
  6. Roland Pibinger wrote:
    > On 27 Oct 2006 22:52:28 -0700, "wolverine" <>
    > wrote:
    > >typedef std::basic_string<unsigned short, unsigned_short_traits>
    > >utf16string;

    >
    > typedef std::basic_string<XMLCh> XMLstring;
    >
    > It's not a UTF-16 string because characters in UTF-16 can be longer
    > than one unsigned short.


    Might be worth clarifying this a little - I'm not totally sure what
    point you're trying to make, but it's probably the following.

    The semantics of std::basic_string assume that each element of the
    character type (XMLCh in your case) holds one whole character. For
    UTF-16 this isn't true: code points above U+FFFF take two 16-bit
    units (a surrogate pair).

    Now this doesn't mean that you can't use std::basic_string in this
    circumstance, but it does mean that you must be careful what operations
    you perform on it. For example, parsing the string is Ok and chopping
    the string at a boundary you find by looking for certain characters is
    safe.

    What isn't safe, though, are things like:

    myXMLstring.substr( 0, 100 );

    This may chop the last character (giving you only one half of a
    surrogate pair) and in any case will return a string with between 50
    and 100 Unicode characters. You will also have some dissonance with
    iterators and operators like [] and the member at().
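
    One way to make such a cut safe is to check whether it falls between the two halves of a surrogate pair and, if so, back off by one unit. The sketch below assumes well-formed UTF-16 held in a std::vector<unsigned short>. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

inline bool is_high_surrogate(unsigned short u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(unsigned short u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: the largest prefix length, in 16-bit units, that is at
// most max_units and does not end between the halves of a surrogate pair.
std::size_t safe_prefix_length(const std::vector<unsigned short>& units,
                               std::size_t max_units) {
    std::size_t n = units.size() < max_units ? units.size() : max_units;
    if (n > 0 && n < units.size() &&
        is_high_surrogate(units[n - 1]) && is_low_surrogate(units[n]))
        --n;  // the cut would split a pair, so drop the dangling high half
    return n;
}

// Number of code points in well-formed UTF-16: every unit counts except the
// low half of each surrogate pair.
std::size_t count_code_points(const std::vector<unsigned short>& units) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i)
        if (!is_low_surrogate(units[i])) ++count;
    return count;
}
```

    The result of safe_prefix_length can be passed as the length argument of substr in place of the fixed 100.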

    Things like this are deeply suspicious:

    std::transform( myXMLstring.begin(), myXMLstring.end(),
    myXMLstring.begin(), std::towupper );


    For our FOST.3 framework we went down a slightly different route. We
    took the std::basic_string and copied the concepts of the interface,
    but used UTF-16 sequences for all character sequences and UTF-32 for
    all character operations.

    So there is a constructor that does wstring( const wchar_t * ) and
    another which is wstring( size_type, utf32 ), using utf32 as a typedef
    for a 32 bit int. Dereferencing an iterator returns a UTF-32 character
    as does at() and operator [], but c_str() returns a UTF-16 sequence.
    size() and length() both return the number of UTF-32 characters (there
    are separate members for fetching the UTF-16 length).

    This seems to give us the best of both worlds - normal UTF-16 handling
    for the Windows API, but correct Unicode handling for string
    operations.
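
    To illustrate the general idea of storing UTF-16 but handing out UTF-32 characters, here is a minimal decoding sketch. It is not the FOST.3 code. The name decode_utf16 is made up for illustration, and passing unpaired surrogates through unchanged is an assumption, since a stricter implementation might reject them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: decode UTF-16 code units into UTF-32 code points.
// A valid surrogate pair becomes one code point above U+FFFF; an unpaired
// surrogate is passed through unchanged rather than rejected.
std::vector<unsigned long> decode_utf16(const std::vector<unsigned short>& units) {
    std::vector<unsigned long> points;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const unsigned long hi = units[i];
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < units.size()) {
            const unsigned long lo = units[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                points.push_back(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                ++i;  // the low half has been consumed as well
                continue;
            }
        }
        points.push_back(hi);
    }
    return points;
}
```

    The size of the decoded vector is the character count that such a class would report from size(), while the original vector's size is the UTF-16 length.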

    If you are currently exploring new std::basic_string character types
    for your needs you may also want to think about this alternative
    implementation.

    Of course if you use std::basic_string with a UTF-32 character type
    then all this complication goes away because there the one-to-one
    mapping is maintained.


    K
     
    Kirit Sælensminde, Oct 28, 2006
    #6
