how to convert narrow string to wide string and vice versa?

Discussion in 'C++' started by thinktwice, Sep 7, 2006.

  1. thinktwice

    I'm using the VC++ 6 IDE.
    I know I could use macros like A2T and T2A,
    but is there a more decent way to do this?
    thinktwice, Sep 7, 2006

  2. Bart

    thinktwice wrote:
    > I'm using the VC++ 6 IDE.
    > I know I could use macros like A2T and T2A,
    > but is there a more decent way to do this?


    Look up std::ctype::widen and std::ctype::narrow in the <locale>
    header.
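A minimal sketch of that approach, assuming a C++11 compiler (VC++ 6 predates this) and whatever global locale the program has installed. The helper names widen_string and narrow_string are hypothetical. Both facet calls map one char to one wchar_t (or back), so they only suit single-byte narrow encodings; characters the facet cannot narrow become the supplied default character.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: widen every char using the ctype<wchar_t> facet of loc.
// The output always has exactly one wchar_t per input char, so multibyte
// encodings such as UTF-8 are NOT decoded by this.
std::wstring widen_string(const std::string& s, const std::locale& loc = std::locale())
{
    if (s.empty())
        return std::wstring();
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out(s.size(), L'\0');
    ct.widen(s.data(), s.data() + s.size(), &out[0]);
    return out;
}

// Hypothetical helper: narrow every wchar_t; characters with no single-char
// equivalent in loc are replaced by dfault.
std::string narrow_string(const std::wstring& ws, char dfault = '?',
                          const std::locale& loc = std::locale())
{
    if (ws.empty())
        return std::string();
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::string out(ws.size(), '\0');
    ct.narrow(ws.data(), ws.data() + ws.size(), dfault, &out[0]);
    return out;
}
```

For plain ASCII text in the default locale, widen_string("hello") gives L"hello" and narrow_string(L"hello") gives "hello" back.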

    Regards,
    Bart.
    Bart, Sep 7, 2006

  3. Kirit Sælensminde

    Bart wrote:
    > thinktwice wrote:
    > > I'm using the VC++ 6 IDE.
    > > I know I could use macros like A2T and T2A,
    > > but is there a more decent way to do this?
    >
    > Look up std::ctype::widen and std::ctype::narrow in the <locale>
    > header.

    These may not be much good for Unicode or other variable-width
    encodings; it depends on how you use the resulting strings.

    It's a tricky thing to deal with. If you properly understand what you
    mean by 'narrow' and 'wide' strings, the solution should present itself.
    If you're not sure what the string content means, then you're unlikely
    to find the right solution in a library because you won't know how to
    use the functions or their results properly.
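Once you do know the encoding, one standard route is to let the C library's multibyte functions (mbsrtowcs and wcsrtombs from <cwchar>) do the work under the current C locale; unlike the ctype facet calls, these do handle variable-width encodings such as UTF-8 when the locale uses one. A sketch, assuming C++11; mb_to_wide and wide_to_mb are hypothetical names. The program has to select the right locale first (for example std::setlocale(LC_ALL, "") to pick up the user's environment); in the default "C" locale only plain ASCII is guaranteed to convert. The input is treated as a C string, so an embedded '\0' ends the conversion.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode a multibyte string (in the current C locale's
// encoding, e.g. UTF-8 under a UTF-8 locale) into a wide string.
std::wstring mb_to_wide(const std::string& s)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = s.c_str();
    // With a null destination, mbsrtowcs only counts the wide characters.
    std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for this locale");
    std::wstring out(n, L'\0');
    if (n > 0) {
        state = std::mbstate_t();
        src = s.c_str();
        std::mbsrtowcs(&out[0], &src, n, &state);
    }
    return out;
}

// Hypothetical helper: encode a wide string in the current C locale's
// multibyte encoding; fails if some character has no representation there.
std::string wide_to_mb(const std::wstring& ws)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = ws.c_str();
    // With a null destination, wcsrtombs only counts the output bytes.
    std::size_t n = std::wcsrtombs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");
    std::string out(n, '\0');
    if (n > 0) {
        state = std::mbstate_t();
        src = ws.c_str();
        std::wcsrtombs(&out[0], &src, n, &state);
    }
    return out;
}
```

Throwing is only one possible policy for unconvertible input; which one is right depends, as above, on what the strings are used for.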


    K
    Kirit Sælensminde, Sep 7, 2006
  4. Arne 'deice' Pajunen

    Kirit Sælensminde wrote:
    > Bart wrote:
    >> thinktwice wrote:
    >>> I'm using the VC++ 6 IDE.
    >>> I know I could use macros like A2T and T2A,
    >>> but is there a more decent way to do this?
    >> Look up std::ctype::widen and std::ctype::narrow in the <locale>
    >> header.
    >
    > These may not be much good for Unicode or other variable-width
    > encodings; it depends on how you use the resulting strings.
    >
    > It's a tricky thing to deal with. If you properly understand what you
    > mean by 'narrow' and 'wide' strings, the solution should present itself.
    > If you're not sure what the string content means, then you're unlikely
    > to find the right solution in a library because you won't know how to
    > use the functions or their results properly.

    Well, if you just want a quick, ugly hack, then personally I've
    sometimes used:

    std::wstring wide(L"some wide character string");
    std::string narrow(wide.begin(), wide.end());

    But this is a cleaving axe for microsurgery: it depends on the wide
    string's code points matching the narrow character set one-to-one,
    which is only really true if the wstrings hold Unicode, contain only
    ISO-8859-1 characters (0-255), and the normal char encoding is
    ISO-8859-1 or similar (the encoding used for char depends on the
    platform).
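If you do rely on that hack, a slightly safer sketch checks the one assumption it can check: that each wchar_t holds a code point in the ISO-8859-1 range 0-255, replacing anything else instead of silently truncating it. In the other direction it goes through unsigned char, because when char is signed, building a wstring straight from a string's iterators converts bytes 128-255 as negative values and produces the wrong wide characters. This still assumes wchar_t holds Unicode and char holds ISO-8859-1; narrow_latin1 and widen_latin1 are hypothetical names, and the code needs C++11.

```cpp
#include <string>

// Hypothetical helper: narrow by copying each code point in 0..255 directly
// (assumes wchar_t holds Unicode and char holds ISO-8859-1). Anything
// outside that range becomes `replacement` instead of being truncated.
std::string narrow_latin1(const std::wstring& wide, char replacement = '?')
{
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // A negative wchar_t (signed platforms) becomes huge here and is replaced.
        unsigned long cp = static_cast<unsigned long>(wc);
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : replacement);
    }
    return out;
}

// Hypothetical helper: widen ISO-8859-1 bytes to the same Unicode code points.
// The unsigned char cast avoids negative values when char is signed.
std::wstring widen_latin1(const std::string& narrow)
{
    std::wstring out;
    out.reserve(narrow.size());
    for (char c : narrow)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    return out;
}
```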

    I would actually be interested in seeing what the "clean" solution for
    converting is when you have, say, Unicode in wchar_ts and whatever
    encoding the locale specifies in chars (ISO-8859-1, or maybe
    windows-1252) :)

    //deice [deice at deice dot cjb dot net]
    //Arne Pajunen
    Arne 'deice' Pajunen, Sep 7, 2006
  5. Kirit Sælensminde

    Arne 'deice' Pajunen wrote:
    > Well, if you just want a quick, ugly hack, then personally I've
    > sometimes used:
    >
    > std::wstring wide(L"some wide character string");
    > std::string narrow(wide.begin(), wide.end());
    >
    > But this is a cleaving axe for microsurgery: it depends on the wide
    > string's code points matching the narrow character set one-to-one,
    > which is only really true if the wstrings hold Unicode, contain only
    > ISO-8859-1 characters (0-255), and the normal char encoding is
    > ISO-8859-1 or similar (the encoding used for char depends on the
    > platform).
    >
    > I would actually be interested in seeing what the "clean" solution for
    > converting is when you have, say, Unicode in wchar_ts and whatever
    > encoding the locale specifies in chars (ISO-8859-1, or maybe
    > windows-1252) :)


    The first step is to convert the UTF-16 (which is normal for wchar_t
    on Windows; other platforms, such as GCC on Linux, use UTF-32) to
    UTF-32. Then convert that down (often with a code table, but sometimes
    algorithmically). Of course, there's the open question of what to do
    with characters that don't/can't map. In some applications you can use
    a variety of character encodings (as distinct from character sets). For
    example, if you're using ISO-8859-1 in XML/HTML, you can write
    unmappable characters using the numeric character references (such as
    &#8364; for the euro sign) that XML/HTML define for this.
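The two steps described above can be sketched as follows, using C++11's std::u16string and std::u32string rather than wstring so the code unit sizes are fixed. The function names are hypothetical, unpaired surrogates become U+FFFD (one possible policy), and the second step targets ISO-8859-1: code points 0-255 map directly and everything else is written as an XML/HTML numeric character reference. A real XML writer would also have to escape &, < and >, which this sketch does not do.

```cpp
#include <cstddef>
#include <string>

// Step 1 (hypothetical helper): decode UTF-16 into UTF-32 code points,
// joining surrogate pairs. Unpaired surrogates become U+FFFD.
std::u32string utf16_to_utf32(const std::u16string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        bool high = c >= 0xD800 && c <= 0xDBFF;
        bool low = c >= 0xDC00 && c <= 0xDFFF;
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            char32_t lo = in[++i];
            out.push_back(0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00));
        } else if (high || low) {
            out.push_back(0xFFFD);  // unpaired surrogate
        } else {
            out.push_back(c);
        }
    }
    return out;
}

// Step 2 (hypothetical helper): map code points down to ISO-8859-1.
// U+0000..U+00FF map directly; anything else becomes "&#N;".
std::string utf32_to_latin1_xml(const std::u32string& in)
{
    std::string out;
    for (char32_t cp : in) {
        if (cp <= 0xFF)
            out.push_back(static_cast<char>(cp));
        else
            out += "&#" + std::to_string(static_cast<unsigned long>(cp)) + ";";
    }
    return out;
}
```

On a platform where wchar_t is already UTF-32, step 1 is unnecessary and the wide string can be fed to step 2 directly.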

    A full answer depends on what you are using the string for, which is why
    it's so hard to answer. For some things your solution is perfectly
    valid; it's fine for the many parts of internet protocols that are
    defined to use ASCII characters only.
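For that ASCII-only case, the per-character copy is safe once the input has been checked, since code points 0-127 are identical in Unicode and ASCII. A sketch with hypothetical is_ascii and narrow_ascii helpers (C++11) that refuse anything outside that range rather than mangling it:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: true if every wchar_t is in the 7-bit ASCII range.
bool is_ascii(const std::wstring& ws)
{
    for (wchar_t wc : ws)
        // Negative values (signed wchar_t) become huge and fail the test.
        if (static_cast<unsigned long>(wc) > 0x7F)
            return false;
    return true;
}

// Hypothetical helper: the same per-character copy as the quick hack,
// guarded so it only runs on input where it is known to be correct.
std::string narrow_ascii(const std::wstring& ws)
{
    if (!is_ascii(ws))
        throw std::invalid_argument("non-ASCII character in protocol field");
    std::string out;
    out.reserve(ws.size());
    for (wchar_t wc : ws)
        out.push_back(static_cast<char>(wc));
    return out;
}
```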

    For our framework we're looking at using ICU to do the conversions, but
    haven't had much of a chance to play with it yet. As nearly 100% of our
    interactions are through HTTP, we just use UTF-8, and that solves
    nearly the whole problem. We have found it useful to define our own
    std::wstring-like class that uses UTF-32 at the single-character
    interface points (operator[], at(), etc.) but uses UTF-16 for
    character sequences. Things like substr() use positions and counts
    based on the number of UTF-32 characters, _not_ the number of UTF-16
    code units, so applications can't chop characters in half.
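A minimal sketch of the substr() idea, assuming the sequence is held as a std::u16string (C++11). This is not the framework's class, just free functions with hypothetical names: positions and counts are measured in code points, a surrogate pair is always stepped over as a unit, and code_point_at plays the role of the UTF-32 operator[]. An unpaired surrogate is simply treated as a one-unit code point here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Number of UTF-16 code units in the code point starting at s[i]:
// 2 for a valid surrogate pair, otherwise 1.
std::size_t units_at(const std::u16string& s, std::size_t i)
{
    bool pair = s[i] >= 0xD800 && s[i] <= 0xDBFF && i + 1 < s.size() &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
    return pair ? 2 : 1;
}

// Code-unit offset of the code point with index cp_index (may equal s.size()).
std::size_t unit_offset(const std::u16string& s, std::size_t cp_index)
{
    std::size_t unit = 0;
    for (std::size_t cp = 0; cp < cp_index; ++cp) {
        if (unit >= s.size())
            throw std::out_of_range("code point index past end");
        unit += units_at(s, unit);
    }
    return unit;
}

// substr() with pos and count in code points, so pairs are never split.
std::u16string substr_cp(const std::u16string& s, std::size_t pos,
                         std::size_t count)
{
    std::size_t begin = unit_offset(s, pos);
    std::size_t end = begin;
    for (std::size_t n = 0; n < count && end < s.size(); ++n)
        end += units_at(s, end);
    return s.substr(begin, end - begin);
}

// The UTF-32 value of the code point with index cp_index.
char32_t code_point_at(const std::u16string& s, std::size_t cp_index)
{
    std::size_t i = unit_offset(s, cp_index);
    if (i >= s.size())
        throw std::out_of_range("code point index past end");
    if (units_at(s, i) == 2)
        return 0x10000 + ((char32_t(s[i]) - 0xD800) << 10) +
               (char32_t(s[i + 1]) - 0xDC00);
    return s[i];
}
```

For example, u"a\U0001F600b" is four UTF-16 code units but three code points, and substr_cp(s, 1, 1) returns the whole surrogate pair for U+1F600.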


    K
    Kirit Sælensminde, Sep 7, 2006