What to prefer - TCHAR arrays, std::string or std::wstring ?

Discussion in 'C++' started by rohitpatel9999@yahoo.com, Aug 2, 2006.

  1. Guest

    Hi

    While developing any software, developers need to think about its
    possible enhancement for international usage and consider UNICODE.

    I have read many nice articles/items in advanced C++ books (Effective
    C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
    FAQs, Addison Wesley 2nd Edition)

    Authors of these books have not considered UNICODE. So many of their
    suggestions/guidelines confuse developers regarding what to use for
    character-string members of class (considering exception-safety,
    reusability and maintenance of code).

    Many books have stated that:
    Instead of using character arrays, always prefer using std::string.

    My question is:

    While developing generic Win32 app using C++ for Windows
    (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
    What to prefer - TCHAR arrays, std::string or std::wstring
    for character-string members (name, address, city, state, country etc.)

    of classes like Address, Customer, Vendor, Employee ?

    What to prefer - TCHAR arrays, std::string or std::wstring ?

    I truly appreciate any help or guideline.
    Anand
    , Aug 2, 2006
    #1

  2. Marcus Kwok Guest

    wrote:
    > My Questions is:
    >
    > While developing generic Win32 app using C++ for Windows
    > (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
    > What to prefer - TCHAR arrays, std::string or std::wstring
    > for character-string members (name, address, city, state, country etc.)
    >
    > of classes like Address, Customer, Vendor, Employee ?
    >
    > What to prefer - TCHAR arrays, std::string or std::wstring ?
    >
    > I truly appreciate any help or guideline.


    Standard C++ does not know about the TCHAR type (I know what it
    represents, but it is not a standard language feature), and formally
    also does not know about Unicode (std::wstring isn't quite Unicode).
    Handling Unicode can be a complex topic, and one in which I cannot claim
    to be well versed.

    Your question is probably better suited for a Windows newsgroup.

    --
    Marcus Kwok
    Replace 'invalid' with 'net' to reply
    Marcus Kwok, Aug 2, 2006
    #2

  3. Phlip Guest

    rohitpatel9999 wrote:

    > While developing any software, developer need to think about it's
    > possible enhancement for international usage and considering UNICODE.


    Negative. Programmers must prepare for _anything_. The requirement for
    Unicode may or may not come next.

    Prepare for anything by writing copious unit tests, and by folding as much
    duplication as possible. If you duplicate the word "the" in two strings,
    fold them into one.

    If you then need to localize, read this:

    http://flea.sourceforge.net/TFUI_localization.doc

    Then incrementally move your strings into a pluggable resource, and
    incrementally widen or convert your string variables. "Incrementally" means
    one at a time, passing all tests after each small edit.

    The myth that some important decisions must be made early, to avoid the cost
    of a late change, is a self-fulfilling prophecy of defeat.

    > Authors of these books have not considered UNICODE. So many of their
    > suggestions/guidelines confuse developers regarding what to use for
    > character-string members of class (considering exception-safety,
    > reusability and maintenance of code).


    Right. They all use std::string, because many programmers learned C first,
    where a character array is still the simplest and most robust way to
    represent a fixed-length string. So std::string should be the default
    unless you have a real reason to use anything else. Such a reason could
    then switch you to TCHAR, or to std::wstring, or to something else.

    > My Questions is:
    >
    > While developing generic Win32 app using C++ for Windows
    > (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
    > What to prefer - TCHAR arrays, std::string or std::wstring
    > for character-string members (name, address, city, state, country etc.)


    Ask your "customer liaison", the person authorized to request features,
    whether you should spend 9 days working on their next feature, or 18 days
    working on that feature + internationalization.

    If they need only English, then use std::string everywhere you possibly can,
    and something like CString for the remainder.

    When they schedule a port to another language, you obtain a glossary for
    that language _first_. Then you refactor your code to use something like
    std::basic_string<TCHAR>.

    If you truly need TCHAR in its WCHAR mode, then you must configure your
    tests to run (and pass) with the _UNICODE version of your binary. You should
    always pass all such tests, each time you change anything. Otherwise you
    might make an innocent change that works in one mode, but breaks in another.

    Further, not all code-pages can use WCHAR or wchar_t. Spanish, for example,
    is the same code-page as English. Greek is a different code-page, but it
    still uses 8-bit bytes. So you should only enable the few features you need
    to support another language, and not all those languages need Unicode. Some
    versions of Chinese don't need it.

    If you truly need "one binary that presents all languages, mixed together",
    then you need Unicode. And if you need a rare language like Sanskrit or
    Inuit, that has no independent 8-bit code-page, then you will need Unicode.
    Otherwise you probably don't.

    From here, you must read a book on internationalization. Yet you don't do
    _any_ of that research until your business side has selected a target
    language. Otherwise you will just be writing speculative features that
    _might_ work with any language.

    So default to std::string, and keep your programming velocity high. That
    helps ensure that your clients will be _able_ to eventually target the
    international markets...

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 2, 2006
    #3
  4. Guest

    Thank you for helpful suggestions.
    Suggestion of using std::basic_string<TCHAR> is also good.

    Client is sure that they will need UNICODE for a few languages (e.g.
    Japanese).
    The client requirements document did specify making the C++ code generic
    with UNICODE in mind (but it should not use the MFC-specific CString).

    So (in Microsoft Visual C++)
    the application build for Win98/ME will have _MBCS defined, and
    the application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
    defined.

    Please guide me, (considering exception-safety, reusability and
    maintenance of code).

    What to prefer - TCHAR arrays, std::string or std::wstring ?

    or Which of the following three classes is preferable ?

    e.g.

    #include <string>
    #include <tchar.h>   /* _TCHAR, TCHAR */

    /* Option 1: fixed-size character arrays */
    class Address
    {
    _TCHAR name[30];
    _TCHAR addressline1[30];
    _TCHAR addressline2[30];
    _TCHAR city[30];
    };


    /* Option 2: basic_string over TCHAR */
    class Address
    {
    std::basic_string<TCHAR> name;
    std::basic_string<TCHAR> addressline1;
    std::basic_string<TCHAR> addressline2;
    std::basic_string<TCHAR> city;
    };


    /* Option 3: pick std::wstring or std::string by hand */
    #ifdef UNICODE
    typedef std::wstring tstring;
    #else
    typedef std::string tstring;
    #endif
    class Address
    {
    tstring name;
    tstring addressline1;
    tstring addressline2;
    tstring city;
    };

    Thanks again.
    Anand (Rohit)
    , Aug 3, 2006
    #4
  5. wrote:
    > Hi
    >
    > While developing any software, developer need to think about it's
    > possible enhancement for international usage and considering UNICODE.
    >
    > I have read many nice articles/items in advanced C++ books (Effective
    > C++, More Effective C++, Exceptional C++, More Exceptional C++, C++
    > FAQs, Addison Wesley 2nd Edition)
    >
    > Authors of these books have not considered UNICODE. So many of their
    > suggestions/guidelines confuse developers regarding what to use for
    > character-string members of class (considering exception-safety,
    > reusability and maintenance of code).
    >
    > Many books have stated that:
    > Instead of using character arrays, always prefer using std::string.
    >
    > My Questions is:
    >
    > While developing generic Win32 app using C++ for Windows
    > (98/NT/2000/2003/XP), considering unicode for Windows NT/2000/2003/XP,
    > What to prefer - TCHAR arrays, std::string or std::wstring
    > for character-string members (name, address, city, state, country etc.)
    >
    > of classes like Address, Customer, Vendor, Employee ?
    >
    > What to prefer - TCHAR arrays, std::string or std::wstring ?
    >
    > I truly appreciate any help or guideline.
    > Anand


    I don't use TCHAR as it's a horrid kludge and has problems of its own.
    Although it pretends to support both wchar_t and char it's slightly
    broken. The _T macro that may or may not put the L in front of string
    literals is even more broken.

    As you're developing on Windows then just use wchar_t (and tell MSVC to
    define it as a base type, not a typedef to short). You will get exactly
    zero benefit from trying to compile the same program with and without
    Unicode support.

    It is normally much better to just use Unicode internally and then
    convert to eight bit in whatever localised form you need when you have
    to do so. You will find that you have to do all of this anyway for any
    non-trivial program.
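
    As a rough sketch of "Unicode inside, eight bit at the edges", here is
    one way the outbound conversion could look when the eight-bit form is
    UTF-8. The names append_utf8, to_utf8 and check_to_utf8 are made up for
    this example, and the code does no validation (a lone surrogate is passed
    through as-is). On Windows you would more likely call the OS conversion
    routines; this version is hand-rolled only so that it builds anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends one code point to out as UTF-8 (1 to 4 bytes).
inline void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts wide text to UTF-8. Handles both a 16-bit wchar_t holding
// UTF-16 code units and a 32-bit wchar_t holding whole code points:
// a high/low surrogate pair is combined whenever one is found.
inline std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = static_cast<unsigned long>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Small usage check: U+00E9 becomes the two bytes C3 A9.
inline void check_to_utf8()
{
    std::wstring w(L"caf");
    w += static_cast<wchar_t>(0xE9);
    assert(to_utf8(w) == "caf\xC3\xA9");
}
```

    The point of the design is that only the boundary functions know about
    the eight-bit form; everything else in the program sees std::wstring.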


    K
    Kirit Sælensminde, Aug 3, 2006
    #5
  6. Phlip Guest

    rohitpatel9999 wrote:

    > Client is sure that they will need UNICODE for few languages (e.g.
    > Japanese).


    There are requirements and then there are requirements.

    I once ported an application to Greek. The original author had added lots of
    calls to convert between code-pages. Then the program never converted to any
    code pages - it all worked in Western Europe with just one code-page.

    I had a lot of fun diagnosing and fixing each bug, the first time any of
    these conversion functions ever got called. Oh, and I was implicitly blamed
    for the slow velocity, not the original programmer.

    So, has this client arranged to provide a real Japanese locale, with a
    glossary, for you to port the app to _now_?

    Without the critical step of actually using this speculative code, the
    client will instead order you to waste time twice, now when you proactively
    code for Unicode, and later when you actually provide a new locale.

    > Client req. document did specify to make code C++ generic for UNICODE
    > consideration (but should not use MFC specific CString).
    >
    > So (in Microfost Visual C++)
    > application build for Win98/ME will have MBCS defined
    > application build for Win2000/NT/2003/XP will have UNICODE and _UNICODE
    > defined.
    >
    > Please guide me, (considering exception-safety, reusability and
    > maintenance of code).


    From here on, I can't. The question is now only on-topic for, roughly,
    news:microsoft.public.vc.language , or possibly a localization forum
    thereof. However, MBCS might provide for as much Japanese as UNICODE would.
    You need to ask your client for a real Japanese locale, and then you need to
    match your work to it. (And don't get me started about UCS.)

    If they give you a glossary in the JIS201 code-page, then an 8-bit non-MBCS
    would work for both the Win95s and the WinNTs. If you first enabled UNICODE,
    and only then discover your glossary is in JIS201, then you would have
    wasted that effort.

    (You could use iconv to convert the glossary to UNICODE or back. The goal is
    to match which code-page Japanese customers will accept. Has your client
    actually researched this?)

    > What to prefer - TCHAR arrays, std::string or std::wstring ?


    Joel Spolsky sez "there's no such thing as raw text". The rejoinder is that
    wchar_t does not a localized application make.

    If you need UNICODE, and if you truly need to pack all kinds of text into
    any string, then you need a kind of UTF to encode it. UNICODE is a character
    set, not an encoding. And if you can go with UTF-8, even on a Win95 machine,
    then you don't need std::wstring.
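
    To make the character-set-versus-encoding point concrete, here is a small
    sketch of UTF-8 held in a plain std::string. The helper names utf8_length
    and check_utf8_length are invented for this example, and the code assumes
    its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx. No validation is done.
inline std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Usage check: "cafe" with an acute accent on the e, written as bytes.
inline void check_utf8_length()
{
    const std::string cafe("caf\xC3\xA9");
    assert(cafe.size() == 5);         // bytes stored
    assert(utf8_length(cafe) == 4);   // code points encoded
}
```

    std::string stores and copies the bytes without trouble; what it cannot
    do is tell you where one character ends, so size(), operator[] and
    substr() all work in bytes.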

    > _TCHAR name[30];


    Never. The fixed-length string itself will cause untold horror.
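
    A small sketch of the mildest form of that horror, using plain char so it
    builds anywhere. FixedAddress, StringAddress, set_city and
    check_truncation are names made up for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct FixedAddress  { char city[30]; };      // Option 1 style
struct StringAddress { std::string city; };   // grows as needed

// Copies as much of src as fits and always terminates the buffer.
// Anything past 29 characters is silently dropped; a plain strcpy
// here would overrun the array instead.
inline void set_city(FixedAddress& a, const char* src)
{
    std::strncpy(a.city, src, sizeof a.city - 1);
    a.city[sizeof a.city - 1] = '\0';
}

// Usage check: the same long name stored both ways.
inline void check_truncation()
{
    const char* name = "Krung Thep Maha Nakhon Amon Rattanakosin";
    FixedAddress f;
    set_city(f, name);
    StringAddress s;
    s.city = name;
    assert(std::strlen(f.city) == 29);             // truncated
    assert(s.city.size() == std::strlen(name));    // intact
}
```

    With a multi-byte encoding in that array the cut can also land in the
    middle of a character, which is harder to spot than a missing suffix.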

    > std::basic_string<TCHAR> name;


    Only if you actually test both modes, as you program.

    And please introduce a typedef:

    typedef std::basic_string<TCHAR> tstring;
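
    A minimal sketch of that typedef in use, with a test that has to pass in
    both modes. So that it compiles without <tchar.h>, it uses stand-ins
    named tchar_t and TSTR (hypothetical names, not part of any SDK) where
    real Windows code would use TCHAR and _T. The label() member and the
    test function are likewise invented for illustration.

```cpp
#include <cassert>
#include <string>

// Stand-ins for TCHAR and _T() so the sketch builds without <tchar.h>.
#ifdef UNICODE
typedef wchar_t tchar_t;
#define TSTR(s) L##s
#else
typedef char tchar_t;
#define TSTR(s) s
#endif

typedef std::basic_string<tchar_t> tstring;

class Address
{
public:
    Address(const tstring& name, const tstring& city)
        : name_(name), city_(city) {}

    // One line suitable for a label: name, comma, city.
    tstring label() const { return name_ + TSTR(", ") + city_; }

private:
    tstring name_;
    tstring city_;
};

// Build and run this once with -DUNICODE and once without.
inline void test_address_label()
{
    Address a(TSTR("Anand"), TSTR("Pune"));
    assert(a.label() == TSTR("Anand, Pune"));
}
```

    Every literal has to go through the macro, and every test has to run
    twice; that is the ongoing cost of keeping both modes alive.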

    > /* Option 3 */
    > #ifdef UNICODE
    > typedef std::wstring tstring


    This is a clumsy version of Option 2.

    The next complaint is that neither wchar_t nor WCHAR is "UNICODE". Sometimes
    they are UTF-16. (And on some compilers wchar_t is UTF-32.)
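
    A short sketch of why the difference matters: in UTF-16 a character
    outside the Basic Multilingual Plane takes two code units, so one wide
    element is not always one character. The function names encode_utf16 and
    check_utf16_units are made up for this example, which assumes unsigned
    short is 16 bits and that the input is a valid code point.

```cpp
#include <cassert>
#include <vector>

// Encodes one Unicode code point as UTF-16 code units.
// Assumes cp <= 0x10FFFF and that cp is not itself a surrogate.
inline std::vector<unsigned short> encode_utf16(unsigned long cp)
{
    std::vector<unsigned short> units;
    if (cp < 0x10000) {
        units.push_back(static_cast<unsigned short>(cp));
    } else {
        cp -= 0x10000;  // 20 bits remain, split 10 and 10
        units.push_back(static_cast<unsigned short>(0xD800 + (cp >> 10)));
        units.push_back(static_cast<unsigned short>(0xDC00 + (cp & 0x3FF)));
    }
    return units;
}

// Usage check: one unit inside the BMP, a surrogate pair outside it.
inline void check_utf16_units()
{
    assert(encode_utf16(0x65E5).size() == 1);    // U+65E5
    assert(encode_utf16(0x1D11E).size() == 2);   // U+1D11E
}
```

    Where wchar_t is 32 bits wide the same code point fits in one element,
    so code that counts or indexes wide characters can behave differently
    from one compiler to the next.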

    The more you seek a simple answer, the harder this problem will get. The
    answer would be simple if you had enough evidence to back up your decision.
    Always get as much evidence as possible - preferably from live deployed
    code - before making hard and irreversible decisions. Your client clearly
    has experience with source code that created problems when it localized.
    They _cannot_ fix this by just guessing you will need the _UNICODE flag
    turned on. You must work with them to either defer the requirement, and
    write clean code, or promote the requirement, targeting a real release
    candidate that a real international user will accept.

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 3, 2006
    #6
  7. Phlip Guest

    Kirit Sælensminde wrote:

    > As you're developing on Windows then just use wchar_t (and tell MSVC to
    > define it as a base type, not a typedef to short). You will get exactly
    > zero benefit from trying to compile the same program with and without
    > Unicode support.


    Except that turning on _UNICODE will automagically make the compiler and
    program interpret your RC file in UTF-16 instead of a code-paged 8-bit
    encoding.

    > It is normally much better to just use Unicode internally and then
    > convert to eight bit in whatever localised form you need when you have
    > to do so. You will find that you have to do all of this anyway for any
    > non-trivial program.


    The OP also has the requirement to target the Win95s, which can't run in
    Wide mode.

    Aren't there strap-on DLL sets that provide a kind of Wide mode for the
    Win95s? If so, the OP could deploy these with the application, build
    everything for UNICODE, and safely neglect to enable any other code-pages.

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 3, 2006
    #7
  8. loufoque Guest

    wrote :

    > What to prefer - TCHAR arrays, std::string or std::wstring ?


    Just make anything Unicode-aware without using any specific stupidity
    from the win32 API.
    However, if you rely heavily on that API it may be annoying to interface
    with it if you don't follow its internationalization concepts.
    But anyway if you rely that much on it you're coding something so
    specific that you should ask in another group.

    std::wstring will allow UCS-2 (on win32) and UCS-4 (on most unices).
    You can use std::string for 'unsafe' utf-8, which is in most of the
    cases enough.

    Or you could use ICU or glibmm for advanced Unicode support.
    loufoque, Aug 3, 2006
    #8
  9. Bo Persson Guest

    "Phlip" <> wrote in message
    news:mdnAg.1984$...
    > Kirit Sælensminde wrote:
    >
    >> As you're developing on Windows then just use wchar_t (and tell
    >> MSVC to define it as a base type, not a typedef to short). You will
    >> get exactly zero benefit from trying to compile the same program
    >> with and without Unicode support.

    >
    > Except that turning on _UNICODE will automagically make the compiler
    > and program interpret your RC file in UTF-16 instead of a code-paged
    > 8-bit encoding.


    You can turn that option on as well, if it has any advantage. Using
    wchar_t and std::wstring in your application makes it independent of
    those settings.

    >
    >> It is normally much better to just use Unicode internally and then
    >> convert to eight bit in whatever localised form you need when you
    >> have to do so. You will find that you have to do all of this anyway
    >> for any non-trivial program.

    >
    > The OP also has the requirement to target the Win95s, which can't
    > run in Wide mode.


    Windows 95, 98, and NT are officially unsupported both as OSs and as
    targets for the present compiler. All currently supported Windows
    versions use wchar_t internally. New applications could do that as
    well.

    Using TCHAR to optionally compile a new application for a dead OS
    doesn't seem very useful to me. :)

    >
    > Aren't there strap-on DLL sets that provide a kind of Wide mode for
    > the Win95s? If so, the OP could deploy these with the application,
    > build everything for UNICODE, and safely neglect to enable any other
    > code-pages.


    Except that these are as dead as their OSs. Can't be distributed after
    their end-of-life.


    Bo Persson
    Bo Persson, Aug 3, 2006
    #9
  10. loufoque Guest

    Phlip wrote :

    > The OP also has the requirement to target the Win95s, which can't run in
    > Wide mode.


    Actually, you can probably do it with MSLU (the Microsoft Layer for
    Unicode on Windows 95, 98, and Me systems)
    loufoque, Aug 3, 2006
    #10
  11. Phlip Guest

    Bo Persson wrote:

    > Windows 95, 98, and NT are officially unsupported both as OSs and as
    > targets for the present compiler. All currently supported Windows versions
    > use wchar_t internally. New applications could do that as well.


    Nice to know, but I use "Win95s" to refer to the lineage, up to ME, and
    WinNTs for versions up to Win2005 or whatever.

    > Using TCHAR to optionally compile a new application for a dead OS doesn't
    > seem very useful to me. :)


    The OP seems to have a requirements bottleneck. Sometimes a client will
    over-specify everything, hoping to keep their options open. Narrow
    requirements and clean code will do that better than guessing that the
    program must someday port to a Win95-derived platform.

    Is WinME officially dead?

    > Except that these are as dead as their OSs. Can't be distributed after
    > their end-of-life.


    You mean MS makes packaging an unsupported DLL illegal? They retract its
    license or something? Don't they know the 17th Rule of Acquisition is "A
    contract is a contract"?

    Regardless, if the client actually needs to target the home market, they
    must start with MS's official definition of that market.

    Turning on UNICODE will make all OS strings wide, and will turn on UTF-16.
    Hence, go with std::wstring, hard-coded, everywhere.

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 3, 2006
    #11
  12. Bo Persson Guest

    "Phlip" <> wrote in message
    news:EXqAg.4238$...
    > Bo Persson wrote:
    >
    >> Windows 95, 98, and NT are officially unsupported both as OSs and
    >> as targets for the present compiler. All currently supported
    >> Windows versions use wchar_t internally. New applications could do
    >> that as well.

    >
    > Nice to know, but I use "Win95s" to refer to the lineage, up to ME,
    > and WinNTs for versions up to Win2005 or whatever.
    >
    >> Using TCHAR to optionally compile a new application for a dead OS
    >> doesn't seem very useful to me. :)

    >
    > The OP seems to have a requirements bottleneck. Sometimes a client
    > will over-specify everything, hoping to keep their options open.
    > Narrow requirements and clean code will do that better than guessing
    > that the program must someday port to a Win95-derived platform.
    >
    > Is WinME officially dead?


    It is still supported I guess, but it never worked very well. Was sort
    of a downgrade from Windows 98 - nothing much new, just more unstable.
    :)

    >
    >> Except that these are as dead as their OSs. Can't be distributed
    >> after their end-of-life.

    >
    > You mean MS makes packaging an unsupported DLL illegal? They retract
    > its license or something? Don't they know the 17th Rule of
    > Acquisition is "A contract is a contract"?


    From what I know, MS has removed it from their servers so you cannot
    get it legitimately anymore. If you already use it and continue to
    distribute it, they will probably not sue. If you have a problem
    though, what happens?

    >
    > Regardless, if the client actually needs to target the home market,
    > they must start with MS's official definition of that market.
    >
    > Turning on UNICODE will make all OS strings wide, and will turn on
    > UTF-16. Hence, go with std::wstring, hard-coded, everywhere.


    Right.


    Bo Persson
    Bo Persson, Aug 3, 2006
    #12
  13. Phlip Guest

    Bo Persson wrote:

    >> Turning on UNICODE will make all OS strings wide, and will turn on
    >> UTF-16. Hence, go with std::wstring, hard-coded, everywhere.

    >
    > Right.


    Then, per my lecture on requirements, neither compile for nor use any 8-bit
    mode, or std::string. Never leave a "flavor" of a program that's full of
    bugs and nasty surprises, expecting that it "might be useful someday".

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 3, 2006
    #13
  14. red floyd Guest

    Phlip wrote:

    >
    > Is WinME officially dead?



    WinME was officially dead upon release. :)
    red floyd, Aug 3, 2006
    #14
  15. Phlip Guest

    red floyd wrote:

    > WinME was officially dead upon release. :)


    Why didn't they just call it WinY2K Bug?

    ;-)

    --
    Phlip
    http://c2.com/cgi/wiki?ZeekLand <-- NOT a blog!!!
    Phlip, Aug 3, 2006
    #15