How can I convert ISO 8859-2 to UTF?

Discussion in 'C++' started by focussedgyan@gmail.com, Jan 23, 2009.

  1. Guest

    How can I convert ISO 8859-2 and ISO 8859-9 text to UTF-8 in C++?
    Is there something in C++ similar to what Java offers?
    In Java we can simply write new String(bytes, "ISO-8859-2").
    Is it that simple in C++?
    , Jan 23, 2009
    #1

  2. focussedgyan@gmail.com writes:

    > How can I convert ISO 8859-2 and ISO 8859-9 text to UTF-8 in C++?
    > Is there something in C++ similar to what Java offers?
    > In Java we can simply write new String(bytes, "ISO-8859-2").
    > Is it that simple in C++?


    In general, you could use libiconv, and then yes, it's as simple.
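
As a rough illustration of the libiconv route, here is a minimal sketch built on the POSIX `iconv_open`, `iconv`, and `iconv_close` calls. The helper name `to_utf8` is hypothetical, the sketch assumes a single-byte source encoding (so four output bytes per input byte is always enough), and which encoding names are accepted depends on the iconv implementation installed.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from the single-byte encoding named
// `from` (for example "ISO-8859-2") to UTF-8 with one iconv call.
std::string to_utf8(const std::string& input, const char* from) {
    iconv_t cd = iconv_open("UTF-8", from);  // arguments are (tocode, fromcode)
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed: conversion not supported");
    }

    // Assumption: one code point per input byte, at most 4 UTF-8 bytes each.
    std::string output(input.size() * 4, '\0');

    // iconv advances both pointers and decrements both counters as it works.
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();
    char* out = &output[0];
    std::size_t out_left = output.size();

    const std::size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("iconv failed: invalid or incomplete input");
    }

    output.resize(output.size() - out_left);  // drop the unused tail
    return output;
}
```

A call such as `to_utf8(text, "ISO-8859-9")` would handle the second encoding from the question in the same way. On platforms where iconv is a separate library, linking may need `-liconv`.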




    In the case of iso-8859-2, you can use:


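    /* not tested, not even tried to compile that code */
    #include <cstdlib>
    #include <cstring>

    typedef unsigned int unicode;
    typedef unsigned char iso_8859_2;

    /* Returns a malloc()ed, zero-terminated array of code points, or a null
       pointer if the allocation fails; the caller must free() it. */
    unicode* convert_from_iso_8859_2(const iso_8859_2* cstring){
        /* Unicode code points for the bytes 0xA0..0xFF */
        static const unicode map[]={160,260,728,321,164,317,346,167,168,352,350,356,377,173,381,379,
        176,261,731,322,180,318,347,711,184,353,351,357,378,733,382,380,
        340,193,194,258,196,313,262,199,268,201,280,203,282,205,206,270,
        272,323,327,211,212,336,214,215,344,366,218,368,220,221,354,223,
        341,225,226,259,228,314,263,231,269,233,281,235,283,237,238,271,
        273,324,328,243,244,337,246,247,345,367,250,369,252,253,355,729};
        std::size_t length=std::strlen(reinterpret_cast<const char*>(cstring));
        unicode* result=static_cast<unicode*>(std::malloc(sizeof(unicode)*(length+1)));
        if(result){
            std::size_t i;
            for(i=0;cstring[i]!=0;i++){
                /* bytes below 0xA0 map to the code point of the same value */
                result[i]=(cstring[i]<0xA0)?(cstring[i]):(map[cstring[i]-0xA0]);
            }
            result[i]=0;
        }
        return(result);
    }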


    For iso-8859-15, you just change the map:

    static unicode map[]={160,161,162,163,8364,165,352,167,353,169,170,171,172,173,174,175,
    176,177,178,179,381,181,182,183,382,185,186,187,338,339,376,191,
    192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
    208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
    224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
    240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255};


    Then you use the normal unicode to utf-8 routine.
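
For completeness, a "normal unicode to utf-8 routine" could look something like the sketch below. The function names `append_utf8` and `unicode_to_utf8` are made up for this example, the `unicode` typedef is repeated so the block stands alone, and the code does no validation (surrogates and values above 0x10FFFF are not rejected).

```cpp
#include <string>

typedef unsigned int unicode;  // same typedef as in the post above

// Appends the UTF-8 encoding of a single code point to `out`.
void append_utf8(std::string& out, unicode cp) {
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encodes a zero-terminated array of code points as a UTF-8 string.
std::string unicode_to_utf8(const unicode* text) {
    std::string out;
    for (; *text != 0; ++text) {
        append_utf8(out, *text);
    }
    return out;
}
```

Every code point in the two tables above is below 0x800, so text converted from these encodings only ever takes the one-byte or two-byte branch, apart from the euro sign (8364) in ISO 8859-15, which needs three bytes.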

    --
    __Pascal Bourguignon__
    Pascal J. Bourguignon, Jan 23, 2009
    #2

  3. James Kanze Guest

    On Jan 23, 4:38 am, focussedgyan@gmail.com wrote:
    > How can I convert ISO 8859-2 and ISO 8859-9 text to UTF-8 in C++?
    > Is there something in C++ similar to what Java offers?
    > In Java we can simply write new String(bytes, "ISO-8859-2").
    > Is it that simple in C++?


    No. The philosophy in C++ is that you use a single encoding
    internally (say UTF-8), and do all of the code translation at
    the system interface level (reading and writing); this is done
    by imbuing an appropriate locale in the fstream doing the
    reading or writing.

    Of course, this philosophy falls down if you're reading or
    writing from a socket, rather than a file. Either you have to
    do the code translation in your streambuf, or on the byte buffer
    you're working with. There is a facet, codecvt, which is
    supposed to be used here (so we're back to locale), but I find
    it anything but easy to use; in particular, it will not manage
    the memory for you. (In your particular case, you can always
    ensure a target buffer of six times the length when converting
    8859-2 to UTF-8, and you're OK.) On many systems, there's also
    an iconv function; as far as I can see, it doesn't do anything
    the std::codecvt doesn't, but it may be easier to find the
    necessary arguments than it is to find an appropriate locale.
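
The locale approach described above might look roughly like the following sketch. The helper name `read_with_locale` is hypothetical, and whether a locale for a given encoding is installed, and under which name, depends on the platform.

```cpp
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: reads a whole file as wide characters, letting the
// codecvt facet of the imbued locale translate the bytes on the way in.
std::wstring read_with_locale(const char* path, const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);   // imbue before opening, so the conversion applies from the first byte
    in.open(path);

    // If the file could not be opened, the range is empty and so is the result.
    std::istreambuf_iterator<wchar_t> first(in), last;
    return std::wstring(first, last);
}
```

On a system with a matching locale installed, a call might read `read_with_locale("input.txt", std::locale("pl_PL.ISO-8859-2"))`. That locale name is an assumption; the `std::locale` constructor throws `std::runtime_error` when the name is not known to the system.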

    --
    James Kanze (GABI Software)
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
    James Kanze, Jan 23, 2009
    #3
  4. Pascal J. Bourguignon wrote:
    > unicode* result=malloc(sizeof(unicode)*(strlen(cstring)+1));
    > [...]
    > return(result);


    Please. This is comp.lang.c++, not comp.lang.c. What you are doing
    there is horrible.
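
One way to avoid `malloc` and the manual cleanup is to return a standard string type, which owns its memory. The sketch below does this for ISO 8859-9, the other encoding from the original question, which differs from ISO 8859-1 in only six byte values. The function names are invented for the example, and it assumes a C++11 compiler for `char32_t` and `std::u32string`.

```cpp
#include <string>

// Maps one ISO 8859-9 byte to its Unicode code point. Only six bytes differ
// from ISO 8859-1, where the byte value equals the code point.
char32_t iso_8859_9_to_unicode(unsigned char byte) {
    switch (byte) {
        case 0xD0: return 0x011E;  // G with breve
        case 0xDD: return 0x0130;  // capital I with dot above
        case 0xDE: return 0x015E;  // S with cedilla
        case 0xF0: return 0x011F;  // g with breve
        case 0xFD: return 0x0131;  // dotless i
        case 0xFE: return 0x015F;  // s with cedilla
        default:   return byte;
    }
}

// Decodes a byte string; the returned std::u32string releases its own memory.
std::u32string decode_iso_8859_9(const std::string& bytes) {
    std::u32string result;
    result.reserve(bytes.size());
    for (unsigned char byte : bytes) {
        result.push_back(iso_8859_9_to_unicode(byte));
    }
    return result;
}
```

The same shape would work for ISO 8859-2 by replacing the `switch` with a lookup into the 96-entry table from the earlier post.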
    Juha Nieminen, Jan 23, 2009
    #4
