How can I convert ISO 8859-2 to UTF?


focussedgyan

How can I convert ISO 8859-2 and ISO 8859-9 text to UTF in C++?
Is there something in C++ similar to what Java offers?
In Java we can simply create a new String(bytes, "ISO-8859-2");
Is it that simple in C++?
 

Pascal J. Bourguignon


In general, you could use libiconv, and then yes, it's as simple.
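As a rough illustration of the libiconv route, the sketch below wraps the POSIX iconv_open/iconv/iconv_close calls in one helper. The helper name convert_to_utf8_iconv is made up for this example, and the encoding names "ISO-8859-2" and "ISO-8859-9" are the spellings most iconv implementations accept, which is an assumption about your platform. Where iconv lives in a separate libiconv, you may also need to link with -liconv.

```cpp
#include <iconv.h>     // POSIX iconv; provided by libc or by libiconv
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from the encoding named `from` to UTF-8.
std::string convert_to_utf8_iconv(const std::string& input, const char* from)
{
    iconv_t cd = iconv_open("UTF-8", from);          // arguments are (tocode, fromcode)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open: conversion not supported");

    std::string out(input.size() * 4 + 4, '\0');     // initial guess, grown on E2BIG
    char* src = const_cast<char*>(input.data());     // iconv wants char**, input is only read
    std::size_t src_left = input.size();
    char* dst = &out[0];
    std::size_t dst_left = out.size();

    while (src_left > 0) {
        if (iconv(cd, &src, &src_left, &dst, &dst_left) == (std::size_t)-1) {
            if (errno == E2BIG) {                    // output buffer full: enlarge and retry
                std::size_t used = out.size() - dst_left;
                out.resize(out.size() * 2);
                dst = &out[0] + used;
                dst_left = out.size() - used;
                continue;
            }
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or incomplete input");
        }
    }
    out.resize(out.size() - dst_left);               // trim the unused tail
    iconv_close(cd);
    return out;
}
```

A call such as convert_to_utf8_iconv(bytes, "ISO-8859-2") is then about as short as the Java one-liner, provided the platform's iconv knows that encoding name.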




In the case of iso-8859-2, you can use:


/* not tested, not even tried to compile that code */
#include <stdlib.h>
#include <string.h>

typedef unsigned int unicode;
typedef unsigned char iso_8859_2;

/* Returns a malloc'ed, zero-terminated array of code points, or NULL.
   The caller must free() the result. */
unicode* convert_from_iso_8859_2(const iso_8859_2* cstring){
    /* Code points for bytes 0xA0..0xFF; lower bytes map to themselves. */
    static const unicode map[]={160,260,728,321,164,317,346,167,168,352,350,356,377,173,381,379,
    176,261,731,322,180,318,347,711,184,353,351,357,378,733,382,380,
    340,193,194,258,196,313,262,199,268,201,280,203,282,205,206,270,
    272,323,327,211,212,336,214,215,344,366,218,368,220,221,354,223,
    341,225,226,259,228,314,263,231,269,233,281,235,283,237,238,271,
    273,324,328,243,244,337,246,247,345,367,250,369,252,253,355,729};
    unicode* result=(unicode*)malloc(sizeof(unicode)*(strlen((const char*)cstring)+1));
    if(result){
        size_t i;
        for(i=0;cstring[i]!=0;i++){
            result[i]=(cstring[i]<0xA0)?(cstring[i]):(map[cstring[i]-0xA0]);
        }
        result[i]=0; /* zero terminator */
    }
    return(result);
}


For iso-8859-15, you just change the map:

static unicode map[]={160,161,162,163,8364,165,352,167,353,169,170,171,172,173,174,175,
176,177,178,179,381,181,182,183,382,185,186,187,338,339,376,191,
192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,
208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,
224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,
240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255};
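The question also mentioned ISO 8859-9. That encoding is identical to ISO 8859-1 (where each byte value equals its code point) except for six Turkish letters, so a full table is not strictly needed. The following is a minimal sketch of that idea; the function name decode_iso_8859_9 and the use of std::u32string as the result type are choices made for this example, not something from the thread.

```cpp
#include <string>

// Decodes ISO 8859-9 (Latin-5, Turkish) bytes into Unicode code points.
// Only six bytes differ from ISO 8859-1; every other byte is its own code point.
std::u32string decode_iso_8859_9(const std::string& bytes)
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        char32_t cp;
        switch (b) {
            case 0xD0: cp = 0x011E; break;  // G with breve
            case 0xDD: cp = 0x0130; break;  // I with dot above
            case 0xDE: cp = 0x015E; break;  // S with cedilla
            case 0xF0: cp = 0x011F; break;  // g with breve
            case 0xFD: cp = 0x0131; break;  // dotless i
            case 0xFE: cp = 0x015F; break;  // s with cedilla
            default:   cp = b;      break;  // same as ISO 8859-1
        }
        out.push_back(cp);
    }
    return out;
}
```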


Then you run the resulting code points through the usual Unicode-to-UTF-8 encoding routine.
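For completeness, here is one way such a routine might look. It is only a sketch: the name encode_utf8 is invented for this example, the input is assumed to be a zero-terminated array of unsigned int code points like the one the conversion function in this post returns, and no validation is done (surrogates and values above 0x10FFFF are not rejected).

```cpp
#include <cstddef>
#include <string>

// Encodes a zero-terminated array of Unicode code points as UTF-8.
// Assumption: every value is a valid code point; nothing is checked.
std::string encode_utf8(const unsigned int* code_points)
{
    std::string out;
    for (std::size_t i = 0; code_points[i] != 0; ++i) {
        unsigned int cp = code_points[i];
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes: 11110xxx and three continuation bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Every code point in the ISO 8859-2 table is below 0x800, so each input byte becomes at most two UTF-8 bytes; the euro sign in ISO 8859-15 (8364) needs three.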
 

James Kanze


No. The philosophy in C++ is that you use a single encoding
internally (say UTF-8), and do all of the code translation at
the system interface level (reading and writing); this is done
by imbuing an appropriate locale in the fstream doing the
reading or writing.

Of course, this philosophy falls down if you're reading from or
writing to a socket, rather than a file. Either you have to
do the code translation in your streambuf, or on the byte buffer
you're working with. There is a facet, codecvt, which is
supposed to be used here (so we're back to locale), but I find
it anything but easy to use; in particular, it will not manage
the memory for you. (In your particular case, you can always
ensure a target buffer of six times the length when converting
8859-2 to UTF-8, and you're OK.) On many systems, there's also
an iconv function; as far as I can see, it doesn't do anything
the std::codecvt doesn't, but it may be easier to find the
necessary arguments than it is to find an appropriate locale.
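To make the "imbue a locale in the fstream" idea concrete, here is a hedged sketch. It uses a wide stream, because the standard codecvt<char, char, mbstate_t> facet performs no conversion, so the result is wide characters rather than UTF-8. The function name read_with_locale is invented for this example, and a locale name such as "cs_CZ.ISO-8859-2" is purely an assumption: names are platform-specific and the locale must be installed.

```cpp
#include <fstream>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>

// Reads a whole file through the codecvt facet of the named locale.
// Returns false if the locale is unavailable or the file cannot be opened.
bool read_with_locale(const std::string& path,
                      const std::string& locale_name,
                      std::wstring& text)
{
    std::locale loc;
    try {
        loc = std::locale(locale_name.c_str());
    } catch (const std::runtime_error&) {
        return false;                      // locale not known on this system
    }

    std::wifstream in;
    in.imbue(loc);                         // imbue before opening the file
    in.open(path, std::ios::binary);
    if (!in)
        return false;

    text.assign(std::istreambuf_iterator<wchar_t>(in),
                std::istreambuf_iterator<wchar_t>());
    return true;
}
```

The awkward part is exactly the one mentioned above: finding a locale name that exists on the target system and carries the wanted encoding.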
 

Juha Nieminen

Pascal said:
unicode* result=malloc(sizeof(unicode)*(strlen(cstring)+1));
[...]
return(result);

Please. This is comp.lang.c++, not comp.lang.c. What you are doing
there is horrible.
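For comparison, a version in a more C++ style might look like the sketch below. It is only one possible shape: the name decode_single_byte and the std::u32string result are choices made for this example, and the 96-entry table is passed in so that either of the maps posted earlier in the thread can be reused. The string owns its memory, so there is nothing for the caller to free.

```cpp
#include <string>

// Decodes a single-byte encoding in which bytes below 0xA0 equal their code
// points and bytes 0xA0..0xFF are looked up in `high_map` (96 entries).
std::u32string decode_single_byte(const std::string& bytes,
                                  const unsigned int (&high_map)[96])
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        unsigned int cp = (b < 0xA0) ? b : high_map[b - 0xA0];
        out.push_back(static_cast<char32_t>(cp));
    }
    return out;
}
```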
 
