Unicode to characters


KK

Hello all,
Flavors of this question may have been discussed in the past, but I
could not really make heads or tails of them.

I have a bunch of Unicode values stored in a string array, and I want to
see the corresponding characters displayed in an Excel file. How could
I go about doing that?

vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc. (hexadecimal values)
For 0041 (assume hex) I should see the letter 'A', a 'B' for 0042, ...
and the special character corresponding to 0x410.

I could live with a comma-separated .csv file instead of a .xls to
view it in Excel.

Please advise.
 

Pascal J. Bourguignon

KK said:
> vector<string> unicodevalues; // has values 0041, 0042, ... 0410 etc.

If you are referring to std::string, then it's a
std::basic_string<char>, so you only get bytes.

If, as is most probable, your CHAR_BIT == 8, then you can only store
the codes of ISO-8859-1 characters in these strings.
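A minimal sketch of the limit described above, using hypothetical helper names: with 8-bit chars a single char value covers 0..255, which is exactly the range U+0000..U+00FF that ISO-8859-1 maps to. U+0041 fits in one char; U+0410 does not, so it needs either a multi-byte encoding or a wider type such as char32_t.

```cpp
#include <climits>  // CHAR_BIT: number of bits in a char (the macro is CHAR_BIT, not CHAR_BITS)
#include <limits>   // std::numeric_limits

// Hypothetical helper: the largest value one unsigned char can hold.
// With CHAR_BIT == 8 this is 255, i.e. exactly the range U+0000..U+00FF
// that ISO-8859-1 covers.
constexpr unsigned long max_single_char_value() {
    return std::numeric_limits<unsigned char>::max();
}

// Hypothetical helper: can this code point be stored as a single char
// value, without a multi-byte encoding such as UTF-8?
constexpr bool fits_in_one_char(char32_t code_point) {
    return code_point <= max_single_char_value();
}

// On the usual 8-bit-char platforms: 'A' (U+0041) fits, but
// CYRILLIC CAPITAL LETTER A (U+0410) does not.
static_assert(CHAR_BIT != 8 || fits_in_one_char(0x41), "U+0041 fits in one byte");
static_assert(CHAR_BIT != 8 || !fits_in_one_char(0x410), "U+0410 needs more than one byte");
```

Note that this only says one char cannot hold U+0410 by itself; it says nothing about what a sequence of chars can hold.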

> (hexadecimal values)
> For 0041 (assume hex) I should see the letter 'A', a 'B' for 0042, ...
> and the special character corresponding to 0x410.

0x410 is not the Unicode code point of a special character. It's
U+0410, CYRILLIC CAPITAL LETTER A.

> I could live with a comma-separated .csv file instead of a .xls to
> view it in Excel.

I would advise you to get a better understanding of characters,
character encodings, the STL, I/O, and files. Start by reading:

http://en.wikipedia.org/wiki/Unicode
http://en.wikipedia.org/wiki/Utf-8
http://www.cplusplus.com/reference/string/string/
http://www.cplusplus.com/reference/iostream/

etc...
 

James Kanze

> If you are referring to std::string, then it's a
> std::basic_string<char>, so you only get bytes.
> If, as is most probable, your CHAR_BIT == 8, then you can
> only store the codes of ISO-8859-1 characters in these
> strings.

Nonsense. I regularly use char for Unicode (UTF-8) and ISO
8859-15; in other places, other ISO 8859 code sets or JIS are
also used. Not to mention various Windows (and earlier MS-DOS)
code pages, or EBCDIC (which is still used, in 8-bit bytes, on
IBM mainframes).
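To illustrate storing Unicode in plain char, here is a sketch of a hand-written UTF-8 encoder that appends to a std::string. The name append_utf8 is hypothetical (not a standard library function); the bit layout follows the UTF-8 encoding form, where code points below 0x80 take one byte and larger ones take two to four.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of one Unicode code point
// to a std::string. The string just stores bytes, so it can hold UTF-8
// as well as ISO-8859-1.
// Precondition: cp is a Unicode scalar value (<= 0x10FFFF, not a surrogate).
void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this, U+0041 becomes the single byte "A" and U+0410 becomes the two bytes "\xD0\x90", both stored in an ordinary std::string.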

Still, I don't know what he really has or wants. Some posters
seem to think that he has a textual representation of the
Unicode code values, e.g. strings like "0041". That seems
weird to me, but who knows.

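If the input really is text such as "0041", a minimal sketch (parse_code_point is a hypothetical name) converts it with std::stoul in base 16 and rejects anything that is not a whole hex number naming a valid Unicode scalar value.

```cpp
#include <cassert>    // assert, for checks like the ones after this block
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::invalid_argument
#include <string>     // std::string, std::stoul

// Hypothetical helper: convert textual hex such as "0041" or "0410" into a
// numeric code point. Note that std::stoul (like strtoul) also accepts leading
// whitespace and an optional "0x" prefix in base 16.
char32_t parse_code_point(const std::string& hex) {
    std::size_t used = 0;
    // Throws std::invalid_argument if there are no hex digits at all,
    // std::out_of_range if the number does not fit in unsigned long.
    const unsigned long value = std::stoul(hex, &used, 16);
    if (used != hex.size())
        throw std::invalid_argument("trailing characters in \"" + hex + "\"");
    if (value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value: " + hex);
    return static_cast<char32_t>(value);
}
```

For example, `assert(parse_code_point("0041") == 0x41);` and `assert(parse_code_point("0410") == 0x410);` both hold, while "00G1" throws std::invalid_argument.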
> 0x410 is not the Unicode code point of a special character. It's
> U+0410, CYRILLIC CAPITAL LETTER A.

Well, that's a special character to me :). I certainly don't
use it very often.

The best reference I know about these issues is "Fonts &
Encodings", by Yannis Haralambous. (I've not seen the English
translation; I hope it's better than the translations of
English into French we usually get.) And of course, he'll also
need to find out about Excel. But I'd be very surprised if it
didn't have an option for reading UTF-8, at least in CSV.
(Alternatively, he could use UTF-16LE; I think that's the native
code set under Windows.)
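Putting the pieces together, here is a sketch of the UTF-8 CSV route. Assumptions, not confirmed in the thread: the input strings hold only hex digits; Excel detects a UTF-8 CSV by its leading byte order mark (EF BB BF), which is commonly relied on but varies with Excel version and with how the file is opened; and make_utf8_csv, write_utf8_csv and the file name are hypothetical. The UTF-16LE alternative would need a different encoder and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Append the UTF-8 encoding of a Unicode scalar value to out.
static void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Build the CSV text: UTF-8 byte order mark, a header row, then one row per
// input value. The code column gets a "U+" prefix so Excel does not turn
// "0041" into the number 41; the character column is always quoted.
std::string make_utf8_csv(const std::vector<std::string>& unicodevalues) {
    std::string csv = "\xEF\xBB\xBF";
    csv += "code,character\r\n";
    for (const std::string& hex : unicodevalues) {
        std::size_t used = 0;
        const unsigned long cp = std::stoul(hex, &used, 16);
        if (used != hex.size() || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("bad code point: " + hex);
        std::string ch;
        append_utf8(ch, static_cast<char32_t>(cp));
        // Inside a quoted CSV field, a '"' must be written as '""'.
        csv += "U+" + hex + ",\"" + (ch == "\"" ? std::string("\"\"") : ch) + "\"\r\n";
    }
    return csv;
}

// Write the bytes unchanged; binary mode avoids newline translation.
void write_utf8_csv(const std::string& path,
                    const std::vector<std::string>& unicodevalues) {
    const std::string csv = make_utf8_csv(unicodevalues);
    std::ofstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);
    file.write(csv.data(), static_cast<std::streamsize>(csv.size()));
    if (!file) throw std::runtime_error("write failed: " + path);
}
```

A call such as `write_utf8_csv("chars.csv", unicodevalues);` would then produce rows like `U+0041,"A"` and `U+0410,"А"` (the latter being the Cyrillic letter). Building the text in memory first keeps the encoding logic testable apart from the file I/O.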
 
