How to create a UTF-16 text file with iostream?

Timothy Madden

Hello

I would like to export data from my application to CSV (comma separated
values) in Unicode because I have Asian characters in my text, but
wofstream writes all basic characters in plain ASCII !

That is, until it reaches an extended character; then I get an exception
(I set badbit to throw exceptions).

I use Visual Studio 2008 but I would like to use the standard C++
library for this task (and all other tasks).

By searching Google I could see I should explicitly define a null
conversion for my wide-character stream and then imbue it on my
wofstream but the code sample was quite large and when I used it my
application crashed.

Why does this have to be so complicated ? Do I need to explicitly
define this null conversion ? Do I need to define all those conversion
methods and know about locale and facets to write the file in UTF-16 ?

Is there no easy way in the standard for this ?

How can I write binary data to the wofstream ? I tried myFile.write()
but that takes a wofstream::char_type * pointer and still undergoes
the damn conversion that narrows the characters and throws on extended
ones...

Thank you,
Timothy Madden
 
AnonMail2005

Use iconv to convert from one character set to another. It's
cross-platform and freely available (I'm not sure under which license).
Make your CSV reader/writer the only place where this conversion takes
place, so that no other code knows or cares about it. Either have
different CSV readers/writers for different conversions, or supply a
converter to the CSV reader/writer class as a constructor argument. In
the latter case, you could set up a default converter for when none is
supplied.

I don't know about UTF-16 but we used this strategy in our C++ wrapper
of xml2. We converted ASCII to UTF-8 and it worked properly. No wide
character stuff. Others can chime in if the strategy works for UTF-16.

HTH
 
Stefan Ram

Timothy Madden said:
Is there no easy way in the standard for this ?

Converting a sequence of Unicode 5.1.0 code points to a
sequence of octets using UTF-16 should not be so difficult
to write.

I do not know whether support for this is already in the
standard library of C++ or in boost, but see

http://site.icu-project.org/
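
To give an idea of the size of the task, here is a minimal sketch of such an encoder. It assumes little-endian output without a byte order mark, and the function names `appendUnitLE` and `encodeUtf16LE` are invented for this example. Code points above U+FFFF are split into a surrogate pair, and values that are not Unicode scalar values are rejected.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Appends one 16-bit code unit in little-endian byte order.
inline void appendUnitLE(std::string& out, std::uint16_t unit) {
    out.push_back(static_cast<char>(unit & 0xFF));
    out.push_back(static_cast<char>(unit >> 8));
}

// Encodes Unicode code points as UTF-16LE octets (no byte order mark).
inline std::string encodeUtf16LE(const std::vector<std::uint32_t>& codePoints) {
    std::string out;
    for (std::uint32_t cp : codePoints) {
        // Surrogate values and anything past U+10FFFF cannot be encoded.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit.
            appendUnitLE(out, static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary planes: high surrogate, then low surrogate.
            const std::uint32_t v = cp - 0x10000;
            appendUnitLE(out, static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            appendUnitLE(out, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```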
 
Timothy Madden

Stefan said:
Converting a sequence of Unicode 5.1.0 code points to a
sequence of octets using UTF-16 should not be so difficult
to write.

I do not know whether support for this is already in the
standard library of C++ or in boost, but see

http://site.icu-project.org/

I got my hands on "C++ Standard Library: A tutorial and reference" by
Nicolai M. Josuttis (I have it in .chm) and saw that the sample code
from the net needed some adjustments. Now my wofstream object writes
UTF-16 files with the corrected null codecvt facet on the imbued locale.

I can post the corrected facet if anyone is interested.
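
The corrected facet itself was not posted in this thread. As a stand-in, here is a sketch of a related but different technique: instead of a null (no-conversion) facet, it uses a facet that explicitly converts each wchar_t to two little-endian bytes. The names `Utf16LeCodecvt` and `writeUtf16LeFile` are invented for the example. It handles output only, and on platforms where wchar_t is wider than 16 bits it reports an error for values above 0xFFFF rather than producing surrogate pairs.

```cpp
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: encodes each wchar_t as one little-endian 16-bit unit.
class Utf16LeCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* fromEnd,
                  const wchar_t*& fromNext, char* to, char* toEnd,
                  char*& toNext) const override {
        fromNext = from;
        toNext = to;
        while (fromNext != fromEnd) {
            if (toEnd - toNext < 2) return partial;  // output buffer is full
            const unsigned long unit = static_cast<unsigned long>(*fromNext);
            if (unit > 0xFFFF) return error;  // would need a surrogate pair
            *toNext++ = static_cast<char>(unit & 0xFF);
            *toNext++ = static_cast<char>(unit >> 8);
            ++fromNext;
        }
        return ok;
    }
    result do_unshift(std::mbstate_t&, char* to, char*,
                      char*& toNext) const override {
        toNext = to;
        return noconv;  // stateless encoding, nothing to flush
    }
    bool do_always_noconv() const noexcept override { return false; }
    int do_encoding() const noexcept override { return 2; }
    int do_max_length() const noexcept override { return 2; }
};

// Writes a byte order mark followed by the text; returns false on failure.
inline bool writeUtf16LeFile(const std::string& path, const std::wstring& text) {
    std::wofstream file;
    // Imbue before opening; the locale takes ownership of the facet.
    file.imbue(std::locale(std::locale::classic(), new Utf16LeCodecvt));
    // Binary mode keeps the runtime from rewriting 0x0A bytes on Windows.
    file.open(path.c_str(), std::ios::out | std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file.put(static_cast<wchar_t>(0xFEFF));  // byte order mark
    file.write(text.data(), static_cast<std::streamsize>(text.size()));
    file.close();
    return !file.fail();
}
```

Because the facet does the conversion itself, it does not depend on how a particular library's file buffer treats a no-conversion facet for wchar_t.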

Thank you,
Timothy Madden
 
zindorsky

Juha said:
(or UTF-8, which is way more popular; why do you want precisely UTF-16?)

Probably because he's encoding Asian scripts. Code points in that range
require 3 bytes to encode in UTF-8, instead of the 2 that UTF-16 requires.
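
The byte counts can be checked with a small sketch like the one below. The function names `utf8Length` and `utf16Length` are made up for the example, and the input is assumed to be a valid code point. Note that the comparison cuts the other way for ASCII, which takes 1 byte in UTF-8 and 2 in UTF-16.

```cpp
#include <cstdint>

// Number of bytes UTF-8 needs for one code point (assumed valid).
inline int utf8Length(std::uint32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;    // up to U+07FF
    if (cp < 0x10000) return 3;  // rest of the BMP, including most CJK
    return 4;                    // supplementary planes
}

// Number of bytes UTF-16 needs for one code point (assumed valid).
inline int utf16Length(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;  // one code unit, or a surrogate pair
}
```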
 
Timothy Madden

Juha said:
Encoding unicode characters in UTF-16 (or UTF-8, which is way more
popular; why do you want precisely UTF-16?) is not that complicated, but
if you want to save yourself the work (there are a few gotchas,
especially if you want to fully support the entire unicode), you can use
a conversion library, such as this one:

http://utfcpp.sourceforge.net/

I would like to use UTF-16 so I can start my text file with the byte
order mark and then any text editor, and also M$ Excel, would know my
charset and encoding.

If I write a plain text file with a majority of ASCII characters but a
few Asian ones in UTF-8, then an editor would not know my encoding and
would misinterpret the few extended characters in my text.
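
For illustration, a reader could sniff the byte order mark roughly like this. It is a sketch only: `encodingFromBom` is a made-up name, and real editors use more elaborate detection. The sketch also checks for the UTF-8 signature (the bytes EF BB BF), which a file may carry as well.

```cpp
#include <string>

// Names the encoding announced by a leading byte order mark, if any.
// Limitation of this sketch: a UTF-32LE mark (FF FE 00 00) would be
// reported as UTF-16LE.
inline std::string encodingFromBom(const std::string& bytes) {
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return "UTF-16LE";
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return "UTF-16BE";
    return "unknown";  // no mark: the reader has to guess
}
```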

And I already have my text as a stream of wchar_t, so what should I do
with a conversion library ?

My problem was convincing the wofstream to write them unchanged to the
file on disk (instead of trying to convert them to narrow characters,
which fails upon encountering any extended character).

Thank you,
Timothy Madden
 
