wstring, wofstream, and encodings

Jeffrey Walton

Hi All,

I'm attempting to write a wstring to a file by way of wofstream. The
characters are being narrowed on the way out (I presume to UTF-8).
How/where do I invoke an alternate constructor so that the stream stays
wide (UTF-16)?

I suspect that it is hidden in a locale, but I don't have much
experience with them. I also have not been able to locate it in
Stroustrup's Appendix D, Locales [1]. [1] does state the following, but
I do not have section 21.7: "Section §21.7 describes how to change
locale for a stream; this appendix describes how a locale is
constructed out of facets and explains the mechanisms through which a
locale affects its stream."

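== Sample ==
#include <fstream>
#include <string>

int main()
{
    std::wstring ws = L"wide";

    std::wofstream ofs;
    // binary: no newline translation on platforms that distinguish text mode
    ofs.open("wide.dat", std::ios::binary | std::ios::trunc );
    if( !ofs.good() ) { return 1; }

    ofs << ws;
    ofs.close();
}
== End Sample ==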

Thanks,
Jeff
Jeffrey Walton

[1] http://www.research.att.com/~bs/3rd_loc0.html
 
Ivan Vecerina

From my understanding of iostream, locales will not be the answer.

Locales apply to the upper layer of the iostream, which takes
care of converting values to characters. They affect the
choice of the characters used to represent a value, but not
the encoding of these characters.
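To illustrate the formatting layer described above, here is a minimal sketch (assuming C++11 or later) in which a hypothetical `ThousandsSep` facet changes *which* characters represent a number. It has no say in how those characters are later encoded in a file.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical numpunct facet: groups digits in threes with ',' between groups.
struct ThousandsSep : std::numpunct<wchar_t> {
protected:
    wchar_t do_thousands_sep() const override { return L','; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats value through a wide string stream imbued with the facet above.
std::wstring format_grouped(long value) {
    std::wostringstream os;
    os.imbue(std::locale(os.getloc(), new ThousandsSep)); // locale owns the facet
    os << value;                                          // num_put consults numpunct
    return os.str();
}
```

`format_grouped(1234567)` yields `L"1,234,567"`: the locale picked the characters, and nothing here decides their byte encoding.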

The internal filebuf or basic_filebuf is the object that will
determine how the in-memory characters are written to a file.
This is the layer (the stream *buffer*) that can define whether
a file is written using UTF-8 or another character encoding.

However, the C++ standard does not specify an interface for
selecting the character encoding used by (w)filebuf.

Your best bet would be to ask your question on a platform-
specific forum, related to the library implementation you use.
A specific wfilebuf (/basic_filebuf) implementation may
allow you to specify the file's encoding style.
Or maybe this is configurable at an OS or C library level.
Worst case, you will still be able to write your own streambuf
layer to write files using the specific encoding you want.
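The "write your own streambuf" fallback might look like this minimal sketch. `Utf16LeOutBuf` and `write_utf16le_streambuf` are hypothetical names; the sketch assumes C++11, writes little-endian UTF-16 without a BOM, and is output-only and unbuffered to stay short.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical output-only wide streambuf: every wchar_t it receives is
// written as UTF-16LE bytes (no BOM) to a narrow, binary std::ofstream.
// No put area is set up, so overflow() is called once per character.
class Utf16LeOutBuf : public std::wstreambuf {
public:
    explicit Utf16LeOutBuf(const char* path)
        : out_(path, std::ios::binary | std::ios::trunc) {}
    bool is_open() const { return out_.is_open(); }

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);            // nothing to write
        unsigned long cp =
            static_cast<unsigned long>(traits_type::to_char_type(ch));
        if (cp > 0xFFFF) {                              // only with 32-bit wchar_t
            if (cp > 0x10FFFF) return traits_type::eof();
            cp -= 0x10000;
            put_unit(0xD800 + (cp >> 10));              // high surrogate
            put_unit(0xDC00 + (cp & 0x3FF));            // low surrogate
        } else {
            put_unit(cp);  // BMP unit, or a surrogate already split by 16-bit wchar_t
        }
        return out_ ? ch : traits_type::eof();
    }
    int sync() override {
        out_.flush();
        return out_ ? 0 : -1;
    }

private:
    void put_unit(unsigned long u) {
        out_.put(static_cast<char>(u & 0xFF));          // low byte first
        out_.put(static_cast<char>((u >> 8) & 0xFF));   // then high byte
    }
    std::ofstream out_;
};

// Writes ws through a std::wostream layered on the custom buffer.
bool write_utf16le_streambuf(const char* path, const std::wstring& ws) {
    Utf16LeOutBuf buf(path);
    if (!buf.is_open()) return false;
    std::wostream os(&buf);
    os << ws;
    os.flush();                                         // calls sync()
    return os.good();
}
```

Leaving the stream layer untouched and swapping only the buffer matches the layering described above; a production version would add a put area for efficiency.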


I hope this helps...
Ivan
 
Vaclav Haisman

I am too lazy to look into the standard right now, but IIRC it says something
about it being implementation-defined. In either case, I think you can imbue
the stream with your own locale that has a custom codecvt facet.
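A locale carrying a custom codecvt facet can be sketched as follows. `Utf16LeCodecvt` and `write_utf16le` are hypothetical names; the facet overrides only the output side (so it is not suitable for reading), emits UTF-16LE without a BOM, and assumes C++11 or later. Because it derives from `std::codecvt<wchar_t, char, std::mbstate_t>`, it replaces that facet in the new locale, which is the facet a `wfilebuf` consults.

```cpp
#include <cassert>
#include <cwchar>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical output-only facet: wchar_t -> UTF-16LE bytes, no BOM.
// do_in() is inherited unchanged, so do not use this locale for reading.
class Utf16LeCodecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_out(std::mbstate_t&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        for (; from != from_end; ++from) {
            unsigned long cp = static_cast<unsigned long>(*from);
            unsigned long units[2] = {cp, 0};
            int n = 1;
            if (cp > 0xFFFF) {                       // only with 32-bit wchar_t
                if (cp > 0x10FFFF) { from_next = from; to_next = to; return error; }
                cp -= 0x10000;
                units[0] = 0xD800 + (cp >> 10);      // high surrogate
                units[1] = 0xDC00 + (cp & 0x3FF);    // low surrogate
                n = 2;
            }
            if (to_end - to < 2 * n) break;          // no room: report partial
            for (int i = 0; i < n; ++i) {
                *to++ = static_cast<char>(units[i] & 0xFF);         // low byte
                *to++ = static_cast<char>((units[i] >> 8) & 0xFF);  // high byte
            }
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }
    result do_unshift(std::mbstate_t&, char* to, char*, char*& to_next) const override {
        to_next = to;                                // stateless: nothing to flush
        return noconv;
    }
    int do_encoding() const noexcept override { return 0; }     // variable width
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 4; }   // surrogate pair
};

// Writes ws to path as UTF-16LE using the facet above.
bool write_utf16le(const char* path, const std::wstring& ws) {
    std::wofstream ofs;
    // Imbue before open() so the filebuf uses the facet from the first byte.
    ofs.imbue(std::locale(ofs.getloc(), new Utf16LeCodecvt));
    ofs.open(path, std::ios::binary | std::ios::trunc);
    if (!ofs) return false;
    ofs << ws;
    ofs.close();
    return !ofs.fail();
}
```

The stream code stays exactly as in the original sample apart from the `imbue()` call; only the conversion facet changes.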

--
VH


 
James Kanze

> From my understanding of iostream, locales will not be the answer.
> Locales apply to the upper layer of the iostream, which takes
> care of converting values to characters. They affect the
> choice of the characters used to represent a value, but not
> the encoding of these characters.

That would be logical, but... std::streambuf maintains a locale
too, which can be imbued (and imbuing an istream or an ostream
imbues the attached streambuf, if any). Most streambufs ignore
the locale, but filebuf uses it for code translation.
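One way to see that the filebuf translates through its locale: the sketch below (a hypothetical helper, assuming C++11) imbues a `wofstream` with the classic "C" locale before opening it and counts the bytes that reach the file. In the "C" locale, the `codecvt<wchar_t, char, mbstate_t>` facet narrows plain ASCII to one byte per character, which is consistent with the narrowing described in the original question.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Writes ws through a wofstream imbued with the classic "C" locale and
// returns how many bytes reached the file.
std::size_t bytes_via_classic_locale(const char* path, const std::wstring& ws) {
    {
        std::wofstream ofs;
        ofs.imbue(std::locale::classic());   // also imbues ofs.rdbuf()
        ofs.open(path, std::ios::binary | std::ios::trunc);
        ofs << ws;                           // filebuf converts via codecvt
    }                                        // destructor flushes and closes
    std::ifstream in(path, std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return bytes.size();
}
```

For `L"wide"` this reports 4 bytes rather than the 8 a UTF-16 encoding would need.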
> The internal filebuf or basic_filebuf is the object that will
> determine how the in-memory characters are written to a file.
> This is the layer (the stream *buffer*) that can define whether
> a file is written using UTF-8 or another character encoding.
> However, the C++ standard does not specify an interface
> for selecting the character encoding used by
> (w)filebuf.

wfilebuf::imbue(), reached through the public wfilebuf::pubimbue().

Note that you can call it independently of the iostream, e.g.:

    std::ifstream input( ... ) ;
    input.imbue( someLocale ) ;
    input.rdbuf()->pubimbue( someOtherLocale ) ;

On the other hand, imbuing the iostream will imbue the attached
streambuf, even if this streambuf is also used by other
iostreams.
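Both imbue paths can be checked with a small sketch (assuming C++11). `Marker` and `imbue_propagation_holds` are hypothetical; `Marker` exists only to create two distinct unnamed locales, since an unnamed locale compares equal only to copies of itself. Note that the public member on a stream buffer is `pubimbue()`; `imbue()` itself is a protected virtual.

```cpp
#include <cassert>
#include <fstream>
#include <locale>

// Hypothetical marker facet, used only to build distinguishable locales.
struct Marker : std::numpunct<wchar_t> {};

// True when stream.imbue() reaches the buffer, and rdbuf()->pubimbue()
// changes the buffer without touching the stream's own locale.
bool imbue_propagation_holds() {
    std::wofstream out;                              // not opened; no file needed
    std::locale a(std::locale::classic(), new Marker);
    std::locale b(std::locale::classic(), new Marker);

    out.imbue(a);                                    // imbues stream and filebuf
    bool buffer_followed = out.rdbuf()->getloc() == a;

    out.rdbuf()->pubimbue(b);                        // imbues only the filebuf
    bool stream_unchanged = out.getloc() == a && out.rdbuf()->getloc() == b;

    return buffer_followed && stream_unchanged;
}
```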

With regards to the poster's original question:

    std::wofstream output( ... ) ;
    output.imbue( desiredLocale ) ;

should do the trick, provided he can find the desired locale.
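For the record, C++11 (well after this 2008 thread) added a ready-made facet for this: `std::codecvt_utf16` from `<codecvt>`. It was deprecated in C++17 and removed in C++26, so treat this sketch as applying to C++11 through C++23 only; `write_utf16le_std` is a hypothetical name. With a 16-bit `wchar_t` (as on Windows) the facet treats wide characters as UCS-2 rather than UTF-16.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Writes ws to path as UTF-16LE using the standard (now deprecated) facet.
bool write_utf16le_std(const char* path, const std::wstring& ws) {
    std::wofstream ofs;
    // Imbue before open(). little_endian selects the byte order; no BOM is
    // written because generate_header is not requested.
    ofs.imbue(std::locale(ofs.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));
    ofs.open(path, std::ios::binary | std::ios::trunc);
    if (!ofs) return false;
    ofs << ws;
    ofs.close();
    return !ofs.fail();
}
```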
> Your best bet would be to ask your question on a platform-
> specific forum, related to the library implementation you use.

He'll probably have to go that route in order to find what
locales are available, and how to name them.
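Which named locales exist is platform-specific, but a program can at least probe a candidate name: constructing `std::locale` from an unknown name throws `std::runtime_error`. `locale_available` is a hypothetical helper, and any name other than "C" passed to it is an assumption about the platform.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if the C++ library accepts `name` as a named
// locale on this platform. Valid names differ between systems.
bool locale_available(const std::string& name) {
    try {
        std::locale probe(name.c_str());   // throws std::runtime_error if unknown
        (void)probe;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Finding a name that exists is only the first step: in common implementations a named locale's wide-to-narrow facet converts to that system's multibyte encoding, which is not necessarily UTF-16.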
 
Jeffrey Walton

Hi Ivan,

> Your best bet would be to ask your question on a platform-
> specific forum, related to the library implementation you use.
You are correct. I asked over on microsoft.public.vc.language.
Microsoft's implementation is broken [1]. I'm kind of surprised; I
thought Plauger supplied it through Visual Studio.

Jeff
Jeffrey Walton

[1] http://groups.google.com/group/microsoft.public.vc.language/browse_thread/thread/f78eb7489a42b568
 
Jeffrey Walton

> --
> James Kanze (GABI Software)
> Conseils en informatique orientée objet/
>                    Beratung in objektorientierter Datenverarbeitung
> 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Thanks James. This will help.

Jeff
Jeffrey Walton
 
Jeffrey Walton


Thanks VH. This will help.

Jeff
Jeffrey Walton
 
Jeffrey Walton

Hi VH,

> I am too lazy to look into the standard right now but IIRC it says something
> about it being implementation defined. In either case, I think you can imbue
> the stream with your own locale that has custom codecvt facet.
I was not able to get a copy from the ISO web site without paying ($$$);
I wanted to thumb through it. If you could post a link to your source,
it would be appreciated.

Jeff
Jeffrey Walton
 
James Kanze

On Apr 12, 3:51 am, "Ivan Vecerina"

[...]
> You are correct. I asked over on microsoft.public.vc.language.
> Microsoft's implementation is broken [1]. I'm kind of surprised; I
> thought Plauger supplied it through Visual Studio.

Broken, or simply that the locales which do what you want aren't
delivered with VC++? (In general, Dinkumware has been well in
advance of other implementations in its implementation of
locales, and locale support in streams. And while one can never
totally discount the possibility of an error, I'd tend to
suspect rather the absence of the necessary locales.)
 
