Stupid std::codecvt question


wscholine

This is with MSVC8, if there's an implementation dependency.

I have a requirement to read lines from files that might be composed
of wchar_t (for example, text files written by MS Notepad using
"Save As Unicode"). I would like to do this:

typedef std::codecvt<wchar_t, wchar_t, mbstate_t> nullcodecvt;
...
std::wifstream myFile;
...
// somehow associate a nullcodecvt facet with myFile, if it has a
// Unicode BOM
...
std::wstring wline;
std::getline(myFile, wline);
...
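The "if it has a Unicode BOM" test in the snippet above can be done before any wide conversion is involved, by looking at the first raw bytes of the file. A minimal sketch, assuming the file is opened as a narrow stream in binary mode; `bom_kind` and `detect_bom` are hypothetical names, not part of any library, and the code assumes C++11 or later:

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // only for trying the helper on in-memory data
#include <string>

// Hypothetical classification of the byte-order mark at the start of a file.
enum class bom_kind { none, utf8, utf16le, utf16be };

// Reads up to three leading bytes from a binary-mode stream, classifies the
// BOM, and leaves the read position just past it (or at offset 0 if none).
// Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
bom_kind detect_bom(std::istream& in) {
    unsigned char b[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(b), 3);
    const std::streamsize n = in.gcount();
    in.clear();  // a very short file sets eofbit/failbit; undo that before seeking

    bom_kind kind = bom_kind::none;
    std::streamoff skip = 0;
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        kind = bom_kind::utf8;
        skip = 3;
    } else if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        kind = bom_kind::utf16le;  // what Notepad's "Save As Unicode" writes
        skip = 2;
    } else if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        kind = bom_kind::utf16be;
        skip = 2;
    }
    in.seekg(skip, std::ios::beg);
    return kind;
}
```

For a Notepad "Save As Unicode" file opened with `std::ifstream(path, std::ios::binary)`, this should report `bom_kind::utf16le`; the result could then decide which conversion facet to imbue into the wide stream before it is opened.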

What I tried is this:

// awkward-looking circumlocution seems to be the only way to get a
// reference to a nullcodecvt
const nullcodecvt &conv =
    std::use_facet<nullcodecvt>(std::wcin.getloc());
const std::locale from(std::wcin.getloc(), &conv);
// the file I'm playing with contains the text of a sonnet, hence the name
std::wifstream wsonnet;
wsonnet.imbue(from);
wsonnet.open(L"sonnet-2");
// seek past the BOM
wsonnet.seekg(2, std::ios::beg);
std::wstring wline;
while (wsonnet)
{
    std::getline(wsonnet, wline);
}

which does not do the trick. The first time through the loop, wline
gets the low-order half of the character after the BOM, and is empty
thereafter.

Inspecting the data structures with the debugger, I find that wsonnet
has a member of type std::basic_filebuf<wchar_t,
std::char_traits<wchar_t> >, and that this member has a member of type
std::codecvt<wchar_t, char, int> *. The call to
std::wifstream::imbue() doesn't touch that (unsurprisingly, since it's
a different type than the codecvt instantiation that I want). However,
if I manually modify the pointer to point to my nullcodecvt & conv,
the behavior is what I want: each time through the loop, the
successive lines get read without being converted.

FWIW, wsonnet::basic_istream::basic_ios::ios_base does have a
std::locale * that includes my nullcodecvt in its facets. It doesn't
affect the behavior of std::getline() though.
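The debugger observation matches how standard C++ wide file streams are specified: `basic_filebuf<wchar_t>` converts through the locale's `std::codecvt<wchar_t, char, std::mbstate_t>` facet (the `int` seen in the debugger is presumably how MSVC8 spells `mbstate_t`), and `std::locale(base, f)` replaces only the facet whose id matches the static type of `f`. Adding a `codecvt<wchar_t, wchar_t, mbstate_t>`, which is not one of the specializations the standard requires anyway, therefore leaves the file conversion alone. The sketch below uses a hypothetical do-nothing `marker_codecvt`, purely to make facet identity visible, and checks that imbuing a locale with a replaced `codecvt<wchar_t, char, mbstate_t>` does reach the stream's buffer (assuming C++11 or later):

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>

// The facet a wide file buffer converts through: wchar_t in memory, char (bytes) in the file.
typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_file_codecvt;

// Hypothetical stand-in: it overrides nothing, so it converts exactly like the
// standard facet; it exists only so that its address can be checked.
class marker_codecvt : public wide_file_codecvt {
public:
    explicit marker_codecvt(std::size_t refs = 0) : wide_file_codecvt(refs) {}
};

// Returns a copy of base in which only the codecvt<wchar_t, char, mbstate_t>
// facet is replaced: the template parameter is deduced as wide_file_codecvt,
// so that facet's id selects the slot being replaced.
std::locale with_file_codecvt(const std::locale& base, wide_file_codecvt* f) {
    return std::locale(base, f);
}
```

Passing such a locale to `std::wifstream::imbue()` before `open()` forwards it to the filebuf via `pubimbue()`, so a facet that really decodes UTF-16 would take effect the same way.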

Is what I am trying to do just wrong? Or is there something broken
with the MS implementation of std::wifstream?

If I'm not totally on the wrong track, is there some less kludgy-
looking way of getting the facet instantiated?

Thanks in advance.
 

P.J. Plauger

Is what I am trying to do just wrong?
Yes.

Or is there something broken
with the MS implementation of std::wifstream?
No.

If I'm not totally on the wrong track, is there some less kludgy-
looking way of getting the facet instantiated?

You need one of the codecvt facets in our code conversion library.
Just which one depends on details you haven't specified, but I'm
sure what you need is in there. Or you might get lucky and find
an open-source codecvt facet that does what you want.
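As a rough illustration of the kind of facet meant here (not Dinkumware's library, whose facets are not shown in this thread), the following hand-rolled sketch specializes the facet a wide filebuf actually uses, assuming the file is little-endian UTF-16 as Notepad's "Save As Unicode" writes and assuming C++11 or later. `utf16le_codecvt`, `open_utf16le`, and `getline_crlf` are hypothetical names, and the facet does no validation of surrogate pairs:

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical facet: little-endian UTF-16 bytes in the file <-> wchar_t code
// units in memory, one wchar_t per two bytes. Surrogate pairs pass through
// unchanged, which suits a 16-bit wchar_t (Windows); with a 32-bit wchar_t
// they would remain two separate units. No validation is done.
class utf16le_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit utf16le_codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    // Bytes -> wchar_t: combine each (low, high) byte pair into one unit.
    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_end - from_next >= 2 && to_next != to_end) {
            const unsigned lo = static_cast<unsigned char>(from_next[0]);
            const unsigned hi = static_cast<unsigned char>(from_next[1]);
            *to_next++ = static_cast<wchar_t>(lo | (hi << 8));
            from_next += 2;
        }
        // An odd trailing byte, or a full output buffer, leaves input unconsumed.
        return from_next == from_end ? ok : partial;
    }

    // wchar_t -> bytes: low byte first, then high byte.
    result do_out(state_type&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        from_next = from;
        to_next = to;
        while (from_next != from_end && to_end - to_next >= 2) {
            const unsigned long unit = static_cast<unsigned long>(*from_next) & 0xFFFFul;
            *to_next++ = static_cast<char>(unit & 0xFFul);
            *to_next++ = static_cast<char>(unit >> 8);
            ++from_next;
        }
        return from_next == from_end ? ok : partial;
    }

    // Stateless encoding: there is never anything to flush.
    result do_unshift(state_type&, char* to, char*, char*& to_next) const override {
        to_next = to;
        return noconv;
    }

    int do_encoding() const noexcept override { return 2; }  // fixed 2 bytes per unit
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 2; }

    // Bytes needed to produce at most max wchar_t units from [from, from_end).
    int do_length(state_type&, const char* from, const char* from_end,
                  std::size_t max) const override {
        std::size_t units = static_cast<std::size_t>(from_end - from) / 2;
        if (units > max) units = max;
        return static_cast<int>(units * 2);
    }
};

// Hypothetical helper: imbue before open, open in binary mode, and skip the
// BOM, which the facet decodes as U+FEFF.
bool open_utf16le(std::wifstream& in, const char* path) {
    in.imbue(std::locale(in.getloc(), new utf16le_codecvt));  // the locale owns the facet
    in.open(path, std::ios::in | std::ios::binary);
    if (!in) return false;
    if (in.peek() == 0xFEFF) in.ignore();
    return true;
}

// Hypothetical helper: std::getline that also drops the '\r' of a CRLF
// ending, which binary mode leaves in place.
bool getline_crlf(std::wistream& in, std::wstring& line) {
    if (!std::getline(in, line)) return false;
    if (!line.empty() && line[line.size() - 1] == L'\r') line.erase(line.size() - 1);
    return true;
}
```

Reading the sonnet would then be `std::wifstream in; std::wstring line; if (open_utf16le(in, "sonnet-2")) while (getline_crlf(in, line)) { ... }`. Binary mode is chosen because in text mode the Windows C runtime may translate CR-LF byte pairs and treat a 0x1A byte as end-of-file, even when those bytes are halves of UTF-16 code units.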

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
