wide character file to wstring - unexpected results

Christopher

I loaded a file using these two blocks of code and examined the
results. I did not see what I expected. Each wchar_t seems to have its
byte order swapped when looking at the results as bytes. When
examining the contents of the wstring, an extra '0' character
appears after each expected character.

My colleague claims that it's some Microsoft/Intel thing. That doesn't
help me write code that handles it, though.

Can someone explain?


//---
// Load the file as wide character text
{
    // Load the Init Document
    std::wifstream initDocFile(initDocumentPath.c_str());
    ASSERT_TRUE( initDocFile );

    // Copy the contents of the file into a string
    std::wstring initDoc(
        (std::istreambuf_iterator<wchar_t, std::char_traits<wchar_t> >(initDocFile)),
        (std::istreambuf_iterator<wchar_t, std::char_traits<wchar_t> >()));
    ASSERT_FALSE( initDoc.empty() );

    // Close the file
    initDocFile.close();
}
//-----

Hovering over initDoc in Visual Studio 2008 shows:
<
0
A
0
T
0
etc, etc


//---
// Load the file as bytes
{
    // Load the Init Document
    std::ifstream initDocFile(initDocumentPath.c_str(),
                              std::fstream::binary);
    ASSERT_TRUE( initDocFile );

    // Get the size of the file
    initDocFile.seekg(0, std::ios::end);
    std::streampos numBytes = initDocFile.tellg();
    initDocFile.seekg(0, std::ios::beg);

    // Copy the contents of the file into a vector
    std::vector<char> initDoc(numBytes);
    initDocFile.read(&initDoc[0], numBytes);
    ASSERT_FALSE( initDoc.empty() );

    // Close the file
    initDocFile.close();
}
//-----

Hovering over initDoc in Visual Studio 2008 shows:
60
0
65
0
etc.
etc.

//----

Looking at the file in a hex editor shows:
3C 00 41 00 54 00 etc. etc.
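
For what it's worth, that hex dump reads like little-endian UTF-16: each character is two bytes, low byte first. A minimal sketch of decoding it by hand, assuming the file contains only two-byte units and no byte order mark (decodeUtf16Le is a made-up helper name, not something from the code above):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: combine each pair of bytes, low byte first,
// into one 16-bit code unit stored in a wchar_t.
// Surrogate pairs are not combined, and an odd trailing byte is ignored.
std::wstring decodeUtf16Le(const std::vector<unsigned char>& bytes)
{
    std::wstring out;
    out.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        const unsigned lo = bytes[i];
        const unsigned hi = bytes[i + 1];
        out.push_back(static_cast<wchar_t>(lo | (hi << 8)));
    }
    return out;
}
```

Fed the six bytes 3C 00 41 00 54 00, this yields the three characters "<AT" instead of six.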

Furthermore,
1) I cannot double-click the file and open it as XML on Windows Server
2003. It says "Invalid character. Error processing resource"
2) I cannot hover over initDoc in Visual Studio 2008, click the down
arrow, and open the variable in the text visualizer; it shows only "<"
3) I cannot hover over initDoc in Visual Studio 2008, click the down
arrow, and open the variable in the XML visualizer; it shows "A
declaration was not closed. Error processing resource"

Someone help me to understand.
 
Christopher

I am assuming, based on your description, that your file contents are coded
in UTF-16.

If so, each two-byte code unit should have been read into a single wchar_t.
That's what a wchar_t is, after all. Sounds like your std::wifstream thought
that your file contents were coded in, probably, ISO-8859-1, and you're
seeing the results.

Sounds reasonable.
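
To illustrate that explanation: if the stream treats every byte as one single-byte character, each byte gets widened into its own wchar_t. This is only a rough simulation of that behaviour under that assumption (widenEachByte is a made-up name), not the actual library code:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: mimic a conversion that assumes a single-byte
// encoding, so every input byte becomes one wide character.
std::wstring widenEachByte(const std::vector<unsigned char>& bytes)
{
    std::wstring out;
    out.reserve(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        out.push_back(static_cast<wchar_t>(bytes[i]));
    }
    return out;
}
```

For the bytes 3C 00 41 00 54 00 this gives six wide characters: '<', 0, 'A', 0, 'T', 0, which matches what the debugger shows for the wstring.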
Double-check that you've set your global locale correctly to reflect that
your system environment uses UTF-16 coding, or imbue a UTF-16 locale into
your std::wifstream.
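
One way the "imbue" suggestion might look, as a sketch only. It assumes a C++11 standard library that ships <codecvt> (which Visual Studio 2008 may not provide; the header is also deprecated since C++17), and it assumes the file is little-endian UTF-16. readUtf16LeFile is a made-up helper name:

```cpp
#include <codecvt>   // std::codecvt_utf16, std::little_endian
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: read a whole little-endian UTF-16 file into a wstring.
std::wstring readUtf16LeFile(const std::string& path)
{
    // Open in binary mode so newline translation cannot split byte pairs.
    std::wifstream in(path.c_str(), std::ios::binary);

    // Replace the stream's codecvt facet before any read happens.
    // The locale takes ownership of the facet allocated here.
    // No consume_header is requested, so a leading byte order mark,
    // if present, would show up as the first character.
    in.imbue(std::locale(in.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

    // Returns an empty string if the file could not be opened.
    return std::wstring(std::istreambuf_iterator<wchar_t>(in),
                        std::istreambuf_iterator<wchar_t>());
}
```

With a facet like this in place, the bytes 3C 00 41 00 54 00 should come back as "<AT" rather than as six separate characters.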

As I understand it, in Visual Studio, if a project is set to use
Unicode, then any wide strings are UTF-16. I also assume the Windows
API calls to read and write files treat text as UTF-16. That's a
question for an MS newsgroup though.

My questions here are:
How do I set a "global locale"?
How do I imbue a UTF-16 locale into a stream?
Are there built-in UTF-16 locales?
Are there built-in UTF-8 locales?
Are there built-in conversion methods?

I am googling the hell out of facets and locales and finding very
little, aside from similarly frustrated people.


If that's the case, then this has nothing to do with your code, and the
file's coding does not match your system locale.
The file must've been generated on a system that uses a locale with a
different character set/code point.

I think that the encoding is not valid anywhere because of the mix and
match between multibyte, wide, ASCII, UTF-16, UTF-8, Windows-generated
text, 3rd-party-library-generated text, streaming, etc. used
throughout the project I am in, without any regard for consistency in
character encoding.

I am trying to decipher what they "thought it was" and how to get it
into something usable.

Additionally, all XML files should be coded in UTF-8 anyway, not UTF-16, and
not ISO-8859-1.

It's not XML that follows the rules. It's "XML" that only resembles
XML in its use of tags, which some developer put into a file using
Windows API functions.
 
