nnimod
Hi. I'm having trouble reading some Unicode files. Basically, I have to
parse certain files, some of which contain Japanese, Chinese, etc. The
easiest way I could think of to distinguish the plain ASCII files I
receive from the Unicode ones would be to check whether the first two
bytes are 0xFF 0xFE.
But nothing I try lets me read those two bytes.
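For reference, here is a minimal sketch of the check described above, generalized slightly. It inspects the start of a byte buffer and reports which byte order mark (BOM), if any, is present. It assumes the usual BOM byte sequences: EF BB BF for UTF-8, FF FE for UTF-16 little-endian, and FE FF for UTF-16 big-endian. On disk, a UTF-16LE mark is the byte 0xFF followed by the byte 0xFE. The function name `detect_bom` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: classify the byte order mark at the start of a buffer.
// Only UTF-8 and UTF-16 BOMs are recognized; a UTF-32LE file (FF FE 00 00)
// would be reported as UTF-16LE by this sketch.
std::string detect_bom(const unsigned char* data, std::size_t len) {
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";  // no BOM: plain ASCII, or Unicode saved without a mark
}
```

A file's first few bytes can be read with `fread` into an `unsigned char` buffer and passed to this function. A result of "none" does not prove the file is ASCII, because BOMs are optional (especially for UTF-8).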
I tried opening the file in binary mode and reading two chars:
FILE *fin; char ch [2];
fin = fopen (filename, "rb");
if (fin) { fread (ch, sizeof (char), 2, fin); ...
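When reading the two bytes with `fopen` and `fread`, one easy-to-miss trap is signedness. Plain `char` is signed on many compilers, so a byte 0xFF read into a `char` compares as -1 and never equals 0xFF. Below is a minimal sketch that avoids this by reading into `unsigned char`. It assumes the goal is only to detect the UTF-16LE mark (bytes FF FE). The function name `starts_with_utf16le_bom` is hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: returns true if the file starts with the UTF-16LE
// byte order mark, i.e. the byte 0xFF followed by the byte 0xFE.
bool starts_with_utf16le_bom(const char* filename) {
    std::FILE* fin = std::fopen(filename, "rb");  // binary mode: no newline translation
    if (!fin)
        return false;
    unsigned char ch[2];  // unsigned, so 0xFF compares as 255 rather than -1
    std::size_t n = std::fread(ch, sizeof(unsigned char), 2, fin);  // read the first two raw bytes
    std::fclose(fin);
    return n == 2 && ch[0] == 0xFF && ch[1] == 0xFE;
}
```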
I also tried opening it in binary mode and reading a wchar_t:
FILE *fin; wchar_t wch;
fin = fopen (filename, "rb");
if (fin) { fread (&wch, sizeof (wchar_t), 1, fin); ...
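Reading a `wchar_t` directly is fragile for two reasons. First, `sizeof(wchar_t)` is implementation-defined: it is commonly 2 bytes on Windows and 4 bytes on Linux with glibc. Second, the value you get depends on the machine's byte order. On a little-endian x86 machine, the bytes FF FE read into a 16-bit integer give 0xFEFF, not 0xFFFE. A comparison against 0xFFFE would therefore fail even when the mark is present. The sketch below avoids both issues by reading two bytes and assembling the value explicitly. The helper names `le16` and `read_first_unit_le` are hypothetical.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: combine two bytes as a little-endian 16-bit value,
// independent of the host's byte order and of sizeof(wchar_t).
std::uint16_t le16(unsigned char lo, unsigned char hi) {
    return static_cast<std::uint16_t>(lo | (hi << 8));
}

// Hypothetical helper: read the first two bytes of an already-open binary
// stream as a little-endian code unit. Returns false if fewer than two
// bytes are available.
bool read_first_unit_le(std::FILE* fin, std::uint16_t* out) {
    unsigned char b[2];
    if (std::fread(b, 1, 2, fin) != 2)
        return false;
    *out = le16(b[0], b[1]);
    return true;
}
```

With this helper, a UTF-16LE file yields 0xFEFF (the BOM code point U+FEFF), and a UTF-16BE file yields 0xFFFE. This is why comparing a raw 16-bit read against 0xFFFE is easy to get backwards.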
I also tried using ifstream to read two chars and wifstream to read a
wchar_t, but to no avail.
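The same check can also be done with iostreams. A narrow `std::ifstream` opened with `std::ios::binary` returns the raw bytes. By contrast, `std::wifstream` converts bytes to `wchar_t` through the `codecvt` facet of its imbued locale, so the characters it yields need not match the bytes on disk one-to-one. The sketch below uses the narrow stream. The helper name `ifstream_has_utf16le_bom` is hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: detect the UTF-16LE BOM (bytes FF FE) using a narrow
// ifstream in binary mode, which performs no character conversion.
bool ifstream_has_utf16le_bom(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);
    if (!in)
        return false;
    char ch[2];
    in.read(ch, 2);
    if (in.gcount() != 2)  // file shorter than two bytes
        return false;
    // Cast to unsigned char before comparing, since plain char may be signed.
    return static_cast<unsigned char>(ch[0]) == 0xFF &&
           static_cast<unsigned char>(ch[1]) == 0xFE;
}
```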
All of them seem to skip the so-called byte order mark (BOM). I'm quite
lost for ideas. I've seen a few examples using MFC classes such as
CStdioFile, but I don't want to use those. I'm sure there's a perfectly
simple way to do this.
Sorry for the long message about such a simple problem, but it's getting
quite frustrating. Any help would be very much appreciated.
Cheers,
Nemo.
PS. I know the mark is there; I checked the files with a hex editor.
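To inspect the same bytes from inside the program rather than in a hex editor, the sketch below formats the leading bytes of a buffer as hex. The helper `hex_prefix` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format up to `count` leading bytes as space-separated
// uppercase hex, e.g. "FF FE 41 00", for comparison with a hex editor view.
std::string hex_prefix(const unsigned char* data, std::size_t len, std::size_t count) {
    std::string out;
    char buf[4];  // two hex digits plus the terminating null
    for (std::size_t i = 0; i < len && i < count; ++i) {
        std::snprintf(buf, sizeof buf, "%02X", data[i]);
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```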