Need help reading UTF-16 files ...

nnimod · Jan 13, 2006

Hi. I'm having trouble reading some unicode files. Basically, I have to
parse certain files. Some of those files are being input in Japanese,
Chinese etc. The easiest way, I figured, to distinguish between plain
ASCII files I receive and the Unicode ones would be to check if the
first two bytes read 0xFFFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

I tried reading it in binary mode and read a wchar_t in:

FILE *fin; wchar_t wch;
fin.open (filename, "rb");
if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

I tried using ifstream for two characters/wifstream for wchar_t but to
no avail.

All of them seems to skip the so-called byte-order-mask. I am quite
lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
but I don't want to use those. I'm sure there's a perfectly simple
method to do this.

Sorry about the long msg for such a simple problem, but it is getting
quite frustrating.... Any help would be very much appreciated.

Cheers,
Nemo.

PS. I know the mask is there. I viewed the files using a hex editor.

P.J. Plauger · Jan 13, 2006

Hi. I'm having trouble reading some unicode files. Basically, I have to
parse certain files. Some of those files are being input in Japanese,
Chinese etc. The easiest way, I figured, to distinguish between plain
ASCII files I receive and the Unicode ones would be to check if the
first two bytes read 0xFFFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

I tried reading it in binary mode and read a wchar_t in:

FILE *fin; wchar_t wch;
fin.open (filename, "rb");
if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

I tried using ifstream for two characters/wifstream for wchar_t but to
no avail.

All of them seems to skip the so-called byte-order-mask. I am quite
lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
but I don't want to use those. I'm sure there's a perfectly simple
method to do this.

See our CoreX library, at our web site. It has exactly what you need.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com

Richard Herring · Jan 17, 2006

Hi. I'm having trouble reading some unicode files. Basically, I have to
parse certain files. Some of those files are being input in Japanese,
Chinese etc. The easiest way, I figured, to distinguish between plain
ASCII files I receive and the Unicode ones would be to check if the
first two bytes read 0xFFFE.

But nothing I do seems to be able to do that.

I tried reading it in binary mode and reading two characters in:

FILE *fin; char ch [2];
fin.open (filename, "rb");
if (fin) { fopen (ch, sizeof (char), 2, fin); ......

Try posting the *actual* code that causes the problem. The above is
clearly not it.

I need help in understanding these files on my phone, Could someone help me understand these files? Urgent help needed. Please help.	1	Jun 4, 2023
How to create an UTF-16 text file with iostream ?	5	Dec 20, 2009
std iostreams design question, why not like java stream wrappers?	18	Aug 27, 2009
reading files in blocks	3	Dec 9, 2010
How to read UTF-8 text files?	8	Apr 25, 2006
UTF - SEEK_SET workaround for BOM encoding(utf-16/32) layer Bug	2	Aug 5, 2009
Need some help !!!	14	Oct 12, 2006
Problems reading from files	11	Aug 25, 2007

Need help reading UTF-16 files ...

nnimod

P.J. Plauger

Richard Herring

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads