Need help reading UTF-16 files ...

Discussion in 'C++' started by nnimod@gmail.com, Jan 13, 2006.

  1. Guest

    Hi. I'm having trouble reading some Unicode files. Basically, I have to
    parse certain files. Some of those files are being input in Japanese,
    Chinese etc. The easiest way, I figured, to distinguish between plain
    ASCII files I receive and the Unicode ones would be to check if the
    first two bytes read 0xFFFE.

    But nothing I do seems to be able to do that.

    I tried reading it in binary mode and reading two characters in:

    FILE *fin; char ch [2];
    fin.open (filename, "rb");
    if (fin) { fopen (ch, sizeof (char), 2, fin); ......

    I tried reading it in binary mode and read a wchar_t in:

    FILE *fin; wchar_t wch;
    fin.open (filename, "rb");
    if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....

    I tried using ifstream for two characters/wifstream for wchar_t but to
    no avail.

    All of them seem to skip the so-called byte-order mark (BOM). I am quite
    lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
    but I don't want to use those. I'm sure there's a perfectly simple
    method to do this.

    Sorry about the long msg for such a simple problem, but it is getting
    quite frustrating.... Any help would be very much appreciated.

    Cheers,
    Nemo.

    PS. I know the mark is there. I viewed the files using a hex editor.
     
    , Jan 13, 2006
    #1
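
For reference, the byte-order mark is the code point U+FEFF. A UTF-16 little-endian file therefore begins with the bytes FF FE, a big-endian one with FE FF, and a UTF-8 file may begin with EF BB BF. The check described in the question can be sketched as below. This is only a minimal sketch, assuming a C++11 compiler; the names `Bom`, `detect_bom` and `detect_bom_in_file` are made up for illustration and do not come from the thread or from any library.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Encodings this sketch tells apart by their byte-order mark (BOM).
// Hypothetical names, chosen for illustration only.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Classify a buffer by its leading bytes. The bytes are unsigned char so
// that comparisons with 0xFF and 0xFE are not affected by a signed char.
// Caveat: a UTF-32LE BOM (FF FE 00 00) would be reported as Utf16LE here.
Bom detect_bom(const unsigned char* p, std::size_t n) {
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return Bom::Utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return Bom::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}

// Open the file in binary mode and classify it by its first bytes.
// A file that cannot be opened, or is too short, yields Bom::None.
Bom detect_bom_in_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    unsigned char head[3] = {0, 0, 0};
    in.read(reinterpret_cast<char*>(head), sizeof head);
    return detect_bom(head, static_cast<std::size_t>(in.gcount()));
}
```

A file without any BOM is not necessarily plain ASCII, so `Bom::None` should be read as "no mark found" rather than as proof of the encoding.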

  2. P.J. Plauger Guest

    <> wrote in message
    news:...

    > Hi. I'm having trouble reading some Unicode files. Basically, I have to
    > parse certain files. Some of those files are being input in Japanese,
    > Chinese etc. The easiest way, I figured, to distinguish between plain
    > ASCII files I receive and the Unicode ones would be to check if the
    > first two bytes read 0xFFFE.
    >
    > But nothing I do seems to be able to do that.
    >
    > I tried reading it in binary mode and reading two characters in:
    >
    > FILE *fin; char ch [2];
    > fin.open (filename, "rb");
    > if (fin) { fopen (ch, sizeof (char), 2, fin); ......
    >
    > I tried reading it in binary mode and read a wchar_t in:
    >
    > FILE *fin; wchar_t wch;
    > fin.open (filename, "rb");
    > if (fin) { fopen (&wch, sizeof (wchar_t), 1, fin); ....
    >
    > I tried using ifstream for two characters/wifstream for wchar_t but to
    > no avail.
    >
    > All of them seem to skip the so-called byte-order mark (BOM). I am quite
    > lost for ideas. I saw a few examples using MFC Class CStdioFile etc.
    > but I don't want to use those. I'm sure there's a perfectly simple
    > method to do this.


    See our CoreX library, at our web site. It has exactly what you need.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
     
    P.J. Plauger, Jan 13, 2006
    #2

  3. Richard Herring

    In message <>,
    writes
    >Hi. I'm having trouble reading some Unicode files. Basically, I have to
    >parse certain files. Some of those files are being input in Japanese,
    >Chinese etc. The easiest way, I figured, to distinguish between plain
    >ASCII files I receive and the Unicode ones would be to check if the
    >first two bytes read 0xFFFE.
    >
    >But nothing I do seems to be able to do that.
    >
    >I tried reading it in binary mode and reading two characters in:
    >
    >FILE *fin; char ch [2];
    >fin.open (filename, "rb");
    >if (fin) { fopen (ch, sizeof (char), 2, fin); ......


    Try posting the *actual* code that causes the problem. The above is
    clearly not it.

    --
    Richard Herring
     
    Richard Herring, Jan 17, 2006
    #3
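
The snippet quoted in the question cannot compile as posted: a `FILE*` has no `open` member, and `fopen` is called where `fread` was presumably meant. Two further pitfalls may explain the symptoms, though the thread does not confirm either. Plain `char` may be signed, so `ch[0] == 0xFF` can be false even when the byte is FF. And reading the bytes FF FE straight into a `wchar_t` on a little-endian machine gives the value 0xFEFF, not 0xFFFE, while `wchar_t` itself may be wider than two bytes. A minimal sketch of the C stdio approach that avoids both, using a hypothetical function name `utf16_byte_order`:

```cpp
#include <cstdio>

// Reads the first two bytes of a file and reports the byte order they imply:
// 'L' for FF FE (little-endian), 'B' for FE FF (big-endian), and 0 if there
// is no UTF-16 BOM or the file cannot be read. Hypothetical helper.
char utf16_byte_order(const char* filename) {
    std::FILE* fin = std::fopen(filename, "rb");   // fopen opens the file
    if (!fin) return 0;
    unsigned char b[2] = {0, 0};                   // unsigned: avoids sign issues
    std::size_t got = std::fread(b, 1, 2, fin);    // fread reads the bytes
    std::fclose(fin);
    if (got != 2) return 0;
    // Assemble the 16-bit code unit from the bytes explicitly instead of
    // reading into a wchar_t, whose size and byte order are platform-specific.
    unsigned int as_le = b[0] | (b[1] << 8);
    unsigned int as_be = (b[0] << 8) | b[1];
    if (as_le == 0xFEFF) return 'L';
    if (as_be == 0xFEFF) return 'B';
    return 0;
}
```

Once the byte order is known, the rest of the file can be decoded the same way, two bytes per code unit, with surrogate pairs handled separately.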
