How to write UTF-16 with BOM in little endian (From: Jean-Marc Autexier <jmau2002@web.de>)

Discussion in 'Java' started by Jean-Marc Autexier, Aug 30, 2003.

  1. Hi,

    I must write UTF-16 files with a byte order mark (BOM) in little endian
    (JDK 1.4.2).

    According to java.nio.charset, UTF-16LE and UTF-16BE don't use BOM, UTF-16
    does (as defined in Unicode and XML standards).

    UTF-16 can read big and little endian, but can only write big endian (see
    below). Why?

    regards
    Jean-marc


    From http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html

    - When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order marks;
    - when encoding, they do not write byte-order marks.
    - When decoding, the UTF-16 charset interprets a byte-order mark to indicate
    the byte order of the stream but defaults to big-endian if there is no
    byte-order mark; when encoding, it uses big-endian byte order and writes a
    big-endian byte-order mark.
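
The behaviour quoted above can be checked with a few lines. This is only a sketch: it uses `java.nio.charset.StandardCharsets`, which does not exist in JDK 1.4.2 (on that release `Charset.forName("UTF-16")` would be the equivalent), and the class name `Utf16BomDemo` and helper `hex` are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    /** Formats bytes as space-separated upper-case hex, e.g. "FE FF 00 41". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041

        // "UTF-16" encodes big-endian and writes a big-endian BOM first.
        System.out.println("UTF-16   -> " + hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41

        // The explicit-endian charsets write no BOM at all.
        System.out.println("UTF-16BE -> " + hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println("UTF-16LE -> " + hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```

None of the three standard names produces the wanted byte sequence `FF FE 41 00`, which is the gap the question is about.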
    Jean-Marc Autexier, Aug 30, 2003
    #1

  2. Roedy Green

    On Sat, 30 Aug 2003 13:40:10 +0200, Jean-Marc Autexier
    <> wrote or quoted :

    >- When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order marks;
    >- when encoding, they do not write byte-order marks.
    >- When decoding, the UTF-16 charset interprets a byte-order mark to indicate
    >the byte order of the stream but defaults to big-endian if there is no
    >byte-order mark; when encoding, it uses big-endian byte order and writes a
    >big-endian byte-order mark.



    check out http://mindprod.com/jgloss/encoding.html


    UnicodeLittle and UnicodeLittleUnmarked should give you what you want.
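
A minimal sketch of that suggestion, assuming the runtime recognises the `UnicodeLittle` encoding name (it is a historical `java.io` name, not one of the charsets every Java platform is required to support). The class name `UnicodeLittleWriter` is hypothetical, the in-memory stream stands in for a `FileOutputStream`, and try-with-resources and `UncheckedIOException` need a much newer JDK than 1.4.2.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class UnicodeLittleWriter {
    /** Encodes text through an OutputStreamWriter opened with the "UnicodeLittle" name. */
    public static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // An unknown encoding name raises UnsupportedEncodingException (an IOException).
        try (Writer writer = new OutputStreamWriter(out, "UnicodeLittle")) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode("A")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Passing `"UnicodeLittleUnmarked"` instead should give the same little-endian bytes without the leading mark.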


    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
    Roedy Green, Aug 30, 2003
    #2

  3. Roedy Green wrote:

    > On Sat, 30 Aug 2003 13:40:10 +0200, Jean-Marc Autexier
    > <> wrote or quoted :
    >
    >>- When decoding, the UTF-16BE and UTF-16LE charsets ignore byte-order
    >>marks; - when encoding, they do not write byte-order marks.
    >>- When decoding, the UTF-16 charset interprets a byte-order mark to
    >>indicate the byte order of the stream but defaults to big-endian if there
    >>is no byte-order mark; when encoding, it uses big-endian byte order and
    >>writes a big-endian byte-order mark.

    >
    >
    > check out http://mindprod.com/jgloss/encoding.html
    >
    >
    > UnicodeLittle and UnicodeLittleUnmarked should give you what you want.


    Thanks Roedy, I finally found it out by myself and was just about to post
    the answer.
    This solves my problem, but I'm curious to know why it is not supported in
    java.nio.
    According to http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html,
    UnicodeLittle and UnicodeBig are only supported in java.io.
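
For code that has to stay within the standard java.nio charset names, one possible workaround (not something proposed in this thread) is to prepend U+FEFF to the text and encode with UTF-16LE, which then emits the mark as the bytes FF FE. The sketch below assumes a JDK with `StandardCharsets`; on 1.4.2, `Charset.forName("UTF-16LE")` would take its place. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf16LeBomEncoder {
    /** U+FEFF, which serves as the byte order mark at the start of a stream. */
    private static final char BOM = '\uFEFF';

    /** Encodes text as UTF-16LE with an explicit leading BOM. */
    public static byte[] encodeWithBom(String text) {
        // UTF-16LE writes no BOM of its own, so the mark is added as ordinary text.
        ByteBuffer buffer = StandardCharsets.UTF_16LE.encode(BOM + text);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        for (byte b : encodeWithBom("A")) {
            System.out.printf("%02X ", b & 0xFF); // FF FE 41 00
        }
        System.out.println();
    }
}
```

The mark must be written only once, at the very start of the file; appending to an existing file this way would put a stray U+FEFF in the middle of the text.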

    Jean-Marc
    Jean-Marc Autexier, Aug 30, 2003
    #3
