Detecting Unicode encoding

Discussion in 'Java' started by Aryeh M. Friedman, Apr 28, 2005.

  1. If I have an arbitrary character array (primitive type, not Character), is
    it possible to detect the encoding used for any given character in the
    array? Specifically, I am writing a parser that accepts arbitrary
    strings of Unicode, and it needs to know what character set to parse
    against (decorator pattern and reflection). Please note I am brand
    new to Unicode, so if I asked the wrong way or whatever, forgive me.
     
    Aryeh M. Friedman, Apr 28, 2005
    #1

  2. Mickey Segal

    I am having trouble figuring out what your data looks like. Is it straight
    Unicode with two bytes for every character? Or is it a variable-length
    Unicode encoding such as UTF-8? Or is it a restricted one-byte encoding
    where you are trying to guess the encoding?

    "Aryeh M. Friedman" <> wrote in message
    news:rB1ce.17917$...
    > If I have an arbitrary character array (primitive type, not Character), is it
    > possible to detect the encoding used for any given character in the array?
    > Specifically, I am writing a parser that accepts arbitrary strings of
    > Unicode, and it needs to know what character set to parse against
    > (decorator pattern and reflection). Please note I am brand
    > new to Unicode, so if I asked the wrong way or whatever, forgive me.
     
    Mickey Segal, Apr 28, 2005
    #2

  3. HK

    Aryeh M. Friedman wrote:
    > If I have an arbitrary character array (primitive type, not Character),
    > is it possible to detect the encoding used for any given character in
    > the array?


    If you are talking about a char[], there is no encoding
    involved. It contains just characters (in Java, each
    char is a 16-bit UTF-16 code unit).
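
    To illustrate the point, here is a minimal sketch (the class and
    method names are made up for this example). The same char[] is
    turned into bytes with three different charsets, and only the byte
    form changes; the characters themselves stay the same.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharVsBytes {
    // Hypothetical helper: number of bytes the text occupies in a given encoding.
    public static int encodedLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Five characters; the third one is U+00EF (i with diaeresis).
        char[] chars = {'n', 'a', '\u00EF', 'v', 'e'};
        String text = new String(chars);

        System.out.println(chars.length);                                      // 5
        System.out.println(encodedLength(text, StandardCharsets.ISO_8859_1));  // 5
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));       // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));    // 10
    }
}
```

    The encoding only comes into play at the moment characters are
    written to, or read from, bytes.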

    If you are talking about a byte[] or an InputStream,
    then character encoding is indeed involved. But you
    cannot tell from a single byte which encoding was
    used to encode the characters. By looking at several
    bytes you may be able to form a hunch about which
    encoding is involved, but only if you know beforehand
    that only a limited, known set of encodings is
    possible. The reason is that, in principle, I can
    define my own encoding with completely obtuse
    mappings between bytes and characters.
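
    One way to form such a hunch, sketched under the assumption that
    the candidate encodings are known in advance: try a strict decode
    with each candidate and keep those that do not fail. The class name
    CandidateCharsets and the method plausible are invented for this
    example. The result is only a list of encodings that are not ruled
    out, not proof of the one actually used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CandidateCharsets {
    // Hypothetical helper: returns the candidates under which the bytes decode cleanly.
    public static List<Charset> plausible(byte[] data, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // These bytes are not valid in this encoding; rule it out.
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        List<Charset> candidates =
            Arrays.asList(StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        byte[] latin1 = "na\u00EFve".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "na\u00EFve".getBytes(StandardCharsets.UTF_8);

        System.out.println(plausible(latin1, candidates)); // [ISO-8859-1]
        System.out.println(plausible(utf8, candidates));   // [UTF-8, ISO-8859-1]
    }
}
```

    The second result shows why this stays a hunch: every byte
    sequence is valid ISO-8859-1, so that candidate can never be
    excluded by decoding alone.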

    Harald.
     
    HK, Apr 28, 2005
    #3