Unicode values

Discussion in 'XML' started by tamizhselvys@india.com, Feb 19, 2008.

  1. Guest


    Can anyone explain to me the difference between Unicode and hexadecimal
    entities used in XML?

    , Feb 19, 2008

  2. On Tue, 19 Feb 2008, wrote:

    > Can anyone explain to me the difference between Unicode and hexadecimal
    > entities used in XML?

    For example, the Devanagari letter 'ka' has the position U+0915
    in Unicode and can be referenced in both HTML and XML as &#2325;
    (decimal) or as &#x915; (hexadecimal).
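
A minimal sketch of the same point, assuming Python and its standard `html` module (neither is mentioned in the post): the decimal reference `&#2325;` and the hexadecimal reference `&#x915;` are two spellings of one number, so both resolve to the same character.

```python
import html

decimal_ref = "&#2325;"  # decimal spelling of U+0915
hex_ref = "&#x915;"      # hexadecimal spelling of U+0915

# html.unescape resolves numeric character references to characters.
ka_from_decimal = html.unescape(decimal_ref)
ka_from_hex = html.unescape(hex_ref)

print(ka_from_decimal == ka_from_hex == "\u0915")          # True
print(f"U+{ord(ka_from_hex):04X} = {ord(ka_from_hex)}")    # U+0915 = 2325
```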

    Solipsists of the world - unite!
    Andreas Prilop, Feb 19, 2008

  3. Andy Dingley Guest

    On 19 Feb, 14:00, wrote:

    > Can anyone explain to me the difference between Unicode and hexadecimal
    > entities used in XML?

    Try searching for "Jukka Korpela" and Unicode. He has an O'Reilly book
    and a very useful website on the topic. Wikipedia is worth reading too.

    "Unicode" defines a "character set". There are also "encodings" that
    specify how computers interpret sequences of bytes or numbers to turn
    them into characters. There may be many encodings that all specify the
    same character in the same character set, which can get complicated.
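
To make the character set / encoding split concrete, here is a small sketch in Python (my choice of language, and "ø" is just an arbitrary example character). One Unicode character is turned into three different byte sequences by three encodings, and each decodes back to the same character.

```python
char = "\u00f8"  # LATIN SMALL LETTER O WITH STROKE, one Unicode character

# Encode the same character with three different encodings.
encoded = {name: char.encode(name) for name in ("latin-1", "utf-8", "utf-16-be")}

for name, data in encoded.items():
    print(f"{name:10} {data.hex()}")
# latin-1    f8
# utf-8      c3b8
# utf-16-be  00f8

# Decoding each byte sequence with its own encoding gives the character back.
round_trips = [data.decode(name) == char for name, data in encoded.items()]
print(all(round_trips))  # True
```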

    Character sets before Unicode tended to work for only one language at
    a time. This made them manageably smaller, but also inconvenient for
    multi-language work. Unicode takes a different approach: one single,
    huge character set for everything.

    When you use HTML or XML, there is only _one_ character set that is
    ever used: Unicode.

    There may be lots of different encodings for an HTML or XML document
    (one at a time), but they all lead to Unicode characters. Most
    commonly you will specify a character directly (e.g. by typing it),
    which also requires you to make sure it's in a suitable encoding for
    the document. Alternatively you can use a "numeric character
    reference" to specify the Unicode character "ø" by its identifying
    number, either in decimal (&#248;) or in hexadecimal (&#xF8;). No
    matter what the document's encoding, these same numbers refer to
    these same characters: it's skipping the encoding and going straight
    to Unicode. This works equally in XML or HTML.
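
A rough sketch of this, assuming Python's standard `xml.etree.ElementTree` parser and a made-up `<w>` element: three tiny documents declare three different encodings, two use numeric references and one contains the character typed directly, and all three parse to the same "ø".

```python
import xml.etree.ElementTree as ET

# <w> is a hypothetical element name used only for this illustration.
docs = [
    # ASCII cannot hold "ø" directly, so a decimal reference is used.
    '<?xml version="1.0" encoding="US-ASCII"?><w>&#248;</w>'.encode("ascii"),
    # Hexadecimal reference in a Latin-1 document.
    '<?xml version="1.0" encoding="ISO-8859-1"?><w>&#xF8;</w>'.encode("latin-1"),
    # The character typed directly, stored as UTF-8 bytes.
    '<?xml version="1.0" encoding="UTF-8"?><w>\u00f8</w>'.encode("utf-8"),
]

texts = [ET.fromstring(doc).text for doc in docs]
print(texts)  # ['ø', 'ø', 'ø']
```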

    For a few of these characters, there are also "character entity
    references" defined for HTML, such as &oslash; (meaning the same "o
    with a slash" character as before). These are a bit more readable than
    the raw numbers. However, remember that they're part of HTML only, not
    XML! So you can use them in XHTML, but not in RSS.
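
As an illustration only (Python's standard `html` and `xml.etree.ElementTree` modules are my assumption, and `<w>` is a made-up element): an HTML-aware function knows the named reference `&oslash;`, while a plain XML parser with no DTD rejects it but still accepts the numeric form.

```python
import html
import xml.etree.ElementTree as ET

# HTML defines the named reference, so it resolves to "ø".
named = html.unescape("&oslash;")
print(named == "\u00f8")  # True

# A bare XML document declares no such entity, so parsing fails.
xml_error = None
try:
    ET.fromstring("<w>&oslash;</w>")
except ET.ParseError as exc:
    xml_error = exc
    print("XML parser rejected &oslash;:", exc)

# The numeric reference needs no declaration and works in XML.
numeric = ET.fromstring("<w>&#xF8;</w>").text
print(numeric == "\u00f8")  # True
```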

    (I've confused some definitions here between bytes / octets,
    characters / code points and Unicode / UCS / ISO 10646 in an attempt at
    brevity, if not clarity. Jukka will probably accuse me of "worthless
    babbling" again as a result.)
    Andy Dingley, Feb 19, 2008

