Unicode values

tamizhselvys

Hi,

Can anyone explain the difference between Unicode and the hexadecimal
entities used in XML?

Thanks,
Anu.
 
Andy Dingley

Can anyone explain the difference between Unicode and the hexadecimal
entities used in XML?

Try searching for "Jukka Korpela" and Unicode. He has an O'Reilly book
and a very useful website on the topic. Wikipedia is worth reading
too.

"Unicode" defines a "character set". There are also "encodings" that
specify how computers interpret sequences of bytes or numbers to turn
them into characters. There may be many encodings that all specify the
same character in the same character set, which can get complicated.
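To make the character set / encoding split concrete, here is a small Python sketch (my own illustration, not anything from a spec). It takes the single Unicode character "ø" and encodes it three ways; the three encodings are just ones I picked for the example.

```python
# One Unicode character, several byte-level encodings of it.
ch = "\u00f8"            # the character "ø"
code_point = ord(ch)     # its Unicode number, independent of any encoding

# The encoding names below are an arbitrary choice for illustration.
encoded = {name: ch.encode(name) for name in ("latin-1", "utf-8", "utf-16-be")}

for name, raw in encoded.items():
    # Different bytes each time, but decoding always gives the same character.
    print(name, raw.hex(), raw.decode(name) == ch)
```

The bytes differ from one encoding to the next, but the character (and its number) stays the same.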

Character sets before Unicode tended to work for only one language at
a time. This kept them manageably small, but also made them inconvenient
for multi-language work. Unicode takes a different approach: one single,
huge character set for everything.

When you use HTML or XML, there is only _one_ character set that is
ever used: Unicode.

There may be lots of different encodings for an HTML or XML document
(one at a time), but they all lead to Unicode characters. Most
commonly you will specify a character directly (e.g. by typing it),
which also requires you to make sure it's in a suitable encoding for
the document. Alternatively you can use a "numeric character reference"
(often loosely called a numeric entity) to specify the Unicode character
"ø" by its identifying number, either in decimal &#248; or in
hexadecimal &#xF8;. No matter what the document's encoding, these same
numbers refer to these same characters: it's skipping the encoding and
going straight to Unicode. This works equally in XML or HTML.
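A quick way to check this for yourself, sketched in Python with the standard library's ElementTree parser. The element name <c> and the helper text_of are made up for the example, and the three declared encodings are just ones I chose.

```python
import xml.etree.ElementTree as ET

def text_of(xml_bytes):
    """Parse a tiny XML document and return the root element's text."""
    return ET.fromstring(xml_bytes).text

# Decimal reference in a document declared as plain ASCII.
decimal_ref = text_of(b'<?xml version="1.0" encoding="US-ASCII"?><c>&#248;</c>')

# Hexadecimal reference in a document declared as UTF-8.
hex_ref = text_of(b'<?xml version="1.0" encoding="UTF-8"?><c>&#xF8;</c>')

# The character typed directly, which must match the declared encoding.
literal = text_of(
    '<?xml version="1.0" encoding="ISO-8859-1"?><c>\u00f8</c>'.encode("latin-1")
)

print(decimal_ref == hex_ref == literal)
```

All three documents should yield the same one-character string, whatever encoding the document itself declares.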

For a few of these characters, there are also "character entity
references" defined for HTML, such as &oslash; (meaning the same "o
with a slash" character as before). These are a bit more readable than
the raw numbers. However, remember that they're defined by HTML, not by
XML itself, which predefines only &lt;, &gt;, &amp;, &apos; and &quot;.
So you can use them in XHTML (whose DTD declares them), but not in RSS.
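The HTML-only nature of the named form is easy to demonstrate. This is a rough Python sketch using only the standard library; the variable names and the <c> element are mine. It assumes a plain XML parser with no DTD supplied, which is the situation you're in with something like RSS.

```python
import html
import xml.etree.ElementTree as ET

# HTML knows the named reference.
html_result = html.unescape("&oslash;")

# A plain XML parser, given no DTD that declares it, does not.
try:
    ET.fromstring("<c>&oslash;</c>")
    xml_accepts_named = True
except ET.ParseError:
    xml_accepts_named = False

# The numeric form and XML's own predefined names are always available.
xml_numeric = ET.fromstring("<c>&#xF8;</c>").text
xml_predefined = ET.fromstring("<c>&amp;</c>").text

print(html_result, xml_accepts_named, xml_numeric, xml_predefined)
```

So if you need "ø" in generic XML, either type it (in the right encoding) or use the number.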


(I've blurred some definitions here between bytes / octets,
characters / code points and Unicode / UCS / ISO 10646 in an attempt at
brevity, if not clarity. Jukka will probably accuse me of "worthless
babbling" again as a result.)
 
