Decimal code or entity?

Discussion in 'HTML' started by Karl C., May 16, 2007.

  1. Karl C.

    Karl C. Guest

    Hi,

    I'd like to know which is the preferred code to display the umlaut mark: is
    it the decimal code &#235; or the entity &euml;?

    I have one page using iso-8859-1 where I just use the normal character 'ë'
    but the other site uses utf-8 which asks for unicode.

    Thanks everyone.
    Karl
     
    Karl C., May 16, 2007
    #1

  2. Karl C. wrote:

    > I'd like to know which is the preferred code to display the umlaut mark: is
    > it the decimal code &#235; or the entity &euml;?
    >
    > I have one page using iso-8859-1 where I just use the normal character 'ë'
    > but the other site uses utf-8 which asks for unicode.


    If you're using ISO-8859-1 or UTF-8, then you can just type ë straight
    into the file -- no need to reference it in any special way. This is
    because both of those encodings include the ë character.

    You only need to use an entity or character reference such as &euml; or
    &#235; when you're working in an encoding that doesn't include ë. Examples
    of such encodings are US-ASCII and Shift-JIS.

    For example, say you're working on an HTML file in US-ASCII encoding.
    US-ASCII is a fairly old character set with support for only 95
    printable characters. In particular, it doesn't include any characters
    with diacritic marks (a.k.a. "accents"). So because you can't represent ë
    directly in the file, you can use one of HTML's methods of representing
    that character:

    &euml;
    &#235;
    &#xEB;

    That way, the file is still valid US-ASCII, as you've not directly
    included the non-US-ASCII character ë -- you've only included an ampersand
    (&) and a few other characters, all of which are valid US-ASCII characters.
    But an HTML User-Agent, which "mentally converts" all the files it reads
    into Unicode, will know to read the entity as ë.
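
    As a rough illustration of the point above (a sketch in Python, not how
    any particular browser is implemented), the standard library's
    html.unescape() can stand in for the user-agent. The sample word
    "Citroën" is just an assumed example.

```python
import html

# The three ways of writing the character, as they would appear in the HTML source.
references = ["&euml;", "&#235;", "&#xEB;"]

# Each reference consists only of US-ASCII characters, so the file stays valid US-ASCII.
all_ascii = all(ref.isascii() for ref in references)

# html.unescape() resolves named, decimal and hexadecimal references alike.
decoded = [html.unescape(ref) for ref in references]

# Going the other way: anything US-ASCII cannot hold becomes a decimal reference.
escaped = "Citroën".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(all_ascii)  # True
print(decoded)    # ['ë', 'ë', 'ë']
print(escaped)    # Citro&#235;n
```

    All three references come out as the same single character, which is
    why the choice between them is mostly a matter of taste.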

    With regard to which you should use, it doesn't really matter except in
    some exceptional circumstances.

    Circumstance 1: Hexadecimal character references (ones beginning with
    "&#x") tend to have slightly poorer support in some very old browsers, so
    if you need to support those, then stick to the mnemonic entities (&euml;)
    and decimal character references (&#235;).

    Circumstance 2: XML only has five mnemonic character entities -- "&amp;",
    "&gt;", "&lt;", "&quot;" and "&apos;". Others can be defined, but should
    not be relied on as they require the processing agent to read the DTD to
    understand what they are. Many agents do not read the DTD (formally, they
    don't have to), so will not understand the entities. For this reason, it's
    wise to stick to only using numeric character references in XML, except for
    "&amp;", "&gt;", "&lt;" and "&quot;". (I leave out "&apos;" because
    Internet Explorer doesn't support it -- use "&#39;" instead.) As XHTML is
    a variety of XML, this advice applies to XHTML too.
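
    To see the XML behaviour for yourself, here is a small sketch using
    Python's xml.etree.ElementTree, which is one example of an agent that
    doesn't read an external DTD. The helper name parses_as_xml is made up
    for this illustration.

```python
import xml.etree.ElementTree as ET

def parses_as_xml(fragment: str) -> bool:
    """Return True if a non-validating XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# &euml; is only defined in the (X)HTML DTD, which this parser never reads.
named = parses_as_xml("<p>Citro&euml;n</p>")

# A numeric character reference needs no DTD at all.
numeric = parses_as_xml("<p>Citro&#235;n</p>")

# The five entities that XML itself predefines.
predefined = parses_as_xml("<p>&amp; &lt; &gt; &quot; &apos;</p>")

print(named, numeric, predefined)  # False True True
```

    The named entity is rejected as undefined, while the numeric reference
    and the five built-in entities parse without any DTD.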

    --
    Toby A Inkster BSc (Hons) ARCS
    http://tobyinkster.co.uk/
    Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
     
    Toby A Inkster, May 16, 2007
    #2

  3. Andy Dingley

    Andy Dingley Guest

    On 16 May, 09:15, Toby A Inkster <>
    wrote:

    > You only need to use an entity or character reference such as &euml; or
    > &#235; when you're working in an encoding that doesn't include ë.


    Or when one of your cow-orkers is using an editor that works with some
    other encoding.

    Much of my current day is taken up by mucking out the cages of SQL
    Server database developers who've opened one of our UTF-8 documents
    with a Windows editor, silently converted it to UTF-16 and then broken
    every part of our build process.

    If you're not in an environment that's reliably Unicode-clean, or at
    least ë-clean, then you might find that &euml; isn't strictly necessary,
    but it's still desirable to use it as it's less fragile.
     
    Andy Dingley, May 16, 2007
    #3
  4. Karl C.

    Karl C. Guest

    "Toby A Inkster" <> wrote in message
    news:5n.co.uk...
    > Karl C. wrote:
    >
    >> I'd like to know which is the preferred code to display the umlaut mark:
    >> is
    >> it the decimal code &#235; or the entity &euml;?
    >>
    >> I have one page using iso-8859-1 where I just use the normal character
    >> 'ë'
    >> but the other site uses utf-8 which asks for unicode.

    >
    > If you're using ISO-8859-1 or UTF-8, then you can just type ë straight
    > into the file -- no need to reference it in any special way. This is
    > because both of those encodings include the ë character.
    >
    > You only need to use an entity or character reference such as &euml; or
    > &#235; when you're working in an encoding that doesn't include ë. Examples
    > of such encodings are US-ASCII and Shift-JIS.
    >
    > For example, say you're working on an HTML file in US-ASCII encoding.
    > US-ASCII is a fairly old character set with support for only about 100
    > printable characters. In particular, it doesn't include any characters
    > with diacritic marks (a.k.a. "accents") So because you can't represent ë
    > directly in the file, you can use one of HTML's methods of representing
    > that character:
    >
    > &euml;
    > &#235;
    > &#xEB;
    >
    > That way, the file is still valid US-ASCII, as you've not directly
    > included the non-US-ASCII character ë -- you've only included an ampersand
    > (&) and a few other characters, all of which are valid US-ASCII
    > characters.
    > But an HTML User-Agent, which "mentally converts" all the files it reads
    > into Unicode, will know to read the entity as ë.
    >
    > With regard to which you should use, it doesn't really matter except in
    > some exceptional circumstances.
    >
    > Circumstance 1: Hexadecimal character references (ones beginning with
    > "&#x") tend to have slightly poorer support in some very old browsers, so
    > if you need to support those, then stick to the mnemonic entities (&euml;)
    > and decimal character references (&#235;).
    >
    > Circumstance 2: XML only has five mnemonic character entities -- "&amp;",
    > "&gt;", "&lt;", "&quot;" and "&apos;". Others can be defined, but should
    > not be relied on as they require the processing agent to read the DTD to
    > understand what they are. Many agents do not read the DTD (formally, they
    > don't have to), so will not understand the entities. For this reason, it's
    > wise to stick to only using numeric character references in XML, except
    > for
    > "&amp;", "&gt;", "&lt;" and "&quot;". (I leave out "&apos;" because
    > Internet Explorer doesn't support it -- use "&#39;" instead.) As XHTML is
    > a variety of XML, this advice applies to XHTML too.
    >
    > --
    > Toby A Inkster BSc (Hons) ARCS
    > http://tobyinkster.co.uk/
    > Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux


    Thanks for this informative answer. I've now decided to use &euml; to
    represent 'ë'. I can only hope the three major search engines, Google, Yahoo
    and MSN, respond well to it.

    You've mentioned to put in the 'ë' character right away, I tried that, but
    when using UTF-8 I have all the difficulties getting the page validated on
    W3C, it actually stops checking the page because of these characters.
    Without these 'ë' characters the page validates perfectly using XHTML
    Transitional (even XHTML1.1).

    I knew how to fix it, but having several ways to solve the same problem
    confused me, which is why I asked which of these alternatives is the
    preferred one.

    Thanks once again!

    Karl
     
    Karl C., May 17, 2007
    #4
  5. Karl C. wrote:

    > Thanks for this informative answer,


    You didn't need to quote it comprehensively. Just the opposite is true:
    courtesy requires that you only quote the relevant part that you are
    responding to.

    > You've mentioned to put in the 'ë' character right away, I tried
    > that, but when using UTF-8 I have all the difficulties getting the
    > page validated on W3C,


    That's because you didn't put the character there as UTF-8 encoded. Of
    course, if you declare UTF-8 encoding, everything shall be interpreted
    according to it.

    If your authoring tool doesn't really support UTF-8, the question arises
    whether you should use UTF-8 at all (instead of, say, ISO-8859-1).
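
    A quick way to see the mismatch, sketched in Python under the
    assumption that the editor saved the file as ISO-8859-1 while the page
    declares UTF-8 (the helper is_valid_utf8 is only for illustration):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same word as two different editors might write it to disk.
saved_as_latin1 = "Citroën".encode("iso-8859-1")  # ë is the single byte 0xEB
saved_as_utf8 = "Citroën".encode("utf-8")         # ë is the two bytes 0xC3 0xAB

print(saved_as_latin1)                 # b'Citro\xebn'
print(saved_as_utf8)                   # b'Citro\xc3\xabn'
print(is_valid_utf8(saved_as_latin1))  # False
print(is_valid_utf8(saved_as_utf8))    # True
```

    A lone 0xEB byte is not a legal UTF-8 sequence, so a strict consumer
    such as a validator has good reason to give up at that point.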

    > Without these 'ë' characters the page validates
    > perfectly using XHTML Transitional (even XHTML1.1).


    I hope you realize that XHTML as a delivery format of web pages almost never
    gives any real benefits over HTML 4.01, and XHTML 1.1 causes some real
    trouble, for example if you ever plan to use client-side image maps.

    --
    Jukka K. Korpela ("Yucca")
    http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, May 18, 2007
    #5
  6. Karl C. wrote:

    > You've mentioned to put in the 'ë' character right away, I tried that, but
    > when using UTF-8 I have all the difficulties getting the page validated on
    > W3C, it actually stops checking the page because of these characters.


    Check your HTTP headers to see what encoding your server *claims* you are
    using. The http-equiv='Content-Type' META tag also has some relevance, but
    the real HTTP header is key here.
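
    Once you have the Content-Type header value (from your browser's
    tools or any HTTP client), pulling the charset out of it is simple. A
    minimal sketch in Python; the function name charset_from_content_type
    and the sample header values are assumptions for illustration:

```python
from email.message import Message

def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None when missing.
    return msg.get_content_charset()

declared = charset_from_content_type("text/html; charset=ISO-8859-1")
missing = charset_from_content_type("text/html")

print(declared)  # iso-8859-1
print(missing)   # None
```

    If the server says iso-8859-1 while the file is really UTF-8 (or the
    other way round), that disagreement is the thing to fix first.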

    --
    Toby A Inkster BSc (Hons) ARCS
    http://tobyinkster.co.uk/
    Geek of ~ HTML/SQL/Perl/PHP/Python/Apache/Linux
     
    Toby A Inkster, May 18, 2007
    #6