Hanyu Pinyin tone marks

Discussion in 'HTML' started by fulio pen, Nov 1, 2012.

  1. fulio pen


    Chinese is a tonal language. There are four tones, and they are represented by four diacritics in pinyin, the language's alphabetic writing system.

    These diacritics are:

    macron (1st tone), acute (2nd tone), caron (3rd tone), grave (4th tone).

    There are two ways to encode these tone marks on a web page. The first uses a single code for the combination of a letter with the diacritic above it.

    The second keeps the letter and the diacritic separate: the code for the diacritic follows the letter, and when the page is displayed, the diacritic should automatically appear on top of the letter.

    I am interested in the latter and am looking for the codes for the four diacritics. Thanks for your help.

    fulio pen
    fulio pen, Nov 1, 2012
    #1

  2. 2012-11-01 17:49, fulio pen wrote:

    > These diacritics are:
    >
    > macron(1st tone), acute(2nd tone), caron(3rd tone), grave(fourth tone).
    >
    > There are two types of code to present these diacritic tone marks on the web page.
    > The first type is a piece of code represents the combination of a letter with the diacritic on its top.
    > The second type separates the letter and diacritic.


    Right. The first type is generally to be preferred in practice, for
    reasons explained at
    http://www.cs.tut.fi/~jkorpela/html/characters.html#precomp

    > When displayed on web page, the diacritic would automatically appear on the top of the letter.


    It's somewhat complicated, and browsers used to fail at doing that properly.

    > I am interested in the latter, and looking for the code for the four diacritics.


    They are
    U+0304 COMBINING MACRON
    U+0301 COMBINING ACUTE ACCENT
    U+030C COMBINING CARON
    U+0300 COMBINING GRAVE ACCENT
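A minimal Python sketch (Python is used only for illustration; the thread itself has no code) that appends each combining mark listed above to a base vowel, produces the matching HTML numeric character reference, and uses the standard `unicodedata` module to find the precomposed (NFC) equivalent. The helper names `add_tone` and `html_reference` are hypothetical:

```python
import html
import unicodedata

# Combining tone marks, keyed by pinyin tone number.
TONE_MARKS = {
    1: "\u0304",  # COMBINING MACRON
    2: "\u0301",  # COMBINING ACUTE ACCENT
    3: "\u030C",  # COMBINING CARON
    4: "\u0300",  # COMBINING GRAVE ACCENT
}


def add_tone(vowel: str, tone: int) -> str:
    """Return the vowel followed by the combining mark for the tone (decomposed form)."""
    return vowel + TONE_MARKS[tone]


def html_reference(vowel: str, tone: int) -> str:
    """Return markup such as 'a&#x304;': the letter, then the mark as a numeric reference."""
    return f"{vowel}&#x{ord(TONE_MARKS[tone]):X};"


for tone in range(1, 5):
    decomposed = add_tone("a", tone)
    ref = html_reference("a", tone)
    # An HTML parser turns the reference back into the same two code points.
    assert html.unescape(ref) == decomposed
    # NFC composition yields the single precomposed character, if one exists.
    precomposed = unicodedata.normalize("NFC", decomposed)
    print(tone, ref, f"U+{ord(precomposed):04X}", unicodedata.name(precomposed))
```

For tone 1 this prints `a&#x304;` together with U+0101 LATIN SMALL LETTER A WITH MACRON, the precomposed alternative.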

    To find out the codes for precomposed characters, like U+0101 LATIN
    SMALL LETTER A WITH MACRON, you can use e.g. the BabelPad editor,
    http://www.babelstone.co.uk/software/babelpad.html
    or Alan Wood's resources http://www.alanwood.net/unicode/#links
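As a scriptable alternative to those tools, Python's standard `unicodedata` module can list the precomposed code points directly. This sketch assumes the usual pinyin vowel set (a, e, i, o, u and ü) and builds each toned vowel by NFC-composing the base letter with one of the four combining marks; `precomposed_table` is a hypothetical helper name:

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
MARKS = ["\u0304", "\u0301", "\u030C", "\u0300"]
VOWELS = "aeiou\u00fc"  # \u00fc is LATIN SMALL LETTER U WITH DIAERESIS


def precomposed_table():
    """Yield (character, code point label, Unicode name) for every toned pinyin vowel."""
    for vowel in VOWELS:
        for mark in MARKS:
            ch = unicodedata.normalize("NFC", vowel + mark)
            # Each toned pinyin vowel composes to a single precomposed code point.
            assert len(ch) == 1, (vowel, mark)
            yield ch, f"U+{ord(ch):04X}", unicodedata.name(ch)


for ch, code, name in precomposed_table():
    print(ch, code, f"&#x{ord(ch):X};", name)

# The reverse direction, from a character name to the character itself.
print(unicodedata.lookup("LATIN SMALL LETTER A WITH MACRON"))
```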

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Jukka K. Korpela, Nov 1, 2012
    #2

  3. fulio pen


    On Thursday, November 1, 2012 1:10:26 PM UTC-4, Jukka K. Korpela wrote:
    > 2012-11-01 17:49, fulio pen wrote:
    > > These diacritics are:
    > >
    > > macron(1st tone), acute(2nd tone), caron(3rd tone), grave(fourth tone).
    > >
    > > There are two types of code to present these diacritic tone marks on the web page.
    > > The first type is a piece of code represents the combination of a letter with the diacritic on its top.
    > > The second type separates the letter and diacritic.
    >
    > Right. The first type is generally to be preferred in practice, for
    > reasons explained at
    > http://www.cs.tut.fi/~jkorpela/html/characters.html#precomp
    >
    > > When displayed on web page, the diacritic would automatically appear on the top of the letter.
    >
    > It's somewhat complicated, and browsers used to fail in doing that properly.
    >
    > > I am interested in the latter, and looking for the code for the four diacritics.
    >
    > They are
    > U+0304 COMBINING MACRON
    > U+0301 COMBINING ACUTE ACCENT
    > U+030C COMBINING CARON
    > U+0300 COMBINING GRAVE ACCENT
    >
    > To find out the codes for precomposed characters, like U+0101 LATIN
    > SMALL LETTER A WITH MACRON, you can use e.g. the BabelPad editor,
    > http://www.babelstone.co.uk/software/babelpad.html
    > or Alan Wood's resources http://www.alanwood.net/unicode/#links
    >
    > --
    > Yucca, http://www.cs.tut.fi/~jkorpela/


    Hi, Jukka,

    Thanks a lot for your help. The tone marks on the following page use the codes from your posting:

    http://www.pinyinology.com/toneMarks/tones/marks2b.html

    It is strange that the symbols for the fourth tone are bigger than the others. I ran the page through the validator and it reported errors, although I could not see any in the file in Notepad. If possible, please help me find out what is wrong with the code.

    Thanks again for your expertise.

    fulio pen
    fulio pen, Nov 2, 2012
    #3
  4. 2012-11-02 3:25, fulio pen wrote:

    > The tone marks in the following page are from your posting:
    >
    > http://www.pinyinology.com/toneMarks/tones/marks2b.html
    >
    > It is strange that the symbols in the fourth tone are bigger than others.
    > The code was validated, and there were errors on the validater


    The error that matters here is "Unclosed element div", which means that
    the <div> element containing the third tone is not closed properly;
    instead of </div>, there is <div>. This makes the <div> for the fourth
    tone part of the earlier <div>, which in turn means that the rule
    div.marks {font-size:150%; } has a cumulative effect.
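To see why one missing </div> enlarges only the fourth tone, here is a small Python sketch built on the standard `html.parser` module. The markup is hypothetical (not the actual page) and assumes one div.marks per tone with the rule div.marks {font-size:150%;}; the checker is a simplification of real browser parsing that just tracks open div elements and compounds the scale once per enclosing div.marks, so the fourth tone ends up at 1.5 × 1.5 = 2.25 times the page text size:

```python
from html.parser import HTMLParser


class DivDepthChecker(HTMLParser):
    """Track open <div> elements and the effective size of each div.marks."""

    def __init__(self, scale: float = 1.5):
        super().__init__()
        self.scale = scale      # assumed: div.marks {font-size:150%}
        self.open_divs = []     # one entry per open <div>: True if class="marks"
        self.mark_sizes = []    # size of each div.marks relative to the page text

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        is_marks = "marks" in classes
        self.open_divs.append(is_marks)
        if is_marks:
            # Percentages compound per enclosing div.marks; unclassed divs just inherit.
            self.mark_sizes.append(self.scale ** sum(self.open_divs))

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            self.open_divs.pop()

    def unclosed(self) -> int:
        return len(self.open_divs)


def check(markup: str) -> DivDepthChecker:
    checker = DivDepthChecker()
    checker.feed(markup)
    checker.close()
    return checker


# Hypothetical markup: the third block ends with <div> instead of </div>.
BUGGY = """
<div class="marks">tone 1: a&#x304;</div>
<div class="marks">tone 2: a&#x301;</div>
<div class="marks">tone 3: a&#x30C;<div>
<div class="marks">tone 4: a&#x300;</div>
"""
FIXED = BUGGY.replace("a&#x30C;<div>", "a&#x30C;</div>")

for label, markup in (("buggy", BUGGY), ("fixed", FIXED)):
    result = check(markup)
    print(label, result.mark_sizes, "unclosed:", result.unclosed())
```

With the buggy markup the fourth size comes out as 2.25 and two div elements remain open; with the fix every tone stays at 1.5.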

    The validator's warning "Text run is not in Unicode Normalization Form
    C." basically says that the text data contains combinations of letters
    and diacritic marks that could and should be written as precomposed
    characters. As I wrote, that's generally good advice, but should not be
    seen as an absolute rule.
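A rough sketch of the kind of check behind that warning, assuming Python 3.9 or later (`unicodedata.is_normalized` exists from 3.8, the `tuple[...]` annotation needs 3.9). The validator's actual implementation is not described in this thread and may differ; `nfc_report` is a hypothetical helper:

```python
import unicodedata


def nfc_report(text: str) -> tuple[bool, str]:
    """Return (already_in_nfc, nfc_version) for a run of text."""
    return unicodedata.is_normalized("NFC", text), unicodedata.normalize("NFC", text)


ok, fixed = nfc_report("ma\u030C")          # m, a, COMBINING CARON
print(ok)                                   # False: this run would trigger the warning
print([f"U+{ord(c):04X}" for c in fixed])   # ['U+006D', 'U+01CE']
```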

    The topic is discussed at
    http://www.w3.org/International/questions/qa-html-css-normalization
    which is descriptive, not normative. And it's biased and partly even
    erroneous: "The Unicode Standard allows either of these alternatives,
    but requires that both be treated as identical" is not true. (What the
    standard really says, loosely speaking, is that a precomposed character
    and its decomposition can normally be expected to look the same and be
    treated the same, and you should not expect applications to make a
    difference, but applications *may* make a difference. And in reality,
    there are differences. Besides, conformance to the Unicode Standard does not
    require support for any particular set of characters. For example, a
    conforming application may be ignorant of combining marks - as long as
    it is not plain wrong about them.)
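The practical differences are easy to demonstrate. In Python, for example (used here only as an illustration of a typical application), canonically equivalent strings remain distinct sequences of code points unless the program normalizes them first; `same_text` is a hypothetical helper:

```python
import unicodedata

precomposed = "\u0101"  # U+0101 LATIN SMALL LETTER A WITH MACRON
decomposed = "a\u0304"  # U+0061 LATIN SMALL LETTER A + U+0304 COMBINING MACRON


def same_text(x: str, y: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


print(precomposed == decomposed)          # False: plain comparison sees different code points
print(len(precomposed), len(decomposed))  # 1 2
print(same_text(precomposed, decomposed)) # True once both sides are normalized
```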

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Jukka K. Korpela, Nov 2, 2012
    #4
  5. Andreas Prilop, Nov 2, 2012
    #5
  6. On Fri, 2 Nov 2012, Jukka K. Korpela wrote:

    > The validator's warning "Text run is not in Unicode Normalization
    > Form C." basically says that the text data contains combinations
    > of letters and diacritic marks that could and should be written
    > as precomposed characters.


    This applies to Latin letters.
    When you write the precomposed Devanagari letters ड़ ढ़ ,
    you get the same warning and you are supposed to write
    ड़ ढ़ instead.
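The Devanagari case can be checked with Python's `unicodedata` module too. Assuming the precomposed letters meant are U+095C and U+095D, both are composition exclusions in Unicode, so NFC replaces each with its base letter followed by U+093C DEVANAGARI SIGN NUKTA instead of keeping the single code point; `describe` is a hypothetical helper:

```python
import unicodedata


def describe(text: str) -> str:
    """Spell out a string as a sequence of Unicode character names."""
    return " + ".join(unicodedata.name(c) for c in text)


for ch in ("\u095C", "\u095D"):  # the two precomposed letters
    nfc = unicodedata.normalize("NFC", ch)
    print(describe(ch), "->", describe(nfc),
          "| already NFC:", unicodedata.is_normalized("NFC", ch))
```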

    I regard this as an illogical and unnecessary requirement of HTML5.
    It is not the job of HTML5 to prescribe the way of writing characters.

    Andreas Prilop, Nov 2, 2012
    #6
  7. 2012-11-02 20:34, Andreas Prilop wrote:

    > On Fri, 2 Nov 2012, Jukka K. Korpela wrote:
    >
    >> The validator's warning "Text run is not in Unicode Normalization
    >> Form C." basically says that the text data contains combinations
    >> of letters and diacritic marks that could and should be written
    >> as precomposed characters.

    >
    > This applies to Latin letters.
    > When you write the precomposed Devanagari letters ड़ ढ़ ,
    > you get the same warning and you are supposed to write
    > ड़ ढ़ instead.


    Right, Unicode "normalization" is partly rather abnormal.

    > I regard this as an illogical and unnecessary requirement of HTML5.
    > It is not the job of HTML5 to prescribe the way of writing characters.


    Unfortunately, HTML5 seems to follow W3C traditions here, reflecting a
    simplistic view. This is a category error, so to speak, dealing with
    character-level issues at a higher protocol level.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
    Jukka K. Korpela, Nov 2, 2012
    #7
