Hanyu Pinyin tone marks


fulio pen

Chinese is a tonal language. It has four tones, which are represented by four diacritics in pinyin, the language's alphabetic writing system.

These diacritics are:

macron (1st tone), acute (2nd tone), caron (3rd tone), grave (4th tone).

There are two types of code for presenting these diacritic tone marks on a web page. In the first type, a single code represents the combination of a letter with the diacritic on top of it.

The second type separates the letter and the diacritic: the code for the diacritic follows the letter. When displayed on a web page, the diacritic automatically appears on top of the letter.

I am interested in the latter and am looking for the codes for the four diacritics. Thanks for your help.

fulio pen
 

Jukka K. Korpela

These diacritics are:

macron(1st tone), acute(2nd tone), caron(3rd tone), grave(fourth tone).

There are two types of code to present these diacritic tone marks on the web page.
The first type is a piece of code represents the combination of a
letter with the diacritic on its top.
The second type separates the letter and diacritic.

Right. The first type is generally to be preferred in practice, for
reasons explained at
http://www.cs.tut.fi/~jkorpela/html/characters.html#precomp

When displayed on web page, the diacritic would automatically appear on the top of the letter.

It's somewhat complicated, and browsers used to fail in doing that properly.

I am interested in the latter, and looking for the code for the four diacritics.

They are
U+0304 COMBINING MACRON
U+0301 COMBINING ACUTE ACCENT
U+030C COMBINING CARON
U+0300 COMBINING GRAVE ACCENT

To find out the codes for precomposed characters, like U+0101 LATIN
SMALL LETTER A WITH MACRON, you can use e.g. the BabelPad editor,
http://www.babelstone.co.uk/software/babelpad.html
or Alan Wood's resources http://www.alanwood.net/unicode/#links
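
As a rough illustration of the two approaches, here is a minimal Python sketch using the standard unicodedata module. The names TONE_MARKS and with_tone are made up for this example and do not come from the thread. It appends one of the four combining marks listed above to a base letter and, optionally, normalizes the pair to NFC, which gives the precomposed character when Unicode defines one.

```python
import unicodedata

# Combining marks for the four pinyin tones, as listed in the post above.
TONE_MARKS = {
    1: "\u0304",  # COMBINING MACRON
    2: "\u0301",  # COMBINING ACUTE ACCENT
    3: "\u030C",  # COMBINING CARON
    4: "\u0300",  # COMBINING GRAVE ACCENT
}

def with_tone(letter: str, tone: int, precomposed: bool = False) -> str:
    """Return the letter followed by the combining mark for the given tone.

    With precomposed=True the pair is normalized to NFC, which yields a
    single precomposed character when Unicode defines one.
    """
    text = letter + TONE_MARKS[tone]
    return unicodedata.normalize("NFC", text) if precomposed else text

for tone in TONE_MARKS:
    decomposed = with_tone("a", tone)
    composed = with_tone("a", tone, precomposed=True)
    codes = " ".join(f"U+{ord(c):04X}" for c in decomposed)
    # For the letter "a" all four tones have a precomposed form,
    # so composed is a single character here.
    print(codes, "->", f"U+{ord(composed):04X}", unicodedata.name(composed))
```

In HTML the same code points can be written as numeric character references, for example a&#x304; for the decomposed form and &#x101; for the precomposed one.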
 

fulio pen

2012-11-01 17:49, fulio pen wrote:

letter with the diacritic on its top.

Right. The first type is generally to be preferred in practice, for
reasons explained at
http://www.cs.tut.fi/~jkorpela/html/characters.html#precomp

It's somewhat complicated, and browsers used to fail in doing that properly.

They are
U+0304 COMBINING MACRON
U+0301 COMBINING ACUTE ACCENT
U+030C COMBINING CARON
U+0300 COMBINING GRAVE ACCENT

To find out the codes for precomposed characters, like U+0101 LATIN
SMALL LETTER A WITH MACRON, you can use e.g. the BabelPad editor,
http://www.babelstone.co.uk/software/babelpad.html
or Alan Wood's resources http://www.alanwood.net/unicode/#links

Hi, Jukka,

Thanks a lot for your help. The tone marks in the following page are from your posting:

http://www.pinyinology.com/toneMarks/tones/marks2b.html

It is strange that the symbols in the fourth tone are bigger than the others. I ran the code through the validator, and it reported errors, but I see none in the Notepad file. If possible, please help me find out what is wrong in the code.

Thanks again for your expertise.

fulio pen
 

Jukka K. Korpela

The tone marks in the following page are from your posting:

http://www.pinyinology.com/toneMarks/tones/marks2b.html

It is strange that the symbols in the fourth tone are bigger than others.
The code was validated, and there were errors on the validater

The error that matters here is "Unclosed element div", which means that
the <div> element containing the third tone is not closed properly;
instead of </div>, there is <div>. This makes the <div> for the fourth
tone part of the earlier <div>, which in turn means that the rule
div.marks {font-size:150%; } has a cumulative effect.
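
To make the cumulative effect concrete, here is a small Python sketch using the standard html.parser module. The markup strings are hypothetical stand-ins for the page, not its real source, and the names MarksNesting and effective_sizes are made up for this example. It counts how many open div.marks elements enclose each text run and multiplies 150% once per level, since a percentage font-size is relative to the parent's font size.

```python
from html.parser import HTMLParser

class MarksNesting(HTMLParser):
    """Record, for each text run, how many open div.marks elements enclose it."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open <div>: does it carry class "marks"?
        self.runs = []   # (text, number of enclosing div.marks) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.stack.append("marks" in classes)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, sum(self.stack)))

def effective_sizes(html, percent=150):
    """Map each text run to its font size as a percentage of the base size."""
    parser = MarksNesting()
    parser.feed(html)
    parser.close()
    return {text: 100 * (percent / 100) ** depth for text, depth in parser.runs}

# Hypothetical markup: the third-tone div is "closed" with <div>, not </div>.
broken = '<div class="marks">tone 3<div>\n<div class="marks">tone 4</div>'
fixed = '<div class="marks">tone 3</div>\n<div class="marks">tone 4</div>'

print(effective_sizes(broken))
print(effective_sizes(fixed))
```

With the stray <div>, the fourth-tone text sits inside two div.marks elements and ends up at 150% of 150%, that is 225% of the base size. With the proper </div>, both runs stay at 150%.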

The validator's warning "Text run is not in Unicode Normalization Form
C." basically says that the text data contains combinations of letters
and diacritic marks that could and should be written as precomposed
characters. As I wrote, that's generally good advice, but should not be
seen as an absolute rule.

The topic is discussed at
http://www.w3.org/International/questions/qa-html-css-normalization
which is descriptive, not normative. And it's biased and partly even
erroneous: "The Unicode Standard allows either of these alternatives,
but requires that both be treated as identical" is not true. (What the
standard really says, loosely speaking, is that a precomposed character
and its decomposition can normally be expected to look the same and be
treated the same, and you should not expect applications to make a
difference, but applications *may* make a difference. And in reality,
there are differences. Besides, conformance to the Unicode Standard does not
require support for any particular set of characters. For example, a
conforming application may be ignorant of combining marks - as long as
it is not plain wrong about them.)
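
For readers who want to see what the warning is about, here is a minimal Python sketch of an NFC check with the standard unicodedata module. The function name nfc_report is made up for this example, and the check only approximates what the validator presumably does; its actual implementation is not described in this thread.

```python
import unicodedata

def nfc_report(text):
    """Return (is_nfc, nfc_form) for a text run."""
    nfc = unicodedata.normalize("NFC", text)
    return nfc == text, nfc

# "ma" with a combining macron: a precomposed a-with-macron exists,
# so this run is not in NFC.
print(nfc_report("ma\u0304"))

# "m" with a combining macron: Unicode has no precomposed form for it,
# so the decomposed sequence already is the NFC form.
print(nfc_report("m\u0304"))
```

The second case shows why the combining marks remain necessary: a text can be fully in Normalization Form C and still contain them.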
 

Andreas Prilop

The validator's warning "Text run is not in Unicode Normalization
Form C." basically says that the text data contains combinations
of letters and diacritic marks that could and should be written
as precomposed characters.

This applies to Latin letters.
When you write the precomposed Devanagari letters ड़ ढ़ ,
you get the same warning and you are supposed to write
ड़ ढ़ instead.

I regard this as an illogical and unnecessary requirement of HTML5.
It is not the job of HTML5 to prescribe the way of writing characters.
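
The Devanagari case can be checked with Python's standard unicodedata module. This is only a sketch, and the names PRECOMPOSED and nfc_names are made up for this example. The two precomposed letters are excluded from composition, so NFC turns each of them into a base letter plus the nukta sign, the reverse of what happens with the Latin letters.

```python
import unicodedata

# The two precomposed letters from the post above, written as escapes:
# U+095C DEVANAGARI LETTER DDDHA and U+095D DEVANAGARI LETTER RHA.
PRECOMPOSED = ["\u095C", "\u095D"]

def nfc_names(ch):
    """Return the character names of the NFC form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFC", ch)]

for ch in PRECOMPOSED:
    nfc = unicodedata.normalize("NFC", ch)
    codes = " ".join(f"U+{ord(c):04X}" for c in nfc)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {codes}", nfc_names(ch))
```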
 

Jukka K. Korpela

This applies to Latin letters.
When you write the precomposed Devanagari letters ड़ ढ़ ,
you get the same warning and you are supposed to write
ड़ ढ़ instead.

Right, Unicode "normalization" is partly rather abnormal.

I regard this as an illogical and unnecessary requirement of HTML5.
It is not the job of HTML5 to prescribe the way of writing characters.

Unfortunately, HTML5 seems to follow W3C traditions here, reflecting a
simplistic view. This is a category error, so to say, dealing with
character-level issues at a higher protocol level.
 
