Canonical Science Today, authoring system for science and mathematics (1st part)

Discussion in 'XML' started by Juan R., May 9, 2006.

  1. Juan R.

    Juan R. Guest

    The initial CanonMath program presented here


    was discussed with several specialists, including the father of the
    XML-MAIDEN project (who contributed many interesting ideas beyond the
    original design). The initial CanonMath program (since abandoned) was
    also presented on the w3c mailing list for mathematics. There was
    little discussion of it there, but subsequent discussion of other
    related MathML topics (formal structure of Content MathML markup,
    script models, special entities, Unicode composed diacritics vs
    <mover>, et cetera) benefited the development of this new program.

    The new CanonML approach is now split into three modules:

    CanonText. This is mainly a reformulation of the initial CanonText
    module, which was based on XML in the old approach but is now based on
    canonical expressions (a modification of SEXPR).

    CanonCode. This is a transformation/scripting/programming module. It
    will be presented in upcoming postings.

    CanonFormal. It replaces the old CanonMath, Canongraph, and
    scientific modules, and is now also based on canonical expressions.
    The change of name is justified because the language is used for
    encoding formal systems, either mathematical or scientific-technical
    ones. Its main features are discussed here.

    This new program replaces the old one. Ideas presented at


    are superseded by new ones. For example, the initially chosen <over/>
    tag is no longer used for overscripts and, therefore, we can recover
    the TeX \over command for fractions.

    [Formal structure]

    The basic unit structure of a formal fragment is

    [preargument1 ... \command postargument1 ...]

    This is an infix notation and, contrary to mainstream wisdom, it is the
    most general notation in formal systems. For example, imagine that
    there are no prearguments (i.e. the preargument set is empty); then

    [\command postargument1 ...]

    That is, one recovers prefix notation as a special case. An example is

    [\cos x]

    Now imagine that there are no postarguments; then one obtains postfix
    notation

    [preargument1 ... \command]

    A known case in mathematics is the factorial of a number

    [23 \!]

    The infix notation is preferred because it is easy for humans to write
    and because it contains the other notations as special cases.
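
    The way prefix and postfix fall out of the general infix form can be
    sketched in a few lines of Python. This is only an illustration, not
    part of CanonML: the function names split_fragment and notation are
    hypothetical, and the sketch assumes a flat fragment with exactly one
    \command and whitespace-separated arguments.

```python
def split_fragment(fragment):
    """Split a flat '[pre ... \\command post ...]' fragment into its parts.

    Assumption (not from the original post): the fragment is not nested
    and contains exactly one token that starts with a backslash.
    """
    body = fragment.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("fragment must be enclosed in [ ]")
    tokens = body[1:-1].split()
    positions = [i for i, tok in enumerate(tokens) if tok.startswith("\\")]
    if len(positions) != 1:
        raise ValueError("expected exactly one \\command")
    i = positions[0]
    # Everything before the command is a preargument, everything after
    # it is a postargument.
    return tokens[:i], tokens[i], tokens[i + 1:]


def notation(fragment):
    """Name the notation that a fragment is a special case of."""
    pre, _, post = split_fragment(fragment)
    if pre and post:
        return "infix"
    if post:
        return "prefix"
    if pre:
        return "postfix"
    return "nullary"


for sample in (r"[\cos x]", r"[23 \!]", r"[M \_ b]"):
    print(sample, "is", notation(sample))
```

    With the fragments used above, [\cos x] is classified as prefix,
    [23 \!] as postfix, and [M \_ b] as infix, all by the same rule.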

    Both the infix notation and the formal canonical expressions design
    unify the syntax, making parsing easier for computers and learning
    simpler for users. The erratic TeX-like syntax (so criticized in the
    past by users) is avoided, while the structure of the mathematical
    markup is more solid than that of ISO 12083 or MathML.

    For instance, TeX commands




    can be compared with their CanonFormal correspondences

    [f \^^ ~]

    [M \_ b]

    [\sum \__ 0 \^^ n]

    Note the erratic ordering of arguments in TeX. For instance, the
    ordering in the \stackrel command is overscript-base, but
    base-overscript in the case of \sum. Note also that the TeX command "_"
    introduces a subscript when acting on M but an underscript when applied
    to \sum. If you want a subscript on \sum instead, you may consult your
    TeX manual (a TeX guru will work too ;-) to learn the way. In
    CanonFormal \_ always introduces subscripts whereas \__ introduces
    underscripts.

    Note also that the base is perfectly defined in CanonFormal. Problems
    in parsing the base of scripts and the lack of any structural model for
    tensors or prescripts are two serious weaknesses of TeX and similar
    models (e.g. ASCIIMath). MathML and ISO 12083 improve on this via
    specific markup structures that fix the base. However, both approaches
    present serious difficulties. MathML introduces bases as children of
    script structures, which makes the content model hard to extend and
    complicates the DTD a lot at that point. Moreover, the MathML script
    model causes CSS incompatibilities, forcing an incorrect
    implementation in browser rendering engines.

    All this is solved in CanonFormal; in a future posting we will see a
    detailed comparative review of script models in MathML, ASCIIMath,
    LaTeX, amstex, ISO 12083 and CanonFormal, including illustrative
    samples of code.

    [Computational advantages of postfix]

    Due to their formal structure, CanonFormal expressions can be
    transformed to either prefix or postfix (RPN) notation. In fact, there
    exist standard algorithms for transforming infix notation to RPN, e.g.
    the so-called shunting yard algorithm. In a future posting we will see
    how such an algorithm could be directly implemented in CanonCode, a
    scripting-programming-transformation language based on CanonML itself.
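
    For readers who have not met the shunting yard algorithm, here is a
    minimal Python sketch of it. It is not CanonCode and makes its own
    assumptions: ordinary infix tokens separated by spaces, four
    left-associative binary operators, and a hypothetical function name
    to_rpn.

```python
# Operator precedences; all four operators are treated as left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(tokens):
    """Convert a list of infix tokens to RPN with the shunting yard algorithm."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence before pushing.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            # Unwind the stack back to the matching opening parenthesis.
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()
        else:
            # Operands go straight to the output.
            output.append(tok)
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


print(" ".join(to_rpn("( 3 * 9 ) + 2".split())))  # prints: 3 9 * 2 +
```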

    RPN has the computational advantage of being extremely easy to
    analyze by machine, which increases computer efficiency and speed. It
    also allows better memory management.

    For example the CanonFormal representation for (3 * 9) + 2 can be
    easily transformed to

    [[3 9 \*] 2 \+]

    or even to

    3 9 \* 2 \+.

    This may be useful in computational science. E.g. the authors of
    Biosystems 2003; 72(1-2); 159-76 presented a novel algorithm with RPN
    design for reducing the computational cost.
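
    Why RPN is so easy for a machine can be seen in a short Python sketch
    that flattens the bracketed postfix form of (3 * 9) + 2 and evaluates
    it with a single stack. The names OPS, flatten and evaluate_rpn are
    hypothetical, and the sketch assumes integer operands and that \+ and
    \* are strictly binary (dropping the brackets is only safe when every
    command has a fixed number of arguments).

```python
import operator

# Assumed meaning of the two commands used in the example above.
OPS = {r"\+": operator.add, r"\*": operator.mul}


def flatten(fragment):
    """Drop the brackets of a postfix fragment and return its tokens."""
    return fragment.replace("[", " ").replace("]", " ").split()


def evaluate_rpn(tokens):
    """Evaluate RPN tokens with one stack and one left-to-right pass."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


tokens = flatten(r"[[3 9 \*] 2 \+]")
print(" ".join(tokens))      # prints: 3 9 \* 2 \+
print(evaluate_rpn(tokens))  # prints: 29
```

    No precedence table and no recursion are needed: each token is looked
    at once, and the stack never holds more than the pending operands.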

    [Usage of invisible operators and others]

    Invisible operators and special commands are useful for encoding
    content, for computation and also for improving accessibility of

    For example, what do I mean by h(x+y)? The function h of (x+y) or maybe
    the product of variables h and (x+y)?

    TeX, IteX, ASCIIMath, or ISO 12083, among others, cannot disambiguate
    the expression. In CanonFormal we can write

    [h \function-of [x \+ y]]

    or

    [h \* [x \+ y]].

    A search engine can discriminate between the two expressions, aural
    renderings for people with disabilities are the ones we expect, etc.
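
    As a rough illustration of how a tool could exploit the explicit
    commands, the following Python sketch parses both encodings of h(x+y)
    into nested lists and produces a different spoken text for each. It is
    not an existing CanonML tool: the names parse, speak and SPOKEN and the
    wording of the spoken output are assumptions, and balanced brackets
    are assumed in the input.

```python
def parse(text):
    """Parse a nested bracket fragment into nested Python lists of tokens."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            items = []
            while tokens[pos] != "]":
                items.append(read())
            pos += 1  # skip the closing bracket
            return items
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after fragment")
    return tree


# Assumed spoken forms for the commands that appear in the example.
SPOKEN = {r"\function-of": "of", r"\*": "times", r"\+": "plus"}


def speak(node, top=True):
    """Produce a simple aural rendering of a parsed fragment."""
    if isinstance(node, list):
        inner = " ".join(speak(child, top=False) for child in node)
        # Nested groups are announced so that grouping survives in speech.
        return inner if top else "the quantity " + inner
    return SPOKEN.get(node, node)


print(speak(parse(r"[h \function-of [x \+ y]]")))
print(speak(parse(r"[h \* [x \+ y]]")))
```

    The first call yields "h of the quantity x plus y" and the second
    "h times the quantity x plus y"; a search engine could likewise index
    the command token rather than guess from the parentheses.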

    In theory, presentation MathML 2.0 can also differentiate the two
    expressions via the specific entities &ApplyFunction; and
    &InvisibleTimes;. For instance, the latter CanonFormal code may be
    expressed in MathML 2.0 as


    But I stress the "in theory" because in practice MathML is not
    authored by hand, due to its unusual verbosity, and many popular tools
    (including the tools generating the MathML code used in some academic
    journals such as Living Reviews in Relativity (Hermes) and in
    scientific blogs such as Musings by J. Distler) are *not* using
    invisible operators.

    The semantic correctness and accessibility of the real MathML code
    being served on the Internet is very limited.

    Similar considerations apply to dx as the product of d and x (encoded
    as [d \* x]) versus dx as the differential of x (encoded as
    [\differential x]).

    The situation in MathML documents served on the Internet is still poor.
    The differential is not usually encoded via the special entity and is
    therefore confused with other uses of d; a very basic concept such as
    the "square of the differential of s" is being incorrectly encoded as
    "two times s times ds". This incorrect encoding is being served on the
    Internet in several articles of the HERMES-generated journal Living
    Reviews in Relativity and also in Distler's blog Musings on string
    theory and related stuff. See


    for further details and links to Distler's blog articles. In the
    following HERMES-generated document


    you can find the MathML 2.0 code


    just after section 2.1. It is obvious that the accessibility,
    structure, "semantics", encoding, and rendering of this code are all
    wrong. Moreover, the HERMES-generated code contains redundant <mrow>
    tags. Mozilla developers explicitly recommend avoiding redundant
    <mrow> tags for technical reasons: performance and memory consumption.


    In a future posting we will see that there are more problems with
    redundant mrows than those listed on the Mozilla page.

    Even if you fix the above weird MathML code by hand and introduce the
    entity &DifferentialD;, the code will suffer from additional
    difficulties. The MathML entity for differentials will be visually
    rendered as a double-struck d, instead of the more usual roman d.

    David Carlisle in


    and other MathML WG folks replied that, to get a roman d in the visual
    rendering, the best way in MathML may be to use a simple d in the input
    file, for example <mo>d</mo>.

    But the MathML 2.0 specification itself does not recommend that way!

    Using something like <mo>d</mo> you obtain a not very convincing visual
    rendering and an incorrect aural rendering for people with
    disabilities, and you confuse search engines.

    The second option is to use the MathML recommended code
    <mo>&DifferentialD;</mo>; then search engines can recognize the
    differential and the aural rendering is correct, but one obtains a
    default non-standard visual rendering that cannot be modified, as some
    folks worried on the list.

    All those problems are absent in CanonFormal. Search engines can then
    distinguish a simple d from the \differential command. Aural
    renderings read "differential of" for blind users, and the visual
    rendering of the command is set to a roman d by default but can be
    changed by the user via "stylesheets" when desired.



    Juan R.

    Center for CANONICAL |SCIENCE)
    Juan R., May 9, 2006