
#### Juan R.

[http://canonicalscience.blogspot.com/2006/04/scientific-language-canonml-is.html]

I presented some generic requirements for a markup language for science
and mathematics. The basic features of CanonML, and its extensions and
improvements over TeX, SGML, XML, or Scheme-based encodings, are listed
below. First, however, let me make a digression. Remember how we also
saw that the mathematics in Distler's blog Musings was being incorrectly
encoded, with simulated tensors, incorrect structural markup, incorrect
numerics, etc.

However, one of the most fascinating samples is Distler's posting
"Designing the 5th Dimension"

[http://golem.ph.utexas.edu/~distler/blog/archives/000635.html]

There Distler is serving the 5D line element (ds)^2 as 2s ds. But 2s ds
is the differential d(s^2); it is not equal to (ds)^2!!

Of course this kind of foolish MathML code is not accessible, not

searchable, etc.

Some folks have noted the terrible paradox of being proud enough to
claim that Musings is (in Distler's own words) "the world's most
technologically-advanced weblog" while being unable to correctly
encode something as simple as the square of a line element.

Other folks go further and carefully note the analogies between Musings
(Distler is a string theorist) and string theory itself. String,
superstring, brane, and M theories are popularized in the mass media as
the world's most sophisticated stuff, while being unable to derive
something as simple as the Coulomb force law despite numerous efforts
over the last 40 years. Worse still, string theory is offering us
clearly wrong output for almost any empirical property of our universe:
string theory deals with exact supersymmetry and perturbative gravitons
on a flat static spacetime, which contradicts the lessons learned from
the Standard Model and General Relativity and, of course, contradicts
all experimental data.

The great failure of string theory is not that the "theory" is in a
permanently anodyne state; the problem, and a big one, is that nobody
has been able to make the theory compatible with our current knowledge.

You do not need to construct giant accelerators to verify some exotic
result such as the existence of hidden dimensions or tiny
one-dimensional vibrating objects. What you need is a theory that is
compatible with all the experimental data now available; once this
basic goal of scientific methodology is achieved, you can focus on
future experiments and exotic possibilities...

After several "great folks" became so enthusiastic about the theoretical
possibilities for the multiverse opened by the "Landscape", a popular
joke started circulating among academics, saying that our universe can
be roughly characterized as the one that string-brane-M theory cannot
predict or explain ;-)

I agree on the parallelism between the technological fiasco of
Distler's Musings blog and the scientific fiasco of string and M
theories. A number of lessons arise.

The first lesson is that you must verify the details of what you are
doing. You cannot simply assume points (as Distler is doing in both
topics, Itex-MathML and string theory) and hope that they turn out to
be true.

The second lesson is that you must provide better alternatives to the
available ones. If general relativity works fine explaining and
predicting a subset of phenomena, then the next theory must *also*
explain phenomena that cannot be explained via general relativity: e.g.
microscopic phenomena, the cosmological constant (CC) problem...

String theory cannot compute the anomalous perihelion precession of
Mercury, has not quantized gravity, and has not solved the CC problem
(at best it is discussed using anthropic reasoning). This parallels the
Itex approach used in Distler's blog. For instance, the correctness and
accessibility of almost all the MathML equations being served on the
web are poorer than with an old HTML + GIF + ALT model!

The third lesson to be learned here is that you should not be too proud
of your own work, at least not before you achieve success.

Of course, something as simple as (ds)^2 can be perfectly encoded in
CanonML. The syntax is similar to that of TeX, but more powerful and
sophisticated than Presentation MathML 2.0 and with more semantic
content than Content MathML 2.0. Semantic content will be useful for
encoding scientific information. For example, the MathML parallel
markup for the energy-mass relationship is

    <semantics>
      <mrow>
        <mi>E</mi>
        <mo>=</mo>
        <mrow>
          <mi>m</mi>
          <mo>&#x2062;</mo>
          <msup>
            <mi>c</mi>
            <mn>2</mn>
          </msup>
        </mrow>
      </mrow>
      <annotation-xml encoding="MathML-Content">
        <apply>
          <eq/>
          <ci>E</ci>
          <apply>
            <times/>
            <ci>m</ci>
            <apply>
              <power/>
              <ci>c</ci>
              <cn>2</cn>
            </apply>
          </apply>
        </apply>
      </annotation-xml>
    </semantics>

Terrible, and it can be worse still!!!

But using both canonical expressions (a modification of SEXPR) and the
infix formal operator model, we can encode more information with the
ease of TeX or ASCIIMath syntax. The ultra-verbose formula above is
encoded in CanonFormal as

[E \= [m \* [c \** 2] ] ]
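As a rough illustration of how little machinery such a bracketed infix notation needs, here is a minimal Python sketch that reads the expression above into a nested structure and prints it in prefix form. The grammar is an assumption inferred from this single example (every bracket holds `left \op right`, and operators start with a backslash); the names `tokenize`, `parse`, and `to_prefix` are hypothetical and do not belong to any CanonML tool.

```python
import re

# Assumed token types: "[", "]", or any run of non-space, non-bracket characters.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")

def tokenize(text):
    return TOKEN.findall(text)

def parse(tokens):
    """Parse one bracketed expression into nested Python lists."""
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            items = []
            while tokens[pos] != "]":
                items.append(read())
            pos += 1  # skip the closing "]"
            return items
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree

def to_prefix(node):
    """Rewrite [left, op, right] as (op left right), recursively."""
    if isinstance(node, str):
        return node
    left, op, right = node  # assumption: every bracket is a binary infix form
    return "(" + " ".join([op.lstrip("\\"), to_prefix(left), to_prefix(right)]) + ")"

tree = parse(tokenize(r"[E \= [m \* [c \** 2] ] ]"))
print(tree)
print(to_prefix(tree))
```

The nested list is the whole "document model" here: once the brackets are read, the formula can be walked, rewritten, or compared like any other list.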

I may add now that I presented some mathematical-formal properties of
the Canonical Meta Language (CanonML) in

[http://canonicalscience.blogspot.com/2006/04/canonml-mathematical-formal-language.html]

Now we will see some of the possibilities of CanonML in the area of
markup languages and why CanonML is better. The comparison with XML is
general; this implies that any mathematical or scientific language based
on XML (such as CML, CellML, physicsML, MathML, et cetera) is already
included in the comparison.

In future postings I will review how a journal of mathematics and
another of physics are using MathML in their articles. We will see what
kind of incorrect code is being served on the Internet beneath all the
hype...

Tagging

CanonML offers several advantages:

i) The syntax is less verbose than XML.

ii) The datument becomes a data structure composed of canonical
expressions (a modification of SEXPR) and can be manipulated as such.

iii) All the technology is unified. For instance, due to limitations of
the XML design, the W3C folks were obliged to provide non-XML
alternatives in many recent improvements. XPath is not an XML language;
the original RDF was so complex that it needed alternative syntaxes, and
the same goes for RELAX NG and all that stuff. SVG also introduces a
non-XML language, etc.

However, the CanonML version of XPath is based on CanonML itself. You
can manipulate it as you can manipulate any other canonical expression
structure.

iv) The metadata model is also better than in XML. XML has been harshly
criticized for its inefficient and limited attribute model.

One of the improvements of CanonML over SXML and other SEXPR inspired

markup languages is that the tag is marked instead of the text:

[:ara The CanonML language is [::em more] readable than XML]

Another improvement over SXML is that CanonML includes a special
syntax for empty tags:

[::group [::tag1] [::tag2]]

is equivalent to

[::group \tag1 \tag2]
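The equivalence can be sketched in a few lines of Python. This is only an illustration under assumptions read off the two fragments above: a `::name` at the head of a bracket is the tag, and a bare `\name` token stands for the empty element `[::name]`. The function name `parse` and the `(tag, children)` tuple layout are hypothetical.

```python
import re

TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")

def parse(text):
    """Return the tree for one bracketed element as (tag, children) tuples."""
    tokens = TOKEN.findall(text)
    pos = 0

    def element():
        nonlocal pos
        pos += 1                      # consume "["
        tag = tokens[pos]
        if not tag.startswith("::"):
            raise ValueError("expected a ::tag after '['")
        pos += 1
        children = []
        while tokens[pos] != "]":
            tok = tokens[pos]
            if tok == "[":
                children.append(element())
            elif tok.startswith("\\"):
                # Assumed rule: \name is shorthand for the empty element [::name].
                children.append((tok[1:], []))
                pos += 1
            else:
                children.append(tok)  # plain text word
                pos += 1
        pos += 1                      # consume "]"
        return (tag[2:], children)

    return element()

long_form = parse("[::group [::tag1] [::tag2]]")
short_form = parse(r"[::group \tag1 \tag2]")
print(long_form)
print(long_form == short_form)
```

Both spellings produce the same tree, so under these assumptions the backslash form is pure syntactic sugar.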

I chose the \ notation for empty tags for readability, and because with
this notation mathematical formulae look close to TeX, which may ease
the adaptation of TeX users to the new syntax.

Compare the following CanonML fragment

[a \over 2]

with (TeX)

{a \over 2}

In a future posting, I will focus on mathematics and science and will
provide a detailed comparison with TeX, MathML, XML-MAIDEN, ASCIIMath,
and others.

Why no closing tags?

- Because of the redundancy and verbosity of XML. Do you know any
mathematician or scientist who writes (2 + 2 + 2 - 4 + 6/6 - 1) when she
or he means just the number 2?

I just write x^2 + bx + c = 0, for instance, because mathematics is a
concise formal system.

- Because parsing is easier.

- Because it simplifies human authoring of datuments.

- Because of cost per MB and server bandwidth requirements.

- Because in dynamic algorithms the tag name may not be known until
runtime and, therefore, closing tag names should not be required.
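The size argument can be made concrete with a small Python sketch. It serializes one and the same tree in an XML-like form and in the bracketed form used in this posting, then compares lengths. The sample tree, the tag names `para` and `em`, and the helper functions are hypothetical and only for illustration; real savings depend on the datument.

```python
def to_xml(node):
    """Serialize (tag, children) with explicit, named closing tags."""
    if isinstance(node, str):
        return node
    tag, children = node
    inner = " ".join(to_xml(c) for c in children)
    return f"<{tag}>{inner}</{tag}>"

def to_brackets(node):
    """Serialize the same tree with one anonymous closing bracket per element."""
    if isinstance(node, str):
        return node
    tag, children = node
    parts = ["::" + tag] + [to_brackets(c) for c in children]
    return "[" + " ".join(parts) + "]"

# Hypothetical sample tree; the tag names are illustrative only.
tree = ("para", ["CanonML is", ("em", ["more"]), "readable than XML"])

xml = to_xml(tree)
brackets = to_brackets(tree)
print(xml, len(xml))
print(brackets, len(brackets))
```

Note that `to_brackets` never has to repeat the tag name when closing an element, which is also why a tag name computed at runtime causes no extra bookkeeping.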

However, the structural purity of CanonML datuments is better than that
of TeX documents and even better than that of the upcoming XHTML 2.0
(which improves structure but at the cost of backward incompatibility
with HTML and XHTML 1.0!!).

In CanonML there is no specific open markup such as LaTeX's \part,
\chapter, \section, \subsection, \subsubsection, \paragraph, and
\subparagraph. In fact, commands such as \subsection or \subsubsection
are clearly redundant. And what if I need structural subsections at the
fifth level? Is there a \subsubsubsubsection in LaTeX?

CanonML offers a virtually unlimited number of structural levels for
your datuments with a single optimised command.

However, the structure of a datument is better still because, as was
explained in a previous posting (see above), CanonML is also a
formal-mathematical language arising from a "unification" of
S-expressions, the bracketed Dyck language, and Keizer canonical vectors.

Multi-markup: Non-hierarchical structures

CanonML lets us encode overlapping structures.

Non-hierarchical structures are of great interest in science. For

instance, the Lewis structure for HF is

<H/<F/ e e /H> e e e e e e /F>

and

[H}[F} e e {H] e e e e e e {F]

in GODDAG and LMNL ("liminal") respectively. However, both approaches
present well-known difficulties (e.g. disambiguating keys in LMNL).

In CanonML the Lewis structure for HF can be encoded as

[::H::F e e] [::F e e e e e e]

where ::H::F is an example of the novel multi-markup concept introduced

by this language.
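To see what the multi-markup fragment encodes, here is a hedged Python sketch that counts the electrons attributed to each atom. It assumes, as an interpretation of the example above rather than a documented rule, that a head such as `::H::F` assigns the bracket's content to both H and F; the function name `electrons_per_atom` is hypothetical.

```python
import re
from collections import Counter

def electrons_per_atom(datument):
    """Count 'e' tokens per atom in a flat sequence of [::A::B e e ...] groups."""
    counts = Counter()
    # Each group is one bracket without nesting, e.g. "[::H::F e e]".
    for group in re.findall(r"\[([^\[\]]*)\]", datument):
        head, *body = group.split()
        # Assumed rule: "::H::F" names every atom that shares the content.
        atoms = [name for name in head.split("::") if name]
        electrons = body.count("e")
        for atom in atoms:
            counts[atom] += electrons
    return dict(counts)

print(electrons_per_atom("[::H::F e e] [::F e e e e e e]"))
```

Under this reading H is credited with 2 electrons and F with 8, which is the duet and octet one expects from the Lewis structure of HF, with no overlapping open and close tags needed.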

Metadata and attributes

CanonML introduces a novel metadata model. Advantages over XML:

- It is a bit less verbose.

- Attribute values can be any expression, not just strings.

- Attribute keys can be any object. XML does not let an attribute key
start with a digit or contain angle brackets.

- The tag name of an element can be any expression.

- Attribute keys are optional. In the popular CSV syntax and in all
major programming languages, field or argument values are given by
position, not by keyword.

- Attributes for attributes for attributes for... This lets us denote
attribute types, for instance.

"Namespaces"

Another highly criticized point of the XML world is namespaces. CanonML
avoids the use of namespaces while still solving the naming conflict for
tags. In this way one could download mathematical formulas encoded in
CanonML from a hypothetical international database and introduce them
into a personal datument without worrying about naming conflicts. The
MathML URI is not needed.

A new Generic Markup Language

The possibilities for CanonML are immense. This new sophisticated meta
language can be used as a hosting language for many different markup
approaches, including Schema, RELAX NG, the useful CSS, etc.

This approach has been listed in the alternatives to XML directory

[http://www.pault.com/xmlalternatives.html]

and presented at the terseXML group as

<blockquote>

Strong (the only?) attempt on "one markup for several xml processing

specs" design

</blockquote>

[http://groups.yahoo.com/group/tersexml/message/103]

Source:

http://canonicalscience.blogspot.com/2006/04/canonml-markup-language-beyond-tex-xml.html