Juan R.
In
[http://canonicalscience.blogspot.com/2006/04/scientific-language-canonml-is.html]
I presented some generic requirements for a markup language for science
and mathematics. The basic features of CanonML, and its extensions and
improvements over TeX, SGML, XML, or Scheme-based encodings, are listed
below. First, however, let me make an aside. Remember how we also saw
that the mathematics in Distler's blog Musings was being incorrectly
encoded, with simulated tensors, incorrect structural markup,
incorrect numerics, etc.
However, one of the most fascinating samples is Distler's posting
"Designing the 5th Dimension"
[http://golem.ph.utexas.edu/~distler/blog/archives/000635.html]
There Distler is serving the 5D line element (ds)^2 as 2s ds. But 2s
ds is not equal to (ds)^2!!
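One way to read this objection, assuming the served markup ends up
denoting the differential of s^2 (the offending MathML is not quoted
here, so this reading is an assumption): the differential of s^2 is
linear in ds, whereas the squared line element is quadratic in it.

```latex
d(s^2) = 2s\,ds
\qquad\text{whereas}\qquad
(ds)^2 = ds \cdot ds
```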
Of course, this kind of foolish MathML code is neither accessible nor
searchable, etc.
Some folks have noted the terrible paradox of being proud enough to
claim that Musings is (in Distler's own words) "the world's most
technologically-advanced weblog" while being unable to correctly
encode something as simple as the square of a line element.
Other folks go further and carefully note the analogies between Musings
(Distler is a string theorist) and string theory itself. String,
superstring, brane, and M theories are popularized in the mass media as
the world's most sophisticated stuff while being unable to derive
something as simple as the Coulomb force law, despite numerous efforts
over the last 40 years. Or it is worse still: string theory is offering
us clearly wrong output for almost any empirical property of our
universe. String theory deals with exact supersymmetry and perturbative
gravitons on a flat static spacetime, which contradicts the lessons
learned from the Standard Model and General Relativity and, of course,
contradicts all experimental data.
The great failure of string theory is not that the "theory" is in a
permanent anodyne state; the problem, and a big one, is that nobody has
been able to make the theory compatible with our current knowledge.
You do not need to construct giant accelerators to verify some
exotic result such as the existence of hidden dimensions or tiny
one-dimensional vibrating objects. What you need is a theory that is
compatible with all the experimental data now available; once this
basic goal of scientific methodology is achieved, you can focus on
future experiments and exotic possibilities...
After several "great folks" became so enthusiastic about the theoretical
possibilities for the multiverse opened by the "Landscape", a popular
joke started circulating among academics, saying that our universe can
be roughly characterized as the one that string-brane-M theory cannot
predict or explain ;-)
I agree on the parallelism between the technological fiasco of
Distler's Musings blog and the scientific fiasco of string and M
theories. A number of lessons arise.
The first lesson is that you must verify the details of what you are
doing. You cannot assume points (as Distler is doing in both topics,
Itex-MathML and string theory) simply hoping that they may be true.
The second lesson is that you must provide better alternatives to the
available ones. If general relativity works fine explaining and
predicting a subset of phenomena, then the next theory must *also*
explain phenomena that cannot be explained via general relativity: e.g.
microscopic phenomena, the cosmological constant (CC) problem...
String theory cannot compute the anomalous perihelion advance of
Mercury, has not quantized gravity, and has not solved the CC problem
(at best it is discussed using anthropic reasoning). This parallels the
IteX approach used in Distler's blog. For instance, the correctness and
accessibility of almost all the MathML equations being served on the
web are poorer than with the old HTML + GIF + ALT model!
The third lesson to be learned here is that you should not be too proud
of your own work, at least not before you achieve success.
Of course, something as simple as (ds)^2 can be perfectly encoded in
CanonML. The syntax is similar to that of TeX, but more powerful and
sophisticated than Presentation MathML 2.0 and with more semantic
content than Content MathML 2.0. Semantic content will be useful for
encoding scientific information. For example, the MathML parallel
markup for the energy-mass relationship is
<semantics>
  <mrow>
    <mi>E</mi>
    <mo>=</mo>
    <mrow>
      <mi>m</mi>
      <mo>&InvisibleTimes;</mo>
      <msup>
        <mi>c</mi>
        <mn>2</mn>
      </msup>
    </mrow>
  </mrow>
  <annotation-xml encoding="MathML-Content">
    <apply>
      <eq/>
      <ci>E</ci>
      <apply>
        <times/>
        <ci>m</ci>
        <apply>
          <power/>
          <ci>c</ci>
          <cn>2</cn>
        </apply>
      </apply>
    </apply>
  </annotation-xml>
</semantics>
Terrible, and it can be worse still!!!
But using both canonical expressions (a modification of SEXPR) and the
infix formal operator model, we can encode more information with the
ease of TeX or ASCIIMath syntax. The ultra-verbose formula above is
encoded in CanonFormal as
[E \= [m \* [c \** 2] ] ]
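To make the structure of that one-liner concrete, here is a minimal
Python sketch that reads the bracketed expression and turns each infix
group into a prefix tuple, much like the Content MathML tree above. It
is not a CanonML implementation: the grammar is guessed from this single
example, and it assumes every bracketed group has exactly the shape
"operand \operator operand". The names parse and to_prefix are
hypothetical.

```python
import re

# Brackets are single tokens; anything else without whitespace is an atom.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse(text):
    """Parse one bracketed expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "]":
            raise ValueError("unexpected ]")
        if tok != "[":
            return tok  # plain atom such as E, 2 or \=
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ]")
        pos += 1  # consume the closing bracket
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return tree


def to_prefix(node):
    """Turn [lhs, \\op, rhs] groups into (op, lhs, rhs) tuples (assumed shape)."""
    if isinstance(node, str):
        return node
    if len(node) == 3 and isinstance(node[1], str) and node[1].startswith("\\"):
        lhs, op, rhs = node
        return (op[1:], to_prefix(lhs), to_prefix(rhs))
    raise ValueError("this sketch only handles 'operand \\op operand' groups")


print(to_prefix(parse(r"[E \= [m \* [c \** 2] ] ]")))
# prints ('=', 'E', ('*', 'm', ('**', 'c', '2')))
```

The resulting tuple has the same shape as the nested <apply> elements
of the Content MathML annotation, which is the point of the comparison.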
I may add now that I presented some mathematical-formal properties of
the Canonical Meta Language (CanonML) in
[http://canonicalscience.blogspot.com/2006/04/canonml-mathematical-formal-language.html]
Now we will see some of the possibilities of CanonML in the area of
markup languages and why CanonML is better. The comparison with XML is
general; this implies that any mathematical or scientific language based
on XML, such as CML, CellML, physicsML, MathML, et cetera, is already
included in the comparison.
In future postings I will review a journal of mathematics and another of
physics that are using MathML in their articles. We will see what kind
of incorrect code is being served to the Internet under a hype of cool...
Tagging
CanonML presents us with several advantages:
i) Syntax is less verbose than XML.
ii) The datument becomes a data structure composed of canonical
expressions (a modification of SEXPR) and can be manipulated that way.
iii) All technology is unified. For instance, due to limitations of the
XML design, W3C folks were obliged to provide non-XML alternatives in
many recent improvements. XPath is not an XML language; the original
RDF was so complex that it needed alternative syntaxes, and the same
goes for RELAX NG and all that stuff. SVG also introduces a non-XML
language, etc.
However, the CanonML version of XPath is based on CanonML itself. You
can manipulate it as you can manipulate any other canonical expression
structure.
iv) The metadata model is also better than in XML. XML has been harshly
criticized for its inefficient and limited attribute model.
One of the improvements of CanonML over SXML and other SEXPR-inspired
markup languages is that the tag is marked instead of the text:
[::para The CanonML language is [::em more] readable than XML]
Another improvement over SXML is that CanonML includes a special
syntax for empty tags:
[::group [::tag1] [::tag2]]
is equivalent to
[::group \tag1 \tag2]
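The stated equivalence can be checked mechanically. The following
Python sketch parses both spellings into nested lists and rewrites
every \name atom as the explicit empty element [::name]; it assumes
(from the two examples only) that a backslash-prefixed atom is nothing
more than shorthand for an element with no content. The names parse
and normalize are hypothetical.

```python
import re

# Brackets are single tokens; anything else without whitespace is an atom.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse(text):
    """Parse one bracketed expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "]":
            raise ValueError("unexpected ]")
        if tok != "[":
            return tok
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            items.append(read())
        if pos >= len(tokens):
            raise ValueError("missing ]")
        pos += 1  # consume the closing bracket
        return items

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return tree


def normalize(node):
    """Expand the assumed shorthand: \\name becomes the empty element ['::name']."""
    if isinstance(node, str):
        return ["::" + node[1:]] if node.startswith("\\") else node
    return [normalize(child) for child in node]


long_form = parse("[::group [::tag1] [::tag2]]")
short_form = parse(r"[::group \tag1 \tag2]")
print(normalize(long_form) == normalize(short_form))  # prints True
```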
I chose the \ notation for empty tags for readability and because with
this notation mathematical formulae look close to TeX, which may
simplify the adaptation of TeX users to the new syntax.
Compare the following CanonML fragment
[a \over 2]
with (TeX)
{a \over 2}
In a future posting, I will focus on mathematics and science and will
provide a detailed comparison with TeX, MathML, XML-MAIDEN, ASCIIMath,
and others.
Why no closing tag?
- Because of the redundancy and verbosity of XML. Do you know any
mathematician or scientist who writes (2 + 2 + 2 - 4 + 6/6 - 1) when
she or he means just the number 2?
I just write x^2 + bx + c = 0, for instance, because mathematics is a
concise formal system.
- Because parsing is easier.
- Because it simplifies human authoring of datuments.
- Because of cost per MB and server bandwidth requirements.
- Because in dynamic algorithms the tag name may not be known until
runtime and, therefore, closing tag names cannot be enforced.
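As a rough illustration of the verbosity and bandwidth points, this
Python sketch counts the bytes spent on markup for one sentence encoded
both ways. The XML rendering with <para> and <em> elements is an
assumption made for the comparison, as is the para tag name on the
CanonML side; the space that separates a CanonML tag name from its
content is counted as markup.

```python
def markup_overhead(encoded: str, text: str) -> int:
    """Bytes spent on markup: encoded size minus the size of the bare text."""
    return len(encoded.encode("utf-8")) - len(text.encode("utf-8"))


TEXT = "The CanonML language is more readable than XML"
# Assumed XML equivalent, with named closing tags.
XML = "<para>The CanonML language is <em>more</em> readable than XML</para>"
# Bracketed form, closed by a bare ] with no tag name repeated.
CANONML = "[::para The CanonML language is [::em more] readable than XML]"

for name, encoded in (("XML", XML), ("CanonML", CANONML)):
    print(name, markup_overhead(encoded, TEXT), "bytes of markup")
# prints: XML 22 bytes of markup
#         CanonML 16 bytes of markup
```

The saving comes entirely from not repeating tag names in closers, so
it grows with the length of the tag names and the depth of nesting.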
However, the structural purity of CanonML datuments is better than that
of TeX documents and even better than the upcoming XHTML 2.0 (which
improves structure but at the cost of backward incompatibility with
HTML and XHTML 1.0!!).
In CanonML there is no specific open markup like LaTeX's
\part, \subsection, \paragraph, \chapter, \subsubsection,
\subparagraph, and \section. In fact, commands such as \subsection or
\subsubsection are clearly redundant. And what if I need structural
subsections at the fifth level? Is there a \subsubsubsubsection in
LaTeX? CanonML offers a virtually infinite number of structural levels
for your datuments with a single optimised command.
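The idea that one command plus nesting is enough can be sketched in
Python. The tag name ::section below is hypothetical (the actual
CanonML command is not named here), and the datument is written
directly as nested Python lists rather than parsed; the level of each
heading is simply read off the nesting depth.

```python
def heading_levels(node, depth=1):
    """Yield (depth, title) for every nested ['::section', title, ...] list."""
    if isinstance(node, list) and node and node[0] == "::section":
        yield depth, node[1]
        for child in node[2:]:
            yield from heading_levels(child, depth + 1)


# Five levels deep, all expressed with the same (hypothetical) tag.
datument = [
    "::section", "Intro",
    ["::section", "A",
        ["::section", "B",
            ["::section", "C",
                ["::section", "D"]]]],
]

for depth, title in heading_levels(datument):
    print(depth, title)
```

No new command is needed for a sixth or seventh level; the depth is a
property of the structure, not of the tag name.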
However, the structure of the datument is better still because, as was
explained in a previous posting (see above), CanonML is also a
formal-mathematical language arising from a "unification" of
S-expressions, the bracketed Dyck language, and Keizer canonical vectors.
Multi-markup: Non-hierarchical structures
CanonML lets us encode overlapping structures.
Non-hierarchical structures are of great interest in science. For
instance, the Lewis structure for HF is
<H/<F/ e e /H> e e e e e e /F>
and
[H}[F} e e {H] e e e e e e {F]
in GODDAG and liminal respectively. However, both approaches present
well-known difficulties (e.g. disambiguating keys in liminal).
In CanonML the Lewis structure for HF can be encoded as
[::H::F e e] [::F e e e e e e]
where ::H::F is an example of the novel multi-markup concept introduced
by this language.
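A small Python sketch shows what such an encoding buys. It assumes, as
an interpretation of the example only, that a head like ::H::F assigns
the element's content to both H and F at once, and it handles only flat
(non-nested) elements. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Brackets are single tokens; anything else without whitespace is an atom.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse_elements(text):
    """Parse a sequence of flat bracketed elements into lists of atoms."""
    elements, current = [], None
    for tok in TOKEN.findall(text):
        if tok == "[":
            if current is not None:
                raise ValueError("nested elements are not handled in this sketch")
            current = []
        elif tok == "]":
            if current is None:
                raise ValueError("unexpected ]")
            elements.append(current)
            current = None
        else:
            if current is None:
                raise ValueError("atom outside of an element")
            current.append(tok)
    if current is not None:
        raise ValueError("missing ]")
    return elements


def electrons_by_atom(text):
    """Count the 'e' items under every tag named in each element's head."""
    counts = defaultdict(int)
    for head, *content in parse_elements(text):
        for tag in filter(None, head.split("::")):
            counts[tag] += content.count("e")
    return dict(counts)


print(electrons_by_atom("[::H::F e e] [::F e e e e e e]"))
# prints {'H': 2, 'F': 8}
```

Under that reading the shared pair is counted for both atoms, giving
hydrogen its duet and fluorine its octet without any overlapping
open/close markers.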
Metadata and attributes
CanonML introduces a novel metadata model. Advantages over XML:
- A bit less verbose.
- Attribute values can be any expression, not just strings.
- Attribute keys can be any object. XML does not let an attribute key
start with a digit or contain angle brackets.
- The tag name of an element can be any expression.
- Attribute keys are optional. In the popular CSV syntax and in all
major programming languages, field or argument values are typically
given by position, not by keyword.
- Attributes for attributes for attributes for... This lets us
denote attribute types, for instance.
"Namespaces"
Another highly criticized point of the XML world is namespaces. CanonML
avoids the use of namespaces while still solving naming conflicts for
tags. In this way one could download mathematical formulas encoded in
CanonML from a hypothetical international database and introduce them
into a personal datument without worrying about naming conflicts. The
MathML URI is not needed.
A new Generic Markup Language
The possibilities for CanonML are immense. This new sophisticated meta
language can be used as a host language for many different markup
approaches, including Schema, RELAX NG, the useful CSS, etc.
This approach has been listed in the alternatives to XML directory
[http://www.pault.com/xmlalternatives.html]
and presented at the terseXML group as
    Strong (the only?) attempt on "one markup for several xml processing
    specs" design
[http://groups.yahoo.com/group/tersexml/message/103]
Source:
http://canonicalscience.blogspot.com/2006/04/canonml-markup-language-beyond-tex-xml.html