Implementing MathML in Mathematica
Jason Harris
Wolfram Research
Abstract
This paper outlines the state of MathML[1] in Mathematica[5]. It details the changes in Mathematica's handling of MathML compared with previous versions[2]. These include support for content MathML and translations that are, as nearly as possible, lossless. It also describes several novel aspects of MathML and XML handling that are now possible in Mathematica, namely the ability to perform integrated computations with MathML and to add specific conversions between Mathematica and MathML. The paper then discusses some aspects of MathML that were problematic to implement. Finally, we conclude with our plans for the future and some of the more speculative things we are working on.
Mathematica's XML functionality
The latest version of Mathematica can handle XML documents flexibly and at a fundamental level. There are extremely close parallels between the ways Mathematica and XML represent data structures: XML documents are essentially tree structures, and Mathematica is, among other things, an extremely efficient language for processing tree structures. Mathematica therefore lets us process XML structures with ease, since the full power of its pattern-matching language can be applied to symbolic XML structures. For example, the functionality of XSLT and XPath can easily be replicated in Mathematica.
Mathematica's processing capabilities for MathML are built on top
of its XML handling.
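To make the parallel with XPath concrete, here is a minimal Python sketch, not Mathematica's implementation. It assumes a hypothetical triple representation (tag, attributes, children) modelled on Mathematica's XMLElement, and a recursive search function, here called cases (a hypothetical name echoing Mathematica's Cases), that collects every matching subtree at any depth, much as the XPath query //mn does.

```python
from typing import Callable, List, NamedTuple, Union


class XMLElement(NamedTuple):
    """Python stand-in for Mathematica's XMLElement[tag, attributes, children]."""
    tag: str
    attrs: dict
    children: list


Node = Union[XMLElement, str]


def cases(node: Node, matches: Callable[[Node], bool]) -> List[Node]:
    """Return every subtree (at any depth) for which `matches` is true.

    Results come back in document order, like an XPath '//' query.
    """
    found = [node] if matches(node) else []
    if isinstance(node, XMLElement):
        for child in node.children:
            found.extend(cases(child, matches))
    return found


def has_tag(name: str) -> Callable[[Node], bool]:
    """Build a pattern that matches elements with the given tag name."""
    return lambda n: isinstance(n, XMLElement) and n.tag == name


# The tree for x^2 + 2 in presentation MathML.
doc = XMLElement("math", {"xmlns": "http://www.w3.org/1998/Math/MathML"}, [
    XMLElement("mrow", {}, [
        XMLElement("msup", {}, [XMLElement("mi", {}, ["x"]),
                                XMLElement("mn", {}, ["2"])]),
        XMLElement("mo", {}, ["+"]),
        XMLElement("mn", {}, ["2"]),
    ]),
])

print([n.children[0] for n in cases(doc, has_tag("mn"))])  # ['2', '2']
```

Because the pattern is an ordinary function, any structural test (attribute values, parent shape, text content) can replace has_tag, which is the flexibility that pattern matching offers over fixed query syntax.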
Symbolic XML
XML structures are represented in Mathematica as SymbolicXML. SymbolicXML structures are Mathematica expressions that isomorphically represent XML structures. Thus, just as MathML is a "flavour" of XML, SymbolicMathML is a flavour of SymbolicXML. Here is an example that generates a SymbolicMathML expression:
In[1]:=
XML`MathML`ExpressionToSymbolicMathML[x^2 + 2, "OutputForms" -> "Presentation"]
Out[1]=
XMLElement[math, {xmlns -> http://www.w3.org/1998/Math/MathML}, {XMLElement[mrow, {}, {XMLElement[msup, {}, {XMLElement[mi, {}, {x}], XMLElement[mn, {}, {2}]}], XMLElement[mo, {}, {+}], XMLElement[mn, {}, {2}]}]}]
This SymbolicXML structure is isomorphic to the following
XML text.
In[2]:=
ExportString[%, "XML"]
Out[2]=

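The isomorphism between SymbolicXML and XML text can be illustrated outside Mathematica. The following Python sketch, an analogue under stated assumptions rather than the actual implementation, converts ElementTree nodes into hypothetical XMLElement triples and back, interleaving text and child elements in document order so that the round trip reproduces the original markup. One simplification: ElementTree folds namespaces into tag names, so the sketch uses un-namespaced markup, whereas Mathematica keeps xmlns as an ordinary attribute, as in Out[1] above.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class XMLElement(NamedTuple):
    """Python stand-in for Mathematica's XMLElement[tag, attributes, children]."""
    tag: str
    attrs: dict
    children: list


def to_symbolic(el: ET.Element) -> XMLElement:
    """ElementTree node -> (tag, attrs, children), keeping text and tails in order."""
    children: list = [el.text] if el.text else []
    for sub in el:
        children.append(to_symbolic(sub))
        if sub.tail:
            children.append(sub.tail)
    return XMLElement(el.tag, dict(el.attrib), children)


def to_etree(node: XMLElement) -> ET.Element:
    """(tag, attrs, children) -> ElementTree node; the inverse of to_symbolic."""
    el = ET.Element(node.tag, dict(node.attrs))
    last = None  # most recently appended child element
    for child in node.children:
        if isinstance(child, str):
            if last is None:
                el.text = (el.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = to_etree(child)
            el.append(last)
    return el


src = "<math><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>2</mn></mrow></math>"
symbolic = to_symbolic(ET.fromstring(src))
print(symbolic.children[0].tag)                                    # mrow
print(ET.tostring(to_etree(symbolic), encoding="unicode") == src)  # True
```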
The conversion process
The following diagram shows how expressions and box structures are converted to and from MathML.
[Figure: the conversion process between expressions, box structures, and MathML]
Here is an example where MathML is parsed into an
expression.
In[3]:=
XML`MathML`MathMLToExpression["<math> <mrow> <mi>x</mi> <mo>+</mo> <mi>y</mi> </mrow> </math>"]
Out[3]=

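As a rough illustration of the parsing step, the Python sketch below interprets a very small subset of presentation MathML (identifiers, numbers, and rows joined by +) as a nested (head, arguments...) tuple. The function name interpret and the tuple encoding are hypothetical; Mathematica's actual interpretation rules are far more extensive.

```python
import xml.etree.ElementTree as ET


def interpret(el: ET.Element):
    """Map a tiny subset of presentation MathML to a (head, args...) expression."""
    if el.tag == "math":
        return interpret(el[0])  # assume a single top-level child
    if el.tag == "mi":
        return ("Symbol", el.text.strip())
    if el.tag == "mn":
        return ("Integer", int(el.text))
    if el.tag == "mrow":
        items = list(el)
        operands = [interpret(e) for e in items[0::2]]  # even positions
        operators = items[1::2]                          # odd positions
        if not operators:
            return operands[0]
        if all(op.tag == "mo" and op.text.strip() == "+" for op in operators):
            return ("Plus", *operands)
        raise ValueError("only '+' rows are handled in this sketch")
    raise ValueError(f"unsupported element <{el.tag}>")


markup = "<math> <mrow> <mi>x</mi> <mo>+</mo> <mi>y</mi> </mrow> </math>"
print(interpret(ET.fromstring(markup)))  # ('Plus', ('Symbol', 'x'), ('Symbol', 'y'))
```

Whitespace between tags is ignored because ElementTree exposes it only as text and tail strings, never as child elements.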
Complete XML documents in Mathematica
Since version 3.0, Mathematica has included a sophisticated front end with a notebook interface. This interface allows true WYSIWYG editing of two-dimensional typeset structures for both input and output, and these mathematical notations are fully customizable. Style sheets are also an integral part of the notebook interface; they allow uniform and consistent formatting throughout a notebook document. Graphics can be included directly in notebooks. In short, Mathematica can act as a complete system for preparing mathematical documents.
Since XML is accommodated at a fundamental level, we can provide XML forms other than MathML. One such form is NotebookML, which is an isomorphic representation of a notebook expression. Here is an example of a call that would generate a NotebookML file representing the notebook nb.
In[4]:=
Export["example.xml", nb, BoxFormats -> {"MathML"}, GraphicsFormats -> {"SVG"}]
Several options control the form of the NotebookML output. In this example we have specified that cells containing typeset mathematics should be recorded as MathML markup, and that any graphics should be stored in SVG[3], the emerging XML standard for vector graphics. Since NotebookML is itself a flavour of XML, the result is a completely XML-based document, so the entire document can be processed with XML tools, e.g. CSS style sheets or XSLT. One consequence is that notebooks can now be archived in a purely XML format, which is important for companies with large numbers of documents that they wish to handle in a uniform way.
One other XML form deserves mention: ExpressionML, which allows an arbitrary Mathematica expression to be represented isomorphically in XML. When necessary, ExpressionML fragments are included in NotebookML documents.
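The idea of isomorphically encoding an arbitrary expression can be sketched in Python. Here an expression is a string (a symbol), an integer, or a tuple (head, arg1, arg2, ...), and every node becomes one XML element. The element names Function, Symbol, and Number are illustrative assumptions and have not been checked against the actual ExpressionML vocabulary.

```python
import xml.etree.ElementTree as ET

# Expressions: str = symbol, int = integer, tuple = (head, arg1, arg2, ...).


def to_xml(expr) -> ET.Element:
    """Encode an expression tree as XML, one element per node."""
    if isinstance(expr, tuple):
        el = ET.Element("Function")
        for part in expr:  # head first, then the arguments
            el.append(to_xml(part))
        return el
    el = ET.Element("Number" if isinstance(expr, int) else "Symbol")
    el.text = str(expr)
    return el


def from_xml(el: ET.Element):
    """Decode XML produced by to_xml back into the same expression tree."""
    if el.tag == "Function":
        return tuple(from_xml(child) for child in el)
    if el.tag == "Number":
        return int(el.text)
    if el.tag == "Symbol":
        return el.text
    raise ValueError(f"unexpected element <{el.tag}>")


expr = ("Plus", ("Power", "x", 2), 2)  # x^2 + 2 in a FullForm-like notation
xml_text = ET.tostring(to_xml(expr), encoding="unicode")
print(xml_text)
print(from_xml(ET.fromstring(xml_text)) == expr)  # True
```

Because the head is encoded like any other part, compound heads (as in Derivative[1][f]) need no special case.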
Faithful conversions
One of the important goals of our revamped MathML implementation is to be as "lossless" as possible. In particular, we preserve the underlying MathML, including any mixture of presentation and content markup and its semantic annotations: the processing retains the same annotations that were present in the original document. The MathML specification calls this being content-faithful. As an illustration, below is an example from the MathML2 specification that combines presentation markup with content semantics. Its MathML has several subparts, each consisting of a <semantics> element that wraps a presentation subpart and a content subpart. When this is imported into Mathematica, it is displayed as prescribed by the presentation and interpreted as prescribed by the content.
In[5]:=
![n ≡ 1^Overscript[n, -]/1^Overscript[n - 1, -]](HTMLFiles/MathML2002_Harris_9.gif)
Out[5]=

Moreover, if one imports this MathML and then immediately exports the resulting expression as MathML, one obtains the original MathML. In this way we have not made either content or presentation take precedence; instead, we have losslessly maintained the original structure.
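A minimal Python sketch of content-faithful handling, under the simplifying assumption that a <semantics> element holds one presentation child and a single annotation-xml child with encoding="MathML-Content". Both halves are kept on import, so export can rebuild the original element. Real <semantics> elements may carry several annotations, and a faithful implementation would retain them all; the class and function names here are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Semantics:
    """Both halves of a <semantics> element, so neither is discarded on import."""
    presentation: ET.Element  # governs how the formula is displayed
    content: ET.Element       # governs how the formula is interpreted


def import_semantics(el: ET.Element) -> Semantics:
    if el.tag != "semantics":
        raise ValueError("expected a <semantics> element")
    presentation, *annotations = list(el)
    for ann in annotations:
        if ann.tag == "annotation-xml" and ann.get("encoding") == "MathML-Content":
            return Semantics(presentation, ann[0])
    raise ValueError("no MathML-Content annotation found")


def export_semantics(s: Semantics) -> str:
    """Rebuild the <semantics> wrapper from the two retained halves."""
    sem = ET.Element("semantics")
    sem.append(s.presentation)
    ann = ET.SubElement(sem, "annotation-xml", {"encoding": "MathML-Content"})
    ann.append(s.content)
    return ET.tostring(sem, encoding="unicode")


src = ('<semantics><mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>'
       '<annotation-xml encoding="MathML-Content">'
       '<apply><plus /><ci>x</ci><cn>1</cn></apply>'
       '</annotation-xml></semantics>')
s = import_semantics(ET.fromstring(src))
print(export_semantics(s) == src)  # True
```

The empty element is written <plus /> in the input only because that is how ElementTree serializes empty elements; with <plus/> the two strings would differ in that space alone.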
Unfortunately, several things prevent complete losslessness in practice. In general, they arise because there are multiple valid ways to interpret or convert something, and we must pick one. For instance, when exporting an integer as content MathML, should we include the attribute type="integer"? It is legal either way. If we choose to include this attribute, then importing markup that lacks it and immediately exporting it again will add the attribute. (Conversely, if we choose not to include it, we will unintentionally strip the attribute when it was originally present.)
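The type="integer" dilemma can be reproduced in a few lines of Python. The sketch assumes a hypothetical importer that maps <cn> to a plain integer and an exporter that follows one fixed policy; whichever policy is chosen, one of the two inputs fails to round-trip unchanged.

```python
import xml.etree.ElementTree as ET


def import_cn(xml_text: str) -> int:
    """Import <cn> as a plain integer; any type attribute is not recorded."""
    return int(ET.fromstring(xml_text).text)


def export_cn(n: int, include_type: bool) -> str:
    """Export an integer as <cn>, following one fixed policy for type="integer"."""
    el = ET.Element("cn", {"type": "integer"} if include_type else {})
    el.text = str(n)
    return ET.tostring(el, encoding="unicode")


def round_trip(xml_text: str, include_type: bool) -> str:
    return export_cn(import_cn(xml_text), include_type)


# Each policy is faithful for one input and lossy for the other.
print(round_trip("<cn>5</cn>", include_type=True))                  # <cn type="integer">5</cn>
print(round_trip('<cn type="integer">5</cn>', include_type=False))  # <cn>5</cn>
```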
It should be pointed out that we could preserve everything if we translated MathML into non-standard, purpose-built Mathematica structures, but this would of course defeat the purpose of converting between standard Mathematica and standard MathML.
One problem of this kind is the preservation of character entities. Our processing rewrites such characters either as their Unicode numeric code points or as their named entities, but it does not record which form the original used. (Some XML editors preserve this distinction, but most XML processing tools perform the same normalization.) Another facet of our conversion process that follows the same general theme concerns invisible times and invisible function application, which are required in MathML but not in Mathematica. We must therefore insert these operators when exporting Mathematica to MathML and strip them when importing MathML into Mathematica. Regrettably, if these operators are already present in the Mathematica input before it is exported, they are stripped on a round trip. This in turn influences the typesetting rules, which must be fairly strict, since in practice all sorts of unexpected results arise if they are not.
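Both effects can be sketched in Python. The first part shows that an XML parser (here Python's ElementTree) maps decimal and hexadecimal character references to the same character, so their original spelling cannot be recovered; named entities such as &alpha; are left out because ElementTree rejects them without a DTD. The second part models rows as plain lists of token strings, a hypothetical simplification, to show invisible times (U+2062) being inserted on export and stripped on import.

```python
import xml.etree.ElementTree as ET

# 1. Character references: decimal and hexadecimal forms parse to the same
#    character, so the original spelling is gone after parsing.
dec = ET.fromstring("<mi>&#945;</mi>").text
hexa = ET.fromstring("<mi>&#x3B1;</mi>").text
print(dec == hexa == "\u03b1")  # True

# 2. Invisible times (U+2062): inserted on export, stripped on import.
INVISIBLE_TIMES = "\u2062"


def export_product(factors):
    """A product of factors becomes a row with explicit invisible operators."""
    row = []
    for i, factor in enumerate(factors):
        if i:
            row.append(INVISIBLE_TIMES)
        row.append(factor)
    return row


def import_row(row):
    """On import the invisible operators carry no information and are dropped."""
    return [token for token in row if token != INVISIBLE_TIMES]


print(export_product(["2", "x"]) == ["2", INVISIBLE_TIMES, "x"])  # True
# A row that already contained the operator does not survive the round trip.
typed = ["2", INVISIBLE_TIMES, "x"]
print(import_row(typed))  # ['2', 'x']
```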
Another factor that prevents complete losslessness is that some Mathematica features have no real analogue in MathML, for example, CounterValueBox or FormBox. Conversely, MathML has certain features that Mathematica does not yet have, for example, spanning rows and columns, and units in attributes. Finally, there are some structural differences between Mathematica and MathML.
Peculiarities and typesetting
We have defined typesetting for several "objects" that do not normally occur in Mathematica. One such object is the piecewise function, which in Mathematica is commonly represented with Bool expressions. All concepts that do not normally occur in Mathematica are placed in the XML`MathML`Symbols` context. Here is the Mathematica expression corresponding to a MathML piecewise function, together with its typesetting.
In[6]:=
XML`MathML`Symbols`Piecewise[XML`MathML`Symbols`Piece[r, 0 < y < 1], XML`MathML`Symbols`Piece[t, 1 < y < 2], XML`MathML`Symbols`Otherwise[2]] // TraditionalForm
Out[6]//TraditionalForm=

The typeset form is fully editable and interpretable. For example, if we change the y to a z, evaluate the resulting typeset expression, and display the result in StandardForm, we obtain the underlying internal form of the MathML object.
In[11]:=

Out[11]//StandardForm=
XML`MathML`Symbols`Piecewise[XML`MathML`Symbols`Piece[r, 0 < z < 1], XML`MathML`Symbols`Piece[t, 1 < z < 2], XML`MathML`Symbols`Otherwise[2]]
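For comparison, the content MathML 2 markup for the same piecewise function can be generated with a short Python sketch. Each (value, condition) pair becomes a <piece>, and the final value becomes <otherwise>, mirroring the Piece and Otherwise heads above. The function names and the tuple encoding of conditions are hypothetical conveniences, limited to chains of strict inequalities, which the sketch writes as a single n-ary <lt/> application.

```python
import xml.etree.ElementTree as ET


def leaf(value) -> ET.Element:
    """Integers become <cn>, names become <ci>."""
    el = ET.Element("cn" if isinstance(value, int) else "ci")
    el.text = str(value)
    return el


def chain_lt(*terms) -> ET.Element:
    """a < b < c as one n-ary <lt/> application."""
    app = ET.Element("apply")
    ET.SubElement(app, "lt")
    for term in terms:
        app.append(leaf(term))
    return app


def piecewise(pieces, otherwise=None) -> ET.Element:
    """pieces: list of (value, (lower, variable, upper)) pairs."""
    pw = ET.Element("piecewise")
    for value, condition in pieces:
        piece = ET.SubElement(pw, "piece")
        piece.append(leaf(value))
        piece.append(chain_lt(*condition))
    if otherwise is not None:
        ET.SubElement(pw, "otherwise").append(leaf(otherwise))
    return pw


# r for 0 < y < 1, t for 1 < y < 2, otherwise 2
pw = piecewise([("r", (0, "y", 1)), ("t", (1, "y", 2))], otherwise=2)
print(ET.tostring(pw, encoding="unicode"))
```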
Another MathML "object" that does not yet have a standard representation in Mathematica is the multiscripts element, <mmultiscripts>. We synthesize this functionality in a way similar to the above, and a native Multiscripts box object is being developed for a future version of Mathematica. Finally, it should be pointed out that the MathML specification permits some rather unusual objects that we must also handle. One such object is a conditioned limit. The mathematical semantics of such an object are not clear, but we have nevertheless endeavoured to provide a typeset form for it.
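The structure that has to be synthesized is that of <mmultiscripts>: the base, then (subscript, superscript) pairs after it, then an optional <mprescripts/> separator followed by pre-script pairs, with <none/> filling any empty slot. A small Python builder, with hypothetical function names, makes this layout explicit.

```python
import xml.etree.ElementTree as ET


def token(value):
    """None -> <none/>, digits -> <mn>, anything else -> <mi>."""
    if value is None:
        return ET.Element("none")
    el = ET.Element("mn" if value.isdigit() else "mi")
    el.text = value
    return el


def multiscripts(base, post=(), pre=()):
    """Build <mmultiscripts> from (subscript, superscript) pairs.

    Post-scripts follow the base; pre-scripts follow an <mprescripts/> separator.
    """
    m = ET.Element("mmultiscripts")
    m.append(token(base))
    for sub, sup in post:
        m.extend([token(sub), token(sup)])
    if pre:
        ET.SubElement(m, "mprescripts")
        for sub, sup in pre:
            m.extend([token(sub), token(sup)])
    return m


# Carbon-14: pre-subscript 6 (atomic number), pre-superscript 14 (mass number).
print(ET.tostring(multiscripts("C", pre=[("6", "14")]), encoding="unicode"))
# R with subscript i and an empty superscript slot.
print(ET.tostring(multiscripts("R", post=[("i", None)]), encoding="unicode"))
```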
Interaction of Mathematica and external programs
It should come as no surprise that we can use Mathematica to perform computations on MathML, even on presentation MathML. Here is an example of the presentation form of an integral, which we can import as follows:
In[8]:=

Out[8]=
FormBox[TagBox[RowBox[{∫, RowBox[{SuperscriptBox[e, RowBox[{-, SuperscriptBox[x, 2]}]], RowBox[{d, x}]}]}], MathMLPresentationTag, AutoDelete -> True], TraditionalForm]
These boxes in Mathematica are structurally and conceptually similar to the presentation elements of MathML. They display as follows:
In[9]:=

Out[9]//DisplayForm=

We can then perform the computation by turning the boxes
into an expression.
In[10]:=

Out[10]=

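For reference, the integral in this example has the standard closed form below, where erf is the error function. Mathematica expresses its result in terms of Erf[x] and, as is usual for indefinite integrals in computer algebra systems, omits the additive constant C.

```latex
\int e^{-x^{2}}\,dx = \frac{\sqrt{\pi}}{2}\,\operatorname{erf}(x) + C,
\qquad
\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^{2}}\,dt,
\qquad
\frac{d}{dx}\left[\frac{\sqrt{\pi}}{2}\,\operatorname{erf}(x)\right]
  = \frac{\sqrt{\pi}}{2}\cdot\frac{2}{\sqrt{\pi}}\,e^{-x^{2}} = e^{-x^{2}}
```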
We can easily transform the resulting expression back into content MathML with the following statement. (We omit the output due to space constraints.)
In[11]:=
XML`MathML`ExpressionToMathML[%, "OutputForms" -> "Content"]
In the context of interaction with programs external to Mathematica, it should be mentioned that the whole process of handling web requests and serving the resulting dynamic web pages can be highly automated with webMathematica in conjunction with our MathML functionality. For casual use, however, it can suffice to select a section of typeset mathematics and copy it as MathML. Similarly, a user can paste MathML into a notebook, where it is automatically transformed into typeset mathematics.
It is even possible to perform computations on presentation MathML that lacks the correct underlying structure as dictated by the MathML 2 specification. Such markup could come, for instance, from automatic conversions of legacy LaTeX documents. Our conversion process regroups and reformats such presentation markup to a degree, and interprets it according to a set of common notations used in textbooks. If the topic area and notations of the legacy data are known but are not among the common notations understood by Mathematica, new rules can easily be written to interpret them. Combined with export to MathML, this process could be used to "clean up" legacy MathML.
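The regrouping step can be illustrated with precedence climbing, a standard parsing technique; we do not claim it is the algorithm Mathematica uses. In this Python sketch the flat children of a legacy <mrow> are given as a list of strings, the precedence table is a hypothetical notation table, and * stands in for whatever multiplication operator the legacy markup uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical notation table: operator -> binding strength.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def regroup(tokens):
    """Group a flat operand/operator sequence into a nested, left-associative tree."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            right = parse(PRECEDENCE[op] + 1)  # tighter operators bind first
            left = (op, left, right)
        return left

    return parse(0)


def to_mrow(tree):
    """Render the grouped tree as nested presentation <mrow> elements."""
    if isinstance(tree, str):
        el = ET.Element("mn" if tree.isdigit() else "mi")
        el.text = tree
        return el
    op, left, right = tree
    row = ET.Element("mrow")
    row.append(to_mrow(left))
    ET.SubElement(row, "mo").text = op
    row.append(to_mrow(right))
    return row


flat = ["a", "+", "b", "*", "c"]  # e.g. the children of one flat legacy <mrow>
tree = regroup(flat)
print(tree)  # ('+', 'a', ('*', 'b', 'c'))
print(ET.tostring(to_mrow(tree), encoding="unicode"))
```

Once the structure is explicit, each nested mrow corresponds to one subexpression, which is what interpretation and a cleaned-up MathML export both rely on.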
Summary and future plans
Our MathML functionality is built on top of a common base for handling XML documents in general. SymbolicXML expressions isomorphically represent textual XML fragments, yet they are composed of standard Mathematica expressions. They therefore have the full structure of expressions and can be transformed and pattern-matched in all the standard ways of Mathematica.
We have given several simple examples of the functionality described above. Unfortunately, since MathML, and XML in general, are comparatively verbose formats, length restrictions have precluded us from presenting any real-world examples. However, the foregoing should indicate the capabilities of the package.
It is extremely easy to define one's own rules for handling specific fragments of MathML for which there are no predefined interpretations, although for space reasons we have not given such an example.
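As a hedged illustration of what such user-defined rules might look like, here is a Python sketch of an ordered rule table. The decorator rule, the function interpret, and the rule mapping a double-struck R to the symbol Reals are hypothetical examples, not part of Mathematica's API; Mathematica itself would express such rules with its own pattern matching.

```python
import xml.etree.ElementTree as ET

RULES = []  # (predicate, action) pairs, tried in order


def rule(predicate):
    """Decorator that registers a user-defined interpretation rule."""
    def register(action):
        RULES.append((predicate, action))
        return action
    return register


def interpret(el: ET.Element):
    for predicate, action in RULES:
        if predicate(el):
            return action(el)
    raise ValueError(f"no interpretation for <{el.tag}>")


# Built-in style rule: a plain identifier becomes a symbol.
@rule(lambda el: el.tag == "mi" and el.get("mathvariant") is None)
def plain_identifier(el):
    return ("Symbol", el.text)


# User-defined rule for a fragment with no predefined meaning:
# a double-struck R is read as the set of real numbers.
@rule(lambda el: el.tag == "mi" and el.get("mathvariant") == "double-struck"
      and el.text == "R")
def reals(el):
    return ("Symbol", "Reals")


print(interpret(ET.fromstring('<mi mathvariant="double-struck">R</mi>')))  # ('Symbol', 'Reals')
print(interpret(ET.fromstring("<mi>x</mi>")))  # ('Symbol', 'x')
```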
At present we have experimental internal tools for converting between CSS style sheets and Mathematica style sheets, but the style sheet conversion mechanism is not yet robust enough to release. Regrettably, several compatibility issues also arise from differences in browser capabilities and standards compliance.
There are also many issues to explore further regarding the limitations of DTDs, now that documents may mix NotebookML, ExpressionML, MathML, and SVG. XML Schemas may alleviate these problems to some degree, but they currently introduce new problems with entity handling.
In conclusion, our MathML capabilities now comply with the MathML 2 specification, and by our own accounting we pass all of the tests in the test suite (barring the cross-compatibility issues mentioned above). We look forward to the adoption of MathML as a common standard for the interchange of mathematics on the web.
References
[1] MathML Working Group (2001). Mathematical Markup Language (MathML) 2.0 Specification. W3C Recommendation. http://www.w3.org/TR/MathML2
[2] Soiffer N. (2000). Computing with Both Content and Presentation MathML in Mathematica. In the Proceedings of MathML 2000, Champaign-Urbana, USA, 19-23 October. http://www.mathmlconference.org/2000/presentations.html
[3] SVG Working Group (2001). Scalable Vector Graphics (SVG) 1.0 Specification. W3C Recommendation. http://www.w3.org/TR/SVG/
[4] Waterloo Maple (1997). Maple, version V release 4, a computer program. Maplesoft, Waterloo, Ontario. http://www.maplesoft.com
[5] Wolfram S. (1999). The Mathematica Book 4.0, 4th edition. Wolfram Media/Cambridge University Press. http://www.wolfram.com