Implementing MathML in Mathematica

Jason Harris
Wolfram Research


Abstract

This paper outlines the state of MathML[1] in Mathematica[5]. It details how Mathematica's handling of MathML has changed since previous versions[2], including support for content markup and translations that are as close to lossless as possible. It also describes several novel aspects of the MathML and XML handling now possible in Mathematica: the ability to perform integrated computations with MathML and to add specific conversions between Mathematica and MathML. The paper then discusses some aspects of MathML that were problematic to implement. Finally, we conclude with our plans for the future and some of the more speculative things we are working on.


Mathematica's XML functionality

The latest version of Mathematica can flexibly handle XML documents at a fundamental level. There are extremely close parallels between the way Mathematica and XML represent data structures. XML documents are basically tree structures, and Mathematica is, among other things, an extremely efficient processing language for tree structures. Mathematica therefore lets us process XML structures with ease, since the full power of its pattern matching language can be applied to symbolic XML structures. By way of comparison, the functionality of XSLT and XPath can be replicated in Mathematica with ease. Mathematica's processing capabilities for MathML are built on top of its XML handling.


Symbolic XML

XML structures are represented in Mathematica in terms of SymbolicXML. SymbolicXML structures are Mathematica expressions which isomorphically represent XML structures. Thus, just as MathML is a "flavour" of XML, SymbolicMathML is a flavour of SymbolicXML. Here is an example which generates a SymbolicMathML expression:

In[1]:=

XML`MathML`ExpressionToSymbolicMathML[x^2 + 2, 
"OutputForms" -> "Presentation"]

Out[1]=

XMLElement[math, {xmlns -> http://www.w3.org/1998/Math/MathML},  
{XMLElement[mrow, {},  {XMLElement[msup, {}, {XMLElement[mi, {}, {x}], 
XMLElement[mn, {}, {2}]}],  XMLElement[mo, {}, {+}],  XMLElement[mn, {}, 
{2}]}]}]

This SymbolicXML structure is isomorphic to the following XML text.

In[2]:=

ExportString[%, "XML"]

Out[2]=

<math xmlns=\"http://www.w3.org/1998/Math/MathML\">\n 
<mrow>\n  <msup>\n   <mi>x</mi>\n   <mn>2</mn>\n  </msup>\n  <mo>+</mo>\n  <mn>2</mn>\n 
</mrow>\n</math>
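
The earlier remark that XPath and XSLT functionality can be replicated with pattern matching can be illustrated on the expression above. The following is only a minimal sketch. It assumes that tag names and character data are stored as strings (the printed output omits the quotation marks), and the variable name sxml is ours.

```mathematica
(* Out[1] typed in by hand, with tags and text written as explicit strings *)
sxml = XMLElement["math", {"xmlns" -> "http://www.w3.org/1998/Math/MathML"},
   {XMLElement["mrow", {},
     {XMLElement["msup", {},
       {XMLElement["mi", {}, {"x"}], XMLElement["mn", {}, {"2"}]}],
      XMLElement["mo", {}, {"+"}],
      XMLElement["mn", {}, {"2"}]}]}];

(* Roughly what the XPath query //mn/text() selects:
   the character data of every mn element, at any depth *)
Cases[sxml, XMLElement["mn", _, {s_String}] :> s, Infinity]

(* Roughly what a one-template XSLT stylesheet does:
   rename every mi element to mtext, keeping attributes and children *)
sxml /. XMLElement["mi", attrs_, body_] :> XMLElement["mtext", attrs, body]
```

The first call returns the list {"2", "2"}; the second returns the same tree with the mi element renamed.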


The conversion process

Here is a diagram representing the process whereby expressions and box structures are converted from and transformed to MathML.

[Graphics:HTMLFiles/MathML2002_Harris_5.gif]

Here is an example where MathML is parsed into an expression.

In[3]:=

XML`MathML`MathMLToExpression[ "<math> 
<mrow> <mi>x</mi> <mo>+</mo> 
<mi>y</mi> </mrow> </math>"]

Out[3]=

x + y


Complete XML documents in Mathematica

Since version 3.0, Mathematica has incorporated a highly advanced front end with a notebook interface. This interface allows true WYSIWYG editing of two-dimensional typeset structures for both input and output. These mathematical notations are fully customizable. Style sheets are also an integral part of the notebook interface, and their use produces uniform and consistent formatting throughout a notebook document. Graphics can be included directly in notebooks. In short, Mathematica can act as a complete mathematical document preparation system.

Since XML is accommodated at a fundamental level, we are able to provide other XML forms besides just MathML. One such form is NotebookML which is an isomorphic representation of a notebook expression. Here is an example of a call which would generate a NotebookML file representing the notebook nb.

In[4]:=

Export["example.xml", nb, BoxFormats -> 
{"MathML"}, GraphicsFormats -> {"SVG"}]

In the creation of the NotebookML file, there are a number of options which control the form of the output. In this example we have specified that the cells containing typeset mathematics should be recorded as MathML markup. Similarly, any graphics should be stored in SVG[3], the emerging standard for vector graphics in XML. Since NotebookML is a flavour of XML, we would obtain a document which is completely XML based. This allows the entire document to be processed with XML tools, e.g. CSS style sheets or XSLT. One consequence of this is that notebooks can now be archived in a purely XML format, which is important for some companies with large numbers of documents that they wish to handle in a uniform way.

One other noteworthy XML form is ExpressionML, which allows an arbitrary Mathematica expression to be isomorphically represented in XML. When necessary, ExpressionML fragments are included in NotebookML documents.


Faithful conversions

One of the important features we have striven for in our revamped implementation of MathML is to be as "lossless" as possible. A key part of this is that we preserve the underlying MathML, including its mixed content and semantics; that is, processing retains the same form of annotations that were present in the original document. The MathML specification calls this being content-faithful. For illustration, below is an example from the MathML 2 specification which has some presentation along with some content semantics. The MathML that comprises this example has several subparts, each constructed from a <semantics> tag surrounding a presentation subpart and a content subpart. When this is imported into Mathematica, it appears as prescribed by the presentation and it is interpreted as prescribed by the content.

In[5]:=

n ≡ 1^Overscript[n, -]/1^Overscript[n - 1, -]

Out[5]=

n ≡ n !/(n - 1) !

Moreover, if one were to import the MathML and then immediately export the resulting expression as MathML, one would obtain the original MathML. In this way we have not decided that content or presentation takes precedence, but have losslessly maintained the same structure.

Unfortunately, several things prevent complete losslessness in practice. In general, these arise when there are multiple legal ways to convert something and we must pick one. For instance, when exporting an integer as content, do we include the attribute type="integer" or not? Either is legal. If we choose to include the attribute, then importing something that lacks it and immediately exporting it again will add the attribute. (Conversely, if we choose not to include it, we will unintentionally strip the attribute whenever it was originally present.)
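
The attribute example can be made concrete on SymbolicXML. This is a sketch under our own assumptions: normalize is a hypothetical stand-in for an exporter that always writes the attribute, not the rule Mathematica actually applies.

```mathematica
(* Two equally legal content encodings of the integer 42 *)
bare  = XMLElement["cn", {}, {"42"}];
typed = XMLElement["cn", {"type" -> "integer"}, {"42"}];

(* Hypothetical exporter policy: attach type="integer" to any
   attribute-free cn element whose text consists only of digits *)
normalize[x_] :=
  x /. XMLElement["cn", {}, {s_String}] /; StringMatchQ[s, DigitCharacter ..] :>
    XMLElement["cn", {"type" -> "integer"}, {s}]

(* Both inputs yield the same output, so the round trip cannot
   tell which form the original document used *)
normalize[bare] === typed
normalize[typed] === typed
```

Both comparisons evaluate to True, which is exactly the loss described: after the round trip, a document that omitted the attribute is indistinguishable from one that included it.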

We could preserve everything if we translated MathML to non-standard, purpose-built Mathematica structures, but this would defeat the purpose of having a conversion back and forth between standard Mathematica and standard MathML.

The preservation of character entities is a problem of this kind: in our processing, entities are rewritten either to their Unicode numeric code points or to their named entities, and we do not preserve the form in which the original entity was given. (Some XML editor applications do preserve this distinction, but most XML processing tools perform the same standardization step.) Another facet of our conversion process that follows the same general theme involves invisible times and invisible application, which are required in MathML but not in Mathematica. We must therefore insert these operators when exporting Mathematica to MathML and strip them when importing MathML into Mathematica. Regrettably, if these operators occur in the Mathematica expression before it is exported, they will be stripped upon roundtripping. This in turn influences the typesetting rules, which are necessarily fairly stringent, since in practice all sorts of unexpected things can arise if they are not.
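
The stripping step on import can be pictured as follows. This is an illustrative sketch only: it assumes the invisible operators arrive as mo elements whose content is the corresponding single character, and it is not the importer's actual code.

```mathematica
(* Presentation markup for 2x, with the invisible times operator
   that MathML requires between the factors *)
presentation = XMLElement["mrow", {},
   {XMLElement["mn", {}, {"2"}],
    XMLElement["mo", {}, {"\[InvisibleTimes]"}],
    XMLElement["mi", {}, {"x"}]}];

(* Remove every mo element that holds only an invisible operator *)
DeleteCases[presentation,
 XMLElement["mo", _, {"\[InvisibleTimes]" | "\[InvisibleApplication]"}],
 Infinity]
```

The result is the mrow with only the mn and mi children. Because the deletion applies to every such operator, one that was deliberately present before export is removed as well.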

Another factor which prevents complete losslessness is that Mathematica has some features for which there is no corresponding analogue in MathML, for example, CounterValueBox or FormBox. Conversely, MathML has certain features which Mathematica does not yet have, for example, spanning rows and columns, and units in attributes. Finally, there are some differences at the structural level between Mathematica and MathML.


Peculiarities and typesetting

We have defined typesetting for several "objects" which do not normally occur in Mathematica. One such object is the piecewise function. In Mathematica this is commonly represented with Bool expressions.  Any concepts which do not occur normally in Mathematica are included in the XML`MathML`Symbols` context. Here is an example of the Mathematica expression corresponding to a MathML piecewise function together with its typesetting.

In[6]:=

XML`MathML`Symbols`Piecewise[ XML`MathML`Symbols`Piece[r, 
0 < y < 1],  XML`MathML`Symbols`Piece[t, 1 < y < 2],  
XML`MathML`Symbols`Otherwise[2]] // TraditionalForm

Out[6]//TraditionalForm=

{ r    0 < y < 1
  t    1 < y < 2
  2    otherwise

The typeset form is fully editable and interpretable. For example, if we change the y to a z and evaluate the resulting typeset expression and display the result in standard form then we obtain the underlying internal MathML form.

In[11]:=

{ r    0 < z < 1
  t    1 < z < 2
  2    otherwise      // StandardForm

Out[11]//StandardForm=

XML`MathML`Symbols`Piecewise[XML`MathML`Symbols`Piece[r, 
0 < z < 1], XML`MathML`Symbols`Piece[t, 1 < z < 2], 
XML`MathML`Symbols`Otherwise[2]]

Another such MathML "object" which does not yet have a standard representation in Mathematica is the multiscripts tag. However, we synthesize this functionality in a way similar to the above; and indeed, a native Multiscripts box object is being developed for a future version of Mathematica. Finally, it should be pointed out that according to the MathML specification, there are some rather bizarre objects which we need to handle. One such object is a conditioned limit. The mathematical semantics for such an object are not clear, but nevertheless we have endeavoured to provide a typeset form for such an object.


Interaction of Mathematica and external programs

It should come as no surprise that we can use Mathematica to actually perform computations on MathML, even presentation MathML. Here is an example of the presentation form of an integral. We can import this as follows:

In[8]:=

XML`MathML`MathMLToBoxes @ "<math>
<mrow> <mo>&int;</mo> <mrow> <msup>
<mi>&ee;</mi> <mrow> <mo>-</mo>
<msup> <mi>x</mi> <mn>2</mn> </msup>
</mrow> </msup> <mrow>
<mo>&DifferentialD;</mo> <mi>x</mi>
</mrow> </mrow> </mrow> </math>"

Out[8]=

FormBox[TagBox[RowBox[{∫, RowBox[{SuperscriptBox[e, 
RowBox[{-, SuperscriptBox[x, 2]}]], RowBox[{d, x}]}]}], 
MathMLPresentationTag, AutoDelete -> True], TraditionalForm]

These boxes in Mathematica are structurally and conceptually similar to the presentation boxes of MathML. They display as follows:

In[9]:=

DisplayForm @ %

Out[9]//DisplayForm=

∫ e^(-x^2) d x

We can then perform the computation by turning the boxes into an expression.

In[10]:=

ToExpression @ %

Out[10]=

1/2 π^(1/2) erf(x)

We could easily transform the resulting expression back into MathML by the following statement. (We omit the output due to space constraints.)

In[11]:=

XML`MathML`ExpressionToMathML[%, "OutputForms" 
-> "Content"]

In the context of interaction with programs external to Mathematica, the whole process of handling web requests and serving the resulting dynamic web pages can be highly automated with webMathematica in conjunction with our MathML functionality. For casual use, however, it often suffices to select a section of typeset mathematics and copy it as MathML. Similarly, a user can paste MathML into a notebook and it will be automatically transformed into typeset mathematics.

It is even possible to perform computations with MathML presentation markup that does not have the correct underlying structure as dictated by the MathML 2 specification. Such markup could come from automatic conversions of legacy LaTeX documents, for instance. Our conversion process will regroup and reformat presentation to a degree, and interpret it based on a set of common notations used in textbooks. If the topic area and notations used by the legacy data are known but are not among the common notations understood by Mathematica, new rules can easily be written to interpret the notation. Combined with export to MathML, this process could be used to "clean up" legacy MathML.


Summary and future plans

Our MathML functionality is built on top of a common base for handling XML documents in general. SymbolicXML expressions isomorphically represent textual XML fragments, but because they are composed of standard Mathematica expressions, they have the full structure of expressions and can be transformed and pattern matched in the standard ways of Mathematica.

We have given several simple examples using the above mentioned functionality. Unfortunately, since MathML and XML in general are comparatively verbose formats, length restrictions have precluded us from presenting any real-world examples. However, the foregoing should be indicative of the capabilities of the package.

It should be stated that it is extremely easy to define one's own rules for handling specific fragments of MathML for which there are no predefined interpretations. However, for space reasons we have not given such an example.

At present we have internal experimental tools for conversions between CSS style sheets and Mathematica style sheets, yet the style sheet conversion mechanism is still not robust enough to release. Regrettably, there are still several issues of compatibility which arise from browser capabilities and compliance.

There are also many issues to explore further with regard to the limitations of DTDs, since we now have mixtures of NotebookML, ExpressionML, MathML, and SVG. Schemas may alleviate these problems to some degree, but they currently introduce new problems with entity handling.

In conclusion, our MathML capabilities are now in compliance with the MathML 2 specification, and by our accounting we pass all of the tests in the test suite (barring the aforementioned cross compatibility issues). We look forward to the adoption of MathML as a common standard for the interchange of mathematics on the web.


References

[1] MathML Working Group (2001) Mathematical Markup Language (MathML) 2.0 Specification. W3C Recommendation. http://www.w3.org/TR/MathML2

[2] Soiffer N. (2000) Computing with Both Content and Presentation MathML in Mathematica. In the Proceedings of MathML 2000, Champaign-Urbana, USA, 19-23 October. http://www.mathmlconference.org/2000/presentations.html

[3] SVG Working Group (2001) Scalable Vector Graphics (SVG) 1.0 Specification. W3C Recommendation. http://www.w3.org/TR/SVG/

[4] Waterloo Maple (1997) Maple, version V release 4, a computer program. Maplesoft, Waterloo, Ontario. http://www.maplesoft.com

[5] Wolfram S. (1999) The Mathematica Book 4.0, 4th edition. Wolfram Media/Cambridge University Press. http://www.wolfram.com