MathML Logo


#Call for Papers

#General Information

#Registration

#Accommodations

#Travel

#Tutorials

#Presentations

#Schedule

    

A Lisp Subset Based on MathML

Yuzhen Xie, Stephen Watt, and Luca Padovani
The University of Western Ontario


Abstract

We are interested in an XML representation for programming languages. Could content markup of MathML be used for this purpose, especially in the setting of a functional programming language? In order to address this question we have developed an interpreter for a Scheme-like language encoded in XML. We demonstrate that, with some degree of extension to its core elements, syntax and default semantics of content MathML are quite sufficient to represent Scheme expressions. We also describe the basic architecture of the interpreter, which has been written in Java. Our work suggests that the potential of a web scripting language encoded in XML (possibly based on MathML) is very high. In particular, it offers very natural mechanisms to construct XML expressions dynamically incorporating desired XML subexpressions.


1. Content MathML as the Nucleus of a Functional Programming Language

XML (eXtensible Markup Language) [1] is used to describe structured documents by user-defined markup tags. Due to its focus on structure, XML can be easily used to create a programming language framework where all the programming constructs like control and data flow, functions, format of message passing, are encoded in XML. Programs encoded in this form could immediately have the advantages of XML, in particular standardization of the encoding format, independence of the delivery medium, easiness of parsing by standard software tools.

A natural place to start an investigation into these ideas is with the Lisp family of languages, as Lisp S-Expressions and XML are quite similar. In fact, it has been previously noted that XML documents can be represented as S-Expressions [2]. We address the question of the opposite embedding. To this aim we consider the S-Expressions of the Scheme [3] dialect of Lisp. In another paper [4], Boley has considered content markup in MathML as a sub-language for Lisp-like notation, with a view to develop a markup language for functional relations.

To start with, we investigate the content markup of MathML [5] as an expression language. For example, the Scheme expression (lambda (x) (+ x x)) can be represented in MathML content markup as:

<lambda>
 <bvar><ci>x/<ci>/<bvar>
 <apply>
  <plus/>
  <ci>x</ci>
  <ci>x</ci>
 </apply>
</lambda>

Since the intent of content markup in MathML is to provide an explicit encoding of the structure underlying a mathematical expression, it is quite rich in describing lambda constructs and definitions. For instance, the apply element corresponds to a complete mathematical expression where an operator (or a function) is applied to its arguments. The lambda element is used to construct a function given a list of variables to bind (the functions' arguments) and an expression (the function's body). A declare construct associates specific properties or meanings to a mathematical object. Identifiers, numbers and symbols are encoded by ci, cn, and csymbol respectively. csymbol, in particular, can be used to extend content MathML when none of the default elements is suitable to represent a specific operator or a specific function.

The characteristics of the container elements in MathML content markup make them adequate to represent expressions, functions, and definitions in a programming language. Container elements are used in a simple, recursive way and constructs can be nested at arbitrary depth. As a more complete example, the Scheme expression (define reverse-subtract (lambda (x y) (- y x))) can be encoded as the following MathML content fragment:

<declare>
 <ci>reverse-subtract</ci>
 <lambda>
  <bvar><ci>x</ci></bvar>
  <bvar><ci>y</ci></bvar>
  <apply>
   <minus/>
   <ci>y</ci>
   <ci>x</ci>
  </apply>
 </lambda>
</declare> 

After having established such encoding, the next natural step is the definition of a notion of expression evaluation in MathML. MathML does not encompass any notion of parameter passing. Furthermore, some operators and symbols that are specific to Scheme, such as cons and '(), are not part of the core content elements in MathML. But, as we noticed before, content MathML provides a csymbol element whose semantics can be defined externally by the user. Such element gives content MathML enough flexibility to be used as a programming language.

There is also an issue about types: MathML types do not correspond directly to the types of Scheme. Although the cn element has an attribute type that can be used to specify the type of numbers, the data types available are far too simple and not extendible. In particular, there is no way to declare complex data types for pairs or lists in Scheme. Nevertheless, since Scheme is loosely and dynamically typed, data typing can be handled in the XML-Scheme interpreter.


2. XML-Scheme Syntax and Grammar

The XML-Scheme language is described by the content MathML elements with some degree of extension to its elements. The core content elements of MathML are used to represent most of the first class operators, constants and symbols, data flow and control, expressions, and function definitions. As already anticipated, the csymbol element is used to create symbols having a particular meaning in Scheme but that are not part of the core content elements in MathML. Even though it would be easier to define tags and semantics for symbols such as <cons>, this approach is not extendible, and treats some functions, like cons, differently from other functions like reverse-subtract. The core content MathML together with the elements created using csymbol form XML-Scheme, which is an XML representation for a Lisp-like subset of functional programming language.

In this study, a few new elements were invented to meet the particular requirements in Scheme. These were represented as a set of csymbols. In practice, these could equally well be elements in a separate namespace. The <xml-scheme> element is used as the root element in a document representing a program written in XML-Scheme. The Scheme constant '() is denoted by the symbol null. The Scheme constructor and destructors for pairs are denoted by the symbols cons, car, and cdr. The Scheme null? operator is denoted by a null? symbol in XML-Scheme.

We define a number of syntactic rules that define expression reduction and parameter passing. An apply construct whose first child is one of the first class operators or lambda construct performs operation on the rest of its children which are operands. An apply construct whose first child is ci indicates that its first child is an identifier to a function, and the rest of its children are the argument list. What follows is a complete example of XML-Scheme program that uses +, -, *, cons, if, lambda, and define. The corresponding Scheme expressions are illustrated in the XML comments.

<?xml version="1.0" encoding="UTF-8"?>

<xml-scheme>

 <!-- (define my_pair (cons 1 -1)) -->
 <!-- (define my_abs (lambda (x) 
       (if (< x 0) (- x) (+ x)))) -->
 <!-- (define my_double (lambda (x) 
       (* 2 x))) -->
 <!-- (define compose-fn (lambda (f1 f2) 
       (lambda (n) (f1 (f2 n))))) -->
 <!-- (define my_fun 
       (compose-fn my_double my_abs)) -->
 <!-- (my_fun (car my_pair)) -->

 <declare>
  <ci>my_pair</ci>
  <apply>
   <csymbol>cons</csymbol>
   <cn>1</cn>
   <cn>-1</cn>
  </apply>
 </declare>

 <declare>
  <ci>my_abs</ci>
  <lambda>
   <bvar><ci>x</ci></bvar>
   <piecewise>
    <piece>
     <apply>
      <minus/>
      <ci>x</ci>
     </apply>
     <apply>
      <lt/>
      <ci>x</ci>
      <cn>0</cn>
     </apply>
    </piece>
    <otherwise>
     <apply>
      <plus/>
      <ci>x</ci>
     </apply> 
    </otherwise>
   </piecewise>
  </lambda>
 </declare>

 <declare>
  <ci>my_double</ci>
  <lambda>
   <bvar><ci>x</ci></bvar>
   <apply>
    <times/>
    <cn>2</cn>
    <ci>x</ci>
   </apply>
  </lambda>
 </declare>

 <declare>
  <ci>compose-fn</ci>
  <lambda>
   <bvar><ci>f1</ci></bvar>
   <bvar><ci>f2</ci></bvar>
   <lambda>
    <bvar><ci>n</ci></bvar>
    <apply>
     <ci>f1</ci>
     <apply>
      <ci>f2</ci>
      <ci>n</ci>
     </apply>
    </apply>
   </lambda>
  </lambda>
 </declare>

 <declare>
  <ci>my_fun</ci>
  <apply>
   <ci>compose-fn</ci>
   <ci>my_double</ci>
   <ci>my_abs</ci>
  </apply>
 </declare>

 <apply>
  <ci>my_fun</ci>
  <apply>
   <csymbol>car</csymbol>
   <ci>my_pair</ci>
  </apply>
 </apply>

</xml-scheme>

The program starts declaring a pair named my_pair with value (1 . -1) created by the cons operation. Then, a function with the identifier of my_abs is defined to calculate the absolute value of a variable. The second function that is named my_double will double a given value. The function that is called compose_fn is composed of two general functions f1 and f2, and a parameter n. The my_fun function is defined by calling the compose_fn with my_double and my_abs as arguments. Finally, the my_fun function is called by passing the car value of my_pair.

3. XML-Scheme Interpreter

The XML-Scheme interpreter was designed in an Object-Oriented model and implemented in Java [6]. Figure 1 shows a diagram with the most important classes of the XML-Scheme interpreter.

The interpreter parses a program in XML-Scheme and executes it. The Xerces-J DOM parser [7] is used to parse the XML document with embodied Scheme expressions. The result of the parsing step is a DOM Document. The Document is then processed and evaluated in the XMLScheme class. The key operations include value computation, variable binding, and procedure creation and evaluation.

Data types are determined and checked while the evaluation moves on. Typing rules are the same as those in Scheme. Values handled by the interpreter are Java objects implementing the common Value interface. In this program, seventeen Scheme operators were implemented, including +, -, *, /, <, =, define, lambda, if, cons, car, cdr, null?, '(), quote, quasiquote, and unquote. Data types handled by the interpreter include integer, real, Boolean, null and pair.

As procedure are first-class objects in Scheme, there is a class, FunctionValue, for objects representing procedure values. All the MathML operators recognized by the interpreter have a corresponding Java method that implements their semantics. The expression tree associated to define is treated as an identifier corresponding to a FunctionValue. When a procedure is called or an expression is processed, it is recursively evaluated. The Environment class uses a hash table structure to keep track of variable binding by associating each variable to its value. Variables are bound in two circumstances. The first one is in a declaration, which binds a value to a variable. The second one is when a function is applied to one or more arguments. In this case the arguments are evaluated, and their values are bound to the corresponding parameters of the function. It is worth noting that, despite the name "variable", in our interpreter there is no notion of assignment. Variables can be declared multiple times, but the association between a variable and its value is static.


4. Quotation

One of the intriguing possibilities brought out by this work is the potential for using Lisp-style quasi- quoting mechanisms to construct XML documents in a natural, structured manner.

Scripting languages such as PHP [8] provide mechanisms to create document components in an unstructured manner, e.g.

<? 
   print("<ul>");
   for ($ix = 0; $ix < count($stuff); $ix++)
   {
       print("<li> $ix. $stuff($ix) </li>\n");
   }
?>

The above PHP fragment would be included in a web page with some later fragment providing the closing </ul>.

Note that the XHTML of the page would be explicit, and the dynamic content filled in by the print statements in the <? ?> processing directive. On one hand, this provides a familiar model of computation to C programmers. But on the other, it is error-prone and forces programmers and software tools to deal with multiple paradigms.

Another approach would be to adapt the solutions from the "program is data" world view of Lisp. The various dialects of Lisp all have mechanisms for "quoting" data. That is, they provide mechanisms to include data structures in a natural syntax in programs. Most modern Lisps, Scheme included, provide mechanisms for mostly quoted data to include executable code that fills in parts of the structure. This is usually called "quasi quoting." As an example, the following Scheme fragment defines mylist to be a quoted list structure with two sub-expressions (preceded by commas) evaluated and spliced in.

(set! mylist '(1 2 ,(+ 2 1) 4 ,(* n n) 6)) 

We propose that a useful scripting model for XHTML documents would be to consider a document as quasi-quoted data. Then scripting components would be programs also in XML syntax (e.g. XML- Scheme) with evaluation indicated by an unquoting mechanism exactly analogous to the comma in Scheme. An example is as follows. This document fragment would be part of a larger document, which would be implicitly enclosed in <xs:[ ).

<... xmlns:xs="http://www.orcca.on.ca/MathML/XML-Scheme"
        xmlns:m="http://www.w3.org/TR/MathML2">
<ul>
 <li><p>A fragment of quasi-quoted data in XML 
format</p></li>
 <li>
  <xs:unquote>
   <m:apply>
    <m:ci>my_fun</m:ci>
    <m:apply>
     <m:csymbol>car</m:csymbol>
     <m:ci>my_pair</m:ci>
    </m:apply>
   </m:apply>
  </xs:unquote>
 </li>
 <li>
  <xs:unquote>
   <m:apply>
    <m:ci>my_fun</m:ci>
    <m:apply>
     <m:csymbol>cdr</m:csymbol>
     <m:ci>my_pair</m:ci>
    </m:apply>
   </m:apply>
  </xs:unquote>
 </li>
</ul>


5. Concluding Remarks

The prototype of the XML-Scheme interpreter shows that it is feasible to use content MathML to represent a simple functional programming language. Our XML-Scheme language was defined according to the syntax of MathML content markup. It can describe expressions that are composed of a basic set of Scheme operators, in particular the lambda calculus and definitions. Thanks to the the csymbol element, MathML content markup can be extended so to represent a full-featured functional programming language like Scheme. This study indicates that it would be useful to examine the potential uses of programming languages represented in XML. In particular, the quoting/unquoting mechanism described in Section 4 might lead to a new family of scripting languages for Web pages, entirely described in an XML (and possibly MathML) syntax.


References

[1] W3C Recommendation: Extensible Markup Language (XML) 1.0, Feb. 1998.
[2] XML and Scheme, a micro-talk presentation at the Workshop on Scheme and Functional Programming, Sep. 2002.
[3] Richard Kelsey, William Clinger, and Jonathan Rees. Revised5 Report on the Algorithmic Language Scheme. Feb. 1998.
[4] Harold Boley, Markup Languages for Functional-Logic Programming, Proc. 9th WFLP 2000, Benicassim, Spain, Sep. 2000.
[5] W3C Recommendation: Mathematical Markup Language (MathML) Version 2.0, Feb. 2001.
[6] Ken Arnold, James Gosling, and David Homes. JavaTM Programming Language, Addison-Wesley, 2000.
[7] Xerces APACHE: Xerces-J DOM Parser 1.4.4, Jan. 2002.
[8] PHP Manual, Stig Saether Bakken et al, http://www.php.net