MathML Logo


#Call for Papers

#General Information

#Registration

#Accommodations

#Travel

#Tutorials

#Presentations

#Schedule

    

MathML in the MOWGLI Project

Andrea Asperti and Michael Kohlhase
Università degli Studi di Bologna and Carnegie Mellon University


Abstract

MOWGLI (Mathematics On the Web: Get it by Logic and Interfaces) is a European Project founded by the European Community in the ``Information Society Technologies'' (IST) Program. The project partners are the University of Bologna (coordinator), the German Research Center for Artificial Intelligence (DFKI Saarbrücken), Katholieke Universiteit Nijmegen, Max Planck Institute for Gravitational Physics (Albert Einstein Institute, TU Berlin), and Trusted Logic (France). The aim of the project is the study and the development of a technological infrastructure for the the creation and maintenance of a virtual, distributed, hypertextual library of mathematical knowledge based on a content description of the information.

Currently, almost all mathematical documents available on the Web are marked up only for presentation, severely crippling the potential for added-value services like concept-oriented navigation and information retrieval, data mining or document personalization. The goal of MOWGLI is to overcome these limitations, passing from a machine-readable to a machine-understandable representation of the information and developing the technological infrastructure for its exploitation. Essentially, MOWGLI aspires to become an example of ``best practice'' in its use, and a leading project in the new area of the Semantic Web. In particular, we shall deeply explore the potentialities of XML in the following directions:

Publishing
XML offers sophisticated publishing technologies (Stylesheets, M ATHML, SVG,...) which can be profitably used to solve, in a standard way, the annoying notational problems that traditionally afflict content based and machine-understandable encodings of mathematical knowledge.

Searching and Retrieving
Metadata will play a major role in MOWGL I. New W3C languages such as the Resource Description Framework or XML Query are likely to produce major innovative solutions in this field.

Interoperability
Having a common, machine understandable content language is a major and essential step in this direction. On the other hand a web-accessible library of mathematical knowledge can serve as a reference-ontology that grounds the communication.

Distribution
All XML technology is finally aimed to the access of the Web as a single, distributed resource, with no central authority and few, simple rules.

figure252

The project deals with problems traditionally belonging to different scientific communities: digital libraries, Web publishing, automation of mathematics and computer aided reasoning. Any serious solution to the complex problem of mathematical knowledge management needs a coordinated effort of all these groups and a synergy of their different expertise. To our knowledge, MOWGLI is the first attempt to build a solid cooperation environment between these communities. To achieve this, MOWGLI builds on previous ``standards'' for the management and publishing of mathematical documents (MATHML [2], OPENMATH  [4], OMDOC  [15]), leveraging the different XML technologies [1] (XSLT [6], RDF [5], SOAP [7],...). All these languages cover different and orthogonal aspects of the information and its management; our aim is not to propose a new standard, but to study and to develop the technological infrastructure required for taking advantage of the respective strengths.



Mathematics and the Semantic Web

The goals of the MOWGL I project are shared on a larger scale by the so-called ``Semantic Web'' (see [12]), which is the new frontier of Content Based Information Systems: Associating meaning with content or establishing a layer of machine understandable data will allow will enable a higher degree of automation and more intelligent applications than are possible in the presentation-markup dominated Web today. The ultimate goal of the Semantic Web is to allow machines to share and exploit knowledge in the Web way, i.e. without central authority, with few basic rules, in a scalable, adaptable, extensible manner. However, the actual development of the Semantic Web and its technologies has been hindered so far by the lack of large scale, distributed repositories of structured, content oriented information. The case of Mathematical knowledge, the most rigorous and condensed form of knowledge, is paradigmatic. The World Wide Web is already now the largest single resource of mathematical knowledge, and its importance will be exponentiated by the emerging display technologies like MATH ML, OPEN MATH, and OMDOC.

Due to its rich notational, logical and semantical structure, mathematical knowledge is thus a main case study for the development of the new generation of semantic Web systems: A content-oriented ``semantic Web'' for mathematics would make it possible to offer added-value services like

  • Preservation of the real informative content in a highly structured and machine understandable format, suitable for transformation, automatic elaboration and processing.
  • Cut and paste on the level of computation (take the output from a Web search engine and paste it into a computer algebra system).
  • Automatic proof checking of published proofs (if only to reveal hidden assumptions)
  • Semantical search for mathematical concepts (rather than keywords).
  • Indexing and Classification (Q: ``I have structure with two operations that commute and distribute,... is something known about this?'', A:``this is a lower semi-lattice, read more about this '')
However, almost all mathematical documents available on the Web are marked up only for presentation to human readers and not for machine manipulation. Therefore to obtain these added-value-services, there has to be a leap towards content-markup for mathematics on the web. The aim of the MOWGL I project is both to help in this process, as well as pave the way towards a really useful virtual, distributed, hyper-textual resource for the working mathematician, scientist or engineer.

Content Markup based on MATHML, OPENMATH, and OMDOC

To represent the content of the mathematical libraries the MOWGLI project will develop a representation format either by layering on existing techologies, or by influencing them to incorporate the necessary functionality.

  • The MATHML [2] standard was started with the vision of supporting all the goals above in mind. In fact it is the first standard that introduced a content markup layer in parallel with a presentational one, and has therefore become a pioneering project towards the mining of the mathematical treasure available on the web.
    However, MATHML is merely focused on mathematical expressions. However, in order to bring the idea of a Semantic Web of Mathematics to its full potentialities, other layers of mathematical information must be considered as well. In particular, we need a clean, microscopic description of proofs, a markup for mathematical statements (Theorems, Lemmas, Corollaries, Examples, etc.), a markup for structured collections of these objects (Documents, Theories, etc.), possibly functors between these collections, and finally a good metadata layer.
  • OPENMATH is a standard for content markup for mathematical expressions, with a focus on extensibility. Instead of supplying a wide range of primitive elements for mathematical concepts and operators, it totally relies on a representation of primitive symbols that reference concepts defined in so-called content dictionaries. These are X ML documents in a standardized form that declare symbols and specify their meaning. O PENMATH supplies symbols in content dictionaries for all MATHML primitives, OPENMATH and content MATHML. As the MATHML csymbol element can provide the same function as the OPENMATH OMS elements, OPEN MATH and content M ATHML are roughly isomorphic.
  • The OMDOC format integrates content MATHML and OPEN MATH for the level of mathematical expressions, and extends it with levels of mathematical statements (Definitions, Lemmata, Proofs, etc.), and a level of mathematical theories. The latter is essential for the MOWGL I project, a rich repertoire for structuring the mathematical libraries at the theory level allows for better navigation, theory reuse  , and modularization. Without this additional structure, mathematical libraries become unwieldy and unmanageable in practice. Even though there have been standardization efforts for theories in the field of algebraic specification, the most recent one being the CASL standard (Common Algebraic Specification Language; see [3]) OMDOC is the first effort of developing this into a general markup language for mathematics with attention to web communication and standards.

figure255

Of course the building blocks MATHML, OPEN MATH, and OMDOC are just (important) pieces in a much wider technological puzzle (the fact these formats are well-integrated in this puzzle is one of their main strengths!).

For instance passing from content markup to a good presentational format requires sophisticated operations; on the other side, these transformations are themselves a basic component of the whole mathematical knowledge (like mathematical fonts, or notation conventions). XSLT [6] provides here the right technology, opening the way to the creation of well maintained and documented libraries of mathematical stylesheets [9, 16].

Similarly, the creation and maintenance of the library as a distributed repository, and the crucial aspect of managing the information in the ``web way'', i.e. with few basic rules and without any central authority, requires a light but powerful communication protocol, overcoming the limitations of HTTP (SOAP [7] looks as a promising solution).

Metadata will eventually require a fairly sophisticated model, much beyond what is currently offered by typical metadata models as the Dublin-Core System [13]. Here, RDF [5] looks as the right framework for developing the model.

Authoring, transformation and rendering

The fact of encoding also the microscopic, logical level of mathematics opens the possibility to have completely formalized subsystems of the library [8], which could be checked automatically by standard tools for the automation of formal reasoning and the mechanization of mathematics (proof assistants and logical frameworks [14]). At the same time, any of these tools could be used as an authoring system for documents of the library, by simply exporting their internal libraries into XML, and using stylesheets to transform the output into a standard, machine-understandable representation, such as MATHML content markup or OPEN MATH.

Thus a rapid way to get meaningful repositories of fully structured mathematical knowledge is by exporting them from the available libraries of Logical Frameworks and Proof Assistants, along the lines of the HELM Project [10] or the work of Oostdijk et al. [17]. In MOWGL I we shall primarily rely on the Coq Proof Assistant of INRIA [11], but we will also address the question of transforming controlled subsets of LaTeX input into mathematical libraries. Once we have the relevant information encoded in XML we are faced with the complex issue of its transformation into suitable rendering formats (like xhtml+MATHML for web presentation, and LaTeX for print media).

In the rendering phase we must also heavily work on proofs in order to put them in a form more suitable to human reading. Typically, this requires the reorganization of the structure of the proof, where subproofs may be extracted from their position and brought to the top level (as a sort of let in). Finally, we must also automatically extract from the document all those informations which can be profitably used as metadata, i.e. data which provides some ``useful'' information on the document.


References

[1]   Extensible Markup Language (XML) Specification. Version 1.0. W3C Recommendation, 1999,

[2]   Mathematical Markup Language (MATHML) 2.0 W3C Recommendation, 2001.

[3]   Language Design Task Group CoFI. Casl -- the CoFI algebraic specification language -- summary, version 1.0. Technical report, 1998.

[4]   Olga Caprotti and Arjeh M. Cohen. Draft of the Open Math standard. The Open Math Society,  1998.

[5]   Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation 1999.

[6]   XSL Transformations (XSLT). Version 1.0, W3C Recommendation, 1999.

[7]   SOAP Version 1.2 Part 0: Primer. W3C Working Draft 2001.

[8]   Asperti, A., Padovani, L., Sacerdoti Coen, C., Schena, I., `` Formal Mathematics in MATH ML ''. Proceedings ``MATH ML and Math on the Web'', 2000.

[9]   Asperti, A., Padovani, L., Sacerdoti Coen, C., Schena, I., `` XML, Stylesheets and the re-mathematization of Formal Content ''. Proceedings of ``Extreme Markup Languages 2001 Conference'', 2001.

[10]   Asperti, A., Padovani, L., Sacerdoti Coen, C., Schena, I., `` HELM and the semantic Math-Web ''. Proc. TPHOLS. 2001, 01.

[11]   B. Barras et al., `` The Coq Proof Assistant Reference Manual, version 6.3.1 '',

[12]   Tim Berner's Lee. The Semantic Web. W3C Architecture Note, 1998.

[13]   The Dublin Core Metadata Initiative.

[14]   G. Huet, G. Plotkin (eds). Logical Environments. Cambridge University Press. 1993.

[15]   Kohlhase, M. ``OMDOC : An Open Markup Format for Mathematical Documents (Version 1.1) ''. Open Specification.

[16]   B. Naylor, S. Watt. ``Meta Style Sheets for the Conversion of Mathematical Documents into Multiple Forms''. On-line Proceedings of the First International Workshop on Mathematical Knowledge Management, 2001.

[17]   M. Oostdijk. Generation and Presentation of Formal Mathematical Documents, PhD. Thesis, University of Eindhoven.