Illinois DLIB Testbed Technologies for Converting Legacy Mathematics for Display on the Web

Timothy W. Cole, Thomas G. Habing, and William H. Mischo
University of Illinois at Urbana-Champaign


Abstract

The University of Illinois at Urbana-Champaign Grainger Engineering Library, under the auspices of the federally funded Digital Library Initiative (DLI-I), has developed a Testbed that currently comprises more than 50,000 full-text articles in XML format from more than 44 scientific and technical journal titles. The Illinois Testbed is now supported under the CNRI D-Lib Test Suite program and a Collaborating Partners program that provides monetary and in-kind support for the Testbed technologies. The principal focus of the Illinois Testbed has been on developing techniques for representing and delivering full-text engineering and physics journal articles in a web environment.

The Illinois Testbed is constructed from source-text journal articles supplied in SGML format by professional society and commercial publishers. Testbed materials are converted from SGML to XML as they are added to the Testbed. While the mathematics markup provided is generally compliant with the ISO 12083 Mathematics DTD or the AAP equivalent, each publisher has independently implemented supplemental tags. Testbed materials have been contributed by the American Institute of Physics, the American Physical Society, the American Society of Civil Engineers, the Institution of Electrical Engineers, the Association for Computing Machinery, Elsevier ScienceDirect, and the American Society for Materials. The Testbed includes both a production environment and research and development components.

The Testbed Team has developed a custom approach for rendering XML mathematics natively within web browsers. This approach takes mathematics marked up as well-formed XML, performs transformations on the server to improve its renderability, and then sends it to the browser, which renders it using CSS rules, DHTML, XSLT, and downloadable fonts. Handling mathematics natively in the browser has several advantages. The mathematics scales with the rest of the page if the user changes the font size. The user can search for text or characters that occur in the mathematics. No special plug-ins are required, and the network bandwidth required is generally less than if bitmap versions of the mathematics were downloaded. We are in the process of converting the ISO 12083-based XML mathematics markup to MathML and applying the same rendering techniques to MathML. As part of this process, we have developed automated algorithms for converting the legacy mathematics into MathML, using the presentation subset of elements.
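To make the idea of an automated conversion to presentation MathML concrete, the following is a minimal sketch in Python. It is not the Testbed's algorithm. It assumes ISO 12083-style element names (`fr`, `nu`, `de` for a fraction and `sup`, `inf` for scripts) and a hypothetical wrapper element `f`; the function names `tokens`, `row`, `convert_children`, and `to_mathml` are illustrative only, and publishers' supplemental tags are not handled.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def tokens(text):
    """Split character data into MathML token elements (mn, mi, mo)."""
    out = []
    for tok in re.findall(r"\d+(?:\.\d+)?|[A-Za-z]|\S", text or ""):
        if tok[0].isdigit():
            tag = "mn"   # number
        elif tok.isalpha():
            tag = "mi"   # identifier
        else:
            tag = "mo"   # operator or other symbol
        el = ET.Element(tag)
        el.text = tok
        out.append(el)
    return out

def row(src):
    """Wrap the converted content of src in a single mrow."""
    mrow = ET.Element("mrow")
    mrow.extend(convert_children(src))
    return mrow

def convert_children(src):
    """Convert the mixed content of a legacy element to MathML elements."""
    out = tokens(src.text)
    for child in src:
        if child.tag in ("sup", "inf"):
            # Legacy scripts follow their base as a sibling; MathML nests
            # base and script together, so reuse the previous element.
            base = out.pop() if out else ET.Element("mrow")
            script = ET.Element("msup" if child.tag == "sup" else "msub")
            script.append(base)
            script.append(row(child))
            out.append(script)
        elif child.tag == "fr":
            # Assumes both nu and de are present (sketch only).
            mfrac = ET.Element("mfrac")
            mfrac.append(row(child.find("nu")))
            mfrac.append(row(child.find("de")))
            out.append(mfrac)
        else:
            # Unknown (for example publisher-specific) tag: keep its content.
            out.extend(convert_children(child))
        out.extend(tokens(child.tail))
    return out

def to_mathml(xml_string):
    """Convert one legacy formula, given as an XML string, to MathML."""
    src = ET.fromstring(xml_string)
    math = ET.Element("math", {"xmlns": MATHML_NS})
    math.extend(convert_children(src))
    return ET.tostring(math, encoding="unicode")

print(to_mathml("<f><fr><nu>a+1</nu><de>b</de></fr></f>"))
print(to_mathml("<f>x<sup>2</sup></f>"))
```

The first call yields an `mfrac` whose numerator row holds `mi`, `mo`, and `mn` tokens; the second nests the base `x` and its exponent inside one `msup`. The script case shows why such a conversion is more than tag renaming: the legacy markup attaches a script to whatever precedes it, while MathML requires the base to be an explicit child.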

Our experience has shown that mathematics marked up in XML can be effectively rendered natively in web browsers using CSS, DHTML, and XSLT. We have had the most success applying these techniques in Version 5 of the Microsoft Internet Explorer web browser (IE5). IE5 is currently the only widely available browser that supports all of the above technologies to the extent needed for native rendering of complex mathematics. We have had less success with Netscape web browsers to date, though Netscape browsers are capable of rendering some complex mathematics acceptably. In both browsers, however, limitations remain. The resulting displayed mathematics, while understandable, does not approach the quality of a dedicated mathematics typesetting system or of printed mathematics.

This paper will discuss implementation issues associated with converting SGML/XML ISO 12083 and AAP mathematics to presentation MathML, focusing particularly on the development of automated conversion algorithms and the additional work necessary for conversion into semantic markup. We will also describe the effectiveness of natively rendering MathML in web browsers using the custom CSS/DHTML approach described above, in contrast with rendering the same markup in a dedicated plug-in or MathML tool.

Publishers have a great deal of time and resources invested in SGML authoring and publishing tools. Typically, the publisher's contracted typesetter or in-house typesetting shop supplies the publisher with an SGML version of the full-text article. Automated transformations from common SGML mathematics schemas to MathML will help simplify and facilitate the scholarly publishing process. MathML provides a single target mathematics markup language that will eliminate the need for the publisher-specific CSS/DHTML rendering implementations currently used. Additionally, it is anticipated that tools and plug-ins optimized for MathML, together with projects such as STIX (a Scientific and Technical Information Publisher's Group project to create a comprehensive collection of the characters needed in scientific and technical publishing), will provide better rendering and value-added capabilities than are currently possible natively in generic web browsers.