Illinois DLIB Testbed Technologies for Converting
Legacy Mathematics for Display on the Web
Timothy W. Cole, Thomas G. Habing, and William H. Mischo
University of Illinois at Urbana-Champaign
Abstract
The University of Illinois at Urbana-Champaign Grainger Engineering
Library, under the auspices of the federally funded Digital Library Initiative
(DLI-I), has developed a Testbed that currently comprises over 50,000 full-text
articles in XML format from more than 44 Sci-Tech journal titles. The Illinois
Testbed is now supported under the CNRI D-Lib Test Suite program and a
Collaborating Partners program that provides monetary and in-kind support
for the Testbed technologies. The principal focus of the Illinois Testbed
has been on developing techniques for the representation and delivery of
full-text engineering and physics journal articles in a web environment.
The Illinois Testbed is constructed from source-text journal articles
supplied in SGML format by professional society and commercial publishers.
Testbed materials are converted from SGML to XML as they are added to the
Testbed. While the mathematics markup provided is generally compliant with
the ISO 12083 Mathematics DTD or its AAP equivalent, each publisher has independently
implemented supplemental tags. Testbed materials have been contributed
by the American Institute of Physics, the American Physical Society, the
American Society of Civil Engineers, the Institution of Electrical Engineers,
the Association for Computing Machinery, Elsevier ScienceDirect, and the
American Society for Materials. The Testbed includes both a production
environment and research and development components.
The Testbed Team has developed a custom approach for rendering XML mathematics
natively within web browsers. This approach takes mathematics marked up
as well-formed XML, performs transformations on the server to improve its
renderability, and then sends it to the browser to render using CSS rules,
DHTML, XSLT, and downloadable fonts. There are a number of advantages to
dealing with mathematics natively in the browser. One is that the mathematics
will scale with the rest of the page if the user changes the font size.
Text and characters occurring in the mathematics can also be searched for,
including programmatically. No special plug-ins are required, and the network
bandwidth needed is generally less than for downloading bitmap images of the
same mathematics. We are in the process of converting the ISO 12083-based XML
mathematics markup to MathML and applying the same techniques to rendering
MathML. As part of this process, we have developed automated algorithms
for converting the legacy mathematics into MathML, using the presentation
subset of MathML elements.
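To make the conversion step more concrete, here is a minimal Python sketch of turning ISO 12083-style markup into presentation MathML. It is not the Testbed's converter: it assumes the element names `fraction`/`num`/`den`, `sup`, and `inf` (subscript), assumes the `formula` fragment has already been converted from SGML to well-formed XML, and the function names are hypothetical. It illustrates one real difficulty: in the legacy markup a superscript or subscript follows its base as sibling content, whereas MathML's `msup` and `msub` must enclose the base, so the converter has to decide what the base is.

```python
import re
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Numbers, single-letter identifiers, or any other non-space character (operator).
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z])|(\S))")


def tokenize(text):
    """Split character data into MathML token elements (mn, mi, mo)."""
    tokens = []
    for m in TOKEN_RE.finditer(text or ""):
        number, ident, op = m.groups()
        if number:
            tag, value = "mn", number
        elif ident:
            tag, value = "mi", ident
        else:
            tag, value = "mo", op
        el = ET.Element(tag)
        el.text = value
        tokens.append(el)
    return tokens


def wrap_row(nodes):
    """Return a single node unchanged; group zero or several nodes in an mrow."""
    if len(nodes) == 1:
        return nodes[0]
    row = ET.Element("mrow")
    row.extend(nodes)
    return row


def convert_children(parent):
    """Convert the mixed content of an ISO 12083-style element (assumed names)."""
    if parent is None:  # e.g. a fraction missing its <num> or <den>
        return []
    out = tokenize(parent.text)
    for child in parent:
        if child.tag == "fraction":
            mfrac = ET.Element("mfrac")
            mfrac.append(wrap_row(convert_children(child.find("num"))))
            mfrac.append(wrap_row(convert_children(child.find("den"))))
            out.append(mfrac)
        elif child.tag in ("sup", "inf"):
            # Legacy markup puts the base *before* the script as sibling content,
            # whereas msup/msub must enclose it: take the previous node as the base.
            base = out.pop() if out else ET.Element("mrow")
            script = ET.Element("msup" if child.tag == "sup" else "msub")
            script.append(base)
            script.append(wrap_row(convert_children(child)))
            out.append(script)
        else:
            # Unrecognized (e.g. publisher-specific) element: keep its content.
            out.append(wrap_row(convert_children(child)))
        out.extend(tokenize(child.tail))  # text following the child element
    return out


def to_mathml(formula_xml):
    """Convert one well-formed <formula> fragment to a presentation MathML string."""
    source = ET.fromstring(formula_xml)
    math = ET.Element("math", xmlns=MATHML_NS)
    math.append(wrap_row(convert_children(source)))
    return ET.tostring(math, encoding="unicode")


print(to_mathml("<formula>x<sup>2</sup> + 1</formula>"))
# prints: <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn></mrow></math>
```

Because the sketch takes the single preceding token as the base, `(a+b)<sup>2</sup>` would attach the exponent only to the closing parenthesis, and a multi-letter name such as `sin` becomes three separate `mi` elements. Handling such cases (grouping fenced expressions, recognizing function names) is the kind of additional analysis a production converter needs.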
Our experience has shown that mathematics marked up in XML can be effectively
rendered natively in web browsers using CSS, DHTML, and XSLT. These
techniques have worked best with Version 5 of the Microsoft Internet Explorer
web browser (IE5), currently the only widely available browser that supports
all of these technologies to the extent needed for native rendering of
complex mathematics. We have had less success to date with Netscape web
browsers, though they can render some complex mathematics acceptably.
Limitations remain in both browsers, however: the resulting display
mathematics, while legible, does not approach the quality of a dedicated
mathematics typesetting system or of printed mathematics.
This paper will discuss implementation issues associated with converting
ISO 12083 and AAP mathematics in SGML/XML to presentation MathML,
focusing particularly on the development of automated conversion algorithms
and the additional work necessary for conversion into semantic markup.
We will also describe the effectiveness of natively rendering MathML in
web browsers using the custom CSS/DHTML approach described above, compared
with rendering the same markup in a dedicated plug-in or MathML tool.
Publishers have invested considerable time and resources in SGML
authoring and publishing tools. Typically, the publisher's contracted typesetter
or in-house typesetting shop supplies the publisher with an SGML version
of the full-text article. Automated transformations from common SGML
mathematics schemas to MathML will help simplify and facilitate the scholarly
publishing process. MathML provides a single target markup language for
mathematics, which will eliminate the need for the publisher-specific CSS/DHTML
rendering implementations currently used. Additionally, it is anticipated
that tools and plug-ins optimized for MathML, together with projects such as STIX
(the Scientific and Technical Information Publishers' group project to create
a comprehensive collection of characters needed for scientific
and technical publishing), will provide better rendering and value-added
capabilities than are currently possible natively in generic web browsers.