Annotating XML Schemas with reStructuredText
============================================

:Author: Ladislav Lhotka
:Contact: lhotka@cesnet.cz
:Date: 19.6.2006
:RepNo: 3/2006
:Abstract: This technical report describes a method for annotating XML
   schemas expressed in the RELAX NG language. The annotations use the
   natural plain-text markup of reStructuredText (reST). The annotated
   schema is a valid RELAX NG XML document that can be transformed into
   a reST document via an XSLT stylesheet. Consequently, the schema can
   easily be presented as HTML or LaTeX, with the added value of
   bidirectional hyperlinks between RELAX NG definitions and their
   references.
:Keywords: XML, RELAX NG, reStructuredText, XSLT

Introduction
------------

Extensible Markup Language (XML) is increasingly used as a flexible
format for representing various structured data in networking and other
software applications, even where the data is intended to be processed
by machines only. The concrete data model can be defined by means of a
special language known as an *XML schema*. The base XML 1.0
specification [XML]_ offers the DTD (Document Type Definition) language
for this purpose. However, this language has a number of deficiencies:

* A DTD is not itself an XML document.
* DTD does not allow elements with identical names to have different
  content in different contexts.
* Namespaces are not supported.

The W3C therefore developed a much more sophisticated language with the
rather unfortunate name *XML Schema* [XSch1]_, [XSch2]_, as if it were
meant to be *the* ultimate XML schema language. As it turns out, it is a
typical consortium-driven specification: very complex and, in certain
places, even unclear and ambiguous.

An interesting alternative with approximately the same expressive power
is *RELAX NG* [RNG]_. It rests on the solid mathematical foundation of
tree automata theory and is generally easier to use than W3C XML
Schema.

Any XML schema is primarily used for validating XML documents: a
validating XML parser is able to verify whether an XML document conforms
to the given schema. However, an XML schema can also serve as
*authoritative documentation* of the data model. This documentation role
can be further enhanced by interspersing the schema with annotations.
This can be done either with XML comments, analogous to comments in
program source code, or with extra markup, for example special XML
elements. The RELAX NG specification [RNG]_ requires processors to
ignore elements from foreign namespaces that appear in the schema
document, so annotations can quite naturally be enclosed in elements
whose namespace differs from that of the RELAX NG schema itself
(``http://relaxng.org/ns/structure/1.0``).

This technical report presents a method for annotating RELAX NG schemas
and an XSLT stylesheet that converts a RELAX NG schema augmented with
annotations to common presentation forms such as HTML or LaTeX. The
annotations use the simple and effective markup of reStructuredText_
(reST). The XSLT stylesheet transforms the annotated RELAX NG schema
into a valid reST document that essentially combines RELAX NG
mechanisms (pattern definitions and references) with reST features such
as hyperlinks. The result closely resembles *literate programs*
[Knu92]_.

.. _reStructuredText: http://docutils.sourceforge.net/rst.html

This work was inspired by a similar XSLT stylesheet written by Zdeněk
Wagner [Wag05]_. However, the present approach is slightly more general,
thanks to the flexibility of reST as opposed to HTML, which is the
output of Wagner's stylesheet. Also, my stylesheet uses only XSLT 1.0
and can thus be used with virtually any XSLT processor. The listing of
the entire stylesheet can be found in the `appendix`_. It is also
available `online `__.

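Because the annotations live in their own namespace, any namespace-aware XML library can tell them apart from the schema proper. The following minimal Python sketch is not part of the report's tooling; it uses only the standard ``xml.etree.ElementTree`` module and mimics the RELAX NG rule by removing every element outside the RELAX NG namespace from a schema tree, collecting the text of the removed `rest` annotations along the way. The function name ``strip_foreign`` and the tiny inline schema are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs quoted in the report.
RNG_NS = "http://relaxng.org/ns/structure/1.0"
ANN_NS = "http://www.cesnet.cz/ns/rngrest-annotations/1.0"


def strip_foreign(elem: ET.Element) -> list[str]:
    """Recursively remove child elements outside the RELAX NG namespace.

    Returns the stripped text of every removed a:rest annotation,
    in document order.
    """
    notes = []
    for child in list(elem):  # iterate over a copy: elem is mutated below
        if child.tag.startswith("{" + RNG_NS + "}"):
            notes.extend(strip_foreign(child))
        else:
            if child.tag == "{" + ANN_NS + "}rest":
                notes.append((child.text or "").strip())
            elem.remove(child)  # foreign element: ignored by RELAX NG
    return notes


# Hypothetical miniature schema with two annotations.
SCHEMA = f"""<grammar xmlns="{RNG_NS}" xmlns:a="{ANN_NS}">
  <a:rest>Intro text.</a:rest>
  <start>
    <a:rest>The root element.</a:rest>
    <element name="football-team"><empty/></element>
  </start>
</grammar>"""

root = ET.fromstring(SCHEMA)
notes = strip_foreign(root)
print(notes)  # ['Intro text.', 'The root element.']
```

A RELAX NG validator performs an equivalent removal internally when it simplifies the schema, so the annotated document validates exactly like its unannotated counterpart; the sketch merely makes that effect visible.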
Adding annotations
------------------

The annotation system introduces just a single new XML element, `rest`,
whose namespace is ``http://www.cesnet.cz/ns/rngrest-annotations/1.0``.
Annotations are thus typically included as follows::

  <a:rest>
  ...
  </a:rest>

where the namespace prefix ``a`` must be properly declared as an
abbreviation of the above namespace URI, typically in the `grammar`
element.

Any number of `rest` elements can appear as direct children of the
following RELAX NG elements: `grammar`, `start` and `define`. They may
in fact appear in other places as well, but the XSLT stylesheet ignores
them there.

Any valid reStructuredText is allowed inside the `rest` element,
including multiple paragraphs, tables, figures and hyperlinks. Sections
should be used with care. If they are used, their titles must not be
underlined with exclamation marks ``!``, because these are reserved for
the sections that the stylesheet automatically creates for each
`define` element in the schema.

The following example is a simple annotated RELAX NG schema::

  <grammar xmlns="http://relaxng.org/ns/structure/1.0"
           xmlns:a="http://www.cesnet.cz/ns/rngrest-annotations/1.0">
    <a:rest>
  Example: Annotated RELAX NG Schema
  ==================================

  :Author: Ladislav Lhotka
  :Contact: lhotka@cesnet.cz

  .. contents::

  This is an example of a RELAX NG schema annotated with
  reStructuredText. Every annotation consists of one or more paragraphs
  and may contain text in *italics*, **boldface** or ``monospace font``,
  numbered or bulleted lists, hyperlinks etc.

  We use a simple data model of a football_ team.

  .. _football: http://en.wikipedia.org/wiki/Football
    </a:rest>
    <start>
      <a:rest>
  The root element is `football-team`. It contains any number of players.
      </a:rest>
      <element name="football-team">
        <zeroOrMore>
          <ref name="player-content"/>
        </zeroOrMore>
      </element>
    </start>
    <define name="player-content">
      <a:rest>
  Each player has the attribute `role` with one of the four choices
  below, and element `name` which is supposed to contain the full name
  of the player.
      </a:rest>
      <element name="player">
        <attribute name="role">
          <choice>
            <value>goalkeeper</value>
            <value>defender</value>
            <value>midfielder</value>
            <value>striker</value>
          </choice>
        </attribute>
        <element name="name">
          <text/>
        </element>
      </element>
    </define>
  </grammar>

Several things are worth pointing out here:

* The annotations must be indented strictly according to the rules of
  reST so that, for example, normal paragraph text starts at column 1.
  This slightly breaks the canonical indentation structure of XML
  documents, but there is no easy way around this problem, as reST is
  very fussy about indentation.

* One has to be careful when including characters that have a special
  meaning in XML: ``<``, ``>`` or ``&``. XML entity references such as
  ``&lt;`` do not work here because the reST parser does not interpret
  them. The solution is to use the ``unicode`` directive of reST, for
  example::

    .. |lt| unicode:: U+003C

  The ``<`` character can then be represented as ``|lt|`` in
  annotations.

* A special case of the previous issue is XML element names inside
  annotations. Authors who wish to write them with the ``<`` and ``>``
  delimiters (which I generally do not recommend) might consider
  creating a new *interpreted text role* [Goo05]_. An XML element could
  then be conveniently written as, for example, ``:xml:`football-team```.

* The ``.. contents::`` directive offers an easy way of generating an
  index of all RELAX NG pattern definitions used in the schema.

Transformations
---------------

An annotated RELAX NG schema can be converted to reStructuredText by
means of an XSLT processor and the XSLT stylesheet ``rngrest.xsl``
shown in the `appendix`_. The following command uses the `xsltproc`_
processor::

  $ xsltproc --output example.rest rngrest.xsl example.rng

.. _xsltproc: http://xmlsoft.org/XSLT/xsltproc2.html

When applied to the example schema shown above, this command produces
the following reST file::

  Example: Annotated RELAX NG Schema
  ==================================

  :Author: Ladislav Lhotka
  :Contact: lhotka@cesnet.cz

  .. contents::

  This is an example of a RELAX NG schema annotated with
  reStructuredText. Every annotation consists of one or more paragraphs
  and may contain text in *italics*, **boldface** or ``monospace font``,
  numbered or bulleted lists, hyperlinks etc.

  We use a simple data model of a football_ team.

  .. _football: http://en.wikipedia.org/wiki/Football

  ::

    <grammar xmlns="http://relaxng.org/ns/structure/1.0"
             xmlns:a="http://www.cesnet.cz/ns/rngrest-annotations/1.0">

  start
  !!!!!

  The root element is `football-team`. It contains any number of players.

  .. parsed-literal::

    <element name="football-team">
      <zeroOrMore>
        <ref name="player-content_"/>
      </zeroOrMore>
    </element>

  player-content
  !!!!!!!!!!!!!!

  Each player has the attribute `role` with one of the four choices
  below, and element `name` which is supposed to contain the full name
  of the player.

  The pattern is referenced by:

  * start_

  .. parsed-literal::

    <element name="player">
      <attribute name="role">
        <choice>
          <value>goalkeeper</value>
          <value>defender</value>
          <value>midfielder</value>
          <value>striker</value>
        </choice>
      </attribute>
      <element name="name">
        <text/>
      </element>
    </element>

Comparing it to the original annotated schema, we see that the
annotation of the `grammar` element became the introductory part of the
reST file. The annotations of the other elements, `start` and `define`,
were moved before their parent elements and also received automatically
generated titles. These titles serve as targets for hyperlinks from the
corresponding RELAX NG `ref` elements as well as from the automatically
created lists of referring definitions (note the underscore characters
appended to the names of the referenced patterns).

The reST file is then converted to the desired presentation format
(HTML, LaTeX or XML) by the standard `docutils`_ tools. For example, to
convert our example reST file to HTML, use the following command::

  $ rest2html example.rest example.html

.. _docutils: http://docutils.sourceforge.net/

Part of the result rendered by Firefox is shown in Figure 1.

.. figure:: browser.png
   :alt: Firefox rendering

   Our example annotated RELAX NG schema rendered as HTML.

Conclusions
-----------

This technical report describes a simple yet flexible way of annotating
RELAX NG schemas. The annotations use the plain-text markup of
`reStructuredText`_, which means, above all, that the annotated schema
is easily readable even in its source form.

In addition, the XSLT stylesheet ``rngrest.xsl`` listed in the
`appendix`_ (also available `online `__) transforms the annotated schema
into a valid reST document, which can in turn be converted to HTML,
LaTeX or XML. The converted document offers the additional benefits of
bidirectional links between RELAX NG pattern definitions and their
references and, optionally, a listing of all definitions.

A more complex schema annotated using the method described in this
report is the data model for FlowMon probe configuration [Lho06]_.

.. [Goo05] Goodger D. *Creating reStructuredText Interpreted Text
   Roles*. Developer documentation of reST, 2005. Available
   `online `__.

.. [Knu92] Knuth D.E. *Literate Programming*. Stanford, California:
   Center for the Study of Language and Information, 1992. 368 pp.
   ISBN 0-937073-80-6.

.. [Lho06] Lhotka L. *XML Schema of FlowMon Configuration Data*.
   Available `online `__.

.. [RNG] Clark J. and Murata M. (Editors). *RELAX NG Specification*.
   OASIS Consortium, 2001. Available `online `__.

.. [Wag05] Wagner Z. *Tool for Annotation of Relax NG Schemas*.
   IceBearSoft, 2005. Available `online `__.

.. [XML] Bray T., Paoli J., Sperberg-McQueen C.M., Maler E. and
   Yergeau F. (Editors). *Extensible Markup Language (XML) 1.0*. Third
   edition. W3C Consortium, 2004. Available `online `__.

.. [XSch1] Thompson H.S., Beech D., Maloney M. and Mendelsohn N.
   (Editors). *XML Schema Part 1: Structures*. Second edition. W3C
   Consortium, 2004. Available `online `__.

.. [XSch2] Biron P.V. and Malhotra A. (Editors). *XML Schema Part 2:
   Datatypes*. Second edition. W3C Consortium, 2004. Available
   `online `__.

Appendix. XSLT Stylesheet ``rngrest.xsl``
-----------------------------------------

.. _appendix: `Appendix. XSLT Stylesheet rngrest.xsl`_

::

  ! :: <grammar =" " xmlns : =" " > start !!!!! .. parsed-literal:: < > </ > The pattern is referenced by: * start_ * _ .. parsed-literal:: < > </ > < / > </ > </ > =" " =" _ "

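For readers who want to experiment without an XSLT processor, the cross-referencing behaviour described in the Transformations section (an automatic section per `start` and `define`, titles underlined with ``!``, lists of referring patterns, and `ref` names with a trailing underscore inside ``parsed-literal`` blocks) can be sketched in Python. This is an illustrative re-expression using the standard ``xml.etree.ElementTree`` module, not a transcription of ``rngrest.xsl``; the function names ``pattern_name``, ``render`` and ``to_rest`` and the exact layout of the output are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RNG = "{http://relaxng.org/ns/structure/1.0}"
ANN = "{http://www.cesnet.cz/ns/rngrest-annotations/1.0}"


def pattern_name(pat: ET.Element) -> str:
    """Section title for a top-level pattern: 'start' or the define's name."""
    return "start" if pat.tag == RNG + "start" else pat.get("name")


def render(elem: ET.Element, depth: int = 0) -> list[str]:
    """Serialise RELAX NG markup for a parsed-literal block.

    The name of every <ref> gets a trailing underscore, so that reST
    turns it into a hyperlink to the section with that title.
    """
    pad = "  " * depth
    local = elem.tag[len(RNG):]
    attrs = ""
    for key, value in elem.attrib.items():
        suffix = "_" if local == "ref" and key == "name" else ""
        attrs += f' {key}="{value}{suffix}"'
    children = [c for c in elem if c.tag.startswith(RNG)]
    text = (elem.text or "").strip()
    if not children:
        if text:
            return [f"{pad}<{local}{attrs}>{text}</{local}>"]
        return [f"{pad}<{local}{attrs}/>"]
    lines = [f"{pad}<{local}{attrs}>"]
    for child in children:
        lines += render(child, depth + 1)
    lines.append(f"{pad}</{local}>")
    return lines


def to_rest(root: ET.Element) -> str:
    """Emit a cross-referenced reST skeleton for the start/define patterns."""
    patterns = [p for p in root if p.tag in (RNG + "start", RNG + "define")]
    # First pass: reverse index, pattern name -> titles of its referrers.
    referrers = defaultdict(list)
    for pat in patterns:
        for ref in pat.iter(RNG + "ref"):
            referrers[ref.get("name")].append(pattern_name(pat))
    # Second pass: one section per pattern.
    out = []
    for pat in patterns:
        title = pattern_name(pat)
        out += [title, "!" * len(title), ""]
        for note in pat.findall(ANN + "rest"):
            out += [(note.text or "").strip(), ""]
        if referrers.get(title):
            out += ["The pattern is referenced by:", ""]
            # dict.fromkeys drops duplicate referrers but keeps their order.
            out += [f"* {name}_" for name in dict.fromkeys(referrers[title])]
            out.append("")
        out += [".. parsed-literal::", ""]
        for child in pat:
            if child.tag.startswith(RNG):
                out += ["   " + line for line in render(child)]
        out.append("")
    return "\n".join(out)


SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://www.cesnet.cz/ns/rngrest-annotations/1.0">
  <start>
    <a:rest>The root element is `football-team`.</a:rest>
    <element name="football-team">
      <zeroOrMore><ref name="player-content"/></zeroOrMore>
    </element>
  </start>
  <define name="player-content">
    <a:rest>Each player has a role and a name.</a:rest>
    <element name="player">
      <attribute name="role">
        <choice><value>goalkeeper</value><value>striker</value></choice>
      </attribute>
      <element name="name"><text/></element>
    </element>
  </define>
</grammar>"""

rest_text = to_rest(ET.fromstring(SCHEMA))
print(rest_text)
```

The reverse index is built in a separate first pass because a definition's "referenced by" list depends on `ref` elements that may occur anywhere in the schema, including after the definition itself.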