David Madore's WebLog: A note about XML and the DTD annoyance

Entry #0031:

(Monday) · Memorial Day (US)

A note about XML and the DTD annoyance

XML is an interesting (but possibly heavily over-hyped) generic (or “meta”) description language, based on the markup concept (a.k.a. “tags”), that lends itself to a very wide variety of applications, from hypertext to mathematics, through graphics, abstract semantics, synchronized multimedia, user interface description, site syndication and much more. One of its ingenious (or, actually, completely trivial) ideas is the use of namespaces to separate tag vocabularies into semantic units.
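To make the namespace idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document is made up for illustration; the two namespace URIs are the usual ones for XHTML and MathML. ElementTree reports each tag as {namespace-URI}local-name, so the vocabulary an element belongs to can be read straight off its name.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical document mixing two vocabularies in one tree.
doc = f"""<html xmlns="{XHTML}">
  <body>
    <p>Some text</p>
    <math xmlns="{MATHML}"><mi>x</mi></math>
  </body>
</html>"""

root = ET.fromstring(doc)

# Each tag looks like "{uri}local"; collect the distinct namespace URIs.
vocabularies = sorted({el.tag[1:].split("}")[0] for el in root.iter()})

# Local names of the elements that belong to the MathML vocabulary,
# in document order.
mathml_names = [
    el.tag.split("}")[1]
    for el in root.iter()
    if el.tag.startswith("{" + MATHML + "}")
]

print(vocabularies)
print(mathml_names)
```

Nothing here needs a DTD: the separation into vocabularies is carried entirely by the xmlns attributes.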

One of the dubious features of XML, however, is the fact that it maintains compatibility with the older SGML (ISO 8879). For this reason, an XML file typically carries a Document Type Declaration, which contains or includes by reference the document's DTD (Document Type Definition). The unfortunate thing is that (for legacy reasons) this DTD plays a double role: on the one hand it defines validity constraints on the document (such as which tags can contain which), which can be verified by a “validating parser”, and on the other hand it provides genuine semantic content or data. For example, the fact that in HTML the entity &eacute; represents the character “é” (LATIN SMALL LETTER E WITH ACUTE) is determined by the DTD, whereas the fact that the tag <link> denotes a link is governed by the namespace (in the case of XHTML, http://www.w3.org/1999/xhtml). This is an unpleasant bundling of responsibilities, for one would sometimes want to invent an XML application that has its own namespace and also its own set of entities but that does not care about validation (or for which validation is pretty much meaningless).

It is also the DTD that determines whether a given attribute has a default or fixed value, which means that the DTD has to be read not only to validate the document but also to determine entity replacements (part of the actual document content) and attribute values. And the DTD also determines which attribute (normally named id) has type ID and can therefore serve as the target of a fragment identifier.
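The non-validating half of the DTD's job can be seen with a small sketch, again using Python's xml.etree.ElementTree (whose underlying expat parser does not validate). The note element, its lang attribute and the default value "fr" are invented for the example; only the &eacute; entity comes from the discussion above. The internal DTD subset is read purely to obtain the entity replacement and the attribute default, and dropping it makes the very same content unparseable.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with an internal DTD subset.
# No validity constraint is declared or checked here.
WITH_DTD = """<!DOCTYPE note [
  <!ENTITY eacute "&#233;">
  <!ATTLIST note lang CDATA "fr">
]>
<note>entr&eacute;e</note>"""

# Same content, but without any declarations.
WITHOUT_DTD = "<note>entr&eacute;e</note>"

root = ET.fromstring(WITH_DTD)
text = root.text          # entity replaced according to the ENTITY declaration
lang = root.get("lang")   # value supplied by the ATTLIST default

try:
    ET.fromstring(WITHOUT_DTD)
    undefined_entity_rejected = False
except ET.ParseError:
    # Without the declaration, &eacute; is an undefined entity.
    undefined_entity_rejected = True

print(text, lang, undefined_entity_rejected)
```

So even a parser that never validates anything still has to process the declarations to recover the document's actual content.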

This became much more obvious to me while working with XSLT (which, unfortunately, cannot produce an internal DTD subset in the output document) and with XUL or MathML (which, when embedded in HTML, are very difficult to describe through a DTD): DTDs are rather inadequate, yet they are tied up with XML in such a way that it is difficult to dispense with them entirely.
