XML is an interesting (but possibly heavily over-hyped) generic (or “meta”) description language, based on the markup concept (a.k.a. “tags”), that supports a very wide variety of applications, from hypertext to mathematics, through graphics, abstract semantics, synchronized multimedia, user interface description, site syndication and much more. One of its ingenious (or, actually, completely trivial) ideas is the use of namespaces to separate tag vocabularies into semantic units.
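To make the namespace idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The second namespace URI (http://example.org/ns/chains) and its link element are hypothetical, invented only to show two vocabularies sharing a tag name without colliding.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
# Hypothetical second vocabulary, made up for this example.
CHAINS = "http://example.org/ns/chains"

DOC = f"""<root xmlns:h="{XHTML}" xmlns:c="{CHAINS}">
  <h:link rel="stylesheet" href="style.css"/>
  <c:link length="3"/>
</root>"""

root = ET.fromstring(DOC)

# ElementTree reports each tag as "{namespace-uri}local-name", so the two
# "link" elements remain distinct even though their local names are equal.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

The prefixes h and c are arbitrary: what identifies each vocabulary is the URI they are bound to, not the prefix.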
One of the dubious features of XML, however, is that it maintains compatibility with the older SGML (ISO 8879). For this reason, an XML file typically contains, or includes by reference, a Document Type Definition (DTD). The unfortunate thing is that (for legacy reasons) this DTD plays a double role: on the one hand it defines validity constraints on the document (such as which tags can contain which), which can be verified by a “validating parser”, and on the other hand it provides genuine semantic content and data. For example, the fact that in HTML the entity &eacute; represents the character “é” (LATIN SMALL LETTER E WITH ACUTE) is determined by the DTD, whereas the fact that the tag <link> determines a link is governed by the namespace (in the case of XHTML, http://www.w3.org/1999/xhtml). This is an unpleasant grouping of responsibilities, for one would sometimes want to invent an XML application that has its own namespace and also its own set of entities but that does not care about validation (or for which validation is pretty much meaningless). Also, it is the DTD which determines whether a given attribute has a default or fixed value, which means that the DTD has to be read not only to validate the document but also to determine entity replacements (part of the actual document content) and attribute values. And the DTD also determines which attribute (normally named id) is of type ID, that is, the one a fragment identifier refers to.
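The double role described above can be observed with a non-validating parser. The following is a minimal sketch using Python's standard xml.parsers.expat module. The vocabulary (a note element with lang and key attributes) is hypothetical and made up for the illustration. No validation takes place, yet the internal DTD subset still supplies an entity replacement, a default attribute value, and the information about which attribute is of type ID.

```python
from xml.parsers import expat

# Hypothetical vocabulary: "note", "lang" and "key" are made up for this sketch.
WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY eacute "&#233;">
  <!ATTLIST note lang CDATA "en"
                 key  ID    #IMPLIED>
]>
<note key="n1">caf&eacute;</note>
"""

# The same element with the DTD stripped away.
WITHOUT_DTD = '<note key="n1">caf&eacute;</note>'


def parse(document):
    """Parse without validating; return (root attributes, text, ID attribute names)."""
    result = {"attrs": None, "text": [], "id_attrs": {}}

    def on_start(name, attrs):
        if result["attrs"] is None:  # keep only the root element's attributes
            result["attrs"] = dict(attrs)

    def on_text(data):
        result["text"].append(data)  # expat may deliver text in several chunks

    def on_attlist(elname, attname, att_type, default, required):
        if att_type == "ID":  # record which attribute the DTD declares as an ID
            result["id_attrs"][elname] = attname

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(document, True)
    return result["attrs"], "".join(result["text"]), result["id_attrs"]


attrs, text, id_attrs = parse(WITH_DTD)
print(attrs, text, id_attrs)

# Without the DTD, the entity reference cannot be resolved at all.
try:
    parse(WITHOUT_DTD)
    bare_error = None
except expat.ExpatError as exc:
    bare_error = str(exc)
print(bare_error)
```

With the DTD present, the parser should report the lang attribute even though the document never writes it, and the text should come out with the entity already replaced. Without the DTD, parsing fails on the undeclared entity: the DTD here is part of the document's content, not merely a schema to check it against.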
Working with XSLT (which cannot produce an internal DTD subset in the output document, unfortunately), and with XUL or MathML (which, embedded in HTML, are very difficult to describe through a DTD), has made it much more obvious to me that DTDs are rather inadequate, but that they are tied up with XML in such a way that it is difficult to dispense with them entirely.