I'm slowly beginning to grasp the concept of metadata, so I thought I'd say a few words about it. In truth, the difficulty is not with the concept itself (there's nothing so complicated about it: metadata are just ancillary data that are not part of a document's content but describe the document itself), but with the strange and fascinating architecture that various organizations, notably the World Wide Web Consortium, have built around it, and with that bizarre language, RDF. RDF is a language whose specs and guide one can easily read (and that is what happened to me) and still not have the slightest idea of what it's all about: it seems at once completely abstract and devoid of utility. (As a mathematician, I really shouldn't have any problem with notions that are abstract and devoid of utility, yet…) Yet RDF is not a content-free language, and although claims that it is the universal “semantic” language (whatever that may mean) are pompous and not very meaningful, it is an interesting idea.
The most basic use of metadata would be, say, in an HTML document: to indicate a list of keywords associated with the document, one might write
<meta name="Keywords" content="foo, bar, baz, qux" />
for example. Or, to indicate who wrote the document,
<meta name="Creator" content="Doe, John" />
might be used. But who decides which names like “Keywords” and “Creator” are available? It could be, of course, a simple de facto list, with various names understood by various kinds of potential users. But there is a more formal aspect: metadata vocabularies can be defined and collected in so-called “profiles”. In HTML, the profile attribute of the head element specifies the metadata vocabulary profile that is used.
One such profile is the Dublin Core element set, which specifies a small list of basic (“core”) metadata properties. The formal description of the Dublin Core namespace is an RDF file located at http://purl.org/dc/elements/1.1/, so one appropriate way to specify the keywords and creator for the document (Dublin Core has no “keywords” element; keywords belong under “subject”) might be
<head profile="http://purl.org/dc/elements/1.1/">
<meta name="Subject" content="foo, bar, baz, qux" />
<meta name="Creator" content="Doe, John" />
</head>
Another way, which is recommended by RFC 2731, consists of using the <link> element as follows:
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.Subject" content="foo, bar, baz, qux" />
<meta name="DC.Creator" content="Doe, John" />
</head>
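To make the <link rel="schema.DC"> convention a little more concrete, here is a minimal Python sketch of how a program might read such a header and turn each prefixed meta name into a full property URI. The class and function names (MetaCollector, qualified_metadata) are mine and purely illustrative, and lowercasing the element name before appending it to the vocabulary URI is an assumption on my part (the Dublin Core 1.1 element names are lowercase), not something RFC 2731 prescribes in this form.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <link rel="schema.X"> declarations and <meta name/content> pairs."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> vocabulary URI, e.g. "DC" -> ".../1.1/"
        self.meta = []     # (name, content) pairs in document order

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        attrs = dict(attrs)
        rel = attrs.get("rel") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            prefix = rel.split(".", 1)[1]
            self.schemas[prefix] = attrs.get("href") or ""
        elif tag == "meta" and attrs.get("name"):
            self.meta.append((attrs["name"], attrs.get("content") or ""))


def qualified_metadata(html_text):
    """Return (property URI, value) pairs for meta names with a declared prefix."""
    collector = MetaCollector()
    collector.feed(html_text)
    result = []
    for name, content in collector.meta:
        prefix, _, element = name.partition(".")
        if element and prefix in collector.schemas:
            # Assumption: element names are lowercased to match the vocabulary.
            result.append((collector.schemas[prefix] + element.lower(), content))
    return result


# Sample header; DC.Subject is the Dublin Core element used for keywords.
SAMPLE = """<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.Subject" content="foo, bar, baz, qux" />
<meta name="DC.Creator" content="Doe, John" />
</head>"""

if __name__ == "__main__":
    for prop, value in qualified_metadata(SAMPLE):
        print(prop, "=", value)
```

Meta names without a declared prefix are simply skipped here; a real consumer would probably want to fall back on the profile attribute or on a de facto list.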
To exploit metadata to their full power, and to describe them conveniently, however, the RDF language is necessary. For example, to write the same metainformation as above in RDF, one would write, if I am not mistaken (and with the keywords under dc:subject, since the element set defines no dc:keywords),
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.somedomain.tld/some/uri/"
    dc:subject="foo, bar, baz, qux"
    dc:creator="Doe, John" />
</rdf:RDF>
This formally states that John Doe is the creator of http://www.somedomain.tld/some/uri/ which has keywords foo, bar, baz and qux. But RDF goes well beyond that: it is capable, for example, of making metastatements about the metadata themselves (as in “Jane Smith says that John Doe is the author of http://www.somedomain.tld/some/uri/”), or of defining the vocabulary that it uses (to some extent, of course; at some point it becomes necessary to express things in a natural language). Thus, the Dublin Core RDF vocabulary is itself expressed in RDF.
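What RDF “formally states” is easier to see once the XML is flattened into (subject, predicate, object) triples. The following Python sketch, using only the standard library, does this for the abbreviated attribute form of rdf:Description and then builds a metastatement of the “Jane Smith says…” kind with RDF's reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). It is far from a real RDF parser. The function names, the blank-node label "_:s1" and the property http://example.org/terms/assertedBy are hypothetical, invented for the illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sample description; dc:subject is the Dublin Core property for keywords.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.somedomain.tld/some/uri/"
      dc:subject="foo, bar, baz, qux"
      dc:creator="Doe, John" />
</rdf:RDF>"""


def description_triples(rdf_xml):
    """Return (subject, predicate, object) triples for property attributes
    found on rdf:Description elements (abbreviated syntax only)."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for attr, value in desc.attrib.items():
            # ElementTree spells namespaced names as "{namespace-uri}local".
            if not attr.startswith("{"):
                continue
            namespace, local = attr[1:].split("}", 1)
            if namespace == RDF_NS:
                continue  # rdf:about is syntax, not a property
            triples.append((subject, namespace + local, value))
    return triples


def reify(statement, statement_id):
    """Describe a triple as a resource, so that other triples can talk about it."""
    subject, predicate, obj = statement
    return [
        (statement_id, RDF_NS + "type", RDF_NS + "Statement"),
        (statement_id, RDF_NS + "subject", subject),
        (statement_id, RDF_NS + "predicate", predicate),
        (statement_id, RDF_NS + "object", obj),
    ]


if __name__ == "__main__":
    triples = description_triples(RDF_XML)
    creator = next(t for t in triples if t[1].endswith("creator"))
    # Hypothetical property: "this statement is asserted by Jane Smith".
    meta = reify(creator, "_:s1") + [
        ("_:s1", "http://example.org/terms/assertedBy", "Smith, Jane"),
    ]
    for triple in triples + meta:
        print(triple)
```

The point of the reification step is that the statement about John Doe becomes a thing with a name of its own ("_:s1" here), so that Jane Smith's claim is just one more triple whose subject is that statement.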
A strange and fascinating architecture, but beautiful in its way! I guess I should slowly start attaching some correctly labeled metadata to these pages.