Why I don't like RDF

Clearly a rant, please treat as such. Still undergoing occasional development.

I've written before about What's right with RDF, but I have to admit that overall I'm deeply unhappy with RDF on a number of different levels. While I think the kinds of things that RDF does are useful at times, I think RDF gets a number of things painfully wrong. Not only does RDF get them wrong, but its mistakes spill over into other technologies as well, XML in particular.

There are two main areas where I think RDF's creators have made deeply unfortunate decisions. The model at the heart of RDF - directed graphs composed of connected nodes - is a great model for many kinds of projects. The means by which these graphs are expressed, however, is a disaster on at least two different levels:

  1. RDF uses Uniform Resource Identifiers (URIs) at the heart of its nodes, leading to infinite philosophical and practical confusions, as well as difficulty in reading, even given tools.
  2. RDF is generally serialized as XML, causing problems for both RDF and XML.

The origins of URIs, in the immediately useful URL, seemed pretty harmless. Even if people didn't know about the caching and processing mechanisms between their browser and a Web server, the metaphors all worked well enough. Locations, places to find things, going somewhere to get something - clear, friendly, comprehensible.

Unfortunately, as URLs combined with URNs, these metaphors were tossed aside in favor of the seemingly clearer work of "identification". URIs identify resources, which are sort of whatever you want or I want or whatever URI owners want if they can be bothered to communicate it. The understandings of the division of labor between server and browser regarding things like fragment identifiers have been tossed out in favor of nonsense about magic hashes (#) signalling a difference between representation and abstract ideas. (There are people who will tell you this is all fine and good, of course, but they all seem to be coming from RDF.) We can take http://example.com# and claim that "the RDF interpretation of a fragment identifier allows it to indicate a thing that is entirely external to the document, or even to the 'shared information space' known as the Web. That is, it can be an abstract idea, like Graham Klyne's car or a mythical Unicorn." Right.
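For what it's worth, the older division of labor is easy to show. What follows is only a minimal sketch, assuming Python's standard urllib.parse module and a URI I made up for the purpose (http://example.com/cars#klyne is hypothetical, as is the helper name split_for_request). It splits a URI into the part a browser would ask the server for and the part it keeps to itself:

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_for_request(uri: str) -> Tuple[str, str]:
    """Split a URI into (what is requested from the server, the fragment).

    In ordinary Web practice the fragment is not sent in the HTTP
    request; the client applies it to whatever comes back.
    """
    url, fragment = urldefrag(uri)
    return url, fragment


# Hypothetical URI, invented for illustration only.
request_target, client_side = split_for_request("http://example.com/cars#klyne")
print(request_target)  # the part the server is asked about
print(client_side)     # the part left to the client
```

On that reading the server never hears about the fragment at all, which is what makes the claim that it can identify a car or a unicorn feel like such a leap.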

While such delusions would probably be fine if they were kept in the philosophical asylums where they belong, they tend to creep out into public view as supposedly important deliberations. For some as yet undetermined reason, the "philosophical engineers" running this show seem to think that calling their signifiers "URIs" will let them avoid all the problems inherent in those signifiers called "words".

Even if we can ignore the bizarrely philosophical pretensions of URIs, RDF documents tend to be enormous compilations of URIs. People can keep track of these things in small quantities, perhaps, but programs are the only hope once more than a few of them appear. Making them work more generally is difficult given that their connection to actual meaning is vague (the notorious "social meaning" problem), their nature is opaque, and their syntax makes XML's angle-brackets look friendly. It's hard to see this as a great advantage.

Looking beyond the URIs to how they're used, RDF developers talk a lot more about querying and combining graphs than about transforming or translating them. While XML to XML transformations, typically using XSLT, are pretty commonly used to convert between various semantic forms, the RDF community seems far more excited about the notion of standardizing vocabularies through layers of schemas, ontologies, and agreement. The XML world has frequently deluded itself into thinking that agreement-by-committee can solve its communications problems, but at the same time it has hedged its bets with transformation practices that let developers get from vocabulary A to vocabulary B just in case the big-picture designs aren't quite what was needed.

RDF's graph nature also conflicts with the tree structures of the XML in which it is serialized on a regular basis. This has two consequences, which tend to appear at different times. The first is that serializations of arbitrary RDF are enormously difficult to process or interpret using XML tools. The second, perhaps more perverse, is that RDF has managed to inflict its assumptions on XML enough (largely through namespaces) that there are periodic efforts to convince XML users to put themselves in the RDF straitjacket. Rather than merely accepting the constraints of trees, these people suggest, developers would be wise to subject their content to the constraints of trees and graphs simultaneously.
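The first problem can be sketched in a few lines. This is an illustration under loud assumptions: the triples are invented, and the two XML shapes (a flat list of statement elements and a nesting by subject) are simplified stand-ins of my own, not actual RDF/XML. The point is only that one graph can land in very different trees, so an XML tool looking at element structure sees two unrelated documents:

```python
import xml.etree.ElementTree as ET

# Hypothetical (subject, predicate, object) triples; all names are invented.
TRIPLES = [
    ("doc", "author", "simon"),
    ("doc", "topic", "rdf"),
    ("simon", "dislikes", "rdf"),
]


def flat(triples):
    """Serialize as one <statement> element per triple, with no nesting."""
    root = ET.Element("graph")
    for s, p, o in triples:
        ET.SubElement(root, "statement", subject=s, predicate=p, object=o)
    return root


def nested(triples):
    """Serialize as one <node> per subject, with a child element per predicate."""
    root = ET.Element("graph")
    nodes = {}
    for s, p, o in triples:
        if s not in nodes:
            nodes[s] = ET.SubElement(root, "node", id=s)
        ET.SubElement(nodes[s], p, ref=o)
    return root


def triples_of_flat(root):
    """Recover the triple set from the flat shape."""
    return {(e.get("subject"), e.get("predicate"), e.get("object")) for e in root}


def triples_of_nested(root):
    """Recover the triple set from the nested shape."""
    return {(n.get("id"), child.tag, child.get("ref")) for n in root for child in n}


a, b = flat(TRIPLES), nested(TRIPLES)
print(ET.tostring(a, encoding="unicode"))
print(ET.tostring(b, encoding="unicode"))
# Same graph either way, even though the two trees share almost no structure.
print(triples_of_flat(a) == triples_of_nested(b))
```

Note that "rdf" is the object of two statements, so in either tree it can only appear as a repeated reference, never as a single element with two parents. Each shape also needs its own reader to get the graph back, which is roughly the position a generic XML tool is in when handed arbitrary RDF.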

RDF is useful stuff, once you ignore the biohazard signs and rebuild your brain to process URIs instead of words. It's very sad that this technology emerged from the same institution that key SGML community members chose to be the home of their XML project - sad both for RDF, trapped in an inappropriate serialization format, and for XML, which now carries an enormous burden of extra baggage imposed on it by its supposedly helpful distant relative.

And then there's the incredible naivete of thinking first-order logic is a great way to describe the world, but why go there tonight?

Copyright 2003 Simon St.Laurent