Working Draft - 10 May 2000
Editor's Note: This is only a draft. All comments, suggestions, and contributions are welcome and will be credited. I'd like this to be as open a process as possible, including discussion on xml-dev, the primary XML development mailing list. This document has no official standing with any standards body or process. Many thanks to Nathan Kurz, Bill la Forge, Gabe Beged-Dov, Gavin Thomas Nicol, Mike Daconta, Elliotte Rusty Harold, Guy Murphy, John J. Barton, Ed Nixon, Ian Hickson, and Chris Lilley for comments. (The existence of those comments is certainly not an endorsement, but all of the discussions have helped.)
1.0 - Improving XML Document Processing
1.1 - XML 1.0 and Resource Usage
1.2 - The Goals of XPDL 1.0
1.3 - Relationship to Other Standards
2.0 - Creating an XML Processing Description
2.1 - XPD Structure
2.2 - Class Information
2.3 - Content Information
2.4 - Style Information
2.5 - XML Profile Information
2.6 - Extension Information
3.0 - Linking XML Processing Descriptions to XML Documents
3.1 - HTTP Header
3.2 - Processing Instruction
3.3 - Attribute with Namespace
3.4 - XLink
3.5 - Application Decision
3.6 - Inline with document
3.7 - Namespace URI
4.0 - Conformance
Appendix A - DTD for XPDs
Appendix B - XPD for XPDs
Appendix C - References
XML Processing Description Language (XPDL) simplifies the management of document sets and makes document processing more reliable. By creating descriptions for classes of documents, rather than relying on documents to link to processing descriptions themselves, XPDL makes it possible to move beyond the monolithic model presented by DTDs today and to add new resources, like schemas, style sheets, and processing information to the concept of a document class.
There are two deeply intertwined problems with the XML 1.0 spec facing document managers and developers of XML tools. Management difficulties and lack of reliability are the twin demons that promise to keep XML in the basement for any kind of sophisticated document or data management application. Applications that want to perform sophisticated document management are forced to create their own tools for doing so, and the lack of reliability issue is just plain hard to get around.
Document managers face a specification that was designed so that documents could control their own destinies. XML 1.0 by itself describes containers for information, and only a very basic (though complex) set of rules for processing them. XML 1.0 permitted documents to identify the resources (DTDs) that provide them with some kind of structural discipline, but the approach that XML 1.0 takes (the DOCTYPE declaration) isn't capable of supporting alternate tools for validation (schemas) or for including information about other configurable aspects of XML processing (like style sheets, or the ContextML boot process, or anything else). As a result, the W3C and others have taken to piling processing instructions in the prolog to provide applications the information they need to use style sheets, schemas, and other XML 'accessories'.
These PIs in the prolog raise two problems. The first affects those managing a large collection of documents, where keeping all of those declarations in sync is difficult. When new versions of DTDs, schemas, and style sheets appear, updating the collection (and ensuring compatibility) can be a huge task in itself. Because the PIs are spread across the entire collection of documents, updating the collection can become an enormous search-and-replace operation. Automating this, even with handy tools like Perl, is a pain in the neck. The second problem affects document developers, who have to process the prolog and figure out what to do with all of these declarations, if in fact they care. Because all of this information lives outside the XML document structure itself, creating applications that process it intelligently can require the segmentation of programs and additional on-the-fly configuration. It's not too difficult to do this with a DOM, but smaller programs based on SAX will have a difficult time.
The reliability problem has a number of faces. Documents have both too much power to identify their structures and too little power to declare how they should be processed. The existence of an internal subset - which XML processors are required to parse to earn the title of 'XML processor' - is an opportunity for document authors to create structures that no longer conform to the expectations built into the applications that process them. As a result, application developers must build much stronger code that checks for such problems. It may seem like a minor task, but repeated thousands or millions of times, it can become a significant hassle. If left undone, it may expose critical applications to surprising (and exploitable) failures.
Worse still, documents that rely on XML features like attribute defaulting and which use an external DTD subset (much easier to manage and standardize) may arrive missing features when parsed by a nonvalidating parser. Nonvalidating parsers are not required to load external resources, leading to potential missing parts. For applications that rely heavily on attribute defaulting (like XLink), this can be a disaster. Unfortunately, the document has no way to insist that a parser load its external resources. The 'standalone' declaration in the XML declaration seems like it should do that, but it doesn't, and attribute defaults will disappear silently. Specs can require validation at every turn, but this is both difficult to enforce in interchange situations and an unnecessary waste of processing time if all that was really wanted was default attributes.
Solving these two problems seems relatively simple: centralize the description of the resources documents (now treated as document classes) can use, and provide stricter rules for ensuring that they are indeed used if required. This requires a philosophical shift from every-document-for-itself to documents-as-classes, but the latter view is capable of supporting the former (with classes pertaining to individual documents) if necessary.
By standardizing the tools for describing such document classes, and making the rules inside that standard less open to variable interpretation, we can solve the two problems described above while also creating a space capable of supporting other important aspects of XML document processing. Documentation for the class can finally be provided in a way that is meaningful to humans (titles and specs) and computers (MIME types, for example). Room for other types of schemas (XML-Data, DCD, DDML, SOX, etc.) can be added, as well as space for other tools which provide similar functionality (like ContextML's facilities for attribute defaulting). Finally, by making the standard extensible, we give XPDL the opportunity to serve as a resource file for applications, providing them with a roadmap (perhaps using ContextML) for processing these document types.
XPDL proposes to be such a standard, supporting the creation and processing of document classes. It provides for documentation, constraints checking, attribute defaulting, entity replacement, style specification, and application extensions. XPDL files are themselves written in XML, simplifying processing, and may be used in conjunction with XML or RDF processing. Documents can return to their primary role as storehouses of information and let XPDs handle the structural and processing descriptions.
Note: The first Working Draft of the W3C's XML Schema specification includes approximately the same (though extended) feature set as XML 1.0 DTDs. Constraints, attribute defaulting, and entity and notation declaration are still combined within a single set of declarations. The rules for schema processing and connecting schemas to documents are not yet clear; however, XPDL should work as well managing XML Schema processing as with XML DTD processing. If a processor can handle both XML DTDs and XML Schemas, both types of declarations could be used for different aspects of the same document.
XPDL builds on three key standards and is related in some way to various other standards. Most importantly, XPDL is built using XML 1.0 but is a replacement for a particular mechanism (the DOCTYPE declaration) within XML 1.0. XPDL uses the XLink standard to reference and describe resources. (This will be made more explicit as the XLink standard develops.) XPDL is also built with RDF in mind. XPDL will be described through an RDF schema (in addition to an XML 1.0 DTD) and can be used to connect documents with RDF schema information.
XPDL also interacts with a number of other XML standards, replacing the processing instructions used to connect style sheets to documents. It may eventually provide a means of connecting scripts to documents and provides an extension area in which developers may connect documents to other standards.
XPDL has also been influenced by the Common XML Activity of the SML-DEV mailing list.
XML Processing Descriptions (XPDs) are XML files that provide resources for XML document classes. XPDs include general information describing the document class, as well as constraints that limit membership in the class, resources that can be used to add content to members of the class, style information, and an extensible area that can be used to provide more detailed information regarding processing.
Note: At present, XPDs are described with an XML 1.0 DTD, though XPD validation is not required. In the long run, this will probably be supplemented with an XML Schema and an RDF schema describing a similar XML document.
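XPDs contain five general sections describing different aspects of the document class: class information, content information, style information, XML profile information, and extension information.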
Possibilities: Supporting scripting libraries and link directories directly, or shifting to a more generic framework for resources.
Possibility: Providing an inheritance mechanism to permit XPDs to add to (and subtract from) other XPDs.
The root element of an xpd document must be xpd, optionally with a version attribute describing the version of XPDL in use, as shown below:
<!ELEMENT xpd (class?, content?, styles?, profile?, extension?)>
<!ATTLIST xpd
  version CDATA "wd01092000"
  xmlns   CDATA "http://purl.oclc.org/NET/xpdl/">
For example, the xpd element for XPDL would look like:
<xpd>
  <class...>
  <content...>
  <styles...>
  <profile...>
  <extension...>
</xpd>
The class element contains information about the document class as a whole, not about the documents that are its members. The class element may provide identifiers for referencing this document class (both human- and machine-readable), as well as information about the owner of the document class and a reference to complete documentation.
<!ELEMENT class (owner?, description?)>
<!ATTLIST class
  classID   CDATA #IMPLIED
  MIMEtype  CDATA "application/xml"
  className CDATA #IMPLIED
  version   CDATA #IMPLIED>
<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner href CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description href CDATA #IMPLIED>
The attributes of the class element provide information that may be used to integrate processing of this class with generic frameworks. The classID attribute should contain a unique identifier for this class. (Note that no infrastructure for such an identifier is provided; URIs may be used.) The MIMEtype attribute describes the MIME content-type label used for documents that are members of this class. By default, this will be application/xml, but other values may be used. The className attribute provides a human-readable identifier for the class that may be used in menus and other UI contexts. Finally, the version attribute allows classes to identify which version of the class this document represents.
The owner and description elements provide space, respectively, for a brief description of this document type's creators or maintainers and for a human-readable description of the class itself. The href attribute on either element, if provided, should reference more detailed information.
A class element that contained a description of the XPD for XPDL itself might look like:
<class classID="http://purl.oclc.org/NET/xpdl/v1"
       MIMEtype="application/x-xpdl"
       className="XPD"
       version="1.0">
  <owner href="http://www.simonstl.com/">Simon St.Laurent, initial editor</owner>
  <description href="http://purl.oclc.org/NET/xpdl">XML Processing Description Language (XPDL)</description>
</class>
All of this information describes the class, but none of it is necessary for purely automated processing of members of the document class.
The content element provides the information that was stored in the XML 1.0 document type declaration, though it may be broken down more finely. (No provision is made for an internal subset either.) The content element has the following content model:
<!ELEMENT content (all* | (constraints*, attributes*, entities*, notations*) )>
The all element may be used to simplify compatibility with XML document type declarations, and is the equivalent of having a set of constraints, attributes, entities, and notations elements with identical attribute values. All of these elements are EMPTY, and all have the same set of attributes (except for all and constraints, which have one additional attribute):
<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>
<!ELEMENT notations EMPTY>
<!ENTITY % schema '
  href     CDATA #REQUIRED
  public   CDATA #IMPLIED
  type     CDATA "Application/xml-dtd"
  required (yes | no) "yes"
  internal (yes | no) "no"
'>
<!ATTLIST all %schema; root NMTOKENS #IMPLIED>
<!ATTLIST constraints %schema; root NMTOKENS #IMPLIED>
<!ATTLIST attributes %schema;>
<!ATTLIST entities %schema;>
<!ATTLIST notations %schema;>
If the all element is used, the document referenced by the href will be used for constraints (structural validation in the case of DTDs, possibly more for schemas), attribute defaulting, entity processing, and notations, reproducing the set of capabilities provided through references to an external DTD subset in XML 1.0. If all is used, the document must meet the full set of constraints specified in the document referenced. Situations where constraints should not be checked should use the attributes, entities, and notations elements instead. In all cases, the public attribute provides an optional public identifier that may be used to retrieve local copies of the information. This should be treated as PUBLIC is treated in XML 1.0 - if the processor understands the public identifier, it may use it, but if it doesn't, it should fall back on the URI in the href attribute.
If, instead of all, the other set of choices is used, one document may be referenced for constraints, another for the attribute defaulting, another for entity processing, and yet another for notations. (There may be overlap - the same document could be referenced for constraints testing and attribute defaulting, while another is referenced for entity processing and notations, for example.)
If the root attribute is present, on either the all or constraints element, the name of the root element of the document must be one of the name tokens inside that attribute value for the document to meet the constraints. If the root attribute is not present, then no constraints apply to the root element of the document.
Possibility: Adding a rootNS attribute for prefix-independent namespace support.
In all cases, the required attribute indicates whether or not the processor is required to retrieve the resources. The default is yes - XPDL is intended to promote as complete a rendition of XML as possible - but a no value may be appropriate in certain situations. Attribute defaulting and entity processing might not be necessary for a document that has already had this processing performed, but constraints checking may still be valuable, for instance. In other cases, attribute defaulting and entity processing may be useful, but constraints checking may be left out entirely.
The type attribute is intended to provide a means of communicating the kind of resource on the other end of the href. As schemas and other tools for describing XML documents become available in addition to XML 1.0 DTDs, processors will need a way to decide if they can process this information. MIME content-type identifiers should be used for these values. (At present, XML 1.0 DTDs, RDF Schemas, and the experimental W3C XML Schemas, XML-Data, DCD, SOX, DDML, RELAX, and DSD proposals are the primary contenders for these descriptions.)
In all cases, if a document fails to meet the constraints imposed by the document referenced by the constraints or all element's href attribute, the processor must report that an error has occurred to the application and that the document is not in fact a member of this document class. (Even if required is set to 'no', if the processor loads the constraints, it must make this report.) Also, if the processor finds errors in the files containing the constraint, attribute defaulting, or entity processing information, or simply cannot understand their format, it is required to report an error to the application.
The 'internal' attribute, which applies to all of these elements, allows the XPD to specify whether or not the internal subset of a document may be used to supplement the resources identified by the XPD. If its value is yes, then the internal subset may be processed in addition to the resources described by the XPD. (This should probably only be set to yes when the schema resources are themselves DTDs, and maximum XML 1.0 compatibility is desired.)
The content element for XPDL might look like:
<content>
  <all href="http://www.simonstl.com/projects/xpdl/xpdl1.dtd"
       public="-//SIMONSTL//DTD XPDL//EN"
       type="Application/xml-dtd"
       required="no"
       internal="no"
       root="xpd"/>
</content>
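As a rough illustration of the finer-grained alternative to the all element, a content element might point to separate resources for constraints, attribute defaulting, and entity processing. This is only a sketch: the example.com URLs and the invoice and receipt root names are invented for illustration and are not part of XPDL.

```xml
<!-- Hypothetical document class; all URLs and element names are assumptions. -->
<content>
  <!-- Structural validation; the root element must be invoice or receipt. -->
  <constraints href="http://example.com/schemas/invoice.dtd"
               type="Application/xml-dtd"
               required="yes"
               root="invoice receipt"/>
  <!-- Attribute defaults come from a second resource. -->
  <attributes href="http://example.com/schemas/invoice-defaults.dtd"
              type="Application/xml-dtd"
              required="yes"/>
  <!-- Entity declarations need not be retrieved. -->
  <entities href="http://example.com/schemas/invoice-entities.dtd"
            type="Application/xml-dtd"
            required="no"/>
</content>
```

Under these assumptions, a processor would have to load the constraints and attribute-defaulting resources, while retrieval of the entity declarations is left optional.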
XPDs can also convey style information. Indeed, some XPDs may only carry style information, acting as a style management tool that centralizes the process of specifying which styles are used by a document. The style mechanism looks much like the processing instruction syntax used to connect style sheets to documents that is provided by the W3C, though it uses elements rather than processing instructions to provide style information. The stylegroup element provides a way to create hierarchical sets of stylesheets which may be applied based on user or application choice.
Any number of style elements may appear within the styles element, and their attributes provide the same information as the W3C processing instruction's pseudo-attributes, including a title attribute that applications can use to present users with their choice of stylesheets:
<!ELEMENT styles (style | stylegroup)*>
<!ELEMENT style EMPTY>
<!ATTLIST style
  href      CDATA #REQUIRED
  type      CDATA #REQUIRED
  title     CDATA #IMPLIED
  media     CDATA #IMPLIED
  charset   CDATA #IMPLIED
  alternate (yes | no) "no">
<!ELEMENT stylegroup (style | stylegroup)*>
<!ATTLIST stylegroup
  title CDATA #IMPLIED
  type  CDATA #REQUIRED>
Styles referenced in the XPD should be considered to occur before any stylesheets referenced in the document itself, though ideally such references will be moved to the XPD and left out of the document. Stylesheets of different types should not be mixed within a single style group. Applications should discard style elements within a style group whose type does not match that of the parent stylegroup element.
XPD processing applications should provide support for users to choose among various style sheets, though they are not required to provide support for style sheets in formats they don't understand. (CSS-only applications can't support XSL, and vice-versa; robust XPDs may provide alternatives for both.)
The styles element for XPD might look like:
<styles>
  <style href="http://www.simonstl.com/projects/xpdl/xpdl1.css"
         type="text/css"
         title="XPDL Default Style Sheet" />
</styles>
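The stylegroup element is not used in the XPD for XPDL, so a sketch of how it might be applied may help. The example below is hypothetical: the example.com URLs, the titles, and the text/xsl type value are assumptions made for illustration. Each group holds style sheets of a single type, matching the type declared on the group.

```xml
<!-- Hypothetical style sheets; URLs, titles, and the text/xsl label are assumptions. -->
<styles>
  <!-- CSS alternatives offered to the user as one group. -->
  <stylegroup title="CSS presentations" type="text/css">
    <style href="http://example.com/styles/plain.css"
           type="text/css" title="Plain" media="screen"/>
    <style href="http://example.com/styles/large.css"
           type="text/css" title="Large print" media="screen"
           alternate="yes"/>
  </stylegroup>
  <!-- A separate group for processors that understand XSL instead. -->
  <stylegroup title="XSL presentations" type="text/xsl">
    <style href="http://example.com/styles/report.xsl"
           type="text/xsl" title="Report"/>
  </stylegroup>
</styles>
```

A CSS-only application could present the first group and skip the second, while an XSL-capable application could do the reverse.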
XPDL allows developers to describe which parts of XML may be used within a document class and whether the document class is dependent on other components of the XML family tree, like XLink or XPointer.
<!ELEMENT profile EMPTY>
<!ATTLIST profile
  fragmentIdentifier CDATA "http://www.w3.org/TR/WD-xptr"
  linking            CDATA "http://www.w3.org/TR/xlink"
  xinclude           (yes | no) "no"
  xbase              (yes | no) "no"
  namespaces         (yes | no) "yes"
  namespacesonroot   (yes | no) "no"
  attinheritelemns   (yes | no) "yes">
Documents may identify whether they use the XML Working Group's XPointer and XLink tools for representing fragment identifiers and identifying links. Applications that use other schemas should use identifiers that uniquely identify the specification (preferably URIs) for these values. (These default values will be updated if and when these specs become W3C recommendations.)
Similarly, the xinclude and xbase attributes allow document classes to specify whether the W3C's XInclude tool for including content and XBase tool for setting a base URL for relative URIs need to be supported by processors.
The namespaces attribute allows XPDs to state whether namespace processing should be performed on documents of this class. The namespacesonroot attribute allows XPDs to state whether all namespace declarations should be made in the root element, simplifying processing considerably for documents where this is appropriate.
The attinheritelemns attribute identifies whether or not unprefixed attributes should inherit the namespace of the element that contains them - if namespaces are supported and if that element has a namespace. This addresses uncertainty left in namespace processing by the Namespaces in XML recommendation, where default namespaces are not held to apply directly to attributes. If attinheritelemns is set to 'yes', unprefixed attributes will be assumed to be in the same namespace as the element containing them.
The profile element for XPD would look like:
<profile />
XPD uses the default fragment identifier, linking, and namespace options.
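For contrast, a document class that does not rely on the defaults might declare its requirements explicitly. The following is a sketch for an imagined class, not a recommendation. It assumes a class whose documents use XInclude and XBase, keep all namespace declarations on the root element, and treat unprefixed attributes as having no namespace.

```xml
<!-- Hypothetical profile for an imagined document class. -->
<profile xinclude="yes"
         xbase="yes"
         namespaces="yes"
         namespacesonroot="yes"
         attinheritelemns="no"/>
```

The fragmentIdentifier and linking attributes are omitted here, so their default values would apply.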
XPDL provides no semantics for content within the extension area, and processors are not required to do anything with that content. This area may prove useful to applications (like those using MDSAX) that support configurable processing, and may be used according to additional specifications written by the developers of those tools.
<!ELEMENT extension ANY>
XPD itself doesn't use any extensions. The extension element may appear as an empty element or be omitted.
Several mechanisms for linking XML processing descriptions are available, and at this draft stage, it isn't clear which is most appropriate. The mechanisms described below are possibilities at this point. One thing is clear, however: applications using an XPD for content processing should ignore DOCTYPE declarations and all of their contents in XML 1.0 documents to avoid duplicate (and possibly conflicting) processing.
Possibility: Explicit switches.
When available, an HTTP header (like Link) could be used to prescribe an XPD for a particular document, much as the Content-type header provides MIME types at present. These headers appear to have been removed from HTTP 1.1, however. Creating additional headers outside the standards process may be possible for particular situations, but is difficult to propose as a general case.
A simple processing instruction with the target 'xpd' may be used to connect a document to an XPD. Multiple xpd processing instructions may connect a document to multiple XPDs, though the document will have to meet the constraints of all of them. (The first XPD declared has priority over all others if there are conflicting attribute defaulting or entity processing declarations.) The syntax for the PI uses a target and a single pseudo-attribute, href.
All xpd processing instructions must appear in the prolog, before the document type declaration (which they replace) and the root element.
Possibility: required pseudo-attribute for cases where XPD is informative but not necessary.
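The processing instruction described above might be sketched as follows. The exact form is an assumption based only on the target (xpd) and the single href pseudo-attribute named in the text; the example.com URLs and the invoice document are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical XPD locations. The first XPD declared would take
     priority if attribute defaults or entity declarations conflict. -->
<?xpd href="http://example.com/xpd/invoice.xpd"?>
<?xpd href="http://example.com/xpd/house-style.xpd"?>
<invoice>
  <total currency="USD">100.00</total>
</invoice>
```

In this sketch the document would have to meet the constraints of both XPDs, and no DOCTYPE declaration appears because the xpd processing instructions take its place.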
The possibility of using XPDs on subdocuments and the related possibility of processing documents with no prolog suggest that a means of connecting XPDs to documents through an attribute, identified with a namespace, could be viable.
<elementName xmlns:xpd="http://purl.oclc.org/NET/xpdl/" xpd:href="xpdfile.xpd">
When the XLink standard stabilizes, it may be an appropriate tool for connecting XPDs to documents using more than a simple href.
Applications may use XPDs to store their sets of preferences for particular types of file processing, effectively treating them as resource files. This may reduce the need to modify code directly to implement relatively simple changes.
Possibility: Although it loses all the management benefits of storing XPDs externally, it may make sense in certain cases to provide an XPD internal to the document. As the XPD cannot easily replace the root element, it would have to be the first child element of the root element, and processors would have to cope with this late appearance of the XPD somehow.
Possibility: XPDs could be stored at the URI used in the namespace declaration for a particular element. While the namespaces recommendation doesn't require anything to be at the URI used to identify namespaces uniquely, it doesn't prohibit this practice either.
XPD will provide a much more robust set of rules for ensuring that applications using XPD return the same set of documents every time. In cases where an application doesn't understand a portion of the content area of the XPD, an error message must be generated. This section will define possible errors and provide rules for their reporting and handling.
<!ELEMENT xpd (class?, content?, styles?, profile?, extension?)>
<!ATTLIST xpd
  version CDATA "wd01092000"
  xmlns   CDATA "http://purl.oclc.org/NET/xpdl/">

<!ELEMENT class (owner?, description?)>
<!ATTLIST class
  classID   CDATA #IMPLIED
  MIMEtype  CDATA "application/xml"
  className CDATA #IMPLIED
  version   CDATA #IMPLIED>
<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner href CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description href CDATA #IMPLIED>

<!ELEMENT content (all* | (constraints*, attributes*, entities*, notations*) )>
<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>
<!ELEMENT notations EMPTY>
<!ENTITY % schema '
  href     CDATA #REQUIRED
  public   CDATA #IMPLIED
  type     CDATA "Application/xml-dtd"
  required (yes | no) "yes"
  internal (yes | no) "no"
'>
<!ATTLIST all %schema; root NMTOKENS #IMPLIED>
<!ATTLIST constraints %schema; root NMTOKENS #IMPLIED>
<!ATTLIST attributes %schema;>
<!ATTLIST entities %schema;>
<!ATTLIST notations %schema;>

<!ELEMENT styles (style*)>
<!ELEMENT style EMPTY>
<!ATTLIST style
  href      CDATA #REQUIRED
  type      CDATA #REQUIRED
  title     CDATA #IMPLIED
  media     CDATA #IMPLIED
  charset   CDATA #IMPLIED
  alternate (yes | no) "no">
<!ELEMENT stylegroup (style)*>
<!ATTLIST stylegroup
  title CDATA #IMPLIED
  type  CDATA #REQUIRED>

<!ELEMENT profile EMPTY>
<!ATTLIST profile
  fragmentIdentifier CDATA "XPointer"
  linking            CDATA "XLink"
  namespaces         (yes | no) "yes"
  attinheritelemns   (yes | no) "yes">

<!ELEMENT extension ANY>
<xpd>
  <class classID="http://purl.oclc.org/NET/xpdl/v1"
         MIMEtype="application/x-xpdl"
         className="XPD"
         version="1.0">
    <owner href="http://www.simonstl.com/">Simon St.Laurent, initial editor</owner>
    <description href="http://purl.oclc.org/NET/xpdl">XML Processing Description Language (XPDL)</description>
  </class>
  <content>
    <all href="http://www.simonstl.com/projects/xpdl/xpdl1.dtd"
         public="-//SIMONSTL//DTD XPDL//EN"
         type="Application/xml-dtd"
         required="no"
         internal="no"
         root="xpd"/>
  </content>
  <styles>
    <style href="http://www.simonstl.com/projects/xpdl/xpdl1.css"
           type="text/css"
           title="XPDL Default Style Sheet" />
  </styles>
  <profile />
  <extension />
</xpd>
[Associating] - Associating Style Sheets with XML Documents. James Clark, ed. Available at http://www.w3.org/TR/xml-stylesheet.
[Namespaces] - Namespaces in XML. Tim Bray, Dave Hollander, and Andrew Layman, eds. Available at http://www.w3.org/TR/REC-xml-names.
[Unicode] - The Unicode Consortium. The Unicode Standard, version 3.0. ISBN 0-201-61633-5. Described at http://www.unicode.org/unicode/standard/versions/Unicode3.0.html.
[Schemas] - XML Schema, Parts 0-2. Henry Thompson, Paul Biron, David Fallside, et al., eds. Available at http://www.w3.org/TR/xmlschema-0 (Primer), http://www.w3.org/TR/xmlschema-1 (Structures), and http://www.w3.org/TR/xmlschema-2 (Datatypes).
[XBase] - XML Base (XBase). Jonathan Marsh, ed. Available at http://www.w3.org/TR/xmlbase.
[XInclude] - XML Inclusions (XInclude). Jonathan Marsh and David Orchard, eds. Available at http://www.w3.org/TR/xinclude.
[XML] - Extensible Markup Language (XML) 1.0. Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen, eds. 10 February 1998. Available at http://www.w3.org/TR/REC-xml.
Copyright (c) 2000 Simon St.Laurent. Redistribution permitted.