XML Processing Description Language (XPDL)

Working Draft - 4 April 1999

Editor's Note: This is merely an initial draft. All comments, suggestions, and contributions are welcome and will be credited. I'd like this to be as open a process as possible, including discussion on xml-dev, the primary XML development mailing list. This document has no official standing with any standards body or process.

1.0 - XML Document Typing, Present and Possible

2.0 - Creating an XML Processing Description

2.1 - XPD Structure

2.2 - Class Information

2.3 - Content Information

2.4 - Style Information

2.5 - Extension Information

3.0 - Linking XML Processing Descriptions to XML Documents

3.1 - HTTP Header

3.2 - Processing Instruction

3.3 - Attribute with Namespace

3.4 - XLink

4.0 - Conformance

Appendix A - DTD for XPDs

Appendix B - RDF Schema for XPDs

1.0 - XML Document Typing, Present and Possible

In order for document processing to be reliable, it is necessary to be able to describe classes of documents and to identify individual documents as members of these classes. These document classes may then provide a shared set of constraints or other processing that applies to all documents that are members of the class. Class membership may be determined by comparing documents against constraints, but classes can provide other services to their members, including shared resources for document content, presentation, and processing.

While XML DTDs can be used to describe document classes, a number of 'features' in the XML 1.0 specification make the use of DTDs unreliable, with consequences ranging from missing default attributes to unexpanded entities to applications foiled in their attempts to use style sheets requiring particular document structures. Some of these problems lie in the nature of DTDs, while others originate in the different mechanisms by which applications are permitted to process (or not process) DTDs.

At the same time, XML's document-centric approach requires every document to provide the information required for its own processing, rather than allowing documents to be members of a class that shares that information. As document sets grow larger, making changes to the set can be difficult, especially if those changes involve which DTDs, schemas, or style sheets a document references.

XML Processing Description Language (XPDL) seeks to provide a means of describing document classes which will simplify the management of document classes and make processing more reliable. By creating descriptions for classes of documents, rather than relying on documents to link to sets of resources themselves, XPDL makes it possible both to move beyond the monolithic model presented by DTDs today and to add new resources, like schemas, style sheets, and processing information, to the concept of a document class.

2.0 - Creating an XML Processing Description

XML Processing Descriptions (XPDs) are XML files that provide resources for XML document classes. XPDs include general information describing the document class, as well as constraints that limit membership in the class, resources that can be used to add content to members of the class, style information, and an extensible area that can be used to provide more detailed information regarding processing.

Note: At present, XPDs are described with an XML 1.0 DTD, though XPD validation is not required. In the long run, this will probably be replaced with an RDF schema describing a similar XML document.

2.1 - XPD Structure

XPDs contain four general sections describing different aspects of the document class:

Class - General information about the class, including human-readable documentation and possibly other machine-readable properties like the MIME content type of documents in the class.
Content - Information relating to the content of documents in the class. Constraints describing required content and structures are referenced here, as are tools for attribute defaulting and entity processing.
Style - References to styles that may be used with documents in the class are provided here.
Extension - This area is left open to provide support for document classes that may require particular processing in given environments. Other specifications that make use of XPD may use this space and require conformance to particular vocabularies within it, but XPD only requires that its contents be well-formed XML.

The root element of an xpd document must be xpd, optionally with a version attribute describing the version of XPDL in use, as shown below:

<!ELEMENT xpd (class?, content?, styles?, extension?)>
<!ATTLIST xpd
    version CDATA "wd040499">
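The declaration above can be exercised with a small sketch. This is only an illustration, not part of XPDL: the function name read_xpd_sections is hypothetical, and because Python's ElementTree parser does not apply DTD defaults, the version default is supplied by hand.

```python
import xml.etree.ElementTree as ET

# Default taken from the ATTLIST declaration for the xpd element.
DEFAULT_XPD_VERSION = "wd040499"

# Children of xpd, in the order the content model allows them.
SECTION_ORDER = ["class", "content", "styles", "extension"]


def read_xpd_sections(xpd_text):
    """Return (version, list of section names present), or raise ValueError."""
    root = ET.fromstring(xpd_text)
    if root.tag != "xpd":
        raise ValueError("root element must be xpd, found " + root.tag)
    # ElementTree ignores DTD defaults, so the default is applied here.
    version = root.get("version", DEFAULT_XPD_VERSION)
    present = [child.tag for child in root]
    # Each section is optional, may appear at most once, and must keep
    # the declared order; anything else does not match the content model.
    expected = [name for name in SECTION_ORDER if name in present]
    if present != expected:
        raise ValueError("unexpected section sequence: " + ", ".join(present))
    return version, present
```

An XPD that carries only class and style information, with no version attribute, would be reported as version wd040499 with the sections class and styles.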

2.2 - Class Information

The class element contains information about the document class as a whole, not about the documents that are its members. The class element may provide identifiers for referencing this document class (both human- and machine-readable), as well as information about the owner of the document class and a reference to complete documentation.

<!ELEMENT class (owner?, description?)>
<!ATTLIST class
    classID CDATA #IMPLIED
    MIMEtype CDATA "application/xml"
    className CDATA #IMPLIED
    version CDATA #IMPLIED>

<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner
    href CDATA #IMPLIED>

<!ELEMENT description (#PCDATA)>
<!ATTLIST description
    href CDATA #IMPLIED>

The attributes of the class element provide information that may be used to integrate processing of this class with generic frameworks. The classID attribute should contain a unique identifier for this class. (Note that no infrastructure for such an identifier is provided; URIs may be used.) The MIMEtype attribute describes the MIME content-type label used for documents that are members of this class. By default, this will be application/xml, but other values may be used. The className attribute provides a human-readable identifier for the class that may be used in menus and other UI contexts. Finally, the version attribute identifies which version of the class this document represents.

The owner element provides space for a brief description of this document type's creators or maintainers, and the description element provides space for a human-readable description of the class itself. The href attribute on each element, if provided, should reference more detailed information.

A class element that contained a description of the XPD for XPDL itself might look like:

<class classID="http://purl.oclc.org/NET/xpdl/v1" 
    MIMEtype="application/x-xpdl" 
    className="XPD" 
    version="1.0">
<owner href="http://www.simonstl.com/">Simon St.Laurent, 
initial editor</owner>
<description href="http://purl.oclc.org/NET/xpdl">XML Processing Description Language (XPDL)
</description>
</class>

All of this information describes the class, but none of it is necessary for purely automated processing of members of the document class.
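An application that wants to show this descriptive information (in a menu, for instance) might collect it as sketched below. The function names are hypothetical, and the application/xml default for MIMEtype is applied by hand because the parser used here does not read DTD defaults.

```python
import xml.etree.ElementTree as ET


def _text_and_href(parent, name):
    """Return (normalized text, href) for an optional child, or None if absent."""
    child = parent.find(name)
    if child is None:
        return None
    # Collapse line breaks and runs of spaces left by hand editing.
    text = " ".join((child.text or "").split())
    return text, child.get("href")


def read_class_info(class_text):
    """Collect the descriptive properties of a class element into a dict."""
    elem = ET.fromstring(class_text)
    if elem.tag != "class":
        raise ValueError("expected a class element, found " + elem.tag)
    return {
        "classID": elem.get("classID"),
        # Default from the ATTLIST declaration for class.
        "MIMEtype": elem.get("MIMEtype", "application/xml"),
        "className": elem.get("className"),
        "version": elem.get("version"),
        "owner": _text_and_href(elem, "owner"),
        "description": _text_and_href(elem, "description"),
    }
```

Attributes declared #IMPLIED come back as None when they are omitted, which leaves it to the application to decide how to label an unnamed class.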

2.3 - Content Information

The content element provides the information that was stored in the XML 1.0 document type declaration, though it may be broken down more finely. (No provision is made for an external subset, either.) The content element has the following content model:

<!ELEMENT content (all | (constraints?, attributes?, entities?) )>

The all element may be used to simplify compatibility with XML document type declarations, and is the equivalent of having a set of constraints, attributes, and entities elements with identical attribute values. All of these elements are EMPTY, and all have the same set of attributes (except for all and constraints, which have one additional attribute):

<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>

<!ENTITY % schema '
    href CDATA #REQUIRED
    type CDATA "dtd"
    required (yes | no) "yes"
'>

<!ATTLIST all
    %schema;
    root NMTOKENS #IMPLIED>
<!ATTLIST constraints
    %schema;
    root NMTOKENS #IMPLIED>
<!ATTLIST attributes
    %schema;>
<!ATTLIST entities
    %schema;>

If the all element is used, the document referenced by the href will be used for constraints (structural validation in the case of DTDs, possibly more for schemas), attribute defaulting, and entity processing, reproducing the set of capabilities provided through references to an external DTD subset in XML 1.0. If all is used, the document must meet a full set of constraints specified in the document referenced. Situations where constraints should not be checked should use the attributes and entities elements instead.

If, instead of all, the other set of choices is used, one document may be referenced for constraints, another for the attribute defaulting, and yet another for entity processing. (There may be overlap - the same document could be referenced for constraints testing and attribute defaulting, while another is referenced for entity processing, for example.)
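The equivalence between all and the three separate elements can be sketched as a lookup from processing role to resource. This is a minimal illustration under assumptions: the function name and the dictionary layout are hypothetical, the defaults for type and required are applied by hand, and the root attribute is left out.

```python
import xml.etree.ElementTree as ET

# The three kinds of processing a content element can point to.
ROLES = ("constraints", "attributes", "entities")


def resources_by_role(content_text):
    """Map each role to the resource that supplies it, if any."""
    content = ET.fromstring(content_text)
    shared = content.find("all")
    plan = {}
    for role in ROLES:
        # An all element stands in for all three roles at once.
        source = shared if shared is not None else content.find(role)
        if source is None:
            continue  # this kind of processing is not requested
        href = source.get("href")
        if href is None:
            raise ValueError("href is required on " + source.tag)
        plan[role] = {
            "href": href,
            "type": source.get("type", "dtd"),
            "required": source.get("required", "yes") == "yes",
        }
    return plan
```

With an all element, every role maps to the same href; with the separate elements, a role that is not listed simply has no entry, so no processing of that kind is attempted.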

If the root attribute is present, on either the all or constraints element, the name of the root element of the document must be one of the name tokens inside that attribute value for the document to meet the constraints. If the root attribute is not present, then no constraints apply to the root element of the document.
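The root test amounts to a membership check against a whitespace-separated list of name tokens. The helper below is a sketch with a hypothetical name; it takes the root attribute value as a string, or None when the attribute is absent.

```python
def root_is_allowed(root_name, root_attribute):
    """Apply the root rule: absent attribute means no restriction."""
    if root_attribute is None:
        return True
    # NMTOKENS values are separated by whitespace; compare whole tokens
    # so that "mem" does not match "memo".
    return root_name in root_attribute.split()
```

For a root attribute of "memo note", a document whose root element is memo passes and one whose root element is letter does not.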

In all cases, the required attribute indicates whether or not the processor is required to retrieve the resources. The default is yes - XPDL is intended to promote as complete a rendition of XML as possible - but a no value may be appropriate in certain situations. Attribute defaulting and entity processing might not be necessary for a document that has already had this processing performed, but constraints checking may still be valuable, for instance. In other cases, attribute defaulting and entity processing may be useful, but constraints checking may be left out entirely.

The type attribute is intended to provide a means of communicating the kind of resource at the other end of the href. As schemas and other tools for describing XML documents become available in addition to XML 1.0 DTDs, processors will need a way to decide if they can process this information. A common notation for identifying these documents is needed, but is outside the scope of this proposal.

In all cases, if a document fails to meet the constraints imposed by the document referenced by the constraints or all element's href attribute, the processor must report that an error has occurred to the application and that the document is not in fact a member of this document class. (Even if required is set to 'no', if the processor loads the constraints, it must make this report.) Also, if the processor finds errors in the files containing the constraint, attribute defaulting, or entity processing information, or simply cannot understand their format, it is required to report an error to the application.

2.4 - Style Information

XPDs can also convey style information. Indeed, some XPDs may only carry style information, acting as a style management tool that centralizes the process of specifying which styles are used by a document. The style mechanism looks much like the processing instruction syntax used to connect style sheets to documents that is provided by the W3C, though it uses elements rather than processing instructions to provide style information.

Empty style elements may appear within the styles element, and their attributes provide the same information provided by the W3C processing instruction's pseudo-attributes:

<!ELEMENT styles (style*)>
<!ELEMENT style EMPTY>
<!ATTLIST style
    href CDATA #REQUIRED
    type CDATA #REQUIRED
    title CDATA #IMPLIED
    media CDATA #IMPLIED
    charset CDATA #IMPLIED
    alternate (yes | no) "no">

Styles referenced in the XPD should be considered to occur before any style sheets referenced in the document itself, though ideally such references will be moved to the XPD and left out of the document.

XPD processing applications should provide support for users to choose among various style sheets, though they are not required to provide support for style sheets in formats they don't understand. (CSS-only applications can't support XSL, and vice-versa; robust XPDs may provide alternatives for both.)
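One way an application might assemble the list of style sheets to offer a user is sketched below. The function names and the dictionary layout are hypothetical, and the type values passed in supported_types are whatever labels the application itself recognizes; XPDL does not define them.

```python
import xml.etree.ElementTree as ET


def parse_styles(styles_text):
    """Turn a styles element into a list of dicts, one per style child."""
    styles = ET.fromstring(styles_text)
    return [
        {
            "href": style.get("href"),
            "type": style.get("type"),
            "title": style.get("title"),
            # Default from the ATTLIST declaration: alternate is "no".
            "alternate": style.get("alternate", "no") == "yes",
        }
        for style in styles.findall("style")
    ]


def usable_styles(xpd_styles, document_styles, supported_types):
    """Order XPD styles before document styles, dropping unsupported types."""
    ordered = list(xpd_styles) + list(document_styles)
    return [style for style in ordered if style["type"] in supported_types]
```

A CSS-only application would pass a set containing only its CSS type label, so an XSL alternative listed in the same XPD is skipped rather than treated as an error.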

2.5 - Extension Information

XPDL provides no semantics for content within the extension area, and processors are not required to do anything with that content. This area may prove useful to applications (like those using MDSAX) that support configurable processing, and may be used according to additional specifications written by the developers of those tools.

<!ELEMENT extension ANY>

3.0 - Linking XML Processing Descriptions to XML Documents

Several mechanisms for linking XML processing descriptions are available, and at this draft stage, it isn't clear which is most appropriate. The mechanisms described below are possibilities at this point. One thing is clear, however: applications using an XPD should ignore DOCTYPE declarations and all of their contents in XML 1.0 documents to avoid duplicate (and possibly conflicting) processing.

3.1 - HTTP Header

When available, an HTTP header (like Link) could be used to prescribe an XPD for a particular document, much as the Content-type header provides MIME types at present. These headers appear to have been removed from HTTP 1.1, however. Creating additional headers outside the standards process may be possible for particular situations, but is difficult to propose as a general case.

3.2 - Processing Instruction

A simple processing instruction with the target 'xpd' may be used to connect a document to an XPD. Multiple xpd processing instructions may connect a document to multiple XPDs, though the document will have to meet the constraints of all of them. (The first XPD declared has priority over all others if there are conflicting attribute defaulting or entity processing declarations.) The syntax for the PI uses a target and a single pseudo-attribute, href.

<?xpd href="xpdfile.xpd"?>

All xpd processing instructions must appear in the prolog, before the document type declaration (which they replace) and the root element.
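A processor could collect these processing instructions roughly as follows. This is a sketch, not a normative algorithm: the function name is hypothetical, and since pseudo-attributes are not parsed by XML itself, the href value is pulled out of the instruction's data with a regular expression.

```python
import re
from xml.dom import minidom

# Pseudo-attribute syntax: href="..." or href='...'.
HREF_PATTERN = re.compile(r"""href\s*=\s*(["'])(.*?)\1""")


def xpd_hrefs(document_text):
    """Return the href of each xpd PI in the prolog, in document order."""
    dom = minidom.parseString(document_text)
    hrefs = []
    for node in dom.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # the prolog ends where the root element starts
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xpd"):
            match = HREF_PATTERN.search(node.data)
            if match is None:
                raise ValueError("xpd processing instruction without href")
            hrefs.append(match.group(2))
    return hrefs
```

Because the list preserves document order, its first entry is the XPD that takes priority when attribute defaulting or entity declarations conflict.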

3.3 - Attribute with Namespace

The possibility of using XPDs on subdocuments and the related possibility of processing documents with no prolog suggest that a means of connecting XPDs to documents through an attribute, identified with a namespace, could be viable.

<elementName xmlns:xpd="http://purl.oclc.org/NET/xpdl/" xpd:href="xpdfile.xpd">
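If this attribute mechanism were adopted, finding the elements that carry an XPD reference could look like the sketch below. The function name is hypothetical; the namespace URI is the one used in the example above, and the parser matches on that URI rather than on the xpd prefix.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the example above; the prefix itself is arbitrary.
XPD_NAMESPACE = "http://purl.oclc.org/NET/xpdl/"


def attached_xpds(document_text):
    """Return (element name, XPD href) for every element carrying xpd:href."""
    root = ET.fromstring(document_text)
    # ElementTree spells namespaced names as {uri}localname.
    key = "{%s}href" % XPD_NAMESPACE
    return [
        (elem.tag, elem.get(key))
        for elem in root.iter()
        if elem.get(key) is not None
    ]
```

A document with no prolog at all can still be connected to an XPD this way, and a subdocument can name a different XPD from the one that governs its parent.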

3.4 - XLink

When the XLink standard stabilizes, it may be an appropriate tool for connecting XPDs to documents using more than a simple href.

4.0 - Conformance

XPD will provide a much more robust set of rules for ensuring that applications using XPD return the same set of documents every time. In cases where an application doesn't understand a portion of the content area of the XPD, an error message must be generated. This section will define possible errors and provide rules for their reporting and handling.

Appendix A - DTD for XPDs

<!ELEMENT xpd (class?, content?, styles?, extension?)>
<!ATTLIST xpd
    version CDATA "wd040499">

<!ELEMENT class (owner?, description?)>
<!ATTLIST class
    classID CDATA #IMPLIED
    MIMEtype CDATA "application/xml"
    className CDATA #IMPLIED
    version CDATA #IMPLIED>

<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner
    href CDATA #IMPLIED>

<!ELEMENT description (#PCDATA)>
<!ATTLIST description
    href CDATA #IMPLIED>

<!ELEMENT content (all | (constraints?, attributes?, entities?) )>

<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>

<!ENTITY % schema '
    href CDATA #REQUIRED
    type CDATA "dtd"
    required (yes | no) "yes"
'>

<!ATTLIST all
    %schema;
    root NMTOKENS #IMPLIED>
<!ATTLIST constraints
    %schema;
    root NMTOKENS #IMPLIED>
<!ATTLIST attributes
    %schema;>
<!ATTLIST entities
    %schema;>

<!ELEMENT styles (style*)>
<!ELEMENT style EMPTY>
<!ATTLIST style
    href CDATA #REQUIRED
    type CDATA #REQUIRED
    title CDATA #IMPLIED
    media CDATA #IMPLIED
    charset CDATA #IMPLIED
    alternate (yes | no) "no">

<!ELEMENT extension ANY>

Appendix B - RDF Schema for XPDs

To come...
