Working Draft - 5 April 1999
Editor's Note: This is merely an initial draft. All comments, suggestions, and contributions are welcome and will be credited. I'd like this to be as open a process as possible, including discussion on xml-dev, the primary XML development mailing list. This document has no official standing with any standards body or process. Many thanks to Nathan Kurz, Bill la Forge, and Gabe Beged-Dov.
1.0 - XML Document Typing, Present and Possible
2.0 - Creating an XML Processing Description
2.1 - XPD Structure
2.2 - Class Information
2.3 - Content Information
2.4 - Style Information
2.5 - Extension Information
3.0 - Linking XML Processing Descriptions to XML Documents
3.1 - HTTP Header
3.2 - Processing Instruction
3.3 - Attribute with Namespace
3.4 - XLink
3.5 - Application Decision
4.0 - Conformance
Appendix A - DTD for XPDs
Appendix B - RDF Schema for XPDs
Appendix C - XPD for XPDs
In order for document processing to be reliable, it is necessary to be able to describe classes of documents and to identify individual documents as members of these classes. These document classes may then provide a shared set of constraints or other processing that applies to all documents that are members of the class. Class membership may be determined by comparing documents against constraints, but classes can provide other services to their members, including shared resources for document content, presentation, and processing.
While XML DTDs can be used to describe document classes, a number of 'features' in the XML 1.0 specification make the use of DTDs unreliable, with consequences ranging from missing default attributes to unexpanded entities to applications foiled in their attempts to use style sheets requiring particular document structures. Some of these problems lie in the nature of DTDs, while others originate in the different mechanisms by which applications are permitted to process (or not process) DTDs.
At the same time, XML's document-centric approach requires every document to provide the information needed for its own processing, rather than allowing documents to be members of a class of documents that share that information. As document sets grow larger, making changes to the set can be difficult, especially if those changes involve which DTDs, schemas, or style sheets a document references.
XML Processing Description Language (XPDL) seeks to provide a means of describing document classes which will simplify the management of document classes and make processing more reliable. By creating descriptions for classes of documents, rather than relying on documents to link to sets of resources themselves, XPDL makes it possible both to move beyond the monolithic model presented by DTDs today and to add new resources, like schemas, style sheets, and processing information to the concept of a document class.
XML Processing Descriptions (XPDs) are XML files that provide resources for XML document classes. XPDs include general information describing the document class, as well as constraints that limit membership in the class, resources that can be used to add content to members of the class, style information, and an extensible area that can be used to provide more detailed information regarding processing.
Note: At present, XPDs are described with an XML 1.0 DTD, though XPD validation is not required. In the long run, this will probably be replaced with an RDF schema describing a similar XML document.
XPDs contain four general sections describing different aspects of the document class: class information, content information, style information, and extension information.
The root element of an xpd document must be xpd, optionally with a version attribute describing the version of XPDL in use, as shown below:
<!ELEMENT xpd (class?, content?, styles?, extension?)>
<!ATTLIST xpd
          version CDATA "wd040499">
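As a rough illustration of the content model above, the following Python sketch (standard library only) checks that a serialized XPD has xpd as its root, that each child appears at most once and in the order class, content, styles, extension, and returns the version, falling back to the DTD default. The function name check_xpd_root is hypothetical; XPDL itself defines no such API.

```python
import xml.etree.ElementTree as ET

# Children permitted by <!ELEMENT xpd (class?, content?, styles?, extension?)>,
# listed in the order the content model requires.
XPD_CHILD_ORDER = ["class", "content", "styles", "extension"]
DEFAULT_VERSION = "wd040499"  # default value from the ATTLIST for xpd


def check_xpd_root(xml_text: str) -> str:
    """Check a serialized XPD against the xpd content model and return its
    XPDL version; raise ValueError if the structure does not match."""
    root = ET.fromstring(xml_text)
    if root.tag != "xpd":
        raise ValueError(f"root element must be xpd, not {root.tag}")
    last = -1
    for child in root:  # the default parser drops comments and PIs
        if child.tag not in XPD_CHILD_ORDER:
            raise ValueError(f"unexpected element {child.tag}")
        pos = XPD_CHILD_ORDER.index(child.tag)
        if pos <= last:  # each child is optional, may occur once, in order
            raise ValueError(f"{child.tag} is repeated or out of order")
        last = pos
    return root.get("version", DEFAULT_VERSION)
```

For example, check_xpd_root('<xpd version="1.0"><extension/></xpd>') returns "1.0", while an XPD with styles before class is rejected.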
The class element contains information about the document class as a whole, not about the documents that are its members. The class element may provide identifiers for referencing this document class (both human- and machine-readable), as well as information about the owner of the document class and a reference to complete documentation.
<!ELEMENT class (owner?, description?)>
<!ATTLIST class
          classID   CDATA #IMPLIED
          MIMEtype  CDATA "application/xml"
          className CDATA #IMPLIED
          version   CDATA #IMPLIED>
<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner
          href CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
          href CDATA #IMPLIED>
The attributes of the class element provide information that may be used to integrate processing of this class with generic frameworks. The classID attribute should contain a unique identifier for this class. (Note that no infrastructure for such an identifier is provided; URIs may be used.) The MIMEtype attribute describes the MIME content-type label used for documents that are members of this class. By default, this will be application/xml, but other values may be used. The className attribute provides a human-readable identifier for the class that may be used in menus and other UI contexts. Finally, the version attribute allows classes to identify which version of the class this document represents.
The owner and description elements both provide space for a brief description of this document type's creators or maintainers as well as a human-readable description of the class itself. The href attribute on both elements should reference (if provided) more detailed information.
A class element that contained a description of the XPD for XPDL itself might look like:
<class classID="http://purl.oclc.org/NET/xpdl/v1" MIMEtype="application/x-xpdl"
       className="XPD" version="1.0">
  <owner href="http://www.simonstl.com/">Simon St.Laurent, initial editor</owner>
  <description href="http://purl.oclc.org/NET/xpdl">XML Processing Description Language (XPDL)</description>
</class>
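A processor that wants to show this descriptive information in a user interface might gather it as follows. This is a sketch, not part of XPDL: read_class_info and _text are hypothetical helpers, and the only defaulting they apply is the MIMEtype default of application/xml declared in the DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _text(el: Optional[ET.Element]) -> Optional[str]:
    """Return an element's trimmed text, or None if the element is absent."""
    if el is None or el.text is None:
        return None
    return el.text.strip()


def read_class_info(class_xml: str) -> dict:
    """Collect the descriptive fields of an XPD class element, applying the
    MIMEtype default declared in the DTD. Other absent values map to None."""
    cls = ET.fromstring(class_xml)
    owner = cls.find("owner")
    desc = cls.find("description")
    return {
        "classID": cls.get("classID"),
        "MIMEtype": cls.get("MIMEtype", "application/xml"),
        "className": cls.get("className"),
        "version": cls.get("version"),
        "owner": _text(owner),
        "ownerHref": owner.get("href") if owner is not None else None,
        "description": _text(desc),
        "descriptionHref": desc.get("href") if desc is not None else None,
    }
```

Applied to the example above, this yields a MIMEtype of "application/x-xpdl" and a className of "XPD"; a bare <class/> falls back to "application/xml".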
All of this information provides description of the class, but none of it is necessary for purely automated processing of members of the document class.
The content element provides the information that was stored in the XML 1.0 document type declaration, though it may be broken down more finely. (No provision is made for an external subset, either.) The content element has the following content model:
<!ELEMENT content (all | (constraints?, attributes?, entities?) )>
The all element may be used to simplify compatibility with XML document type declarations, and is the equivalent of having a set of constraints, attributes, and entities elements with identical attribute values. All of these elements are EMPTY, and all have the same set of attributes (except for all and constraints, which each have one additional attribute, root):
<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>
<!ENTITY % schema '
          href     CDATA      #REQUIRED
          type     CDATA      "dtd"
          required (yes | no) "yes"'>
<!ATTLIST all %schema;
          root NMTOKENS #IMPLIED>
<!ATTLIST constraints %schema;
          root NMTOKENS #IMPLIED>
<!ATTLIST attributes %schema;>
<!ATTLIST entities %schema;>
If the all element is used, the document referenced by its href attribute will be used for constraints (structural validation in the case of DTDs, possibly more for schemas), attribute defaulting, and entity processing, reproducing the set of capabilities provided through references to an external DTD subset in XML 1.0. If all is used, the document must meet the full set of constraints specified in the referenced document. Situations where constraints should not be checked should use the attributes and entities elements instead.
If, instead of all, the other set of choices is used, one document may be referenced for constraints, another for attribute defaulting, and yet another for entity processing. (There may be overlap: the same document could be referenced for constraints testing and attribute defaulting, while another is referenced for entity processing, for example.)
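Both alternatives can be normalized into a single mapping from processing role to resource, so the rest of a processor need not care whether all or the separate elements were used. In this hypothetical sketch, resolve_content fills in the DTD defaults for type and required, rejects all when it is mixed with other elements, and enforces the constraints, attributes, entities order from the content model.

```python
import xml.etree.ElementTree as ET

ROLES = ("constraints", "attributes", "entities")
# Defaults taken from the %schema; parameter entity in the DTD.
SCHEMA_DEFAULTS = {"type": "dtd", "required": "yes"}


def resolve_content(content_xml: str) -> dict:
    """Map each processing role to the attributes of the element serving it.

    An all element fills every role with the same resource; otherwise each of
    constraints, attributes, and entities may appear once, in that order.
    Roles with no element are simply absent from the result."""
    content = ET.fromstring(content_xml)
    children = list(content)
    if any(child.tag == "all" for child in children):
        if len(children) != 1:
            raise ValueError("all cannot be combined with other elements")
        attrs = {**SCHEMA_DEFAULTS, **children[0].attrib}
        return {role: dict(attrs) for role in ROLES}
    resolved = {}
    last = -1
    for child in children:
        if child.tag not in ROLES:
            raise ValueError(f"unexpected element {child.tag}")
        pos = ROLES.index(child.tag)
        if pos <= last:
            raise ValueError(f"{child.tag} is repeated or out of order")
        last = pos
        resolved[child.tag] = {**SCHEMA_DEFAULTS, **child.attrib}
    return resolved
```

With <content><constraints href="a.dtd"/><entities href="b.dtd" required="no"/></content>, the result has constraints and entities entries but no attributes entry, so no attribute defaulting would be performed.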
If the root attribute is present on either the all or the constraints element, the name of the document's root element must be one of the name tokens in that attribute value for the document to meet the constraints. If the root attribute is not present, no constraints apply to the root element of the document. Possibility: Adding a rootNS attribute for namespace support.
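The root check itself is a membership test on the whitespace-separated name tokens. The sketch below assumes documents without namespaces: ElementTree reports a namespaced root element as {uri}local, which a plain NMTOKENS list cannot express. The function name meets_root_constraint is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def meets_root_constraint(root_attr: Optional[str], document_xml: str) -> bool:
    """Apply the root rule from an all or constraints element.

    root_attr is the raw NMTOKENS value (None when the attribute is absent).
    Assumes documents without namespaces."""
    if root_attr is None:
        return True  # no root attribute: any root element is acceptable
    root_name = ET.fromstring(document_xml).tag
    return root_name in root_attr.split()
```

For instance, a root value of "book article" accepts <article/> but not <chapter/>.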
In all cases, the required attribute indicates whether or not the processor is required to retrieve the resource. The default is yes (XPDL is intended to promote as complete a rendition of XML as possible), but a no value may be appropriate in certain situations. For instance, attribute defaulting and entity processing might not be necessary for a document that has already had this processing performed, though constraints checking may still be valuable. In other cases, attribute defaulting and entity processing may be useful, but constraints checking may be left out entirely.
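One way a processor might honor the required attribute is sketched below. The fetch callable is a hypothetical stand-in for whatever retrieval mechanism the processor uses. The skip_optional switch models a processor that chooses not to retrieve resources marked required="no". Neither is defined by the draft.

```python
from typing import Callable, Optional


def load_resource(attrs: dict, fetch: Callable[[str], bytes],
                  skip_optional: bool = False) -> Optional[bytes]:
    """Retrieve the resource named by attrs["href"], honoring required.

    required defaults to "yes", as in the DTD. A failed retrieval of a
    required resource propagates as an error; an optional one yields None."""
    required = attrs.get("required", "yes") == "yes"
    if not required and skip_optional:
        return None  # the processor chooses not to retrieve optional resources
    try:
        return fetch(attrs["href"])
    except OSError:
        if required:
            raise  # a required resource could not be retrieved
        return None
```

The design choice here is that "no" permits, rather than forbids, retrieval: an optional resource is still used when it can be fetched.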
The type attribute is intended to provide a means of communicating the kind of resource referenced by the href attribute. As schemas and other tools for describing XML documents become available in addition to XML 1.0 DTDs, processors will need a way to decide if they can process this information. A common notation for identifying these documents is needed, but is outside the scope of this proposal. (At present, XML 1.0 DTDs, RDF Schemas, and the experimental XML-Data, DCD, SOX, and DDML proposals are the primary contenders for these descriptions.)
In all cases, if a document fails to meet the constraints imposed by the document referenced by the constraints or all element's href attribute, the processor must report that an error has occurred to the application and that the document is not in fact a member of this document class. (Even if required is set to 'no', if the processor loads the constraints, it must make this report.) Also, if the processor finds errors in the files containing the constraint, attribute defaulting, or entity processing information, or simply cannot understand their format, it is required to report an error to the application.
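The type check and the error-reporting rules might combine as follows. Several details are assumptions of this sketch: the set of supported types, "dtd" as the only recognized value, and the XPDError exception. The draft deliberately leaves the notation for type values open.

```python
# Assumption: this processor understands XML 1.0 DTDs only. The draft leaves
# the notation for type values open; "dtd" is merely the DTD's default.
SUPPORTED_TYPES = frozenset({"dtd"})


class XPDError(Exception):
    """Hypothetical error type used to report problems to the application."""


def check_resource_type(attrs: dict, supported=SUPPORTED_TYPES) -> str:
    """Refuse resources whose format the processor cannot understand."""
    rtype = attrs.get("type", "dtd")
    if rtype not in supported:
        raise XPDError(f"cannot process {rtype!r} resource {attrs['href']}")
    return rtype


def report_membership(doc_name: str, violations: list) -> bool:
    """Report constraint failures as an error stating non-membership.

    This applies whenever constraints were loaded, even if required="no"."""
    if violations:
        raise XPDError(f"{doc_name} is not a member of this document class: "
                       + "; ".join(violations))
    return True
```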
XPDs can also convey style information. Indeed, some XPDs may carry only style information, acting as a style management tool that centralizes the process of specifying which styles are used by a document. The style mechanism closely resembles the W3C's xml-stylesheet processing instruction for associating style sheets with documents, though it uses elements rather than processing instructions to provide style information.
Zero or more style elements may appear within the styles element, and their attributes provide the same information as the pseudo-attributes of the W3C processing instruction:
<!ELEMENT styles (style*)>
<!ELEMENT style EMPTY>
<!ATTLIST style
          href      CDATA      #REQUIRED
          type      CDATA      #REQUIRED
          title     CDATA      #IMPLIED
          media     CDATA      #IMPLIED
          charset   CDATA      #IMPLIED
          alternate (yes | no) "no">
Styles referenced in the XPD should be considered to occur before any style sheets referenced in the document itself, though ideally such references will be moved to the XPD and left out of the document. Possibility: Style groups to gather sets of styles and their alternatives, providing more structure for style mechanisms to work with.
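The ordering rule can be pictured as a simple concatenation. In this sketch, collect_styles is hypothetical. The document's own style references are assumed to have already been extracted into dictionaries with the same keys as the style element's attributes, for example from xml-stylesheet processing instructions.

```python
import xml.etree.ElementTree as ET


def collect_styles(styles_xml: str, document_styles: list) -> list:
    """Return style descriptions with those from the XPD's styles element
    first, followed by styles the document references itself."""
    styles = ET.fromstring(styles_xml)
    from_xpd = []
    for style in styles.findall("style"):
        entry = dict(style.attrib)
        entry.setdefault("alternate", "no")  # default from the DTD
        from_xpd.append(entry)
    return from_xpd + list(document_styles)
```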
XPD processing applications should provide support for users to choose among various style sheets, though they are not required to provide support for style sheets in formats they don't understand. (CSS-only applications can't support XSL, and vice-versa; robust XPDs may provide alternatives for both.)
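An application's style menu might be built by discarding formats it cannot handle and separating the default styles from the alternates. This hypothetical style_choices function uses a simplified reading of the alternate attribute. The W3C processing instruction's semantics for title and alternate are richer than this.

```python
def style_choices(styles: list, supported_types: set) -> tuple:
    """Split usable styles into defaults and alternates a user could pick.

    Styles whose type the application cannot handle are dropped; a missing
    alternate value is treated as "no", matching the DTD default."""
    usable = [s for s in styles if s.get("type") in supported_types]
    defaults = [s for s in usable if s.get("alternate", "no") == "no"]
    alternates = [s for s in usable if s.get("alternate", "no") == "yes"]
    return defaults, alternates
```

A CSS-only application would pass {"text/css"} as supported_types and never offer an XSL style sheet.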
XPDL provides no semantics for content within the extension area, and processors are not required to do anything with that content. This area may prove useful to applications (like those using MDSAX) that support configurable processing, and may be used according to additional specifications written by the developers of those tools.
<!ELEMENT extension ANY>
Several mechanisms for linking XML processing descriptions are available, and at this draft stage, it isn't clear which is most appropriate. The mechanisms described below are all possibilities at this point. One thing is clear, however: applications using an XPD for content processing should ignore DOCTYPE declarations and all of their contents in XML 1.0 documents to avoid duplicate (and possibly conflicting) processing. Possibility: Explicit switches.
When available, an HTTP header (like Link) could be used to prescribe an XPD for a particular document, much as the Content-type header provides MIME types at present. These headers appear to have been removed from HTTP 1.1, however. Creating additional headers outside the standards process may be possible for particular situations, but is difficult to propose as a general case.
A simple processing instruction with the target 'xpd' may be used to connect a document to an XPD. Multiple xpd processing instructions may connect a document to multiple XPDs, though the document will have to meet the constraints of all of them. (The first XPD declared has priority over all others if there are conflicting attribute defaulting or entity processing declarations.) The syntax for the PI uses a target and a single pseudo-attribute, href.
All xpd processing instructions must appear in the prolog, before the document type declaration (which they replace) and the root element. Possibility: required pseudo-attribute for cases where XPD is informative but not necessary.
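Finding the xpd processing instructions in a document's prolog can be done with the standard library's xml.dom.minidom, which keeps prolog processing instructions and the document type declaration as children of the Document node. The regular expression used to pull out the href pseudo-attribute is an assumption of this sketch, not a full pseudo-attribute parser. The function name xpd_links is also hypothetical.

```python
import re
from xml.dom import minidom

# Assumption: a simple pattern for the href pseudo-attribute, not a full
# pseudo-attribute parser.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""")


def xpd_links(xml_text: str) -> list:
    """Return the hrefs of xpd processing instructions in the prolog, in
    document order (the first has priority in conflicts)."""
    doc = minidom.parseString(xml_text)
    hrefs = []
    seen_doctype = False
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # only the prolog is searched
        if node.nodeType == node.DOCUMENT_TYPE_NODE:
            seen_doctype = True
        elif (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
              and node.target == "xpd"):
            if seen_doctype:
                raise ValueError("xpd PI must precede the document type declaration")
            match = HREF_RE.search(node.data)
            if match is None:
                raise ValueError("xpd PI lacks an href pseudo-attribute")
            hrefs.append(match.group(2))
    return hrefs
```

For a document beginning <?xpd href="base.xpd"?><?xpd href="extra.xpd"?><doc/>, this returns ["base.xpd", "extra.xpd"], so base.xpd would win any attribute-defaulting or entity conflict.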
The possibility of using XPDs on subdocuments and the related possibility of processing documents with no prolog suggest that a means of connecting XPDs to documents through an attribute, identified with a namespace, could be viable.
<elementName xmlns:xpd="http://purl.oclc.org/NET/xpdl/" xpd:href="xpdfile.xpd">
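Because the xpd:href attribute may sit on any element, a processor would need to scan the whole tree. The sketch below uses ElementTree, which exposes namespaced attributes in {uri}name form. The namespace URI comes from the example above. The function name xpd_attribute_links and the report and section elements in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

XPD_NS = "http://purl.oclc.org/NET/xpdl/"  # namespace URI from the example above
XPD_HREF = f"{{{XPD_NS}}}href"  # ElementTree's {uri}local form of xpd:href


def xpd_attribute_links(xml_text: str) -> list:
    """Return (element tag, href) pairs for every element carrying xpd:href,
    so XPDs can be attached to subdocuments as well as to the root."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get(XPD_HREF)) for el in root.iter()
            if el.get(XPD_HREF) is not None]
```

Given <report xmlns:xpd="http://purl.oclc.org/NET/xpdl/" xpd:href="report.xpd"><section xpd:href="section.xpd"/></report>, the result pairs report with report.xpd and section with section.xpd.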
When the XLink standard stabilizes, it may be an appropriate tool for connecting XPDs to documents using more than a simple href.
Applications may use XPDs to store their sets of preferences for particular types of file processing, effectively treating them as resource files. This may reduce the need to modify code directly to implement relatively simple changes.
XPD will provide a much more robust set of rules for ensuring that applications using XPD return the same set of documents every time. In cases where an application doesn't understand a portion of the content area of the XPD, an error message must be generated. This section will define possible errors and provide rules for their reporting and handling.
<!ELEMENT xpd (class?, content?, styles?, extension?)>
<!ATTLIST xpd
          version CDATA "wd040499">

<!ELEMENT class (owner?, description?)>
<!ATTLIST class
          classID   CDATA #IMPLIED
          MIMEtype  CDATA "application/xml"
          className CDATA #IMPLIED
          version   CDATA #IMPLIED>
<!ELEMENT owner (#PCDATA)>
<!ATTLIST owner
          href CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
          href CDATA #IMPLIED>

<!ELEMENT content (all | (constraints?, attributes?, entities?) )>
<!ELEMENT all EMPTY>
<!ELEMENT constraints EMPTY>
<!ELEMENT attributes EMPTY>
<!ELEMENT entities EMPTY>
<!ENTITY % schema '
          href     CDATA      #REQUIRED
          type     CDATA      "dtd"
          required (yes | no) "yes"'>
<!ATTLIST all %schema;
          root NMTOKENS #IMPLIED>
<!ATTLIST constraints %schema;
          root NMTOKENS #IMPLIED>
<!ATTLIST attributes %schema;>
<!ATTLIST entities %schema;>

<!ELEMENT styles (style*)>
<!ELEMENT style EMPTY>
<!ATTLIST style
          href      CDATA      #REQUIRED
          type      CDATA      #REQUIRED
          title     CDATA      #IMPLIED
          media     CDATA      #IMPLIED
          charset   CDATA      #IMPLIED
          alternate (yes | no) "no">

<!ELEMENT extension ANY>
Copyright (c) 1999 Simon St.Laurent. Redistribution permitted.