Cement Shoes for XML?


Note: In the six months since this was written, the first of the XML/CSS browsers to go 'live', Microsoft Internet Explorer 5.0, has arrived. While it does provide support well beyond the initial beta that prompted this article, it still has some severe shortcomings, so watch for a future article.


While the XML hype machine proceeds full speed ahead, critical components needed to make XML ubiquitous are missing in action. Two of XML's most prominent supporters, Microsoft and Netscape, have proven extraordinarily vague about their plans to allow users to view XML documents through their browsers. Microsoft seems intent on providing support for XML documents only as sub-components of HTML documents, while Netscape's Mozilla project seems likely to provide fuller support but on an extremely uncertain timeline. Without a strong client infrastructure, XML stands to lose much of its power, becoming another standardized file format that gets lots of use for interchange behind the scenes but no opportunity to shine in the spotlight.

Why Seeing XML Matters

XML offers developers a unique opportunity to create documents that can be reused in applications from printing to databases to interactive presentations. XML provides a combination of processing simplicity (well, mostly) and human-readability that makes it much easier for users to visit a Web site and immediately do something useful with the information they find there. XML provides organizations with the ability to create their own standards for document markup, while still using tools that work across all of those standards. Instead of having to create entirely new applications to support new standards, developers can customize tools that already exist, just creating a new style sheet or a small application module to show their information in an intelligible format or transfer it to a new processing application. By allowing developers to share a wide variety of reasonably simple tools, XML allows developers to lower their costs dramatically.

Low-level tools, parsers that read in XML documents and present them to an application in standardized form, have been available for over a year now. Many of these tools are free, and an interface for connecting different parsers to applications transparently (SAX) is also available. Browsers that support specific XML applications, like MathML (Amaya) and Chemical Markup Language (Jumbo), are also available. A limited amount of support, requiring scripting, is also available in Microsoft's Internet Explorer 4.0. A variety of tools for creating XML, transforming it into other markup (typically SGML or HTML), and using it for specific applications are also arriving slowly.
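To give a sense of what a SAX-style interface looks like to an application, here is a minimal sketch using the xml.sax module from present-day Python's standard library, which is an assumption of this example and not one of the tools discussed above. The element name "title" and the function collect_titles are hypothetical. The parser reports events (element start, character data, element end) and the application decides what to keep.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Gathers the text of every <title> element (a hypothetical element name)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # The parser calls this for each opening tag it encounters.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # The parser calls this for each closing tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Parse an XML string and return the text of its <title> elements."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles
```

Because the handler only sees events, the same application code should work with any parser that implements the interface, which is the transparency SAX is meant to provide.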

As useful as these tools are, they don't provide much transparency for XML. If users have the bad fortune to receive an XML document in the current mainstream (Netscape or Microsoft) browsers, they may see the markup in all of its glory spread across their browser window, they may get a blank screen, or they may be asked if they want to save the file to their hard drives, depending on how the browser is configured. XML documents cannot be rendered with the Cascading Style Sheets (CSS) functionality already, though incompletely, built into the browsers. As a result, XML is not yet a reasonable choice for sending information to end users, needing either a transformation to HTML or a separate viewing application.

This may be acceptable for a narrow range of applications that need a separate viewing application anyway, and some organizations may not mind incurring the processing costs of transforming XML documents to HTML. It is not acceptable for organizations or individuals who expect to be able to publish in the traditional style of the Web, uploading files (the document plus possibly a style sheet) to a server without the need for transformations. Applications that publish information from programs are also unlikely to switch their output to XML if the XML will still need to undergo additional transformation before becoming usable to the average Web browser.

XML's Potential for Client-Server Architectures

Some people take this situation calmly, arguing that HTML is adequate for presenting information to users and that XML is only really necessary for data interchange between applications. If developers really want to send XML to their Web browser clients, they can write supporting code that converts everything to HTML, either on the server or at the client. HTML has sufficed for thousands of client-server applications so far, and can probably support them for years to come.
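The kind of supporting code this argument has in mind might look like the following sketch, written with present-day Python's standard library as an assumption of the example. The vocabulary (pricelist, item, name, price) and the function price_list_to_html are hypothetical. It shows a server-side step that turns a small XML document into an HTML table before a browser ever sees it.

```python
import xml.etree.ElementTree as ET
from html import escape


def price_list_to_html(xml_text):
    """Convert a hypothetical <pricelist> document into an HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        # Escape the text so that markup characters in the data stay data.
        name = escape(item.findtext("name", default=""))
        price = escape(item.findtext("price", default=""))
        rows.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

Note what the conversion discards: the result no longer says which cell is a name and which is a price, only that there are two cells in a row.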

The problem with this scenario is that HTML is not useful as an interchange format. While developers have created tools for extracting information from HTML and delivering it in formats other than HTML and text, HTML gives users information structured for formatted presentation on a computer screen, not for reuse in other applications. HTML is much more complicated to parse than XML, and its usage varies dramatically even across sites containing similar information.

By standardizing the markup used to encode information in documents, and relegating formatting information to a separate (but very robust) style sheet, XML makes it possible for users to take the information available in Web pages and put it to use in their own applications. This could mean automatic updating of price lists for a vendor, smooth transfers of calendar information for an organization, or precise information about parts that can be imported directly into a CAD program and used to draw the part without the user needing to go through a long series of cut-and-pastes or the developer creating new transfer formats. XML does more than present information to users in a client-server application; it provides them with information they can use immediately in other programs.
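As an illustration of the price list case, here is a minimal sketch, again assuming present-day Python and a hypothetical vocabulary (pricelist, item, a sku attribute, price). Because the markup names the data instead of describing its appearance, a receiving application can pull out exactly what it needs without guessing at page layout.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal


def read_prices(xml_text):
    """Return a mapping of SKU to price from a hypothetical <pricelist> document."""
    root = ET.fromstring(xml_text)
    prices = {}
    for item in root.iter("item"):
        # Decimal avoids the rounding surprises of binary floats for money.
        prices[item.get("sku")] = Decimal(item.findtext("price"))
    return prices
```

A vendor's ordering system could run something like this against a supplier's published file each night, with no screen-scraping and no custom transfer format.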

Making this reusability possible will require making XML ubiquitous; making XML ubiquitous requires making it as easy as possible for developers to present XML information in a wide variety of situations, the most common (and therefore the most important) of which is the browser screen. Converting browsers from a display for information to a switchboard for viewing information and sending it from place to place is a difficult task, in large part because of the implications of genuinely open document standards.

Why XML is Scary

XML's vision of extreme openness is hardly comforting for an industry long accustomed to maintaining market share by locking in users with file formats and incompatible features. While companies may still be able to create obfuscated DTDs with element and attribute names that are misleading or difficult to figure out, the usefulness of XML increases as more and more applications are able to read a file. Creating open standards for document syntax encourages open standards for document structures, making it harder and harder for vendors to maintain their hold on a market.

Keeping XML locked away in back-room applications and limited to use within a small set of applications on the client is one way of keeping this extreme openness constrained. Some of the companies that have most loudly promoted their limited use of XML are among those most threatened by its openness, and providing a commonly used implementation that offers only partial support for XML is an excellent way to claim standards compliance (and take advantage of it where it seems useful) without opening the Pandora's box of full compliance.

XML promises document interchange on a scale never before seen. XML was designed with computers from Personal Digital Assistants to mainframes in mind, and can be implemented easily on many operating systems in many languages. (The need for Unicode support, included to make internationalization easier, is a necessary stumbling block that keeps XML off some legacy equipment and operating systems.) Allowing documents to move from OS to OS and application to application is a significant threat to the current business style of the software industry, akin to the threat presented by Java's abilities to run applications on multiple operating systems.

XML has other implications as well, potentially encouraging users to abandon their current file systems for more sophisticated object stores. XLink, the linking component of the XML initiatives, promises much more sophisticated linking than the current HTML links, making it possible to weave much tighter document Webs without the use of expensive systems for cross-referencing and analyzing data. XPointer, which provides tools for describing portions of documents, may set off revolutions in the way that data is managed and manipulated.

Implications

While the browser vendors have supported the XML development process and have applied XML where it seems most advantageous to them, a full implementation of XML that encourages its ubiquity may not be in their best interest. XML is a revolutionary standard in a world that has settled down to an oligarchy, promising to radically open the transfer of information to developers and users, even those without the resources of large companies behind them.

The growing complexity of the XML family of standards and the split in the styles activity at the World Wide Web Consortium give those who might delay the implementation of XML a powerful set of weapons. The browser vendors never fully supported Cascading Style Sheets (CSS) Level 1; now they can use the need to implement CSS Level 2 as a smokescreen to cover their previous failure while making themselves look progressive. The slow development of a competing and more powerful standard, Extensible Style Language (XSL), gives them another opportunity to delay full implementation of XML in the browser.

Developers and others who stand to benefit from XML are going to have to band together to get browser vendors to support display of XML documents, or develop competing solutions and promote them against the massive installed base of the older browsers. Netscape's open development process is one option for contributing to the development of a browser that can natively understand and render XML; creating another browser is another. Pressuring Microsoft is difficult, as their process is closed and they are the company that probably has the most to lose from open file formats.

Making XML ubiquitous is going to require a lot of work by a lot of people, but the benefits promise to be enormous. The tools for creating truly reusable as well as human-readable information have arrived; creating products that genuinely take advantage of them is the next step. Letting XML run at full speed is going to require committed projects dedicated to its implementation, and the removal of the main obstacle - the installed base of XML-unfriendly browsers - that stands in its way.


Comments? Suggestions?

Please contact Simon St.Laurent


Some of my other XML essays are also available.

Copyright 1998 Simon St.Laurent