XML, Integration, and the Smaller Developer

Many thanks to Richard Goerwitz, Baden Hughes, Dan Brickley, and Clifford Thompson. Comments are welcome.


XML is an opportunity to remake the software development landscape as well as the world of documents. The ease with which XML-based architectures can be created and rebuilt is an opportunity for smaller developers to take back the integration tasks that had been disappearing into the hands of larger companies. Small developers can now compete on the same playing field as established vendors, using open and easily mastered XML documents to build their own infrastructure. At the same time, there are some dark clouds hanging over XML. Historical quirks, especially complexity created by its derivation from SGML, make XML seem more difficult than it really is. The ongoing development of XML and related standards at the W3C (a consortium dominated by larger vendors) also threatens to make XML more difficult to process.

Small developers have a chance to escape the grasp of the companies that have dominated software development for the last decade. Finally, a tool has arrived - a mere data format - that promises to free computing from the tar of proprietary formats and incompatible features. The freedom that Extensible Markup Language (XML) gives data has business implications well outside the world of document and data interchange. Developers have an opportunity to integrate components on their own terms, mixing and matching the most appropriate software for a task rather than having to accept choices made for them by schemes that lock developers into a particular vendor's vision of computing.

Document, Data, and Object Exchange

XML wasn't invented to create a paradise for developers building software and integrating systems. The W3C created XML as a new document format for the Web, allowing the creation of custom document vocabularies, with supplementary standards for presenting those documents and linking them into hypertexts. XML's roots are in SGML, a markup language used primarily for document management and publishing. By simplifying SGML drastically, while still keeping XML documents compatible with SGML, XML's creators hoped to bring SGML's tools to a larger audience and reinvigorate the Web.

Unfortunately, the development of XML document tools has been remarkably slow, and XML's march on the Web even slower. Over a year after the XML 1.0 specification was released, Microsoft's Internet Explorer 5.0 is the first Web browser providing significant support for XML, and its support is still quite controversial (and somewhat buggy). Netscape's Mozilla provides more substantial support, but is probably six months from release, and Opera's plans for XML are still cloudy. Generic document editors for XML have been slow to appear, though some simple tools have been released and a number of announcements for more complex tools promise a brighter future. The low-cost document tools that seemed like a large part of XML's promise at its inception aren't here yet.

While XML's future as a ubiquitous document format isn't certain, its use in data-centric fields is expanding rapidly. Developers who focus on pushing data around are often closer to code, and more willing than document authors to write their own applications. As a result, XML may have a very bright future - soon, even now - as an exchange format for data. While XML isn't the most efficient format for storing some types of information because of its text orientation and the verbosity of its markup, it's an easy thing to agree on. XML is capable of representing sophisticated structures of a variety of types, well beyond the simple tables of delimited text commonly used to exchange information, and comes with tools for describing those structures. At the same time, free tools for processing, creating, and transforming XML information are widely available, as are standards for integrating them, like SAX and the W3C's Document Object Model.
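To make the contrast with delimited text concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The order vocabulary (the order, customer, item, title, and price elements and their attributes) is invented for this illustration and does not come from any standard. The point is only that nested structure, which would need several related tables in delimited text, fits in one document and can be read with a freely available parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for this example only.
ORDER_XML = """
<order id="1001">
  <customer>
    <name>Example Books</name>
    <city>Ithaca</city>
  </customer>
  <item sku="A-17" quantity="2">
    <title>XML Primer</title>
    <price currency="USD">24.95</price>
  </item>
  <item sku="B-03" quantity="1">
    <title>SGML Handbook</title>
    <price currency="USD">60.00</price>
  </item>
</order>
"""


def order_total(xml_text):
    """Sum quantity * price over every <item> child of the order."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for item in root.findall("item"):
        quantity = int(item.get("quantity"))
        price = float(item.findtext("price"))
        total += quantity * price
    return round(total, 2)


def customer_name(xml_text):
    """Read a value nested two levels down, using a simple path."""
    return ET.fromstring(xml_text).findtext("customer/name")
```

A receiving program needs to know only the element names it cares about; anything else in the document can be ignored.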

While data-centric applications can take advantage of XML's ability to store complex structures, object-centric applications can use the close correlation between the hierarchical structure of XML documents and the hierarchical structure of object-based applications. While XML is not the most efficient tool for serializing object data, it has the significant advantage of not being tied to any particular object architecture. An XML document representing a serialized Java object may be used to construct a similar object in C++, Limbo, or even used in environments with no conception of 'objects'. For programmers, XML is an opportunity to get information out of their programs and exchange it with other programs and systems without needing to worry about information loss.
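The correlation between object hierarchies and element hierarchies can be sketched in a few lines. The example below is an illustration in Python rather than Java or C++, and the Point and Segment classes, along with the segment, start, and end element names, are hypothetical. One object graph is written out as XML and a fresh, equal object is rebuilt from the text alone.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


# Hypothetical classes, invented for this example only.
@dataclass
class Point:
    x: int
    y: int


@dataclass
class Segment:
    label: str
    start: Point
    end: Point


def segment_to_xml(segment):
    """Serialize a Segment: the object becomes an element, its parts children."""
    root = ET.Element("segment", {"label": segment.label})
    for tag, point in (("start", segment.start), ("end", segment.end)):
        ET.SubElement(root, tag, {"x": str(point.x), "y": str(point.y)})
    return ET.tostring(root, encoding="unicode")


def segment_from_xml(xml_text):
    """Rebuild a Segment from the XML text, with no access to the original."""
    root = ET.fromstring(xml_text)

    def read_point(tag):
        node = root.find(tag)
        return Point(int(node.get("x")), int(node.get("y")))

    return Segment(root.get("label"), read_point("start"), read_point("end"))
```

Nothing in the XML text refers to Python. The attribute values travel as plain text, so a program in another language, or one with no notion of objects at all, can read the same document and interpret the values in its own terms.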

The Problem of Integration

Current software environments go to great lengths to provide integration without requiring developers to do much work. Application integration has been treated as a complex task that shouldn't be left up to the application developers, and has been moved into the operating system or other infrastructure like object brokers. Large-scale applications running on multiple systems still need integration, and software developers have to build hooks into their software to support integration, but for the most part, integration has been seen as a problem that should be made to disappear as much as possible.

This approach fueled the recent growth of the PC industry (which learned it, to some extent, from the Macintosh). Integration is now spreading into servers. Vendors are trying to present their goods as having a low total cost of ownership because of their smoothly integrated tool sets and the availability of complete sets of tools from single vendors. This move toward integration has helped large software vendors in a number of ways, helping them justify their moves into new products while locking their customers into single-vendor solutions. (The approach seems to have worked on both the low and high ends of the computing industry so far, though some sectors, mostly in Unix, have definitely resisted the trend.)

As the glue connecting applications has disappeared into the background, the level of expertise needed to modify that glue has risen dramatically, while building applications on top of that glue has become easier. The situation has both benefits and costs. Developers can snap together components more easily, building applications quickly, but can only do so within the range permitted by the environment providing integration. Changing the rules by which the components connect is the sole realm of the vendor providing the framework, giving them significant control over how programs are built, and how easily those programs can connect to other programs.

XML may change all of this. While XML may not replace all existing standards for program integration, it opens up new possibilities that allow developers to reclaim control over how their programs interact with other programs. Because XML provides a highly structured format for exchanging information that supports multiple vocabularies, creating open exchange formats for trading information between programs is not especially difficult. Some communities, like electronic commerce, are hard at work building standardized vocabularies to facilitate just this kind of exchange. Other communities are pondering the creation of other standardized vocabularies.

The creation of these vocabularies doesn't have to be limited to standards for large communities. Developers can create their own vocabularies to describe the contents of their programs and build standards on them. Once the information is in XML, it's free of its original source, and ready to travel to another application. There are some dangers in the potential for an explosion of vocabularies, but fortunately there are tools (notably XSL and architectural forms) for transforming information from one vocabulary to another and other tools (like MDSAX) that can map document contents expressed using different vocabularies into common object architectures.
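As a rough illustration of moving information from one vocabulary to another, the sketch below renames elements according to a lookup table. It is a stand-in written with Python's standard library, not XSL, architectural forms, or MDSAX, and both vocabularies (client, fullname, and town on one side; customer, name, and city on the other) are invented for the example. Real transformations often restructure content as well as rename it.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping between two invented vocabularies.
TAG_MAP = {
    "client": "customer",
    "fullname": "name",
    "town": "city",
}


def translate(element, tag_map):
    """Return a copy of element with tags renamed according to tag_map.

    Tags missing from the map are kept as they are; attributes and
    text content are copied unchanged.
    """
    new = ET.Element(tag_map.get(element.tag, element.tag), dict(element.attrib))
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(translate(child, tag_map))
    return new


def translate_document(xml_text, tag_map=TAG_MAP):
    """Parse a document, rename its elements, and serialize the result."""
    root = translate(ET.fromstring(xml_text), tag_map)
    return ET.tostring(root, encoding="unicode")
```

For instance, translate_document('<client id="7"><fullname>Ann</fullname><town>Ithaca</town></client>') yields the same content expressed with customer, name, and city elements.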

In addition to the vocabulary choices, new protocols that use XML over a network are emerging. XML-RPC uses XML to express remote procedure calls made between computers on a network, and those procedures may be performed in any environment on any computer on a network that can parse the XML-RPC information and respond appropriately. The Extensible Protocol (XP), under development at the IETF, provides two-channel communication over a network connection, again using XML as the core vocabulary for managing transactions. At the same time, plain-vanilla Hypertext Transfer Protocol (HTTP) provides a well-supported conduit for XML information.
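The XML-RPC wire format can be seen without any network at all. The sketch below uses Python's standard xmlrpc.client module to encode a call as an XML document and decode it again, which is roughly what the two ends of a connection do. The method name inventory.count and its parameters are hypothetical, and the helper function names are chosen for this example.

```python
import xmlrpc.client


def build_call(method_name, *params):
    """Encode a remote procedure call as an XML-RPC request document."""
    return xmlrpc.client.dumps(params, methodname=method_name)


def read_call(request_xml):
    """Decode an XML-RPC request into (method name, parameters)."""
    params, method_name = xmlrpc.client.loads(request_xml)
    return method_name, params


def build_response(value):
    """Encode a single return value as an XML-RPC response document."""
    return xmlrpc.client.dumps((value,), methodresponse=True)


if __name__ == "__main__":
    # Hypothetical method name and arguments.
    request = build_call("inventory.count", "A-17", 2)
    print(request)             # the XML text that would travel over HTTP
    print(read_call(request))  # what the receiving end recovers
```

Because the request is ordinary XML text, any environment that can parse it and send back a response document can serve the call, whatever language the caller was written in.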


While developers of all sizes may benefit from the new possibilities XML has created for data exchange, certain categories stand to gain more than others. Organizations that have invested large amounts of time creating integrated systems and branding them may find that they have new competition, and that the heavy investment they made in integration isn't going to pay off forever. Smaller firms that do more integration of other companies' components stand to gain, as components are freed from particular vendors' integration strategies. As more tools become capable of returning information and accepting information in XML format, those tools become more and more interchangeable and therefore compete more directly.

For some large vendors, this promises to become a headache. 'Single-system' approaches are no longer as necessary to ensure reliable operation. Relying on a single vendor's closed integration requires an enormous amount of trust in that vendor, and even using components from multiple vendors within a common framework requires trust in that framework. As open source developments provide new frameworks that can be readily explored, and XML (and standards associated with it) provides a tool for exchanging information within those frameworks, trusting a vendor to provide the integration as well as the components may no longer be such a sound strategy. Shorn of their control over the framework, many of these vendors may have to focus on their components, now reduced to interchangeable commodities, to compete. These vendors may want to resist the siren call of XML, but that is difficult, as wave after wave of announcements has brought new products - especially databases - into the XML fold.

Smaller developers and integrators, on the other hand, stand to gain tremendously. By opening up the integration process, XML technologies greatly expand the catalog of choices available to these groups. The twin foundations of XML and TCP/IP networking make it possible to connect applications running in any environment on any platform on the network. It may not always be the most efficient route, but it opens up enormous possibilities. At the same time, the cost of building components declines substantially, especially general purpose components. XML parsers are commodities, typically available for free with open source licensing. Small components that do one thing and do it well can be connected together into more complex structures, taking advantage of XML's clear structures to provide new capabilities.

Small developers need to remember, however, where these capabilities are coming from, and recognize that they have very little voice in the vendor-dominated consortium that controls XML. The W3C seems to be slowly piling up disconnected layers on top of the standard, making interoperability more difficult. Even within the XML specification itself, a number of issues make it difficult to exchange XML information reliably when the same document is read by two different parsers, possibly creating roadblocks to efficient development. XML is not yet well entrenched, and all development focused on XML still carries significant risk, despite the enormous amount of hype surrounding the standard. A white knight for smaller developers has appeared, but may yet be stuck in the mud. Pulling that knight out of the mud is going to take some work, but the benefits may prove enormous.

Comments? Suggestions?

Please contact Simon St.Laurent

Copyright 1999 Simon St.Laurent