Converting from HTML to XML

Update: While I'm continuing to update my site, XHTML has arisen. For more on the W3C's XMLized HTML, see this list of XHTML links or the XHTML-L mailing list.

September 18, 1998.

Well, the top level is now well-formed. Mostly it meant tracking down missing end tags and adding the '/' to empty tags. Per John Cowan's advice, I used a space in front of the / (<BR />) and it seems to work okay in the browsers I've tested so far.
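One quick way to confirm that a page has reached this state is to hand it to any XML parser and see whether it complains. The sketch below uses Python's standard xml.etree.ElementTree purely as an illustration; it is not the tool used on this site, and the function name is_well_formed and the sample fragments are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML habits: an unclosed <P> and an empty <BR> with no slash.
html_style = "<BODY><P>First line<BR>second line</BODY>"

# The same fragment with the end tag added and the empty tag closed,
# using the space-before-slash form described above.
xml_style = "<BODY><P>First line<BR />second line</P></BODY>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

A check like this only covers well-formedness; it says nothing about whether the page is valid against a DTD.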

Eventually, I intend to make the entire site (minus a few large and highly impractical sections that are purely historical) validate using John Cowan's Itsy Bitsy Teeny Weeny Simple Hypertext (IBTWSH) DTD, supplemented with some additional document structures.

I've tested the pages with Netscape Navigator 3.0 and 4.06, as well as Internet Explorer 3.02 and 4.01, all on Windows 95 and NT. On the Macintosh, Internet Explorer 3.01 and Netscape Navigator 4.01 seem to like it. Lynx 2.8 also seems to like the pages, for the most part. (It's time to build a NOFRAMES element.)

One big suggestion that sounds promising is to use Dave Raggett's TIDY, which adds missing end tags and does a number of other good things, especially with the -asxml option. I'll see how it does and report back.

September 21, 1998.

It occurs to me that I should explain why I'm undertaking this seemingly pointless exercise before encouraging everyone else to leap into XML syntax with an HTML vocabulary.

This approach undoubtedly isn't what most people have in mind for 'XML-over-the-Web'. It's a small first step, even a baby step. Still, it's a worthwhile exercise in these very early days: getting HTML developers used to the syntactic constraints of XML is an important part of getting the word out. Once we have real browser support for generic XML, we can start developing more meaningful vocabularies.

Using XML syntax with an HTML vocabulary does a number of important things:

I'm not sure this is as compelling as I'd like it to be, but, like I said, it's a start, and it prepares me for later moves to my own XML vocabularies when browser support becomes available.

October 4, 1998.

It's been a little while, but I've been busy finishing a book. Time to get back to the site and continue the cleanup. The next steps in this project are adding an XML declaration to the pages, developing the necessary extensions to the IBTWSH DTD, and building a style sheet. Fortunately, I've already developed most of a style sheet for IBTWSH as part of Building XML Applications. The style sheet isn't really necessary for HTML browsers to display these documents - they do, after all, effectively have built-in styles for HTML elements - but it will allow XML+CSS2 viewers and editors to display the document without needing to know anything about the semantics of HTML itself.
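The first of those steps, adding an XML declaration, is mechanical enough to script. What follows is only a rough sketch of the idea, not necessarily how the pages here will be converted; the function name add_xml_declaration and the sample page are hypothetical.

```python
XML_DECLARATION = '<?xml version="1.0"?>'


def add_xml_declaration(page: str) -> str:
    """Prepend an XML declaration unless the page already starts with one."""
    # The declaration has to be the very first thing in the document,
    # so only an exact match at position 0 counts as "already present".
    if page.startswith("<?xml"):
        return page
    return XML_DECLARATION + "\n" + page


# Hypothetical page that is already well-formed.
page = "<HTML><BODY><P>Hello<BR />world</P></BODY></HTML>"
print(add_xml_declaration(page))
```

Running the function twice on the same page leaves it unchanged after the first pass, which makes it safe to apply across a whole directory of files.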

The approach I'd like to see for the next generation of HTML would actually remove most direct processing of the HTML, and replace it with either generic style sheet-based processing (i.e., <B>=font-weight:bold) or a plug-in model for browsers. CSS2 seems capable of handling everything in the HTML vocabulary except frames, which may be worth handling as a special case.
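To make the <B>=font-weight:bold idea concrete, here is a minimal sketch that treats element names as nothing more than keys into a table of CSS declarations and writes the table out as a style sheet. Only the B entry comes from the text above; the other entries, the STYLE_RULES name, and the to_css function are assumptions made up for illustration, not a proposed standard mapping.

```python
# Hypothetical table mapping HTML element names to CSS declarations.
# Only the B rule is taken from the text; I and P are illustrative guesses.
STYLE_RULES = {
    "B": {"font-weight": "bold"},
    "I": {"font-style": "italic"},
    "P": {"display": "block", "margin-top": "1em", "margin-bottom": "1em"},
}


def to_css(rules: dict) -> str:
    """Render the element-to-declarations table as CSS rules, one per line."""
    lines = []
    for element, declarations in rules.items():
        body = "; ".join(
            f"{prop}: {value}" for prop, value in declarations.items()
        )
        lines.append(f"{element} {{ {body} }}")
    return "\n".join(lines)


print(to_css(STYLE_RULES))
```

The point of the sketch is that nothing in it knows what B or P mean; a viewer driven by such a table would render any vocabulary the same way, given a suitable style sheet.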