DOCTYPE replacement using a Java FilterReader/FilterStream


DOCTYPEChanger and DOCTYPEChangerStream are Java classes that let you change a document's DOCTYPE declaration as the document is being read into an XML parser. This may be useful in several cases:

These classes let you set the root element, public identifier, system identifier, and internal subset independently, and let you specify whether an existing DOCTYPE declaration should be replaced or left in place.
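As a rough illustration of how those four pieces fit together, here is a minimal sketch that assembles a DOCTYPE declaration from them. The class and method names (DoctypeTextSketch, build) are hypothetical and are not the API of DOCTYPEChanger; the sketch only shows the XML syntax that results when some of the values are left unset.

```java
/** Hypothetical helper for illustration only; not part of DOCTYPEChanger. */
final class DoctypeTextSketch {

    /**
     * Assembles a DOCTYPE declaration. Any argument except root may be null,
     * which omits that part. Identifiers are assumed not to contain
     * double quotes; no escaping or validation is attempted.
     */
    static String build(String root, String publicId, String systemId, String internalSubset) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(root);
        if (publicId != null) {
            // XML 1.0 requires a system identifier after a public identifier.
            if (systemId == null) {
                throw new IllegalArgumentException("PUBLIC needs a system identifier too");
            }
            sb.append(" PUBLIC \"").append(publicId).append("\" \"").append(systemId).append('"');
        } else if (systemId != null) {
            sb.append(" SYSTEM \"").append(systemId).append('"');
        }
        if (internalSubset != null) {
            sb.append(" [").append(internalSubset).append(']');
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(build("html",
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
                null));
        System.out.println(build("doc", null, "doc.dtd", "<!ENTITY e \"x\">"));
    }
}
```

Leaving the public identifier unset produces a SYSTEM declaration, and leaving both identifiers unset produces a declaration that carries only the root element name and, optionally, an internal subset.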

This code is very simple - it doesn't extract information from the existing DOCTYPE, check for the correct root element, or anything like that. It either adds a new DOCTYPE when there isn't one present, blots out the old DOCTYPE, or leaves the old DOCTYPE alone.
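The three behaviors described above (add, replace, leave alone) can be sketched with a small FilterReader subclass. This is a simplified illustration written for this page's description, not the DOCTYPEChanger source: the class name DoctypeSwapSketch, its constructor arguments, and the rewrite helper are all hypothetical. It buffers the prolog up to the root element's start tag, edits that buffer, and then hands the rest of the document through untouched.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch; skip(), mark() and reset() are not handled. */
class DoctypeSwapSketch extends FilterReader {
    private final StringBuilder head = new StringBuilder(); // buffered prolog, possibly rewritten
    private int pos = 0;                                    // next unread char in head

    DoctypeSwapSketch(Reader in, String newDoctype, boolean replace) throws IOException {
        super(in);
        int start = -1, end = -1, rootAt = -1; // old DOCTYPE span; index of root's '<'
        int c;
        while ((c = next()) != -1) {
            if (c != '<') continue;
            int lt = head.length() - 1;
            c = next();
            if (c == '?') {
                skipPast("?>");                       // XML declaration or processing instruction
            } else if (c == '!') {
                if (next() == '-') skipPast("-->");   // comment
                else { start = lt; skipDoctype(); end = head.length(); } // assumed to be a DOCTYPE
            } else {
                rootAt = lt;                          // root element start tag (or truncated input)
                break;
            }
        }
        if (rootAt < 0) rootAt = head.length();
        if (start >= 0) {
            if (replace) head.replace(start, end, newDoctype); // blot out the old DOCTYPE
        } else {
            head.insert(rootAt, newDoctype + "\n");            // none present: add one before the root
        }
    }

    private int next() throws IOException {
        int c = in.read();
        if (c != -1) head.append((char) c);
        return c;
    }

    private void skipPast(String marker) throws IOException {
        while (next() != -1) {
            int n = head.length(), m = marker.length();
            if (n >= m && head.substring(n - m).equals(marker)) return;
        }
    }

    // Reads to the '>' that closes the DOCTYPE, ignoring any '>' inside [ ... ].
    // Limitation: a '[', ']' or '>' inside a quoted literal will confuse this.
    private void skipDoctype() throws IOException {
        boolean inSubset = false;
        int c;
        while ((c = next()) != -1) {
            if (c == '[') inSubset = true;
            else if (c == ']') inSubset = false;
            else if (c == '>' && !inSubset) return;
        }
    }

    @Override public int read() throws IOException {
        return pos < head.length() ? head.charAt(pos++) : in.read();
    }

    @Override public int read(char[] buf, int off, int len) throws IOException {
        if (pos >= head.length()) return in.read(buf, off, len);
        int n = Math.min(len, head.length() - pos);
        head.getChars(pos, pos + n, buf, off);
        pos += n;
        return n;
    }

    /** Convenience for demos: runs the filter over an in-memory document. */
    static String rewrite(String xml, String newDoctype, boolean replace) {
        try (Reader r = new DoctypeSwapSketch(new StringReader(xml), newDoctype, replace)) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[64];
            for (int n; (n = r.read(buf, 0, buf.length)) != -1; ) out.append(buf, 0, n);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<?xml version=\"1.0\"?>\n<doc/>",
                "<!DOCTYPE doc SYSTEM \"doc.dtd\">", true));
    }
}
```

In real use the wrapped reader would be handed to the parser, for example through an org.xml.sax.InputSource constructed from the Reader. Like the code described above, the sketch never inspects the contents of the old DOCTYPE or checks the root element name; it only finds where the declaration starts and ends.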

This code came out of my experience writing O2KCleaner, a similar filter that does a lot more massaging of Office 2000 HTML documents. Nigel Whitaker adapted my FilterReader-based implementation into a FilterStream-based one, which makes it easier to change the DOCTYPE when the encoding of the document is not yet known.

DOCTYPEChanger and DOCTYPEChangerStream are distributed under the Mozilla Public License (MPL), version 1.1. Feedback and contributions are welcome. Contributions will be acknowledged.

Thanks to Nigel Whitaker for extending the code to Streams, and thanks to Elliotte Rusty Harold for his fine book Java I/O, which made this adventure possible. (David Flanagan's Java in a Nutshell was helpful for test code as well.)


Download here:

Last updated 11/18/00