Text plus context
Labeled structured content
XML provides labels placed directly into textual content, and these labels define structure. Text processors can pick up labels easily, but tracking structure is much harder. And namespaces? Ouch.
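Most XML parsers report slightly different textual content to the application than appeared in the document: line endings are normalized, whitespace in attribute values is normalized, and entity references are expanded. XML 1.0 blesses these changes, but preserving the original text may be useful in many contexts.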
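A small illustration of that gap between the document's text and the reported text, using Python's standard `xml.etree.ElementTree` parser. The sample document and the variable names are made up for this sketch; the point is only that what the application receives is not what was written.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a tab inside an attribute value, an entity reference,
# a CDATA section, and a Windows-style line ending in the content.
source = '<note kind="a\tb">Fish &amp; chips<![CDATA[ <raw> ]]>\r\nend</note>'

root = ET.fromstring(source)

# What the parser hands to the application.
reported_text = root.text
reported_attr = root.get("kind")

print(repr(source))
print(repr(reported_text))
print(repr(reported_attr))
```

In the reported values the `&amp;` has become `&`, the CDATA markers are gone, the `\r\n` has become `\n`, and the tab in the attribute has become a space. None of that is a bug, but the original characters can no longer be recovered from what the parser reports.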
Striking a balance
Text processors are too dumb to understand XML structures, while XML processors are pretty uncaring about the original text of XML documents. The range in between, where a tool knows about structure but keeps the original text, may be useful.
The Ripper processor tears documents into their component pieces, providing text plus context, without leaping all the way to XML 1.0 processing.
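The idea of "text plus context" can be sketched in a few lines of Python. This is not the Ripper processor's actual code or API; the `rip` function, the `PIECE` pattern, and the slash-separated context path are all hypothetical names chosen for illustration. The sketch splits a document into raw pieces, never alters them, and pairs each piece with the stack of open element names around it.

```python
import re

# Hypothetical piece pattern: comments, CDATA sections, processing
# instructions, other markup, or runs of text. Assumes no ">" appears
# inside attribute values, which real documents do not guarantee.
PIECE = re.compile(
    r"<!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>|<[^>]*>|[^<]+",
    re.DOTALL,
)

def rip(source):
    """Yield (context, raw_piece) pairs; raw pieces are copied verbatim."""
    stack = []  # names of the currently open elements
    for match in PIECE.finditer(source):
        piece = match.group(0)
        if piece.startswith(("<!", "<?")):
            # Comment, CDATA, declaration, or processing instruction.
            yield ("/".join(stack), piece)
        elif piece.startswith("</"):
            # End tag: report it inside its element, then close the element.
            yield ("/".join(stack), piece)
            if stack:
                stack.pop()
        elif piece.startswith("<"):
            name = re.match(r"<([^\s/>]+)", piece)
            if name is None:
                # Not a recognizable tag; pass it through untouched.
                yield ("/".join(stack), piece)
                continue
            stack.append(name.group(1))
            yield ("/".join(stack), piece)
            if piece.endswith("/>"):
                stack.pop()  # empty-element tag closes immediately
        else:
            # Plain text, exactly as written (entities and line ends intact).
            yield ("/".join(stack), piece)
```

Called on `'<a x="1">Fish &amp; chips\r\n<b/></a>'`, the sketch reports the text piece still containing `&amp;` and `\r\n`, labeled with the context `a`, and joining all the pieces back together reproduces the original document. It does no well-formedness checking and ignores namespaces, which is the trade-off of stopping short of XML 1.0 processing.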