mod_publisher: DTD-based processing

A DTD constrains what markup is allowed in a document. mod_publisher offers partial support for XML DTDs.

Ideally, we should like to support full validation. However, that is a more expensive operation than is reasonable while processing on the fly. In particular, it would require backtracking, which would immediately break Apache's pipelining. Full validation is acceptable for occasional operations (mod_annot validates in full when a document is edited), but not for the normal serving of all documents. Instead, mod_publisher implements a more limited form of enforcement: it checks only whether each element and attribute is valid according to the selected DTD.

Examples

A few cases where you might apply a DTD are:

HTML fixups
You can eliminate bogus and deprecated HTML or XHTML markup by loading the W3C's XHTML Strict DTD. This works for HTML4 as well as XHTML1, as the elements and attributes are shared between the two. This is similar to the "Asis" option in mod_accessibility.
Customising for a client device
If you are serving a particular client device whose capabilities are known, you can apply a DTD that corresponds exactly to the client's needs. An example of this is the Opera Mobile Accelerator.

How to use it

mod_publisher does not load a DTD specified in the document itself, because loading and parsing a DTD is too expensive to do on every request. Instead, it applies a DTD configured in httpd.conf.

The directive MLDTD is all you need to apply a DTD:

	MLDTD	url-of-dtd-to-use

You control the scope of the MLDTD directive in the usual way, using <VirtualHost>, <Location>, <Directory>, <Files>, or their variants.

MLDTD loads and parses each DTD once, at server startup, so no request pays that cost. It is recommended that you load the DTD from a local copy (a file:// URL), and that you strip out of that copy any external entities that are not relevant to specifying elements and attributes.
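
As a minimal sketch of the scoping and local-loading advice above, the fragment below applies a different DTD to each of two URL spaces. The /strict and /mobile locations, the file paths, and the mobile-profile.dtd file name are all hypothetical, chosen only for illustration. The fragment also assumes that mod_publisher is already loaded and enabled for these requests; only the MLDTD lines are relevant here.

	# Hypothetical paths: local copies of the DTDs, loaded via file:// URLs
	<Location /strict>
		MLDTD	file:///usr/local/apache2/conf/dtd/xhtml1-strict.dtd
	</Location>

	# Hypothetical DTD describing the markup a particular client device accepts
	<Location /mobile>
		MLDTD	file:///usr/local/apache2/conf/dtd/mobile-profile.dtd
	</Location>

Because both DTDs are parsed at startup, adding a second scope like this costs memory and startup time rather than per-request time.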

Directives

Precedence

DTD processing can of course be combined with other processing supported by mod_publisher. Any elements declared as macros or handled by namespace modules take precedence over the DTD, but rewrites will apply only after DTD fixups.