mod_publisher: Text Editing

mod_publisher turns the URL Mapping of mod_proxy_html into a general-purpose text search and replace. Whereas mod_proxy_html applies rewrites to HTML URLs, and in Version 2 extends that to other contexts where a link might occur, mod_publisher extends it further to allow parsing of text wherever it can occur.

Unlike mod_proxy_html, there is no presumption of the rewrites serving any particular purpose - this is entirely up to the user. This means we are potentially parsing all text in a document, which carries a significantly higher overhead than mod_proxy_html's URL rewriting. To deal with this, we provide fine-grained control over what is or isn't parsed, replacing the simple ProxyHTMLExtended with a more general MLRewriteOptions directive.


Reverse Proxy URL mapping

mod_publisher performs the same URL mapping as mod_proxy_html:

ProxyPass /internal/
ProxyPassReverse /internal/
MLRewriteOptions +urls
MLRewriteRule / /internal/
MLRewriteRule /internal

Reverse Proxy Extended URL mapping

To emulate mod_proxy_html's ProxyHTMLExtended directive, we change the MLRewriteOptions line in the above example to:

MLRewriteOptions +urls +events +cdata

Now, rewriting every time we see a / has bad side-effects when we extend it beyond simple links, so our first MLRewriteRule has to change too. For example:

MLRewriteRule	/	/internal/	ce
MLRewriteRule	([\"\'])/	$1/internal/	hR
MLRewriteRule	url(/	url(/internal/	he

The first rewrite rule is limited by its flags to URLs, which it deals with in full. The second rule catches slashes at the start of a quoted string for scripts and scripting events, using a regexp to remember whether it's a single or double quote, and the h flag to ignore URL attributes, which are dealt with by the previous rule. The third catches the pattern url(/ used in stylesheets, and is limited to CDATA sections by the flags.
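
The quote-remembering behaviour of the second rule can be tried out away from the server. The following is a minimal sketch in Python, not mod_publisher itself: it assumes Python's re module behaves like the module's regexp engine for this simple pattern, and the function name and sample script text are made up for illustration. Python writes the backreference as \1 where the rule uses $1.

```python
import re

# Python stand-in for the second rule:  ([\"\'])/  ->  $1/internal/
# The group captures whichever quote character opened the string.
QUOTED_SLASH = re.compile(r"""(["'])/""")

def remap_quoted_paths(script_text: str) -> str:
    """Prefix /internal/ to any path that starts a quoted string."""
    # \1 puts back the same quote character that was matched.
    return QUOTED_SLASH.sub(r"\1/internal/", script_text)

# Hypothetical script fragment: two quoted paths and one division.
sample = """window.open("/help/index.html"); go('/news/'); ratio = a/b;"""
print(remap_quoted_paths(sample))
```

The slash in a/b is not preceded by a quote, so it is left alone, which is the side-effect the narrower pattern is meant to avoid.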

Site-wide updates

If some detail that appears on thousands of pages across a site changes, you can update it globally with an MLRewriteRule. For example, suppose my phone number changes from 123 456 7890 to 987 654 3210. It appears in the text of many pages, possibly in different formats such as 123-456-7890 or 123.456.7890, so I need a regexp to match it. At the same time I can take the opportunity to standardise the format of the output:

MLRewriteOptions	+characters
MLRewriteRule	123.456.7890	987-654-3210	Rn

Obviously this is unnecessary if the details were included by some means that permits easy site-wide changes, but markup rewriting offers a working fix even when details are included as static text, or from mixed sources.
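
To see why the pattern 123.456.7890 covers all three formats, here is a small Python sketch of the same substitution. It is an illustration only: it assumes Python's re module treats the unescaped dot the way the module's regexp engine does (as "any single character"), and the function name is hypothetical.

```python
import re

# Same pattern as the rule above; each "." matches any one character,
# so a space, a hyphen or a literal dot between the groups all match.
OLD_NUMBER = re.compile(r"123.456.7890")
NEW_NUMBER = "987-654-3210"

def update_phone(text: str) -> str:
    """Replace every spelling of the old number with the new one."""
    return OLD_NUMBER.sub(NEW_NUMBER, text)

for old in ("123 456 7890", "123-456-7890", "123.456.7890"):
    print(old, "->", update_phone(old))
```

Because the dot matches anything, the pattern would also rewrite a string such as 123x456y7890. If that matters, a stricter pattern such as 123[ .-]456[ .-]7890 limits the separators to the three seen in practice.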

Variable Substitution

Authors can define their own variables to appear in document text, and expand them using markup rewrites. Here are three rules, to expand the word COPYRIGHT, the server variable DATE_GMT, and (with mod_setenvif) to create a message based on a property of the request itself.

MLRewriteOptions	+characters
MLRewriteRule	COPYRIGHT "Copyright © 2004, WebThing Ltd" he
SetEnvIf	Request_Protocol	^HTTP/1\.0$	browser="You appear to be using an old browser"
SetEnvIf	Request_Protocol	^HTTP/1\.1$	browser="You appear to be using a modern browser"
MLRewriteRule	BROWSERTEST	"$browser|My protocol detection doesn't recognise you;"	Vhe

Although you can do this, it is not generally recommended where there is another option: amongst other techniques supported by mod_publisher, macros or SSI will usually be more efficient.
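
The BROWSERTEST rule appears to expand $browser to the value set by SetEnvIf, falling back to the text between | and ; when the variable is unset. That reading of the syntax is inferred from the example above rather than taken from a specification, so the following Python sketch is a hypothetical model of the behaviour and not mod_publisher's implementation; the expand function and its regexp are assumptions.

```python
import re

# Hypothetical model of "$name|default;" expansion (assumed semantics).
VARIABLE = re.compile(r"\$(\w+)\|([^;]*);")

def expand(template: str, env: dict) -> str:
    """Substitute env[name] if set, otherwise the default text."""
    return VARIABLE.sub(lambda m: env.get(m.group(1), m.group(2)), template)

template = "$browser|My protocol detection doesn't recognise you;"

# Variable set, as after a matching SetEnvIf line.
print(expand(template, {"browser": "You appear to be using a modern browser"}))

# Variable unset: the default text is used.
print(expand(template, {}))
```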


The main directives you need to use text search-and-replace are MLRewriteOptions and MLRewriteRule.


URL remapping is only fully implemented for HTML/XHTML. For XML you can either use general attribute search-and-replace or delegate to mod_proxy_xml or other namespace handler.


Since text parsing is relatively slow, it is worth keeping MLRewriteOptions to the minimum required. Also for performance reasons, it is more efficient to use macros or (second choice) SSI if you are deliberately implementing a "variable" in HTML or XML.


Text rewriting applies only to text in the original document that is not consumed by any other processing. In particular, when other capabilities are enabled, SSI gets first bite at comments, while both macros and namespace processors process attributes before any rewrite rules are applied.