mod_publisher turns the URL Mapping of mod_proxy_html into a general-purpose text search and replace. Whereas mod_proxy_html applies rewrites to HTML URLs, and in Version 2 extends that to other contexts where a link might occur, mod_publisher extends it further to allow parsing of text wherever it can occur.
Unlike mod_proxy_html, there is no presumption of the rewrites serving any particular purpose - this is entirely up to the user. This means we are potentially parsing all text in a document, which is a significantly higher overhead than mod_proxy_html. To deal with this, we provide fine-grained control over what is or isn't parsed, replacing the simple ProxyHTMLExtended with a more general MLRewriteOptions directive.
mod_publisher performs the same URL mapping as mod_proxy_html:
ProxyPass /internal/ http://internal.example.com/
ProxyPassReverse /internal/ http://internal.example.com/
MLRewriteOptions +urls
MLRewriteRule / /internal/
MLRewriteRule http://internal.example.com /internal
To emulate mod_proxy_html's ProxyHTMLExtended directive in the above example, we use
MLRewriteOptions +urls +events +cdata
Now, rewriting every time we see a / has bad side-effects when we extend it beyond simple links, so our first MLRewriteRule has to change too. For example:
MLRewriteRule / /internal/ ce
MLRewriteRule ([\"\'])/ $1/internal/ hR
MLRewriteRule url(/ url(/internal/ he
The first rewrite rule is limited by its flags to URLs, which it deals with in full. The second rule catches slashes at the start of a quoted string for scripts and scripting events, using a regexp to remember whether it's a single or double quote, and the h flag to ignore URL attributes, which are dealt with by the previous rule. The third catches the pattern url(/ used in stylesheets, and is limited to CDATA sections by the flags.
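To see what the second rule's regexp does, here is a minimal sketch in Python rather than in the module itself. The function name and sample script are illustrative assumptions; only the pattern and replacement are taken from the rule above, with Python's \1 standing in for $1.

```python
import re

# Python stand-in for the second rule: ([\"\'])/  ->  $1/internal/
# The group remembers which quote character opened the string.
QUOTED_SLASH = re.compile(r"""(["'])/""")

def rewrite_quoted_paths(script: str) -> str:
    """Insert /internal after a quote that is immediately followed by a slash."""
    return QUOTED_SLASH.sub(r"\1/internal/", script)

if __name__ == "__main__":
    sample = """location.href = '/home'; load("/js/app.js"); x = a / b;"""
    print(rewrite_quoted_paths(sample))
```

The bare slash in `a / b` is left alone because no quote precedes it, which is the side-effect the unqualified first rule would otherwise cause in script text.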
If some detail that appears on thousands of pages across a site changes, you can update it globally with an MLRewriteRule. For example, suppose my phone number changes from 123 456 7890 to 987 654 3210. I have it in the text of many pages, and it might be in different formats such as 123-456-7890 or 123.456.7890, so I need to use a regexp to match it, but I can take the opportunity to standardise the format of the output:
MLRewriteOptions +characters
MLRewriteRule 123.456.7890 987-654-3210 Rn
Obviously this is unnecessary if the details were included by some means that permits easy site-wide changes, but markup rewriting offers a working fix even when details are included as static text, or from mixed sources.
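The phone-number pattern can be tried outside the server. The following is a rough Python sketch, assuming the module's regexp treats "." as "any single character" in the usual way; the function name and sample strings are ours.

```python
import re

# Same pattern as the rule above; each "." matches any one character,
# so a space, hyphen or dot between the digit groups all match.
OLD_NUMBER = re.compile(r"123.456.7890")

def update_phone(text: str) -> str:
    """Replace any separator variant of the old number with the new format."""
    return OLD_NUMBER.sub("987-654-3210", text)

if __name__ == "__main__":
    for sample in ("Call 123 456 7890", "Call 123-456-7890", "Call 123.456.7890"):
        print(update_phone(sample))
```

Because "." is unrestricted, the pattern also matches strings such as 123x456y7890. If that matters, a tighter pattern such as 123[ .-]456[ .-]7890 limits the separators to the three formats mentioned.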
Authors can define their own variables to appear in document text, and expand them using markup rewrites. Here are three rules, to expand the word COPYRIGHT, the server variable DATE_GMT, and (with mod_setenvif) to create a message based on a property of the request itself.
MLRewriteOptions +characters
MLRewriteRule COPYRIGHT "Copyright © 2004, WebThing Ltd" he
MLRewriteRule DATE_GMT $DATE_GMT; Vhe
SetEnvIf Request_Protocol ^HTTP/1\.0$ browser="You appear to be using an old browser"
SetEnvIf Request_Protocol ^HTTP/1\.1$ browser="You appear to be using a modern browser"
MLRewriteRule BROWSERTEST "$browser|My protocol detection doesn't recognise you;" Vhe
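The variable forms in these rules can be modelled roughly as follows. This is a sketch under an assumption, not the module's code: we read $name; as "the value of name" and $name|text; as "the value of name, or text if name is unset", which is inferred from the example rather than taken from a specification. The treatment of an unset variable with no fallback (empty string) is likewise a guess.

```python
import re

# Hypothetical model of "$name;" and "$name|fallback;" in a replacement string.
VAR = re.compile(r"\$(\w+)(?:\|([^;]*))?;")

def expand(template: str, env: dict) -> str:
    """Expand variable references in template using values from env."""
    def substitute(match):
        name, fallback = match.group(1), match.group(2)
        if name in env:
            return env[name]
        # Assumption: an unset variable with no fallback expands to nothing.
        return fallback if fallback is not None else ""
    return VAR.sub(substitute, template)

if __name__ == "__main__":
    rule = "$browser|My protocol detection doesn't recognise you;"
    print(expand(rule, {"browser": "You appear to be using a modern browser"}))
    print(expand(rule, {}))
```

With browser set by one of the SetEnvIf lines, the first call yields that message; with neither line matching, the second call falls back to the text after the bar.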
The main directives you need for text search-and-replace are MLRewriteOptions and MLRewriteRule.
Since text parsing is relatively slow, it is worth keeping MLRewriteOptions to the minimum required. Also for performance reasons, it is more efficient to use macros or (second choice) SSI if you are deliberately implementing a "variable" in HTML or XML.
Text rewriting applies only to text in the original document that is not consumed by any other processing. In particular, when other capabilities are enabled, SSI gets first bite at comments, while both macros and namespace processors process attributes before any rewrite rules are applied.