mod_proxy_html: history

In view of the increasing popularity of this module, we're maintaining reasonably strict version control.

Origins

mod_proxy_html is based on one specific capability of mod_accessibility, namely that of rewriting HTML links so that they don't break in a reverse proxy. The problem arises when a proxied page contains links such as <a href="http://private-address.example.com/">, where the server private-address.example.com is reachable only through the proxy (for example, because its address is not valid beyond a private network). mod_proxy_html rewrites such URLs into the proxy's own namespace.
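To make the idea concrete, here is a minimal sketch of that kind of rewriting in Python. It is an illustration only, not the module's implementation: the module parses the markup, whereas this sketch uses a regular expression for brevity. The names URL_MAP and rewrite_links and the /private/ path in the proxy's namespace are hypothetical, chosen for this example.

```python
import re

# Hypothetical mapping: backend URL prefix -> path in the proxy's namespace.
# The "/private/" path is an assumption made for this example.
URL_MAP = {
    "http://private-address.example.com/": "/private/",
}

# Match double-quoted href/src attribute values (a deliberate simplification;
# a real implementation would use an HTML parser rather than a regex).
ATTR_RE = re.compile(r'(\b(?:href|src)\s*=\s*")([^"]*)(")', re.IGNORECASE)


def rewrite_links(html, url_map):
    """Replace backend URL prefixes in link attributes with proxy paths."""

    def fix(match):
        url = match.group(2)
        for backend, proxied in url_map.items():
            if url.startswith(backend):
                # Keep the remainder of the URL, swap only the prefix.
                url = proxied + url[len(backend):]
                break
        return match.group(1) + url + match.group(3)

    return ATTR_RE.sub(fix, html)


page = '<a href="http://private-address.example.com/docs/index.html">Docs</a>'
print(rewrite_links(page, URL_MAP))
```

Links that do not start with a mapped prefix pass through unchanged, so pages that mix internal and public URLs are not damaged.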

Version 1

The original mod_proxy_html was introduced in 2003, and served to rewrite URLs in HTML and XHTML. At first it seemed a trivial derivative of mod_accessibility. However, it attracted a good deal of interest, and in January 2004 I wrote a tutorial on reverse proxying, published at ApacheWeek. The tutorial deals with the problem in some detail, including but not limited to the use of mod_proxy_html.

Version 1.1 followed in March 2004 and added a capability to make some minor fixups to broken HTML.

Version 2

Version 2 adds several frequently requested capabilities: most importantly, remapping of URLs within JavaScript and CSS, and better detection of character encoding from backend servers. In view of the increased complexity, verbose logging is also available to help with your configuration.

Development of Version 2 has been supported by sponsorship from Swisscom IT Services AG and a contribution from Cowles Library.

Version 2.0 (not publicly released)
Version 2.0 introduced the extended remapping capabilities, support for HTML <META>, and charset detection using all possible methods (HTTP rules, XML rules, HTML rules).
Version 2.1 (July 2004)
Version 2.1 introduced verbose logging for debug/diagnostics, and was the first published 2.x version.
Version 2.2 (July 2004)
Version 2.3 (September 2004)
These were bugfix releases. Under the current numbering scheme, they would have been 2.1.x increments. Version 2.3 also introduced a server token.
Version 2.4 (September), 2.4.1 (October), 2.4.2, 2.4.3 (November) 2004
Version 2.4 made the CDATA buffer size configurable, and corrected the list of supported JavaScript events to include ad-hoc events. Version 2.4.1 is an important bugfix. Version 2.4.2 is a very minor performance enhancement. Version 2.4.3 adds further diagnostics when in verbose mode.
Version 2.5.0 (August), 2.5.1 (September) 2005
For users of Apache httpd/2.0.x, version 2.5.0 is unchanged from version 2.4.3, but it adds support for compiling with httpd versions 2.1 and 2.2. Version 2.5.1 eliminates compiler warnings introduced in 2.5.0, and adds a compile option for using old libxml2 versions.
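
The charset detection introduced in Version 2.0 draws on three sources: HTTP rules, XML rules and HTML rules. The sketch below shows one way such a lookup could be layered, in Python. It is illustrative only: the function name detect_charset, the order of precedence (HTTP header, then XML declaration, then HTML META) and the iso-8859-1 fallback are assumptions for this example, not a description of the module's code.

```python
import re


def detect_charset(content_type, body, default="iso-8859-1"):
    """Guess a document's charset; precedence and default are assumptions."""
    # 1. HTTP rule: charset parameter of the Content-Type header.
    if content_type:
        m = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
        if m:
            return m.group(1).lower()

    # Only the start of the document is inspected; decode leniently as ASCII.
    head = body[:1024].decode("ascii", errors="replace")

    # 2. XML rule: encoding pseudo-attribute of the XML declaration.
    m = re.match(r'\s*<\?xml[^>]*\bencoding\s*=\s*["\']([\w.:-]+)["\']', head)
    if m:
        return m.group(1).lower()

    # 3. HTML rule: charset given in a META element.
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE)
    if m:
        return m.group(1).lower()

    return default


print(detect_charset("text/html; charset=UTF-8", b"<html></html>"))
```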

Upgrading to mod_proxy_html 2

mod_proxy_html 2 requires a reasonably up-to-date libxml2. If you get compile errors, either update your libxml2 installation or compile with -DUSE_OLD_LIBXML2. The latter makes no difference to the processing of well-formed markup, but reduces the module's ability to recover from badly broken markup. Do not use libxml2 versions older than 2.5.10, as these have a bug that can severely degrade mod_proxy_html's performance when parsing large documents.

Users of earlier versions of mod_proxy_html can use this as a drop-in replacement. The only change you should need to make is to add a ProxyHTMLDocType directive if you were previously relying on the default.

Registered users may request binaries for any available platform: there is no charge to upgrade from earlier versions. Since April 2004, binaries have been supplied to registered users for Linux, FreeBSD, Windows, Solaris and Mac OS X. If you need a binary for another platform, please ask.

mod_proxy_xml

A companion module, mod_proxy_xml, was introduced in October 2004. It serves the same purpose as mod_proxy_html for XML document types (including XHTML but not HTML), and additionally supports XML namespaces. It is trivially extensible to non-HTML document types; for example, WML support is provided. It is simpler than mod_proxy_html because it uses mod_xmlns as its parser, and so only needs to implement the event handlers itself.