mod_proxy_html: history

In view of the increasing popularity of this module, we're maintaining reasonably strict version control.

Origins

mod_proxy_html is based on one specific capability of mod_accessibility, namely that of rewriting HTML links so that they don't break in a reverse proxy. The problem arises when a proxied page contains links such as <a href="http://private-address.example.com/"> and the server private-address.example.com can itself be reached only through the proxy (for example, because its address is not valid beyond a private network). mod_proxy_html rewrites such URLs into the proxy's own namespace.
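
As a rough illustration of the rewriting described above, a reverse-proxy configuration might look something like the following. This is a minimal sketch, not a recommended setup: the /app/ mount point is a hypothetical path chosen for the example, and the proxy-html filter name is the one used by Versions 2 and 3.

```apache
# Hypothetical mount point: the backend is published under /app/ on the proxy.
ProxyPass        /app/ http://private-address.example.com/
ProxyPassReverse /app/ http://private-address.example.com/

<Location /app/>
    # Pass proxied responses through mod_proxy_html.
    SetOutputFilter proxy-html
    # Rewrite links that point at the backend into the proxy's own namespace.
    ProxyHTMLURLMap http://private-address.example.com/ /app/
</Location>
```

With a mapping of this kind, a link to http://private-address.example.com/page.html in a proxied page would be served to the client as /app/page.html.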

Version 1

The original mod_proxy_html was introduced in 2003, and served to rewrite URLs in HTML and XHTML. At first it seemed a trivial derivative of mod_accessibility. However, it attracted a good deal of interest, and in January 2004 I wrote a tutorial on reverse proxying, which was published at ApacheWeek. The tutorial deals with the problem in some detail, including but not limited to the use of mod_proxy_html.

Version 1.1 followed in March 2004 and added a capability to make some minor fixups to broken HTML.

Version 2

Version 2 adds several frequently requested capabilities: most importantly, remapping of URLs within JavaScript and CSS, and better detection of character encoding from backend servers. In view of the increased complexity, verbose logging is also available to help with your configuration.
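
By way of example, the extended remapping and verbose logging might be switched on along these lines. This is a sketch only: the /app/ location and the URL map are hypothetical, and the directive names are given as they appear in the Version 2 documentation.

```apache
<Location /app/>
    SetOutputFilter proxy-html
    # Hypothetical mapping from a backend address to the proxy's namespace.
    ProxyHTMLURLMap http://private-address.example.com/ /app/
    # Also apply the URL maps inside scripts and stylesheets.
    ProxyHTMLExtended On
    # Log extra diagnostics while the configuration is being worked out.
    ProxyHTMLLogVerbose On
</Location>
```

Verbose logging is normally something to turn off again once the mappings behave as intended.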

Development of Version 2 has been supported by sponsorship from Swisscom IT Services AG and a contribution from Cowles Library.

Version 2.0 (not publicly released)
Version 2.0 introduced the extended remapping capabilities, support for HTML <META>, and charset detection using all possible methods (HTTP rules, XML rules, HTML rules).
Version 2.1 (July 2004)
Version 2.1 introduced verbose logging for debug/diagnostics, and was the first published 2.x version.
Version 2.2 (July 2004)
Version 2.3 (September 2004)
These were bugfix releases. Under current numbering, they would have been 2.1.x increments. Version 2.3 also introduced a server token.
Version 2.4 (September), 2.4.1 (October), 2.4.2, 2.4.3 (November) 2004
Version 2.4 made the CDATA buffer size configurable, and corrected the list of supported JavaScript events to include ad-hoc events. Version 2.4.1 is an important bugfix. Version 2.4.2 is a very minor performance enhancement. Version 2.4.3 adds further diagnostics when in verbose mode.
Version 2.5.0 (August), 2.5.1 (September) 2005, 2.5.2 (April 2006)
For users of Apache httpd 2.0.x, Version 2.5.0 is unchanged from Version 2.4.3, but it adds support for compiling with httpd versions 2.1 and 2.2. Version 2.5.1 eliminates compiler warnings introduced in 2.5.0, and adds a compile option for using old libxml2 versions. Version 2.5.2 fixes a bug that had led to segfaults with Apache 2.2 on some platforms.

Upgrading to mod_proxy_html 2

mod_proxy_html 2 requires a reasonably up-to-date libxml2. If you get compile errors, either update your libxml2 installation or compile with -DUSE_OLD_LIBXML2. The latter will make no difference to the processing of well-formed markup, but will affect the module's ability to recover from badly broken markup. Do not use libxml2 versions older than 2.5.10, as these have a bug that can have a severe impact on mod_proxy_html's performance when parsing large documents.

Users of earlier versions of mod_proxy_html can use this as a drop-in replacement. The only thing you should have to change is to add a ProxyHTMLDocType directive if you were correctly using the default before.
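
For illustration, the added directive might look like the following. Which doctype is right for your pages is an assumption here, so treat the chosen value as a placeholder.

```apache
# Declare output as HTML 4.01 (append "Legacy" for the Transitional variant).
ProxyHTMLDocType HTML

# Alternatively, declare output as XHTML 1.0:
# ProxyHTMLDocType XHTML
```

The value also determines whether HTML or XHTML syntax is used for the output.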

Registered users may request binaries for any available platform: there is no charge to upgrade from earlier versions. Since April 2004, binaries have been supplied to registered users for Linux, FreeBSD, Windows, Solaris and Mac OS X. If you need it for another platform, please ask.

Version 3

A development version incorporating the first new capabilities since 2004 was first published as 3.0-dev in December 2006. Problems with it remained until June 2007, when several bugs were fixed. Since then, users who previously experienced problems have reported it running well, so the first public 3.0 release was announced at the end of July 2007.

Version 3.0.0 (July 2007)
The first public Version 3 release enables the current capabilities. Version 3.0.1 was a bugfix release.
Version 3.1.0 (Originally about end-2007, "current release" 2009).
This version is a simplification of 3.0. The internationalisation code is replaced by hooks into mod_xml2enc, a more generic module that deals with i18n on behalf of mod_proxy_html and other modules using libxml2 to filter HTML and/or XML. This brings some further bugfixes to mod_proxy_html users.
Version 3.1.1 (October 2009).
This fixes a long-standing bug sufficiently serious to make ProxyHTMLMeta unusable for some users.
Version 3.1.2 (October 2009).
This was created in response to a note from Steffen at ApacheLounge that version 3.1.1 generated compile errors on the Windows platform. For users of other platforms, there is no need to upgrade from 3.1.1.

Upgrading to mod_proxy_html 3

Users of Version 2 can use Version 3 as a drop-in replacement, with one important change. Whereas Versions 1 and 2 had knowledge of HTML link attributes hardcoded, Version 3 reads this information from httpd.conf at server startup instead. This enables users to support proprietary HTML variants without modifying the code, but it also means that you must yourself define the list of attributes to be rewritten. To help with this, a configuration example proxy_html.conf that defines standard HTML (equivalent to the knowledge hardcoded in Version 2) is included with mod_proxy_html 3.
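
To give a flavour of what such definitions look like, here is a short, abridged sketch in the style of proxy_html.conf. It is not the complete file shipped with the module, and the final widget entry is a purely hypothetical proprietary element added for the example.

```apache
# Elements, and the attributes within them that hold URLs to be rewritten.
ProxyHTMLLinks a      href
ProxyHTMLLinks img    src longdesc usemap
ProxyHTMLLinks form   action
ProxyHTMLLinks link   href
ProxyHTMLLinks script src

# Attributes that hold scripting events.
ProxyHTMLEvents onclick ondblclick onmousedown onmouseup

# Hypothetical proprietary extension: <widget data-src="...">
ProxyHTMLLinks widget data-src
```

In practice it is usually simplest to load the supplied proxy_html.conf with an Include directive and add any local extensions after it.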

You may choose to make further changes to take advantage of the new capabilities of Version 3, but these are not essential.

mod_proxy_xml

A companion module, mod_proxy_xml, was introduced in October 2004. It serves the same purpose as mod_proxy_html for XML document types (including XHTML but not HTML), and supports XML namespaces. It is trivially extensible to non-HTML document types; for example, WML support is provided. It is simpler than mod_proxy_html because it uses mod_xmlns as its parser, and so only needs to implement the event handlers itself.