In view of the increasing popularity of this module, we're maintaining reasonably strict version control.
mod_proxy_html is based on one specific capability of mod_accessibility, namely that of rewriting HTML links so that they don't break in a reverse proxy. The problem arises when links such as <a href="http://private-address.example.com/"> are used in a proxied page, when the server private-address.example.com needs to be proxied (for example, because its address is not valid beyond a private network). mod_proxy_html rewrites such URLs into the proxy's own namespace.
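To make the idea concrete, here is a minimal sketch in Python of what "rewriting into the proxy's own namespace" means. It is an illustration only, not the module's implementation: the `/private/` path, the `URL_MAP` table and both function names are assumptions made up for this example, and the regular expression stands in for the real markup parsing that an output filter would do.

```python
import re

# Hypothetical mapping: backend URL prefix -> path in the proxy's namespace.
URL_MAP = [
    ("http://private-address.example.com/", "/private/"),
]

def rewrite_url(url, url_map=URL_MAP):
    """Return url with the first matching backend prefix replaced."""
    for backend_prefix, proxy_prefix in url_map:
        if url.startswith(backend_prefix):
            return proxy_prefix + url[len(backend_prefix):]
    return url  # not a backend URL: leave it untouched

# Naive matcher for double-quoted href values, for illustration only.
# A real filter parses the markup rather than pattern-matching it.
HREF_RE = re.compile(r'(href=")([^"]*)(")')

def rewrite_links(html, url_map=URL_MAP):
    """Rewrite every double-quoted href value found in html."""
    return HREF_RE.sub(
        lambda m: m.group(1) + rewrite_url(m.group(2), url_map) + m.group(3),
        html,
    )
```

With this sketch, a link to a page on private-address.example.com comes out pointing below `/private/` on the proxy, while links to any other host pass through unchanged.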
The original mod_proxy_html was introduced in 2003, and served to rewrite URLs in HTML and XHTML. At first it seemed a trivial derivative of mod_accessibility. However, it attracted a good deal of interest, and in January 2004 I wrote a tutorial on reverse proxying, which was published at ApacheWeek. The tutorial deals with the problem in some detail, including but not limited to the use of mod_proxy_html.
Version 1.1 followed in March 2004 and added a capability to make some minor fixups to broken HTML.
Version 2 adds several frequently requested capabilities: most importantly, remapping of URLs within JavaScript and CSS, and better detection of character encoding from backend servers. In view of the increased complexity, verbose logging is also available to help you debug your configuration.
Development of Version 2 has been supported by sponsorship from Swisscom IT Services AG and a contribution from Cowles Library.
mod_proxy_html 2 requires a reasonably up-to-date libxml2. If you get compile errors, either update your libxml2 installation or compile with -DUSE_OLD_LIBXML2. The latter will make no difference to processing well-formed markup, but will affect the module's ability to recover from badly broken markup. Do not use libxml2 versions older than 2.5.10, as these have a bug that can have a severe impact on mod_proxy_html's performance when parsing large documents.
Users of earlier versions of mod_proxy_html can use this as a drop-in replacement. The only thing you should have to change is to add a ProxyHTMLDocType directive if you were correctly using the default before.
Registered users may request binaries for any available platform: there is no charge to upgrade from earlier versions. Since April 2004, binaries have been supplied to registered users for Linux, FreeBSD, Windows, Solaris and Mac OS X. If you need it for another platform, please ask.
A development version incorporating the first new capabilities since 2004 was first published as 3.0-dev in December 2006. Problems with it remained until June 2007, when several bugs were fixed. Since then, users who previously experienced problems have reported it running well. So at the end of July 2007, the first 3.0 public release was announced.
Users of Version 2 can use Version 3 as a drop-in replacement, with one important change. Whereas Versions 1 and 2 had knowledge of HTML link attributes hardcoded, Version 3 reads this information from httpd.conf at server startup instead. This enables users to support proprietary HTML variants without modifying the code, but it means you must define the list of attributes to be rewritten yourself. To help with this, a configuration example proxy_html.conf that defines standard HTML (equivalent to the knowledge hardcoded in Version 2) is included with mod_proxy_html 3.
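The difference between a hardcoded table and one read at startup can be sketched as follows. This is a Python illustration under stated assumptions, not the module's code: the line format is modelled on the `ProxyHTMLLinks element attribute ...` form as I understand it, the three sample lines are a small invented subset rather than the contents of proxy_html.conf, and `parse_link_config` and `should_rewrite` are hypothetical names. It does not parse real httpd.conf syntax.

```python
# Sample configuration text (an invented subset, for illustration only).
EXAMPLE_CONF = """
# element followed by the attributes that hold URLs
ProxyHTMLLinks a href
ProxyHTMLLinks img src longdesc usemap
ProxyHTMLLinks form action
"""

def parse_link_config(text):
    """Build {element: set(attributes)} from ProxyHTMLLinks-style lines."""
    links = {}
    for line in text.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop comments, tokenize
        if len(tokens) < 3 or tokens[0] != "ProxyHTMLLinks":
            continue  # blank, comment-only, or unrelated line
        element, attrs = tokens[1].lower(), tokens[2:]
        links.setdefault(element, set()).update(a.lower() for a in attrs)
    return links

def should_rewrite(links, element, attribute):
    """True if this element/attribute pair is configured as holding a URL."""
    return attribute.lower() in links.get(element.lower(), set())

# The table is built once, as a server would do at startup.
LINKS = parse_link_config(EXAMPLE_CONF)
```

Supporting a proprietary element then means adding one more line to the configuration text rather than changing and recompiling the code, which is the trade-off described above.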
You may choose to make further changes to take advantage of the new capabilities of Version 3, but these are not essential.
A companion module, mod_proxy_xml, was introduced in October 2004. It serves the same purpose as mod_proxy_html for XML document types (including XHTML but not HTML), and supports XML namespaces. It is trivially extensible to non-HTML document types; for example, WML support is provided. It is simpler than mod_proxy_html because it uses mod_xmlns as its parser and so only needs to implement the event handlers itself.