mod_xml2enc is a transcoding module that extends the internationalisation support of libxml2-based filter modules by converting the character encoding before the filter runs, after it, or both. Thus an unsupported input charset can be converted to UTF-8, and output can also be converted to another charset if required.
Modules that you might use this with include:
Filter modules must themselves be enabled for mod_xml2enc, that is,
written to call its API. A module should use the xml2enc_charset optional
function to retrieve the charset argument to pass to the libxml2 parser,
and may use the xml2enc_filter optional function to postprocess its
output to another encoding.
For normal operation with an xml2enc-enabled module, it is sufficient to insert mod_xml2enc in the filter chain ahead of any libxml2-based filter. No additional configuration is required. For example, to improve mod_publisher's i18n support for HTML and XML, the following is enough:
FilterProvider iconv xml2enc Content-Type $text/html
FilterProvider iconv xml2enc Content-Type $xml
FilterProvider markup markup-publisher Content-Type $text/html
FilterProvider markup markup-publisher Content-Type $xml
FilterChain iconv markup
mod_publisher will now support any character set supported by libxml2, by apr_xlate/iconv, or by both.
The following configuration directives are available:
Syntax: xml2EncDefault name
This defines the default encoding to assume when absolutely no charset
information is available from the backend server. The default value for
this is ISO-8859-1, as specified in HTTP/1.0 and assumed in
earlier modules.
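As an illustration, a site whose backend emits UTF-8 but never declares it might override the default as sketched below. The choice of UTF-8 here is an assumption about a hypothetical backend, not a value recommended by the module.

```
# Assume UTF-8 when the backend supplies no charset information at all.
# (Illustrative value: use whatever encoding the backend actually emits.)
xml2EncDefault utf-8
```

The setting only matters as a last resort: it applies when no charset information at all is available from the backend.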
Syntax: xml2EncAlias charset alias [alias ...]
This server-wide directive aliases one or more charsets to another charset. An encoding that libxml2 does not recognise can then be handled internally by libxml2's charset support, using the translation table of a charset it does recognise. This serves two purposes: to support character sets (or names) recognised by neither libxml2 nor iconv, and to skip conversion for a charset where it is known to be unnecessary.
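A minimal sketch of the syntax follows. The alias names (x-latin1, x-utf8, x-unicode-utf8) are hypothetical stand-ins for whatever nonstandard labels a backend might send; the first argument on each line is the charset the aliases are mapped to.

```
# Treat the hypothetical label "x-latin1" as ISO-8859-1.
xml2EncAlias ISO-8859-1 x-latin1

# Several aliases can share one line; both hypothetical labels map to UTF-8.
xml2EncAlias UTF-8 x-utf8 x-unicode-utf8
```

Because the directive is server-wide, lines like these would go in the main server configuration.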
Syntax: xml2StartParse element [elt*]
Specify that the markup parser should start at the first instance of any of the elements specified. This can be used where a broken backend inserts leading junk that confuses the parser. It should never be used for XML or for well-formed HTML.
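For instance, if a broken backend writes stray text ahead of the document proper, a configuration along the following lines would tell the parser where to begin. This is a sketch that assumes element names are written bare, without angle brackets; the particular elements chosen are illustrative.

```
# Discard anything the backend sends before the first html or head element.
xml2StartParse html head
```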
There is currently no direct way to configure post-processing. Another module can invoke post-processing to output a desired character set, but a system administrator cannot do so directly.
mod_xml2enc.c source code is available under the Apache License, Version 2.0.