mod_xml2enc

mod_xml2enc is a transcoding module that can be used to extend the internationalisation support of libxml2-based filter modules by converting encoding before and/or after the filter has run. Thus an unsupported input charset can be converted to UTF-8, and output can also be converted to another charset if required.

Modules that you might use this with include:

Usage

Filter modules need to be enabled for mod_xml2enc. Modules should use the xml2enc_charset optional function to retrieve the charset argument to pass to the libxml2 parser, and may use the xml2enc_filter optional function to postprocess to another encoding.

Example

For normal operation with an xml2enc-enabled module, it is sufficient to insert mod_xml2enc in the filter chain ahead of any libxml2-based filter. No additional configuration is required. For example, to use it with mod_publisher to improve the latter's i18n support with HTML and XML, it is sufficient to use


FilterProvider iconv    xml2enc Content-Type $text/html
FilterProvider iconv    xml2enc Content-Type $xml
FilterProvider markup	markup-publisher Content-Type $text/html
FilterProvider markup	markup-publisher Content-Type $xml
FilterChain	iconv markup

mod_publisher will now support any character set supported by either (or both) of libxml2 or apr_xlate/iconv.

Configuration

The following configuration directives are available:

xml2EncDefault

Syntax xml2EncDefault name

This defines the default encoding to assume when absolutely no charset information is available from the backend server. The default value for this is ISO-8859-1, as specified in HTTP/1.0 and assumed in earlier modules.

xml2EncAlias

Syntax xml2EncAlias charset alias [alias ...]

This server-wide directive aliases one or more charset to another charset. This enables encodings not recognised by libxml2 to be handled internally by libxml2's charset support using the translation table for a recognised charset. This serves two purposes: to support character sets (or names) not recognised either by libxml2 or iconv, and to skip conversion for a charset where it is known to be unnecessary.

xml2StartParse

Syntax xml2StartParse element [elt*]

Specify that the markup parser should start at the first instance of any of the elements specified. This can be used where a broken backend inserts leading junk that messes up the parser (example here). It should never be used for XML, nor well-formed HTML.

Post-Processing

There is currently no direct way to configure post-processing. Another module can invoke post-processing to output a desired character set, but a system administrator cannot do so directly.

Availability

mod_xml2enc.c source code is available under the Apache License, Version 2.0.