Whitespace processing is often a key consideration when XML documents are compared. This document outlines the enhancements made to whitespace processing in Release 8.2 of DeltaXML Core.
A more complete description of whitespace processing can be found in the following guides:
'Ignorable whitespace' refers to text nodes that contain only whitespace characters and occur in the XML tree in places where text is not allowed (such nodes are most frequently added for formatting purposes). Core's LexicalPreservation filter can identify such nodes and treat them specially:
As in previous releases, the 'NormalizeSpace' filter will not normalize whitespace-only text nodes if a 'mixed-content' attribute with a value of 'true' is found on the parent element. All other whitespace-only text nodes are treated as 'ignorable whitespace' and removed. Behaviour has been enhanced in the following ways for cases where a DTD or XML Schema for the input XML has not been loaded.
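The baseline rule carried over from previous releases can be illustrated with a minimal Python sketch. This is not DeltaXML's implementation: the function names are hypothetical, the standard-library `xml.dom.minidom` parser stands in for the filter pipeline, and the 'mixed-content' attribute is assumed here to be an un-namespaced attribute with that literal name.

```python
from xml.dom import minidom

XML_WHITESPACE = " \t\r\n"  # the four XML whitespace characters


def is_whitespace_only(node):
    """True for a text node that contains nothing but XML whitespace."""
    return node.nodeType == node.TEXT_NODE and node.data.strip(XML_WHITESPACE) == ""


def strip_ignorable_whitespace(element):
    """Remove whitespace-only text children, except under mixed-content parents.

    Assumption: the parent is flagged with a plain 'mixed-content' attribute.
    """
    keep = element.getAttribute("mixed-content") == "true"
    for child in list(element.childNodes):  # copy, because we remove while iterating
        if child.nodeType == child.ELEMENT_NODE:
            strip_ignorable_whitespace(child)
        elif is_whitespace_only(child) and not keep:
            element.removeChild(child)


doc = minidom.parseString(
    '<doc>\n  <p mixed-content="true"><b>a</b> <i>b</i></p>\n'
    "  <list>\n    <item>x</item>\n  </list>\n</doc>"
)
strip_ignorable_whitespace(doc.documentElement)
result = doc.documentElement.toxml()
```

In this sketch the single space between `<b>` and `<i>` survives because its parent is flagged as mixed content, while the indentation around `<p>`, `<list>` and `<item>` is removed.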
In previous releases, the NormalizeSpace filter reduced any sequence of whitespace characters in significant text content to a single space character unless an xml:space="preserve" attribute was found on an ancestor element. This behaviour has now been extended as follows.
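The earlier behaviour, collapsing whitespace runs unless xml:space="preserve" applies, can be sketched in Python as follows. This is an illustration only, not the NormalizeSpace filter itself: the function names are hypothetical, and letting the nearest xml:space declaration win (so that xml:space="default" switches normalization back on) follows the XML specification and is an assumption about the filter.

```python
import re
from xml.dom import minidom

XML_WHITESPACE = " \t\r\n"
WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def space_preserved(node):
    """True if the nearest xml:space declaration on an ancestor is 'preserve'."""
    parent = node.parentNode
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        value = parent.getAttribute("xml:space")
        if value:
            return value == "preserve"
        parent = parent.parentNode
    return False


def collapse_whitespace(element):
    """Reduce each whitespace run in significant text to a single space."""
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            significant = child.data.strip(XML_WHITESPACE) != ""
            if significant and not space_preserved(child):
                child.data = WHITESPACE_RUN.sub(" ", child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            collapse_whitespace(child)


doc = minidom.parseString(
    '<doc><p>one  \n  two</p><pre xml:space="preserve">a   b</pre></doc>'
)
collapse_whitespace(doc.documentElement)
result = doc.documentElement.toxml()
```

Whitespace-only text nodes are deliberately left untouched here, since the rule above concerns significant text content only.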
As described in the previous sections, the LexicalPreservation and NormalizeSpace filters add information to the input XML to assist with whitespace processing. This information, kept as 'grammar', 'mixed-content' and 'space' attributes, may in some cases be useful for formatting the comparison result (our own 'folding' DiffReport exploits this).
For this reason there is now a new LexicalPreservation 'PreserveContentModel' setting:
* The 'preserve' and 'deltaxml' namespaces are respectively:
The type of comparator used for comparison determines whether whitespace-processing filters are controlled implicitly through comparator properties or by adding the filters explicitly. This is summarised below.
Filters affecting whitespace processing - '✓' indicates implicit control via comparator properties
The whitespace processing changes are fully integrated into the DocumentComparator.
For LexicalPreservation, whitespace processing is managed automatically through LexicalPreservationConfig properties.
Because the NormalizeSpace filter is added to the pipeline explicitly, the 'pre-normalization' filter should be added immediately before it (unless an XML Schema or DTD will always be loaded). The pre-normalization filter is added as the resource XSLT filter 'whitespace-detection.xsl'.
Here, the LexicalPreservation filter is added explicitly. It should therefore be followed immediately by the 'lexical-whitespace.xsl' resource XSLT filter.
The NormalizeSpace filter is also added explicitly. It should therefore be immediately preceded by the 'whitespace-detection.xsl' resource XSLT filter.