Handling Numeric Tolerances
XML is often used to represent engineering, scientific or financial data in which floating point numbers are common. Software that handles floating point numbers usually compares them using tolerances, and this article describes techniques for doing so in conjunction with DeltaXML comparison.
The XSLT filter code used in this article was written based on the DeltaXMLCore 5.3.9 release; it will be included in the samples sub-directory of a future release.
1 Background
The comparator processes well-formed XML which in turn represents numbers as textual XML. It performs textual comparison of PCDATA and therefore will only report numbers being equal if they have the same lexical representation. If different processors or different serialization software is being used to generate the different XML data being compared it is even possible that the 'same' numbers will have different lexical representations (think of '1.0' and '1.00') and therefore be reported as differing. The W3C XML Schema Datatypes, also supported as part of XSLT 2.0, provide facilities for converting, reading and writing floating point numbers. Rather than build complicated datatype facilities and associated mechanics into the comparison engine, we recommend the use of XSLT 2.0 for post-processing delta output to resolve these issues with floating point numbers and their tolerances.
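The difference between lexical and numeric equality can be seen in a few lines of XSLT 2.0. The following is a minimal sketch and is not part of the DeltaXML samples; the result element names are made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Run against any input document, or start at the named template 'main'. -->
  <xsl:template match="/" name="main">
    <results>
      <!-- String comparison: the lexical forms differ, so this is false. -->
      <lexical><xsl:value-of select="'1.0' eq '1.00'"/></lexical>
      <!-- Numeric comparison: both strings convert to the same xs:double, so this is true. -->
      <numeric><xsl:value-of select="xs:double('1.0') eq xs:double('1.00')"/></numeric>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

A textual comparison behaves like the first test, which is why the delta reports '1.0' and '1.00' as a change.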
This article will use a worked example to explain some possible techniques.
2 Example Data
<weather time="12437389"> | <weather time="12437395"> |
The above example is designed to show numeric values used in element content and in attributes. There are some differences in handling and so we'll discuss elements and attributes separately.
3 Element tolerances
When these are compared using the comparator some of these changes are represented in deltaV2 as follows; here is part of the file corresponding to a change in the 'Salisbury' record element containing a floating point number:
<record deltaxml:deltaV2="A!=B">
<place deltaxml:deltaV2="A=B">Salisbury</place>
<temperature deltaxml:deltaV2="A!=B">
<deltaxml:textGroup deltaxml:deltaV2="A!=B">
<deltaxml:text deltaxml:deltaV2="A">19.8</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="B">19.85</deltaxml:text>
</deltaxml:textGroup>
</temperature>
</record>
The numbers represented in the deltaV2 representation of this change are fairly easy to process with XSLT 2.0. Here is a function that we will use later (defined in tolerance-checker.xsl in the sample directory):
<xsl:function name="deltaxml:element-in-tolerance" as="xs:boolean">
<xsl:param name="elem" as="element()"/>
<xsl:param name="tolerance" as="xs:double"/>
<xsl:sequence select="if (exists($elem/@deltaxml:deltaV2) and exists($elem/deltaxml:textGroup[@deltaxml:deltaV2='A!=B'])) then
abs(number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']) -
number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B'])) le $tolerance
else
false()"/>
</xsl:function>
This function, when applied to the temperature element (the first parameter), reports whether the values are within the tolerance (the second parameter). Given this function we can then use an XPath match expression where we know floating point numbers will be used, for example:
<xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
We could implement a template which removed the extra change information at this point and replaced one of the values. However certain output filters which we could use to further process the result expect to deal with well-formed deltas (they assume deltaV2 attributes are accurate for any subtree). The generic ignore changes mechanism is designed to deal with these issues so it makes sense to utilize it. So all that we will do when we detect a change within tolerance is to add an ignore change attribute, using a template based on the identity template (defined in tolerance-checker.xsl in the sample directory):
<xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
<xsl:copy>
<xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
After applying our tolerance detection filter our Salisbury temperature record becomes:
<record deltaxml:deltaV2="A!=B">
<place deltaxml:deltaV2="A=B">Salisbury</place>
<temperature deltaxml:ignore-changes="true" deltaxml:deltaV2="A!=B">
<deltaxml:textGroup deltaxml:deltaV2="A!=B">
<deltaxml:text deltaxml:deltaV2="A">19.8</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="B">19.85</deltaxml:text>
</deltaxml:textGroup>
</temperature>
</record>
The tolerance detection filter is equivalent to the mark changes filter in
our standard ignore changes process. The next filter to apply is
apply-ignore-changes.xsl. This will convert the above record to:
<record deltaxml:deltaV2="A!=B">
<place deltaxml:deltaV2="A=B">Salisbury</place>
<temperature deltaxml:deltaV2="A=B">19.85</temperature>
</record>
This is almost correct, however notice that the deltaV2 attribute on the
record element is incorrectly reporting a change when both child elements are
now unchanged. The propagate-ignore-changes.xsl filter is finally
used to correct this problem:
<record deltaxml:deltaV2="A=B">
<place>Salisbury</place>
<temperature>19.85</temperature>
</record>
4 Using XPaths to identify the numeric values
In the above example we used a template which matched all temperature elements, assuming they would contain a numeric value. More specific XPaths could be used instead, and a single template could handle several kinds of numeric element. Here are some examples which partially illustrate the power of XPath:
temperature
/weather/record/temperature
/weather/record[place='Malvern']/temperature
temperature | pressure | weight
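As a sketch of how the last of these might be combined with the tolerance function from section 3: XSLT 2.0 match patterns do not allow a parenthesised union followed by a predicate, so the predicate is repeated on each alternative. The parameter name numeric_tolerance is hypothetical, and the deltaxml namespace URI is an assumption that should be checked against the declaration in your own delta files.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
  <!-- The deltaxml namespace URI above is assumed; copy it from your delta output. -->

  <!-- Hypothetical parameter: one tolerance shared by all three element types. -->
  <xsl:param name="numeric_tolerance" as="xs:double" select="0.1"/>

  <!-- The element tolerance function as given in section 3 of this article. -->
  <xsl:function name="deltaxml:element-in-tolerance" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="tolerance" as="xs:double"/>
    <xsl:sequence select="
      if (exists($elem/@deltaxml:deltaV2) and
          exists($elem/deltaxml:textGroup[@deltaxml:deltaV2='A!=B'])) then
        abs(number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']) -
            number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B'])) le $tolerance
      else
        false()"/>
  </xsl:function>

  <!-- Identity template: copies everything that is not matched below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One template for several numeric elements; the predicate is repeated per alternative. -->
  <xsl:template match="temperature[deltaxml:element-in-tolerance(., $numeric_tolerance)]
                     | pressure[deltaxml:element-in-tolerance(., $numeric_tolerance)]
                     | weight[deltaxml:element-in-tolerance(., $numeric_tolerance)]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If each element type needs its own tolerance, separate templates with separate parameters are probably easier to maintain than one union pattern.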
5 Attribute Tolerances
The representation of attribute change in deltaV2 is more complicated than that for element content, shown above. Here is how the 'time' attribute used in the example above is represented:
<weather ... deltaxml:deltaV2="A!=B" ...>
<deltaxml:attributes deltaxml:deltaV2="A!=B">
<dxa:time deltaxml:deltaV2="A!=B">
<deltaxml:attributeValue deltaxml:deltaV2="A">12437389</deltaxml:attributeValue>
<deltaxml:attributeValue deltaxml:deltaV2="B">12437395</deltaxml:attributeValue>
</dxa:time>
</deltaxml:attributes>
...
In the input data the attribute would have an XPath of
/weather/@time, however when this is represented in deltaV2 the
XPath becomes /weather/deltaxml:attributes/dxa:time. The reasons
for this change are covered in the deltaV2
documentation but arise from ease of XSLT processing and differences in XML
namespace inheritance rules for attributes and elements requiring the use of
various namespaces. Therefore, to identify and process the attribute change, the
following template is used (defined in tolerance-checker.xsl in the sample directory):
<xsl:template match="/weather/deltaxml:attributes/dxa:time[deltaxml:attribute-in-tolerance(. , xs:double('10.0'))]">
<xsl:copy>
<xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
6 Specifying Tolerances
There are a number of ways in which the tolerances may be specified. Here are some suggestions:
6.1 Fixed values in XSLT
When using the tolerance checking functions it is possible to specify a fixed parameter value, for example:
deltaxml:attribute-in-tolerance(. , xs:double('10.0'))
6.2 Parameterized XSLT
Rather than fixing the value, it can be passed into the filter using a parameter, for example:
<xsl:stylesheet ....>
<xsl:param name="temperature_tolerance" as="xs:double" select="0.1"/>
...
<xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
...
The parameter has a default value; alternatively, a value could be passed in from the invoking code, which could include DXP parameters.
6.3 Annotated instance data
It may be possible to annotate your data with the tolerances, for example:
<temperature tolerance="0.1">21.5</temperature>
The match statement would then become:
<xsl:template match="temperature[deltaxml:element-in-tolerance(. , @tolerance)]">
It may even be possible to use the attribute to identify toleranced numeric data:
<xsl:template match="*[@tolerance][deltaxml:element-in-tolerance(. , @tolerance)]">
However, please remember that there are two comparator inputs, so you will need either to ensure that the tolerance attributes in both inputs are identical or to handle the case where they differ.
6.4 Implicit annotation via DTDs and schemas
One way of avoiding the problem of mismatched tolerance attributes would be to include them as default and/or fixed attributes in a DTD or schema, for example:
<!ATTLIST temperature tolerance CDATA #FIXED "0.1">
7 Summary
If you wish to handle toleranced numeric data we suggest using this approach:
- use the tolerance-checker.xsl filter with the predefined xsl:functions for element/attribute tolerances
- apply these functions using XPaths to where you use numeric data
- the templates that apply the functions add deltaxml:ignore-changes attributes at appropriate places to the data
- the final two filter stages of the general ignore changes process then process these attributes
Also included in the sample directory is a filter for applying tolerance checking to all numeric text items and attributes (generic-tolerance-checker.xsl). This can be adapted as necessary to suit your needs.
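The attribute tolerance function used in section 5 is not reproduced in this article. The following is a minimal sketch of what such a function might look like, assuming it mirrors the element function from section 3 and reads the deltaxml:attributeValue children shown in the attribute example; the version shipped in tolerance-checker.xsl may differ. The namespace URIs are assumptions to check against your own delta files.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
    xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute">
  <!-- Both DeltaXML namespace URIs above are assumed; copy them from your delta output. -->

  <!-- Sketch only: $attr is an attribute-change element such as dxa:time. -->
  <xsl:function name="deltaxml:attribute-in-tolerance" as="xs:boolean">
    <xsl:param name="attr" as="element()"/>
    <xsl:param name="tolerance" as="xs:double"/>
    <xsl:sequence select="
      if ($attr/@deltaxml:deltaV2 = 'A!=B') then
        (: number() yields NaN for a missing or non-numeric value, :)
        (: and any comparison with NaN is false.                   :)
        abs(number($attr/deltaxml:attributeValue[@deltaxml:deltaV2 = 'A']) -
            number($attr/deltaxml:attributeValue[@deltaxml:deltaV2 = 'B'])) le $tolerance
      else
        false()"/>
  </xsl:function>

  <!-- Identity template: copies everything that is not matched below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The attribute template from section 5, marking in-tolerance changes. -->
  <xsl:template match="/weather/deltaxml:attributes/dxa:time[deltaxml:attribute-in-tolerance(., xs:double('10.0'))]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the 'time' values in the example (12437389 and 12437395) the difference is 6, so with a tolerance of 10.0 the sketch would return true and the change would be marked for ignoring.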
8 Caveats
- The code and examples assume that the elements and attributes contain exactly one numeric value. Unfortunately DTDs are not type aware and cannot enforce such constraints (but W3C XML Schema and RelaxNG can do so). If you are unsure of the numeric values we would recommend schema checking; if that is not possible, consider adding more error checking to the XSLT filter code as appropriate for the data.
- Word-by-word filtering and certain punctuation characters typically found in numeric values should not be used in conjunction with this code. If you need to apply word-by-word filtering to other parts of the data, please use the appropriate filter control attributes to ensure that numeric values are not processed.
- We have used very simple example tolerances in this article; real tolerances
for floating point numbers are more complex. An ideal tolerance is not a fixed
value but depends on the magnitude of the numbers involved. Discussion of this
in an XML-specific context is limited; however, the following article, while aimed
at Java programmers, discusses ULP (Units of Least Precision) in detail and is
also applicable to XML:
IBM Developer Works, Java's new math, Part 2: Floating-point numbers.
If java.lang.Math.ulp() is available on your platform we would suggest using it via an XSLT extension function as a basis for tolerance values.
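Where an extension function is not available, a tolerance that scales with magnitude can be approximated in plain XSLT 2.0. The sketch below uses a relative tolerance, which is a simpler technique than a ULP-based comparison and is not part of the DeltaXML samples; the tol prefix, its namespace URI and the function name are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:tol="http://example.com/ns/tolerance"
    exclude-result-prefixes="xs tol">
  <!-- The tol namespace and function name are hypothetical, chosen for this sketch. -->

  <!-- True when $a and $b differ by no more than $relative times the larger magnitude. -->
  <xsl:function name="tol:within-relative" as="xs:boolean">
    <xsl:param name="a" as="xs:double"/>
    <xsl:param name="b" as="xs:double"/>
    <xsl:param name="relative" as="xs:double"/>
    <xsl:sequence select="abs($a - $b) le $relative * max((abs($a), abs($b)))"/>
  </xsl:function>

  <!-- Run against any input document, or start at the named template 'main'. -->
  <xsl:template match="/" name="main">
    <results>
      <!-- Difference of about 0.05 against an allowance of about 0.2: true. -->
      <small><xsl:value-of select="tol:within-relative(19.8e0, 19.85e0, 1e-2)"/></small>
      <!-- Difference of 1 against an allowance of about 0.001: false. -->
      <large><xsl:value-of select="tol:within-relative(1000000e0, 1000001e0, 1e-9)"/></large>
    </results>
  </xsl:template>

</xsl:stylesheet>
```

A function like this could replace the fixed-tolerance test inside the element and attribute functions, with the relative factor chosen to suit the precision of the data.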
9 Running the sample
If you have Ant installed, use the build script provided to run the sample. Simply type the following command to run the pipeline and produce the output file result.xml.
ant run
If you don't have Ant installed, you can run the sample from a command line by issuing the following command from the sample directory (ensuring that you use the correct slashes for your operating system).
java -jar ../../command.jar compare tolerance file1.xml file2.xml result.xml
To clean up the sample directory, run the following Ant command.
ant clean
