Numeric Tolerances

1. Introduction

XML is often used to represent engineering, scientific or financial data, in which floating point numbers are common.  Software that handles floating point numbers usually compares them using tolerances rather than exact equality, and this article describes techniques for doing so in conjunction with DeltaXML comparison.

The XSLT filter code used in this article was written based on the DeltaXMLCore 5.3.9 release; it will be included in the samples sub-directory of a future release.

2. Background

The comparator processes well-formed XML, which represents numbers as text.  It performs textual comparison of PCDATA and therefore will only report numbers as equal if they have the same lexical representation.  If different processors or different serialization software are used to generate the XML data being compared, the 'same' number may even have different lexical representations (for example '1.0' and '1.00') and therefore be reported as a difference.  The W3C XML Schema Datatypes, also supported as part of XSLT 2.0, provide facilities for converting, reading and writing floating point numbers.  Rather than build complicated datatype facilities and associated mechanics into the comparison engine, we recommend the use of XSLT 2.0 for post-processing delta output to resolve these issues with floating point numbers and their tolerances.
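
The difference between lexical and numeric equality can be seen in a few lines of XSLT 2.0.  The following is a minimal sketch, not part of the sample code; it assumes a processor that can start from a named template (here called main, a name chosen for illustration) so that no source document is needed:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no input document is read -->
  <xsl:template name="main">
    <!-- String comparison: the lexical forms differ, so this writes 'false' -->
    <xsl:value-of select="'1.0' eq '1.00'"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Numeric comparison: both strings cast to the same xs:double,
         so this writes 'true' -->
    <xsl:value-of select="xs:double('1.0') eq xs:double('1.00')"/>
  </xsl:template>

</xsl:stylesheet>
```

The textual comparison performed by the comparator corresponds to the first test; the post-processing filters described in this article work with the second kind.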

This article will use a worked example to explain some possible techniques.

3. Example Data

file1.xml

<weather time="12437389">
  <record>
    <place>Malvern</place>
    <temperature>21.5</temperature>
  </record>
  <record>
    <place>Salisbury</place>
    <temperature>19.8</temperature>
  </record>
  <record>
    <place>Upton</place>
    <temperature>20.5</temperature>
  </record>
</weather>

file2.xml

<weather time="12437395">
  <record>
    <place>Malvern</place>
    <temperature>21.7</temperature>
  </record>
  <record>
    <place>Salisbury</place>
    <temperature>19.85</temperature>
  </record>
  <record>
    <place>Upton</place>
    <temperature>20.5</temperature>
  </record>
</weather>

The above example is designed to show numeric values used in element content and in attributes.  There are some differences in handling and so we'll discuss elements and attributes separately.

4. Element tolerances

When these files are compared, the comparator represents the changes in deltaV2 format.  Here is the part of the delta file corresponding to the 'Salisbury' record, whose temperature element contains a changed floating point number:

<record deltaxml:deltaV2="A!=B">
  <place deltaxml:deltaV2="A=B">Salisbury</place>
  <temperature deltaxml:deltaV2="A!=B">
    <deltaxml:textGroup deltaxml:deltaV2="A!=B">
       <deltaxml:text deltaxml:deltaV2="A">19.8</deltaxml:text>
       <deltaxml:text deltaxml:deltaV2="B">19.85</deltaxml:text>
    </deltaxml:textGroup>
  </temperature>
</record>

The numbers represented in the deltaV2 representation of this change are fairly easy to process with XSLT 2.0.  Here is a function that we will use later (defined in tolerance-checker.xsl in the sample directory):

<xsl:function name="deltaxml:element-in-tolerance" as="xs:boolean">
  <xsl:param name="elem" as="element()"/>
  <xsl:param name="tolerance" as="xs:double"/>
  <xsl:sequence select="if (exists($elem/@deltaxml:deltaV2) and exists($elem/deltaxml:textGroup[@deltaxml:deltaV2='A!=B'])) then
                          abs(number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']) -
                              number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B'])) le $tolerance
                        else
                          false()"/>
</xsl:function>

This function, when applied to the temperature element (the first parameter), reports whether the A and B values differ by no more than the tolerance (the second parameter).  Given this function, we can use it in the predicate of an XSLT match pattern wherever we know floating point numbers will be used, for example:

<xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">

We could implement a template which removed the extra change information at this point and replaced one of the values.  However certain output filters which we could use to further process the result expect to deal with well-formed deltas (they assume deltaV2 attributes are accurate for any subtree).  The generic ignore changes mechanism is designed to deal with these issues so it makes sense to utilize it.  So all that we will do when we detect a change within tolerance is to add an ignore change attribute, using a template based on the identity template (defined in tolerance-checker.xsl in the sample directory):

<xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
  <xsl:copy>
    <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>
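
The function and template above can be assembled into a small stand-alone filter.  The following is a sketch rather than the shipped tolerance-checker.xsl: the namespace URI bound to the deltaxml prefix is an assumption and should be checked against the declarations on the root element of your delta file, and the default tolerance of 0.1 is illustrative.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Illustrative default; can be overridden by the invoking code -->
  <xsl:param name="temperature_tolerance" as="xs:double" select="0.1"/>

  <!-- True when the element holds a changed text value whose A and B
       versions differ by no more than $tolerance -->
  <xsl:function name="deltaxml:element-in-tolerance" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="tolerance" as="xs:double"/>
    <xsl:sequence select="if (exists($elem/@deltaxml:deltaV2) and exists($elem/deltaxml:textGroup[@deltaxml:deltaV2='A!=B'])) then
                            abs(number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']) -
                                number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B'])) le $tolerance
                          else
                            false()"/>
  </xsl:function>

  <!-- Identity template: copies everything that is not matched below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Marks in-tolerance temperature changes for the later ignore-changes filters -->
  <xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the pattern with a predicate has a higher default priority than the identity template, no explicit priority attribute is needed here.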

After applying our tolerance detection filter our Salisbury temperature record becomes:

<record deltaxml:deltaV2="A!=B">
  <place deltaxml:deltaV2="A=B">Salisbury</place>
  <temperature deltaxml:ignore-changes="true" deltaxml:deltaV2="A!=B">
    <deltaxml:textGroup deltaxml:deltaV2="A!=B">      
      <deltaxml:text deltaxml:deltaV2="A">19.8</deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="B">19.85</deltaxml:text>
    </deltaxml:textGroup>
  </temperature>
</record>

The tolerance detection filter is equivalent to the mark changes filter in our standard ignore changes process.  The next filter to apply is apply-ignore-changes.xsl. This will convert the above record to:

<record deltaxml:deltaV2="A!=B">
  <place deltaxml:deltaV2="A=B">Salisbury</place>
  <temperature deltaxml:deltaV2="A=B">19.85</temperature>
</record>

This is almost correct.  Notice, however, that the deltaV2 attribute on the record element still reports a change even though both child elements are now unchanged.  The propagate-ignore-changes.xsl filter is applied last to correct this problem:

<record deltaxml:deltaV2="A=B">
  <place>Salisbury</place>
  <temperature>19.85</temperature>
</record>

5. Using XPaths to identify the numeric values

In the above example we used a template which matched all temperature elements, assuming they would contain a numeric value.  However, more explicit XPaths could be used instead, and a single template could handle several numeric elements.  Here are some examples which partially illustrate the power of XPath:

  temperature
  /weather/record/temperature
  /weather/record[place='Malvern']/temperature
  temperature | pressure | weight 
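
As a sketch of the last case, one template can cover several numeric elements by testing the element name in a predicate, which is valid in an XSLT 2.0 pattern.  The shared parameter name numeric_tolerance is hypothetical, pressure and weight are simply the names from the list above, and the deltaxml namespace URI is an assumption to be checked against your delta file.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Hypothetical tolerance shared by all three element types -->
  <xsl:param name="numeric_tolerance" as="xs:double" select="0.1"/>

  <!-- Same check as the element-in-tolerance function shown in section 4 -->
  <xsl:function name="deltaxml:element-in-tolerance" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="tolerance" as="xs:double"/>
    <xsl:sequence select="if (exists($elem/@deltaxml:deltaV2) and exists($elem/deltaxml:textGroup[@deltaxml:deltaV2='A!=B'])) then
                            abs(number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']) -
                                number($elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B'])) le $tolerance
                          else
                            false()"/>
  </xsl:function>

  <!-- Identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- One template for temperature, pressure and weight elements -->
  <xsl:template match="*[self::temperature or self::pressure or self::weight]
                        [deltaxml:element-in-tolerance(. , $numeric_tolerance)]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If different elements need different tolerances, separate templates (or an explicit priority attribute where patterns overlap) are likely to be clearer than one combined pattern.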

6. Attribute Tolerances

The representation of attribute changes in deltaV2 is more complicated than that for element content, shown above.  Here is how the 'time' attribute used in the example above is represented:

<weather ... deltaxml:deltaV2="A!=B" ...>
  <deltaxml:attributes deltaxml:deltaV2="A!=B">  
    <dxa:time deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">12437389</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">12437395</deltaxml:attributeValue>
    </dxa:time>
  </deltaxml:attributes>
  ...

In the input data the attribute would have an XPath of /weather/@time; however, when it is represented in deltaV2 the XPath becomes /weather/deltaxml:attributes/dxa:time.  The reasons for this change are covered in the deltaV2 documentation: they arise from ease of XSLT processing and from differences in the XML namespace inheritance rules for attributes and elements, which require the use of various namespaces.  Therefore, to identify and process the attribute change, the following template is used (defined in tolerance-checker.xsl in the sample directory):

<xsl:template match="/weather/deltaxml:attributes/dxa:time[deltaxml:attribute-in-tolerance(. , xs:double('10.0'))]">
  <xsl:copy>
    <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>
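
The deltaxml:attribute-in-tolerance function called in this template is defined in tolerance-checker.xsl and is not reproduced in this article.  The following sketch shows one shape it could take, by analogy with the element function and the attribute representation shown above; the shipped definition may differ.  Both namespace URIs are assumptions and should be checked against the declarations in your delta file.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute">

  <!-- Sketch only: true when the changed attribute's A and B values
       differ by no more than $tolerance -->
  <xsl:function name="deltaxml:attribute-in-tolerance" as="xs:boolean">
    <xsl:param name="attr" as="element()"/>
    <xsl:param name="tolerance" as="xs:double"/>
    <xsl:sequence select="if ($attr/@deltaxml:deltaV2 = 'A!=B') then
                            abs(number($attr/deltaxml:attributeValue[@deltaxml:deltaV2='A']) -
                                number($attr/deltaxml:attributeValue[@deltaxml:deltaV2='B'])) le $tolerance
                          else
                            false()"/>
  </xsl:function>

  <!-- Identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The template from the article, using the sketched function -->
  <xsl:template match="/weather/deltaxml:attributes/dxa:time[deltaxml:attribute-in-tolerance(. , xs:double('10.0'))]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the example data the two time values differ by 6, which is within the tolerance of 10.0, so the dxa:time element would be marked with deltaxml:ignore-changes.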

7. Specifying Tolerances

There are a number of ways in which the tolerances may be specified. Here are some suggestions:

7.1. Fixed values in XSLT

When using the tolerance checking functions it is possible to specify a fixed parameter value, for example:

deltaxml:attribute-in-tolerance(. , xs:double('10.0'))

7.2. Parameterized XSLT

Rather than fixing the value, it can be passed into the filter using a parameter, for example:

<xsl:stylesheet ....>
  <xsl:param name="temperature_tolerance" as="xs:double" select="0.1"/>
  ...
  <xsl:template match="temperature[deltaxml:element-in-tolerance(. , $temperature_tolerance)]">
  ...

The parameter has a default value.  Alternatively, a value could be passed in from the invoking code, which could include DXP parameters.

7.3. Annotated instance data

It may be possible to annotate your data with the tolerances, for example: 

<temperature tolerance="0.1">21.5</temperature>

The match statement would then become:

<xsl:template match="temperature[deltaxml:element-in-tolerance(. , @tolerance)]">

It may even be possible to use the attribute to identify toleranced numeric data:

<xsl:template match="*[@tolerance][deltaxml:element-in-tolerance(. , @tolerance)]">

However, please remember that there are two comparator inputs: you will need to ensure that the tolerance attributes in both inputs are identical, or else handle the case where they differ.

7.4. Implicit annotation via DTDs and schemas

One way of avoiding the problem of mismatched tolerance attributes would be to include them as default and/or fixed attributes in a DTD or schema, for example:

<!ATTLIST temperature tolerance CDATA #FIXED "0.1">

8. Summary

If you wish to handle toleranced numeric data we suggest using this approach:

  • use the tolerance-checker.xsl filter with the predefined xsl:functions for element/attribute tolerances
  • apply these functions, using XPaths in template match patterns, wherever your data contains numeric values
  • the matching templates add deltaxml:ignore-changes attributes at the appropriate places in the delta
  • the final two filter stages of the general ignore changes process then process these attributes

Also included in the sample directory is a filter for applying tolerance checking to all numeric text items and attributes (generic-tolerance-checker.xsl). This can be adapted as necessary to suit your needs.

9. Caveats

  • The code and examples assume that the elements and attributes contain exactly one numeric value.  Unfortunately DTDs are not type aware and cannot enforce such constraints (but W3C XML Schema and RelaxNG can do so).  If you are unsure of the numeric values we would recommend schema checking and, if that is not possible, consider adding more error checking to the XSLT filter code as appropriate for the data.
  • Word-by-word filtering and certain punctuation characters typically found in numeric values should not be used in conjunction with this code.  If you need to apply word-by-word filtering to other parts of the data, please use the appropriate filter control attributes to ensure that numeric values are not processed.
  • We have used very simple example tolerances in this article; real tolerances for floating point numbers are more complex.  An ideal tolerance is not a fixed value but depends on the magnitude of the numbers involved.  Discussion of this in an XML-specific context is limited.  However, the following article, while aimed at Java programmers, discusses ULP (Units of Least Precision) in detail and is also applicable to XML: IBM developerWorks, "Java's new math, Part 2: Floating-point numbers".  If java.lang.Math.ulp() is available on your platform, we would suggest using it via an XSLT extension function as a basis for tolerance values.
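
The first and last caveats can be addressed partially in pure XSLT 2.0.  The following sketch is not part of the sample code and does not use ULPs: it substitutes a simple relative tolerance, scaled by the larger of the two magnitudes, and returns false rather than failing when either value is missing or not castable to xs:double.  The function name element-in-relative-tolerance is hypothetical, and the deltaxml namespace URI is an assumption to be checked against your delta file.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">

  <!-- Hypothetical function: relative rather than absolute tolerance.
       $relative = 0.01 accepts differences of up to 1% of the larger magnitude. -->
  <xsl:function name="deltaxml:element-in-relative-tolerance" as="xs:boolean">
    <xsl:param name="elem" as="element()"/>
    <xsl:param name="relative" as="xs:double"/>
    <xsl:variable name="a" select="$elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='A']"/>
    <xsl:variable name="b" select="$elem/deltaxml:textGroup/deltaxml:text[@deltaxml:deltaV2='B']"/>
    <!-- Nested conditionals: the casts are only reached once the
         count and castable checks have passed -->
    <xsl:sequence select="if (count($a) eq 1 and count($b) eq 1) then
                            (if ($a castable as xs:double and $b castable as xs:double) then
                               abs(xs:double($a) - xs:double($b))
                                 le $relative * max((abs(xs:double($a)), abs(xs:double($b))))
                             else
                               false())
                          else
                            false()"/>
  </xsl:function>

  <!-- Identity template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Example use: ignore temperature changes of up to 1% -->
  <xsl:template match="temperature[deltaxml:element-in-relative-tolerance(. , 0.01e0)]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A relative tolerance of this kind behaves poorly for values very close to zero, so it may need to be combined with a small absolute tolerance, depending on the data.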

10. Running the sample

If you have Ant installed, use the build script provided to run the sample. Simply type the following command to run the pipeline and produce the output file result.xml.

ant run

If you don't have Ant installed, you can run the sample from a command line by issuing the following command from the sample directory (ensuring that you use the correct directory separators for your operating system).

java -jar ../../command.jar compare tolerance file1.xml file2.xml result.xml

To clean up the sample directory, run the following Ant command.

ant clean