Ignoring Whitespace When Comparing XQuery

In this, my third post on XQuery code comparison, I look at the issue of ignoring whitespace changes where they are not significant (see previous posts: ‘Comparing XQuery with DeltaXML Core’ and ‘Adding Structure to an XQuery Comparison’).

Here’s the ‘A’ version of the XQuery:

‘A’ version of the XQuery

Now for the ‘B’ version of the code. It has some extra whitespace added, most of which is not significant; you might also notice that the local:summary-full() and local:summary-short() functions have been swapped over:

‘B’ version of the XQuery

Let’s now compare these files using the same DXP pipeline developed over my previous two blog posts. The pipeline converts the XQuery to XML token elements, then adds wrapper elements and keys for the functions, which are also marked as non-ordered.
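For readers who haven’t seen the earlier posts, here is a rough sketch of the kind of markup those input filters produce. It is hypothetical: the wrapper element names, the class values other than ‘whitespace’, and the key values are assumptions made for illustration only. The deltaxml:key and deltaxml:ordered attributes, however, are the standard DeltaXML Core way of keying elements and of asking for an element’s children to be compared without regard to their order.

```xml
<!-- Hypothetical sketch only: element names, class values (apart from
     'whitespace') and key values are illustrative assumptions -->
<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
     deltaxml:ordered="false">
  <!-- deltaxml:ordered="false": the function wrappers below are compared
       as an unordered set, so swapping two functions is not a change -->
  <div class="function" deltaxml:key="local:summary-full">
    <!-- the key pairs this wrapper with the wrapper carrying the same
         key in the other file, wherever that wrapper appears -->
    <span class="keyword">declare</span>
    <span class="whitespace"> </span>
    <span class="keyword">function</span>
    <span class="whitespace"> </span>
    <span class="name">local:summary-full</span>
    <!-- ...remaining tokens for this function... -->
  </div>
  <!-- ...one keyed wrapper per function... -->
</div>
```

In a structure like this, it is the key that lets local:summary-full() be matched with its counterpart even though it has changed position in the ‘B’ file.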

Comparison of the ‘A’ and ‘B’ versions of the XQuery

This result (shown above) is fine, except for a couple of whitespace changes, which are highlighted. This extra whitespace is a distraction and causes extra effort when performing a code merge. Fortunately, DeltaXML Core comes with ‘Ignore Changes’ output XSLT filters that we can add to the pipeline; all I need to do is insert a further XSLT filter ahead of these to mark the changes that can be ignored.

Here’s the ‘mark-ignore-changes.xsl’ output XSLT filter:

<?xml version="1.0" encoding="utf-8"?>
<!-- mark-ignore-changes.xsl: marks whitespace-only token changes with
     deltaxml:ignore-changes="true" so that the Core 'ignore changes'
     filters that follow it in the pipeline can act on them -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                xmlns="http://www.w3.org/1999/xhtml"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- identity transform: copy everything not matched below unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- 'whitespace' tokens in the delta are always marked as ignorable -->
  <xsl:template match="span[@class eq 'whitespace'][@deltaxml:deltaV2]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- 'txt' tokens are marked only if they contain nothing but whitespace -->
  <xsl:template match="span[@class eq 'txt'][@deltaxml:deltaV2]">
    <xsl:choose>
      <xsl:when test="string-length(normalize-space(.)) eq 0">
        <xsl:copy>
          <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

The above filter is an ‘identity transform’ with two added templates, one for each of the two kinds of whitespace change that we wish to ignore. The tokens of interest are span elements whose class attribute is either ‘txt’ or ‘whitespace’. Any ‘whitespace’ token carrying a deltaxml:deltaV2 attribute is marked, but ‘txt’ tokens need a further check to ensure that only whitespace-only tokens of this type are marked. Now that the filter has been created, we need to add it to the DXP pipeline along with the built-in ‘ignore changes’ filters, as shown below:

<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">

<comparatorPipeline description="compare xquery" id="xquery">

  <inputFilters>
    <!-- convert the XQuery into XML token elements -->
    <filter>
      <file path="xquery2xml.xsl" relBase="dxp"/>
    </filter>
    <!-- add function wrappers and keys, marked as non-ordered -->
    <filter>
      <file path="key-xquery.xsl" relBase="dxp"/>
    </filter>
  </inputFilters>

  <outputFilters>
    <!-- mark whitespace-only changes as ignorable -->
    <filter>
      <file path="mark-ignore-changes.xsl" relBase="dxp"/>
    </filter>

    <!-- built-in Core filters that act on the deltaxml:ignore-changes marks -->
    <filter>
      <resource name="/xsl/apply-ignore-changes.xsl"/>
    </filter>
    <filter>
      <resource name="/xsl/propagate-ignore-changes.xsl"/>
    </filter>
    <!-- render the token delta as HTML -->
    <filter>
      <file path="xquery-tokens2html.xsl" relBase="dxp"/>
    </filter>
  </outputFilters>

  <outputProperties>
    <property name="indent" literalValue="no"/>
  </outputProperties>

  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
    <feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
  </comparatorFeatures>

</comparatorPipeline>

With the DXP pipeline now modified as above to ignore whitespace changes, this is the result of running DeltaXML Core:

Comparison using the modified DXP pipeline

The result (above) is what we wanted: the whitespace added in the ‘B’ version of the XQuery code is in the result, but is not marked as a change.
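To illustrate how mark-ignore-changes.xsl decides what to mark, here is a hypothetical fragment of the delta as it might look after that filter has run. It assumes Core’s well-formed delta conventions: an element added in ‘B’ carries deltaxml:deltaV2="B", and modified text inside an element is held in a deltaxml:textGroup containing one deltaxml:text per version. The token contents are invented for illustration.

```xml
<!-- Hypothetical delta fragment; namespace declarations omitted, with the
     deltaxml prefix bound to http://www.deltaxml.com/ns/well-formed-delta-v1 -->

<!-- 1. whitespace token added in 'B': marked -->
<span class="whitespace" deltaxml:deltaV2="B" deltaxml:ignore-changes="true">   </span>

<!-- 2. 'txt' token that is whitespace-only in both versions: marked -->
<span class="txt" deltaxml:deltaV2="A!=B" deltaxml:ignore-changes="true"><deltaxml:textGroup deltaxml:deltaV2="A!=B"><deltaxml:text deltaxml:deltaV2="A"> </deltaxml:text><deltaxml:text deltaxml:deltaV2="B">  </deltaxml:text></deltaxml:textGroup></span>

<!-- 3. 'txt' token with real text in either version: copied unmarked -->
<span class="txt" deltaxml:deltaV2="A!=B"><deltaxml:textGroup deltaxml:deltaV2="A!=B"><deltaxml:text deltaxml:deltaV2="A">total</deltaxml:text><deltaxml:text deltaxml:deltaV2="B">  </deltaxml:text></deltaxml:textGroup></span>
```

Because normalize-space(.) works on the string value of the whole span, which under these assumptions includes the text of both versions, a ‘txt’ token is marked only when neither version contains anything other than whitespace.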

Conclusion

It has proved relatively simple to refine the XQuery code comparison pipeline I built previously so that certain whitespace changes are ignored. This is one of the great strengths of using a transform pipeline: the capabilities of the comparison can be improved gradually as new requirements arise, and we can easily exploit filters that come bundled with DeltaXML Core. The main motive for this exercise was really to investigate how non-XML could be converted to XML within a Core pipeline to allow a comparison, but in the process we’ve already built a code comparison solution that I’ve found to be considerably more robust than many off-the-shelf equivalents.

Before finishing this blog series, there’s just one more fix I’d like to add: currently a function that is moved is compared correctly, but, because it’s treated as orderless, the new position of the function is not shown in the result. There’s a ‘HandleMoves’ filter included with Core that I will probably be using for this, but I’ll save this work for another day.
