Ignoring Whitespace When Comparing XQuery

In this, my third post on XQuery code comparison, I look at the issue of ignoring whitespace changes where they are not significant (see previous posts: Comparing XQuery with DeltaXML Core and Adding Structure to an XQuery Comparison).

Here’s the ‘A’ version of the XQuery:

‘A’ version of the XQuery image

Now for the ‘B’ version of the code, with some extra whitespace added, most of which is not significant. You might also notice that the local:summary-full() and local:summary-short() functions are swapped over:

‘B’ version of the XQuery image

Let’s now compare these files using the same DXP pipeline developed over my previous two blog posts. The pipeline converts the XQuery to XML token elements, then adds wrapper elements and keys for the functions, which are also marked as non-ordered:

Comparison of ‘A’ version and 'B' version of the XQuery

This result (shown above) is fine, except for a couple of whitespace problems, which are highlighted. This extra whitespace is a distraction and causes extra effort when performing a code merge. Fortunately, DeltaXML Core comes with ‘Ignore Changes’ output XSLT filters that we can add to the pipeline. All I need to do is insert a further XSLT filter ahead of these to mark the changes that can be ignored.

Here’s the ‘mark-ignore-changes.xsl’ output XSLT filter:

<?xml version="1.0" encoding="utf-8"?>
<!-- Output filter: marks whitespace-only token changes so that the
     built-in 'ignore changes' filters can process them -->
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dxa="https://www.deltaxml.com/ns/non-namespaced-attribute"
                xmlns:deltaxml="https://www.deltaxml.com/ns/well-formed-delta-v1" 
                xmlns="http://www.w3.org/1999/xhtml"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">
  
  <!-- Identity transform: copy anything not matched by the templates below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
  <!-- 'whitespace' tokens carrying a delta attribute: always mark as ignorable -->
  <xsl:template match="span[@class eq 'whitespace'][@deltaxml:deltaV2]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
  <!-- 'txt' tokens carrying a delta attribute: mark only when whitespace-only -->
  <xsl:template match="span[@class eq 'txt'][@deltaxml:deltaV2]">
    <xsl:choose>
      <xsl:when test="string-length(normalize-space(.)) eq 0">
        <xsl:copy>
          <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <!-- Token has real content: leave it untouched -->
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
</xsl:stylesheet>

The above filter is an ‘identity transform’ with two added templates that match the two types of whitespace change we wish to ignore. The tokens of interest (span elements) have ‘txt’ and ‘whitespace’ class attributes; a further check is required for ‘txt’ tokens to ensure that only whitespace-only tokens of this type are marked. Now that the filter has been created, we need to add it to the DXP pipeline along with the built-in ‘ignore changes’ filters, as shown below:

<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">

<comparatorPipeline description="compare xquery" id="xquery">
  
  <inputFilters>
    <filter>
      <file path="xquery2xml.xsl" relBase="dxp"/>
    </filter>
    <filter>
      <file path="key-xquery.xsl" relBase="dxp"/>
    </filter>   
  </inputFilters>
  
  <outputFilters>
    <!-- custom filter: marks the whitespace changes that can be ignored -->
    <filter>
      <file path="mark-ignore-changes.xsl" relBase="dxp"/>
    </filter>
    
    <!-- built-in 'ignore changes' filters -->
    <filter>
      <resource name="/xsl/apply-ignore-changes.xsl"/>
    </filter>
    <filter>
      <resource name="/xsl/propagate-ignore-changes.xsl"/>
    </filter>
    <filter>
      <file path="xquery-tokens2html.xsl" relBase="dxp"></file>      
    </filter>
  </outputFilters>
  
  <outputProperties>
    <property name="indent" literalValue="no"/>
  </outputProperties>
  
  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
    <feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
  </comparatorFeatures>
  
</comparatorPipeline>

With the DXP pipeline now modified as above to ignore whitespace changes, this is the result of running DeltaXML Core:

DXP pipeline modified comparison

The result (above) is what we wanted: the whitespace added in the ‘B’ version of the XQuery code is in the result, but is not marked as a change.
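To see the marking step on its own, it can help to run ‘mark-ignore-changes.xsl’ over a small hand-written input with any XSLT 2.0 processor. The fragment below is only a sketch. The pre wrapper element and the token contents are hypothetical, and the delta markup is a simplified approximation rather than real DeltaXML Core output. The namespace URIs and the ‘whitespace’ and ‘txt’ class values are copied from the stylesheet shown earlier.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical test input for mark-ignore-changes.xsl.
     Hand-written approximation of a delta, not real comparator output. -->
<pre xmlns="http://www.w3.org/1999/xhtml"
     xmlns:deltaxml="https://www.deltaxml.com/ns/well-formed-delta-v1"
     deltaxml:deltaV2="A!=B">

  <!-- 1. Changed 'whitespace' token: matched by the first added template -->
  <span class="whitespace" deltaxml:deltaV2="A!=B"><deltaxml:textGroup
      deltaxml:deltaV2="A!=B"><deltaxml:text deltaxml:deltaV2="A"> </deltaxml:text><deltaxml:text
      deltaxml:deltaV2="B">    </deltaxml:text></deltaxml:textGroup></span>

  <!-- 2. Whitespace-only 'txt' token present only in B:
          passes the normalize-space() check in the second template -->
  <span class="txt" deltaxml:deltaV2="B">  </span>

  <!-- 3. 'txt' token with real content: falls through to xsl:otherwise -->
  <span class="txt" deltaxml:deltaV2="A!=B"><deltaxml:textGroup
      deltaxml:deltaV2="A!=B"><deltaxml:text deltaxml:deltaV2="A">foo</deltaxml:text><deltaxml:text
      deltaxml:deltaV2="B">bar</deltaxml:text></deltaxml:textGroup></span>

</pre>
```

Given the templates as written, the first two span elements should come out with an added deltaxml:ignore-changes="true" attribute, while the third should be copied through unchanged. The later ‘ignore changes’ filters in the pipeline then act on that attribute.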

Conclusion

It has proved relatively simple to refine the XQuery code comparison pipeline I built previously so that certain whitespace changes are ignored. This is one of the great strengths of using a transform pipeline – the capabilities of the comparison can gradually be improved as new requirements for our comparison arise, and we can also easily exploit filters that come bundled with DeltaXML Core. The main motive for this exercise was really to investigate how non-XML could be converted to XML within a Core pipeline to allow a comparison, but in the process we’ve already built a code comparison solution that I’ve found to be considerably more robust than many off-the-shelf equivalents.

Before finishing this blog series, there’s just one more fix I’d like to add: currently a function that is moved is compared correctly, but, because it’s treated as orderless, the new position of the function is not shown in the result. There’s a ‘HandleMoves’ filter included with Core that I will probably be using for this, but I’ll save this work for another day.
