DeltaXML

Ignoring Whitespace When Comparing XQuery

In this, my third post on XQuery code comparison, I look at the issue of ignoring whitespace changes where they are not significant (see previous posts: Comparing XQuery with DeltaXML Core and Adding Structure to an XQuery Comparison).

Here’s the ‘A’ version of the XQuery:

Now for the ‘B’ version of the code. It has some extra whitespace added, most of which is not significant; you might also notice that the local:summary-full() and local:summary-short() functions have been swapped over:

Let’s now compare these files using the same DXP pipeline developed over my previous two blog posts. The pipeline converts the XQuery to XML token elements, then adds wrapper elements and keys for the functions, which are also marked as non-ordered:

This result (shown above) is fine, except for a couple of whitespace problems, which are highlighted. The extra whitespace is a distraction and causes extra effort when performing a code merge. Fortunately, DeltaXML Core comes with ‘Ignore Changes’ output XSLT filters that we can add to the pipeline. All I need to do is insert a further XSLT filter ahead of these to mark the changes that can be ignored.
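As an illustration of what the marking step adds, here is a hand-written sketch (not captured Core output) of a whitespace token that exists only in the ‘B’ version, shown before and after marking. It assumes the token is one of the span elements matched by the filter below, that the deltaxml prefix is bound to the same namespace as in that filter, and that an added token carries deltaxml:deltaV2="B"; the exact delta produced by Core may differ in detail.

```xml
<!-- Illustrative only. Before marking: a whitespace token present only in 'B' -->
<span class="whitespace" deltaxml:deltaV2="B">    </span>

<!-- Illustrative only. After marking: the same token, now flagged as ignorable -->
<span class="whitespace" deltaxml:deltaV2="B" deltaxml:ignore-changes="true">    </span>
```

The marking filter only has to add the deltaxml:ignore-changes attribute; the built-in apply-ignore-changes.xsl and propagate-ignore-changes.xsl filters then update the delta based on those marks.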

Here’s the ‘mark-ignore-changes.xsl’ output XSLT filter:

<?xml version="1.0" encoding="utf-8"?>
<!-- Copyright (c) 2005-2010 DeltaXML Ltd. All rights reserved -->
<!-- $Id$ -->

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                xmlns="http://www.w3.org/1999/xhtml"
                xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Identity transform: copy everything not matched by a more specific template -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mark whitespace tokens found in XQuery expressions -->
  <xsl:template match="span[@class eq 'whitespace'][@deltaxml:deltaV2]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Mark whitespace-only text-nodes found in XQuery element constructors -->
  <xsl:template match="span[@class eq 'txt'][@deltaxml:deltaV2]">
    <xsl:choose>
      <xsl:when test="string-length(normalize-space(.)) eq 0">
        <xsl:copy>
          <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

The above filter is an ‘identity transform’ with two added templates, one for each of the two types of whitespace change that we wish to ignore. The tokens of interest (span elements) have ‘txt’ and ‘whitespace’ class attributes; a further check is required for ‘txt’ tokens to ensure that only whitespace-only tokens of this type are marked. Now that the filter has been created, we need to add it to the DXP pipeline along with the built-in ‘ignore changes’ filters, as shown below:
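Before turning to the pipeline, one possible simplification of the filter: the two marking templates could be folded into a single template by moving both tests into the match pattern. This is my own sketch rather than part of the filter above; it is intended as a hypothetical drop-in replacement for the two marking templates (the identity template stays), and it relies on the deltaxml namespace declaration and xpath-default-namespace of mark-ignore-changes.xsl.

```xml
<!-- Hypothetical variation: one template marks both kinds of ignorable whitespace token.
     Assumes the deltaxml prefix and xpath-default-namespace declared in mark-ignore-changes.xsl. -->
<xsl:template match="span[@deltaxml:deltaV2]
                         [@class eq 'whitespace'
                          or (@class eq 'txt' and normalize-space(.) eq '')]">
  <xsl:copy>
    <!-- flag the token so the built-in ignore-changes filters treat it as unchanged -->
    <xsl:attribute name="deltaxml:ignore-changes" select="'true'"/>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>
```

With this form, ‘txt’ tokens that contain non-whitespace text no longer match and fall through to the identity template, which should copy them unchanged, so the xsl:choose is not needed. Either version of the filter can be referenced from the pipeline in the same way.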

<!DOCTYPE comparatorPipeline SYSTEM "../dxp/dxp.dtd">
<!-- $Id$ -->
<comparatorPipeline description="compare xquery" id="xquery">

  <inputFilters>
    <filter>
      <file path="xquery2xml.xsl" relBase="dxp"/>
    </filter>
    <filter>
      <file path="key-xquery.xsl" relBase="dxp"/>
    </filter>
  </inputFilters>

  <outputFilters>
    <!-- The following filter is where the change to be ignored is marked -->
    <filter>
      <file path="mark-ignore-changes.xsl" relBase="dxp"/>
    </filter>

    <!-- The following two filters are included as part of the release, and
         are general purpose. They update the delta based on the marks added
         by the previous filter.
    -->
    <filter>
      <resource name="/xsl/apply-ignore-changes.xsl"/>
    </filter>
    <filter>
      <resource name="/xsl/propagate-ignore-changes.xsl"/>
    </filter>
    <filter>
      <file path="xquery-tokens2html.xsl" relBase="dxp"/>
    </filter>
  </outputFilters>

  <outputProperties>
    <property name="indent" literalValue="no"/>
  </outputProperties>

  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
    <feature name="http://deltaxml.com/api/feature/enhancedMatch1" literalValue="true"/>
  </comparatorFeatures>

</comparatorPipeline>

With the DXP pipeline now modified as above to ignore whitespace changes, this is the result of running DeltaXML Core:

The result (above) is what we wanted: the whitespace added in the ‘B’ version of the XQuery code is in the result, but is not marked as a change.

Conclusion

It has proved relatively simple to refine the XQuery code comparison pipeline I built previously so that certain whitespace changes are ignored. This is one of the great strengths of using a transform pipeline – the capabilities of the comparison can gradually be improved as new requirements for our comparison arise, and we can also easily exploit filters that come bundled with DeltaXML Core. The main motive for this exercise was really to investigate how non-XML could be converted to XML within a Core pipeline to allow a comparison, but in the process we’ve already built a code comparison solution that I’ve found to be considerably more robust than many off-the-shelf equivalents.

Before finishing this blog series, there’s just one more fix I’d like to add: currently a function that is moved is compared correctly, but, because it’s treated as orderless, the new position of the function is not shown in the result. There’s a ‘HandleMoves’ filter included with Core that I will probably be using for this, but I’ll save this work for another day.