Case Insensitive Comparison

1. Introduction

There are a variety of ways to achieve a case insensitive comparison, with varying degrees of accuracy. We outline two of the simpler approaches now:

  1. Normalize inputs approach: Use an input filter to convert all the text within the document to lower (or upper) case.
  2. Simple output filtering approach: Use the ignore change output filter processing to ignore 'case changes'.

In the remainder of this guide we provide further explanation of the above approaches, along with some illustrative example XSLT scripts, which enable text changes within text nodes and attribute values to be compared in a case insensitive manner. It would be straightforward to extend the case insensitive comparison to handle both comments and processing instructions.

Limitation: The example scripts do not take internationalization (i18n) considerations into account, so the use of the 'lower-case' XSLT function might need to be replaced with something that appropriately normalises a given language's text strings for the purposes of a comparison.
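
As a minimal sketch of how the language-specific part could be isolated, the normalisation can be placed in a single stylesheet function, so that only one expression needs to change for a given language. The function name ex:normalise-case and the urn:example:case-normalisation namespace are hypothetical and chosen only for illustration; normalize-unicode and lower-case are standard XPath 2.0 functions. The NFC step is an assumption about what a comparison might need (it makes composed and decomposed forms of an accented character compare equal) and is not part of the example scripts in this guide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:ex="urn:example:case-normalisation"
  exclude-result-prefixes="xs ex"
  version="2.0">

  <!-- Hypothetical helper: the single place to adjust normalisation per language. -->
  <xsl:function name="ex:normalise-case" as="xs:string">
    <xsl:param name="text" as="xs:string"/>
    <!-- Apply Unicode NFC normalisation, then lower-case the result. -->
    <xsl:sequence select="lower-case(normalize-unicode($text, 'NFC'))"/>
  </xsl:function>

  <!-- Copy attributes and all nodes other than text nodes unchanged. -->
  <xsl:template match="@* | node()[not(self::text())]">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes are passed through the helper. -->
  <xsl:template match="text()">
    <xsl:value-of select="ex:normalise-case(string(.))"/>
  </xsl:template>

</xsl:stylesheet>
```

A language that needs different handling would then only require a change to the select expression inside the helper.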

2. Normalize inputs approach filter

The normalize inputs approach requires that all the text within the input documents is converted to either lower or upper case. The following XSLT filter provides an example of how lower case normalisation of the inputs can be applied for English text.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:template match="node()[not(self::text())]">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="lower-case(string(.))" />
  </xsl:template>

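  <!-- namespace-uri(.) keeps prefixed attributes (e.g. xlink:href) in their original namespace -->
  <xsl:template match="@*">
    <xsl:attribute name="{name(.)}" namespace="{namespace-uri(.)}" select="lower-case(string(.))" />
  </xsl:template>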

</xsl:stylesheet>
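
To illustrate the effect of this kind of filter, consider the following made-up input fragment. The element and attribute names are hypothetical and serve only as an example.

```xml
<report Status="DRAFT">
  <title>Quarterly Results</title>
  <para xml:lang="EN">Revenue ROSE by 5%.</para>
</report>
```

Apart from any XML declaration added by the serializer, the filtered document would be expected to look like this:

```xml
<report Status="draft">
  <title>quarterly results</title>
  <para xml:lang="en">revenue rose by 5%.</para>
</report>
```

Only text nodes and attribute values are lower-cased; element and attribute names keep their original case, so 'Status' is unchanged while its value is not.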

3. Simple output filtering approach

The simple output filtering approach involves:

  1. identifying each deltaxml:textGroup whose only change is in the 'case' of its string value;
  2. optionally identifying attribute values (the children of deltaxml:attributes) whose only change is in 'case';
  3. writing a filter to mark these changes with the appropriate deltaxml:ignore-changes attribute;
  4. applying the ignore change processing filters.

The following XSLT filter provides an example of step 3 of the above approach, i.e. the initial markup script for the 'ignore changes' processing (for English text).

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
  version="2.0">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="deltaxml:textGroup[@deltaxml:deltaV2='A!=B']
                                         [deltaxml:elementsHaveSameCase(.)]">
     <xsl:copy>
       <xsl:attribute name="deltaxml:ignore-changes" select="'B'" />
       <xsl:apply-templates select="@*, node()" />
     </xsl:copy>
  </xsl:template>

  <xsl:template match="deltaxml:attributes/*[@deltaxml:deltaV2='A!=B']
                                            [deltaxml:elementsHaveSameCase(.)]">
    <xsl:copy>
      <xsl:attribute name="deltaxml:ignore-changes" select="'B'" />
      <xsl:apply-templates select="@*, node()" mode="#default"/>
    </xsl:copy>
  </xsl:template>

  <!--
    This function may need to be updated to perform a case-insensitive
    comparison for a given language. 
  --> 
  <xsl:function name="deltaxml:compareIgnoreCase" as="xs:boolean">
    <xsl:param name="str1" as="xs:string" />
    <xsl:param name="str2" as="xs:string" />
    <xsl:sequence select="lower-case($str1) eq lower-case($str2)"/>
  </xsl:function>

  <!--
    Returns true only when the given element has exactly two children,
    an 'A' version and a 'B' version, whose string values are equal
    when case is ignored.
  -->
  <xsl:function name="deltaxml:elementsHaveSameCase" as="xs:boolean">
    <xsl:param name="dxmlTextGroupOrAttribute" as="element()"/>
    <xsl:choose>
      <xsl:when test="$dxmlTextGroupOrAttribute[count(*) = 2]">
        <xsl:variable name="A" as="element()?"
          select="$dxmlTextGroupOrAttribute/*[@deltaxml:deltaV2='A'][1]"/>
        <xsl:variable name="B" as="element()?"
          select="$dxmlTextGroupOrAttribute/*[@deltaxml:deltaV2='B'][1]"/>

        <xsl:choose>
          <xsl:when test="exists($A) and exists($B)">
            <xsl:sequence
              select="deltaxml:compareIgnoreCase(string($A), string($B))"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:sequence select="false()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="false()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

</xsl:stylesheet>
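
As a rough illustration of what the markup script does, consider the following hand-written delta fragment. The 'para' element and the deltaxml:text child element names are assumptions made for this example; the script itself relies only on the deltaxml:deltaV2 values and on each deltaxml:textGroup having exactly two element children.

```xml
<para xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      deltaxml:deltaV2="A!=B">
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">Hello World</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">hello world</deltaxml:text>
  </deltaxml:textGroup>
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">Hello</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">Goodbye</deltaxml:text>
  </deltaxml:textGroup>
</para>
```

After the script has run, only the first group, whose two versions differ in case alone, would be expected to carry the marker attribute (attribute order in the serialized output may vary):

```xml
<para xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      deltaxml:deltaV2="A!=B">
  <deltaxml:textGroup deltaxml:ignore-changes="B" deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">Hello World</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">hello world</deltaxml:text>
  </deltaxml:textGroup>
  <deltaxml:textGroup deltaxml:deltaV2="A!=B">
    <deltaxml:text deltaxml:deltaV2="A">Hello</deltaxml:text>
    <deltaxml:text deltaxml:deltaV2="B">Goodbye</deltaxml:text>
  </deltaxml:textGroup>
</para>
```

The second group is left untouched, so it remains a reported change once the ignore change processing filters are applied.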