Specifying a Comparison Pipeline

1. Introduction

One of the main features of DeltaXML Core is the ability to define a comparison pipeline to use when processing your delta. This pipeline definition was introduced in version 3.0 (as the PipelinedComparator class) and allows the specification of input and output filter chains to apply to the data before and after a comparison takes place. This adds powerful functionality which allows the processing of delta files into standards-compliant output files that show change using the grammar of the input file format only.

The DXP Configuration Format

In DeltaXML Core 3.1, the DXP file format was introduced. DXP files are XML definitions of pipelines that can be used with the provided command line tool or used to generate a pre-configured pipeline instance. DXP allows the specification of almost all of the features available on the corresponding API classes.

PipelinedComparatorS9

Since the introduction of the original com.deltaxml.core API package in version 3.0, an improved package based on the Saxon s9api interfaces (com.deltaxml.cores9api) has been released. This new API package has more features and better performance than our original JAXP-based implementation. PipelinedComparatorS9 and the DocumentComparator (see below) are recommended for new users; this document focuses primarily on the PipelinedComparatorS9.

DocumentComparator and DCP

Version 7.0 of Core introduced a new DocumentComparator component in the cores9api package. This extends the PipelinedComparatorS9 with extra features focused on the comparison of narrative content (as opposed to data-centric content). 'DCP' is the file format, available from Version 7.2, that can be used to configure the DocumentComparator as an alternative to the Java and C# APIs. A full description of the DocumentComparator can be found in the Document Comparator Guide.

This document, plus the code samples included in the DeltaXML Core release, walk you through these technologies for pipeline specification showing how they relate to each other.

2. PipelinedComparatorS9

The first thing you need before configuring the pipeline is a PipelinedComparatorS9. This is the class that will be configured and on which you start the actual comparison.

2.1. Examples

DXP

The minimal DXP file that creates an 'unconfigured' pipeline is shown below:

<comparatorPipeline description="A simple comparison" id="compare"/>

This validates against the DXP DTD and, if the pipeline is run using the command line interface, performs a simple comparison, producing a changes-only delta.

Java

Simply create a new PipelinedComparatorS9 instance using the following Java code:

import com.deltaxml.cores9api.PipelinedComparatorS9;
...
PipelinedComparatorS9 pc= new PipelinedComparatorS9();

C#

Simply create a new PipelinedComparatorS9 instance using the following C# code:

using DeltaXML.CoreS9Api;
...
PipelinedComparatorS9 pc= new PipelinedComparatorS9();

3. Parser Features

When a comparison is triggered, the PipelinedComparator creates new parser instances (if necessary) with which to parse the input files. Unless otherwise specified, the Apache Xerces parser that is distributed with DeltaXML Core will be used.

It may be necessary to configure the parser before use, for example to enable XInclude or to specify how the inputs are validated.

3.1. Examples

The following examples show how to enable XInclude on the Apache Xerces parser with each technology.

DXP

Specify parser features using the <parserFeatures> element. Each feature is set using its own <feature> child element:

<parserFeatures>
  <feature name="http://apache.org/xml/features/xinclude" literalValue="true"/>
</parserFeatures>

Java and C#

Call the setParserFeature() method for each feature you wish to specify:

pc.setParserFeature("http://apache.org/xml/features/xinclude", true);
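
To see what XInclude processing does at parser level, independently of DeltaXML, the following sketch uses only the JDK's built-in JAXP parser. It assumes that the standard setXIncludeAware(true) switch has the same effect as the Xerces feature URI used above; the class name, file names and sample content are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative only: not part of the DeltaXML API.
final class XIncludeSketch {

    // Writes a two-file document to a temporary directory, parses the main
    // file with XInclude enabled and counts the <item> elements found.
    static int countIncludedItems() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sketch");
            Files.write(dir.resolve("part.xml"),
                "<item>included</item>".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("main.xml"),
                ("<doc xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                    + "<xi:include href=\"part.xml\"/></doc>")
                    .getBytes(StandardCharsets.UTF_8));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XInclude needs namespace support
            factory.setXIncludeAware(true);  // assumed equivalent of the feature URI
            Document doc = factory.newDocumentBuilder()
                .parse(dir.resolve("main.xml").toFile());
            return doc.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XInclude sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("item elements after inclusion: " + countIncludedItems());
    }
}
```

With the switch turned off, the xi:include element would be left in the parsed document unresolved, which is why the feature has to be set before the comparison is run.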

4. Comparator Features

Comparator features are switchable functionality settings on the comparator. As a feature is essentially switched on or off, it is set using a boolean value of true or false. Features on the comparator include http://deltaxml.com/api/feature/isFullDelta, http://deltaxml.com/api/feature/enhancedMatch1 and http://deltaxml.com/api/feature/deltaV2. See the Features List for more details on comparator features.

4.1. Examples

The following examples show how to turn on the comparator feature http://deltaxml.com/api/feature/isFullDelta which turns the delta from a changes-only delta to a full-context delta.

DXP

Specify comparator features using the <comparatorFeatures> element. Each feature is set using its own <feature> child element:

<comparatorFeatures>
  <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
</comparatorFeatures>

Java and C#

Call the setComparatorFeature() method on the PipelinedComparator for each feature you wish to specify:

pc.setComparatorFeature("http://deltaxml.com/api/feature/isFullDelta", true);

5. Comparator Properties

Comparator properties are settings that take an instantiated Object as a value. One example is http://deltaxml.com/api/property/orderlessPresentation which can take a number of String values. See the Properties List for more details on comparator properties.

5.1. Examples

The following examples show how to set the comparator property http://deltaxml.com/api/property/orderlessPresentation to 'a_matches_deletes_adds'.

DXP

Specify comparator properties using the <comparatorProperties> element. Each property is set using its own <property> child element. N.B. DXP only supports the setting of String property values.

<comparatorProperties>
  <property name="http://deltaxml.com/api/property/orderlessPresentation"
    literalValue="a_matches_deletes_adds"/>
</comparatorProperties>

Java and C#

Call the setComparatorProperty() method on the PipelinedComparator for each property you wish to set:

pc.setComparatorProperty("http://deltaxml.com/api/property/orderlessPresentation",
  "a_matches_deletes_adds");

6. Input Filters

The main purpose of the PipelinedComparatorS9 is to make the addition of filters into input and output chains a lot simpler. It is possible to use JAXP to chain filters together but the use of PipelinedComparatorS9 is recommended.

Input filters are applied to the input files in the order specified before the comparison takes place. They can be Java-based streaming filters or XSLT filters.
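
For comparison, chaining XSLT filters by hand with plain JAXP might look like the sketch below. It uses only JDK classes, not the DeltaXML API. The class name, the two inline stylesheets and the use of an intermediate string between steps are illustrative assumptions; a production chain would normally stream between steps.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: a hand-rolled filter chain using JAXP.
final class JaxpFilterChainSketch {

    private static final String HEAD =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>";

    // Filter 1: collapses runs of whitespace in the text of <doc>.
    static final String NORMALIZE = HEAD
        + "<xsl:template match=\"/doc\"><doc>"
        + "<xsl:value-of select=\"normalize-space(.)\"/></doc></xsl:template>"
        + "</xsl:stylesheet>";

    // Filter 2: records the text length of <doc> in a 'length' attribute.
    static final String ADD_LENGTH = HEAD
        + "<xsl:template match=\"/doc\"><doc length=\"{string-length(.)}\">"
        + "<xsl:value-of select=\".\"/></doc></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies each stylesheet in turn; the output of one step feeds the next.
    static String applyChain(String xml, List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        try {
            for (String xsl : stylesheets) {
                Transformer step =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));
                StringWriter out = new StringWriter();
                step.transform(new StreamSource(new StringReader(current)),
                               new StreamResult(out));
                current = out.toString();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("Filter chain failed", e);
        }
        return current;
    }

    public static void main(String[] args) {
        String input = "<doc>  hello   world </doc>";
        System.out.println(applyChain(input, Arrays.asList(NORMALIZE, ADD_LENGTH)));
    }
}
```

Because the second filter measures the text produced by the first, swapping the two steps gives a different result. This is the sense in which filters are applied "in the order specified".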

6.1. Symmetrical Input Filter Chains

If you wish to pass each of the two inputs through the same set of filters, with the same parameter values (where applicable), you only need to define the filter chain once and it will be run on each input in turn.

The following examples show how to specify an input filter chain that consists of two XSLT filters followed by a Java filter.

DXP

If the same filter chain should be applied to both inputs, use the <inputFilters> element to define them. Each filter is defined in its own <filter> child element:

<inputFilters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter2.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</inputFilters>

Java

Symmetrical input filter chains should be set using the setInputFilters() method on the PipelinedComparatorS9.

import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStepHelper;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterChain inChain= fsh.newFilterChain();

inChain.addStep(fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1"));
inChain.addStep(fsh.newFilterStep(new File("input-filter2.xsl"), "in-filter2"));
inChain.addStep(fsh.newFilterStep(SimpleJavaFilter.class, "in-java-filter"));

pc.setInputFilters(inChain);

C#

The C# method is similar to the Java equivalent, although the List parameter contains different types of Object.  The typeof function has a similar effect to using a Java class literal.

using com.deltaxml.demo;
using System.Collections.Generic;
using System.IO;
...
IList<Object> inputFilters = new List<Object>();
inputFilters.Add(new FileInfo("input-filter1.xsl"));
inputFilters.Add(new FileInfo("input-filter2.xsl"));
inputFilters.Add(typeof(SimpleJavaFilter));

pc.setInputFilters(inputFilters);

6.2. Asymmetrical Input Filter Chains

For some pipelines, you may wish to apply different filter chains to each input or to pass different parameters to the same filter chains depending on which input is being processed. The PipelinedComparator allows the use of asymmetrical input filter chains for this reason.

The following examples show how to specify different filter chains for each input, including how to pass different parameter values to the same filter.

DXP

When using different filter chains for each input, replace the <inputFilters> element with <input1Filters> and <input2Filters>:

<input1Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input1-value"/>
  </filter>
</input1Filters>
<input2Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input2-value"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</input2Filters>

Java

Asymmetrical input filter chains should be specified with setInput1Filters() and setInput2Filters() methods on PipelinedComparatorS9:

import com.deltaxml.demo.SimpleJavaFilter;
import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStepHelper;
import com.deltaxml.cores9api.FilterStep;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterStep step= null;
...
FilterChain inChain1= fsh.newFilterChain();
step= fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1");
inChain1.addStep(step);
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "a-param-filter"
);
step.setParameterValue("an-input-param", "input1-value");
inChain1.addStep(step);

FilterChain inChain2= fsh.newFilterChain();
step= fsh.newFilterStep(new File("input-filter1.xsl"), "in-filter1");
inChain2.addStep(step);
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "a-param-filter"
);
step.setParameterValue("an-input-param", "input2-value");
inChain2.addStep(step);
step= fsh.newFilterStep(SimpleJavaFilter.class, "java-filter");
inChain2.addStep(step);

pc.setInput1Filters(inChain1);
pc.setInput2Filters(inChain2);

C#

Asymmetrical input filter chains should be specified with setInput1Filters() and setInput2Filters() methods on the PipelinedComparatorS9:

using DeltaXML.CoreS9Api;
using com.deltaxml.demo;
using System.Collections.Generic;
using System.IO;
...

IList<Object> input1Filters = new List<Object>();
input1Filters.Add(new FileInfo("input-filter1.xsl"));
ParameterizedFilterS9 pf = new ParameterizedFilterS9(
                            new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"] = "input1-value";
input1Filters.Add(pf);

IList<Object> input2Filters = new List<Object>();
input2Filters.Add(new FileInfo("input-filter1.xsl"));
pf = new ParameterizedFilterS9(
      new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"] = "input2-value";
input2Filters.Add(pf);
input2Filters.Add(typeof(SimpleJavaFilter));

pc.setInput1Filters(input1Filters);
pc.setInput2Filters(input2Filters);

7. Output Filters

Output filter chains are constructed in the same way as input filter chains except that there is only one chain and it is applied to the result of the comparison.

7.1. Examples

The following examples show how to add a simple XSLT filter chain to the output.

DXP

Specify output filters using the <outputFilters> element. Each filter is defined in its own <filter> child element:

<outputFilters>
  <filter>
    <file path="output-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="output-filter2.xsl" relBase="dxp"/>
  </filter>
</outputFilters>

Java

Output filters are added using the setOutputFilters() method on the PipelinedComparatorS9.

import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStepHelper;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
...
FilterChain outChain= fsh.newFilterChain();
outChain.addStep(fsh.newFilterStep(new File("output-filter1.xsl"), "out-filter1"));
outChain.addStep(fsh.newFilterStep(new File("output-filter2.xsl"), "out-filter2"));
pc.setOutputFilters(outChain);

C#

Output filters are added using the setOutputFilters() method on the PipelinedComparatorS9.

using System.Collections.Generic;
using System.IO;
...
IList<Object> filters = new List<Object>();
filters.Add(new FileInfo("output-filter1.xsl"));
filters.Add(new FileInfo("output-filter2.xsl"));
pc.setOutputFilters(filters);

8. Conditional Filters

When using DXP as a pipeline specification, it is helpful to be able to make some filters conditional. This can be achieved by specifying a boolean input parameter that is tested when adding the filter. Filters can be made conditional 'if' a parameter is true or 'unless' a parameter is true (or a combination of both). At present, only a single parameter can be tested for.

8.1. Examples

DXP

The boolean to test against should be specified as a <booleanParameter> element. The <filter> element then refers to the parameter name in an if or unless attribute:

<pipelineParameters>
   <booleanParameter name="run-A" defaultValue="true"/>               <!-- filter A will run by default -->
   <booleanParameter name="dont-run-B" defaultValue="false"/>         <!-- filter B will run by default -->
   <booleanParameter name="tidy-inputs" defaultValue="true"/>         <!-- inputs should be 'tidied' by default -->
   <booleanParameter name="input-already-tidy" defaultValue="false"/> <!-- inputs are not already tidy by default -->
</pipelineParameters>

<inputFilters>
  <!-- will tidy input if told to unless it is already tidy -->
  <filter if="tidy-inputs" unless="input-already-tidy">
    <file path="tidy-input.xsl"/>
  </filter>
  <filter if="run-A">
    <file path="filterA.xsl"/>
  </filter>
  <filter unless="dont-run-B">
    <file path="filterB.xsl"/>
  </filter>
</inputFilters>

N.B. An alternative way to optionally add the first of the filters above is to use the when attribute. This attribute is only available if using the com.deltaxml.cores9api.DXPConfigurationS9 class to load the DXP file. Its value should be an XPath statement that evaluates to a boolean. Parameters (both string parameters and boolean parameters) can be referenced by adding a $ character to the start of their names, e.g.:

  <!-- will tidy input if told to unless it is already tidy -->
  <filter when="$tidy-inputs and not($input-already-tidy)">
    <file path="tidy-input.xsl"/>
  </filter>
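
How such a when expression resolves against the pipeline parameters can be tried out with the JDK alone. The sketch below is not the DXPConfigurationS9 implementation: it uses the JDK's XPath 1.0 engine, and the class and method names are hypothetical. Parameter names follow the <pipelineParameters> example in this section.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative only: evaluates a boolean 'when' condition with JDK XPath.
final class WhenConditionSketch {

    // Returns true when the expression holds for the given parameter values.
    static boolean shouldRun(String when, Map<String, Boolean> parameters) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // $name references are looked up in the parameter map.
        xpath.setXPathVariableResolver(name -> parameters.get(name.getLocalPart()));
        try {
            // The expression only uses variables, so a dummy context document suffices.
            InputSource dummy = new InputSource(new StringReader("<dummy/>"));
            return (Boolean) xpath.evaluate(when, dummy, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Cannot evaluate: " + when, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> parameters = new HashMap<>();
        parameters.put("tidy-inputs", true);
        parameters.put("input-already-tidy", false);
        System.out.println(
            shouldRun("$tidy-inputs and not($input-already-tidy)", parameters));
    }
}
```

The expression gives the same decision as the combined if and unless attributes: the filter runs only when tidy-inputs is true and input-already-tidy is false.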

Java

Conditional filters in a source code pipeline may be implemented by either using if statements to decide whether or not to add specific filters to List objects, or by enabling or disabling filter steps using setEnabled(boolean).

In Java the following code will tidy input when the boolean variable tidyInputs is set, unless it is already tidy.

  pc= new PipelinedComparatorS9();
  FilterStepHelper fsh= pc.newFilterStepHelper();
  FilterChain inChain1= fsh.newFilterChain();
  FilterStep step= fsh.newFilterStep(new File("tidy-input.xsl"), "tidy-input");
  inChain1.addStep(step);
  step.setEnabled(tidyInputs && !inputAlreadyTidy);

C#

In C# the following code will tidy input when the boolean variable tidyInputs is set, unless it is already tidy.

  pc = new PipelinedComparatorS9();
  FilterStepHelper fsh = pc.newFilterStepHelper();
  FilterChain fc = fsh.newFilterChain();
  FilterStep step = fsh.newFilterStep(new FileInfo("tidy-input.xsl"), "tidy-input");
  fc.addStep(step);
  step.Enabled = (tidyInputs && !inputAlreadyTidy);

9. Filter Parameters

DXP and the PipelinedComparatorS9 allow String values to be passed to filters as parameters. These enable you to change the behaviour of filters and make more flexible pipelines. Parameters could be dependent on external values being passed in or could be dependent on which input chain is being processed (in the case of asymmetrical input filter chains).

If a parameter is being passed to an XSLT filter, the stylesheet should declare an <xsl:param> with the same name as the parameter being passed. If a parameter is being passed to a Java filter, the filter class should have a public method called set{parameterName} that takes a single String, e.g. for a parameter called myParam, there should be a method public void setMyParam(String value) declared on the Java filter.

N.B. For DeltaXML Core versions earlier than 6.0, the capitalisation of the set method is important. The case of the parameter name in the set method should be the same as in the name of the parameter itself, e.g. to set myParam, a method called setmyParam() must be present; to set MyParam, a method called setMyParam() should be defined. From version 6.0, parameters whose names start with a lower case letter can be defined with a set method containing the lower case form or the upper case form. The prefix 'set' must always be lower case.
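
The setter naming convention can be illustrated with a few lines of reflection. This is a sketch of the convention only, not DeltaXML's implementation: the class names are hypothetical, and the order in which the two spellings are tried is an assumption.

```java
import java.lang.reflect.Method;

// Illustrative only: looks up a set{parameterName}(String) method by reflection.
final class SetterLookupSketch {

    // Hypothetical stand-in for a Java filter that accepts a 'myParam' parameter.
    public static final class DemoFilter {
        private String myParam;
        public void setMyParam(String value) { this.myParam = value; }
        public String getMyParam() { return myParam; }
    }

    // Tries setmyParam(String) first, then setMyParam(String); the 'set'
    // prefix is always lower case.
    static void setParameter(Object filter, String name, String value) {
        String exact = "set" + name;
        String capitalised =
            "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (String candidate : new String[] { exact, capitalised }) {
            try {
                Method setter = filter.getClass().getMethod(candidate, String.class);
                setter.invoke(filter, value);
                return;
            } catch (NoSuchMethodException e) {
                // This spelling is not declared; try the next one.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot call " + candidate, e);
            }
        }
        throw new IllegalArgumentException("No String setter found for " + name);
    }

    public static void main(String[] args) {
        DemoFilter filter = new DemoFilter();
        setParameter(filter, "myParam", "default");
        System.out.println(filter.getMyParam());
    }
}
```

A filter that declares neither spelling is rejected with an exception in this sketch. How the real pipeline reports a missing setter is not covered here.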

9.1. Examples

DXP

Filter parameters can be added to any filter type using the <parameter> element as a child of the <filter>. They can take a fixed value (defined using a literalValue attribute) or can take the value of a parameter defined as either <booleanParameter> or <stringParameter> elements underneath the <pipelineParameters> element (using the parameterRef attribute). In the case of boolean parameters, the boolean value is first converted to a String before being passed to the filter:

<pipelineParameters>
  <stringParameter name="external-parameter" defaultValue="default"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="input-filter-with-parameter.xsl" relBase="dxp"/>
    <parameter name="an-input-param" literalValue="both-inputs"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
    <parameter name="myParam" parameterRef="external-parameter"/>
  </filter>
</inputFilters>

An alternative way to pass parameters to filters when using com.deltaxml.cores9api.DXPConfigurationS9 to load the DXP file is to use the xpath attribute on the parameter element. This attribute contains an XPath statement that evaluates to a single atomic value (it will be converted to a string before being passed to the filter). The statement can reference any of the pipeline parameters by adding a $ character to the start of their names.

<pipelineParameters>
  <stringParameter name="first-name" defaultValue="John"/>
  <stringParameter name="surname" defaultValue="Smith"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="name-replacement-filter.xsl" relBase="dxp"/>
    <parameter name="full-name" xpath="concat($first-name, ' ', $surname)"/>
  </filter>
</inputFilters>
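
The value that such an xpath attribute produces can be checked with the JDK's own XPath engine. The sketch below is an approximation under stated assumptions: it uses XPath 1.0 rather than the Saxon engine, and the class and method names are hypothetical. The parameter names and defaults are taken from the DXP fragment above.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative only: computes a filter parameter value from an XPath statement.
final class XPathParameterSketch {

    // Evaluates the statement to a string, resolving $name references
    // against the supplied pipeline parameter values.
    static String parameterValue(String statement, Map<String, String> parameters) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> parameters.get(name.getLocalPart()));
        try {
            // No input document is consulted, so a dummy context is enough.
            InputSource dummy = new InputSource(new StringReader("<dummy/>"));
            return xpath.evaluate(statement, dummy);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Cannot evaluate: " + statement, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parameters = new HashMap<>();
        parameters.put("first-name", "John");
        parameters.put("surname", "Smith");
        System.out.println(
            parameterValue("concat($first-name, ' ', $surname)", parameters));
    }
}
```

With the default values, the full-name parameter passed to the filter would be the two names joined by a single space.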

Java

A FilterStep object's parameter can be set using the setParameterValue method as illustrated below. A parameter value can be changed at any time before a comparison, but should not be updated during a comparison.

Implementation Note: When setting a parameter on a Java filter, reflection is used.

import com.deltaxml.cores9api.FilterChain;
import com.deltaxml.cores9api.FilterStep;
import com.deltaxml.cores9api.FilterStepHelper;
import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
...
FilterStepHelper fsh= pc.newFilterStepHelper();
FilterChain inChain= fsh.newFilterChain();
...
String externalParameter= "default";
FilterStep step= null;
...
step= fsh.newFilterStep(
        new File("input-filter-with-parameter.xsl"),
        "a-param-filter"
);
step.setParameterValue("an-input-param", "both-inputs");
inChain.addStep(step);

step= fsh.newFilterStep(SimpleJavaFilter.class, "in-java-filter");
step.setParameterValue("myParam", externalParameter);
inChain.addStep(step);

pc.setInputFilters(inChain);

C#

Similarly in C#, filter parameters are set by creating a ParameterizedFilterS9 wrapper around the normal filter. A String parameter is set on the ParameterizedFilterS9 using C# indexer syntax, and the wrapper is then added to the filter list as normal.

using DeltaXML.CoreS9Api;
using com.deltaxml.demo;
using System.IO;
using System.Collections.Generic;

String externalParameter = "default";
...
IList<Object> filters = new List<Object>();
ParameterizedFilterS9 pf = new ParameterizedFilterS9(
                            new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"] = "both-inputs";
filters.Add(pf);
pf = new ParameterizedFilterS9(typeof(SimpleJavaFilter));
pf["myParam"] = externalParameter;
filters.Add(pf);

pc.setInputFilters(filters);

10. Filter Types

Filters can be declared using a variety of types. Java filters are always declared as Class filters but XSLT filters can be declared as files, URLs, Templates objects, classpath resources, XsltTransformers or XsltExecutables depending on which technology is being used.

10.1. Adding Filters using DXP

Filters can be added to a DXP pipeline as named classes, files, URLs or classpath resources:

<filter>
  <class name="com.deltaxml.demo.SimpleJavaFilter"/>
</filter>

<filter>
  <file path="input-filter1.xsl"/>
</filter>

<filter>
  <http url="http://www.deltaxml.com/core/current/samples/PipelineDefinition/input-filter1.xsl"/>
</filter>

<filter>
  <resource name="xsl/input-filter1.xsl"/>
</filter>

10.2. Comparing DXP with Java and C#

The following table shows the relationship between the DXP elements that are used as children of the filter element and the corresponding Object types allowed as filter list members in the 'Core S9API' packages.

concept/package                | DXP element name | Java com.deltaxml.cores9api package filter list member      | .NET DeltaXML.CoreS9Api package filter list member
-------------------------------|------------------|-------------------------------------------------------------|---------------------------------------------------
compiled source filters        | class            | FilterStep (from a class literal)                           | System.Type (typeof function)
file system                    | file             | FilterStep (from a java.io.File object)                     | System.IO.FileInfo
http filters                   | http             | FilterStep (from a java.net.URL object)                     | System.Uri
jar/classpath resource filters | resource         | FilterStep (from a resource string)                         | -
pre-compiled filter objects    | -                | FilterStep (from a net.sf.saxon.s9api.XsltExecutable object) | -
parameterized filter           | parameter        | any of the above                                            | DeltaXML.CoreS9Api.ParameterizedFilterS9

11. Output Properties

If the result of the comparison is to be serialized, it is possible to configure the final Transformer step using output properties. More details on output properties can be found in the W3C Recommendation for XSLT.

11.1. Examples

The following examples show how to configure the pipeline to indent the result file.

DXP

Output properties can be specified using the <outputProperties> element. Each property is defined in its own <property> child element:

<outputProperties>
  <property name="indent" literalValue="yes"/>
</outputProperties>

Java

Call the setOutputProperty() method on the PipelinedComparatorS9 for each property you wish to set. These are specified using values of the net.sf.saxon.s9api.Serializer.Property enumeration:

import net.sf.saxon.s9api.Serializer;
...
pc.setOutputProperty(Serializer.Property.INDENT, "yes");

C#

The C# code is very similar; the only difference is that a using statement takes the place of the Java import statement:

using net.sf.saxon.s9api;
...
pc.setOutputProperty(Serializer.Property.INDENT, "yes");
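
The effect of the indent property can be previewed without the DeltaXML or Saxon libraries. The sketch below uses the JDK's JAXP identity transformer as a stand-in for the final serialization step. The class name and sample input are illustrative, and the exact indentation produced depends on the serializer in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustrative only: serializes XML with and without the 'indent' property.
final class IndentSketch {

    static String serialize(String xml, boolean indent) {
        try {
            // A transformer created without a stylesheet copies input to output.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><a><b/></a></root>";
        System.out.println(serialize(xml, false));
        System.out.println(serialize(xml, true));
    }
}
```

Indentation adds whitespace to the result, so it is best left off when the result file is going to be compared again or processed by whitespace-sensitive tools.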