
How to Specify a Comparison Pipeline

1 Introduction

One of the main features of DeltaXML Core is the ability to define a comparison pipeline to use when processing your delta. Pipeline definition was introduced in version 3.0 (as the PipelinedComparator class) and lets you specify input and output filter chains that are applied to the data before and after the comparison. This makes it possible to process delta files into standards-compliant output files that show change using only the grammar of the input file format. DeltaXML Core 3.1 introduced the DXP file format: DXP files are XML definitions of pipelines that can be run with the provided command-line tool or used to generate a pre-configured pipeline instance. DXP supports almost all of the features available on the corresponding API classes.

This document, together with the code samples included in the DeltaXML Core release, walks you through these technologies for pipeline specification and shows how they relate to each other.

Since the introduction of the original com.deltaxml.core API package in version 3.0, an improved package based on the Saxon s9api interfaces (com.deltaxml.cores9api) has been released. This package has more features and better performance than our original JAXP-based implementation; it is recommended for new users, and this document focuses primarily on it.

2 PipelinedComparatorS9

The first thing you need before configuring the pipeline is a PipelinedComparatorS9. This is the class that will be configured and on which you start the actual comparison.

DXP

The minimal DXP file that creates an 'unconfigured' pipeline is shown below:

<comparatorPipeline description="A simple comparison" id="compare"/>

This file is valid against the DXP DTD and, if the pipeline is run using the command-line interface, it performs a simple comparison that produces a changes-only delta.

Java

Simply create a new PipelinedComparatorS9 instance using the following Java code:

import com.deltaxml.cores9api.PipelinedComparatorS9;
...
PipelinedComparatorS9 pc= new PipelinedComparatorS9();

C#

Simply create a new PipelinedComparatorS9 instance using the following C# code:

using DeltaXML.CoreS9Api;
...
PipelinedComparatorS9 pc= new PipelinedComparatorS9();

3 Parser Features

When a comparison is triggered, the PipelinedComparator creates new parser instances (if necessary) with which to parse the input files. Unless otherwise specified, the Apache Xerces parser that is distributed with DeltaXML Core will be used.

It may be necessary to configure the parser before use, for example to enable XInclude or to specify how the inputs should be validated.

The following examples show how to enable XInclude on the Apache Xerces parser with each technology.

DXP

Specify parser features using the <parserFeatures> element. Each feature is set using its own <feature> child element:

<parserFeatures>
  <feature name="http://apache.org/xml/features/xinclude" literalValue="true"/>
</parserFeatures>

Java and C#

Call the setParserFeature() method for each feature you wish to specify:

pc.setParserFeature("http://apache.org/xml/features/xinclude", true);
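To see what this feature does outside DeltaXML Core, the same feature URI can be set on a standard JAXP parser factory. The sketch below assumes the JDK's built-in Xerces-derived parser, which recognises http://apache.org/xml/features/xinclude; it illustrates the feature itself, not how PipelinedComparatorS9 configures its parsers internally. The class name and the file names are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeParse {

    /** Parses a file with XInclude processing enabled through the Xerces feature URI. */
    public static Document parse(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // xi:include is recognised by its namespace
        factory.setFeature("http://apache.org/xml/features/xinclude", true);
        return factory.newDocumentBuilder().parse(file);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical inputs: main.xml pulls in part.xml with xi:include
        Path dir = Files.createTempDirectory("xinclude-demo");
        Files.write(dir.resolve("part.xml"),
            "<part>included text</part>".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("main.xml"),
            ("<doc xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                + "<xi:include href=\"part.xml\"/></doc>").getBytes(StandardCharsets.UTF_8));

        Document doc = parse(dir.resolve("main.xml").toFile());
        // Prints 1: the xi:include element has been replaced by the part element
        System.out.println(doc.getElementsByTagName("part").getLength());
    }
}
```

Without the setFeature() call, the xi:include element would remain in the parsed tree unexpanded, and a comparison would see the include instruction rather than the included content.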

4 Comparator Features

Comparator features are switchable functionality settings on the comparator. As a feature is essentially switched on or off, it is set using a boolean value of true or false. Features on the comparator include http://deltaxml.com/api/feature/isFullDelta, http://deltaxml.com/api/feature/enhancedMatch1 and http://deltaxml.com/api/feature/deltaV2. See the comparator javadoc for more details on comparator features.

The following examples show how to turn on the comparator feature http://deltaxml.com/api/feature/isFullDelta which turns the delta from a changes-only delta to a full-context delta.

DXP

Specify comparator features using the <comparatorFeatures> element. Each feature is set using its own <feature> child element:

<comparatorFeatures>
  <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
</comparatorFeatures>

Java and C#

Call the setComparatorFeature() method on the PipelinedComparator for each feature you wish to specify:

pc.setComparatorFeature("http://deltaxml.com/api/feature/isFullDelta", true);

5 Comparator Properties

Comparator properties are settings that take an instantiated Object as a value. One example is http://deltaxml.com/api/property/orderlessPresentation which can take a number of String values. See the comparator javadoc for more details on comparator properties.

The following examples show how to set the comparator property http://deltaxml.com/api/property/orderlessPresentation to 'a_matches_deletes_adds'.

DXP

Specify comparator properties using the <comparatorProperties> element. Each property is set using its own <property> child element. N.B. DXP only supports the setting of String property values.

<comparatorProperties>
  <property name="http://deltaxml.com/api/property/orderlessPresentation" literalValue="a_matches_deletes_adds"/>
</comparatorProperties>

Java and C#

Call the setComparatorProperty() method on the PipelinedComparator for each property you wish to set:

pc.setComparatorProperty("http://deltaxml.com/api/property/orderlessPresentation", "a_matches_deletes_adds");

6 Input Filters

The main purpose of the PipelinedComparatorS9 is to make adding filters to the input and output chains much simpler. It is possible to chain filters together using JAXP, but using PipelinedComparatorS9 is recommended.

Input filters are applied to the input files in the order specified before the comparison takes place. They can be Java-based streaming filters or XSLT filters.
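The effect of filter order can be illustrated with plain JAXP, the alternative mentioned above. This sketch is not DeltaXML code: it runs a list of XSLT stylesheets over one input in list order, feeding each result into the next step, which is roughly what an XSLT-only input chain does to each input before the comparison. The two inline stylesheets are hypothetical stand-ins for files such as input-filter1.xsl and input-filter2.xsl.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltChain {

    // Identity template shared by both hypothetical filters: copies anything not matched elsewhere
    private static final String IDENTITY =
        "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>";

    /** Hypothetical filter 1: renames every a element to b. */
    public static final String RENAME_A_TO_B =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" + IDENTITY
        + "<xsl:template match='a'><b><xsl:apply-templates select='@*|node()'/></b></xsl:template>"
        + "</xsl:stylesheet>";

    /** Hypothetical filter 2: adds seen="yes" to every b element. */
    public static final String MARK_B =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" + IDENTITY
        + "<xsl:template match='b'><xsl:copy><xsl:attribute name='seen'>yes</xsl:attribute>"
        + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheets in list order; each step's output is the next step's input. */
    public static String run(String inputXml, List<String> stylesheets) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Source current = new StreamSource(new StringReader(inputXml));
        for (String xsl : stylesheets) {
            Transformer step = factory.newTransformer(new StreamSource(new StringReader(xsl)));
            DOMResult result = new DOMResult();
            step.transform(current, result);
            current = new DOMSource(result.getNode());
        }
        // Serialise the final tree with an identity transform
        Transformer serializer = factory.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(current, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Prints <b seen="yes">text</b>
        System.out.println(run("<a>text</a>", Arrays.asList(RENAME_A_TO_B, MARK_B)));
    }
}
```

Swapping the two stylesheets gives <b>text</b> instead, because MARK_B then runs before any b element exists. Calling run() once per input with the same list corresponds to a symmetrical chain; PipelinedComparatorS9 takes care of this wiring (and of mixing in Java filters) for you.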

6.1 Symmetrical Input Filter Chains

If you wish to pass each of the two inputs through the same set of filters, with the same parameter values (where applicable), you only need to define the filter chain once and it will be run on each input in turn.

The following examples show how to specify an input filter chain that consists of two XSLT filters followed by a Java filter.

DXP

If the same filter chain should be applied to both inputs, use the <inputFilters> element to define them. Each filter is defined in its own <filter> child element:

<inputFilters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter2.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</inputFilters>

Java

Symmetrical input filter chains should be set using the setInputFilters() method on the PipelinedComparatorS9.

import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
import java.util.List;
import java.util.ArrayList;
...
List<Object> inputFilters= new ArrayList<Object>();
inputFilters.add(new File("input-filter1.xsl"));
inputFilters.add(new File("input-filter2.xsl"));
inputFilters.add(SimpleJavaFilter.class);

pc.setInputFilters(inputFilters);

C#

The C# method is similar to the Java equivalent, although the list contains different object types: FileInfo objects are used instead of File objects, and the typeof operator plays the role of a Java class literal.

using System;
using System.Collections.Generic;
using System.IO;
using com.deltaxml.demo;
...
IList<Object> inputFilters= new List<Object>();
inputFilters.Add(new FileInfo("input-filter1.xsl"));
inputFilters.Add(new FileInfo("input-filter2.xsl"));
inputFilters.Add(typeof(SimpleJavaFilter));

pc.setInputFilters(inputFilters);

6.2 Asymmetrical Input Filter Chains

For some pipelines, you may wish to apply different filter chains to each input or to pass different parameters to the same filter chains depending on which input is being processed. The PipelinedComparator allows the use of asymmetrical input filter chains for this reason.

The following examples show how to specify different filter chains for each input, including how to pass different parameter values to the same filter.

DXP

When using different filter chains for each input, replace the <inputFilters> element with <input1Filters> and <input2Filters>:

<input1Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input1-value"/>
  </filter>
</input1Filters>
<input2Filters>
  <filter>
    <file path="input-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="input-filter-with-parameter.xsl"/>
    <parameter name="an-input-param" literalValue="input2-value"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
  </filter>
</input2Filters>

Java

Asymmetrical input filter chains should be specified with setInput1Filters() and setInput2Filters() methods on PipelinedComparatorS9:

import com.deltaxml.demo.SimpleJavaFilter;
import com.deltaxml.cores9api.ParameterizedFilterS9;
import java.io.File;
import java.util.List;
import java.util.ArrayList;
...
List<Object> input1Filters= new ArrayList<Object>();
input1Filters.add(new File("input-filter1.xsl"));
ParameterizedFilterS9 pf= new ParameterizedFilterS9(new File("input-filter-with-parameter.xsl"));
pf.setStringParameter("an-input-param", "input1-value");
input1Filters.add(pf);

List<Object> input2Filters= new ArrayList<Object>();
input2Filters.add(new File("input-filter1.xsl"));
pf= new ParameterizedFilterS9(new File("input-filter-with-parameter.xsl"));
pf.setStringParameter("an-input-param", "input2-value");
input2Filters.add(pf);
input2Filters.add(SimpleJavaFilter.class);

pc.setInput1Filters(input1Filters);
pc.setInput2Filters(input2Filters);

C#

Asymmetrical input filter chains should be specified with setInput1Filters() and setInput2Filters() methods on the PipelinedComparatorS9:

using System;
using System.Collections.Generic;
using System.IO;
using DeltaXML.CoreS9Api;
using com.deltaxml.demo;
...
IList<Object> input1Filters= new List<Object>();
input1Filters.Add(new FileInfo("input-filter1.xsl"));
ParameterizedFilterS9 pf= new ParameterizedFilterS9(new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"]= "input1-value";
input1Filters.Add(pf);

IList<Object> input2Filters= new List<Object>();
input2Filters.Add(new FileInfo("input-filter1.xsl"));
pf= new ParameterizedFilterS9(new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"]= "input2-value";
input2Filters.Add(pf);
input2Filters.Add(typeof(SimpleJavaFilter));

pc.setInput1Filters(input1Filters);
pc.setInput2Filters(input2Filters);

7 Output Filters

Output filter chains are constructed in the same way as input filter chains except that there is only one chain and it is applied to the result of the comparison.

The following examples show how to add a simple XSLT filter chain to the output.

DXP

Specify output filters using the <outputFilters> element. Each filter is defined in its own <filter> child element:

<outputFilters>
  <filter>
    <file path="output-filter1.xsl" relBase="dxp"/>
  </filter>
  <filter>
    <file path="output-filter2.xsl" relBase="dxp"/>
  </filter>
</outputFilters>

Java

Output filters are added using the setOutputFilters() method on the PipelinedComparatorS9.

import java.io.File;
import java.util.ArrayList;
import java.util.List;
...
List<Object> filters= new ArrayList<Object>();
filters.add(new File("output-filter1.xsl"));
filters.add(new File("output-filter2.xsl"));
pc.setOutputFilters(filters);

C#

Output filters are added using the setOutputFilters() method on the PipelinedComparatorS9.

using System;
using System.Collections.Generic;
using System.IO;
...
IList<Object> filters= new List<Object>();
filters.Add(new FileInfo("output-filter1.xsl"));
filters.Add(new FileInfo("output-filter2.xsl"));
pc.setOutputFilters(filters);

8 Conditional Filters

When using DXP as a pipeline specification, it is helpful to be able to make some filters conditional. This is done by specifying a boolean pipeline parameter that is tested when the filter is added. A filter can be made conditional 'if' a parameter is true, 'unless' a parameter is true, or a combination of both. At present, each if or unless attribute can test only a single parameter.

DXP

The boolean to test against should be specified as a <booleanParameter> element. The <filter> element then refers to the parameter name in an if or unless attribute:

<pipelineParameters>
   <booleanParameter name="run-A" defaultValue="true"/>               <!-- filter A will run by default -->
   <booleanParameter name="dont-run-B" defaultValue="false"/>         <!-- filter B will run by default -->
   <booleanParameter name="tidy-inputs" defaultValue="true"/>         <!-- inputs should be 'tidied' by default -->
   <booleanParameter name="input-already-tidy" defaultValue="false"/> <!-- inputs are not already tidy by default -->
</pipelineParameters>

<inputFilters>
  <filter if="tidy-inputs" unless="input-already-tidy"> <!-- will tidy input if told to unless it is already tidy -->
    <file path="tidy-input.xsl"/>
  </filter>
  <filter if="run-A">
    <file path="filterA.xsl"/>
  </filter>
  <filter unless="dont-run-B">
    <file path="filterB.xsl"/>
  </filter>
</inputFilters>

N.B. An alternative way to conditionally add the first of the filters above is to use the when attribute. This attribute is only available when the com.deltaxml.cores9api.DXPConfigurationS9 class is used to load the DXP file. Its value should be an XPath expression that evaluates to a boolean. Parameters (both string and boolean parameters) can be referenced by adding a $ character to the start of their names, e.g.:

  <filter when="$tidy-inputs and not($input-already-tidy)"> <!-- will tidy input if told to unless it is already tidy -->
    <file path="tidy-input.xsl"/>
  </filter>
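To check how a when expression resolves, the sketch below evaluates it against the boolean parameters declared in <pipelineParameters>, using the standard JAXP XPath API (XPath 1.0). It only illustrates the logic of the expression: DXPConfigurationS9 evaluates when attributes itself, and the XPath version it supports may differ from the one used here. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

public class WhenCondition {

    /** Evaluates a DXP-style when expression, exposing each pipeline parameter as an XPath variable. */
    public static boolean evaluate(String expression, Map<String, Object> parameters)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // $name in the expression is looked up by local name in the parameter map
        xpath.setXPathVariableResolver(name -> parameters.get(name.getLocalPart()));
        // The expression needs no context node, so a null context item is passed
        return (Boolean) xpath.evaluate(expression, (Object) null, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws XPathExpressionException {
        Map<String, Object> params = new HashMap<>();
        params.put("tidy-inputs", Boolean.TRUE);
        params.put("input-already-tidy", Boolean.FALSE);
        String when = "$tidy-inputs and not($input-already-tidy)";
        System.out.println(evaluate(when, params)); // true: the tidy filter is added
        params.put("input-already-tidy", Boolean.TRUE);
        System.out.println(evaluate(when, params)); // false: the tidy filter is skipped
    }
}
```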

Java and C#

Conditional filters in a Java or C# pipeline are simply a matter of using if statements to decide whether or not to add specific filters to the List objects.
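For reference, here is a minimal Java sketch of the if/unless logic from the DXP example above, with each pipeline parameter held in an ordinary boolean. The class and method names are hypothetical; the returned list would be passed to setInputFilters() on a PipelinedComparatorS9 as in section 6.1.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ConditionalInputFilters {

    /**
     * Builds the input filter list, mirroring the DXP attributes:
     * if="p" adds the filter only when p is true, unless="p" only when p is false.
     */
    public static List<Object> build(boolean runA, boolean dontRunB,
                                     boolean tidyInputs, boolean inputAlreadyTidy) {
        List<Object> filters = new ArrayList<Object>();
        if (tidyInputs && !inputAlreadyTidy) {   // if="tidy-inputs" unless="input-already-tidy"
            filters.add(new File("tidy-input.xsl"));
        }
        if (runA) {                              // if="run-A"
            filters.add(new File("filterA.xsl"));
        }
        if (!dontRunB) {                         // unless="dont-run-B"
            filters.add(new File("filterB.xsl"));
        }
        return filters;
    }

    public static void main(String[] args) {
        // The DXP defaults: run-A=true, dont-run-B=false, tidy-inputs=true, input-already-tidy=false
        List<Object> filters = build(true, false, true, false);
        System.out.println(filters); // [tidy-input.xsl, filterA.xsl, filterB.xsl]
        // pc.setInputFilters(filters);  // with a configured PipelinedComparatorS9
    }
}
```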

9 Filter Parameters

DXP and the PipelinedComparatorS9 allow String values to be passed to filters as parameters. These let you change the behaviour of filters and build more flexible pipelines. A parameter value could depend on an external value passed in, or on which input chain is being processed (in the case of asymmetrical input filter chains). If a parameter is passed to an XSLT filter, the stylesheet should declare an <xsl:param> with the same name as the parameter. If a parameter is passed to a Java filter, the filter should have a public method named set followed by the parameter name, taking a single String argument; e.g. for a parameter called myParam, the filter should declare a method public void setMyParam(String value).

N.B. For DeltaXML Core versions earlier than 6.0, the capitalisation of the set method is important: the parameter name in the method name must have exactly the same case as the parameter itself. For example, to set myParam, a method called setmyParam() must be present; to set MyParam, a method called setMyParam() must be defined. From version 6.0, a parameter whose name starts with a lower-case letter can be set using a method containing either the lower-case form (setmyParam()) or the capitalised form (setMyParam()). The prefix 'set' must always be lower case.
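The naming rule can be illustrated with standard Java reflection. The sketch below looks for a public set method taking a single String, trying the name exactly as given (setmyParam) and then with its first letter capitalised (setMyParam), which models the version 6.0 behaviour. It is an assumption-based illustration, not DeltaXML Core's implementation, and ExampleFilter is a hypothetical filter class.

```java
import java.lang.reflect.Method;

public class ParameterSetter {

    /** Hypothetical Java filter exposing a parameter called myParam. */
    public static class ExampleFilter {
        private String myParam = "unset";
        public void setMyParam(String value) { this.myParam = value; }
        public String getMyParam() { return myParam; }
    }

    /** Calls set{name}(String), trying the exact-case form first, then the capitalised form. */
    public static void setParameter(Object filter, String name, String value) throws Exception {
        String[] candidates = {
            "set" + name,
            "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1)
        };
        for (String methodName : candidates) {
            try {
                Method m = filter.getClass().getMethod(methodName, String.class);
                m.invoke(filter, value);
                return;
            } catch (NoSuchMethodException e) {
                // Not found with this spelling: try the next one
            }
        }
        throw new NoSuchMethodException("No public set method for parameter " + name);
    }

    public static void main(String[] args) throws Exception {
        ExampleFilter filter = new ExampleFilter();
        setParameter(filter, "myParam", "default"); // resolved to setMyParam(String)
        System.out.println(filter.getMyParam());    // default
    }
}
```

Under the pre-6.0 rule only the first candidate would be tried, so a parameter called myParam would need a method named exactly setmyParam.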

DXP

Filter parameters can be added to any filter type using the <parameter> element as a child of the <filter>. A parameter can take a fixed value (defined using the literalValue attribute) or the value of a pipeline parameter declared as a <booleanParameter> or <stringParameter> element within the <pipelineParameters> element (referenced using the parameterRef attribute). Boolean parameter values are converted to a String before being passed to the filter:

<pipelineParameters>
  <stringParameter name="external-parameter" defaultValue="default"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="input-filter-with-parameter.xsl" relBase="dxp"/>
    <parameter name="an-input-param" literalValue="both-inputs"/>
  </filter>
  <filter>
    <class name="com.deltaxml.demo.SimpleJavaFilter"/>
    <parameter name="myParam" parameterRef="external-parameter"/>
  </filter>
</inputFilters>

An alternative way to pass parameters to filters, when using com.deltaxml.cores9api.DXPConfigurationS9 to load the DXP file, is to use the xpath attribute on the parameter element. This attribute contains an XPath expression that evaluates to a single atomic value, which is converted to a string before being passed to the filter. The expression can reference any of the pipeline parameters by adding a $ character to the start of their names.

<pipelineParameters>
  <stringParameter name="first-name" defaultValue="John"/>
  <stringParameter name="surname" defaultValue="Smith"/>
</pipelineParameters>

<inputFilters>
  <filter>
    <file path="name-replacement-filter.xsl" relBase="dxp"/>
    <parameter name="full-name" xpath="concat($first-name, ' ', $surname)"/>
  </filter>
</inputFilters>

Java

Filter parameters are set by creating a ParameterizedFilterS9 wrapper around the normal filter. String parameters are set on the ParameterizedFilterS9, which is then added to the filter list as normal. Note that for Java filters, the set method is not called directly on an instantiated Object, because the PipelinedComparatorS9 creates the filter instance itself when the filters are added. Instead, the parameter set method is called using reflection on the instantiated filter.

import com.deltaxml.cores9api.ParameterizedFilterS9;
import com.deltaxml.demo.SimpleJavaFilter;
import java.io.File;
import java.util.List;
import java.util.ArrayList;

String externalParameter= "default";
...
List<Object> filters= new ArrayList<Object>();
ParameterizedFilterS9 pf= new ParameterizedFilterS9(new File("input-filter-with-parameter.xsl"));
pf.setStringParameter("an-input-param", "both-inputs");
filters.add(pf);
pf= new ParameterizedFilterS9(SimpleJavaFilter.class);
pf.setStringParameter("myParam", externalParameter);
filters.add(pf);

pc.setInputFilters(filters);

C#

Similarly, in C# filter parameters are set by creating a ParameterizedFilterS9 wrapper around the normal filter. String parameters are set on the ParameterizedFilterS9 using C# indexer syntax, and the wrapper is then added to the filter list as normal.

using System;
using System.Collections.Generic;
using System.IO;
using DeltaXML.CoreS9Api;
using com.deltaxml.demo;

String externalParameter= "default";
...
IList<Object> filters= new List<Object>();
ParameterizedFilterS9 pf= new ParameterizedFilterS9(new FileInfo("input-filter-with-parameter.xsl"));
pf["an-input-param"]= "both-inputs";
filters.Add(pf);
pf= new ParameterizedFilterS9(typeof(SimpleJavaFilter));
pf["myParam"]= externalParameter;
filters.Add(pf);

pc.setInputFilters(filters);

10 Filter Types

Filters can be declared using a variety of types. Java filters are always declared as Class filters but XSLT filters can be declared as files, URLs, Templates objects, classpath resources, XsltTransformers or XsltExecutables depending on which technology is being used.

DXP

Filters can be added to a DXP pipeline as named classes, files, URLs or classpath resources:

<filter>
  <class name="com.deltaxml.demo.SimpleJavaFilter"/>
</filter>

<filter>
  <file path="input-filter1.xsl"/>
</filter>

<filter>
  <http url="http://www.deltaxml.com/core/current/samples/PipelineDefinition/input-filter1.xsl"/>
</filter>

<filter>
  <resource name="xsl/input-filter1.xsl"/>
</filter>

Java and C#

The following table shows the relationship between the DXP elements that are used as children of the filter element and the corresponding Object types allowed as filter list members in the various API packages. For completeness, details of the com.deltaxml.core package are included; however, use of the more recent cores9api package is recommended for new users.

Concept / package | DXP element name | Java com.deltaxml.core filter list member | Java com.deltaxml.cores9api filter list member | .NET DeltaXML.CoreS9Api filter list member
compiled source filters | class | java.lang.Class (class literals) | java.lang.Class (class literals) | System.Type (typeof operator)
file system filters | file | java.io.File | java.io.File | System.IO.FileInfo
http filters | http | java.net.URL | java.net.URL | System.Uri
jar/classpath resource filters | resource | - | - | -
pre-compiled filter objects | - | javax.xml.transform.Templates | net.sf.saxon.s9api.XsltExecutable | -
parameterized filters | parameter | com.deltaxml.core.ParameterizedFilter | com.deltaxml.cores9api.ParameterizedFilterS9 | DeltaXML.CoreS9Api.ParameterizedFilterS9
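The table lists no Java filter list member for classpath resources. One possible workaround, assuming that a jar: or file: URL is loaded like any other java.net.URL filter, is to resolve the resource to a URL with ClassLoader.getResource() first. The sketch below builds a list containing one member of each Java type from the cores9api column; PlaceholderJavaFilter is a hypothetical stand-in for a real filter such as com.deltaxml.demo.SimpleJavaFilter.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class FilterListMembers {

    /** Hypothetical placeholder standing in for a real Java filter class. */
    public static class PlaceholderJavaFilter {}

    /** Resolves a classpath resource (the DXP <resource> case) to a URL, failing loudly if absent. */
    public static URL classpathResource(String name) {
        URL url = FilterListMembers.class.getClassLoader().getResource(name);
        if (url == null) {
            throw new IllegalArgumentException("Classpath resource not found: " + name);
        }
        return url;
    }

    /** Builds a filter list with one member of each Java type listed for com.deltaxml.cores9api. */
    public static List<Object> build(URL resourceFilter) throws MalformedURLException {
        List<Object> filters = new ArrayList<Object>();
        filters.add(PlaceholderJavaFilter.class);                 // DXP <class>
        filters.add(new File("input-filter1.xsl"));               // DXP <file>
        filters.add(URI.create(                                   // DXP <http>
            "http://www.deltaxml.com/core/current/samples/PipelineDefinition/input-filter1.xsl").toURL());
        filters.add(resourceFilter);                              // DXP <resource>, passed as a URL (assumption)
        return filters;
    }

    public static void main(String[] args) throws MalformedURLException {
        // xsl/input-filter1.xsl must be on the classpath for this call to succeed
        List<Object> filters = build(classpathResource("xsl/input-filter1.xsl"));
        System.out.println(filters.size()); // 4
    }
}
```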

11 Output Properties

If the result of the comparison is to be serialized, the final Transformer step can be configured using output properties. More details on output properties can be found in the W3C Recommendation for XSLT.

The following examples show how to configure the pipeline to indent the result file.

DXP

Output properties can be specified using the <outputProperties> element. Each property is defined in its own <property> child element:

<outputProperties>
  <property name="indent" literalValue="yes"/>
</outputProperties>

Java

Call the setOutputProperty() method on the PipelinedComparatorS9 for each property you wish to set. Properties are specified using values of the net.sf.saxon.s9api.Serializer.Property enumeration:

import net.sf.saxon.s9api.Serializer;
...
pc.setOutputProperty(Serializer.Property.INDENT, "yes");

C#

The C# code is very similar; where the Java code uses an import statement, C# uses a using directive instead:

using net.sf.saxon.s9api;
...
pc.setOutputProperty(Serializer.Property.INDENT, "yes");
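Because indent is one of the standard XSLT serialization properties, its effect can be tried out on a plain JAXP identity Transformer, where OutputKeys.INDENT names the same property. The sketch uses the JDK's built-in transformer rather than Saxon or DeltaXML Core, so the exact amount of indentation may differ from the pipeline's output. The class name and sample XML are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IndentOutput {

    /** Re-serialises an XML string with the "indent" output property switched on. */
    public static String indent(String xml) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.INDENT, "yes");               // same property as <property name="indent" .../>
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // keep the sample output short
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // The child element is written on its own line
        System.out.println(indent("<delta><change/></delta>"));
    }
}
```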