
Guide to using Filters with DeltaXML

Chapter 1. Filters

1.1. Introduction

This tutorial explains how Java XML filters and XSL filters can be used to complement the DeltaXML Core API when comparing XML files. Such filters should be viewed as an integral part of DeltaXML processing. While not mandatory, they belong in the top drawer of your toolbox. Java XML filters and XSL scripts can perform a range of data pre- and post-processing functions which greatly enhance the power of DeltaXML comparisons. For example, with pre-processing filters you can:

They can also be used to customize the display of DeltaXML output to make it easier for users or programs to identify and use changes.

XML filtering makes possible DeltaXML comparisons that would otherwise be impossible. XML filters can also shield data from DeltaXML annotations.

DeltaXML ships with several ready-made Java XML filters and XSL scripts. These do not exhaust all possibilities, but serve as useful teaching aids which will be used in this paper to explain how and why XSL filters should be used to complement DeltaXML.

1.2. What are filters?

A filter is a processing element that receives XML data from some other processing element, carries out some form of operation on that data, and then passes that data onto another processing element.

The primary aim of a filter is to produce an output XML document that shares more similarities than differences with the input XML document.

For example, consider the role of a water filter. The intention of such a filter is to remove impurities from the water, but the result is still water. The role of a filter with respect to DeltaXML is similar: you may use a filter to remove unwanted or noisy data from an XML document before presenting it to DeltaXML for comparison.

Thus it is normal to create a pipeline containing one or more filters that pre-process or post-process the data compared by DeltaXML. This is illustrated in the following diagram:

PIC-adding-xsl-filters.gif
Using filters with DeltaXML

In terms of implementation, a filter may be written in one of two ways: as a Java XML filter or as an XSL stylesheet.

If you wish to learn more about writing your own filters then see the appropriate tutorials.
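As a rough illustration of the "receive XML, operate on it, pass it on" idea, the following sketch builds a filter on the standard SAX XMLFilterImpl class. It is not one of the filters shipped with DeltaXML; the class name DropElementFilter and the order and timestamp element names are hypothetical and chosen only for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: removes one named element, with its content, from the stream. */
final class DropElementFilter extends XMLFilterImpl {
    private final String dropped; // local name of the element to remove
    private int depth = 0;        // greater than zero while inside a dropped element

    DropElementFilter(XMLReader parent, String dropped) {
        super(parent);
        this.dropped = dropped;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || local.equals(dropped)) {
            depth++; // swallow the event
        } else {
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
        } else {
            super.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the XML string through the filter and serializes what comes out. */
    static String apply(String xml, String dropped) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader filter = new DropElementFilter(spf.newSAXParser().getXMLReader(), dropped);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("filtering failed", e);
        }
    }

    public static void main(String[] args) {
        String in = "<order><id>7</id><timestamp>12:00</timestamp></order>";
        System.out.println(apply(in, "timestamp"));
    }
}
```

The output is still an order document, only without the noisy timestamp, which matches the water-filter analogy above.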

1.3. Usage Patterns

The conceivable applications of XML filters are endless, but guidelines and usage patterns common to DeltaXML merit attention.

1.3.1. Input Side

Input-side, or prefilters, typically perform two functions: normalization and attribute insertion. As a rule, the same prefilter should process both input files, unless they stem from sources so different as to mandate separate processing.

Normalization means enforced conformance to the "normal form" that applies in a given differencing application. This form varies from one application to another, but the idea is simple to grasp by example. Legacy data employing outdated element names could pass through a renaming filter, matching them to modern practice. Legacy elements no longer meaningful can be stripped. Differencing particular classes of elements is possible by stripping others out. The common theme of normalization is to eliminate uninteresting XML differences ahead of comparison.
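A minimal sketch of a renaming prefilter, assuming a hypothetical legacy element called zip that should be compared as postcode. The stylesheet is an identity transform with one extra rule, and it is run through the standard JAXP transformer API; nothing here is taken from the DeltaXML distribution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RenameLegacyElement {
    // Identity transform plus one rule: <zip> (hypothetical legacy name) becomes <postcode>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='zip'>"
      + "<postcode><xsl:apply-templates select='@*|node()'/></postcode>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the renaming stylesheet to an XML string. */
    static String normalize(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("normalization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("<person><zip>WC1X 8TX</zip></person>"));
    }
}
```

Running both input files through the same stylesheet means the rename itself never shows up as a difference.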

Attribute insertion has primarily in view the deltaxml:ordered and deltaxml:key attributes, but may involve others dictated by application needs. DeltaXML detects attribute changes too, so legacy considerations apply to them just as to elements. With respect to DeltaXML attributes, XSL development follows a simple recipe:

In addition to the DeltaXML attributes, your XSL script(s) can insert, remove, or modify attributes specific to the application. It is always possible, and often advisable, to perform these operations in separate XSL scripts, factoring out DeltaXML-specific code.
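The sketch below shows one way an XSL prefilter might insert a DeltaXML attribute. It assumes, purely for illustration, that the children of an AddressList element should be compared without regard to order; the namespace URI is the one used in the examples later in this guide, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MarkOrderless {
    static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Identity transform plus one rule that stamps deltaxml:ordered="false" on AddressList.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:deltaxml='" + DELTA_NS + "'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='AddressList'>"
      + "<xsl:copy>"
      + "<xsl:attribute name='deltaxml:ordered'>false</xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the attribute-inserting stylesheet to an XML string. */
    static String annotate(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("annotation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(annotate("<AddressList><person/></AddressList>"));
    }
}
```

Because the rule lives in the stylesheet, switching AddressList back to ordered comparison is a one-line change that leaves the stored XML untouched.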

1.3.2. Output Side

DeltaXML offers a choice of output format: standard deltas and full context deltas. Standard deltas show only changes. In the absence of changes, this output is empty. Full context deltas show all data, with changes in situ. In the absence of changes, this output equals the corresponding DeltaXML inputs. Your XSL scripts may utilize either type of output. The proper choice depends on the application in question. A middle ground is also possible. When the full context is too much, XSL can be used to generate a partial context format. Select the full context option and use XSL to discard irrelevant portions of the full context.

Postfilters (or transforms) might produce more XML, a formatted report, a web page in XHTML, or output suitable for downstream processing. User visualization generally benefits from the full context option. Most kinds of automated processing should opt for standard deltas.

1.3.3. Pipelines

One common XML Filter configuration is the pipeline. A pipeline employs related pre- and postfilters. They work in tandem to achieve a required result. The term simply denotes a chain of operations. One useful pipeline is a mirror pipeline in which the postfilter reverses the prefilter's effects. Arranged back-to-back, these filters would accomplish nothing, but practical possibilities manifest when DeltaXML is used between them. One filter may create ephemeral data, such as keys, for DeltaXML's private use while the other removes them so that users do not see data only required to ensure accurate comparisons.

PIC-pipelines-in-deltaxml.gif
Pipelining filters with DeltaXML

An important point to note about the nature of pipelines is that, because intermediate data appears only when and where needed, it obeys a kind of encapsulation rule.

Data encapsulation confers several benefits. Suppose you require orderless or keyed comparisons. These comparisons necessitate the application of DeltaXML attributes. There are two ways to incorporate them. One is to write them directly into the XML files, for permanent storage. The other is to write them into XSL scripts. The latter option encapsulates DeltaXML attributes inside the scripts. The benefit is that, as underlying XML formats evolve, these attribute assignments need change only in the scripts, not everywhere the XML data is required. For example, if an element switches from ordered to orderless, a simple rule change in the XSL enacts the change with zero impact on extant data files. For the same reason, encapsulation means that new features can easily be added in future versions of DeltaXML.
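The mirror pipeline idea can be sketched with two stylesheets run back to back through the standard JAXP API. The prefilter here adds a deltaxml:key to each person element, taken from its name child (an assumption made only for this example), and the postfilter strips every attribute in the DeltaXML namespace again. In a real pipeline the DeltaXML comparison would sit between the two steps; it is left out so that the sketch needs only the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MirrorPipeline {
    static final String NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";
    static final String HEAD = "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' xmlns:deltaxml='" + NS + "'>";

    // Prefilter: identity transform, plus a key on every <person> taken from its <name>.
    static final String PREFILTER = HEAD
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
        + "<xsl:template match='person'><xsl:copy>"
        + "<xsl:attribute name='deltaxml:key'><xsl:value-of select='name'/></xsl:attribute>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:copy></xsl:template></xsl:stylesheet>";

    // Postfilter: rebuilds each element (so unused namespace declarations are not
    // copied) and drops every attribute in the DeltaXML namespace.
    static final String POSTFILTER = HEAD
        + "<xsl:template match='*'>"
        + "<xsl:element name='{name()}' namespace='{namespace-uri()}'>"
        + "<xsl:apply-templates select='@*|node()'/></xsl:element></xsl:template>"
        + "<xsl:template match='@*|text()'><xsl:copy/></xsl:template>"
        + "<xsl:template match='@deltaxml:*'/>"
        + "</xsl:stylesheet>";

    /** Runs one stylesheet over an XML string. */
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String in = "<AddressList><person><name>John Smith</name></person></AddressList>";
        String keyed = transform(PREFILTER, in);       // what the comparator would see
        String cleaned = transform(POSTFILTER, keyed); // what the user would see
        System.out.println(keyed);
        System.out.println(cleaned);
    }
}
```

The key exists only in the intermediate document, which is the encapsulation property described above.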

1.3.4. Code re-use

DeltaXML has been designed to facilitate Java XML and XSL filter re-use. Its output format clones that of its input. Element names and hierarchy nesting are identical. Consequently any input filter serves as good boilerplate for another to manipulate DeltaXML on the output side.

1.4. Tools

There are essentially two XSL/DeltaXML usage modes: development and production. The distinction arises from issues such as manual versus automatic operation and low- versus high-volume processing. IDEs greatly help development mode, while production mode usually requires console tools. Production mode is often the ultimate goal, unless the processing is to be guided manually. Manual operation is satisfactory for smaller batches and one-of-a-kind processing. Otherwise it eventually yields to production mode.

1.4.1. Development Mode - Eclipse IDE

The Eclipse IDE is an ideal development platform. For example, you can step through XSL scripts just as you might step through code in a C++ debugger. Setting up Eclipse for XSL work involves three installations:

Java Runtime Engine: http://java.sun.com

Eclipse IDE: http://www.eclipse.org

SunBow plugins: http://radio.weblogs.com/0108489/

The SunBow suite offers two ways to execute XSL transformations from a simple mouse context menu. After you have added the desired XML and XSL files to your project, select both files in the Navigator view, then right-click on them to obtain the mouse context menu. The two options are XSL Trace... and XSL Transformer....

1.4.2. Production Mode

Two good choices for production mode are Apache Xalan and Saxon. Here are brief instructions on their installation.

1.4.2.1. Apache Xalan

Java Runtime Engine: http://java.sun.com

Xalan: http://xml.apache.org/xalan-j

1.4.2.2. Saxon

Java Runtime Engine: http://java.sun.com

Saxon: http://saxon.sourceforge.net

Chapter 2. DeltaXML Filter Descriptions

2.1. Introduction

This section provides short summaries of the filters supplied with the DeltaXML distribution. Two types of filter are provided: Java XML Filters and XSL stylesheet filters. The XSL definitions are provided as XSL files within the samples directory of the distribution. The Java XML Filters are provided as part of the public source.

The Java classes and the XSL files both incorporate comments elaborating their particular implementation details.

These files may serve as starting points for your own Java and XSL development. They represent a sampling of what is possible.

Note that in general, we have found that the Java classes have lower memory overheads and offer faster execution times and so are generally preferred to the XSL versions. Also please note that all the Java filters are also available as XSL filters (although in some cases there is no Java equivalent for a provided XSL filter).

2.2. Input Filters / Prefilters

Input filters (or prefilters) are filters used to pre-process the XML before it is presented to DeltaXML.

2.2.1. Normalize Space Filter

This prefilter reduces each sequence of whitespace characters to a single space. Whitespace has semantic relevance only within PCDATA, so it can be consolidated harmlessly everywhere else. This filter preserves leading and trailing spaces within PCDATA as required by XHTML applications of DeltaXML.

This filter is available both as an XSL file (normalize-space.xsl) and as a Java XML Filter class (com.deltaxml.pipe.filters.NormalizeSpace).
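The core rule, collapsing each whitespace run to a single space while keeping a leading or trailing space, can be sketched on a plain string as follows. This is only an illustration of the rule as described above and does not reproduce the shipped NormalizeSpace class or stylesheet; the class name is hypothetical.

```java
final class NormalizeSpaceSketch {
    /** Collapses each run of XML whitespace (space, tab, CR, LF) to one space. */
    static String collapse(String text) {
        // No trim(): a leading or trailing run is reduced to one space, not removed.
        return text.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  10   Grays\n\tInn Road ") + "]");
    }
}
```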

2.2.2. svg-input-filter.xsl

Scalable Vector Graphics is an XML format describing two-dimensional vector graphics. SVG is an ordered format: objects are rendered in the order listed. Yet this order matters only in case of overlaps, and even then, transparency may nullify the graphical effect. Without overlaps, the rendering order does not matter. For this reason most SVG files may be considered orderless from a DeltaXML standpoint.

The svg-input-filter.xsl stylesheet inserts annotations to enable orderless SVG differencing. The following SVG file describes three semi-transparent, overlapping circles:

Example 2.1. Original SVG describing three semi-transparent, overlapping circles

<?xml version="1.0" encoding="UTF-8"?> 
<svg xmlns="http://www.w3.org/2000/svg"> 
   <g style="fill-opacity:0.7; stroke:black; stroke-width:0.1cm;"> 
      <circle cx="6cm" cy="2cm" r="100" style="fill:red;" 
              transform="translate(0,50)" /> 
      <circle cx="6cm" cy="2cm" r="100" style="fill:blue;" 
              transform="translate(70,150)" /> 
      <circle cx="6cm" cy="2cm" r="100" style="fill:green;" 
              transform="translate(-70,150)" /> 
   </g> 
</svg>

The XSL-filtered result includes DeltaXML annotations for orderless comparison:

Example 2.2. XSL-filtered SVG with DeltaXML annotations for orderless differencing

<?xml version="1.0" encoding="UTF-8"?> 
<svg xmlns="http://www.w3.org/2000/svg" 
     xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
     deltaxml:ordered="false"> 
   <g deltaxml:ordered="false"
      style="fill-opacity:0.7; stroke:black; stroke-width:0.1cm;"> 
      <circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100" 
              style="fill:red;" transform="translate(0,50)" /> 
      <circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100" 
              style="fill:blue;" transform="translate(70,150)" /> 
      <circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100" 
              style="fill:green;" transform="translate(-70,150)" /> 
   </g> 
</svg>

Note that at present there is no Java equivalent to this filter.

2.2.3. Schema-input-filter.xsl

Schemas are valid XML files in their own right, and may be differenced with each other. This input-side XSL script facilitates schema comparisons. (The 1999 version assumes the XML Schema 1999 definition.) It is useful for tracking schema changes over time, for example, over the course of schema development. Schema differencing can also be useful in comparing and consolidating schemas designed for similar purposes.

While the XSL derivation of this filter is rather involved, its action is simple. The filter inserts a set of deltaxml:key attributes to enable orderless comparison of element definitions. Suppose we have a short schema:

Example 2.3. Raw XML Schema

<?xml version='1.0'?> 
<schema xmlns='http://www.w3.org/2001/XMLSchema'> 
   <element name='test1'> 
      <complexType> 
         <all> 
            <annotation> 
               <documentation> 
                  Some documentation 
               </documentation> 
            </annotation> 
            <element ref='A' minOccurs='1' maxOccurs='1'/> 
            <element ref='B' minOccurs='1' maxOccurs='1'/> 
            <element ref='C' minOccurs='1' maxOccurs='1'/> 
         </all> 
      </complexType> 
   </element> 
   <element name='test2'> 
      <complexType> 
         <sequence> 
            <element ref='A'/> 
            <element ref='B'/> 
            <element ref='C'/> 
         </sequence> 
      </complexType> 
   </element> 
   <element name='test3'> 
      <complexType> 
         <choice> 
            <element ref='A' /> 
            <element ref='B' /> 
            <element ref='C' /> 
         </choice> 
      </complexType> 
   </element> 
</schema>

The result of XSL processing is:

Example 2.4. Filtered XML Schema

<?xml version="1.0" encoding="UTF-8"?> 
<schema xmlns="http://www.w3.org/2001/XMLSchema" 
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
deltaxml:ordered="false"> 
   <element deltaxml:ordered="false" deltaxml:key="test1" name="test1"> 
      <complexType deltaxml:ordered="false" deltaxml:key="single"> 
         <all deltaxml:ordered="false" deltaxml:key="single"> 
            <annotation deltaxml:key="single"> 
               <documentation> 
                  Some documentation 
               </documentation> 
            </annotation> 
            <element deltaxml:key="A" ref="A" 
                     minOccurs="1" maxOccurs="1" /> 
            <element deltaxml:key="B" ref="B" 
                     minOccurs="1" maxOccurs="1" /> 
            <element deltaxml:key="C" ref="C" 
                     minOccurs="1" maxOccurs="1" /> 
         </all> 
      </complexType> 
   </element> 
   <element deltaxml:ordered="false" deltaxml:key="test2" name="test2"> 
      <complexType deltaxml:ordered="false" deltaxml:key="single"> 
         <sequence deltaxml:key="single"> 
            <element ref="A" /> 
            <element ref="B" /> 
            <element ref="C" /> 
         </sequence> 
      </complexType> 
   </element> 
   <element deltaxml:ordered="false" deltaxml:key="test3" name="test3"> 
      <complexType deltaxml:ordered="false" deltaxml:key="single"> 
         <choice deltaxml:ordered="false" deltaxml:key="single"> 
            <element deltaxml:key="A" ref="A" /> 
            <element deltaxml:key="B" ref="B" /> 
            <element deltaxml:key="C" ref="C" /> 
         </choice> 
      </complexType> 
   </element> 
</schema>

Now the schema is ready for DeltaXML comparisons with previous or later versions (which must also run through the filter).

Schemas are XML that describe external XML. Reasoning about their differences exercises a few more gray cells than normal XML. The schema filter's operations are described by comments in the XSL file. The salient aspects are manifest in the output above. Note the use of element names as keys, and the use of the "single" key. This key enables DeltaXML to correlate schema elements that may appear only once, but not necessarily always in the same place. The "single" key sometimes appears superfluous, but this is only because the filter applies it to all matching templates indiscriminately. This conservative design simplifies the logic of the XSL stylesheet.

Document Type Definitions are an older alternative to schemas, but are not encoded in XML. Therefore DeltaXML cannot difference DTDs. DeltaXML can difference the XML files controlled by a DTD, or indeed any well-formed XML at all, whether tied to a schema, a DTD, or free-standing.

Note that at present there is no Java equivalent for the XSL filter.

2.3. Output filters / Postfilters

An output (or post) filter is a filter that processes the XML data produced by the DeltaXML engine. For example, it may post-process the DeltaXML delta file into XHTML for presentation within a browser.

2.3.1. deltaxml-folding-html.xsl

If you are working with normal delta output, use the deltaxml-tables.xsl filter to create XHTML tables that can be viewed within a browser. This filter presents the delta file information in an easily digestible format.

An example of the output generated by this XSL file, viewed in a browser, is presented below:

PIC-html-changes-table.gif

Note that at present there is no Java equivalent for this filter.

2.3.2. Merge Scripts

A primary application for change detection is merging XML data from multiple versions of a file. DeltaXML can perform two-way and three-way merges. (The three-way operation is not available prior to version 3.0.)

Two-way merging unifies data from two input files. These files might be closely related or completely different. It is supported by the deltaxml-merge.xsl file. This XSL file converts a full delta file, produced by DeltaXML programs, into a merged file. The merged file will have all the elements and attributes from both input files. In cases where this is not desired, the stylesheet can be modified. PCDATA is written out with a delimiter to show old and new data, and again this can be changed as needed in specific circumstances. Note that this is a simple merge of two files.

Three-way merging unifies two sets of changes that have branched from a shared base file, and is much more complex than a two-way merge. Changes for each branch are reconciled and merged. Note that a three-way merge involves more than a simple XSL script.

A separate paper covers merging operations in more detail.

2.4. Pipelined filters

Pipelined filters are filters that are explicitly designed to receive XML data as input and to generate XML data as output, to be fed into either another filter or DeltaXML itself. These filters differ from post filters in that a post filter may generate an output format that is not compatible with JAXP pipelines. In contrast, these filters must work as a component within a JAXP pipeline. This implies that they must work within the mechanisms used to trigger JAXP pipelines.
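For readers unfamiliar with JAXP pipelines, the sketch below chains two stylesheets as SAX XMLFilter stages using the standard SAXTransformerFactory.newXMLFilter call. The two stylesheets, the draft and para element names and the checked attribute are hypothetical; no DeltaXML class is involved, so this shows only the plumbing in which such a filter would have to operate.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

final class JaxpFilterChain {
    static final String IDENTITY = "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>";

    // Stage 1 (hypothetical): remove <draft> elements.
    static final String DROP_DRAFTS = stylesheet("<xsl:template match='draft'/>");

    // Stage 2 (hypothetical): stamp checked="yes" on the document element.
    static final String MARK_ROOT = stylesheet("<xsl:template match='/*'><xsl:copy>"
        + "<xsl:attribute name='checked'>yes</xsl:attribute>"
        + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>");

    static String stylesheet(String rules) {
        return "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + IDENTITY + rules + "</xsl:stylesheet>";
    }

    /** Pushes the XML through parser, stage 1, stage 2 and a serializer, in that order. */
    static String run(String xml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
                throw new IllegalStateException("factory cannot create XMLFilters");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            XMLFilter first = stf.newXMLFilter(new StreamSource(new StringReader(DROP_DRAFTS)));
            XMLFilter second = stf.newXMLFilter(new StreamSource(new StringReader(MARK_ROOT)));
            first.setParent(reader);  // stage 1 reads from the parser
            second.setParent(first);  // stage 2 reads from stage 1

            Transformer serializer = tf.newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                new SAXSource(second, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><para>kept</para><draft>dropped</draft></doc>"));
    }
}
```

Each stage consumes and produces SAX events, which is why a filter that emits a non-XML report cannot be placed in the middle of such a chain.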

2.4.1. Word-by-word

PCDATA is not structured XML. DeltaXML can detect when one PCDATA block differs from another, but does not resolve individual changes within the blocks; it records only that the block has changed. Sometimes one needs to identify changes within PCDATA. DeltaXML is supplied with filters that convert PCDATA blocks into structured XML so that individual, "word-by-word" changes become detectable, and then convert the output back to PCDATA. The pipeline consists of three filters placed around the DeltaXML comparison, used in the following order:

  1. com.deltaxml.pipe.filters.WordByWordInfilter or word-by-word-infilter.xsl,

  2. DeltaXML,

  3. com.deltaxml.pipe.filters.WordByWordOutfilter1 or word-by-word-outfilter1.xsl,

  4. com.deltaxml.pipe.filters.WordByWordOutfilter2 or word-by-word-outfilter2.xsl.

Here is a simple example of the input-side filter at work:

Example 2.5. Simple test file with PCDATA

<?xml version="1.0" encoding="UTF-8"?> 
<Document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
   <TestParagraph> 
      This is a test. 
   </TestParagraph> 
</Document>

We can then process this using the word-by-word-infilter.xsl and we obtain:

Example 2.6. Word-by-word XSL result

<?xml version="1.0" encoding="UTF-8"?> 
<Document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
   <TestParagraph> 
      <deltaxml:space /> 
         This 
      <deltaxml:space /> 
         is 
      <deltaxml:space /> 
         a 
      <deltaxml:space /> 
         test. 
      <deltaxml:space /> 
   </TestParagraph> 
</Document>

The output-side filters complete this pipeline by reversing the effects shown, i.e., stripping the inserted annotations and consolidating adjacent changes. After performing DeltaXML comparison this is typically the desired behavior. Note that the second XSL output filter is highly recursive and appears to run much faster in Saxon than Xalan. For larger files Saxon may be preferred. However, it is worth considering the Java filters before trying anything more drastic.
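The splitting step of the input-side filter can be pictured with a small string-level sketch. It assumes only what Example 2.6 shows, namely that each whitespace run is replaced by a deltaxml:space element and each word is left as text; it works on a string rather than on an XML stream, is not the shipped WordByWordInfilter, and uses a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordByWordSketch {
    // A token is either a run of whitespace or a run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\s+|\\S+");

    /** Rewrites PCDATA so that each whitespace run becomes a space marker element. */
    static String split(String pcdata) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(pcdata);
        while (m.find()) {
            String token = m.group();
            if (Character.isWhitespace(token.charAt(0))) {
                out.append("<deltaxml:space/>");
            } else {
                out.append(escape(token));
            }
        }
        return out.toString();
    }

    // Keeps the generated fragment well-formed if a word contains markup characters.
    private static String escape(String word) {
        return word.replace("&", "&amp;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        System.out.println(split(" This is a test. "));
    }
}
```

Once every word is separated from its neighbours by an element, a change to a single word is an ordinary structural change that the comparison can localize.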

2.4.2. DocBook change bar generation

DeltaXML provides an "as is" XSL stylesheet that takes the delta generated by comparing two DocBook files and produces a merged file with revision information included. This can then be processed by the DocBook style sheets provided by Norman Walsh to generate HTML pages with change information presented visually.

Two filters are provided by DeltaXML for this process: an input filter, docbook-infilter.xsl, and an output filter, docbook-outfilter.xsl.

For the change bars style sheet see DocBook XSL Stylesheets: http://docbook.sourceforge.net/projects/xsl/

2.4.3. XHTML

When differencing XHTML, it is often convenient to inspect changes visually. The xhtml-infilter.xsl stylesheet can be used to convert XHTML unique identifiers and metadata element names into DeltaXML keys to enhance comparisons. It also normalizes spaces in attribute values, etc., to ensure the closest possible matches.

The xhtml-outfilter.xsl stylesheet produces XHTML showing the differences in situ. The word-by-word pipeline may be used in conjunction for a combined effect.

2.4.4. Clean House Filter

It is always preferable to maintain clean separation between DeltaXML and permanent XML storage files. The com.deltaxml.pipe.filters.CleanHouse and clean-house.xsl filters remove DeltaXML attributes. Using either of them directly against DeltaXML delta files is senseless, as that destroys information content. Rather, either of the filters should be used to purge XML which is derived from DeltaXML output. The XSL logic in this filter is easily integrated into other XSL scripts (and is indeed part of the merge script). Use one of them as a final processing step to catch any DeltaXML elements or attributes that have not previously been replaced.

Chapter 3. Using Filters with PipelinedComparator

3.1. Introduction

Using filters within a pipeline is a fundamental architectural principle for DeltaXML.

Building such a pipeline the first time can be daunting. However, once your first pipeline has been built, the application of such pipelining techniques can be seen to be both repetitive and somewhat verbose.

To overcome this, DeltaXML provides the PipelinedComparator class in the com.deltaxml.core package. This greatly simplifies the job of creating an XML processing pipeline. Indeed, it makes the task extremely straightforward and improves the clarity of your code.

Filters are used both before and after execution of DeltaXML, and thus PipelinedComparator allows you to define one or more input filters and one or more output filters. This is done using simple methods such as setInputFilters() and setOutputFilters().

These methods are overloaded such that they can take either a set of Java XML Filter classes, a list of XSL files, a set of templates, a set of URLs or a mixture of all of these, making it easy to construct a pipeline implemented by a mixture of Java, XSL, templates, etc.

It is also possible to set parser properties and features as well as comparator properties and features using the PipelinedComparator.

We will look at all of these in the following sections.

3.2. Using Java XML Filters

The com.deltaxml.core.PipelinedComparator class allows a list of input and output filters to be specified and then a comparison performed. If the filters are implemented in Java, as com.deltaxml.pipe.filters.NormalizeSpace is, then the class object is passed to the PipelinedComparator.

A class object can be obtained in a number of ways, for example, by using the class Class and the method forName:

Class.forName("com.deltaxml.pipe.filters.NormalizeSpace");

or by using the .class extension on the name:

com.deltaxml.pipe.filters.NormalizeSpace.class

This is illustrated in the sample programs presented in this chapter.
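The two ways of obtaining a class object can be compared in isolation. The sketch below uses java.util.ArrayList as a stand-in so that it runs without the DeltaXML jars on the classpath; the helper name byName is hypothetical.

```java
import java.util.ArrayList;

final class ClassObjectDemo {
    /** Looks a class up by name at run time; fails if it is not on the classpath. */
    static Class<?> byName(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("not on classpath: " + name, e);
        }
    }

    public static void main(String[] args) {
        Class<?> literal = ArrayList.class;                // checked by the compiler
        Class<?> lookedUp = byName("java.util.ArrayList"); // checked only when run
        System.out.println(literal == lookedUp);           // prints "true"
    }
}
```

The class literal is usually the safer choice, since a misspelt name is reported at compile time rather than as a ClassNotFoundException at run time.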

As an example consider the following program. This simple program uses only Java XML Filters. It carries out a common DeltaXML pipeline. That is, it normalises the data, and then applies the Word-By-Word filters around the actual DeltaXML comparison.

import java.io.File; 
 
import com.deltaxml.core.PipelinedComparator; 
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.NormalizeSpace; 
import com.deltaxml.pipe.filters.WordByWordInfilter; 
import com.deltaxml.pipe.filters.WordByWordOutfilter1; 
import com.deltaxml.pipe.filters.WordByWordOutfilter2; 
 
public class PipelinedComparatorTest1 { 
  public static void main(String[] args) 
    throws PipelinedComparatorException
  { 
    PipelinedComparator pc = new PipelinedComparator(); 
    // Set up the input filter 
    Class[] inFilterClasses =  
        new Class[] { NormalizeSpace.class, 
                      WordByWordInfilter.class }; 
    pc.setInputFilters(inFilterClasses); 
    // Now setup the output filters 
    Class[] outFilterClasses =  
        new Class[] { WordByWordOutfilter1.class, 
                      WordByWordOutfilter2.class }; 
    pc.setOutputFilters(outFilterClasses); 
    // Now run the DeltaXML comparison 
    pc.compare(new File("old.xml"),  
               new File("new.xml"),  
               new File("out.xml")); 
  } 
}
        

As you can see from the above program, setting up a pipeline using PipelinedComparator is extremely easy and much simpler than if you had to construct the pipeline yourself.

Note that in this example, we import the classes implementing the Java XML Filters at the top of the listing. Thus we only need to reference the simple name of the class (i.e. NormalizeSpace.class) rather than its fully qualified equivalent (i.e. com.deltaxml.pipe.filters.NormalizeSpace.class) when creating a class array.

An array of classes is just like any other object array in Java but holds class objects (you may not have been aware of this facility in Java but it can be quite useful at times). In this case we pass the array of classes to either the setInputFilters method or the setOutputFilters method (depending on whether the array holds input or output filters). The filters are then applied in the order that they are defined within the class array. Thus the NormalizeSpace filter will be applied before the WordByWordInfilter.

In the above example we define the arrays of classes before we need to use them. We have done that for clarity; we could equally have used the rather more concise array initializer format, for example:

PipelinedComparator pc = new PipelinedComparator(); 
pc.setInputFilters(
       new Class[] {NormalizeSpace.class, WordByWordInfilter.class});

We now need two XML files to compare to illustrate running this program. We will use the following two XML files:

Example 3.1. Initial old.xml file

<AddressList> 
 <person> 
  <name>John Smith</name> 
  <street>10 Grays Inn Road</street> 
  <city>London</city> 
  <postcode>WC1X 8TX</postcode> 
 </person> 
</AddressList>

And

Example 3.2. Modified new.xml file

<AddressList> 
 <person> 
  <name>John Smith</name> 
  <street>12 Grays Inn Road</street> 
  <city>London</city> 
  <postcode>WC1X 8TX</postcode> 
 </person> 
</AddressList>

As you can see, the only difference between these two files is that the street number 10 has changed to 12. As we are using the Word-By-Word filters, we will be able to identify this change from within the street element's PCDATA.

To execute this program we can issue the following command from the Windows command line:

java -cp deltaxml.jar;saxon.jar;xercesImpl.jar;. PipelinedComparatorTest1

This assumes that you have the three jars provided with the DeltaXML distribution in your current working directory, along with the files old.xml and new.xml. If you are on a Unix platform you will need to modify this so that the separator used for the class path is ":" rather than ";".
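If the command is assembled by a program rather than typed by hand, the platform difference can be avoided with the standard File.pathSeparator constant. A minimal sketch, with a hypothetical helper class, using the jar names from the command above:

```java
import java.io.File;

final class ClasspathBuilder {
    /** Joins classpath entries with the separator of the current platform. */
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String cp = classpath("deltaxml.jar", "saxon.jar", "xercesImpl.jar", ".");
        System.out.println("java -cp " + cp + " PipelinedComparatorTest1");
    }
}
```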

The result of executing this program is presented below:

<?xml version="1.0" encoding="utf-8"?>
<AddressList 
     xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
     deltaxml:delta="WFmodify">
  <person deltaxml:delta="WFmodify">
     <name deltaxml:delta="unchanged"/>
     <street deltaxml:delta="WFmodify">
       <deltaxml:PCDATAmodify>
         <deltaxml:PCDATAold>
           10
         </deltaxml:PCDATAold>
         <deltaxml:PCDATAnew>
           12
         </deltaxml:PCDATAnew>
       </deltaxml:PCDATAmodify>
       Grays Inn Road
     </street>
     <city deltaxml:delta="unchanged"/>
     <postcode deltaxml:delta="unchanged"/>
   </person>
</AddressList>

That is all there is to running Java XML Filters with DeltaXML. In the next section we will look at how we can achieve exactly the same result using XSL filters and then move onto using a mixture of Java and XSL filters.

3.3. Using XSL Filters

In this example, we present the previous section's program again, but using XSL filters instead of Java XML Filters. This version of the program is presented below:

import java.io.File; 
import java.io.FileNotFoundException; 

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
 
public class PipelinedComparatorTest2 { 
  public static void main(String[] args) 
    throws PipelinedComparatorException,  
           FileNotFoundException 
  { 
    PipelinedComparator pc = new PipelinedComparator(); 
    // Set up the input filter 
    File [] inFilterFiles =  
        new File [] { new File("normalize-space.xsl"), 
                      new File("word-by-word-infilter.xsl")}; 
    pc.setInputFilters(inFilterFiles); 
    // Now setup the output filters 
    File [] outFilterFiles =  
        new File [] { new File("word-by-word-outfilter1.xsl"), 
            new File("word-by-word-outfilter2.xsl")}; 
    pc.setOutputFilters(outFilterFiles); 
    // Now run the DeltaXML comparison 
    pc.compare(new File("old.xml"),  
               new File("new.xml"),  
               new File("out.xml")); 
  } 
}
        

If you compare this program with that presented in the last section you will find they are very similar. The main difference is that instead of using an array of classes, we are now using an array of files. This makes sense, as the XSL filters are implemented in a number of XSL files which we need to pass to the PipelinedComparator. The only other difference is that the main method now declares FileNotFoundException, as the files may not be found at run time.

For completeness, when we run this XSL filter based program on the XML files old.xml and new.xml (presented in the last section) we obtain the following delta file:

<?xml version="1.0" encoding="utf-8"?>
<AddressList 
     xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
     deltaxml:delta="WFmodify">
  <person deltaxml:delta="WFmodify">
    <name deltaxml:delta="unchanged"/>
      <street deltaxml:delta="WFmodify">
        <deltaxml:PCDATAmodify>
          <deltaxml:PCDATAold>
            10
          </deltaxml:PCDATAold>
          <deltaxml:PCDATAnew>
            12
          </deltaxml:PCDATAnew>
        </deltaxml:PCDATAmodify> 
          Grays Inn Road
      </street>
    <city deltaxml:delta="unchanged"/>
    <postcode deltaxml:delta="unchanged"/>
  </person>
</AddressList>

If you compare this delta file with the one presented in the last section, you will find that they are exactly the same. Thus whether you use the Java or the XSL versions of the filters, you obtain the same result.

3.4. Mixing Java XML Filters and XSL filters

There is no reason why you must choose only Java or only XSL filters; you can mix the two. That is, you can provide a mixed list of Java classes and XSL stylesheets, in any order, for both input and output purposes. This is illustrated in the following program:

import java.io.File; 
import java.io.FileNotFoundException; 
import java.util.ArrayList; 
import java.util.List; 

import com.deltaxml.core.PipelinedComparator; 
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter; 
import com.deltaxml.pipe.filters.WordByWordOutfilter1; 
import com.deltaxml.pipe.filters.WordByWordOutfilter2; 
 
public class PipelinedComparatorTest3 { 
 
  public static void main(String[] args) 
    throws PipelinedComparatorException, 
           FileNotFoundException {
      PipelinedComparator pc= new PipelinedComparator(); 
      // Set up the input filter 
      List inFilters = new ArrayList(); 
      inFilters.add(new File("normalize-space.xsl")); 
      inFilters.add(WordByWordInfilter.class); 
      pc.setInputFilters(inFilters); 
      // Now setup the output filters 
      List outFilters= new ArrayList(); 
      outFilters.add(WordByWordOutfilter1.class); 
      outFilters.add(WordByWordOutfilter2.class); 
      outFilters.add(new File("deltaxml-tables.xsl")); 
      pc.setOutputFilters(outFilters); 
      // Now run the comparison 
      pc.compare(new File("old.xml"),  
                 new File("new.xml"),  
                 new File("out.html"));    
  } 
} 
        

In the above program (PipelinedComparatorTest3) we have used two input filters and three output filters. In each case one of the filters is implemented as an XSL script and the others are implemented as Java XML Filters. However, DeltaXML does not need to be concerned with the actual implementation; both approaches work as filters. Indeed, both the Normalize Space and the Word-By-Word filters are available as XSL files or as Java XML Filters. You could try changing the type used and comparing the results (they should be exactly the same).

Note that this time we have not used an array of classes or files, instead we are using a List object (in fact an ArrayList). This is because we have a mixed or heterogeneous list of object types that must be passed to PipelinedComparator. Internally, PipelinedComparator will handle the differences between the different objects in the lists and configure the pipeline in the appropriate manner.

Running this program generates the out.html file, which is presented below:

PIC-mixed-filter-output.gif
Delta file generated from heterogeneous filters

Note that in general the Java XML filters are faster and have lower memory overheads and are thus often preferable to their XSL equivalents.

3.5. Parameterized Filters

It is often useful to write filters (whether they are Java filters or XSL filters) that take parameters, so that the filter's behaviour can be changed depending on the value of those parameters. The com.deltaxml.core.ParameterizedFilter class is provided for this purpose. It can be used with either XSL filters or Java filters with little difference to the code used.

To create a ParameterizedFilter pass either a Class object (for Java filters), or a File, Templates or URL object (for XSL filters) to the constructor:

ParameterizedFilter filter1= new ParameterizedFilter(MyFilter.class);

or

ParameterizedFilter filter2= new ParameterizedFilter(new File("myFilter.xsl"));

Once created, a ParameterizedFilter object can then have parameters assigned to it using the setStringParameter method. This method takes two strings, one for the name of the parameter and one for its value. If the filter used to create the ParameterizedFilter is an XSL filter, it must include an <xsl:param/> element in the appropriate place with the name attribute having the same value as the name parameter to setStringParameter. If the filter is a Java filter, it must declare a method named setXXX where XXX is the same string as the name parameter passed to setStringParameter. The method declared on the Java filter must take a single string parameter.

filter1.setStringParameter("outputcomments", "true");

The above example will set the parameter 'outputcomments' to the value 'true'. To use the above with an XSL filter, it must contain the element <xsl:param name="outputcomments"/>. A Java filter must declare the method public void setoutputcomments(String value).
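
The naming rule on both sides can be illustrated without DeltaXML itself. The following is a minimal sketch using only the standard JDK: MyFilter, the inline stylesheet and the two helper methods are hypothetical, and it assumes that a Java filter is a SAX XMLFilterImpl subclass. The reflective lookup is only one plausible way to map a parameter name onto a setXXX method; it is not a description of how ParameterizedFilter works internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.lang.reflect.Method;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.XMLFilterImpl;

public class ParameterNamingSketch {

  /** Hypothetical Java filter: the setter is named "set" + the parameter name. */
  public static class MyFilter extends XMLFilterImpl {
    private boolean outputComments;

    // Takes a single String, as required for string parameters.
    public void setoutputcomments(String value) {
      outputComments = Boolean.parseBoolean(value);
    }

    public boolean isOutputComments() {
      return outputComments;
    }
  }

  /** Illustration only: look up setXXX(String) by name and call it. */
  static void setOnJavaFilter(Object filter, String name, String value)
      throws Exception {
    Method setter = filter.getClass().getMethod("set" + name, String.class);
    setter.invoke(filter, value);
  }

  /** Hypothetical stylesheet that declares the parameter and echoes its value. */
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:param name='outputcomments'/>"
    + "<xsl:template match='/'>"
    + "<xsl:value-of select='$outputcomments'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  /** Standard JAXP call: the name must match the xsl:param name attribute. */
  static String setOnXslFilter(String name, String value) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    t.setParameter(name, value);
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader("<doc/>")),
                new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    MyFilter filter = new MyFilter();
    setOnJavaFilter(filter, "outputcomments", "true");
    System.out.println(filter.isOutputComments());
    System.out.println(setOnXslFilter("outputcomments", "true"));
  }
}
```

Note that the parameter name is used verbatim in both cases, which is why the Java method is spelled setoutputcomments rather than setOutputComments.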

To use a ParameterizedFilter with the PipelinedComparator, you must use the List version of setInputFilters or setOutputFilters. Simply add the ParameterizedFilter to the List before calling one of these methods.

Using ParameterizedFilters can cause either the FilterParameterizationException or the FilterParameterizationNotSupportedException to be thrown. The first of these may be thrown when using Java filters, the second when using XSL filters. The exceptions are thrown by the setOutputFilters and setInputFilters methods. For more information on why the exceptions are thrown, see the API Javadoc.

Chapter 4. Java XML Filters

Those of you new to Java XML Filters should take a look at the tutorial we provide on writing such filters in "Guide to writing Java XML Filters for DeltaXML".
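
As a rough orientation before reading that guide, the sketch below shows the general shape such a filter might take. It assumes that a Java XML filter is a SAX XMLFilterImpl subclass, which this guide does not state explicitly; DropWhitespaceFilter and the filter helper method are hypothetical names, and the code uses only the standard JDK rather than the DeltaXML API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class WhitespaceFilterSketch {

  /** Hypothetical filter: drops character chunks that are only whitespace. */
  public static class DropWhitespaceFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      // A parser may deliver one text node in several chunks; this sketch
      // simply tests each chunk on its own.
      if (!new String(ch, start, length).trim().isEmpty()) {
        super.characters(ch, start, length);
      }
    }
  }

  /** Parses the XML through the filter and serializes the filtered events. */
  static String filter(String xml) throws Exception {
    SAXParserFactory spf = SAXParserFactory.newInstance();
    spf.setNamespaceAware(true);
    XMLReader reader = spf.newSAXParser().getXMLReader();

    DropWhitespaceFilter filter = new DropWhitespaceFilter();
    filter.setParent(reader); // the filter sits between parser and consumer

    // An identity transformer is used here only as a serializer.
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    identity.transform(
        new SAXSource(filter, new InputSource(new StringReader(xml))),
        new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(filter("<person>\n  <name>Ann</name>\n</person>"));
  }
}
```

The output is still the same document, minus the indentation noise, which matches the idea of a filter described at the start of this tutorial.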

Writing Java XML Filters is not a widely covered topic. References that might be of use include:

Chapter 5. XSL Filters

5.1. XSL Tutorials

Those new to XSL should undertake background study before using it with DeltaXML. There are many good books available, such as the XSLT Programmer's Reference by Michael Kay (author of Saxon). The world wide web offers a number of useful introductions, including those prepared by:

Some XSL tutorials focus on presentation issues such as (X)HTML creation. Keep in mind that DeltaXML typically involves XML-to-XML transformations, though presentation issues also have a place. If you produce web pages, we recommend you prefer XHTML over HTML as your data storage format.

5.2. XSL Software System Configuration

The Extensible Stylesheet Language (XSL) specifies transformations that can be made to XML data. XSL is a declarative language expressed in XML syntax. Many XML software packages offer XSL engines, which are what actually implement the language. XSL transformations can produce virtually any type of output, e.g. formatted reports or web pages, not just XML (hence the question mark in the following diagram).

PIC-xsl-dataflow.gif
Data flow of the Extensible Stylesheet Language (XSL)

At this point you may be wondering how XSL Transformations fit into the idea of filtering XML for use in DeltaXML pre and post processing tasks. The term filter is more appropriate here than transformation, as the aim of these XSL scripts is to produce an XML document as output that shares more similarities than differences with the input XML document. This is exactly the definition of a filter presented at the start of this tutorial.
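
To make this concrete, here is a minimal sketch of an XSL script that behaves as a filter in this sense: it copies the input unchanged except that the whitespace in text nodes is normalized. The stylesheet is a hypothetical illustration written for this example, not the normalize-space.xsl file shipped with DeltaXML, and it is run here through the standard JAXP API rather than through PipelinedComparator.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NormalizeSpaceSketch {

  /** Hypothetical XSL filter: identity copy plus whitespace normalization. */
  static final String XSL =
      "<xsl:stylesheet version='1.0'"
    + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
    // Copy every attribute and node through unchanged...
    + "<xsl:template match='@*|node()'>"
    + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
    + "</xsl:template>"
    // ...except text nodes, whose whitespace is normalized.
    + "<xsl:template match='text()' priority='1'>"
    + "<xsl:value-of select='normalize-space(.)'/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  /** Applies the stylesheet to an XML string and returns the result. */
  static String apply(String xml) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String in = "<person><street>  10   Grays Inn\n Road </street></person>";
    System.out.println(apply(in));
  }
}
```

The element structure of the output is identical to that of the input; only the spacing inside the street element differs, so a later comparison would not report whitespace noise as a change.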

A water filter removes impurities, but the result is still water; boiling transforms water into steam, a substance altogether different. XSL has an equally broad capability. For example, XSL filters can strip spurious differences to isolate those of semantic importance, allowing DeltaXML to detect real differences unencumbered. Filters may augment as well as subtract; just as a water filter might add flavoring, an XSL filter might add markup to XML data. Fitting XSL into the overall picture, we obtain a DeltaXML processing chain:

PIC-xsl-deltaxml-chain.gif
Typical DeltaXML XSL Filter processing chain

As this diagram shows, XSL operates on both input and output sides of DeltaXML; but XSL is not mandatory on either side. Technically, all that DeltaXML requires is well-formed XML. Nonetheless, this scenario is the most flexible way to use DeltaXML. In fact, more general configurations can involve multiple XSL filters.