Guide to using Filters with DeltaXML
Table of Contents
- 1. Filters
- 2. DeltaXML Filter Descriptions
- 2.1. Introduction
- 2.2. Input Filters / Prefilters
- 2.3. Output filters / Postfilters
- 2.4. Pipelined filters
- 3. Using Filters with PipelinedComparator
- 3.1. Introduction
- 3.2. Using Java XML Filters
- 3.3. Using XSL Filters
- 3.4. Mixing Java XML Filters and XSL filters
- 3.5. Parameterized Filters
- 4. Java XML Filters
- 5. XSL Filters
List of Examples
- 2.1. Original SVG describing three semi-transparent, overlapping circles
- 2.2. XSL-filtered SVG with DeltaXML annotations for orderless differencing
- 2.3. Raw XML Schema
- 2.4. Filtered XML Schema
- 2.5. Simple test file with PCDATA
- 2.6. Word-by-word XSL result
- 3.1. Initial old.xml file
- 3.2. Modified new.xml file
Chapter 1. Filters
1.1. Introduction
This tutorial explains how Java XML filters and XSL filters can be used to complement the DeltaXML Core API when comparing XML files. Such filters should be viewed as an integral part of DeltaXML processing. While not mandatory, they belong in the top drawer of your toolbox. Java XML filters and XSL scripts can perform a range of pre- and post-processing functions which greatly enhance the power of DeltaXML comparisons. For example, with pre-processing filters you can perform:
- data normalization,
- key insertion,
- white space removal,
- and word-by-word markup prior to comparison.
They can also be used to customize the display of DeltaXML output to make it easier for users or programs to identify and use changes.
XML filtering makes possible DeltaXML comparisons that would otherwise be impossible. XML filters can also shield data from DeltaXML annotations.
DeltaXML ships with several ready-made Java XML filters and XSL scripts. These do not exhaust all possibilities, but serve as useful teaching aids which will be used in this paper to explain how and why XSL filters should be used to complement DeltaXML.
1.2. What are filters?
A filter is a processing element that receives XML data from some other processing element, carries out some form of operation on that data, and then passes that data onto another processing element.
The primary aim of a filter is to produce an output XML document that shares more similarities than differences with the input XML document.
For example, consider the role of a water filter. The intention of such a filter is to remove impurities from the water, but the result is still water. This is similar to the role of a filter with respect to DeltaXML: you may use a filter to remove unwanted or noisy data from an XML document before presenting it to DeltaXML for comparison.
Thus it is normal to create a pipeline containing one or more filters that either pre-process or post-process the data compared by DeltaXML. This is illustrated in the following diagram:
[Figure: Using filters with DeltaXML]
In terms of implementation, there are two ways in which a filter may be implemented:
- as a Java XML Filter using the org.xml.sax.XMLFilter interface, which is part of the SAX API,
- as an XSL script defined, for example, within an XSL file. Note that although XSL is typically thought of as a transform-oriented language, it is also consistent to think of it as a language for processing XML.
If you wish to learn more about writing your own filters then see the appropriate tutorials.
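To make the first option concrete, here is a minimal sketch of a Java XML filter built only on JDK classes (org.xml.sax.helpers.XMLFilterImpl and JAXP). The class name DropNotesFilter and the note element it removes are hypothetical, chosen purely for illustration; this is not one of the filters shipped with DeltaXML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: drops every <note> element together with its content. */
class DropNotesFilter extends XMLFilterImpl {
    private int depth = 0; // greater than 0 while inside a dropped element

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || "note".equals(qName)) {
            depth++; // swallow the event
        } else {
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth > 0) {
            depth--;
        } else {
            super.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses the XML, routes the SAX events through the filter, and serializes the result. */
    static String apply(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            DropNotesFilter filter = new DropNotesFilter();
            filter.setParent(reader); // the filter sits between the parser and the consumer
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("<doc><note>noise</note><p>kept</p></doc>"));
    }
}
```

The filter receives SAX events from its parent, forwards the ones it wants to keep, and silently discards the rest, which is the "water filter" behaviour described above: the result is still XML, minus the noise.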
1.3. Usage Patterns
The conceivable applications of XML filters are endless, but guidelines and usage patterns common to DeltaXML merit attention.
1.3.1. Input Side
Input-side, or prefilters, typically perform two functions: normalization and attribute insertion. As a rule, the same prefilter should process both input files, unless they stem from sources so different as to mandate separate processing.
Normalization means enforced conformance to the "normal form" that applies in a given differencing application. This form varies from one application to another, but the idea is simple to grasp by example. Legacy data employing outdated element names could pass through a renaming filter, matching them to modern practice. Legacy elements no longer meaningful can be stripped. Differencing particular classes of elements is possible by stripping others out. The common theme of normalization is to eliminate uninteresting XML differences ahead of comparison.
Attribute insertion has primarily in view the deltaxml:ordered and
deltaxml:key attributes, but may involve others dictated by
application needs. DeltaXML detects attribute changes too, so legacy
considerations apply to them just as to elements. With respect to DeltaXML
attributes, XSL development follows a simple recipe:
- Expand the Document Type Definition (DTD) or Schema. If such a controlling specification exists, it reveals which XML elements require DeltaXML attributes. (If not, advance to the next step.) Expand each DTD/schema item fully to see its actual structure or you risk missing a repeating content particle.
- Identify repeating content particles using the results of the previous step or knowledge of the XML data.
- For each repeating content particle, determine whether its content is orderless, and if so, insert the attribute deltaxml:ordered="false". (MIXED and ANY content is always considered ordered by virtue of PCDATA.)
- For orderless content, determine suitable keys for individual elements, and assign them using the deltaxml:key attribute. (A separate paper discusses how to design keys.)
- For ordered content, determine if keys should be used, and if so, assign them. Keys are optional for ordered content, but often helpful to ensure matching of elements whose structural position has changed.
- Check the behavior of the script. XSL scripts need debugging like any other computer program.
In addition to the DeltaXML attributes, your XSL script(s) can insert, remove, or modify attributes specific to the application. It is always possible, and often advisable, to perform these operations in separate XSL scripts, factoring out DeltaXML-specific code.
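The attribute-insertion recipe above can be sketched as a small XSL prefilter driven from Java through JAXP. The element names people and person, the id attribute used as the key, and the class name KeyInsertionPrefilter are assumptions made for this illustration; only the deltaxml namespace URI and the two attribute names come from this guide.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyInsertionPrefilter {
    // Hypothetical stylesheet: an identity transform plus two overriding rules.
    // <people> is marked orderless; each <person> is keyed by its (assumed) id attribute.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + "    xmlns:deltaxml='http://www.deltaxml.com/ns/well-formed-delta-v1'>"
      + "  <xsl:template match='@*|node()'>"
      + "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "  </xsl:template>"
      + "  <xsl:template match='people'>"
      + "    <xsl:copy>"
      + "      <xsl:attribute name='deltaxml:ordered'>false</xsl:attribute>"
      + "      <xsl:apply-templates select='@*|node()'/>"
      + "    </xsl:copy>"
      + "  </xsl:template>"
      + "  <xsl:template match='people/person'>"
      + "    <xsl:copy>"
      + "      <xsl:attribute name='deltaxml:key'><xsl:value-of select='@id'/></xsl:attribute>"
      + "      <xsl:apply-templates select='@*|node()'/>"
      + "    </xsl:copy>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over the given XML and returns the annotated document. */
    static String apply(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "<people><person id='p2'><name>Bob</name></person>"
          + "<person id='p1'><name>Ann</name></person></people>"));
    }
}
```

Keeping the DeltaXML-specific rules in their own stylesheet, as here, follows the advice above about factoring DeltaXML code away from application-specific transformations.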
1.3.2. Output Side
DeltaXML offers a choice of output format: standard deltas and full context deltas. Standard deltas show only changes. In the absence of changes, this output is empty. Full context deltas show all data, with changes in situ. In the absence of changes, this output equals the corresponding DeltaXML inputs. Your XSL scripts may utilize either type of output. The proper choice depends on the application in question. A middle ground is also possible. When the full context is too much, XSL can be used to generate a partial context format. Select the full context option and use XSL to discard irrelevant portions of the full context.
Postfilters (or transforms) might produce more XML, a formatted report, a web page in XHTML, or output suitable for downstream processing. User visualization generally benefits from the full context option. Most kinds of automated processing should opt for standard deltas.
1.3.3. Pipelines
One common XML Filter configuration is the pipeline. A pipeline employs related pre- and postfilters. They work in tandem to achieve a required result. The term simply denotes a chain of operations. One useful pipeline is a mirror pipeline in which the postfilter reverses the prefilter's effects. Arranged back-to-back, these filters would accomplish nothing, but practical possibilities manifest when DeltaXML is used between them. One filter may create ephemeral data, such as keys, for DeltaXML's private use while the other removes them so that users do not see data only required to ensure accurate comparisons.
[Figure: Pipelining filters with DeltaXML]
An important point to note about the nature of pipelines is that, because intermediate data appears only when and where needed, it obeys a kind of encapsulation rule.
Data encapsulation confers several benefits. Suppose you require orderless or keyed comparisons. These comparisons necessitate the application of DeltaXML attributes. There are two ways to incorporate them. One is to write them directly into the XML files, for permanent storage. The other is to write them into XSL scripts. The latter option encapsulates DeltaXML attributes inside the scripts. The benefit is that, as underlying XML formats evolve, these attribute assignments need change only in the scripts, not everywhere the XML data is required. For example, if an element switches from ordered to orderless, a simple rule change in the XSL enacts the change with zero impact on extant data files. For the same reason, encapsulation means that new features can easily be added in future versions of DeltaXML.
1.3.4. Code re-use
DeltaXML has been designed to facilitate Java XML and XSL filter re-use. Its output format clones that of its input. Element names and hierarchy nesting are identical. Consequently any input filter serves as good boilerplate for another to manipulate DeltaXML on the output side.
1.4. Tools
There are essentially two XSL/DeltaXML usage modes: development and production. The distinction arises from issues such as manual versus automatic operation and low- versus high-volume processing. IDEs greatly help development mode, while production mode usually requires console tools. Production mode is often the ultimate goal, unless the processing is to be guided manually. Manual operation is satisfactory for smaller batches and one-of-a-kind processing. Otherwise it eventually yields to production mode.
1.4.1. Development Mode - Eclipse IDE
The Eclipse IDE is an ideal development platform. For example, you can step through XSL scripts just as you might step through code in a C++ debugger. Setting up Eclipse for XSL work involves three installations:
Java Runtime Engine: http://java.sun.com
Eclipse IDE: http://www.eclipse.org
SunBow plugins: http://radio.weblogs.com/0108489/
- If your system lacks a Java Runtime Engine, install one (bare JRE or full SDK).
- Download the Eclipse binaries for your platform and run the Eclipse installer.
- Launch Eclipse. Allow it to self-configure, then quit the program.
- Download the sunBow plug-ins.
- Unpack the archive and drag its contents into the Eclipse plugins subdirectory.
- Launch Eclipse again, and sunBow will be configured.
- Work through Eclipse and sunBow tutorials in their respective on-line help sections.
The sunBow suite offers two ways to execute XSL transformations from a simple
mouse context menu. After you have added the desired XML and XSL files to your
project, select both files in the Navigator view, then right-click on them to
obtain the mouse context menu. The two options are XSL Trace...
and XSL Transformer....
1.4.2. Production Mode
Two good choices for production mode are Apache Xalan and Saxon. Here are brief instructions on their installation.
1.4.2.1. Apache Xalan
Java Runtime Engine: http://java.sun.com
Xalan: http://xml.apache.org/xalan-j
- If your system lacks a Java Runtime Engine, install one (bare JRE™ or full SDK™).
- Download and unpack the current Xalan distribution. This distribution provides xalan.jar (and, if you wish to use Xerces, xercesImpl.jar and xml-apis.jar).
- Invoke Xalan against a DeltaXML output file, for example:
java -cp .../xalan.jar:.../xml-apis.jar:.../xercesImpl.jar org.apache.xalan.xslt.Process -xsl deltaxml-tables.xsl -in deltafile.xml -out pretty-delta.html
1.4.2.2. Saxon
Java Runtime Engine: http://java.sun.com
Saxon: http://saxon.sourceforge.net
- If your system lacks a Java Runtime Engine, install one (bare JRE or full SDK).
- Download and install Saxon per the instructions.
- Invoke Saxon against a DeltaXML output file, for example:
saxon compare-1and2.xml deltaxml-tables.xsl > display-1and2.html
Chapter 2. DeltaXML Filter Descriptions
2.1. Introduction
This section provides short summaries of the filters supplied with the DeltaXML distribution. There are two types of filter provided: Java XML Filters and XSL stylesheet filters. The XSL definitions are provided as XSL files within the samples directory of the distribution. The Java XML Filters are provided as part of the public source.
The Java classes and the XSL files both incorporate comments elaborating their particular implementation details.
These files may serve as starting points for your own Java and XSL development. They represent a sampling of what is possible.
Note that in general, we have found that the Java classes have lower memory overheads and offer faster execution times and so are generally preferred to the XSL versions. Also please note that all the Java filters are also available as XSL filters (although in some cases there is no Java equivalent for a provided XSL filter).
2.2. Input Filters / Prefilters
Input filters (or prefilters) are filters used to pre-process the XML before it is presented to DeltaXML.
2.2.1. Normalize Space Filter
This prefilter reduces each sequence of whitespace characters to a single space. Whitespace has semantic relevance only within PCDATA, so it can be consolidated harmlessly everywhere else. This filter preserves leading and trailing spaces within PCDATA as required by XHTML applications of DeltaXML.
This filter is available both as an XSL file (normalize-space.xsl) and as a Java XML Filter class (com.deltaxml.pipe.filters.NormalizeSpace).
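The whitespace rule described above can be illustrated on a single PCDATA string. The following is only a sketch of the rule itself, under the assumption that "whitespace" means the four XML whitespace characters; it is not the implementation of NormalizeSpace, and the class and method names are hypothetical.

```java
class WhitespaceRule {
    /**
     * Collapses each run of XML whitespace (space, tab, CR, LF) to one space.
     * Leading and trailing runs are reduced to a single space, not trimmed.
     */
    static String collapse(String pcdata) {
        return pcdata.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        // Brackets make the preserved leading and trailing spaces visible.
        System.out.println("[" + collapse("\n   This is\ta   test.\n") + "]");
        // prints "[ This is a test. ]"
    }
}
```

Because the leading and trailing runs survive as single spaces, adjacent inline content is not accidentally fused, which matters for XHTML-style mixed content.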
2.2.2. svg-input-filter.xsl
Scalable Vector Graphics is an XML format describing two-dimensional vector graphics. SVG is an ordered format: objects are rendered in the order listed. Yet this order matters only in case of overlaps, and even then, transparency may nullify the graphical effect. Without overlaps, the rendering order does not matter. For this reason most SVG files may be considered orderless from a DeltaXML standpoint.
The svg-input-filter.xsl script inserts annotations to enable orderless SVG
differencing. The following SVG file describes three semi-transparent,
overlapping circles:
Example 2.1. Original SVG describing three semi-transparent, overlapping circles
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg">
<g style="fill-opacity:0.7; stroke:black; stroke-width:0.1cm;">
<circle cx="6cm" cy="2cm" r="100" style="fill:red;"
transform="translate(0,50)" />
<circle cx="6cm" cy="2cm" r="100" style="fill:blue;"
transform="translate(70,150)" />
<circle cx="6cm" cy="2cm" r="100" style="fill:green;"
transform="translate(-70,150)" />
</g>
</svg>
The XSL-filtered result includes DeltaXML annotations for orderless comparison:
Example 2.2. XSL-filtered SVG with DeltaXML annotations for orderless differencing
<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg"
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:ordered="false">
<g deltaxml:ordered="false"
style="fill-opacity:0.7; stroke:black; stroke-width:0.1cm;">
<circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100"
style="fill:red;" transform="translate(0,50)" />
<circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100"
style="fill:blue;" transform="translate(70,150)" />
<circle deltaxml:ordered="false" cx="6cm" cy="2cm" r="100"
style="fill:green;" transform="translate(-70,150)" />
</g>
</svg>
Note that at present there is no Java equivalent to this filter.
2.2.3. Schema-input-filter.xsl
Schemas are valid XML files in their own right, and may be differenced with each other. This input-side XSL script facilitates schema comparisons. (The 1999 version assumes the XML Schema 1999 definition.) It is useful for tracking schema changes over time, for example, over the course of schema development. Schema differencing can also be useful in comparing and consolidating schemas designed for similar purposes.
While the XSL derivation of this filter is rather involved, its action is
simple. The filter inserts a set of deltaxml:key attributes to
enable orderless comparison of element definitions. Suppose we have a short
schema:
Example 2.3. Raw XML Schema
<?xml version='1.0'?>
<schema xmlns='http://www.w3.org/2001/XMLSchema'>
<element name='test1'>
<complexType>
<all>
<annotation>
<documentation>
Some documentation
</documentation>
</annotation>
<element ref='A' minOccurs='1' maxOccurs='1'/>
<element ref='B' minOccurs='1' maxOccurs='1'/>
<element ref='C' minOccurs='1' maxOccurs='1'/>
</all>
</complexType>
</element>
<element name='test2'>
<complexType>
<sequence>
<element ref='A'/>
<element ref='B'/>
<element ref='C'/>
</sequence>
</complexType>
</element>
<element name='test3'>
<complexType>
<choice>
<element ref='A' />
<element ref='B' />
<element ref='C' />
</choice>
</complexType>
</element>
</schema>
The result of XSL processing is:
Example 2.4. Filtered XML Schema
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:ordered="false">
<element deltaxml:ordered="false" deltaxml:key="test1" name="test1">
<complexType deltaxml:ordered="false" deltaxml:key="single">
<all deltaxml:ordered="false" deltaxml:key="single">
<annotation deltaxml:key="single">
<documentation>
Some documentation
</documentation>
</annotation>
<element deltaxml:key="A" ref="A"
minOccurs="1" maxOccurs="1" />
<element deltaxml:key="B" ref="B"
minOccurs="1" maxOccurs="1" />
<element deltaxml:key="C" ref="C"
minOccurs="1" maxOccurs="1" />
</all>
</complexType>
</element>
<element deltaxml:ordered="false" deltaxml:key="test2" name="test2">
<complexType deltaxml:ordered="false" deltaxml:key="single">
<sequence deltaxml:key="single">
<element ref="A" />
<element ref="B" />
<element ref="C" />
</sequence>
</complexType>
</element>
<element deltaxml:ordered="false" deltaxml:key="test3" name="test3">
<complexType deltaxml:ordered="false" deltaxml:key="single">
<choice deltaxml:ordered="false" deltaxml:key="single">
<element deltaxml:key="A" ref="A" />
<element deltaxml:key="B" ref="B" />
<element deltaxml:key="C" ref="C" />
</choice>
</complexType>
</element>
</schema>
Now the schema is ready for DeltaXML comparisons with previous or later versions (which must also run through the filter).
Schemas are XML that describe external XML. Reasoning about their differences
exercises a few more gray cells than normal XML. The schema filter's operations
are described by comments in the XSL file. The salient aspects are manifest in
the output above. Note the use of element names as keys, and the use of the
"single" key. This key enables DeltaXML to correlate schema
elements that may appear only once, but not necessarily always in the same
place. The "single" key sometimes appears superfluous, but this is
only because the filter applies it to all matching templates indiscriminately.
This conservative design simplifies the logic of the XSL stylesheet.
Document Type Definitions are an older alternative to schemas, but are not encoded in XML. Therefore DeltaXML cannot difference DTDs. DeltaXML can difference the XML files controlled by a DTD, or indeed any well-formed XML at all, whether tied to a schema, a DTD, or free-standing.
Note that at present there is no Java equivalent for the XSL filter.
2.3. Output filters / Postfilters
An output filter (or postfilter) is a filter that processes the XML data produced by the DeltaXML engine. For example, it may post-process the DeltaXML delta file into XHTML for presentation within a browser.
2.3.1. deltaxml-folding-html.xsl
If you are working with normal delta output, use the deltaxml-tables.xsl filter to create XHTML tables that can be viewed within a browser. This filter is particularly useful because it presents the delta file information in an easily digestible format.
An example of the output generated by this XSL file, viewed in a browser, is presented below:
[Figure: browser view of the generated XHTML output]
Note that at present there is no Java equivalent for this filter.
2.3.2. Merge Scripts
A primary application for change detection is merging XML data from multiple versions of a file. DeltaXML can perform two-way and three-way merges. (The three-way operation is not available prior to version 3.0.)
Two-way merging unifies data from two input files. These files might be closely related or completely different. It is supported by the deltaxml-merge.xsl file. This XSL file converts a full delta file, produced by DeltaXML programs, into a merged file. The merged file will have all the elements and attributes from both input files. In cases where this is not desired, the stylesheet can be modified. PCDATA is written out with a delimiter to show old and new data, and again this can be changed as needed in specific circumstances. Note that this is a simple merge of two files.
Three-way merging unifies two sets of changes that have branched from a shared base file and is much more complex than a two-way merge. Changes for each branch are reconciled and merged. Note that three-way merging involves more than a simple XSL script.
A separate paper covers merging operations in more detail.
2.4. Pipelined filters
Pipelined filters are explicitly designed to receive XML data as input and to generate XML data as output, to be fed into either another filter or DeltaXML. They differ from postfilters in that a postfilter may generate an output format that is not compatible with JAXP pipelines. In contrast, these filters must work as components within a JAXP pipeline, which implies that they must work within the mechanisms used to trigger JAXP pipelines.
2.4.1. Word-by-word
PCDATA is not structured XML. DeltaXML can detect when one PCDATA block differs from another, but does not resolve individual changes within the blocks. DeltaXML records only that the block has changed. Sometimes one needs to identify changes within PCDATA. DeltaXML is supplied with filters that convert PCDATA blocks into structured XML so that individual, "word-by-word" changes become detectable, and then convert the output back to PCDATA. The pipeline consists of three filters around the DeltaXML comparison, used in the following order:
- com.deltaxml.pipe.filters.WordByWordInfilter or word-by-word-infilter.xsl,
- DeltaXML,
- com.deltaxml.pipe.filters.WordByWordOutfilter1 or word-by-word-outfilter1.xsl,
- com.deltaxml.pipe.filters.WordByWordOutfilter2 or word-by-word-outfilter2.xsl.
Here is a simple example of the input-side filter at work:
Example 2.5. Simple test file with PCDATA
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
<TestParagraph>
This is a test.
</TestParagraph>
</Document>
We can then process this using the word-by-word-infilter.xsl and we obtain:
Example 2.6. Word-by-word XSL result
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
<TestParagraph>
<deltaxml:space />
This
<deltaxml:space />
is
<deltaxml:space />
a
<deltaxml:space />
test.
<deltaxml:space />
</TestParagraph>
</Document>
The output-side filters complete this pipeline by reversing the effects shown, i.e., stripping the inserted annotations and consolidating adjacent changes. After performing DeltaXML comparison this is typically the desired behavior. Note that the second XSL output filter is highly recursive and appears to run much faster in Saxon than Xalan. For larger files Saxon may be preferred. However, it is worth considering the Java filters before trying anything more drastic.
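The splitting step on the input side can be pictured with a small string-level sketch. It only mirrors the token sequence visible in Example 2.6 and assumes the text has already been whitespace-normalized; the real filters operate on XML events and emit deltaxml:space elements, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class WordSplitter {
    /** Marker standing in for the deltaxml:space element seen in Example 2.6. */
    static final String SPACE = "<deltaxml:space/>";

    /** Splits normalized PCDATA into words, replacing each space by a marker token. */
    static List<String> split(String pcdata) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : pcdata.toCharArray()) {
            if (c == ' ') {
                if (word.length() > 0) {      // finish the word collected so far
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                tokens.add(SPACE);
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {              // trailing word without a following space
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : split(" This is a test. ")) {
            System.out.println(token);
        }
    }
}
```

Once each word is a separate item, a comparison can report that a single word changed instead of flagging the whole PCDATA block.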
2.4.2. DocBook change bar generation
DeltaXML provides an as-is XSL stylesheet that takes the delta generated from comparing two DocBook files and generates a merged file with revision information included. This can then be processed by the DocBook style sheets provided by Norman Walsh to generate HTML pages with change information presented visually.
Two filters are provided by DeltaXML for this process: an input filter,
docbook-infilter.xsl, and an output filter,
docbook-outfilter.xsl.
For the change bars style sheet see DocBook XSL Stylesheets: http://docbook.sourceforge.net/projects/xsl/
2.4.3. XHTML
When differencing XHTML, it is often convenient to inspect changes visually.
The xhtml-infilter.xsl stylesheet can be used to convert XHTML
unique identifiers and metadata element names into DeltaXML keys to enhance
comparisons. It also normalizes spaces in attribute values, etc., to ensure the
closest possible matches.
The xhtml-outfilter.xsl stylesheet produces XHTML showing the
differences in situ. The word-by-word pipeline may be used in conjunction for a
combined effect.
2.4.4. Clean House Filter
It is always preferable to maintain clean separation between DeltaXML and
permanent XML storage files. The
com.deltaxml.pipe.filters.CleanHouse and
clean-house.xsl filters remove DeltaXML attributes. Using either
of them directly against DeltaXML delta files is senseless, as that destroys
information content. Rather, either filter should be used to purge XML
which is derived from DeltaXML output. The XSL logic in this filter is easily
integrated into other XSL scripts (and is indeed part of the merge script). Use
one of them as a final processing step to catch any DeltaXML elements or
attributes that have not previously been replaced.
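As an illustration of the attribute-stripping idea, the sketch below removes every attribute in the deltaxml namespace using a JDK-only SAX filter. It is not the CleanHouse class itself: the name StripDeltaAttributes is hypothetical, elements in the deltaxml namespace are not handled, and the namespace declaration is left in place.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: removes attributes that belong to the deltaxml namespace. */
class StripDeltaAttributes extends XMLFilterImpl {
    static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            // Copy across only the attributes outside the deltaxml namespace.
            if (!DELTA_NS.equals(atts.getURI(i))) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i),
                                  atts.getQName(i), atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, local, qName, kept);
    }

    /** Parses the XML with namespaces enabled, applies the filter, and serializes the result. */
    static String apply(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true); // required so that getURI() reports the namespace
            XMLReader reader = spf.newSAXParser().getXMLReader();
            StripDeltaAttributes filter = new StripDeltaAttributes();
            filter.setParent(reader);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(
            "<list xmlns:deltaxml='" + DELTA_NS + "' deltaxml:ordered='false'>"
          + "<item deltaxml:key='k1' id='1'/></list>"));
    }
}
```

Matching on the namespace URI instead of the deltaxml prefix keeps the filter correct even if a document binds the namespace to a different prefix.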
Chapter 3. Using Filters with PipelinedComparator
3.1. Introduction
Using filters within a pipeline is a fundamental architectural principle for DeltaXML.
Building such a pipeline the first time can be daunting. However, once your first pipeline has been built, the application of such pipelining techniques can be seen to be both repetitive and somewhat verbose.
To overcome this, DeltaXML has provided the PipelinedComparator
class in the com.deltaxml.core package. This greatly simplifies the
job of creating an XML processing pipeline. Indeed, it makes the job extremely
straightforward and improves the clarity of your code.
Filters are used both before and after execution of
DeltaXML, and thus PipelinedComparator allows you to define one or
more input filters and one or more output filters. This is done using
simple-to-use methods such as setOutputFilters() and
setInputFilters().
These methods are overloaded such that they can take either a set of Java XML Filter classes, a list of XSL files, a set of templates, a set of URLs or a mixture of all of these, making it easy to construct a pipeline implemented by a mixture of Java, XSL, templates, etc.
It is also possible to set parser properties and features as well as
comparator properties and features using the PipelinedComparator.
We will look at all of these in the following sections.
3.2. Using Java XML Filters
The com.deltaxml.core.PipelinedComparator class allows a list of
input and output filters to be specified and then a comparison performed. If the
filters are implemented in Java, as
com.deltaxml.pipe.filters.NormalizeSpace is, then the class object
is passed to the PipelinedComparator.
A class object can be obtained in a number of ways, for example, by using the
class Class and the method forName:
Class.forName("com.deltaxml.pipe.filters.NormalizeSpace");
or by using the .class extension on the name:
com.deltaxml.pipe.filters.NormalizeSpace.class
This is illustrated in the sample programs presented in this chapter.
As an example consider the following program. This simple program uses only Java XML Filters. It carries out a common DeltaXML pipeline. That is, it normalises the data, and then applies the Word-By-Word filters around the actual DeltaXML comparison.
import java.io.File;

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.NormalizeSpace;
import com.deltaxml.pipe.filters.WordByWordInfilter;
import com.deltaxml.pipe.filters.WordByWordOutfilter1;
import com.deltaxml.pipe.filters.WordByWordOutfilter2;

public class PipelinedComparatorTest1 {
    public static void main(String[] args)
        throws PipelinedComparatorException
    {
        PipelinedComparator pc = new PipelinedComparator();

        // Set up the input filters (applied in array order)
        Class[] inFilterClasses =
            new Class[] { NormalizeSpace.class,
                          WordByWordInfilter.class };
        pc.setInputFilters(inFilterClasses);

        // Now set up the output filters
        Class[] outFilterClasses =
            new Class[] { WordByWordOutfilter1.class,
                          WordByWordOutfilter2.class };
        pc.setOutputFilters(outFilterClasses);

        // Now run the DeltaXML comparison
        pc.compare(new File("old.xml"),
                   new File("new.xml"),
                   new File("out.xml"));
    }
}
As you can see from the above program, setting up a pipeline using
PipelinedComparator is extremely easy and much simpler than if you
had to construct the pipeline yourself.
Note that in this example, we are importing the classes implementing the Java
XML Filters that we will use at the top of the listing. Thus we only need to
reference the name of the class (i.e. NormalizeSpace.class) rather
than its fully qualified equivalent (i.e.
com.deltaxml.pipe.filters.NormalizeSpace.class) when creating a class
array.
An array of classes is just like any other object array in Java but holds
class objects (you may not have been aware of this facility in Java but it can
be quite useful at times). In this case we pass the array of classes to either
the setInputFilters method or the setOutputFilters
method (depending on whether the array holds input or output filters). The
filters are then applied in the order that they are defined within the class
array. Thus the NormalizeSpace filter will be applied before the
WordByWordInfilter.
In the above example we define the arrays of classes before we use them. We have done that for clarity here, but we could equally have used the more concise array initializer format, for example:
PipelinedComparator pc = new PipelinedComparator();
pc.setInputFilters(
    new Class[] {NormalizeSpace.class, WordByWordInfilter.class});
We now need two XML files to compare to illustrate running this program. We will use the following two XML files:
Example 3.1. Initial old.xml file
<AddressList>
  <person>
    <name>John Smith</name>
    <street>10 Grays Inn Road</street>
    <city>London</city>
    <postcode>WC1X 8TX</postcode>
  </person>
</AddressList>
And
Example 3.2. Modified new.xml file
<AddressList>
  <person>
    <name>John Smith</name>
    <street>12 Grays Inn Road</street>
    <city>London</city>
    <postcode>WC1X 8TX</postcode>
  </person>
</AddressList>
As you can see, the only difference between these two files is that the street number 10 has changed to 12. As we are using the Word-By-Word filters we will be able to identify this change from within the street element's PCDATA.
To execute this program we can issue the following command from the Windows command line:
java -cp deltaxml.jar;saxon.jar;xercesImpl.jar;. PipelinedComparatorTest1
This assumes that you have the three jars provided with the DeltaXML distribution in your current working directory, along with the files old.xml and new.xml. If you are on a Unix platform you will need to modify this so that the class path separator is ":".
The result of executing this program is presented below:
<?xml version="1.0" encoding="utf-8"?>
<AddressList
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:delta="WFmodify">
<person deltaxml:delta="WFmodify">
<name deltaxml:delta="unchanged"/>
<street deltaxml:delta="WFmodify">
<deltaxml:PCDATAmodify>
<deltaxml:PCDATAold>
10
</deltaxml:PCDATAold>
<deltaxml:PCDATAnew>
12
</deltaxml:PCDATAnew>
</deltaxml:PCDATAmodify>
Grays Inn Road
</street>
<city deltaxml:delta="unchanged"/>
<postcode deltaxml:delta="unchanged"/>
</person>
</AddressList>
That is all there is to running Java XML Filters with DeltaXML. In the next section we will look at how we can achieve exactly the same result using XSL filters and then move onto using a mixture of Java and XSL filters.
3.3. Using XSL Filters
In this example, we present the previous section's program again, but using XSL filters instead of Java XML Filters. This version of the program is presented below:
import java.io.File;
import java.io.FileNotFoundException;

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;

public class PipelinedComparatorTest2 {
    public static void main(String[] args)
        throws PipelinedComparatorException,
               FileNotFoundException
    {
        PipelinedComparator pc = new PipelinedComparator();

        // Set up the input filters (applied in array order)
        File[] inFilterFiles =
            new File[] { new File("normalize-space.xsl"),
                         new File("word-by-word-infilter.xsl") };
        pc.setInputFilters(inFilterFiles);

        // Now set up the output filters
        File[] outFilterFiles =
            new File[] { new File("word-by-word-outfilter1.xsl"),
                         new File("word-by-word-outfilter2.xsl") };
        pc.setOutputFilters(outFilterFiles);

        // Now run the DeltaXML comparison
        pc.compare(new File("old.xml"),
                   new File("new.xml"),
                   new File("out.xml"));
    }
}
If you compare this program with the one presented in the last section, you will
find they are very similar. The main difference is that instead of using an
array of classes, we are now using an array of files. This makes sense, as the XSL
filters are implemented in a number of XSL files which we need to pass to the
PipelinedComparator. The only other difference is that the main
method now declares that it throws FileNotFoundException, as the files may not
be found at run time.
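Because a missing stylesheet only shows up at run time, it can help to check the filter files before building the pipeline. The following is a minimal sketch using only the JDK; the class FilterFileCheck and the helper requireFiles are hypothetical and are not part of the DeltaXML API.

```java
import java.io.File;
import java.io.FileNotFoundException;

class FilterFileCheck {
    /**
     * Converts file names to File objects, failing early with a clear
     * message if any of them is missing or is not a regular file.
     */
    static File[] requireFiles(String... names) throws FileNotFoundException {
        File[] files = new File[names.length];
        for (int i = 0; i < names.length; i++) {
            files[i] = new File(names[i]);
            if (!files[i].isFile()) {
                throw new FileNotFoundException(
                        "XSL filter not found: " + files[i].getAbsolutePath());
            }
        }
        return files;
    }

    public static void main(String[] args) {
        try {
            File[] inFilterFiles =
                    requireFiles("normalize-space.xsl", "word-by-word-infilter.xsl");
            // The checked array could now be handed to setInputFilters.
            System.out.println("Found " + inFilterFiles.length + " input filters");
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned array has the same shape as inFilterFiles in the program above, so the check can sit directly in front of the call that sets the input filters.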
For completeness, running this XSL-filter-based program on the XML files old.xml and new.xml (presented in the last section) produces the following delta file:
<?xml version="1.0" encoding="utf-8"?>
<AddressList
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:delta="WFmodify">
<person deltaxml:delta="WFmodify">
<name deltaxml:delta="unchanged"/>
<street deltaxml:delta="WFmodify">
<deltaxml:PCDATAmodify>
<deltaxml:PCDATAold>
10
</deltaxml:PCDATAold>
<deltaxml:PCDATAnew>
12
</deltaxml:PCDATAnew>
</deltaxml:PCDATAmodify>
Grays Inn Road
</street>
<city deltaxml:delta="unchanged"/>
<postcode deltaxml:delta="unchanged"/>
</person>
</AddressList>
If you compare this delta file with the one presented in the last section, you will find that they are exactly the same. Thus, whether you use the Java or the XSL versions of the filters, you obtain the same result.
3.4. Mixing Java XML Filters and XSL filters
There is no reason why you must choose only Java or only XSL filters; you can mix the two. That is, you can provide a mixed list of Java classes and XSL stylesheets, in any order, for both input and output purposes. This is illustrated in the following program:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter;
import com.deltaxml.pipe.filters.WordByWordOutfilter1;
import com.deltaxml.pipe.filters.WordByWordOutfilter2;

public class PipelinedComparatorTest3 {

    public static void main(String[] args)
            throws PipelinedComparatorException,
                   FileNotFoundException {
        PipelinedComparator pc = new PipelinedComparator();

        // Set up the input filters
        List inFilters = new ArrayList();
        inFilters.add(new File("normalize-space.xsl"));
        inFilters.add(WordByWordInfilter.class);
        pc.setInputFilters(inFilters);

        // Now set up the output filters
        List outFilters = new ArrayList();
        outFilters.add(WordByWordOutfilter1.class);
        outFilters.add(WordByWordOutfilter2.class);
        outFilters.add(new File("deltaxml-tables.xsl"));
        pc.setOutputFilters(outFilters);

        // Now run the comparison
        pc.compare(new File("old.xml"),
                   new File("new.xml"),
                   new File("out.html"));
    }
}
In the above program (PipelinedComparatorTest3) we have used two
input filters and three output filters. In each case one of the filters is
implemented as an XSL script and the others are implemented as Java
XML Filters. However, DeltaXML does not need to be concerned with the
actual implementation; both approaches work as filters. Indeed, both the
Normalize Space and the Word-By-Word filters are available as XSL files or as
Java XML Filters. You could try changing the type used and checking the results
(they should be exactly the same).
Note that this time we have not used an array of classes or files. Instead we
are using a List object (in fact an ArrayList). This
is because we have a mixed, or heterogeneous, list of object types that must be
passed to PipelinedComparator. Internally,
PipelinedComparator will handle the differences between the
different objects in the lists and configure the pipeline in the appropriate
manner.
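To make the idea of a heterogeneous list concrete, here is a minimal sketch of how code receiving such a list could tell the two kinds of entry apart. It is an illustration only and does not describe how PipelinedComparator is actually implemented; the class MixedFilterListDemo and the method describe are hypothetical, and the JDK's XMLFilterImpl stands in for a real Java XML filter class.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.helpers.XMLFilterImpl;

class MixedFilterListDemo {
    /** Labels each list entry by its kind: a Java filter class or an XSL file. */
    static List<String> describe(List<?> filters) {
        List<String> steps = new ArrayList<>();
        for (Object f : filters) {
            if (f instanceof Class) {
                steps.add("java:" + ((Class<?>) f).getSimpleName());
            } else if (f instanceof File) {
                steps.add("xsl:" + ((File) f).getName());
            } else {
                throw new IllegalArgumentException("Unsupported filter type: " + f);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Object> inFilters = new ArrayList<>();
        inFilters.add(new File("normalize-space.xsl"));
        inFilters.add(XMLFilterImpl.class); // stand-in for a Java XML filter class
        System.out.println(describe(inFilters));
    }
}
```

The order of the list is preserved, which matters because filters in a pipeline run in the order they are listed.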
Running this program generates the file out.html, which is presented below:
[Figure: Delta file generated from heterogeneous filters]
Note that in general the Java XML filters are faster and have lower memory overheads and are thus often preferable to their XSL equivalents.
3.5. Parameterized Filters
It is often useful to write filters (whether they are Java filters or XSL
filters) that take parameters, so that the filter's behaviour can be changed
depending on the value of those parameters. The
com.deltaxml.core.ParameterizedFilter class is provided for this
purpose. It can be used with either XSL filters or Java filters with little
difference to the code used.
To create a ParameterizedFilter pass either a Class
object (for Java filters), or a File, Templates or
URL object (for XSL filters) to the constructor:
ParameterizedFilter filter1= new ParameterizedFilter(MyFilter.class);
or
ParameterizedFilter filter2= new ParameterizedFilter(new File("myFilter.xsl"));
Once created, a ParameterizedFilter object can then have
parameters assigned to it using the setStringParameter method. This
method takes two strings, one for the name of the parameter and one for its
value. If the filter used to create the ParameterizedFilter is an
XSL filter, it must include an <xsl:param/> element in the appropriate
place with the name attribute having the same value as the name parameter to
setStringParameter. If the filter is a Java filter, it must declare
a method named setXXX where XXX is the same string as the name
parameter passed to setStringParameter. The method declared on the
Java filter must take a single string parameter.
filter1.setStringParameter("outputcomments", "true");
The above example will set the parameter 'outputcomments' to the value
'true'. To use the above with an XSL filter, it must contain the element
<xsl:param name="outputcomments"/>. A Java filter must
declare the method public void setoutputcomments(String value).
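The naming rule on both sides can be tried out with the JDK alone. The sketch below is an assumption-laden illustration, not DeltaXML's implementation: MyFilter is a hypothetical Java filter built on the standard XMLFilterImpl, the reflective lookup shows one way a parameter name could be mapped to a setXXX method, and the inline stylesheet is a made-up example whose xsl:param is set through the standard JAXP Transformer.setParameter call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.lang.reflect.Method;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.XMLFilterImpl;

class ParameterNamingDemo {
    /** Hypothetical Java filter: the setter is named "set" + parameter name. */
    public static class MyFilter extends XMLFilterImpl {
        private String outputComments = "false";

        public void setoutputcomments(String value) {
            this.outputComments = value;
        }

        public String getOutputComments() {
            return outputComments;
        }
    }

    /** Finds and calls set<name>(String) on the filter via reflection. */
    static void applyToJavaFilter(Object filter, String name, String value)
            throws Exception {
        Method setter = filter.getClass().getMethod("set" + name, String.class);
        setter.invoke(filter, value);
    }

    /** Runs a tiny stylesheet that echoes the value of its declared parameter. */
    static String applyToXslFilter(String name, String value) throws Exception {
        String xsl =
              "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:param name='outputcomments'/>"
            + "<xsl:template match='/'>"
            + "<result><xsl:value-of select='$outputcomments'/></result>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setParameter(name, value);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<doc/>")),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        MyFilter filter = new MyFilter();
        applyToJavaFilter(filter, "outputcomments", "true");
        System.out.println("Java filter value: " + filter.getOutputComments());
        System.out.println("XSL filter output: "
                + applyToXslFilter("outputcomments", "true"));
    }
}
```

In both cases the string "outputcomments" must match exactly, including case, which is why the Java setter is spelled setoutputcomments rather than setOutputComments.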
To use a ParameterizedFilter with the
PipelinedComparator, you must use the List version of
setInputFilters or setOutputFilters. Simply add the
ParameterizedFilter to the List before calling one of
these methods.
Using ParameterizedFilters can cause either the
FilterParameterizationException or the
FilterParameterizationNotSupportedException to be thrown. The first
of these may be thrown when using Java filters, the second when using XSL
filters. The exceptions are thrown by the setOutputFilters and
setInputFilters methods. For more information on why the exceptions
are thrown, see the API Javadoc.
Chapter 4. Java XML Filters
Those of you new to Java XML Filters should take a look at the tutorial we provide on writing such filters in "Guide to writing Java XML Filters for DeltaXML".
Writing Java XML Filters is not a widely covered topic. References that might be of use include:
-
Java Lobby Forum http://www.javalobby.org/java/forums/t17133.html
-
Elliotte Rusty Harold, Processing XML with Java (book extract): http://www.cafeconleche.org/books/xmljava/chapters/ch08.html
Chapter 5. XSL Filters
5.1. XSL Tutorials
Those new to XSL should undertake background study before using it with DeltaXML. There are many good books available, such as the XSLT Programmer's Reference by Michael Kay (author of Saxon). The world wide web offers a number of useful introductions, including those prepared by:
-
Miloslav Nic: http://www.zvon.org
-
Walsh and Grosso: http://nwalsh.com/docs/tutorials/xsl/
-
Roger Costello: http://www.xfront.com/xsl.html
-
W3 Schools: http://www.w3schools.com/xsl/
-
References: http://www.xslt.com/resources_tutorials.htm
Some XSL tutorials focus on presentation issues such as (X)HTML creation. Keep in mind that DeltaXML typically involves XML-to-XML transformations, though presentation issues also have a place. If you produce web pages, we recommend you prefer XHTML over HTML as your data storage format.
5.2. XSL Software System Configuration
The Extensible Stylesheet Language (XSL) specifies transformations that can be made to XML data. XSL is a declarative language expressed in XML syntax. Many XML software packages offer XSL engines, which are what actually implement the language. XSL transformations can produce virtually any type of output, e.g. formatted reports or web pages, not just XML (hence the question mark in the following diagram).
[Figure: Data flow of the Extensible Stylesheet Language (XSL)]
At this point you may be wondering how XSL Transformations fit into the idea of filtering XML for use in DeltaXML pre- and post-processing tasks. The term filter is more appropriate here than transformation, as the aim of these XSL scripts is to produce an XML document as output that shares more similarities than differences with the input XML document. This is exactly the definition of a filter presented at the start of this tutorial.
A water filter removes impurities, but the result is still water; boiling transforms water into steam, a substance altogether different. XSL has an equally broad capability. For example, XSL filters can strip spurious differences to isolate those of semantic importance, allowing DeltaXML to detect real differences unencumbered. Filters may augment as well as subtract; just as a water filter might add flavoring, an XSL filter might add markup to XML data. Fitting XSL into the overall picture, we obtain a DeltaXML processing chain:
[Figure: Typical DeltaXML XSL Filter processing chain]
As this diagram shows, XSL operates on both input and output sides of DeltaXML; but XSL is not mandatory on either side. Technically, all that DeltaXML requires is well-formed XML. Nonetheless, this scenario is the most flexible way to use DeltaXML. In fact, more general configurations can involve multiple XSL filters.
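A configuration with multiple XSL filters can be sketched outside DeltaXML using the standard JAXP API, where each stylesheet is wrapped as a SAX XMLFilter and the filters are chained together. This is an illustration under stated assumptions rather than the PipelinedComparator mechanism: the class XslFilterChainDemo, the two inline stylesheets, and the timestamp element are all invented for the example. The first filter normalizes white space in text and the second removes an element that would otherwise show up as a spurious difference.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

class XslFilterChainDemo {
    private static final String HEAD =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>";

    // Filter 1: identity copy, but text nodes are white-space normalized.
    // The explicit priority avoids a conflict with the identity template.
    static final String NORMALIZE = HEAD
        + "<xsl:template match='text()' priority='1'>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "</xsl:template></xsl:stylesheet>";

    // Filter 2: identity copy, but (hypothetical) timestamp elements are dropped.
    static final String DROP_TIMESTAMP = HEAD
        + "<xsl:template match='timestamp'/></xsl:stylesheet>";

    static String runChain(String xml) throws Exception {
        SAXTransformerFactory stf =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        XMLFilter first = stf.newXMLFilter(
                new StreamSource(new StringReader(NORMALIZE)));
        XMLFilter second = stf.newXMLFilter(
                new StreamSource(new StringReader(DROP_TIMESTAMP)));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // parser -> first filter -> second filter
        first.setParent(reader);
        second.setParent(first);

        // An identity transformer serializes whatever leaves the last filter.
        Transformer serializer = stf.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(
                new SAXSource(second, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runChain(
                "<doc><timestamp>123</timestamp><p>  Hello   world  </p></doc>"));
    }
}
```

Each filter still emits XML that closely resembles its input, which is what distinguishes a filter from a general transformation in the sense described above.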


