Advanced DeltaXML Techniques
Table of Contents
- 1. Configuring DeltaXML
- 1.1. Setting API features and properties
- 1.2. Using XSLT - input and output filters
- 1.3. Building a pipeline
- 1.4. Custom comparisons - XHTML, Schema, word-by-word
- 2. Advanced Features
- 2.1. Orderless comparisons
- 2.2. Using keys for precision control
- 2.3. Using filters to automate keyed and orderless comparisons
- 3. Using DeltaXML as a Merge Tool
Chapter 1. Configuring DeltaXML
1.1. Setting API features and properties
The DeltaXML Core API follows the SAX standard of using features
and properties to select options during comparisons and recombinations.
Properties and Features are used to configure the XMLComparator
and XMLCombiner through the comparator.setProperty,
comparator.setFeature, combiner.setProperty and
combiner.setFeature methods as appropriate. The following features
and properties are available:
-
http://deltaxml.com/api/feature/isFullDelta - full delta or changes only
-
http://deltaxml.com/api/feature/isCombineForward - "direction" in which to apply delta (i.e. roll-back or roll-forward)
Full details of all options are included in the DeltaXML Documentation.
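For comparison, the SAX convention that these URIs mirror can be seen with the JDK's own parser. The sketch below uses only standard SAX feature and property URIs, not DeltaXML ones. Features are named by URI and take boolean values; properties are named by URI and take object values.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

public class SaxFeatures {
    // Standard SAX feature URI (boolean): report namespace URIs and local names
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // Standard SAX property URI (object-valued): receives comments, CDATA boundaries, etc.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    /** Creates a reader configured purely through URI-named features and properties. */
    static XMLReader configuredReader() throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, true);                        // features are booleans
        reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2()); // properties are objects
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = configuredReader();
        System.out.println(NAMESPACES + " = " + reader.getFeature(NAMESPACES));
        System.out.println(LEXICAL_HANDLER + " set: "
                + (reader.getProperty(LEXICAL_HANDLER) instanceof LexicalHandler));
    }
}
```

The DeltaXML calls in the example below follow the same pattern: a URI names the option and a value selects it.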
Example - setting the "full delta" feature
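import java.io.File;
// (PipelinedComparator comes from the DeltaXML Core distribution; add its import too)
PipelinedComparator pc= new PipelinedComparator();
// Request a full delta (unchanged content included) rather than changes only
pc.setComparatorFeature("http://deltaxml.com/api/feature/isFullDelta", true);
// Compare the first two files named on the command line; write the delta to the third
pc.compare(new File(args[0]),
new File(args[1]),
new File(args[2]));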
1.2. Using XSLT - input and output filters
Customizing DeltaXML to your requirements is most simply achieved by automatically pre-processing your documents - perhaps removing extraneous whitespace or marking some elements as orderless - and then post-processing the output into the format you require. To do this you use XSL input and output filters.
Typical uses of input filters include:
-
Normalizing whitespace - contiguous whitespace in PCDATA is replaced by a single space
-
Ignoring information in the comparison - for example, a date-stamp change may be irrelevant
-
Marking elements as having "orderless" contents
-
Adding "keys" to elements to control a comparison
Input and output filters can be combined with the DeltaXML Core differencing engine to create a processing pipeline. Conceptually, a pipeline is a sequence of processing elements, each of which does something to the data it receives and then passes that data (possibly transformed in some way) on to the next element in the pipeline. In this respect it resembles piping the output of one Unix command into the input of another.
Making advanced use of the XML pipeline architecture requires a basic understanding of TrAX (the Transformation API for XML, javax.xml.transform). The Apache Xalan TrAX pages provide an excellent overview of this standard along with detailed working code, most of it generic enough to work with any TrAX implementation.
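As a rough illustration of such a pipeline in plain TrAX, the sketch below chains XSLT input filters with SAXTransformerFactory so that SAX events stream from one stage to the next, in list order. The two stylesheets are illustrative stand-ins (whitespace normalization, and dropping a hypothetical date attribute), not the filters shipped with DeltaXML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxPipeline {
    static final String IDENTITY = "<xsl:template match='@*|node()'><xsl:copy>"
        + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>";

    // Filter 1: collapse whitespace runs in each text node (normalize-space also trims the ends)
    static final String NORMALIZE_SPACE = stylesheet(IDENTITY
        + "<xsl:template match='text()' priority='1'><xsl:value-of select='normalize-space(.)'/></xsl:template>");

    // Filter 2: drop a hypothetical date attribute so date-stamp changes are ignored
    static final String IGNORE_DATE = stylesheet(IDENTITY + "<xsl:template match='@date'/>");

    static String stylesheet(String templates) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='xml' omit-xml-declaration='yes'/>" + templates + "</xsl:stylesheet>";
    }

    /** Pushes the input through each stylesheet in list order and returns the serialized result. */
    static String run(String inputXml, List<String> stylesheets) throws Exception {
        if (stylesheets.isEmpty()) throw new IllegalArgumentException("need at least one filter");
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler first = null, previous = null;
        for (String xsl : stylesheets) {
            Templates templates = stf.newTemplates(new StreamSource(new StringReader(xsl)));
            TransformerHandler handler = stf.newTransformerHandler(templates);
            if (previous == null) first = handler;
            else previous.setResult(new SAXResult(handler)); // SAX events flow on to the next stage
            previous = handler;
        }
        StringWriter out = new StringWriter();
        previous.setResult(new StreamResult(out));           // the last stage writes text
        // An identity transformer parses the input and feeds SAX events into the first stage
        stf.newTransformer().transform(new StreamSource(new StringReader(inputXml)), new SAXResult(first));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String in = "<doc date='2002-05-01'><p>  hello \n   world </p></doc>";
        System.out.println(run(in, List.of(NORMALIZE_SPACE, IGNORE_DATE)));
    }
}
```

Running main should print `<doc><p>hello world</p></doc>`. Streaming SAX events between stages avoids building an intermediate document for each filter.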
To simplify this process, however, DeltaXML provides the
PipelinedComparator class, which takes a list of input and output
filters. These filters are used internally to create the appropriate TrAX
pipeline structures, which then pre- and post-process the data. An example of
using a number of predefined input and output filters is presented in the
following section.
1.3. Building a pipeline
Using the PipelinedComparator class, you can create a
pipeline of input and output filters simply by specifying the Java class
and/or XSLT file that defines each filter. For example, the following Java
code instantiates a new PipelinedComparator and configures it
to use two filters. The first is an input filter, NormalizeSpace, which
normalizes whitespace; because it is defined by a Java class, it is referenced
using the ".class" extension. The second is an output filter that
post-processes the delta generated by the DeltaXML differencing engine to create
an HTML version of the differences.
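import java.io.File;
import java.util.ArrayList;
import java.util.List;
// (PipelinedComparator and NormalizeSpace come from the DeltaXML Core distribution)
PipelinedComparator pc= new PipelinedComparator();
// Input filter: a Java class, passed as its Class object
List infilters= new ArrayList();
infilters.add(NormalizeSpace.class);
pc.setInputFilters(infilters);
// Output filter: an XSLT stylesheet, passed as a File
List outFilters= new ArrayList();
outFilters.add(new File("deltaxml-tables.xsl"));
pc.setOutputFilters(outFilters);
// Compare a.xml with b.xml and write the filtered result as HTML
pc.compare(new File("a.xml"),
new File("b.xml"),
new File("out.html"));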
Two methods are used to set up the input and output filter chains in this pipeline. These methods are:
-
setInputFilters - allows one or more input filters to be specified. This is an overloaded method, so XSL files, Java classes, or a combination of both can be used to create the filters.
-
setOutputFilters - allows one or more output filters to be specified. This is an overloaded method, so XSL files, Java classes, or a combination of both can be used to create the filters.
Note that the Java source for NormalizeSpace and the XSLT source
for deltaxml-tables.xsl are provided in the DeltaXML Core
distribution.
1.4. Custom comparisons - XHTML, Schema, word-by-word
Using the pipeline approach, you can chain together transformations
on both input and output to give customized comparisons for particular document
types. For example, XHTML comparisons can ignore whitespace except inside
<pre> elements. For XML Schema, the contents of a <choice> element are
conceptually orderless, so an input filter flags them as such by adding a
deltaxml:ordered="false" attribute (see
Orderless comparisons). Many other
such optimizations are included.
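The XHTML whitespace rule can be sketched as an XSLT input filter, run here through the standard JAXP Transformer. This is not the filter DeltaXML ships; it simply normalizes every text node that has no XHTML <pre> ancestor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlSpaceFilter {
    // Identity copy, except that text outside <pre> is whitespace-normalized.
    // The h: prefix binds the XHTML namespace so that h:pre matches XHTML <pre> elements.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='text()[not(ancestor::h:pre)]' priority='1'>"
      + "<xsl:value-of select='normalize-space(.)'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the filter to an XHTML document held in a string. */
    static String apply(String xhtml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<p>  some   text </p><pre>  keep\n    this</pre></body></html>";
        System.out.println(apply(page));
    }
}
```

Note that normalize-space also trims each text node, which can remove meaningful spaces next to inline elements such as <b>; a production filter would need to be more careful there.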
For textual comparisons, a "word-by-word" pipeline is available that identifies changes to individual words. It creates Microsoft Word(TM)-style markup of additions, changes and deletions within PCDATA. An example of adding word-by-word pre- and post-processing to the PipelinedComparator is presented below:
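import java.io.File;
import java.util.ArrayList;
import java.util.List;
// (PipelinedComparator and the filter classes come from the DeltaXML Core distribution)
PipelinedComparator pc= new PipelinedComparator();
// Set up input filters: normalize whitespace, then prepare PCDATA for word-by-word comparison
List infilters= new ArrayList();
infilters.add(NormalizeSpace.class);
infilters.add(WordByWordInfilter.class);
pc.setInputFilters(infilters);
// Set up output filters: word-by-word post-processing, then HTML rendering
List outFilters= new ArrayList();
outFilters.add(WordByWordOutfilter1.class);
outFilters.add(WordByWordOutfilter2.class);
outFilters.add(new File("deltaxml-tables.xsl"));
pc.setOutputFilters(outFilters);
// Run the pipeline
pc.compare(new File("a.xml"),
new File("b.xml"),
new File("out.html"));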
This example also illustrates the ability to specify more than one input or output filter. Note that the filters are applied in the order in which they are defined in the lists that hold them.
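The word-by-word idea can be illustrated outside XML with a standard longest-common-subsequence diff over word sequences. This is a generic technique, not DeltaXML's word-by-word algorithm, and the <del>/<ins> markup is an illustrative choice.

```java
import java.util.ArrayList;
import java.util.List;

public class WordDiff {
    /** Splits text into words on runs of whitespace; empty text has no words. */
    static String[] words(String s) {
        String t = s.trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    /**
     * Marks word-level deletions and insertions between two strings using a
     * longest-common-subsequence table. Sketch only: words are not XML-escaped.
     */
    static String diff(String oldText, String newText) {
        String[] x = words(oldText), y = words(newText);
        // lcs[i][j] = length of the LCS of x[i..] and y[j..]
        int[][] lcs = new int[x.length + 1][y.length + 1];
        for (int i = x.length - 1; i >= 0; i--)
            for (int j = y.length - 1; j >= 0; j--)
                lcs[i][j] = x[i].equals(y[j]) ? lcs[i + 1][j + 1] + 1
                                              : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
        // Walk the table, emitting common words as-is and the rest as <del>/<ins>
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i].equals(y[j])) { out.add(x[i]); i++; j++; }
            else if (lcs[i + 1][j] >= lcs[i][j + 1]) out.add("<del>" + x[i++] + "</del>");
            else out.add("<ins>" + y[j++] + "</ins>");
        }
        while (i < x.length) out.add("<del>" + x[i++] + "</del>");
        while (j < y.length) out.add("<ins>" + y[j++] + "</ins>");
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints: the <del>quick</del> <ins>slow</ins> brown fox <ins>jumps</ins>
        System.out.println(diff("the quick brown fox", "the slow brown fox jumps"));
    }
}
```

A replaced word shows up as a deletion followed by an insertion, which is how a "change" appears in Word-style markup.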
If you'd like to process any of these types, or another generic document type such as SOAP or BizTalk, please contact us to discuss our currently available filters.
Chapter 2. Advanced Features
2.1. Orderless comparisons
When comparing two versions of a document, changes in ordering may be found which you need to ignore. For example, an external addressList feed may contain unsorted <person> elements, each of which contains sorted child elements. In this case, specify an orderless comparison:
<addressList deltaxml:ordered="false">
<person id="1">
...
</person>
<person id="2">
...
</person>
</addressList>
DeltaXML will report no differences between this document and one having the <person> elements the other way round.
An element that you specify as orderless should have only elements as children, with no text data or whitespace. Note also that the "ordered" property is not inherited by child elements, so you can nest ordered elements within orderless elements within ordered elements, and so on. For real-world problems this is far more useful than the global "ignore order when comparing these documents" switch offered by some products.
These attributes need not be added manually - see Using filters to automate keyed and orderless comparisons. For detailed instructions on making effective use of orderless comparisons, see our white paper on Key-assisted Comparisons.
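To make these semantics concrete, here is a small DOM sketch that treats two documents as equal while ignoring child order only beneath elements flagged deltaxml:ordered="false". It is not how DeltaXML works internally: it builds a canonical string per element and sorts the children of orderless elements, which also shows that the flag does not propagate to descendants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class OrderlessEquals {
    /** Canonical string for an element (sketch only: not collision-proof for arbitrary text). */
    static String canon(Element e) {
        List<String> attrs = new ArrayList<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            Attr a = (Attr) map.item(i);
            // deltaxml:* attributes are control information, not content
            if (!a.getName().startsWith("deltaxml:")) attrs.add(a.getName() + "=" + a.getValue());
        }
        Collections.sort(attrs); // attribute order is never significant in XML
        List<String> kids = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) kids.add(canon((Element) n));
            // whitespace-only text is skipped, as if a whitespace-normalizing filter had run
            else if (n.getNodeType() == Node.TEXT_NODE && !n.getNodeValue().trim().isEmpty())
                kids.add("#" + n.getNodeValue());
        }
        // The flag affects only this element's own children; each child is
        // canonicalized according to its own flag, so order settings nest freely
        if ("false".equals(e.getAttribute("deltaxml:ordered"))) Collections.sort(kids);
        return "<" + e.getTagName() + attrs + ">" + String.join("", kids) + "</" + e.getTagName() + ">";
    }

    static Element parse(String xml) throws Exception {
        // Not namespace-aware (the JAXP default), so "deltaxml:ordered" is matched by its
        // literal name and the sample needs no namespace declaration
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }

    /** True if the documents match, ignoring child order only where deltaxml:ordered="false". */
    static boolean same(String a, String b) throws Exception {
        return canon(parse(a)).equals(canon(parse(b)));
    }

    public static void main(String[] args) throws Exception {
        String a = "<addressList deltaxml:ordered='false'>"
                 + "<person id='1'><name>Ann</name><tel>111</tel></person>"
                 + "<person id='2'><name>Bob</name><tel>222</tel></person></addressList>";
        String b = "<addressList deltaxml:ordered='false'>"
                 + "<person id='2'><name>Bob</name><tel>222</tel></person>"
                 + "<person id='1'><name>Ann</name><tel>111</tel></person></addressList>";
        String c = b.replace("<name>Ann</name><tel>111</tel>", "<tel>111</tel><name>Ann</name>");
        System.out.println(same(a, b)); // true: <person> order is ignored
        System.out.println(same(a, c)); // false: <person> itself is still ordered
    }
}
```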
2.2. Using keys for precision control
When comparing two lists, two essential operations are involved. The first is alignment, deciding which items should be compared with which; the second is the comparison itself. This is made more complex because XML is tree-structured, but the same two operations are applied at each level of the tree.
DeltaXML allows fine control over the alignment phase through use of keys. For example, when comparing legal documents you need to be sure that corresponding paragraphs are always aligned - by specifying a unique key for each paragraph, paragraph additions, deletions and changes are correctly reported. The syntax for specifying a key is straightforward:
<para deltaxml:key="para1"> ... </para>
These keys need not be added manually; see Using filters to automate keyed and orderless comparisons. Detailed instructions on using keys are available in our white paper on Key-assisted Comparisons.
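The effect of keys on alignment can be sketched with two versions of a list of paragraphs in which one version deletes para2. Aligning by key reports exactly that deletion, whereas a deliberately naive position-by-position alignment, included only for contrast, misreports it as a change plus a deletion. The positional function is a strawman and is not a description of DeltaXML's unkeyed alignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyedAlign {
    /** Aligns items by key, then compares each aligned pair. */
    static List<String> keyedDiff(Map<String, String> a, Map<String, String> b) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other == null) changes.add("deleted " + e.getKey());
            else if (!other.equals(e.getValue())) changes.add("changed " + e.getKey());
        }
        for (String key : b.keySet())
            if (!a.containsKey(key)) changes.add("added " + key);
        return changes;
    }

    /** Naive alternative for contrast: aligns the i-th item with the i-th item. */
    static List<String> positionalDiff(List<String> a, List<String> b) {
        List<String> changes = new ArrayList<>();
        for (int i = 0; i < Math.max(a.size(), b.size()); i++) {
            if (i >= b.size()) changes.add("deleted #" + i);
            else if (i >= a.size()) changes.add("added #" + i);
            else if (!a.get(i).equals(b.get(i))) changes.add("changed #" + i);
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("para1", "Definitions"); v1.put("para2", "Term"); v1.put("para3", "Payment");
        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("para1", "Definitions"); v2.put("para3", "Payment");
        System.out.println(keyedDiff(v1, v2)); // [deleted para2]
        System.out.println(positionalDiff(new ArrayList<>(v1.values()),
                                          new ArrayList<>(v2.values()))); // [changed #1, deleted #2]
    }
}
```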
2.3. Using filters to automate keyed and orderless comparisons
The deltaxml:key and deltaxml:ordered attributes
described previously allow great flexibility in managing comparisons to get the
results you want. An approach taken by many users is to modify the systems
producing these documents so that they add appropriate key/ordered attributes when the
documents are generated. For other users, the document format is fixed and
cannot easily be modified, yet they still want to use these features.
By using an input filter to pre-process the incoming file, appropriate
attributes can be added without changing the input documents. For our previous
examples, we need an input filter that adds
deltaxml:ordered="false" to every addressList element, a
simple process. Adding keys typically requires a little more care. Some unique
property of the element being processed must be available - this may be an
attribute value, PCDATA contents of a child (or other descendant) element, a
child's attributes, or any combination of these.
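A minimal sketch of such an input filter for the addressList example, written as XSLT and run through JAXP. It assumes each <person> has a unique id attribute to serve as its key; the DeltaXML namespace URI shown is also an assumption to check against your DeltaXML documentation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AddControlAttributes {
    // ASSUMPTION: the DeltaXML namespace URI; verify it against your DeltaXML documentation
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:deltaxml='" + DELTAXML_NS + "'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // identity copy for everything not matched below
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      // every addressList becomes orderless
      + "<xsl:template match='addressList'><xsl:copy>"
      + "<xsl:attribute name='deltaxml:ordered'>false</xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      // each person is keyed on its (assumed unique) id attribute; a child's PCDATA would also work
      + "<xsl:template match='person'><xsl:copy>"
      + "<xsl:attribute name='deltaxml:key'><xsl:value-of select='@id'/></xsl:attribute>"
      + "<xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "</xsl:stylesheet>";

    /** Adds deltaxml:ordered and deltaxml:key attributes to the given document. */
    static String apply(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(apply("<addressList><person id='1'><name>Ann</name></person>"
                + "<person id='2'><name>Bob</name></person></addressList>"));
    }
}
```

Applying the same filter to both documents before comparison keeps the attributes consistent on each side.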
Chapter 3. Using DeltaXML as a Merge Tool
3.1. Merging
DeltaXML offers three types of merging:
-
2-way merging - merge documents A and B using the Core API
-
3-way merging - synchronizing multiple edits to a base document using the DeltaXML Sync add-on to the Core API
-
Database synchronization - multiple concurrent updates to a replicated XML database that must be synchronized into a master database
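The core decision in 3-way merging can be sketched on flat keyed content: for each key, if only one side differs from the base, take that side; if both changed it differently, report a conflict. This is a generic 3-way merge over a Map, with keys standing in for keyed elements; it is not the DeltaXML Sync algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ThreeWayMerge {
    /** Merged content plus the keys that were edited differently on both sides. */
    static final class Result {
        final Map<String, String> merged = new TreeMap<>();
        final List<String> conflicts = new ArrayList<>();
    }

    /** Merges edits A and B against a common base; a missing key means "absent". */
    static Result merge(Map<String, String> base, Map<String, String> a, Map<String, String> b) {
        Result r = new Result();
        Set<String> keys = new TreeSet<>(base.keySet());
        keys.addAll(a.keySet());
        keys.addAll(b.keySet());
        for (String k : keys) {
            String o = base.get(k), x = a.get(k), y = b.get(k); // null when absent
            String chosen;
            if (Objects.equals(x, y)) chosen = x;       // both sides agree (or neither changed)
            else if (Objects.equals(o, x)) chosen = y;  // only B changed, added or deleted it
            else if (Objects.equals(o, y)) chosen = x;  // only A changed, added or deleted it
            else { chosen = o; r.conflicts.add(k); }    // both changed differently: keep base, flag it
            if (chosen != null) r.merged.put(k, chosen);
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of("p1", "Intro", "p2", "Terms", "p3", "Payment");
        Map<String, String> a = Map.of("p1", "Intro", "p2", "Terms (revised)", "p3", "Pay in 30 days");
        Map<String, String> b = Map.of("p1", "Introduction", "p2", "Terms",
                                       "p3", "Pay in 60 days", "p4", "Signatures");
        Result r = merge(base, a, b);
        System.out.println(r.merged);    // {p1=Introduction, p2=Terms (revised), p3=Payment, p4=Signatures}
        System.out.println(r.conflicts); // [p3]
    }
}
```

A 2-way merge has no base to consult, so it cannot tell an addition on one side from a deletion on the other; the base is what lets the 3-way version decide.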
A paper delivered by DeltaXML CTO Robin La Fontaine at XML Europe 2002 discusses the issues and details a practical solution - see Merging XML files (PDF).