This tour of the DeltaXML technology starts by introducing you to the DeltaXML Core differencing engine. It introduces what this tool is, what it can do and how it is used. It then briefly introduces the DeltaXML changes file format before concluding with an introduction to the DeltaXML Sync synchronization tool.
DeltaXML provides a powerful way to identify and process the differences between any two XML files that share the same root element. The DeltaXML approach is unique in that:
changes are recorded in an XML 'delta file'
the delta file has the same look and feel as the original files
the delta file can include changes only or changes plus unchanged data.
the delta file is easy to understand and to process because it is an XML file.
the delta file can therefore be processed with standard XML tools.
DeltaXML runs locally on your own hardware and allows you to quickly embed XML comparison functionality into your own systems.
Since DeltaXML represents changes in XML, standard XML technologies such as XSLT can easily be applied, allowing sophisticated information pipelines to be built from proven components.
DeltaXML can:
find all the changes between any two XML files ('old' and 'new')
apply changes to convert an 'old' XML file into the 'new' version
undo changes to convert a 'new' XML file back into the 'old' version.
display change information in either XML or HTML form, using a standard web browser
report changes only or changes+unchanged data
check text for differences on a word-for-word basis
synchronize parallel sets of edits to a base file
use XSLT input and output filters to pre- and post-process the XML data
handle large files without performance degradation.
DeltaXML is a sophisticated XML-aware differencing engine. Normal text-comparison tools do not work well on XML data because they identify changes that are not relevant to XML (for example, XML attributes that appear in different orders). DeltaXML ignores changes that are apparent to a reader but not significant in an XML context, so DeltaXML will:
understand and ignore changes in the order of XML attributes,
understand and ignore changes to the end tags of empty elements,
understand and ignore changes to namespace prefixes.
In addition, DeltaXML can handle orderless elements, and can use keys to match them so that changes to the ordering of orderless elements are understood and ignored.
These are all changes that will be reported when a textual comparison is made, but which you do not want to have reported when comparing XML files.
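To make the difference from a textual comparison concrete, here is a minimal sketch that uses only the JDK's DOM API, not DeltaXML itself. It relies on DOM Level 3 isEqualNode, which ignores attribute order, and it cannot see any difference between <c/> and <c></c> because both parse to the same tree. It is only a stand-in for the idea: unlike DeltaXML, isEqualNode still treats a change of namespace prefix as a difference, and it only answers "equal or not" without describing what changed. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlAwareEquality {

    // Parses an XML string into a namespace-aware DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }

    // Compares the two element trees with DOM Level 3 isEqualNode:
    // attribute order is not significant, and <c/> and <c></c> give the same node.
    // Namespace prefixes and whitespace-only text are still compared.
    static boolean xmlEqual(String a, String b) {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        String v1 = "<root><c title=\"hello world\" id=\"1\"/></root>";
        String v2 = "<root><c id=\"1\" title=\"hello world\"></c></root>";
        System.out.println("text equal: " + v1.equals(v2));
        System.out.println("XML equal:  " + xmlEqual(v1, v2));
    }
}
```

Running main prints "text equal: false" followed by "XML equal:  true": a line-based text diff would flag these two inputs, while an XML-aware comparison treats them as the same document.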
There are two primary products delivered by DeltaXML. The first is the DeltaXML Core XML differencing engine. The second product is DeltaXML Sync. This is the tool used for synchronizing XML files.
There are four keys to the benefits that DeltaXML offers. These are:
accuracy of the comparison,
performance of the core compare,
processability of the generated Delta file,
extendability using industry standard architectures.
That is, the results produced by DeltaXML are extremely accurate (which is obviously the first criterion for any comparison product). In addition, the comparison is fast. The resulting changes file is also an XML file, so it can be processed easily using standard XML tools (such as SAX and DOM parsers, as well as standalone tools such as XMLSpy).
This benefit can be extremely important, as it allows the results of the comparison to be integrated easily and quickly into your own systems. This may be done using your own custom software or via further SAX-oriented XML processing.
This leads to the final benefit listed above: it is straightforward to extend DeltaXML using SAX/TrAX filters. These filters can perform pre- and post-processing activities to refine, revise or format the data before comparison and the results afterwards. This makes it very easy to take the DeltaXML products and build your own custom solutions.
DeltaXML Core is the differencing engine from DeltaXML implemented in 100% pure Java. It can be used to identify what has changed between two versions of an XML file and record the changes (deltas) in a file that can be processed using general-purpose XML tools.
The DeltaXML Core engine can:
detect differences between two XML files with the same root element,
record the differences in XML,
re-combine the delta file with either of the originals to generate the other original file.
This is illustrated in the following diagram:

The preceding diagram illustrates the relationship between the files to be compared, the DeltaXML Core differencing engine, and the output delta file containing the changes between the two files.
It is also very simple to configure DeltaXML for your own application requirements. DeltaXML exploits standard SAX/TrAX processing schemes, so any required pre- and post-processing operations are easy to integrate. To help with this, DeltaXML provides the source code for a number of input and output filters, both in Java and as XSL files, that carry out the most common operations. For example, DeltaXML provides filters to support comparing XHTML files, examining PCDATA word by word, generating change bars from DocBook documents and comparing Schema definitions.
As an illustration of how DeltaXML compares two XML files, consider the following pair:
old.xml:

```xml
<root>
  <a/>
  <b/>
  <c/>
</root>
```

new.xml:

```xml
<root>
  <a/>
  <x/>
  <b/>
  <c title="hello world"/>
</root>
```
These are two very simple XML files. Both have the element <root>
as their root and contain the elements <a>, <b> and <c>.
However, in new.xml we have added the element <x> and the
attribute title to the element <c>.
We can now use DeltaXML to compare these two XML files and generate a delta file of their differences.
The following program illustrates how DeltaXML makes it very simple to do just this.
```java
// PipelinedComparator is supplied by the DeltaXML Core distribution.
PipelinedComparator pc = new PipelinedComparator();
pc.compare(new File("old.xml"), new File("new.xml"), new File("out.xml"));
```

This program makes use of the PipelinedComparator class, which
provides a simple interface to the underlying DeltaXML Core
functionality. It asks DeltaXML to compare the contents of the two
files and to save the resulting delta in a file called out.xml. The results of
running this program and then processing the output into a sample HTML
differences report, for ease of presentation within a browser, are presented
below:

Note that the output produced above is just a sample of what you can produce from a delta file and is included as an illustration; it is not the default output of DeltaXML.
As can be seen from the above figure, the data that has been added in the
new.xml file is shown underlined.
This example makes it easy to observe that the elements
<a> and <b> are unchanged, but that:
the element <x> has been "added"
the element <c> has had a new attribute "title" included
with the value "hello world".
Unlike many XML tools, DeltaXML is designed to be embedded within your own software and does not require you to use it as a separate standalone tool, as the simple example shown earlier illustrates.
However, to illustrate how it might be used within a slightly larger
application, the source for a simple Swing-based application, called
DeltaWing, is shipped with the DeltaXML Core distribution. This is
a very simple Swing application that allows users to select two XML documents,
as local files or URLs, and generates a full delta file.

An example of using DeltaWing is presented above. In this example DeltaWing is being used to compare two XML files (a.xml and b.xml).
Many software systems can be extended either by plugging in new components or by extending existing classes. In general, such extensions require extensive knowledge of some proprietary framework developed by the software vendor concerned. In contrast, if you need to pre- or post-process your data before or after comparison by DeltaXML, you only need to be familiar with the standard SAX XML pipeline methods (TrAX). That is, DeltaXML relies on standards-based SAX/TrAX pipeline processing.

The above figure illustrates how a pipeline of filters can be set up to pre-process the XML files and post-process the output of DeltaXML. This is an extremely powerful architecture, which makes it very easy to build custom solutions around DeltaXML products.
To help in the creation of powerful custom solutions, DeltaXML is shipped with a number of standard input and output filters that can be used within your own systems. These filters are available in XSLT form, and some are also available as Java filters. In many cases the Java implementations are faster and have lower memory requirements than the XSL stylesheets and are therefore often preferable; however, DeltaXML will work with either approach.
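As an illustration of a filter in XSLT form, the sketch below uses the standard TrAX API (javax.xml.transform) to run a small stylesheet that makes a comparison ignore an attribute by removing it beforehand. The stylesheet, the XsltInputFilter class and its applyFilter helper are hypothetical examples written for this tour, not filters taken from the DeltaXML distribution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltInputFilter {

    // Hypothetical pre-processing stylesheet: an identity copy that drops
    // every "title" attribute, so title changes cannot appear in a comparison.
    static final String IGNORE_TITLE_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"@title\"/>"   // more specific, so it wins for @title
      + "</xsl:stylesheet>";

    // Compiles the stylesheet with TrAX and applies it to one XML document.
    static String applyFilter(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyFilter(IGNORE_TITLE_XSL,
            "<root><c title=\"hello world\" id=\"1\"/></root>"));
    }
}
```

Applied to the sample input, the output keeps the id attribute of <c> but no longer has a title attribute.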
Building a pipeline around DeltaXML is simplicity itself using the
PipelinedComparator. This class allows you to specify one or more
input and output filters using the setInputFilters or
setOutputFilters methods.
Once you have specified the input and output filters that DeltaXML should
use, you only need to call the compare method on the
PipelinedComparator. This causes the data to be pulled through the
input filters, compared by DeltaXML and pushed to the output filters.
The following simple program is used to illustrate how straightforward it is to code such a solution using DeltaXML. In this example two input filters, and two output filters, are used:
```java
try {
    PipelinedComparator pc = new PipelinedComparator();
    //---------------------------------------------------------
    // Set up the input filters
    Class[] inFilterClasses = new Class[] {NormalizeSpace.class,
                                           WordInfilter.class};
    pc.setInputFilters(inFilterClasses);
    //---------------------------------------------------------
    // Now set up the output filters
    Class[] outFilterClasses = new Class[] {WordSpaceFixup.class,
                                            WordOutfilter.class};
    pc.setOutputFilters(outFilterClasses);
    //---------------------------------------------------------
    // Now run the DeltaXML comparison
    pc.compare(new File("old.xml"),
               new File("new.xml"),
               new File("out.xml"));
} catch (PipelinedComparatorException pce) {
    System.out.println("An exception was caught: " + pce);
}
```
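The pull-through behaviour described earlier can be illustrated with nothing more than JDK SAX classes. In this sketch, two hypothetical filters (StripProcessingInstructions and DropElementFilter, written for illustration only) are chained with setParent, and an identity Transformer acts as the final consumer: calling transform makes the last filter call parse on its parent, so events are pulled from the parser through every stage. This shows the general SAX filter-chain pattern under that assumption; it does not describe how PipelinedComparator is implemented internally.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {

    // Hypothetical filter 1: processing instructions are not forwarded.
    static class StripProcessingInstructions extends XMLFilterImpl {
        @Override
        public void processingInstruction(String target, String data) {
            // Intentionally empty: downstream stages never see the PI.
        }
    }

    // Hypothetical filter 2: drops every element with the given name,
    // including all elements and text nested inside it.
    static class DropElementFilter extends XMLFilterImpl {
        private final String name;
        private int skipDepth = 0; // > 0 while inside a dropped element

        DropElementFilter(String name) { this.name = name; }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (skipDepth > 0 || qName.equals(name)) { skipDepth++; return; }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (skipDepth > 0) { skipDepth--; return; }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) super.characters(ch, start, length);
        }
    }

    // Chain: parser -> StripProcessingInstructions -> DropElementFilter -> serializer.
    static String run(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();

            StripProcessingInstructions stripPIs = new StripProcessingInstructions();
            stripPIs.setParent(parser);
            DropElementFilter dropNotes = new DropElementFilter("note");
            dropNotes.setParent(stripPIs);

            // The identity transformer pulls events by calling dropNotes.parse(...).
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new SAXSource(dropNotes, new InputSource(new StringReader(xml))),
                                 new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<root><?audit checked?><a/><note>internal</note><b/></root>"));
    }
}
```

The filters are kept deliberately small: each one changes a single aspect of the event stream, and the order of the setParent calls alone determines the order of the pipeline.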
There are a number of constraints that should be taken into account when determining whether DeltaXML can compare two XML files. These are mostly XML-related issues rather than DeltaXML issues, and include:
The XML to be compared must be well-formed XML.
The two XML files must have root elements of the same type, i.e. local name and namespace.
By default the order of the elements is considered to be significant.
By default, each PCDATA item is treated as a single string and is not subdivided into words or characters.
The DeltaXML Core, by default, ignores comments and processing instructions.
All elements and attributes are used within the comparison by default.
All but the first of these can be overcome in DeltaXML by applying appropriate input and output filters. For example, using the word-by-word input and output filters, it is possible to treat each individual word within PCDATA as independent. Other filters are provided that illustrate how elements can be treated as orderless, how comments and processing instructions can be handled, and how elements or attributes can be ignored by DeltaXML.
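The idea behind a word-by-word input filter can be sketched as a SAX XMLFilterImpl that turns each word of PCDATA into its own element, so that a tree comparison can match and report individual words. WordSplitFilter and the <w> element name are hypothetical; this is not DeltaXML's own word-by-word filter, and unlike a production filter this simplified version discards the whitespace between words instead of preserving or restoring it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class WordByWordDemo {

    // Hypothetical input filter: each whitespace-separated word becomes <w>word</w>.
    static class WordSplitFilter extends XMLFilterImpl {
        // A parser may deliver one text node in several characters() calls,
        // so text is buffered and only split when the next tag arrives.
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            flushWords();
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            flushWords();
            super.endElement(uri, localName, qName);
        }

        // Emits the buffered text as one <w> element per word.
        private void flushWords() throws SAXException {
            String buffered = text.toString().trim();
            text.setLength(0);
            if (buffered.isEmpty()) return;
            for (String word : buffered.split("\\s+")) {
                super.startElement("", "w", "w", new AttributesImpl());
                super.characters(word.toCharArray(), 0, word.length());
                super.endElement("", "w", "w");
            }
        }
    }

    // Runs the filter over an XML string and serializes the result.
    static String split(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            WordSplitFilter filter = new WordSplitFilter();
            filter.setParent(spf.newSAXParser().getXMLReader());

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                                 new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Filtering failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("<p>the quick  brown fox</p>"));
    }
}
```

For the sample paragraph the filter produces <p><w>the</w><w>quick</w><w>brown</w><w>fox</w></p>, so editing one word would change only one <w> element rather than the whole text node.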
When DeltaXML compares two similar XML files, an 'old' file and a 'new' file, it generates a new well-formed XML file. This new file describes the changes between the old and the new file and is known as a delta file.
The DeltaXML delta file has the same overall structure as the files being compared, with a few additional attributes and elements. This makes it easy to understand as well as to process.
Special attributes and elements are introduced by DeltaXML to represent the differences between the old and the new files. DeltaXML namespaces distinguish these special elements and attributes from those found in the input files.
As an example, consider the out.xml delta file created earlier in this tour:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      xmlns:dxx="http://www.deltaxml.com/ns/xml-namespaced-attribute"
      xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
  <a deltaxml:deltaV2="A=B"/>
  <x deltaxml:deltaV2="B"/>
  <b deltaxml:deltaV2="A=B"/>
  <c deltaxml:deltaV2="A!=B">
    <deltaxml:attributes deltaxml:deltaV2="B">
      <dxa:title deltaxml:deltaV2="B">
        <deltaxml:attributeValue deltaxml:deltaV2="B">hello world</deltaxml:attributeValue>
      </dxa:title>
    </deltaxml:attributes>
  </c>
</root>
```

If you look carefully at this XML file, you can see that the structure of the
<a>, <b> and <c>
elements has been retained. These elements, however, have been annotated with
deltaV2 attributes, which indicate whether they were unchanged
('A=B'), added ('B'), deleted ('A') or present in both files but
changed ('A!=B'). In the case of the element
<c>, the same approach is used to mark the attribute "title"
as having been added. In all cases the new attributes and elements use the
DeltaXML namespace to avoid any clashes with actual data.
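Because the delta file is ordinary XML, a standard namespace-aware DOM parser is enough to read it. The sketch below walks a copy of the example delta above (with layout whitespace and unused declarations removed) and reports each data element's deltaV2 value. It treats elements in no namespace as data and skips DeltaXML's own elements, and it interprets 'A!=B' as "present in both files but changed", which matches how <c> is annotated in the example. The DeltaSummary class and its output wording are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DeltaSummary {

    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // The example delta file from the text, compacted onto fewer lines.
    static final String SAMPLE_DELTA =
        "<root xmlns:deltaxml=\"" + DELTAXML_NS + "\""
      + " xmlns:dxa=\"http://www.deltaxml.com/ns/non-namespaced-attribute\""
      + " deltaxml:deltaV2=\"A!=B\">"
      + "<a deltaxml:deltaV2=\"A=B\"/><x deltaxml:deltaV2=\"B\"/><b deltaxml:deltaV2=\"A=B\"/>"
      + "<c deltaxml:deltaV2=\"A!=B\"><deltaxml:attributes deltaxml:deltaV2=\"B\">"
      + "<dxa:title deltaxml:deltaV2=\"B\">"
      + "<deltaxml:attributeValue deltaxml:deltaV2=\"B\">hello world</deltaxml:attributeValue>"
      + "</dxa:title></deltaxml:attributes></c></root>";

    // Returns one "name status" line per data element, in document order.
    static List<String> summarize(String deltaXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(deltaXml))).getDocumentElement();
            List<String> report = new ArrayList<>();
            walk(root, report);
            return report;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read delta file", e);
        }
    }

    private static void walk(Element e, List<String> report) {
        if (e.getNamespaceURI() == null) { // data element, not a DeltaXML construct
            report.add(e.getLocalName() + " " + describe(e.getAttributeNS(DELTAXML_NS, "deltaV2")));
        }
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) walk((Element) n, report);
        }
    }

    private static String describe(String deltaV2) {
        switch (deltaV2) {
            case "A=B":  return "unchanged";
            case "A":    return "deleted";
            case "B":    return "added";
            case "A!=B": return "changed";
            default:     return "unknown (" + deltaV2 + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(summarize(SAMPLE_DELTA));
    }
}
```

For the sample delta, summarize returns [root changed, a unchanged, x added, b unchanged, c changed].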
DeltaXML can generate a changes-only delta file (the default) or a "full context" delta file. A full delta file differs from a changes-only delta in that it also includes unchanged data, and thus provides a structured representation of both files within a single file in which the common data is shared. As such, it forms an excellent basis for many subsequent processes, e.g. displaying changes to a document.
DeltaXML supports the comparison of orderless elements. One excellent way of
handling such a comparison is through the addition of unique keys (although this
is optional). These keys are added to the orderless elements prior to the
comparison. As an example, consider the following XML documents. These two
documents contain a list of names and addresses. Each of the names and addresses
is held in an addressList element in an orderless manner:
oldAddressList.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<addressList xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
             deltaxml:ordered="false">
  <person customerid="15">
    <name>Joe Bloggs</name>
    <email>jblogs@msn.com</email>
  </person>
  <person customerid="62">
    <name>Pete Smith</name>
    <email>pxs@hotmail.com</email>
  </person>
</addressList>
```

newAddressList.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<addressList xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
             deltaxml:ordered="false">
  <person customerid="62">
    <name>Pete Smith</name>
    <email>petesmith12@msn.com</email>
  </person>
  <person customerid="15">
    <name>Joe Bloggs</name>
    <email>jblogs@msn.com</email>
  </person>
</addressList>
```
In these XML files the customer Pete Smith is the last person in the
first file but the first person in the second file. However, the order of the
person elements is not significant, so this does not matter.
Pete Smith's email address, on the other hand, has changed between the old
addressList and the new addressList. This is the only
change that we want to be notified about.
By specifying that the addressList element is orderless, we make DeltaXML
ignore changes in the order of the elements within addressList and
instead focus on changes in the content of those elements.

The result of comparing the two address lists, having specified that
addressList is orderless, is presented in the above screen
dump. In this case we have post-processed the delta file to make it easy to view
in a web browser (however, this is just an example of what you can do with
the delta file and is not the standard output mechanism).
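The benefit of keys can be illustrated with a small sketch that pairs person elements by their customerid attribute instead of by position, using only the JDK DOM API. This is not how DeltaXML implements keyed, orderless comparison: the KeyedCompare class and its string-based report are hypothetical, and the sample strings are the two address lists above with whitespace and the deltaxml:ordered attribute removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class KeyedCompare {

    static final String OLD_LIST = "<addressList>"
        + "<person customerid=\"15\"><name>Joe Bloggs</name><email>jblogs@msn.com</email></person>"
        + "<person customerid=\"62\"><name>Pete Smith</name><email>pxs@hotmail.com</email></person>"
        + "</addressList>";

    static final String NEW_LIST = "<addressList>"
        + "<person customerid=\"62\"><name>Pete Smith</name><email>petesmith12@msn.com</email></person>"
        + "<person customerid=\"15\"><name>Joe Bloggs</name><email>jblogs@msn.com</email></person>"
        + "</addressList>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse XML", e);
        }
    }

    // Indexes <person> elements by their customerid key; position is ignored.
    static Map<String, Element> byKey(Document doc) {
        Map<String, Element> index = new LinkedHashMap<>();
        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element p = (Element) people.item(i);
            index.put(p.getAttribute("customerid"), p);
        }
        return index;
    }

    // Pairs elements by key, then reports deleted, changed and added entries.
    static List<String> compare(String oldXml, String newXml) {
        Map<String, Element> before = byKey(parse(oldXml));
        Map<String, Element> after = byKey(parse(newXml));
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Element> entry : before.entrySet()) {
            Element match = after.get(entry.getKey());
            if (match == null) {
                report.add("deleted " + entry.getKey());
            } else if (!entry.getValue().isEqualNode(match)) {
                report.add("changed " + entry.getKey());
            }
        }
        for (String key : after.keySet()) {
            if (!before.containsKey(key)) report.add("added " + key);
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println(compare(OLD_LIST, NEW_LIST));
    }
}
```

compare(OLD_LIST, NEW_LIST) returns [changed 62]: the swapped order is ignored, and only Pete Smith's entry is reported. Without a key, a position-based match would pair Joe Bloggs with Pete Smith and report both entries as changed.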
DeltaXML Sync is the DeltaXML product that provides for synchronization of XML files. By "Synchronization" we mean that edits made in two different XML files can be merged into a single XML file. This sort of activity is important in many applications and a familiar example is synchronizing changes made by two developers to one source file with the original base file. This is exactly the same activity but applied to the structured nature of XML files.
There are many different situations where Synchronization of XML files can be useful. The most common example might be when two different users have updated the same XML file and both sets of changes must be kept. Managing such changes can be handled using an intelligent merge. There are two separate but related merging problems: the 2-way merge and the 3-way merge. The difference between these depends on whether there are two files to be merged or whether there is also a 'base' file from which the others are derived.
The 3-way merge (synchronization) has the potential to provide a more accurate solution where a base file exists, but it is more complex. For a synchronization, the basic requirements can be simply stated in an informal way: the merged document should contain the edits made between the base file and both the derived files. In all the use cases above, there is typically a base file and two variants.

The DeltaXML Sync product builds upon the DeltaXML Core engine to provide synchronization of multiple edits to a base file. The basic idea behind synchronization is illustrated in the figure above.
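The basic requirement stated above, that the merged result should contain the edits from both derived files, can be illustrated with a classic three-way merge rule applied to flat records. This is a simplification for illustration only: the records are Map<String, String> values rather than XML trees, the conflict policy (keep the left value and record the field name) is an arbitrary choice, and none of it reflects DeltaXML Sync's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ThreeWayMerge {

    // Builds a record from alternating field names and values.
    static Map<String, String> record(String... kv) {
        Map<String, String> m = new TreeMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    // Merges two edited versions of a base record field by field.
    // A missing field is treated as null, i.e. "absent".
    static Map<String, String> merge(Map<String, String> base, Map<String, String> left,
                                     Map<String, String> right, List<String> conflicts) {
        Set<String> keys = new TreeSet<>(base.keySet());
        keys.addAll(left.keySet());
        keys.addAll(right.keySet());

        Map<String, String> merged = new TreeMap<>();
        for (String key : keys) {
            String b = base.get(key), l = left.get(key), r = right.get(key);
            String result;
            if (Objects.equals(l, r)) {
                result = l;            // both sides agree (or neither changed it)
            } else if (Objects.equals(l, b)) {
                result = r;            // only the right side edited this field
            } else if (Objects.equals(r, b)) {
                result = l;            // only the left side edited this field
            } else {
                conflicts.add(key);    // both sides edited it differently
                result = l;
            }
            if (result != null) merged.put(key, result); // null means deleted
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> conflicts = new ArrayList<>();
        Map<String, String> merged = merge(
            record("name", "Pete Smith", "email", "pxs@hotmail.com"),
            record("name", "Pete Smith", "email", "petesmith12@msn.com"),
            record("name", "Peter Smith", "email", "pxs@hotmail.com"),
            conflicts);
        System.out.println(merged + " conflicts=" + conflicts);
    }
}
```

In main, one side changes Pete Smith's email and the other changes his name, so both edits survive in the merged record and no conflict is reported. If both sides changed the email to different values, "email" would be added to the conflict list instead.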
Using DeltaXML Sync you can add sophisticated synchronization behaviour to your own software systems with a minimum of difficulty. As an illustration of how straightforward it is to integrate DeltaXML Sync into your own programs, the following code snippet shows how to implement synchronization using DeltaXML Sync:
```java
// Synchronizer and SynchronizerFactory are supplied by DeltaXML Sync;
// StreamSource and StreamResult come from javax.xml.transform.stream.
// args[0]..args[2]: the three input files; args[3]: the synchronized output.
Synchronizer syncer = SynchronizerFactory.newInstance().newSynchronizer();
syncer.sync(new StreamSource(new File(args[0])),
            new StreamSource(new File(args[1])),
            new StreamSource(new File(args[2])),
            new StreamResult(new File(args[3])));
```

As you can see from this example, the synchronization functionality within
DeltaXML Sync is extremely straightforward to use. As with the DeltaXML Core
differencing engine, DeltaXML Sync can also be used with input and output
filters to allow for pre- and post-processing of the XML files. In addition, the
Synchronizer accepts standard JAXP input and output parameters, including
SAXSource and SAXResult objects.
More details on synchronization can be found in the on-line Synchronization Tutorial.
You can build your own custom solutions around the core of the DeltaXML
products. DeltaXML provides the high-performance XML differencing and
synchronization engines, while you build simple-to-construct processing
pipelines around these engines. The filters you use in these processing chains
employ standard frameworks (such as the Java XMLFilter interface
and SAX events) and are as complex or simple as you need to make them.
The normal process of integrating DeltaXML into any application is one in which the developer focuses on configuring DeltaXML to solve their problem. This generally involves the following steps:
Identification of pre- and post-processing filters.
Selection of appropriate existing filters from the DeltaXML filters library.
Development of your own Java or XSLT filters where needed.
Configuration of DeltaXML using these filters.
Configuration of DeltaXML and/or the underlying parsers' features or properties.
Execution of DeltaXML within the filter pipeline.
Note that this is usually an iterative process, as the pipeline architecture allows further filters to be added as they are identified.
To illustrate the way in which custom solutions can be built around DeltaXML products, a number of examples are provided with the DeltaXML distributions. For example, DeltaXML ships solutions for:
comparing XHTML files,
comparing XML Schemas,
comparing text within XML documents on a word-by-word basis,
comparing DocBook documents and producing DocBook output that represents the changes.
As an illustration, the DocBook solution uses DeltaXML-supplied input and output filters, plus the generally available changebars.xsl stylesheet for producing an HTML version of a DocBook document with change-bar information. The result is the following processing chain:

An example of the output generated is presented in a browser below:

An interesting point to note about this processing chain is that the changebars.xsl stylesheet is used as a filter within the DeltaXML pipeline, but was written without reference to DeltaXML, for general processing of DocBook files. We have merely taken it and used it as is within this pipeline. This illustrates the reusable nature of the filters used within a DeltaXML pipeline: DeltaXML works with any standard XSL stylesheet or Java XMLFilter using the SAX/TrAX mechanisms.
These and many other solutions are built upon the accuracy and performance of the DeltaXML products and the processability of the XML Delta files.
A number of online demos and trial services are available; these allow experimentation with DeltaXML Core using our server.
Time-limited trial versions of the DeltaXML Core SDK can be obtained using the SDK request form.
All our technical documentation is publicly available for you to either download or access online.
Please see: