
Tour of DeltaXML Technology


Chapter 1. DeltaXML Technology Tour

This tour of the DeltaXML technology starts by introducing the DeltaXML Core differencing engine: what it is, what it can do and how it is used. It then briefly introduces the DeltaXML changes file format before concluding with an introduction to the DeltaXML Sync synchronization tool.

1.1. DeltaXML: XML differencing in XML

DeltaXML provides a powerful way to identify and process the differences between any two XML files that share the same root element. The DeltaXML approach is unique in that:

DeltaXML runs locally on your own hardware and allows you to quickly embed XML comparison functionality into your own systems.

Since DeltaXML represents changes in XML, standard XML technologies such as XSLT can easily be applied, allowing sophisticated information pipelines to be built from proven components.

1.2. DeltaXML functionality

DeltaXML can:

1.3. DeltaXML is XML aware

DeltaXML is a sophisticated XML-aware differencing engine. Normal text-comparison tools do not work well on XML data because they report changes that are not relevant to XML (for example, attributes that appear in a different order). DeltaXML ignores differences that are visible in the raw text but are not significant in an XML context, so DeltaXML will:

In addition, DeltaXML can handle orderless elements and can use keys to identify, and ignore, changes in the ordering of such elements.

These are all changes that will be reported when a textual comparison is made, but which you do not want to have reported when comparing XML files.
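DeltaXML's own comparison is far more sophisticated than this. However, the basic point can be shown with the JDK alone. The minimal Java sketch below uses no DeltaXML classes, and the class name XmlAwareEquality is hypothetical. In it, a plain string comparison reports a difference. The DOM Level 3 isEqualNode method compares the parsed trees and finds none. Attribute order, quote style and <c></c> versus <c/> all disappear during parsing. Note that isEqualNode is only a yes/no equality test and still treats whitespace-only text nodes as significant.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlAwareEquality {
    // Parses an XML string into a DOM tree (wrapping checked exceptions for brevity).
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // What a text-comparison tool effectively does: any character difference counts.
    static boolean textuallyEqual(String a, String b) {
        return a.equals(b);
    }

    // isEqualNode compares attributes regardless of their order; quote style and
    // empty-element syntax are not visible in the DOM at all.
    static boolean xmlEqual(String a, String b) {
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        String before = "<root><c title=\"hello\" id=\"1\"></c></root>";
        String after  = "<root><c id='1' title='hello'/></root>";
        System.out.println(textuallyEqual(before, after)); // false
        System.out.println(xmlEqual(before, after));       // true
    }
}
```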

1.4. DeltaXML Core and Sync

DeltaXML delivers two primary products: DeltaXML Core, the XML differencing engine, and DeltaXML Sync, the tool used for synchronizing XML files.

1.5. Keys to DeltaXML's benefits

There are four keys to the benefits that DeltaXML offers:

  1. accuracy of the comparison,

  2. performance of the core compare,

  3. processability of the generated Delta file,

  4. extensibility using industry-standard architectures.

That is, the results produced by DeltaXML are extremely accurate (which is the first criterion for any comparison product), and the comparison itself is fast. In addition, the resulting changes file is an XML file, so it can easily be processed using standard XML tools (such as SAX and DOM parsers, as well as stand-alone tools such as XMLSpy).

This benefit can be extremely important, as it allows the results of the comparison to be easily and quickly integrated into your own systems. This may be done using your own custom software or via further SAX-oriented XML processing.

This leads to the final benefit listed above: it is straightforward to extend DeltaXML using SAX/TrAX filters. These filters can perform pre- and post-processing to refine, revise or format the data to be compared and the results generated. This makes it very easy to take the DeltaXML products and build your own custom solutions.

Chapter 2. DeltaXML Core

2.1. The DeltaXML Core differencing engine

DeltaXML Core is the differencing engine from DeltaXML implemented in 100% pure Java. It can be used to identify what has changed between two versions of an XML file and record the changes (deltas) in a file that can be processed using general-purpose XML tools.

The DeltaXML Core engine can:

  1. detect differences between two XML files with the same root element,

  2. record the differences in XML,

  3. re-combine the delta file with either of the originals to generate the other original file.

This is illustrated in the following diagram:

Using DeltaXML to Compare two XML files

The preceding diagram illustrates the relationship between the files to be compared, the DeltaXML Core differencing engine, and the output delta file containing the changes between the two files.

It is also very simple to configure DeltaXML for your own application requirements. DeltaXML exploits standard SAX/TrAX processing schemes, so any required pre- and post-processing operations are easy to integrate. To help with this, DeltaXML provides the source code for a number of input and output filters, both in Java and as XSL stylesheets, that carry out the most common operations. For example, DeltaXML provides filters for comparing XHTML files, examining PCDATA word by word, generating change bars from DocBook documents and comparing Schema definitions.

2.2. An example DeltaXML based comparison

As an illustration of how DeltaXML compares two XML files, consider the following pair:

old.xml

<root>
    <a/>
    <b/>
    <c/>
</root>

new.xml

<root>
    <a/>
    <x/>
    <b/>
    <c title="hello world"/>
</root>

These are two very simple XML files. Both have <root> as their root element and contain the elements <a>, <b> and <c>. However, new.xml also contains a new element <x>, and its <c> element has a new title attribute.
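The tour does not describe the algorithm DeltaXML uses to align elements. The following is therefore only an illustration of what an ordered comparison has to work out, using a different, well-known technique: a longest-common-subsequence (LCS) alignment. The child elements of <root> are matched by name, and matched pairs whose attributes differ are flagged as changed. OrderedChildDiff and its El class are hypothetical names. The children are built by hand rather than parsed from old.xml and new.xml.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OrderedChildDiff {
    // A child element reduced to its name and attributes (illustration only).
    static final class El {
        final String name;
        final Map<String, String> attrs;
        El(String name, Map<String, String> attrs) { this.name = name; this.attrs = attrs; }
    }

    // Aligns two ordered child lists with an LCS on element names and labels each
    // item: "A=B" unchanged, "A!=B" attributes differ, "A" deleted, "B" added.
    static List<String> diff(List<El> a, List<El> b) {
        int n = a.size(), m = b.size();
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).name.equals(b.get(j).name)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).name.equals(b.get(j).name)) {
                String mark = a.get(i).attrs.equals(b.get(j).attrs) ? "A=B" : "A!=B";
                out.add(a.get(i).name + " " + mark);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add(a.get(i++).name + " A"); // only in the old file
            } else {
                out.add(b.get(j++).name + " B"); // only in the new file
            }
        }
        while (i < n) out.add(a.get(i++).name + " A");
        while (j < m) out.add(b.get(j++).name + " B");
        return out;
    }

    public static void main(String[] args) {
        // The children of <root> in old.xml and new.xml
        List<El> oldKids = List.of(new El("a", Map.of()), new El("b", Map.of()), new El("c", Map.of()));
        List<El> newKids = List.of(new El("a", Map.of()), new El("x", Map.of()),
                new El("b", Map.of()), new El("c", Map.of("title", "hello world")));
        System.out.println(diff(oldKids, newKids)); // [a A=B, x B, b A=B, c A!=B]
    }
}
```

The labels deliberately mirror the deltaV2 values that DeltaXML writes into its delta file, described in Chapter 3.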

We can now use DeltaXML to compare these two XML files and generate a delta file of their differences.

The following program illustrates how DeltaXML makes it very simple to do just this.

import java.io.File;
// Compare old.xml with new.xml and write the resulting delta to out.xml
PipelinedComparator pc = new PipelinedComparator();
pc.compare(new File("old.xml"), new File("new.xml"), new File("out.xml"));

This program uses the PipelinedComparator class, which provides a simple-to-use interface to the underlying DeltaXML Core functionality. It asks DeltaXML to compare the contents of the two files and to save the resulting delta in a file called out.xml. The results of running this program, post-processed into a sample HTML differences report for ease of viewing in a browser, are presented below:

Viewing Changes in HTML

Note that the output shown above is just a sample of what you can produce from a delta file and is included as an illustration; it is not the default output of DeltaXML.

As can be seen from the above figure, data that has been added in new.xml is drawn underlined. The example makes it easy to observe that the elements <a> and <b> are unchanged, but that:

2.3. Embedding DeltaXML

Unlike many XML tools, DeltaXML is designed to be embedded within your own software; you do not have to run it as a separate stand-alone tool. The simple example shown earlier demonstrates this.

However, to illustrate how it might be used within a slightly larger application, the source of a simple Swing-based application called DeltaWing is shipped with the DeltaXML Core distribution. DeltaWing allows users to select two XML documents, as local files or URLs, and generates a full delta file.

DeltaWing

An example of using DeltaWing is presented above. In this example DeltaWing is being used to compare two XML files (a.xml and b.xml).

2.4. DeltaXML in JAXP Pipelines

Many software systems can be extended either by plugging in new components or by extending existing classes. In general, such extensions require extensive knowledge of a proprietary framework developed by the software vendor concerned. In contrast, if you need to pre- or post-process your data before or after comparison by DeltaXML, you only need to be familiar with the standard SAX and TrAX pipeline mechanisms. That is, DeltaXML relies on standards-based SAX/TrAX pipeline processing.

DeltaXML within a Processing Pipeline

The above figure illustrates how a pipeline of filters can be set up to pre-process the XML files and post-process the output of DeltaXML. This is an extremely powerful architecture that makes it very easy to build custom solutions around DeltaXML products.

To help in the creation of custom solutions, DeltaXML is shipped with a number of standard input and output filters that can be used within your own systems. These filters are available as XSLT stylesheets, and some are also available as Java XML filters. In many cases the Java implementations are faster and have lower memory requirements than the XSLT stylesheets and are therefore often preferable; however, DeltaXML works with either approach.

Building a pipeline around DeltaXML is simple using the PipelinedComparator class, which lets you specify one or more input and output filters through its setInputFilters and setOutputFilters methods.

Once you have specified the input and output filters that DeltaXML should use, you only need to call the compare method on the PipelinedComparator. This causes the data to be pulled through the input filters, compared by DeltaXML and pushed to the output filters.

The following simple program illustrates how straightforward it is to code such a solution using DeltaXML. It uses two input filters and two output filters:

import java.io.File;

try {
  PipelinedComparator pc = new PipelinedComparator();

  //---------------------------------------------------------
  // Set up the input filters
  Class[] inFilterClasses = new Class[] {NormalizeSpace.class,
                                         WordInfilter.class};
  pc.setInputFilters(inFilterClasses);

  //---------------------------------------------------------
  // Now set up the output filters
  Class[] outFilterClasses = new Class[] {WordSpaceFixup.class,
                                          WordOutfilter.class};
  pc.setOutputFilters(outFilterClasses);

  //---------------------------------------------------------
  // Now run the DeltaXML comparison
  pc.compare(new File("old.xml"),
             new File("new.xml"),
             new File("out.xml"));
} catch (PipelinedComparatorException pce) {
  System.err.println("An exception was caught: " + pce);
}
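DeltaXML's own filter classes are not reproduced here. The JDK-only sketch below shows why an ordinary XSLT stylesheet can serve as a pipeline stage. It uses the standard TrAX method SAXTransformerFactory.newXMLFilter to turn a small stylesheet into a SAX XMLFilter. It then chains that filter onto a parser and serializes the filtered events. XsltFilterDemo and its whitespace-normalizing stylesheet are hypothetical. In a DeltaXML pipeline, the serializer at the end would conceptually be replaced by the comparison step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

class XsltFilterDemo {
    // Identity copy, except that whitespace inside every text node is normalized.
    static final String NORMALIZE_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='text()' priority='1'>"
        + "<xsl:value-of select='normalize-space(.)'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String xml) {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
                throw new IllegalStateException("This TrAX implementation cannot create XMLFilters");
            }
            SAXTransformerFactory stf = (SAXTransformerFactory) tf;

            // 1. The stylesheet becomes an ordinary SAX XMLFilter...
            XMLFilter xslFilter = stf.newXMLFilter(new StreamSource(new StringReader(NORMALIZE_XSL)));

            // 2. ...whose parent (the upstream stage) is a namespace-aware SAX parser.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            xslFilter.setParent(parser);

            // 3. Downstream, an identity TransformerHandler serializes the filtered events.
            StringWriter out = new StringWriter();
            TransformerHandler serializer = stf.newTransformerHandler();
            serializer.setResult(new StreamResult(out));
            xslFilter.setContentHandler(serializer);

            xslFilter.parse(new InputSource(new StringReader(xml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, run("<p>  hello   world </p>") yields a document whose <p> element contains just "hello world".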
      

2.5. Constraints on the comparison

There are a number of constraints to take into account when determining whether DeltaXML can compare two XML files. These are mostly XML-related issues rather than DeltaXML issues, and include the following:

  1. The XML to be compared must be well-formed XML.

  2. The two XML files must have root elements of the same type, i.e. local name and namespace.

  3. By default the order of the elements is considered to be significant.

  4. By default, each PCDATA item is treated as a single string and is not subdivided into words or characters.

  5. The DeltaXML Core, by default, ignores comments and processing instructions.

  6. All elements and attributes are used within the comparison by default.

All but the first of these can be overcome in DeltaXML by applying appropriate input and output filters. For example, using the word-by-word input and output filters, each individual word within PCDATA can be treated as independent. Other filters are provided that illustrate how elements can be treated as orderless, how comments and processing instructions can be handled, and how elements or attributes can be ignored by DeltaXML.
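The idea behind word-by-word comparison can be sketched in a few lines of JDK DOM code. If each word of a text node becomes an element of its own, an element-level comparison can then match, add or delete individual words instead of whole strings. This is an illustration under stated assumptions, not the DeltaXML word-by-word filter. WordSplitter and the <w> element name are hypothetical. The sketch also simply discards the original spacing, which a real filter could not do if the text has to be restored afterwards.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class WordSplitter {
    // Hypothetical element name for one word; DeltaXML's own filters may use other markup.
    static final String WORD = "w";

    // Builds a one-element document <p>text</p> for experimentation.
    static Element paragraph(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setTextContent(text);
            doc.appendChild(p);
            return p;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Replaces the (text-only) content of el with one <w> child per
    // whitespace-separated word. The spacing between words is not kept.
    static void splitWords(Element el) {
        String text = el.getTextContent().trim();
        while (el.getFirstChild() != null) {
            el.removeChild(el.getFirstChild());
        }
        if (text.isEmpty()) {
            return;
        }
        Document doc = el.getOwnerDocument();
        for (String word : text.split("\\s+")) {
            Element w = doc.createElement(WORD);
            w.setTextContent(word);
            el.appendChild(w);
        }
    }
}
```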

Chapter 3. The DeltaXML "Delta" Syntax

3.1. The Syntax

When DeltaXML compares two similar XML files, an 'old' file and a 'new' file, it generates a new well-formed XML file. This new file describes the changes between the old and the new file and is known as a delta file.

The DeltaXML delta file has the same overall structure as the files being compared, with a few additional attributes and elements. This makes it easy to understand as well as to process.

Special attributes and elements are introduced by DeltaXML to represent the differences between the old and the new files. The DeltaXML namespace distinguishes these special elements and attributes from those found in the input files.

As an example, consider the out.xml delta file created earlier in this tour:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      xmlns:dxx="http://www.deltaxml.com/ns/xml-namespaced-attribute"
      xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute"
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
   <a deltaxml:deltaV2="A=B"/>
   <x deltaxml:deltaV2="B"/>
   <b deltaxml:deltaV2="A=B"/>
   <c deltaxml:deltaV2="A!=B">
      <deltaxml:attributes deltaxml:deltaV2="B">
         <dxa:title deltaxml:deltaV2="B">
            <deltaxml:attributeValue deltaxml:deltaV2="B">hello world</deltaxml:attributeValue>
         </dxa:title>
      </deltaxml:attributes>
   </c>
</root>

If you look carefully at this XML file, you can see that the structure of the original elements has been retained. These elements, however, have been annotated with deltaV2 attributes, which indicate whether they were unchanged ('A=B'), changed ('A!=B'), added ('B') or deleted ('A') between the two files. Here <a> and <b> are unchanged, <x> has been added and <c> has changed. Within the element <c>, the same approach is used to mark the attribute title as having been added. In all cases the new attributes and elements use DeltaXML namespaces to avoid any clashes with actual data.
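Because the delta is ordinary namespaced XML, the standard JDK DOM API is enough to process it. The sketch below is illustrative and not part of DeltaXML. It assumes only the deltaxml namespace URI and the deltaV2 attribute shown in the example delta above, and DeltaReport is a hypothetical class name. It lists every annotated element in document order and can drop the unchanged ones.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DeltaReport {
    static final String DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

    // Returns "qualifiedName: marker" for every element carrying deltaxml:deltaV2.
    static List<String> markers(String deltaXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getAttributeNS to match the namespace
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(deltaXml)));
            NodeList all = doc.getElementsByTagName("*"); // all elements, in document order
            List<String> out = new ArrayList<>();
            for (int k = 0; k < all.getLength(); k++) {
                Element e = (Element) all.item(k);
                if (e.hasAttributeNS(DELTAXML_NS, "deltaV2")) {
                    out.add(e.getTagName() + ": " + e.getAttributeNS(DELTAXML_NS, "deltaV2"));
                }
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse delta", ex);
        }
    }

    // Keeps only entries that record an addition, deletion or modification.
    static List<String> changed(String deltaXml) {
        List<String> out = new ArrayList<>();
        for (String m : markers(deltaXml)) {
            if (!m.endsWith(": A=B")) {
                out.add(m);
            }
        }
        return out;
    }
}
```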

3.2. Changes-only or changes+unchanged delta files

DeltaXML can generate a changes-only delta file (the default) or a "full context" delta file. A full-context delta differs from a changes-only delta in that it also includes unchanged data. It therefore provides a structured representation of both files within a single file in which the common data is shared. As such, it forms an excellent basis for many subsequent processes, e.g. displaying changes to a document.

3.3. Ordered and orderless/keyed comparisons

DeltaXML supports the comparison of orderless elements. One excellent way of handling such a comparison is through the addition of unique keys (although this is optional). These keys are added to the orderless elements prior to the comparison. As an example, consider the following XML documents. These two documents contain a list of names and addresses. Each of the names and addresses is held in an addressList element in an orderless manner:

oldAddressList.xml

<?xml version="1.0" encoding="UTF-8"?>
<addressList xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
             deltaxml:ordered="false">
  <person customerid="15">
    <name>Joe Bloggs</name>
    <email>jblogs@msn.com</email>
  </person>
  <person customerid="62">
    <name>Pete Smith</name>
    <email>pxs@hotmail.com</email>
  </person>
</addressList>

newAddressList.xml

<?xml version="1.0" encoding="UTF-8"?>
<addressList xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
             deltaxml:ordered="false">
  <person customerid="62">
    <name>Pete Smith</name>
    <email>petesmith12@msn.com</email>
  </person>
  <person customerid="15">
    <name>Joe Bloggs</name>
    <email>jblogs@msn.com</email>
  </person>
</addressList>

In these XML files, the customer Pete Smith is the last person in the first file but the first person in the second file. However, the order of the <person> elements is not significant, so this reordering does not matter. Pete Smith's email address, on the other hand, has changed between the old and the new address list, and this is the only change that we want to be notified about.

By specifying that addressList is orderless (deltaxml:ordered="false" in the files above), DeltaXML ignores changes in the order of its child elements and focuses instead on changes in their content.
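In DeltaXML, orderless behaviour is requested declaratively, with deltaxml:ordered="false" as in the files above, so no code like this is needed. The plain Java sketch below only illustrates why a key such as customerid makes orderless comparison tractable. Records are looked up by key rather than by position, so reordering produces no differences. KeyedDiff is a hypothetical name. Each <person> is represented as a hand-built map of field names to values rather than parsed from the XML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class KeyedDiff {
    // Records are indexed by key (e.g. customerid); their position in the file is irrelevant.
    static List<String> diff(Map<String, Map<String, String>> oldByKey,
                             Map<String, Map<String, String>> newByKey) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : oldByKey.entrySet()) {
            String key = entry.getKey();
            Map<String, String> before = entry.getValue();
            Map<String, String> after = newByKey.get(key);
            if (after == null) {
                changes.add(key + " deleted");
                continue;
            }
            // Fields whose value changed (or that disappeared, shown as -> null)
            for (Map.Entry<String, String> field : before.entrySet()) {
                String newValue = after.get(field.getKey());
                if (!Objects.equals(field.getValue(), newValue)) {
                    changes.add(key + " " + field.getKey() + ": " + field.getValue() + " -> " + newValue);
                }
            }
            // Fields present only in the new record
            for (String name : after.keySet()) {
                if (!before.containsKey(name)) {
                    changes.add(key + " " + name + " added");
                }
            }
        }
        for (String key : newByKey.keySet()) {
            if (!oldByKey.containsKey(key)) {
                changes.add(key + " added");
            }
        }
        return changes;
    }
}
```

Applied to the two address lists, with the person records keyed by customerid, diff returns a single entry, 62 email: pxs@hotmail.com -> petesmith12@msn.com. The swapped order of the two people produces nothing.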

Finding changes in orderless data

The result of comparing the two address lists, having specified that the children of addressList are orderless, is presented in the above screenshot. In this case the delta file has been post-processed to make it easy to view in a web browser (this is just an example of what you can do with a delta file and is not the standard output).

Chapter 4. DeltaXML Sync for XML Synchronization

DeltaXML Sync is the DeltaXML product that provides synchronization of XML files. By "synchronization" we mean that edits made in two different XML files can be merged into a single XML file. This sort of activity is important in many applications. A familiar example is merging the changes that two developers have made to the same source file, relative to the original base file. DeltaXML Sync performs the same activity, but takes account of the structured nature of XML files.

4.1. Synchronizing XML changes

There are many different situations where synchronization of XML files is useful. The most common is probably when two different users have updated the same XML file and both sets of changes must be kept. Such changes can be handled using an intelligent merge. There are two separate but related merging problems: the 2-way merge and the 3-way merge. The difference between them is whether there are only two files to be merged or whether there is also a 'base' file from which the others are derived.

The 3-way merge (synchronization) has the potential to provide a more accurate solution where a base file exists, but it is more complex. For a synchronization, the basic requirements can be simply stated in an informal way: the merged document should contain the edits made between the base file and both the derived files. In all the use cases above, there is typically a base file and two variants.
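The tour does not describe how DeltaXML Sync merges XML trees. To make the informal requirement above concrete, the sketch below applies the classic three-way merge rule to a flat record of named fields. A change made on only one side is taken, identical changes on both sides are taken once, and different changes to the same field are a conflict. ThreeWayMerge and its CONFLICT marker are hypothetical. Real XML synchronization must also handle nested and reordered content, which this sketch does not.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

class ThreeWayMerge {
    // Hypothetical placeholder recorded when both sides changed a field differently.
    static final String CONFLICT = "<<CONFLICT>>";

    // Merges two edited versions of a flat record (field name -> value) against
    // their common base. A missing field is treated as null, i.e. deleted.
    static Map<String, String> merge(Map<String, String> base,
                                     Map<String, String> left,
                                     Map<String, String> right) {
        Set<String> fields = new LinkedHashSet<>(base.keySet());
        fields.addAll(left.keySet());
        fields.addAll(right.keySet());
        Map<String, String> merged = new LinkedHashMap<>();
        for (String f : fields) {
            String b = base.get(f), l = left.get(f), r = right.get(f);
            String result;
            if (Objects.equals(l, r)) {
                result = l;            // both sides agree (possibly both unchanged)
            } else if (Objects.equals(l, b)) {
                result = r;            // only the right-hand version changed it
            } else if (Objects.equals(r, b)) {
                result = l;            // only the left-hand version changed it
            } else {
                result = CONFLICT;     // both changed it, in different ways
            }
            if (result != null) {
                merged.put(f, result); // null means the field was deleted
            }
        }
        return merged;
    }
}
```

For instance, suppose one edited version of a person record changes only the email address and the other changes only the name. The merged record then contains both edits, which is exactly the informal requirement stated above.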

Synchronization in DeltaXML Sync

The DeltaXML Sync product builds upon the DeltaXML Core engine to provide synchronization of multiple edits to a base file. The basic idea behind synchronization is illustrated in the figure above.

4.2. DeltaXML Sync

Using DeltaXML Sync you can add sophisticated synchronization behaviour to your own software systems with a minimum of difficulty. As an illustration of how straightforward it is to integrate DeltaXML Sync into your own programs, the following code snippet shows how to perform a synchronization:

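import java.io.File;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Three input documents (the base and two edited versions) and one output file
Synchronizer syncer = SynchronizerFactory.newInstance().newSynchronizer();
syncer.sync(new StreamSource(new File(args[0])),
            new StreamSource(new File(args[1])),
            new StreamSource(new File(args[2])),
            new StreamResult(new File(args[3])));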

As you can see from this example, the synchronization functionality within DeltaXML Sync is extremely straightforward to use. As with the DeltaXML Core differencing engine, DeltaXML Sync can also be used with input and output filters for pre- and post-processing of the XML files. In addition, the Synchronizer accepts standard JAXP input/output parameters, including SAXSource and SAXResult objects.

More details on synchronization can be found in the on-line Synchronization Tutorial.

Chapter 5. Building Solutions using DeltaXML

5.1. Custom Solutions

You can build your own custom solutions around the core of the DeltaXML products. DeltaXML provides the high-performance XML differencing and synchronization engines, and you build simple processing pipelines around these engines. The filters you use in these processing chains employ standard frameworks (such as the Java XMLFilter interface and SAX events) and can be as simple or as complex as you need.

5.2. Working with DeltaXML

The normal process of integrating DeltaXML into any application is one in which the developer focuses on configuring DeltaXML to solve their problem. This generally involves the following steps:

  1. Identification of the pre- and post-processing filters needed.

  2. Selection of appropriate existing filters from the DeltaXML filters library.

  3. Development of your own Java or XSLT filters where needed.

  4. Configuration of DeltaXML using these filters.

  5. Configuration of DeltaXML and/or the underlying parsers' features or properties.

  6. Execution of DeltaXML within the filter pipeline.

Note that this is usually an iterative process, because the pipeline architecture allows further filters to be added as the need for them is identified.
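As a sketch of that iterative style using only the JDK, each stage below is a SAX XMLFilter wrapped around the previous one, so adding a filter is one more entry in a list. None of these classes come from the DeltaXML filter library. DropAttributeFilter and FilterChain are hypothetical names. DropAttributeFilter removes a named attribute, such as a timestamp that should not take part in a comparison. The identity transformer at the end merely serializes the result so that it can be inspected.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that drops one non-namespaced attribute wherever it appears.
class DropAttributeFilter extends XMLFilterImpl {
    private final String attrName;

    DropAttributeFilter(XMLReader parent, String attrName) {
        super(parent);
        this.attrName = attrName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl kept = new AttributesImpl(atts);
        int idx = kept.getIndex("", attrName);
        if (idx >= 0) {
            kept.removeAttribute(idx);
        }
        super.startElement(uri, localName, qName, kept);
    }
}

class FilterChain {
    // Wraps the parser in each stage in turn; the last stage is the chain's output end.
    static XMLReader build(XMLReader parser, List<Function<XMLReader, XMLReader>> stages) {
        XMLReader current = parser;
        for (Function<XMLReader, XMLReader> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    // Parses xml through all stages and serializes the result with an identity transform.
    static String run(String xml, List<Function<XMLReader, XMLReader>> stages) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = build(spf.newSAXParser().getXMLReader(), stages);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(reader, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Passing List.of(r -> new DropAttributeFilter(r, "stamp")) as the stages removes every stamp attribute. A second filter identified later is simply a second list entry.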

5.3. Sample Solutions

To illustrate the way in which custom solutions can be built around DeltaXML products, a number of examples are provided with the DeltaXML distributions. For example, DeltaXML ships solutions for:

As an illustration, the DocBook solution uses DeltaXML-supplied input and output filters, plus the generally available changebars.xsl stylesheet for producing an HTML version of a DocBook document with change-bar information. The result is the following processing chain:

DocBook difference processing with DeltaXML

An example of the output generated is presented in a browser below:

Viewing changes in DocBook documents generated by DeltaXML

An interesting point to note about this processing chain is that the changebars.xsl stylesheet is used as a filter within the DeltaXML pipeline, even though it was written without reference to DeltaXML, for general processing of DocBook files. We have simply used it as-is within this pipeline. This illustrates the reusable nature of the filters used within a DeltaXML pipeline: DeltaXML works with any standard XSL stylesheet or Java XMLFilter through the SAX/TrAX mechanisms.

These and many other solutions are built upon the accuracy and performance of the DeltaXML products and the processability of the XML Delta files.

Chapter 6. Further Information

6.1. Demos and Downloads for DeltaXML Core

A number of online demos and trial services are available; these allow you to experiment with DeltaXML Core using our server.

Time-limited trial versions of the DeltaXML Core SDK can be obtained using the SDK request form.

6.2. On-line Documentation

All our technical documentation is publicly available to download or to access online.

Please see:

Tutorials, Reference documentation and FAQs

Published Articles & Papers.