Introduction to DeltaXML

Chapter 1. Getting Started

1.1. Overview

DeltaXML is a sophisticated XML change control technology. This short tutorial will enable you to use this powerful toolset to solve a wide range of problems.

By working quickly through this tutorial, you'll become familiar with XML comparison terminology, and with features such as "full delta comparison", "orderless comparison" and "input/output filters". You may also want to work through the code examples and experiment for yourself as you read the tutorial.

Further information about DeltaXML is available at the DeltaXML site. If you'd like to see how DeltaXML can offer solutions you may not have considered, such as increased security and database synchronization, you may want to check our Use Cases pages, which present such challenges from various perspectives and discuss how DeltaXML can bring immediate benefits.

When you've completed this tutorial, you'll have DeltaXML installed on your machine, and be ready to undertake configuration for any customizations you require. We hope you find using the DeltaXML Core API both productive and enjoyable.

1.2. Installing DeltaXML

To use the DeltaXML Core API you need to install a Java 2 SDK, version 1.2.1 or later. We recommend version 1.4.2 or above to ensure maximum efficiency during XML processing. For deployment you will just need a Java 2 JVM (JRE). You can download versions from:

The DeltaXML Core API is distributed in a single ZIP file, containing:

Unzip the archive to your hard disk - we will refer to the directory it creates (such as "DeltaXMLCore-3_0_0") as the "DeltaXML Core installation directory", or just "the install dir". Since you'll be accessing DeltaXML Core from here, we suggest a standard location such as C:\javalibs for Windows and /usr/local or /opt for Unix.

As we provide Ant build files to compile and run all demonstration programs, you will complete this tutorial faster if you have installed Ant on your machine. Ant can be downloaded from http://ant.apache.org/.

1.3. Checking your installation

To make sure you've installed the DeltaXML Core API correctly, follow these steps:

To test the compiled program using JDK 1.4 enter the following command:

java -classpath .;..\..\deltaxml.jar;..\..\saxon.jar;..\..\xercesImpl.jar SimpleFiles 

If everything has been installed correctly, this will create a DeltaXML delta file, out.xml.

With the DeltaXML Core API installed and tested, you're now ready to explore its capabilities.

1.3.1. A Note About Windows and Unix Paths

From the above you'll see that Windows uses "\" and ";", whereas the various Unix flavours use "/" and ":". (In fact you can use "/" with CLASSPATH on Windows too.) For brevity, we will from now on use Unix-style only. Note that Ant removes this inconsistency and uses the appropriate conventions for the current platform.
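
If you assemble a classpath from Java code rather than from Ant, the same portability is available through the standard library. The following is a minimal sketch, assuming a recent JDK (Java 8 or later); the class name ClasspathBuilder is hypothetical and is not part of DeltaXML. It relies on java.io.File.pathSeparator, which is ";" on Windows and ":" on Unix.

```java
import java.io.File;

public class ClasspathBuilder {

  /** Joins classpath entries with the separator of the current platform. */
  public static String classpath(String... entries) {
    return String.join(File.pathSeparator, entries);
  }

  public static void main(String[] args) {
    // "/" is accepted in classpath entries on Windows as well as Unix.
    System.out.println(classpath(".", "../../deltaxml.jar",
                                 "../../saxon.jar", "../../xercesImpl.jar"));
  }
}
```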

1.4. Tutorial navigation

Navigating through the tutorial is easy:

1.5. Acknowledgements

The success of XML is driven by the desire for open standards for simple and powerful technologies. While our core algorithms and implementation represent a considerable investment in R&D, we are only able to bring them to you in this form thanks to the work of many others. In particular we'd like to say "thank you" to:

This product includes software developed by the Apache Software Foundation (http://www.apache.org/) - though you can use any JAXP-compliant technology.

Chapter 2. Generating a Delta - Structured Diff

2.1. Changes Only

The out.xml file created using the SimpleFiles class set up to test the installation of the DeltaXML Core API is an example of a "changes-only" delta file.

A "changes only" comparison retains the structure of the original documents in the delta file, but only includes data that has changed. Markup in the deltaxml namespace indicates additions, deletions and changes so that processing of the delta is very straightforward.

This is the simplest form of comparison and is the default. This delivers the fastest and smallest deltas - if you have access to either of the original documents and the "changes only" delta, you have everything you need for further processing.

See The DeltaXML delta format for details of the delta syntax used by DeltaXML.

2.2. Full Delta

A Full Delta comparison generates a delta with the same structure as the Changes Only format, but also including all data from the original documents. Since all data from both documents is included, marked up to show changes, this is the preferred format for visual display.

To select a Full Delta, set the isFullDelta feature to true - see Setting API features and properties.

An example of an HTML visualization of Full Delta comparison is generated by our DeltaWing - Building a Java UI demonstration.

2.3. Code sample - SimpleFiles.java

The source code for the SimpleFiles class, SimpleFiles.java, can be found in the samples directory. This code shows how a DeltaXML PipelinedComparator can be used to compare two XML files, writing the delta to a result file. In this sample the file names (a.xml, b.xml and out.xml) are fixed in the code.

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import java.io.File;
import java.io.FileNotFoundException;
import javax.xml.transform.OutputKeys;

/**
 * Demonstrates the construction of a very simple
 * pipeline doing file I/O.
 */
public class SimpleFiles {

  public static void main(String[] args)
      throws FileNotFoundException, PipelinedComparatorException
  {
    PipelinedComparator pc = new PipelinedComparator();
    // Omit the XML declaration from the serialized delta.
    pc.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    // Compare a.xml with b.xml and write the delta to out.xml.
    pc.compare(new File("a.xml"), new File("b.xml"), new File("out.xml"));
  }
}

Executing this class on two simple XML files will generate a delta file of the changes between the two files. For example, given the files

a.xml:

<root> 
  <a/> 
  <b/> 
  <c/> 
</root>

and b.xml:

<root> 
  <a/> 
  <x/><b/> 
  <c attr="hello world"/> 
</root>

Running the command:

java -classpath .:../../deltaxml.jar:../../saxon.jar:../../xercesImpl.jar SimpleFiles

generates the following delta:

<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
    deltaxml:delta="WFmodify">
  <a deltaxml:delta="unchanged"/>
  <x deltaxml:delta="add"/>
  <b deltaxml:delta="unchanged"/>
  <c deltaxml:delta="WFmodify" deltaxml:new-attributes="attr=&#34;hello world&#34;"/>
</root>

This delta file indicates that the element <x> was added and that there was a new attribute for the element <c>.
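
Because the delta is ordinary XML, it can be inspected with any JAXP parser. The following is a minimal sketch, assuming a recent JDK, that reads the delta shown above and reports the deltaxml:delta value of each child of the root element. The class name DeltaSummary and its summarize method are hypothetical and are not part of the DeltaXML API; only standard JAXP and DOM calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DeltaSummary {

  // Namespace URI taken from the delta shown in this tutorial.
  static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

  /** Returns one "name: delta-value" entry per child element of the delta root. */
  public static List<String> summarize(String deltaXml) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true); // required to look attributes up by namespace URI
    Document doc = dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(deltaXml)));

    List<String> result = new ArrayList<>();
    NodeList children = doc.getDocumentElement().getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node n = children.item(i);
      if (n.getNodeType() == Node.ELEMENT_NODE) {
        Element e = (Element) n;
        result.add(e.getLocalName() + ": " + e.getAttributeNS(DELTA_NS, "delta"));
      }
    }
    return result;
  }

  public static void main(String[] args) throws Exception {
    String delta =
        "<root xmlns:deltaxml=\"http://www.deltaxml.com/ns/well-formed-delta-v1\""
      + " deltaxml:delta=\"WFmodify\">"
      + "<a deltaxml:delta=\"unchanged\"/>"
      + "<x deltaxml:delta=\"add\"/>"
      + "<b deltaxml:delta=\"unchanged\"/>"
      + "<c deltaxml:delta=\"WFmodify\""
      + " deltaxml:new-attributes=\"attr=&#34;hello world&#34;\"/>"
      + "</root>";
    for (String line : summarize(delta)) {
      System.out.println(line);
    }
  }
}
```

For the delta above this lists a and b as unchanged, x as add and c as WFmodify.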

Chapter 3. Applying a Delta - Structured Patch

3.1. Forward combination

Delta files can be combined with either of the originating files to recreate the other.

The forward combination option of the DeltaXML Core API is the XML equivalent of the Unix/GNU patch utility, which allows you to store just the first original file and the delta - the second, edited, file can be regenerated when required from this pair. This is the default operation of the DeltaXML Combiner.

3.2. Reverse combination

Reverse combination is symmetrical with Forward Combination and allows the delta to be combined with the second, edited, file to recreate the original file.

A reverse combination is selected by setting the isCombineForward feature to false - see Setting API features and properties.

3.3. Code sample - Applying delta patches

In this example an XMLCombiner is used to apply a delta to one original file to generate the other. This sample works on files specified as command-line arguments. The parameters passed to the combine method are:

Note that the generated file is "XML-identical" to the original except that, to improve readability, newlines are inserted within opening and closing tags - the visual appearance of the document will therefore differ from the original. DeltaXML also assigns a prefix to every element identified as belonging to a namespace, irrespective of whether it had one in the original file. This is necessary to handle merges of documents with different namespaces. You will therefore see "p0:" or a similar prefix on your element names if you were not using a prefix for one or more of the namespaces in your source file. Using XSL output filters these extra newlines and prefixes can easily be removed.

import com.deltaxml.api.DeltaXMLProcessingException; 
import com.deltaxml.api.XMLCombiner; 
import com.deltaxml.api.XMLCombinerConfigurationException; 
import com.deltaxml.api.XMLCombinerFactory; 
import javax.xml.transform.stream.StreamSource; 
import javax.xml.transform.stream.StreamResult; 
import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.IOException; 
 
public class SimpleCombiner 
{ 
    public static void main(String[] args)
        throws DeltaXMLProcessingException,
               XMLCombinerConfigurationException,
               FileNotFoundException, IOException
    {
        XMLCombiner combiner = XMLCombinerFactory.newInstance().newXMLCombiner();
        combiner.combine(new StreamSource(new File(args[0])),
                         new StreamSource(new File(args[1])),
                         new StreamResult(new File(args[2])));
    }
}

Chapter 4. Viewing the changes

4.1. Generating XHTML with XSL

XSLT is our preferred tool for transforming a delta file into a viewable form. This example demonstrates displaying a delta file as HTML using nested tables to represent the tree structure. This is an intuitive and compact display of changes suitable for viewing in a browser.

To illustrate this, the delta produced by the SimpleFiles example presented earlier can be post processed to produce the following HTML table:

[Figure: HTML table of changes]

To add such post processing to the SimpleFiles example, it is merely necessary to tell the PipelinedComparator class that it must now use an output XSL filter. This is done using the setOutputFilters method on the PipelinedComparator class. The modified class is presented below:

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates the construction of a very simple
 * pipeline doing fileIO.
 */
public class SimpleFiles {

  public static void main(String[] args)
      throws FileNotFoundException, PipelinedComparatorException
  {
    PipelinedComparator pc = new PipelinedComparator();
    // Register the XSL stylesheet as an output filter on the comparator.
    List outFilters = new ArrayList();
    outFilters.add(new File("deltaxml-folding-html.xsl"));
    pc.setOutputFilters(outFilters);
    // The filtered result is HTML, so it is written to out.html.
    pc.compare(new File("a.xml"), new File("b.xml"), new File("out.html"));
  }
}

Note that the output file is now called "out.html".

As this example shows, using XSLT stylesheets as input and output filters to the DeltaXML Comparator within Java pipelines provides a straightforward methodology for building sophisticated display systems for users, using (X)HTML, CSS and JavaScript and requiring only standard browser technology.
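
To show the kind of work an output filter does, the following sketch applies a small stylesheet to a delta using only standard JAXP, outside any DeltaXML pipeline. It is an illustration under stated assumptions: the stylesheet is a hypothetical stand-in written for this example (it is not deltaxml-folding-html.xsl), and the class name DeltaToHtml is invented. It emits one HTML list item for each child of the root that is not marked unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DeltaToHtml {

  // Hypothetical stylesheet: lists every child of the root whose
  // deltaxml:delta attribute is present and is not "unchanged".
  static final String XSL =
      "<xsl:stylesheet version=\"1.0\""
    + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
    + " xmlns:deltaxml=\"http://www.deltaxml.com/ns/well-formed-delta-v1\""
    + " exclude-result-prefixes=\"deltaxml\">"
    + "<xsl:output method=\"html\" indent=\"no\"/>"
    + "<xsl:template match=\"/*\">"
    + "<ul><xsl:apply-templates select=\"*[@deltaxml:delta != 'unchanged']\"/></ul>"
    + "</xsl:template>"
    + "<xsl:template match=\"*\">"
    + "<li><xsl:value-of select=\"local-name()\"/>: "
    + "<xsl:value-of select=\"@deltaxml:delta\"/></li>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  /** Transforms a delta document into an HTML fragment listing the changes. */
  public static String toHtml(String deltaXml) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(deltaXml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    String delta =
        "<root xmlns:deltaxml=\"http://www.deltaxml.com/ns/well-formed-delta-v1\""
      + " deltaxml:delta=\"WFmodify\">"
      + "<a deltaxml:delta=\"unchanged\"/>"
      + "<x deltaxml:delta=\"add\"/>"
      + "<b deltaxml:delta=\"unchanged\"/>"
      + "<c deltaxml:delta=\"WFmodify\"/>"
      + "</root>";
    System.out.println(toHtml(delta));
  }
}
```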

4.2. DeltaWing - Building a Java UI

DeltaWing is a sample application which allows users to select two XML documents, as local files or URLs, and generates a full delta file. Using XSLT the delta is transformed into HTML which is displayed within a basic browser window within the application.

Once input documents have been selected they are fed into a TrAX pipeline (see Building a pipeline) so that all subsequent processing is in-memory. The TrAX pipeline is constructed using XSL documents which are compiled as part of the application. For this sample application the standard Swing HTML display control, which only supports HTML 3.2 and provides rudimentary functionality, has been used. For a commercial implementation, please contact us to discuss more advanced options.

The following screen dump illustrates DeltaWing being used to compare the two simple XML files.

[Figure: DeltaWing screenshot]

The full source code for the DeltaWing application is included with the API distribution, together with an Ant build script. To compile, assemble and run the application, change to the deltawing directory and type ant. The deltawingex.jar which is generated and automatically run provides a useful demonstration tool to illustrate the basics of XML comparison. To see the full delta display of the demonstration files, select demo1.xml and demo2.xml from the samples subdirectory of your install dir.

Since the source for comparisons can be modified to include your own input and output filters, you can use this tool to prototype the transformations you require. For technical details of the application itself please refer to the source code. The (very simple) UI was constructed in NetBeans using only standard Swing components, to allow you to extend and develop it as you wish.

4.3. Advanced visual display options

Native-code display options require greater effort but provide richer functionality. More sophisticated tree views can be built - you may want to store the delta as a tree model (using dom4j or similar) and use it to populate a JTree for display. For 2-pane views a change navigator may assist users by providing a visual overview of changes in the two documents, perhaps using colour to highlight changes.
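
As a rough illustration of the tree-model idea, the sketch below converts a delta into Swing tree nodes. It makes several assumptions: it uses the W3C DOM from the JDK rather than dom4j, the class name DeltaTreeModel is hypothetical, and the label format (element name followed by its deltaxml:delta value) is just one possible choice. The resulting root node could be passed to a JTree for display.

```java
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DeltaTreeModel {

  static final String DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1";

  /** Recursively builds a tree node labelled "name [delta]" for each element. */
  public static DefaultMutableTreeNode toTreeNode(Element e) {
    String delta = e.getAttributeNS(DELTA_NS, "delta");
    String label = delta.isEmpty()
        ? e.getLocalName()
        : e.getLocalName() + " [" + delta + "]";
    DefaultMutableTreeNode node = new DefaultMutableTreeNode(label);
    for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
      if (c.getNodeType() == Node.ELEMENT_NODE) {
        node.add(toTreeNode((Element) c));
      }
    }
    return node;
  }

  /** Parses a delta document and returns the tree node for its root element. */
  public static DefaultMutableTreeNode parse(String deltaXml) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    Element root = dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(deltaXml)))
        .getDocumentElement();
    return toTreeNode(root);
  }

  public static void main(String[] args) throws Exception {
    String delta =
        "<root xmlns:deltaxml=\"http://www.deltaxml.com/ns/well-formed-delta-v1\""
      + " deltaxml:delta=\"WFmodify\">"
      + "<a deltaxml:delta=\"unchanged\"/>"
      + "<x deltaxml:delta=\"add\"/>"
      + "</root>";
    DefaultMutableTreeNode root = parse(delta);
    // In a Swing UI: new JScrollPane(new JTree(root))
    System.out.println(root + " has " + root.getChildCount() + " children");
  }
}
```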

For some applications "interactive" change control may be appropriate. For example, using Next Change and Previous Change buttons may give effective navigation in some circumstances: using Accept Change and Reject Change buttons it is possible to give the user fine control over merge operations.

For examples of tools providing useful UIs for viewing changes, you may want to study the ideas proposed for the following text-based differencing tools:

You will need to choose an appropriate metaphor to handle the tree structure of XML, which is not properly handled by any of the above tools. Nesting or a "hierarchical tree view" can provide a familiar interface.

Chapter 5. Working with DeltaXML

5.1. The DeltaXML delta format

Our patented delta format is one of our core technical strengths. Because it represents changes in a form as close as possible to that of the original document, and in a way that is easy to process, you can benefit from pipeline architectures that permit extensive customization. And since the same structure can handle both Full Delta and Changes Only deltas, you can switch between a compact form and a full-featured form without altering your post-processing code.

Technical details of the DeltaXML standard can be found in How DeltaXML Represents Changes to XML Files. The format is refreshingly simple in its structure, a simplicity which pays dividends in the more complex applications of DeltaXML. Unlike an XPath-based syntax such as XUpdate, the delta file remains human-readable, one of the key benefits of XML.

5.2. Managing white space

For many users of XML, white space is critical; for others, it is an irrelevance. For example, when working with XHTML white space can generally be ignored except within <pre> elements. This syntax is defined within the Schema/DTD for XHTML, and since DeltaXML (or, more properly, JAXP) uses this metadata during document loading, you will get the comparison results you expect whenever the input documents are metadata-controlled.

For many documents, however, no Schema/DTD is used. For these "well-formed" documents, another approach is required. Using XSLT - input and output filters describes how to pre-process input documents: by applying a standard normalize-space.xsl stylesheet, multiple contiguous spaces/newlines are reduced to a single space, ensuring that differences in white space alone will be ignored. By modifying this stylesheet it is possible to specify that only white space meeting specific criteria - maybe excluding that within <pre> elements - should be normalized.
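
The following sketch shows one way such a normalizing stylesheet might look, applied here with plain JAXP rather than as a DeltaXML input filter. It is an assumption-laden illustration: the stylesheet is written for this example and is not the normalize-space.xsl shipped with the product, and the class name WhitespaceNormalizer is hypothetical. Note that the XPath normalize-space() function also strips leading and trailing white space, which goes slightly further than collapsing runs of spaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class WhitespaceNormalizer {

  // Hypothetical stylesheet: an identity transform, except that text nodes
  // are normalized unless they sit inside a <pre> element (in no namespace;
  // XHTML input would need a prefix bound to the XHTML namespace instead).
  static final String XSL =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
    + "<xsl:template match='@*|node()'>"
    + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
    + "</xsl:template>"
    + "<xsl:template match='text()' priority='1'>"
    + "<xsl:value-of select='normalize-space(.)'/>"
    + "</xsl:template>"
    + "<xsl:template match='pre//text()' priority='2'>"
    + "<xsl:copy/>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  /** Returns the document with white space normalized outside <pre>. */
  public static String normalize(String xml) throws Exception {
    Transformer t = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(XSL)));
    StringWriter out = new StringWriter();
    t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(normalize("<r><p>  a   b\n c </p><pre> x  y </pre></r>"));
  }
}
```

Two documents that differ only in such white space become identical after this step, so a subsequent comparison would report no change for them.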

5.3. Namespaces

DeltaXML offers full support for XML Namespaces. This includes ignoring differences between namespace prefixes when the URIs are identical, and correctly processing namespace declarations on ancestor elements. Where namespace prefixes are used within attribute values, care should be taken that they are used consistently.

The generated delta declares the DeltaXML namespace for the root element:

xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"

DeltaXML also scans the input documents for namespace declarations, including prefixes, and uses these in the output delta.

On viewing a delta file, you will very probably see p0: namespace prefixes attached to all your elements. These are automatically assigned to elements in the default namespace and can safely be ignored: the semantics of your generated (recombined) document are identical to the original. Alternatively, an output filter can be used to replace all references to this automatically assigned prefix with references to the unnamed default namespace.
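
The reason a p0: prefix does not change the meaning of a document can be checked with a namespace-aware parser: an element is identified by its namespace URI and local name, not by its prefix. The following is a minimal sketch using standard JAXP; the class name NamespaceEquivalence and the urn:example URIs are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceEquivalence {

  /** Parses a document with namespace processing enabled; returns its root. */
  public static Element root(String xml) throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    return dbf.newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)))
        .getDocumentElement();
  }

  /** True when both elements have the same namespace URI and local name. */
  public static boolean sameName(Element a, Element b) {
    return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
        && Objects.equals(a.getLocalName(), b.getLocalName());
  }

  public static void main(String[] args) throws Exception {
    Element unprefixed = root("<doc xmlns='urn:example:ns'/>");
    Element prefixed = root("<p0:doc xmlns:p0='urn:example:ns'/>");
    Element other = root("<p0:doc xmlns:p0='urn:example:other'/>");
    System.out.println(sameName(unprefixed, prefixed)); // same URI, different prefix
    System.out.println(sameName(unprefixed, other));    // different URI
  }
}
```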

Further details of how DeltaXML handles namespaces are in the documentation for The DeltaXML delta format.

Chapter 6. Distributing your application

6.1. Requirements for distribution

The DeltaXML Core API redistributable product is contained entirely within deltaxml.jar. Re-distribution is subject to the terms of the licence agreement. You may also redistribute our XSL scripts, either modified or unmodified, with your application, either stand-alone or "burnt-in" - for example, using the "embedded strings" approach or using Apache XSLTC with translets.

You will also need to provide (or require) a JAXP-compliant parser and transformer. The Apache Xerces/Xalan combination supplied with the software may be redistributed under the terms of the Apache Software License.

Note that distribution of a product which is in competition with DeltaXML is not permitted, and the DeltaXML Core API and any add-ons must not be exposed in distributed products, for obvious reasons. Please check your licence and contact us if you are unsure.

Chapter 7. Troubleshooting

7.1. SAX errors - invalid XML

The most common errors when processing XML are during the initial document load - resulting from XML that is not well-formed. For these you will need to refer to the error codes for your parser - for Xerces, see http://xml.apache.org/xerces-j/. Commonly encountered errors are:

A validator such as that from STG or ElCel can help you track down these errors quickly.

The JTidy project provides tools for converting poorly formatted HTML to well-formed XHTML - you may find useful code here for handling non-compliant documents.
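
Load errors of the kind discussed in this section can also be caught before a comparison is attempted. The sketch below is one possible pre-check, assuming any JAXP-compliant SAX parser is on the classpath; the class name WellFormedCheck is hypothetical. It parses the input with an empty handler and reports the position of the first fatal error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

  /**
   * Returns null when the input is well-formed, otherwise a message
   * giving the line and column of the first fatal parse error.
   */
  public static String check(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    try {
      // DefaultHandler ignores all content and rethrows fatal errors.
      factory.newSAXParser()
          .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
      return null;
    } catch (SAXParseException e) {
      return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
          + ": " + e.getMessage();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(check("<a><b/></a>"));  // well-formed
    System.out.println(check("<a><b></a>"));   // mismatched end tag
  }
}
```

The wording of the message depends on the parser in use, so only the line and column should be relied upon.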

7.2. XSL errors

Please see the error documentation for your transformer. For Xalan, this is available online at http://xml.apache.org/xalan-j/apidocs/index.html under the heading XSLTErrorResources while Saxon users can find details of error recovery policies listed at http://saxon.sourceforge.net/saxon6.5.3/conformance.html#errorrecovery.

Things to check include:

7.3. DeltaXML error codes

DeltaXML in its default configuration is capable of handling any two well-formed XML documents which have the same root element type, and will not in these cases generate error codes. When recombining (applying a delta), especially when working with orderless data, some exceptional conditions can arise - these are detailed at the online error messages page.

Chapter 8. Further Reading

8.1. DeltaXML Documentation

The primary documentation for the API is the Javadocs accompanying the current release - in the apidocs directory, open the file index.html in a browser. This documentation is also available as online Javadocs.

Support documentation is available from our list of User Manuals for various aspects of DeltaXML.

8.2. External Resources

XML comparison touches on many areas of activity - these are just a few of the resources we have found particularly useful/interesting:

8.3. Useful Books

Of the many books devoted to XML we have found the following particularly helpful.