Technical Specification

Overview

DeltaXML Core compares two well-formed XML files, an 'old' file and an 'updated' file, and generates an XML file describing the differences between the two files. The file representing the differences is known as a delta file.

The DeltaXML Core software provides a procedural interface that can be embedded in other Java-based or .NET-based software to compare elements and attributes of two well-formed XML 1.0 documents, A and B, and represent any differences in a well-formed, XML-encoded, delta file, D.

The DeltaXML Combine function can re-combine D with A to generate B' such that, when B and B' are compared using the DeltaXML Compare function, no differences are identified. Similarly, D may be re-combined with B to generate A' such that, when A and A' are compared, no differences are identified. Re-combination is not supported when one or more elements are specified as orderless.

Delta Files

A DeltaXML delta file normally represents just the set of differences between two files, and does not include any data that has not changed. The DeltaXML Compare function provides a feature that can be set to generate a 'full context delta' which includes unchanged data, including any unchanged elements and attributes. The 'full context delta' provides a structured representation of two files as a single file in which common data is shared.

A DeltaXML delta file has the same basic structure as the files that have been compared, with some additional attributes and elements. An XML namespace (the DeltaXML namespace) distinguishes these additional elements and attributes from those found in the input files.

Delta files cannot themselves be used as inputs to a comparison unless the DeltaXML namespace they contain is changed first.

XML Processing

DeltaXML sends each document to an XML parser prior to processing. If the document starts with a DOCTYPE declaration or references an XML Schema, the parser will process the DTD or Schema and return a SAX stream with all entities expanded and any unspecified attributes added with their default values. During processing, DeltaXML does not need to take into account the document structure specified in a DTD or Schema, except to ensure that any white space flagged as ignorable by the parser is ignored. All other white space is treated as significant. If it is not significant, it can be removed prior to processing using the white space normalization XSLT stylesheet supplied with the software.
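The supplied normalization stylesheet is written in XSLT and is not reproduced here. As a rough illustration of the same idea, the Python sketch below removes white-space-only text between elements using the standard library. The function name `strip_insignificant_whitespace` is hypothetical and is not part of DeltaXML; the sketch also assumes that every white-space-only text node is insignificant, which may not hold for mixed content.

```python
import xml.etree.ElementTree as ET

def strip_insignificant_whitespace(root):
    """Drop white-space-only text and tail nodes throughout the tree, in place."""
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return root

# Two documents that differ only in indentation.
pretty = ET.fromstring("<a>\n  <b>text</b>\n  <c/>\n</a>")
compact = ET.fromstring("<a><b>text</b><c/></a>")

strip_insignificant_whitespace(pretty)
print(ET.tostring(pretty) == ET.tostring(compact))  # True
```

Without the normalization step the two serializations differ, because the indentation is carried as text content.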

Comments and processing instructions are not passed to the SAX output stream by the parser. If they are considered significant for the purposes of comparison, XSLT should be used to convert them into elements, which can then be compared.
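The conversion is described in terms of XSLT; the sketch below shows the same transformation in Python using `xml.dom.minidom`, purely as an illustration. The function name and the element names `comment` and `pi` are assumptions chosen for this example, not names defined by DeltaXML. Only nodes inside the root element are handled.

```python
from xml.dom import minidom

def lift_comments_and_pis(xml_text):
    """Replace comment and processing-instruction nodes with ordinary elements."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Copy the child list because nodes are replaced while iterating.
        for child in list(node.childNodes):
            if child.nodeType == child.COMMENT_NODE:
                replacement = doc.createElement("comment")
                replacement.appendChild(doc.createTextNode(child.data))
                node.replaceChild(replacement, child)
            elif child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
                replacement = doc.createElement("pi")
                replacement.setAttribute("target", child.target)
                replacement.appendChild(doc.createTextNode(child.data))
                node.replaceChild(replacement, child)
            else:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()

print(lift_comments_and_pis("<a><!-- note --><?hint keep?><b/></a>"))
```

After this step the former comment and processing instruction are ordinary elements, so any element-level comparison can see them.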

DeltaXML handles namespaces and will detect elements in the same namespace even if the namespace prefix values are different. An element or attribute in a namespace may have a different namespace prefix in the delta file from that used in the input file.
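As a small illustration of why prefixes do not matter, the Python sketch below parses two elements that use different prefixes for the same namespace URI. It uses the standard `xml.etree.ElementTree` module, which identifies an element by namespace URI and local name; the URI `http://example.com/ns` is a placeholder invented for this example.

```python
import xml.etree.ElementTree as ET

# Same namespace URI, different prefixes (p: and q:).
doc_a = ET.fromstring('<p:item xmlns:p="http://example.com/ns"/>')
doc_b = ET.fromstring('<q:item xmlns:q="http://example.com/ns"/>')

print(doc_a.tag)               # {http://example.com/ns}item
print(doc_a.tag == doc_b.tag)  # True
```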

Document Comparison

DeltaXML compares the two XML files, taking account of the tree structure of the files and identifying corresponding elements in the two files. Corresponding elements will have the same element local name and namespace and will have corresponding parent elements. The root elements of the two files must have the same local name and namespace. DeltaXML determines the best fit at each level in the tree structure between the two files. The best fit algorithm determines the longest common subsequence of corresponding elements. The best fit gives precedence to elements that are exactly equal over those that have just the same element name and namespace.
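The longest-common-subsequence step can be sketched for a single level of the tree as follows. This is a textbook dynamic-programming version written for illustration, not DeltaXML's implementation: it treats two child elements as corresponding when their qualified names match, and it does not model the precedence given to exactly equal elements. The function name `lcs_pairs` is hypothetical.

```python
import xml.etree.ElementTree as ET

def lcs_pairs(old, new, same):
    """Return index pairs (i, j) forming a longest common subsequence."""
    rows, cols = len(old), len(new)
    # length[i][j] holds the LCS length of old[i:] and new[j:].
    length = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if same(old[i], new[j]):
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < rows and j < cols:
        if same(old[i], new[j]):
            pairs.append((i, j))
            i += 1
            j += 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

old_root = ET.fromstring("<r><a/><b/><c/><d/></r>")
new_root = ET.fromstring("<r><a/><c/><e/><d/></r>")
same_name = lambda x, y: x.tag == y.tag  # the tag includes the namespace URI

print(lcs_pairs(list(old_root), list(new_root), same_name))
# [(0, 0), (2, 1), (3, 3)]
```

In this example `a`, `c` and `d` are aligned; `b` is left unmatched in the old file and `e` in the new file.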

DeltaXML can use key values, identified to the software using an attribute in the DeltaXML namespace, to identify corresponding elements in the two files. Elements with different keys in the two files will not be considered to correspond.

DeltaXML treats elements as ordered, i.e. a change in order is identified as a change. Optionally, any element can be identified to DeltaXML as orderless, using an attribute in the DeltaXML namespace which must be present in both files. In this case the child elements may appear in any order in the two files and DeltaXML will match corresponding elements. Within an orderless element, a corresponding element is an element with the same name, namespace and key, or an element that is exactly equal throughout its tree structure. In orderless comparison, any elements that do not exactly correspond will be reported as added or deleted according to which file they appear in. Orderless elements must have element-only content.
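A minimal sketch of keyed, orderless matching is shown below. It is an illustration only: the unqualified `key` attribute and the function name `match_orderless` are assumptions made for this example (DeltaXML's own key attribute lives in the DeltaXML namespace), keys are assumed to be unique within a parent, and matching by exact subtree equality is not modelled.

```python
import xml.etree.ElementTree as ET

def match_orderless(old_parent, new_parent, key_attr="key"):
    """Pair children by (qualified name, key), ignoring their position."""
    def index(parent):
        return {(child.tag, child.get(key_attr, "")): child for child in parent}

    old_index, new_index = index(old_parent), index(new_parent)
    matched = sorted(old_index.keys() & new_index.keys())
    deleted = sorted(old_index.keys() - new_index.keys())  # only in the old file
    added = sorted(new_index.keys() - old_index.keys())    # only in the new file
    return matched, deleted, added

old_set = ET.fromstring('<set><e key="1"/><e key="2"/><e key="3"/></set>')
new_set = ET.fromstring('<set><e key="3"/><e key="1"/><e key="4"/></set>')

matched, deleted, added = match_orderless(old_set, new_set)
print(matched)  # [('e', '1'), ('e', '3')]
print(deleted)  # [('e', '2')]
print(added)    # [('e', '4')]
```

The reordering of elements 1 and 3 is not reported, while elements 2 and 4 appear as a deletion and an addition respectively.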

DeltaXML ignores the order of attributes. Changes to attributes are represented using elements in the DeltaXML namespace.
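The order-independent treatment of attributes can be illustrated with Python's standard library, where attributes are held in a dictionary. The helper `attribute_changes` is a hypothetical name used for this sketch; it reports old and new values as tuples rather than in DeltaXML's delta representation.

```python
import xml.etree.ElementTree as ET

def attribute_changes(old, new):
    """Map each differing attribute name to its (old value, new value) pair."""
    names = sorted(old.attrib.keys() | new.attrib.keys())
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}

a = ET.fromstring('<item id="1" colour="red"/>')
b = ET.fromstring('<item colour="red" id="1"/>')  # same attributes, reordered
c = ET.fromstring('<item id="2" size="L"/>')

print(attribute_changes(a, b))  # {}
print(attribute_changes(a, c))
# {'colour': ('red', None), 'id': ('1', '2'), 'size': (None, 'L')}
```

A value of `None` marks an attribute that is absent from that file.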

Text Handling

PCDATA items are treated as a whole and are not subdivided into words or characters. XSLT filters may be used to modify the markup before the files are compared and thus provide a word-by-word comparison. The XML parser interprets CDATA sections and expands entity references prior to comparison within DeltaXML.
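The markup change needed for word-by-word comparison is normally made with an XSLT filter. The Python sketch below shows the effect of such a filter on a simple paragraph, for illustration only. The element name `word` and the function name `wrap_words` are assumptions for this example; the sketch handles only the text directly inside the element (not text that follows child elements) and discards the original white space between words.

```python
import xml.etree.ElementTree as ET

def wrap_words(elem, tag="word"):
    """Replace the leading text of elem with one child element per word."""
    words = (elem.text or "").split()
    elem.text = None
    for position, word in enumerate(words):
        child = ET.Element(tag)
        child.text = word
        elem.insert(position, child)
    return elem

para = ET.fromstring("<p>the quick brown fox</p>")
print(ET.tostring(wrap_words(para), encoding="unicode"))
# <p><word>the</word><word>quick</word><word>brown</word><word>fox</word></p>
```

Once each word is its own element, an element-level comparison can report a change to a single word instead of to the whole PCDATA item.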

System Requirements

DeltaXML Core requires either:

  • A Java Standard Edition JRE version 5.0 or later. We test on Oracle Solaris 10 (Intel Xeon), Mac OS X (10.6 or higher on Intel), Windows Server 2008 R2 and Windows 7 platforms. For support purposes, any reported problem should be reproducible on at least one of these platforms.
  • The Microsoft .NET Framework version 3.5 or later.

Patents granted: 2001270901; EP1325432; 60134999.7; US 8,196,135 B2; CA 2416876; US 8,423,518 B2; EP2174238; 602008031420.0. Patents pending: 1315520.5; 14275178.3; 14/474,377.