DeltaXML can be used to provide a 'diff and patch' capability for any XML document or data file. The diff would be generated by comparing two versions and producing a delta that contains only the differences, known as a changes-only delta. The patch capability is provided by the recombine operation.
This 'diff and patch' capability can be useful where updates to XML files need to be sent using minimum bandwidth, for example in mobile or satellite applications. Another application is in situations where it is necessary to keep a full audit trail of changes to an XML file using minimum storage space.
The DeltaXML DeltaV2 format comes in two main flavours: full context and changes-only.
The full context delta efficiently stores the entire contents of the input files. It can be used to create a difference report or a document with changes marked-up.
The changes-only delta format stores only what has changed and the necessary context for those changes. It is an ideal format for storing differences between different versions of XML documents.
DeltaXML Core comes bundled with a recombiner which can recreate Document ‘B’ from Document ‘A’ and the changes-only delta. The recombiner can also be used in reverse to recreate Document ‘A’ from Document ‘B’ and the changes-only delta. This process is heavily used at DeltaXML as part of our comprehensive roundtrip test suite (to ensure the comparator is 100% accurate).
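The forward and reverse use of one delta can be illustrated with a toy model. The sketch below is not the DeltaXML API and does not use the DeltaV2 format: the class ToyDelta, its methods and the representation of a document as a map of keys to values are all assumptions made purely for illustration. It records only the entries that differ, then applies them in either direction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

/** Toy stand-in for a changes-only delta; NOT the DeltaV2 format or the DeltaXML API. */
class ToyDelta {
    /** key -> {value in A, value in B}; null means the key is absent on that side. */
    final Map<String, String[]> changes = new HashMap<>();

    /** "Diff": record only the keys whose values differ between a and b. */
    static ToyDelta compare(Map<String, String> a, Map<String, String> b) {
        ToyDelta delta = new ToyDelta();
        TreeSet<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                delta.changes.put(key, new String[] {a.get(key), b.get(key)});
            }
        }
        return delta;
    }

    /** "Patch": forward turns A into B; reverse turns B back into A. */
    Map<String, String> recombine(Map<String, String> input, boolean forward) {
        Map<String, String> result = new HashMap<>(input);
        for (Map.Entry<String, String[]> change : changes.entrySet()) {
            String target = forward ? change.getValue()[1] : change.getValue()[0];
            if (target == null) {
                result.remove(change.getKey()); // key does not exist on the target side
            } else {
                result.put(change.getKey(), target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> a = Map.of("title", "Draft", "author", "Ann");
        Map<String, String> b = Map.of("title", "Final", "author", "Ann", "status", "approved");
        ToyDelta delta = compare(a, b);
        System.out.println(delta.recombine(a, true).equals(b));  // prints true
        System.out.println(delta.recombine(b, false).equals(a)); // prints true
    }
}
```

The unchanged "author" entry never appears in the toy delta, which is the property that keeps a changes-only delta small. The two checks in main mirror the roundtrip idea described above.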
More usefully, these tools can be used when there are many versions of a document. There are various strategies that can be used to achieve this.
The most obvious approach is to store the original version of the document and a changes-only delta between it and each later version.
Any version can then easily be recreated by using the recombiner with the stored version and the relevant changes-only delta.
The big issue with this approach is that as the document is revised further, the deltas grow larger and larger, potentially to the point where a delta is effectively the entire document.
Another possible approach is to store the first and last versions, and a changes-only delta between each version and the one before it.
Any version can then be recreated by either forward recombining from version 1 or reverse recombining from the latest version. The issue here is that the more versions you have, the more recombinations it may take to reconstruct a version.
The sensible solution here would be to store the first, latest and each nth version (what n is depends on your system’s use/data).
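The trade-off can be made concrete by counting recombinations. The following sketch assumes a store that keeps a delta between each pair of consecutive versions and some versions in full; the class CheckpointPlanner and its method names are hypothetical and are not part of the bundled sample. It reports how many recombine steps separate a requested version from the nearest full version, in whichever direction is shorter.

```java
import java.util.TreeSet;

/** Counts recombine steps for a toy store with a delta between consecutive versions. */
class CheckpointPlanner {
    /** Versions kept in full: the first, the latest and every nth one in between. */
    static TreeSet<Integer> storedVersions(int latest, int n) {
        if (latest < 1 || n < 1) {
            throw new IllegalArgumentException("latest and n must be at least 1");
        }
        TreeSet<Integer> stored = new TreeSet<>();
        stored.add(1);
        stored.add(latest);
        for (int v = n; v < latest; v += n) {
            stored.add(v);
        }
        return stored;
    }

    /** Fewest recombinations needed to reach target from the nearest full version. */
    static int recombinationsNeeded(int target, TreeSet<Integer> stored) {
        Integer below = stored.floor(target);   // nearest full version at or before target
        Integer above = stored.ceiling(target); // nearest full version at or after target
        int forward = below == null ? Integer.MAX_VALUE : target - below;
        int backward = above == null ? Integer.MAX_VALUE : above - target;
        return Math.min(forward, backward);
    }

    public static void main(String[] args) {
        TreeSet<Integer> firstAndLast = storedVersions(100, 100); // only versions 1 and 100
        TreeSet<Integer> everyTenth = storedVersions(100, 10);    // 1, 10, 20, ..., 100
        System.out.println(recombinationsNeeded(55, firstAndLast)); // prints 45
        System.out.println(recombinationsNeeded(55, everyTenth));   // prints 5
    }
}
```

With 100 versions, storing every tenth version in full caps the work at 5 recombinations, at the cost of storing ten or so complete documents instead of two.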
DeltaXML Core expects to be run in an environment where either its inputs or its output may have been processed by XSLT filters, and which therefore conforms to the XDM tree model of the XSLT processing model (as specified in XQuery 1.0 and XPath 2.0 Data Model). This model does not contain entries for all of the XML node types: DOCTYPEs, entities, and CDATA sections and ignorable whitespace are respectively removed, expanded and converted to text characters as part of the XSLT parsing.
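The loss of lexical detail can be observed with any XSLT processor. The sketch below is an illustration only: it uses the standard javax.xml.transform API with whichever processor the JVM provides (the JDK's built-in one is XSLT 1.0, whose data model has the same limitation here), not DeltaXML's own filters, and the class name CdataThroughXslt is made up for this example. It copies a document through an identity stylesheet and returns the serialized result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Shows that a CDATA section does not survive a trip through an XSLT tree model. */
class CdataThroughXslt {
    /** Identity stylesheet: copies every node the data model exposes. */
    static final String IDENTITY_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Parses xml, runs the identity stylesheet over it and returns the serialized result. */
    static String copyThroughXslt(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The CDATA wrapper is expected to be gone; its content comes back as escaped text.
        System.out.println(copyThroughXslt("<p><![CDATA[a < b]]></p>"));
    }
}
```

Because the tree model only holds a text node, the serializer has no way of knowing that the text was originally written as a CDATA section.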
Our lexical preservation filters can preserve the four mentioned XML node types, by converting them to and from XML element nodes. These preservation filters also enable processing instructions and comments, which are otherwise ignored by our comparator technology, to be retained and compared in a similar manner. For further information please refer to the How to Preserve Doctype Information sample, the How to Preserve Processing Instructions and Comments sample, and the LexicalPreservation Java API documentation.
In order for the recombiner to work with filtered documents, its inputs need to be identical to those used to generate the delta. The sample ensures this by storing the marked-up version (i.e. after filtering with the LexicalPreservation filter), see graphic below (the inputs with '(Preserved)' are marked-up with the lexical preservation).
The lexical preservation output filter is only run as the last stage when retrieving a version from the system, see below:
The sample, which is bundled with DeltaXML Core, is an implementation of the second approach discussed in the previous section. It is made up of three classes:
Versions and offers methods for management. Its constructor offers a quick way to disable the lexical preservation filtering (the default is for it to be enabled). The important methods are:
addVersion(File) - this adds the File as a version of the document.
verifyDeltaForInput(File, File, File) - validates that the changes-only deltaV2 is valid and can be used to recreate either of the inputs; this is called by addVersion(File). This throws an InvalidChangesOnlyDeltaException when the changes-only deltaV2 fails the validation.
retrieveVersionForwards(int, File) - this retrieves the requested version of the document using forward recombine, starting with the first version of the document.
retrieveVersionBackwards(int, File) - this retrieves the requested version of the document using reverse recombine, starting with the latest version.
runLexicalPreservationInfilter(File input) - this runs the lexical preservation input filter, and returns the marked-up version of the input.
runLexicalPreservationOutfilter(File input, File output) - this runs the lexical preservation output filter, and returns the non-marked-up version of the input (i.e. the expanded lexical content is back in its original form).
There are a few limitations with the sample implementation, including:
The sample loads 5 versions (version 1, version 2, version 3, version 4 and version 5) of a document, generates changes-only deltas between the versions and reconstructs the requested version (version 3 by default).
The sample is designed to be built and run via the Ant build technology. The provided build.xml script has two main targets:
run - the default target which compiles and runs the sample code.
clean - returns the sample to its original state.
It can be run by issuing either the ant or the ant run command. If you wish to override which version of the document is retrieved, run ant -Dversion-to-retrieve=n (where n is the version you wish to see retrieved).
Alternatively, the sample can be manually compiled and run using the following Java commands, assuming that both the Java compiler and runtime platforms are available on the command-line.
javac -cp ../../deltaxml.jar:../../saxon9pe.jar DeltasForVersioning.java Version.java
java -cp .:../../deltaxml.jar:../../saxon9pe.jar DeltasForVersioning 3
Note that you need to ensure that you use the correct directory and class path separators for your operating system.
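One way to avoid hard-coding the separators is to let Java supply them. The sketch below is a convenience illustration, not part of the sample: the class SampleCommandLine and its helper methods are hypothetical, and the jar names are simply those used in the commands above. It builds the run command using File.separator and File.pathSeparator for the platform it runs on.

```java
import java.io.File;

/** Builds the sample's run command with the separators of the current platform. */
class SampleCommandLine {
    /** Joins path segments with the platform's directory separator ("/" or "\"). */
    static String relativePath(String... parts) {
        return String.join(File.separator, parts);
    }

    /** Joins class path entries with the platform's path separator (":" or ";"). */
    static String classPath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String cp = classPath(
            ".",
            relativePath("..", "..", "deltaxml.jar"),
            relativePath("..", "..", "saxon9pe.jar"));
        // Prints a command line suitable for the operating system running this code.
        System.out.println("java -cp " + cp + " DeltasForVersioning 3");
    }
}
```

On Unix-like systems this reproduces the command shown above; on Windows the same code produces backslashes in the paths and semicolons between the class path entries.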