DeltaXML Newsletter - May 2002

Contents

Independent assessment of XML comparison software hands prize to DeltaXML

Based on an analysis of 40,000 XML files found on the web, the French research group INRIA have produced a paper assessing the functionality and performance of various XML comparison tools, including DeltaXML, DOMMITT, IBM Treediff, Sun's DiffMK, MMDiff and their own XyDiff.

Their conclusions include this statement, "The experiments presented show (i) a significant quality advantage for minimal-based algorithms (DeltaXML, MMDiff) (ii) a dramatic performance improvement with linear complexity algorithms (DeltaXML, XyDiff, GNU Diff). DeltaXML seems the best choice because it runs extremely fast and its results are close to the minimum."

The paper provides a detailed analysis of both execution speed and quality of result for many thousands of real XML files of up to 1 MB. Although DeltaXML can compare larger files, these were not included in the analysis.

For a copy of the paper, "A comparative study for XML change detection", see
ftp://ftp.inria.fr/INRIA/Projects/verso/VersoReport-221.pdf or
http://osage.inria.fr/verso/PUBLI/all-bykey.php?mytexte=cobena

XML Europe 2002 in Barcelona - paper on Merging XML

In the Technical stream at XML Europe 2002 in Barcelona, 21-23 May 2002, Robin La Fontaine of DeltaXML is presenting a paper, "Merging XML files: a new approach providing intelligent merge of XML data sets". This paper shows how the delta file generated by DeltaXML can be used as the basis for a merge of two XML documents or data sets.

From the Abstract, "As XML becomes ubiquitous so the need for powerful tools to manipulate XML data becomes more pressing. Merging XML is particularly tricky, but often necessary to consolidate data feeds from heterogeneous systems, or to synchronize submissions of XML fragments which make up a larger document. An automated mechanism for defining and controlling such merges has been developed and is demonstrated to provide a consistent, adaptable and resilient solution to this problem. Integration into an information pipeline allows limitless customization.

This paper proposes a systematic approach to merging based on the use of an intermediate XML file that contains both of the files to be merged in a formal structure that clearly identifies data that is common to both files and data that is unique to one of the files. The advantage of this intermediate file is that many of the conflicts that typically emerge when XML data is merged can be identified and resolved. The resolution of these conflicts is a key to achieving a useful merge."

More details about the paper and presentation will follow in the next newsletter.

Merging XML with DeltaXML

The basic requirement can be stated simply and informally: the merged document should contain everything from both original documents, without duplication where they overlap. The difficulty lies in defining where these overlaps occur and ensuring that they are handled correctly.

Most merge algorithms are based on a comparison algorithm to find the best fit between the documents being merged. DeltaXML can already perform this matching to execute a comparison, so it makes a good basis for a merge operation. Taking advantage of the 'full delta' output from DeltaXML, which contains both the changed and the unchanged data, a merge between two XML data sets can be performed quite simply.
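
To illustrate the idea of merging from a 'full delta', here is a minimal Python sketch. It does not use DeltaXML and does not reproduce the DeltaXML delta format: the function names, the status labels ('both', 'a-only', 'b-only') and the use of Python's difflib to align top-level child elements are all assumptions made for this example. It assumes data-oriented XML without mixed content.

```python
import difflib
import xml.etree.ElementTree as ET


def _children(doc):
    """Serialise each top-level child of an XML document to a string."""
    out = []
    for child in ET.fromstring(doc):
        child.tail = None  # ignore whitespace between elements
        out.append(ET.tostring(child, encoding="unicode"))
    return out


def full_delta(doc_a, doc_b):
    """Align the children of two documents (hypothetical format).

    Returns (status, xml_text) pairs covering changed AND unchanged data,
    where status is 'both', 'a-only' or 'b-only'.
    """
    kids_a, kids_b = _children(doc_a), _children(doc_b)
    matcher = difflib.SequenceMatcher(a=kids_a, b=kids_b, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta += [("both", k) for k in kids_a[i1:i2]]
        else:  # 'replace', 'delete' or 'insert'
            delta += [("a-only", k) for k in kids_a[i1:i2]]
            delta += [("b-only", k) for k in kids_b[j1:j2]]
    return delta


def merge(delta, root_tag="merged"):
    """Build a merged document: every entry once, overlaps not duplicated."""
    root = ET.Element(root_tag)
    for _status, text in delta:
        root.append(ET.fromstring(text))
    return root


doc_a = "<list><item>a</item><item>b</item></list>"
doc_b = "<list><item>b</item><item>c</item></list>"
delta = full_delta(doc_a, doc_b)
merged = merge(delta)
print(ET.tostring(merged, encoding="unicode"))
```

Because the intermediate list records the unchanged items as well as the changed ones, the merge step is a single pass over it. A real merge would also have to decide how to treat items that are similar but not identical, which this sketch simply keeps from both sides.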

One of the benefits of DeltaXML in this context is that it can make use of keys to establish a 'correct' merge, and it can be customised to specific needs using XSL.
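
The role of keys can be sketched in a few lines of Python. This is an illustration only, not DeltaXML's algorithm or API: the function name merge_by_key, the choice of an "id" attribute as the key, and the rule that the first document wins when attribute values conflict are all assumptions for this example.

```python
import copy
import xml.etree.ElementTree as ET


def merge_by_key(a, b, key="id"):
    """Merge the children of b into a copy of a, pairing them by (tag, key)."""
    merged = copy.deepcopy(a)  # leave the input documents untouched
    index = {(c.tag, c.get(key)): c for c in merged if c.get(key) is not None}
    for child in b:
        match = index.get((child.tag, child.get(key)))
        if match is None:
            # Unique to b: add it to the result.
            merged.append(copy.deepcopy(child))
        else:
            # Same key in both: add attributes a lacks; a wins on conflicts.
            for name, value in child.attrib.items():
                match.attrib.setdefault(name, value)
    return merged


a = ET.fromstring(
    '<people><person id="1" name="Ann"/><person id="2" name="Bob"/></people>'
)
b = ET.fromstring(
    '<people><person id="2" email="bob@example.com"/>'
    '<person id="3" name="Cy"/></people>'
)
m = merge_by_key(a, b)
print(ET.tostring(m, encoding="unicode"))
```

Without the key, the two entries for person 2 look like different elements and would both appear in the result. With it, they are recognised as the same record and combined.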

The paper to be presented at XML Europe will provide an overview of merging XML, but if you would like to see more specifically how DeltaXML can be used for merging, there is a paper on the DeltaXML web site at

http://www.deltaxml.com/pdf/merging-xml-with-deltaxml.pdf

If you would like to know more about any of the topics in this newsletter we'd be happy to discuss them with you further.
Contact us at info@deltaxml.com