
How to Detect a Move

Table of Contents

  1. Handling moves
  2. Background
  3. How to identify moves by processing the delta file
  4. How to identify moves by pre-processing the data

1. Handling moves

This "how to" document discusses how you can detect a 'move' rather than a delete and an add.

In some situations, an element is moved from one place in an XML file to another. When comparing files in which such a move has occurred, DeltaXML does not recognise the move as such. Instead, it reports a 'delete' (denoted by deltaxml:delta="delete") at the original position and an 'add' (denoted by deltaxml:delta="add") at the new position.

Although DeltaXML does not identify a move directly, it does generate a delta file in XML and it is possible to process this delta file to match delete/add pairs and so identify moves. This is described below, but first it is worth considering why DeltaXML adopts this approach, and some of the inherent complexities of a 'move' operation.

2. Background

There are several reasons why DeltaXML identifies additions and deletions but not moves.

The first reason is that, in most situations, the delete and add behaviour is what is required. This is almost invariably the case where the XML represents data rather than a text document.

Secondly, the DeltaXML delta file contains no pointers or XPath expressions: every element has the same set of ancestors as in the original file(s). Representing a move would require an XPath expression or a pointer, which adds a level of complexity and makes the delta files harder to process. Design goals of the delta format were to keep it as simple as possible and to have the delta reflect the structure of the files being compared.

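A third reason relates to the complexity that is inherent in identifying moves. For example, if an element reappears in a different place with some of its content changed, or if several similar elements are present, it is not always clear which deleted element corresponds to which added one.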

This leads us to the conclusion that reliable move detection requires each element to carry an identifier. Where such identifiers exist, it is quite possible to use DeltaXML and, simply by processing the delta file, to detect and mark moves.
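As a minimal illustration of why identifiers help, the Python sketch below (not part of DeltaXML) assumes each version has already been reduced to a mapping from element identifier to the container the element sits in; the function name `find_moves` and this input shape are hypothetical. With identifiers, a move is simply an identifier that is present in both versions but in different containers.

```python
def find_moves(old_positions, new_positions):
    """Return {identifier: (old_container, new_container)} for moved elements.

    old_positions and new_positions are hypothetical inputs that map an
    element identifier to the container (e.g. parent element name) in
    which the element appears in each version.
    """
    moves = {}
    for ident, old_container in old_positions.items():
        new_container = new_positions.get(ident)
        # Only identifiers present in both versions can have moved;
        # the others are genuine deletions (or, in new_positions, additions).
        if new_container is not None and new_container != old_container:
            moves[ident] = (old_container, new_container)
    return moves


old = {"johnsmith": "work-contacts", "markjones": "work-contacts"}
new = {"markjones": "work-contacts", "johnsmith": "home-contacts"}
print(find_moves(old, new))  # {'johnsmith': ('work-contacts', 'home-contacts')}
```

Without the identifiers, the same comparison would have to guess which added element matches which deleted one, which is exactly the ambiguity described above.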

3. How to identify moves by processing the delta file

If your files provide a reliable way to tell which elements are the same in both versions, moves can be identified by processing the delta file. For example, an ID attribute on each element can be used for this purpose. Let's look at an example in which a set of work and home contacts is represented in an XML file.

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
 <work-contacts> 
  <contact ID="johnsmith" deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 567</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
  <contact ID="markjones" deltaxml:key="markjones"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
 </work-contacts> 
 <home-contacts> 
 </home-contacts> 
</records>

In a modified version, we have moved John Smith to the home contacts and updated his office phone number.

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"> 
 <work-contacts> 
  <contact ID="markjones" deltaxml:key="markjones"> 
   <name>Mark Jones</name> 
   <phone type="office">+44 200 1234 599</phone> 
   <phone type="fax">+44 200 1234 599</phone> 
   <phone type="mobile">+44 200 1234 500</phone> 
  </contact> 
 </work-contacts> 
 <home-contacts> 
  <contact ID="johnsmith"  deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 555</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
 </home-contacts> 
</records>

When these are compared, we get the following delta file:

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:delta="WFmodify"> 
 <work-contacts deltaxml:delta="WFmodify"> 
  <contact deltaxml:delta="delete" ID="johnsmith" deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 567</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact> 
  <contact deltaxml:delta="unchanged" deltaxml:key="markjones" /> 
 </work-contacts> 
 <home-contacts deltaxml:delta="WFmodify"> 
  <contact deltaxml:delta="add" ID="johnsmith" deltaxml:key="johnsmith"> 
   <name>John Smith</name> 
   <phone type="office">+44 200 1234 555</phone> 
   <phone type="fax">+44 200 1234 568</phone> 
   <phone type="mobile">+44 200 1234 569</phone> 
  </contact>
 </home-contacts>
</records>

The move is obvious by inspection here: one element has ID="johnsmith" and a deltaxml:delta="delete" attribute, and another has the same ID attribute and a deltaxml:delta="add" attribute. By processing the delta file, these pairs can be detected and the elements marked to indicate that together they represent a move. Note that the change to the office phone number is not reported as a change in its own right; however, the new number is present in the added element, so the data is still correct.
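A minimal Python sketch of this post-processing step, using the standard-library `xml.etree.ElementTree` module, is shown below. It pairs elements by `deltaxml:key` (matching on the `ID` attribute instead would work the same way) and assumes that key values are unique per element name across the whole delta, which holds in the example but is an assumption in general. Pairs are marked with a hypothetical, non-DeltaXML attribute named `move`; the function name `mark_moves` is also hypothetical.

```python
import xml.etree.ElementTree as ET

DXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA = f"{{{DXML}}}delta"  # deltaxml:delta in ElementTree's {uri}name form
KEY = f"{{{DXML}}}key"      # deltaxml:key


def mark_moves(root, move_attr="move"):
    """Pair deleted/added elements with the same name and key, and mark them.

    Assumes keys are unique per element name across the delta. The
    attribute named by move_attr (hypothetical, not part of the DeltaXML
    format) is set to "from" on the deleted element and "to" on the added
    one. Returns the keys of the elements that were paired.
    """
    deleted, added = {}, {}
    for elem in root.iter():
        key = elem.get(KEY)
        if key is None:
            continue
        if elem.get(DELTA) == "delete":
            deleted[(elem.tag, key)] = elem
        elif elem.get(DELTA) == "add":
            added[(elem.tag, key)] = elem

    moved = []
    for ident, old_elem in deleted.items():
        new_elem = added.get(ident)
        if new_elem is not None:
            old_elem.set(move_attr, "from")
            new_elem.set(move_attr, "to")
            moved.append(ident[1])
    return moved


# A trimmed copy of the delta file above (contact contents omitted).
delta = """<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
    deltaxml:delta="WFmodify">
 <work-contacts deltaxml:delta="WFmodify">
  <contact deltaxml:delta="delete" ID="johnsmith" deltaxml:key="johnsmith"/>
  <contact deltaxml:delta="unchanged" deltaxml:key="markjones"/>
 </work-contacts>
 <home-contacts deltaxml:delta="WFmodify">
  <contact deltaxml:delta="add" ID="johnsmith" deltaxml:key="johnsmith"/>
 </home-contacts>
</records>"""

root = ET.fromstring(delta)
print(mark_moves(root))  # ['johnsmith']
```

An XSLT stylesheet keyed on the same attributes would be an equally valid way to implement this step; Python is used here only to keep the sketch short.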

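DeltaXML lets you use keys (the deltaxml:key attribute) to control how elements are matched during comparison. Keys are typically also a convenient identifier for detecting moves, as the example above shows: the deleted and the added contact elements carry the same key.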

4. How to identify moves by pre-processing the data

Another approach to this problem is to generate, from each version of the data, a file containing only the elements that you know may have moved and that you want to compare.

Using the above example, we could filter the original files to generate these files. Here we have removed the <work-contacts> and <home-contacts> wrapper elements and just listed the <contact> elements. We have added a path attribute (or we could have used a sub-element) to record the original ancestors.
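The filtering step can be sketched in Python with `xml.etree.ElementTree`. This is a minimal sketch, not a DeltaXML feature: the function name `flatten`, the choice of `contact` as the element to extract, and the trailing-slash path format (copied from the files that follow) are assumptions. It also adds the `deltaxml:ordered="false"` attribute used on the root element of those files.

```python
import copy
import xml.etree.ElementTree as ET

DXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
# Serialize the namespace with the familiar "deltaxml" prefix.
ET.register_namespace("deltaxml", DXML)


def flatten(root, tag="contact"):
    """Return a new root that lists every <tag> element as a direct child.

    Each copied element gets a 'path' attribute recording its original
    ancestors, e.g. "records/work-contacts/". Elements nested inside a
    <tag> element are copied with it, not extracted separately.
    """
    flat = ET.Element(root.tag)
    # Order of the extracted elements is not significant for the comparison.
    flat.set(f"{{{DXML}}}ordered", "false")

    def walk(elem, ancestors):
        for child in elem:
            if child.tag == tag:
                item = copy.deepcopy(child)
                item.set("path", "/".join(ancestors) + "/")
                flat.append(item)
            else:
                walk(child, ancestors + [child.tag])

    walk(root, [root.tag])
    return flat


original = """<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
 <work-contacts>
  <contact ID="johnsmith" deltaxml:key="johnsmith"><name>John Smith</name></contact>
  <contact ID="markjones" deltaxml:key="markjones"><name>Mark Jones</name></contact>
 </work-contacts>
 <home-contacts/>
</records>"""

flat = flatten(ET.fromstring(original))
print(ET.tostring(flat, encoding="unicode"))
```

The same function would be applied to both the original and the modified file before comparing them.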

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:ordered="false"> 
 <contact ID="johnsmith" deltaxml:key="johnsmith" path="records/work-contacts/">
  <name>John Smith</name>
  <phone type="office">+44 200 1234 567</phone>
  <phone type="fax">+44 200 1234 568</phone>
  <phone type="mobile">+44 200 1234 569</phone>
 </contact>
 <contact ID="markjones" deltaxml:key="markjones" path="records/work-contacts/">
  <name>Mark Jones</name>
  <phone type="office">+44 200 1234 599</phone>
  <phone type="fax">+44 200 1234 599</phone>
  <phone type="mobile">+44 200 1234 500</phone>
 </contact>
</records>

The modified version, filtered in the same way, gives:

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:ordered="false"> 
 <contact ID="markjones" deltaxml:key="markjones" path="records/work-contacts/">
  <name>Mark Jones</name>
  <phone type="office">+44 200 1234 599</phone>
  <phone type="fax">+44 200 1234 599</phone>
  <phone type="mobile">+44 200 1234 500</phone>
 </contact>
 <contact ID="johnsmith" deltaxml:key="johnsmith" path="records/home-contacts/">
  <name>John Smith</name>
  <phone type="office">+44 200 1234 555</phone>
  <phone type="fax">+44 200 1234 568</phone>
  <phone type="mobile">+44 200 1234 569</phone>
 </contact>
</records>

Note that we have added deltaxml:ordered="false" to the root element because the <contact> elements could appear in any order. Alternatively, you could sort them by key, which may make the result easier to check. When these files are compared, the delta shows the changes to each contact, both in its contents and in its original position (recorded in the path attribute).
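If you prefer the alternative of sorting by key mentioned above, a minimal Python sketch of that step follows. The function name `sort_by_key` is hypothetical, elements without a `deltaxml:key` are assumed to sort first (as an empty string), and whitespace between elements is not re-indented after reordering.

```python
import xml.etree.ElementTree as ET

DXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY = f"{{{DXML}}}key"  # deltaxml:key in ElementTree's {uri}name form


def sort_by_key(root):
    """Reorder root's direct children in place by their deltaxml:key value."""
    children = sorted(root, key=lambda e: e.get(KEY, ""))
    for child in list(root):
        root.remove(child)
    root.extend(children)


modified = """<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
 <contact deltaxml:key="markjones" path="records/work-contacts/"/>
 <contact deltaxml:key="johnsmith" path="records/home-contacts/"/>
</records>"""

root = ET.fromstring(modified)
sort_by_key(root)
print([c.get(KEY) for c in root])  # ['johnsmith', 'markjones']
```

Sorting both filtered files the same way gives them a predictable order, which is what makes them easier to check by eye.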

<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
    deltaxml:delta="WFmodifyUnordered">
 <contact deltaxml:delta="WFmodify" deltaxml:key="johnsmith"
     deltaxml:old-attributes="path=&quot;records/work-contacts/&quot;"
     deltaxml:new-attributes="path=&quot;records/home-contacts/&quot;">
  <name deltaxml:delta="unchanged" />
  <phone deltaxml:delta="WFmodify">
   <deltaxml:PCDATAmodify>
    <deltaxml:PCDATAold>+44 200 1234 567</deltaxml:PCDATAold>
    <deltaxml:PCDATAnew>+44 200 1234 555</deltaxml:PCDATAnew>
   </deltaxml:PCDATAmodify>
  </phone>
  <phone deltaxml:delta="unchanged" />
  <phone deltaxml:delta="unchanged" />
 </contact>
</records>

This delta provides details only of the contacts that have changed or moved. Moved contacts are indicated by a change in the path attribute.
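Moves can then be read straight from the delta by comparing the old and new path values. The Python sketch below assumes the changed attributes are serialized as in the example above, i.e. in `deltaxml:old-attributes` and `deltaxml:new-attributes` with double-quoted values, and it only inspects direct children of the root (as in the flattened files). The function names `moved_contacts` and `path_value` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY = f"{{{DXML}}}key"
OLD_ATTRS = f"{{{DXML}}}old-attributes"
NEW_ATTRS = f"{{{DXML}}}new-attributes"
# Matches path="..." inside a serialized attribute list (assumed format).
PATH_RE = re.compile(r'\bpath="([^"]*)"')


def path_value(serialized):
    """Extract the path value from a serialized attribute list, or None."""
    match = PATH_RE.search(serialized or "")
    return match.group(1) if match else None


def moved_contacts(delta_root):
    """Return {key: (old_path, new_path)} for children whose path changed."""
    moves = {}
    for elem in delta_root:
        old_path = path_value(elem.get(OLD_ATTRS))
        new_path = path_value(elem.get(NEW_ATTRS))
        if old_path is not None and new_path is not None and old_path != new_path:
            moves[elem.get(KEY)] = (old_path, new_path)
    return moves


# A trimmed copy of the delta above (phone details omitted).
delta = """<records xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
    deltaxml:delta="WFmodifyUnordered">
 <contact deltaxml:delta="WFmodify" deltaxml:key="johnsmith"
     deltaxml:old-attributes="path=&quot;records/work-contacts/&quot;"
     deltaxml:new-attributes="path=&quot;records/home-contacts/&quot;">
  <name deltaxml:delta="unchanged"/>
 </contact>
</records>"""

print(moved_contacts(ET.fromstring(delta)))
# {'johnsmith': ('records/work-contacts/', 'records/home-contacts/')}
```

A contact whose contents changed but whose path did not would have no path entry in these attributes and is therefore not reported as a move.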

DeltaXML provides a versatile solution for handling additions and deletions and, with the processing described above, can also be used to show moves.