The first of these can be achieved by comparing the two files, and where there is data in one or both files we keep it – this means we get a union of the two files. But there may be some values that appear in both files and are different (a ‘conflict’), so we need to decide in this situation which one to keep – typically one of the files will be taken to be the ‘master’ to resolve this.
Let’s say the two files are A and B. The master prevails whenever there is a conflict, that is, wherever we would otherwise have to decide between the value from A and the value from B. Note that because we are combining the information, nothing is deleted: the only conflicts arise where A and B have different values at some corresponding point.
We might like to know where such conflicts have occurred. If we compare A and B with A as the master we get a result, call it Am+B. In this result, A has prevailed in any conflict. We could get another result using B as the master, Bm+A, i.e. in this result B has prevailed in any conflict. By comparing these two results, i.e. Am+B compared with Bm+A, we can see where there has been an override by the master, and this might be useful for checking or deciding that in some cases we prefer a different override.
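The following is a minimal sketch of this idea. It assumes the two files have already been parsed into nested Python dictionaries, like the JSON objects used later in this article. A real XML comparison also has to align elements, attributes and text, which this sketch ignores. The function names `merge_with_master` and `differences` and the example data are hypothetical. The sketch builds Am+B and Bm+A and then lists the paths where they disagree; those paths are exactly the places where the master overrode the other file.

```python
import copy


def merge_with_master(master: dict, other: dict) -> dict:
    """Union of two JSON-like objects; the master's value wins every conflict."""
    result = copy.deepcopy(other)  # start from everything in the other file
    for key, m_val in master.items():
        o_val = other.get(key)
        if isinstance(m_val, dict) and isinstance(o_val, dict):
            # Both files have an object here: merge it recursively.
            result[key] = merge_with_master(m_val, o_val)
        else:
            # Key only in the master, or a conflict: the master prevails.
            result[key] = copy.deepcopy(m_val)
    return result


def differences(x: dict, y: dict, path: str = "") -> list:
    """Return the paths at which two merge results disagree."""
    found = []
    for key in sorted(x.keys() | y.keys()):
        p = f"{path}/{key}"
        xv, yv = x.get(key), y.get(key)
        if isinstance(xv, dict) and isinstance(yv, dict):
            found.extend(differences(xv, yv, p))
        elif xv != yv:
            found.append(p)
    return found


# Hypothetical example data.
A = {"title": "Report", "version": 2, "meta": {"author": "Ann"}}
B = {"title": "Report", "version": 3, "meta": {"author": "Ann", "date": "2020"}}

am_b = merge_with_master(A, B)  # Am+B: A prevails
bm_a = merge_with_master(B, A)  # Bm+A: B prevails
print(differences(am_b, bm_a))  # ['/version']
```

Both results contain the union of the data, including the `date` that only B supplies. They differ only at `/version`, the single point where A and B conflict.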
What about the second situation above, where we want to merge two edited files to combine the changes? We cannot achieve this by merging two files, unfortunately. The reason seems obvious if you have tried it, but not so obvious if you have not! Here is why.
Consider two modified files, B and C:
B:
{ "x": 4, "y": 7 }
C:
{ "x": 4, "y": 5, "z": 21 }
The value for x is the same, so that is no problem. What about the value for y? The values differ in B and C, but we do not know which one is the latest or updated version. If we knew the original file, and that the value there was 7, then we would know that C had updated it. But the original value could equally have been 6, in which case both B and C have updated it.
And what about z in C? It looks like C has added it, but we cannot be sure: it may have been present in the original file, in which case B has in fact deleted it.
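To make the ambiguity concrete, here is a small sketch that computes what an edited file changed relative to a given original. It is run against two different candidate originals, both consistent with B and C. The function `changes`, the `MISSING` sentinel and the candidate originals are hypothetical illustrations, and the sketch handles flat objects only.

```python
MISSING = object()  # sentinel: the key is absent from that file


def changes(original: dict, edited: dict) -> dict:
    """Map each key whose value differs between original and edited to (old, new)."""
    result = {}
    for key in original.keys() | edited.keys():
        old = original.get(key, MISSING)
        new = edited.get(key, MISSING)
        if old != new:
            result[key] = (old, new)
    return result


B = {"x": 4, "y": 7}
C = {"x": 4, "y": 5, "z": 21}

# Two equally plausible originals for the same pair of modified files.
for original in ({"x": 4, "y": 7}, {"x": 4, "y": 6, "z": 21}):
    print("B changed:", sorted(changes(original, B)),
          "C changed:", sorted(changes(original, C)))
# B changed: [] C changed: ['y', 'z']
# B changed: ['y', 'z'] C changed: ['y']
```

With the first original, only C has made changes: it updated y and added z. With the second, both files changed y, and B deleted z. B and C alone cannot tell these cases apart.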
Merging changes is therefore not possible with only two files: we also need the original file. With all three files, we can perform a three-way merge and then apply some logic to work out what the result should look like. There may be conflicts between the changes, for example where B and C have both changed the same value in different ways, as with y if its original value was 6.
We need some way to handle these – we could decide that in all cases the change made by C is the master so we just pick that in the case of a conflict, or we could look at each conflict in turn and decide whether to choose B or C or neither of these, leaving it as it was.
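A three-way merge along these lines can be sketched as follows. This is a minimal illustration for flat objects, not a description of how any particular XML merge tool works. The function `three_way_merge`, its `prefer` parameter and the ancestor A are hypothetical. A key changed on only one side takes that side's value, deletions included. A key changed differently on both sides is reported as a conflict and resolved by a single policy: prefer B, prefer C, or keep the ancestor's value (neither).

```python
from typing import Optional

MISSING = object()  # sentinel: the key is absent from that file


def three_way_merge(a: dict, b: dict, c: dict, prefer: Optional[str] = None):
    """Merge the edits in b and c relative to their common ancestor a.

    Returns (merged, conflicts). When prefer is "b" or "c", that side wins
    every conflict; otherwise the ancestor's value is kept ("neither").
    """
    merged, conflicts = {}, []
    for key in sorted(set(a) | set(b) | set(c)):
        av, bv, cv = (d.get(key, MISSING) for d in (a, b, c))
        if bv == cv:        # same in both edits (possibly both unchanged)
            value = bv
        elif bv == av:      # only c changed, added or deleted it
            value = cv
        elif cv == av:      # only b changed, added or deleted it
            value = bv
        else:               # both changed it, differently: a conflict
            conflicts.append(key)
            value = {"b": bv, "c": cv}.get(prefer, av)
        if value is not MISSING:  # MISSING means the key was deleted
            merged[key] = value
    return merged, conflicts


# Hypothetical ancestor A, with B and C as in the example above.
A = {"x": 4, "y": 6, "z": 21}
B = {"x": 4, "y": 7}
C = {"x": 4, "y": 5, "z": 21}

print(three_way_merge(A, B, C))              # ({'x': 4, 'y': 6}, ['y'])
print(three_way_merge(A, B, C, prefer="c"))  # ({'x': 4, 'y': 5}, ['y'])
```

Here the deletion of z by B goes through without a conflict, because C left z unchanged. Choosing B, C or neither for each conflict individually would replace the single `prefer` policy with a per-key decision.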
There is another slightly different way of looking at this (and this reflects a situation that happens all too frequently). We have two similar files and we want to update them both in the same way. If we make the changes we want in one of the files, can we use merge to apply those changes to the other file? The answer is yes, and it is a merge very similar to the one described above but it is not quite the same. A name given to this particular type of merge in some source code control systems is ‘graft’, i.e. we graft the changes from one branch onto another branch. It can be a very useful thing to do because it saves repeating work manually (and any manual change needs to be checked and that in itself can be laborious and error-prone). So if we can do a graft automatically this can be a real time saver.
So how does graft differ from the three-way merge above? Let’s call the missing file, the original file, A. In a three-way merge, A is often called the ancestor of B and C, because it is the file from which both B and C have been derived by some editing process. Graft is slightly different because A is the ancestor of only one of the two files; the other may not be directly related to A, though there must be some similarities.
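So let us say the changes are in B and we want to apply them to C. In other words, any change between A and B needs to be applied to C. We can now argue that we do not have any conflicts as such, so we can apply the process automatically. What has happened to the conflicts identified above? They disappear because C is not treated as a set of changes: wherever B differs from A, B's change is applied to C, whatever C contains at that point.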
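The graft described here can be sketched as: compute the changes from A to B, then apply them to a copy of C. This sketch assumes flat objects whose corresponding points are matched by key name. A real XML graft has to find the corresponding points in C, which is harder. The function `graft` and the example files are hypothetical.

```python
MISSING = object()  # sentinel: the key is absent from that file


def graft(a: dict, b: dict, c: dict) -> dict:
    """Apply the changes that turn a into b onto a copy of c (flat objects only)."""
    result = dict(c)
    for key in a.keys() | b.keys():
        av, bv = a.get(key, MISSING), b.get(key, MISSING)
        if av == bv:
            continue               # unchanged from a to b: leave c as it is
        if bv is MISSING:
            result.pop(key, None)  # b deleted the key, so delete it from c too
        else:
            result[key] = bv       # b added or changed it: b's value is applied
    return result


# Hypothetical files: B is an edited copy of A; C is a similar but separate file.
A = {"title": "Draft", "version": 1, "owner": "Ann"}
B = {"title": "Final", "version": 1}
C = {"title": "Draft v2", "version": 5, "owner": "Ann", "lang": "en"}

print(graft(A, B, C))  # {'title': 'Final', 'version': 5, 'lang': 'en'}
```

C's title differs from A's, yet B's change is applied without any conflict being raised. C's own differences from A (`version` and `lang`) are left alone because they are not among the changes being grafted.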
This makes graft potentially very useful for several different situations:
It is interesting to note that B and C could be completely different and the graft operation would ‘work’ but there would be no resulting changes to C. That is not useful in itself, but it does mean that there is no limit on how different B and C are – the graft process just becomes rather less useful the more that they differ.
Merging two files produces a union of the data content but we need to assign one of the two as the ‘master’ in case of conflict.
Merging the changes made in two files can only be done if we have access to the original file, otherwise we cannot identify what has changed, what has been added and what has been deleted.
A variant of the traditional three-way merge is the ‘graft’ process where we apply the changes made to one file to another similar file – a very useful process in many situations.