There are a couple of reasons to merge JSON files:
- To combine the information in two JSON files – simple JSON Data Merge
- Because two files have been updated and you want to combine the changes
Simple JSON Data Merge
The first of these can be achieved by comparing the two files, and where there is data in one or both files we keep it – this means we get a union of the two files. But there may be some values that appear in both files and are different (a ‘conflict’), so we need to decide in this situation which one to keep – typically one of the files will be taken to be the ‘master’ to resolve this.
Let’s say the two files are A and B. Whenever there is a conflict we must decide whether to keep the value from A or the value from B, and the master is the one that prevails. Note that because we are combining the information, nothing is deleted; the only conflict we can have is where A and B hold different values at some corresponding point.
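As a rough illustration of the union-with-master idea, here is a minimal Python sketch. The function name merge_union and the sample data are hypothetical, and it assumes the JSON has already been parsed into dictionaries. Arrays and scalars are treated as single values, which is a simplification and not the only possible choice.

```python
def merge_union(master, other):
    """Return the union of two parsed JSON values; master wins on conflict."""
    if isinstance(master, dict) and isinstance(other, dict):
        merged = {}
        # Keep the master's key order, then append keys found only in other.
        for key in list(master) + [k for k in other if k not in master]:
            if key in master and key in other:
                # Present in both: recurse so nested objects are also unioned.
                merged[key] = merge_union(master[key], other[key])
            elif key in master:
                merged[key] = master[key]
            else:
                merged[key] = other[key]
        return merged
    # Anything that is not a pair of objects is a plain value: master prevails.
    return master


# Hypothetical sample data, not taken from any real file.
file_a = {"name": "widget", "price": 10, "tags": {"colour": "red"}}
file_b = {"price": 12, "stock": 5, "tags": {"size": "L"}}

print(merge_union(file_a, file_b))  # A as master: price stays 10
print(merge_union(file_b, file_a))  # B as master: price becomes 12
```

In both results nothing is lost: name, stock and both tags are always present, and only the conflicting price depends on which file is the master.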
We might like to know where such conflicts have occurred. If we compare A and B with A as the master we get a result, call it Am+B. In this result, A has prevailed in any conflict. We could get another result using B as the master, Bm+A, i.e. in this result B has prevailed in any conflict. By comparing these two results, i.e. Am+B compared with Bm+A, we can see where there has been an override by the master, and this might be useful for checking or deciding that in some cases we prefer a different override.
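The comparison of the two results can be sketched in a few lines of Python. The helper differing_paths is a hypothetical name, and the two dictionaries stand in for Am+B and Bm+A as they might look after a union merge of some made-up data. Every path it reports is a place where the master overrode the other file.

```python
def differing_paths(left, right, path=""):
    """List the slash-separated paths at which two parsed JSON values differ."""
    if isinstance(left, dict) and isinstance(right, dict):
        found = []
        for key in sorted(left.keys() | right.keys()):
            child = f"{path}/{key}"
            if key not in left or key not in right:
                found.append(child)  # key exists on one side only
            else:
                found.extend(differing_paths(left[key], right[key], child))
        return found
    # Plain values (and arrays, treated here as single values).
    return [] if left == right else [path or "/"]


# Hypothetical results of merging the same two files with each as master.
am_plus_b = {"name": "widget", "price": 10, "stock": 5}
bm_plus_a = {"name": "widget", "price": 12, "stock": 5}

print(differing_paths(am_plus_b, bm_plus_a))  # ['/price']
```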
Merging Changes between JSON files
What about the second situation above, where we want to merge two edited files to combine the changes? We cannot achieve this by merging two files, unfortunately. The reason seems obvious if you have tried it, but not so obvious if you have not! Here is why.
Consider that we had two modified files, B and C. Both contain x and y, and B also contains z, which C does not.
The value for x is the same, so that is no problem. What about the value for y? The values differ in B and C but we do not know which one is the latest or updated version. If we had knowledge of the original file, for example that the value was 7, then we would know that C had updated it. But the value could have been 6 in which case both B and C have updated it.
And what about z in B? It looks like B has added it, but we cannot be sure: it may have been present in the original file, in which case C has in fact deleted it.
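To make the ambiguity concrete, the following Python sketch uses hypothetical contents for B and C (the values are invented for illustration, chosen only to be consistent with the discussion above) and two equally plausible candidates for the original file. The helper name changes is also an assumption. The same pair of edited files yields a different story depending on which original we imagine.

```python
def changes(original, edited):
    """Classify each key of a flat object as added, deleted or modified."""
    result = {}
    for key in original.keys() | edited.keys():
        if key not in original:
            result[key] = "added"
        elif key not in edited:
            result[key] = "deleted"
        elif original[key] != edited[key]:
            result[key] = "modified"
    return result


# Hypothetical edited files: x agrees, y differs, z is only in B.
file_b = {"x": 1, "y": 7, "z": 3}
file_c = {"x": 1, "y": 8}

# Two candidate originals that are both consistent with B and C.
original_1 = {"x": 1, "y": 7}
original_2 = {"x": 1, "y": 6, "z": 3}

for original in (original_1, original_2):
    print(changes(original, file_b), changes(original, file_c))
```

With the first candidate, B added z and C modified y. With the second, B modified y while C modified y and deleted z. Nothing in B and C alone tells us which of these happened.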
Merging changes is therefore not possible with only two files – we need to have the original file also. With all three files, we can perform a three-way merge and then apply some logic to the merge to work out what the result should look like. There may be conflicts between the changes, for example:
- Both B and C have changed a value in a different way
- B has deleted a value but C has modified that same value
- B and C have both added a new value but they are different
We need some way to handle these – we could decide that in all cases the change made by C is the master so we just pick that in the case of a conflict, or we could look at each conflict in turn and decide whether to choose B or C or neither of these, leaving it as it was.
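A three-way merge with a simple conflict policy might look like the sketch below. It is deliberately limited to flat objects; the names three_way_merge, MISSING and the prefer parameter are hypothetical, and the sample data is invented so that it triggers each of the three conflict types listed above.

```python
MISSING = object()  # marks "key not present", distinct from a JSON null


def three_way_merge(a, b, c, prefer="c"):
    """Merge flat objects b and c against their ancestor a.

    Returns (merged, conflicts). On conflict the value is taken from the
    file named by prefer: "b", "c", or "a" to leave it as it was.
    """
    merged, conflicts = {}, []
    for key in sorted(a.keys() | b.keys() | c.keys()):
        va = a.get(key, MISSING)
        vb = b.get(key, MISSING)
        vc = c.get(key, MISSING)
        if vb == vc:
            chosen = vb  # both sides agree (including both having deleted it)
        elif vb == va:
            chosen = vc  # only C changed this key
        elif vc == va:
            chosen = vb  # only B changed this key
        else:
            conflicts.append(key)  # B and C changed it in different ways
            chosen = {"a": va, "b": vb, "c": vc}[prefer]
        if chosen is not MISSING:
            merged[key] = chosen
    return merged, conflicts


# Hypothetical data: y changed differently, w deleted by B but modified
# by C, and z added by both with different values.
file_a = {"x": 1, "y": 6, "w": 4}
file_b = {"x": 1, "y": 7, "z": 3}
file_c = {"x": 1, "y": 8, "w": 5, "z": 9}

print(three_way_merge(file_a, file_b, file_c, prefer="c"))
```

Returning the list of conflicting keys alongside the merged result leaves room for the second approach described above, where each conflict is reviewed in turn instead of being settled by a blanket rule.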
There is another slightly different way of looking at this (and this reflects a situation that happens all too frequently). We have two similar files and we want to update them both in the same way. If we make the changes we want in one of the files, can we use merge to apply those changes to the other file? The answer is yes, and it is a merge very similar to the one described above but it is not quite the same. A name given to this particular type of merge in some source code control systems is ‘graft’, i.e. we graft the changes from one branch onto another branch. It can be a very useful thing to do because it saves repeating work manually (and any manual change needs to be checked and that in itself can be laborious and error-prone). So if we can do a graft automatically this can be a real time saver.
So how does graft differ from the three-way merge above? Let’s call the original file A. In the case of the three-way merge, A is often called the ancestor of B and C because it is the file from which both B and C have been derived by some editing process. For graft it is slightly different because A is the ancestor of only one of the two files; the other one may not be directly related to A, though there must be some similarities.
So let us say the changes are in B and we want to apply them to C. In other words, any change between A and B needs to be applied to C. We can now argue that we do not have any conflicts as such, so we can apply the process automatically. What has happened to the conflicts identified above?
- Both B and C have changed a value in a different way: because there is a change between A and B we can apply this to C and the fact that the value in C was different does not matter.
- B has deleted a value but C has modified that same value: again the same applies, we can apply the deletion made by B without worrying that the value in C differs.
- B and C have both added a new value but they are different: there is no concept of C having added a value, so we just apply the new added value to overwrite the value in C.
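The three rules above can be sketched as follows for flat objects. The function name graft and the sample data are hypothetical. The sketch also assumes that a modification made in B is skipped when C does not contain that key at all, which is one possible reading of a change that does not apply.

```python
def graft(a, b, c):
    """Apply every change made between a and b onto a copy of c (flat objects)."""
    result = dict(c)
    for key in a.keys() | b.keys():
        if key not in a:
            result[key] = b[key]  # added in B: add to, or overwrite in, C
        elif key not in b:
            result.pop(key, None)  # deleted in B: delete from C if present
        elif a[key] != b[key] and key in c:
            result[key] = b[key]  # modified in B: overwrite whatever C holds
        # Unchanged between A and B: leave C's value alone.
    return result


# Hypothetical data: B changed y and v, deleted w and added z.
file_a = {"x": 1, "y": 6, "w": 4, "v": 0}
file_b = {"x": 1, "y": 7, "z": 3, "v": 2}
file_c = {"x": 5, "y": 8, "w": 5, "z": 9, "u": 10}

print(graft(file_a, file_b, file_c))
```

Here C keeps its own x and u because B did not touch them, takes y and z from B, loses w, and ignores the change to v because C has no such key. No conflict list is needed, since every decision follows from the difference between A and B alone.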
This makes graft potentially very useful for several different situations:
- If B is a subset of C then we can apply relevant changes to C.
- If B is a superset of C then we can apply relevant changes to C and ignore any that do not apply.
- If B and C are just similar, we can apply all relevant changes and ignore any that are not relevant.
It is interesting to note that B and C could be completely different and the graft operation would ‘work’ but there would be no resulting changes to C. That is not useful in itself, but it does mean that there is no limit on how different B and C are – the graft process just becomes rather less useful the more that they differ.
Merging two files produces a union of the data content but we need to assign one of the two as the ‘master’ in case of conflict.
Merging the changes made in two files can only be done if we have access to the original file, otherwise we cannot identify what has changed, what has been added and what has been deleted.
A variant of the traditional three-way merge is the ‘graft’ process where we apply the changes made to one file to another similar file – a very useful process in many situations.