In this very small example it makes no difference to the meaning if the address appears before or after the affiliation. How do we specify this in our schema? Unfortunately, XML Schema does allow us to do this directly. It allows complex types to contain ‘sequence’, ‘choice’ or ‘all’. A sequence or choice may occur more than once. Choice is just one of a list of elements and sequence is a defined order of elements, ‘all’ is unordered. It is tempting to think that ‘all’ is what we need here. But what ‘all’ says is that the elements can appear in the file in any order but it does not say that the order has no meaning. This might appear a subtle distinction, but it is not!
If the order of affiliation and address is not significant then there is a good technical argument to just pick an order and say in the schema they must appear in that order. This makes it clear and is easier for a reader of the data to have a defined order. There can then be no ambiguity, for example if the order is not specified then it is tempting to think that putting the address after the name is appropriate when the address refers to the person, and after the affiliation when it refers to the location of the affiliation.
So when the order is not important, just pick one, even though it is counter-intuitive to specify an order when the order really does not matter. However, the discussion at Balisage then went further because perhaps there are 25 or 50 elements where the order is not important and then if we pick an order it becomes much more difficult to edit the file to add one extra element in – you have to find the right place to add it even though it does not matter to the meaning! On the other hand perhaps it is easier to determine if a particular element is present if it has to be in a specified order.
There is an exception to this principle though: if we allowed multiple address elements then we could not use this technique to indicate that the order in which they appear is not significant. So the principle only works in some situations, i.e. when only one instance of a particular element is allowed.
So the problem remains, and it is a shame that it is not possible to change the default that order is significant in XML and JSON arrays.
A solution to normalise orderless XML and JSON data
How could this be fixed in some future update of the XML Schema language? Attempts to add to the syntax rules themselves is likely to result in quite lot of extra complexity. But there is an easier way to do this, because XML Schema already handles a similar situation for white space. By definition, white space is normalised by any reader of XML so multiple spaces have no more significance than one space – making it easy to pretty-print a file without changing its meaning, very convenient. But it is possible on any element to add an attribute xml:space=’preserve’ to indicate that the white space within this element is significant and must be preserved and must not be normalised.
So this suggests a simple solution for order also. We could have an attribute to indicate that the normal significance of order is overruled for a particular element, e.g. dx:ignore-order=’true’ . That would seem a simple solution to this problem. It might be good if this was in the xml namespace but that is restricted so this would need to be an agreed standard first, hence the use of dx as the namespace prefix here.
It would be remiss of me not to mention that in our XML and JSON comparison products, we do allow for orderless data! It is more challenging to find corresponding elements if they can appear in any order but it is very often really useful to be able to do this.
As with so many things, the devil is in the detail here, but the detail is important. Time to think about other types of order now, and order some refreshment!