Introduction to DeltaXML Delta Format Version 2: deltaV2

This document provides an introduction to the new DeltaXML delta format (referred to as deltaV2) for representing changes between two XML documents. It is intended primarily for those familiar with the existing delta format (referred to here as deltaV1) to show how this has been improved. This document does not describe either the old or the new delta format in detail.

Background

The current DeltaXML delta format was designed in 2000 and has been 100% stable since then. We are pleased to introduce a new delta format which builds on this and improves it, in particular:

There are some particular areas in which the new format will prove itself. These include:

Additionally, deltaV2 preserves some of the unique features and benefits of the original delta format:

Initially the DeltaXML Core product will adopt this new format in a new 5.0 release. At a later date the DeltaXML Sync product will also adopt this new delta format.

We will look at some of these areas in more detail.

Simpler with fewer elements and attributes

DeltaV1 had six elements and three attributes:

DeltaV2 has four elements and one attribute, apart from two additional attributes on the root element:

This has the advantage that less code needs to be written to process delta data. Note also that since the new format caters for three or more documents as well as the basic two, there is even less that needs to be learned in order to process changes.

The delta attribute is similar, and the correspondence between the old and new formats is as follows:

| deltaV1 | deltaV2 | Comment |
| --- | --- | --- |
| deltaxml:delta='delete' | deltaxml:deltaV2='A' | The element appears in the 'old' document or 'A' document only. |
| deltaxml:delta='add' | deltaxml:deltaV2='B' | The element appears in the 'new' document or 'B' document only. |
| deltaxml:delta='unchanged' | deltaxml:deltaV2='A=B' | The element appears in both documents and is equal. |
| deltaxml:delta='WFmodify' | deltaxml:deltaV2='A!=B' | The element appears in both documents and is different in each, i.e. not equal. |
| deltaxml:delta='WFmodifyUnordered' | deltaxml:deltaV2='A!=B' with deltaxml:ordered='false' | The element appears in both documents and is different in each, i.e. not equal. |
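The correspondence can also be written down as a lookup table. The following is a minimal sketch rather than part of any DeltaXML API: the names V1_TO_V2 and convert_delta_attribute are hypothetical, and the mapping follows the attribute example later in this document, where an added attribute carries 'B' and a deleted one carries 'A'.

```python
# Hypothetical lookup table: deltaV1 deltaxml:delta value -> deltaV2 attributes.
# Keys of the inner dicts are local attribute names in the deltaxml namespace.
V1_TO_V2 = {
    "delete": {"deltaV2": "A"},
    "add": {"deltaV2": "B"},
    "unchanged": {"deltaV2": "A=B"},
    "WFmodify": {"deltaV2": "A!=B"},
    "WFmodifyUnordered": {"deltaV2": "A!=B", "ordered": "false"},
}


def convert_delta_attribute(v1_value):
    """Return the deltaV2 attributes that correspond to a deltaV1 delta value."""
    try:
        # Copy so that callers cannot modify the shared table by accident.
        return dict(V1_TO_V2[v1_value])
    except KeyError:
        raise ValueError(f"unknown deltaV1 delta value: {v1_value!r}") from None


print(convert_delta_attribute("WFmodifyUnordered"))
```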

Attributes easier to process

One of the biggest changes is in the way attribute values are handled. DeltaV1 was compact in the way that it handled attribute values but quite difficult to process, and could not be extended to more than two documents.

In deltaV1, changed attributes were encoded within the two delta attributes @deltaxml:new-attributes and @deltaxml:old-attributes. This meant that to process the attribute values they needed to be extracted. Also, because the old and new values were separated in these two attributes, it was often necessary to do set operations to determine whether an attribute was added, deleted or modified.

In deltaV2, attributes are handled within markup and processing is therefore very much easier. Unchanged attributes are handled as before: they remain unchanged as attributes.

Consider this small example to see how this works, where attribute a1 is unchanged, a2 is added, a3 is deleted and a4 is modified.

Document A (old):

    <p a1="value1" a3="value3" a4="value4"/>

Document B (new):

    <p a1="value1" a2="value2" a4="value5"/>

In deltaV1 this would be represented as:

    <p deltaxml:delta="WFmodify" a1="value1"
       deltaxml:old-attributes="a3='value3' a4='value4'"
       deltaxml:new-attributes="a2='value2' a4='value5'" />
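To illustrate the extraction and set operations that deltaV1 requires, here is a minimal sketch. It assumes the encoded strings contain only simple name='value' pairs like those in the example; the real escaping rules of deltaV1 are not described in this document, and the function names are hypothetical.

```python
import re

# Assumption: each pair looks like name='value' or name="value", as in the example.
_PAIR = re.compile(r"""([^\s=]+)\s*=\s*(?:'([^']*)'|"([^"]*)")""")


def parse_v1_attributes(encoded):
    """Parse a deltaV1 old-attributes or new-attributes string into a dict."""
    return {name: single or double for name, single, double in _PAIR.findall(encoded)}


def classify_v1(old_encoded, new_encoded):
    """Use set operations on the attribute names to classify each change."""
    old = parse_v1_attributes(old_encoded)
    new = parse_v1_attributes(new_encoded)
    return {
        "added": sorted(new.keys() - old.keys()),      # only in the new document
        "deleted": sorted(old.keys() - new.keys()),    # only in the old document
        "modified": sorted(old.keys() & new.keys()),   # in both, so the value changed
    }


changes = classify_v1("a3='value3' a4='value4'", "a2='value2' a4='value5'")
print(changes)  # {'added': ['a2'], 'deleted': ['a3'], 'modified': ['a4']}
```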

In deltaV2 this is represented as:

    <p deltaxml:deltaV2="A!=B" a1="value1">
        <deltaxml:attributes deltaxml:deltaV2="A!=B">
            <dxa:a2 deltaxml:deltaV2="B">
                <deltaxml:attributeValue deltaxml:deltaV2="B">value2</deltaxml:attributeValue>
            </dxa:a2>
            <dxa:a3 deltaxml:deltaV2="A">
                <deltaxml:attributeValue deltaxml:deltaV2="A">value3</deltaxml:attributeValue>
            </dxa:a3>
            <dxa:a4 deltaxml:deltaV2="A!=B">
                <deltaxml:attributeValue deltaxml:deltaV2="A">value4</deltaxml:attributeValue>
                <deltaxml:attributeValue deltaxml:deltaV2="B">value5</deltaxml:attributeValue>
            </dxa:a4>
        </deltaxml:attributes>
    </p>

The new format is much more verbose, but the code to process it is much shorter and simpler. For example, to determine which attributes have been modified, in deltaV1 it is necessary to parse deltaxml:old-attributes and deltaxml:new-attributes to extract the names of all the attributes and then do a set intersection on these to find the names of any attributes in both lists. In deltaV2, it is only necessary to find elements within deltaxml:attributes which have more than one deltaxml:attributeValue within them.
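As a rough illustration of the deltaV2 side of this comparison, the sketch below uses Python's xml.etree.ElementTree on the example from this section. The URI bound to the deltaxml prefix is not stated in this document and is an assumption here, as is the function name classify_attributes; the sketch also assumes a two-document (A and B) delta.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI for the deltaxml prefix is not given in this document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DXA_NS = "http://www.deltaxml.com/ns/non-namespaced-attribute"

SAMPLE = f"""<p xmlns:deltaxml="{DELTAXML_NS}" xmlns:dxa="{DXA_NS}"
   deltaxml:deltaV2="A!=B" a1="value1">
  <deltaxml:attributes deltaxml:deltaV2="A!=B">
    <dxa:a2 deltaxml:deltaV2="B">
      <deltaxml:attributeValue deltaxml:deltaV2="B">value2</deltaxml:attributeValue>
    </dxa:a2>
    <dxa:a3 deltaxml:deltaV2="A">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value3</deltaxml:attributeValue>
    </dxa:a3>
    <dxa:a4 deltaxml:deltaV2="A!=B">
      <deltaxml:attributeValue deltaxml:deltaV2="A">value4</deltaxml:attributeValue>
      <deltaxml:attributeValue deltaxml:deltaV2="B">value5</deltaxml:attributeValue>
    </dxa:a4>
  </deltaxml:attributes>
</p>"""


def classify_attributes(element):
    """Map each changed attribute's element tag to 'added', 'deleted' or 'modified'."""
    changes = {}
    holder = element.find(f"{{{DELTAXML_NS}}}attributes")
    if holder is None:
        return changes  # no changed attributes on this element
    for attr in holder:
        values = attr.findall(f"{{{DELTAXML_NS}}}attributeValue")
        if len(values) > 1:
            changes[attr.tag] = "modified"  # more than one value: the value changed
        elif attr.get(f"{{{DELTAXML_NS}}}deltaV2") == "B":
            changes[attr.tag] = "added"     # present in B only
        else:
            changes[attr.tag] = "deleted"   # two-document assumption: present in A only
    return changes


result = classify_attributes(ET.fromstring(SAMPLE))
for tag, change in result.items():
    print(tag, change)
```

No string parsing or set intersection is needed; the structure of the markup carries the information directly.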

The handling of attribute namespaces is now more consistent because the attribute names become element names (for attributes where the value has changed) rather than the prefixes being embedded in the deltaxml:old-attributes and deltaxml:new-attributes values. This makes the namespaces easier to handle.

Note also that deltaV2 can be extended to handle three or more documents, whereas deltaV1 is limited to just two.

Attribute Namespaces

Some special namespaces are used when representing attribute change in the deltaxml:attributes element. These are listed below:

| Usual or recommended prefix | Namespace URI | Purpose |
| --- | --- | --- |
| dxa | http://www.deltaxml.com/ns/non-namespaced-attribute | The namespace of an element used to represent an attribute which was not in a namespace in one or both input files. |
| dxx | http://www.deltaxml.com/ns/xml-namespaced-attribute | The namespace of an element used to represent an attribute in the XML namespace (corresponding to the URI http://www.w3.org/XML/1998/namespace and always bound to the prefix xml:). Such attributes include xml:space, xml:id, xml:base and xml:lang. |
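When turning such an element back into an attribute, the two special namespaces have to be mapped back. The following sketch shows one way this might be done with ElementTree-style tags. The function name is hypothetical, and the treatment of any other namespace (the element keeps the attribute's own namespace) is an assumption not confirmed by this document.

```python
DXA_NS = "http://www.deltaxml.com/ns/non-namespaced-attribute"
DXX_NS = "http://www.deltaxml.com/ns/xml-namespaced-attribute"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def attribute_name_from_element_tag(tag):
    """Convert an ElementTree tag such as '{uri}local' into the
    (namespace_uri, local_name) pair of the attribute it represents."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
    else:
        uri, local = "", tag
    if uri == DXA_NS:
        return (None, local)    # attribute was not in any namespace
    if uri == DXX_NS:
        return (XML_NS, local)  # attribute such as xml:space or xml:lang
    # Assumption: other attributes are represented in their own namespace.
    return (uri or None, local)


print(attribute_name_from_element_tag("{" + DXA_NS + "}a2"))
print(attribute_name_from_element_tag("{" + DXX_NS + "}lang"))
```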

These new namespaces are used for several reasons:

New Root element attributes

An attribute on the root element specifies that the document is a delta document and conforms to deltaV2: deltaxml:version='2.0'.

Another attribute on the root element indicates whether the delta document contains just the changes (deltaxml:content-type='changes-only') or if the data that is unchanged in all the documents is also present (deltaxml:content-type='full-context').
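A processor might inspect these two root attributes before doing anything else. This is a minimal sketch under the assumption that the deltaxml prefix is bound to the URI shown, which is not stated in this document; describe_delta_root is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI for the deltaxml prefix is not given in this document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"


def describe_delta_root(root):
    """Report what the root element's deltaV2 attributes say about the document."""
    version = root.get(f"{{{DELTAXML_NS}}}version")
    content_type = root.get(f"{{{DELTAXML_NS}}}content-type")
    return {
        "is_delta_v2": version == "2.0",
        "changes_only": content_type == "changes-only",
        "full_context": content_type == "full-context",
    }


root = ET.fromstring(
    f'<doc xmlns:deltaxml="{DELTAXML_NS}" deltaxml:version="2.0" '
    'deltaxml:content-type="full-context" deltaxml:deltaV2="A=B"/>'
)
info = describe_delta_root(root)
print(info)
```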

Text handling

Text is handled in a similar manner but there are changes to enable more than two documents to be represented. Consider the following example:

Document A (old):

    <p>The quick brown fox</p>

Document B (new):

    <p>The quick red fox</p>

In deltaV1 this would be represented as:

    <p deltaxml:delta="WFmodify">
        <deltaxml:PCDATAmodify>
            <deltaxml:PCDATAold>The quick brown fox</deltaxml:PCDATAold>
            <deltaxml:PCDATAnew>The quick red fox</deltaxml:PCDATAnew>
        </deltaxml:PCDATAmodify>
    </p>

In deltaV2 this is represented as:

    <p deltaxml:deltaV2="A!=B">
        <deltaxml:textGroup deltaxml:deltaV2="A!=B">
            <deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>
            <deltaxml:text deltaxml:deltaV2="B">The quick red fox</deltaxml:text>
        </deltaxml:textGroup>
    </p>

This could also be represented in deltaV2 more precisely as:

    <p deltaxml:deltaV2="A!=B">
        The quick
        <deltaxml:textGroup deltaxml:deltaV2="A!=B">
            <deltaxml:text deltaxml:deltaV2="A">brown</deltaxml:text>
            <deltaxml:text deltaxml:deltaV2="B">red</deltaxml:text>
        </deltaxml:textGroup>
        fox
    </p>
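Either input document's text can be recovered from this representation by keeping the shared text and selecting the matching deltaxml:text entries. The sketch below is illustrative only: the deltaxml namespace URI is an assumption, text_for is a hypothetical name, and only text and deltaxml:textGroup children of a single element are handled (nested elements are ignored apart from their trailing text).

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the namespace URI for the deltaxml prefix is not given in this document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
TEXT_GROUP = f"{{{DELTAXML_NS}}}textGroup"
TEXT = f"{{{DELTAXML_NS}}}text"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"

SAMPLE = (
    f'<p xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A!=B">The quick '
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">brown</deltaxml:text>'
    '<deltaxml:text deltaxml:deltaV2="B">red</deltaxml:text>'
    '</deltaxml:textGroup> fox</p>'
)


def documents_in(delta_value):
    """Split a deltaV2 value such as 'A', 'A=B' or 'A!=B' into document names."""
    return set(re.split(r"!?=", delta_value))


def text_for(element, doc):
    """Rebuild the text of `element` as it appeared in document `doc` ('A' or 'B')."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == TEXT_GROUP:
            for text in child.findall(TEXT):
                if doc in documents_in(text.get(DELTA_V2, "")):
                    parts.append(text.text or "")
        parts.append(child.tail or "")  # unchanged text following the child
    return "".join(parts)


p = ET.fromstring(SAMPLE)
print(text_for(p, "A"))  # The quick brown fox
print(text_for(p, "B"))  # The quick red fox
```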

There is therefore no significant difference in the way that text is handled, except that the absence of text in one document is treated in a slightly different manner:

Document A (old):

    <p>The quick brown fox</p>

Document B (new):

    <p></p>

In deltaV1 this would be represented as:

    <p deltaxml:delta="WFmodify">
        <deltaxml:PCDATAmodify>
            <deltaxml:PCDATAold>The quick brown fox</deltaxml:PCDATAold>
            <deltaxml:PCDATAnew/>
        </deltaxml:PCDATAmodify>
    </p>

In deltaV2 this is represented as:

    <p deltaxml:deltaV2="A!=B">
        <deltaxml:textGroup deltaxml:deltaV2="A!=B">
            <deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>
        </deltaxml:textGroup>
    </p>
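Because absent text is simply left out of the deltaxml:textGroup, a processor can tell added, deleted and modified text apart by checking which documents are represented. The following sketch assumes a two-document delta and an assumed URI for the deltaxml prefix; classify_text_group is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the namespace URI for the deltaxml prefix is not given in this document.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
TEXT_GROUP = f"{{{DELTAXML_NS}}}textGroup"
TEXT = f"{{{DELTAXML_NS}}}text"
DELTA_V2 = f"{{{DELTAXML_NS}}}deltaV2"


def classify_text_group(group):
    """Classify a deltaxml:textGroup by which documents contribute text to it."""
    present = set()
    for text in group.findall(TEXT):
        present.update(re.split(r"!?=", text.get(DELTA_V2, "")))
    present.discard("")
    if present == {"A", "B"}:
        return "modified"
    if present == {"A"}:
        return "deleted"  # text exists only in the old document
    if present == {"B"}:
        return "added"    # text exists only in the new document
    return "unknown"


deleted_example = ET.fromstring(
    f'<p xmlns:deltaxml="{DELTAXML_NS}" deltaxml:deltaV2="A!=B">'
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">The quick brown fox</deltaxml:text>'
    '</deltaxml:textGroup></p>'
)
kind = classify_text_group(deleted_example.find(TEXT_GROUP))
print(kind)  # deleted
```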