The Simplified DITA Merge Format

1. Introduction

The simplified XML output from DeltaXML DITA Merge is a simplified form of the deltaV2 format. This section describes the parts of the simplified format that are particularly significant in the DeltaXML DITA Merge context.

The principle of the simplified format is that when a change is encountered, the actual content of each of the variants is listed in full, without any nested changes.

2. Features and benefits of the simplified delta format

  • All changes recorded at one level without nesting, so it is simple to process
  • Contains all of the data from all of the input files, so any file can be reconstructed without loss of information
  • Structure follows that of the input files, therefore it is easy to understand and process
  • Supports n-way merge, so more than three documents can be processed
  • Version identifiers are specified through the API prior to processing and appear in the output, giving flexibility and making the delta easier to understand
  • Can be used for further processing to resolve differences

3. Disadvantages of the simplified delta format

  • Because nested changes cannot be represented, information may need to be duplicated so that a change can be represented at a single point in the result hierarchy.
  • The duplication described above may make it harder to see some changes, particularly in the case of small changes nested within large changes.

4. Elements of the simplified delta format

4.1. Root element attributes: deltaxml:version-order and deltaxml:version

These attributes appear on the root element. The deltaxml:version-order attribute specifies the version identifiers in the order in which they were added to the document. The deltaxml:version attribute should be "s1.0".

4.2. deltaxml:versionContentGroup and deltaxml:versionContent

Each variant or version change is shown in a deltaxml:versionContentGroup element which contains at least two deltaxml:versionContent elements. Each deltaxml:versionContent contains the element and text content for its versions, and has a deltaxml:versionSet attribute which indicates the versions for which this content is applicable. All versions referenced in a single deltaxml:versionSet attribute must have identical content at that point; there are never any nested changes.

Where there is no content for a particular version, the deltaxml:versionContent element is empty. Therefore, within each deltaxml:versionContentGroup element, every version is represented by one of the child deltaxml:versionContent elements.

Example of versionContentGroup:

<deltaxml:versionContentGroup>
   <deltaxml:versionContent deltaxml:versionSet="ancestor">John.Doe@company.com</deltaxml:versionContent>
   <deltaxml:versionContent deltaxml:versionSet="edit1">JohnDoe@anothercompany.com</deltaxml:versionContent>
   <deltaxml:versionContent deltaxml:versionSet="edit2"/>
   <deltaxml:versionContent deltaxml:versionSet="edit3">John.Doe@othercompany.com</deltaxml:versionContent>
</deltaxml:versionContentGroup>
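
As an illustration of how a consumer might read this structure, the following Python sketch selects the content that applies to one version from a deltaxml:versionContentGroup. It is a minimal sketch, not part of the product: the helper name content_for_version is hypothetical, and the namespace URI bound to the deltaxml prefix is an assumption that should be replaced with the one declared in your own result file.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "deltaxml" prefix; check your own result file.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"


def content_for_version(group, version):
    """Return the versionContent child of `group` that applies to `version`.

    Hypothetical helper: a versionSet holds identifiers joined by '=',
    so the value is split on '=' before looking for the requested version.
    """
    for child in group.findall(f"{{{DELTAXML_NS}}}versionContent"):
        version_set = child.get(f"{{{DELTAXML_NS}}}versionSet", "")
        if version in version_set.split("="):
            return child
    return None


SAMPLE = f"""
<deltaxml:versionContentGroup xmlns:deltaxml="{DELTAXML_NS}">
  <deltaxml:versionContent deltaxml:versionSet="ancestor">John.Doe@company.com</deltaxml:versionContent>
  <deltaxml:versionContent deltaxml:versionSet="edit1">JohnDoe@anothercompany.com</deltaxml:versionContent>
  <deltaxml:versionContent deltaxml:versionSet="edit2"/>
  <deltaxml:versionContent deltaxml:versionSet="edit3">John.Doe@othercompany.com</deltaxml:versionContent>
</deltaxml:versionContentGroup>
"""

group = ET.fromstring(SAMPLE)
edit1_text = content_for_version(group, "edit1").text
# The edit2 element is empty, so ElementTree reports its text as None.
edit2_text = content_for_version(group, "edit2").text
```

Because every version is represented in every group, a lookup for a known version identifier is expected to succeed; the None return only guards against identifiers that were never part of the merge.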

4.3. deltaxml:versionAttributeGroup and deltaxml:versionAttribute

Attributes are handled in an analogous way. Any attribute that is unchanged for all the versions remains on the element. Any attribute values that differ are listed in full within a deltaxml:versionAttributeGroup element. The deltaxml:versionAttributeGroup must be the first child element of the element associated with the attribute change. It contains two or more deltaxml:versionAttribute elements, each with a deltaxml:versionSet attribute and zero or more other attributes that apply to the versions listed in the deltaxml:versionSet attribute. Therefore the full set of attributes for a particular version is the set on the element plus the set on the relevant deltaxml:versionAttribute element.

Where a particular version has none of the differing attributes, the deltaxml:versionAttribute element has no attributes other than the deltaxml:versionSet attribute. Therefore, within each deltaxml:versionAttributeGroup element, every version is represented by one of the child deltaxml:versionAttribute elements.

Example of versionAttributeGroup:

<person id="1" gender="male">
  <deltaxml:versionAttributeGroup>
     <deltaxml:versionAttribute deltaxml:versionSet="ancestor" name="J" age="26"/>
     <deltaxml:versionAttribute deltaxml:versionSet="edit1" name="John"/>
     <deltaxml:versionAttribute deltaxml:versionSet="edit2" name="John" age="27" email="John.Doe@company.com"/>
  </deltaxml:versionAttributeGroup>
</person>
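
The rule that a version's full attribute set is the set on the element plus the set on the relevant deltaxml:versionAttribute can be sketched in Python as follows. This is an illustrative sketch only: the helper name attributes_for_version is hypothetical, the namespace URI is an assumption, and the sample uses deltaxml:versionAttribute children as described in the text of this section.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "deltaxml" prefix; check your own result file.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
VERSION_SET = f"{{{DELTAXML_NS}}}versionSet"


def attributes_for_version(element, version):
    """Return the full attribute dict of `element` as seen by `version`.

    Hypothetical helper: starts from the unchanged attributes left on the
    element, then adds those on the versionAttribute whose versionSet
    lists the requested version.
    """
    # Skip any deltaxml-namespaced attributes; they are not document data.
    attrs = {k: v for k, v in element.attrib.items()
             if not k.startswith(f"{{{DELTAXML_NS}}}")}
    group = element.find(f"{{{DELTAXML_NS}}}versionAttributeGroup")
    if group is None:
        return attrs
    for candidate in group.findall(f"{{{DELTAXML_NS}}}versionAttribute"):
        if version in candidate.get(VERSION_SET, "").split("="):
            attrs.update({k: v for k, v in candidate.attrib.items()
                          if k != VERSION_SET})
    return attrs


SAMPLE = f"""
<person xmlns:deltaxml="{DELTAXML_NS}" id="1" gender="male">
  <deltaxml:versionAttributeGroup>
    <deltaxml:versionAttribute deltaxml:versionSet="ancestor" name="J" age="26"/>
    <deltaxml:versionAttribute deltaxml:versionSet="edit1" name="John"/>
    <deltaxml:versionAttribute deltaxml:versionSet="edit2" name="John" age="27" email="John.Doe@company.com"/>
  </deltaxml:versionAttributeGroup>
</person>
"""

person = ET.fromstring(SAMPLE)
edit1_attrs = attributes_for_version(person, "edit1")
```

In this sample, edit1 keeps id and gender from the element and gains only name, because its deltaxml:versionAttribute carries no age or email.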

5. versionSet attributes

The deltaxml:versionSet attributes in the simplified delta format contain a sequence of one or more version identifiers joined by the '=' character.

The deltaxml:versionSet attributes conform to the following rules:

  • In the context of a deltaxml:versionContentGroup, the child elements (deltaxml:versionContent) must reference each version once and only once.
  • In the context of a deltaxml:versionAttributeGroup, the child elements (deltaxml:versionAttribute) must reference each version once and only once.
  • The deltaxml:versionSet attributes only appear on the deltaxml:versionContent and deltaxml:versionAttribute elements.
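
The "once and only once" rules above lend themselves to a simple check. The Python sketch below takes the deltaxml:versionSet values found on the children of one group and reports any version that is missing, repeated, or unknown. The function name check_group and the shape of its inputs are assumptions made for illustration.

```python
from collections import Counter


def check_group(version_sets, all_versions):
    """Check that each known version is referenced exactly once in a group.

    `version_sets` holds the versionSet attribute values of the group's
    children; `all_versions` lists every version identifier in the merge.
    Returns a list of problem descriptions (empty when the group is valid).
    """
    # A versionSet is one or more identifiers joined by '='.
    counts = Counter(v for value in version_sets for v in value.split("="))
    problems = []
    for version in all_versions:
        seen = counts.get(version, 0)
        if seen != 1:
            problems.append(f"{version!r} referenced {seen} times")
    for version in counts:
        if version not in all_versions:
            problems.append(f"unknown version {version!r}")
    return problems


versions = ["ancestor", "edit1", "edit2", "edit3"]
valid = check_group(["ancestor=edit2", "edit1", "edit3"], versions)
invalid = check_group(["ancestor", "edit1=ancestor", "edit3"], versions)
```

The same function applies to both deltaxml:versionContentGroup and deltaxml:versionAttributeGroup, since the rule is identical for the two kinds of group.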

6. Version Identifiers

Version identifiers are user-specified labels assigned to the common ancestor and each revision document. An identifier must be supplied each time a new document is added and each new identifier must have a unique value.

6.1. Choosing values for version identifiers

Version identifiers may be user-specified or machine generated (provided they meet the constraints outlined below). For example, the revision numbers or hash values used in a version control system could be used.

6.2. Constraints on version identifiers

Version identifiers should conform to the NMTOKEN production rule defined in the XML Specification. The same production rules are used in both the XML 1.0 and XML 1.1 specifications. This production rule allows many Unicode characters, but prohibits the use of the '!' (hex value 0x21) and '=' (hex value 0x3D) characters, as well as space characters. Excluding '=' matters because it is the separator used in deltaxml:versionSet attributes.
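
A candidate identifier can be checked before it is passed to the API. The Python sketch below builds a regular expression from the NameChar production of the XML 1.0 (Fifth Edition) specification, of which an NMTOKEN is one or more. The names NMTOKEN_RE and is_valid_version_id are hypothetical, and the check is an approximation for illustration; the product's own validation may differ.

```python
import re

# Character ranges taken from the NameStartChar and NameChar productions
# of XML 1.0 (Fifth Edition). An NMTOKEN is one or more NameChar.
_NAME_CHAR = (
    r":A-Z_a-z\-.0-9\u00B7"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0300-\u036F\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u203F-\u2040\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF"
)
NMTOKEN_RE = re.compile(f"[{_NAME_CHAR}]+")


def is_valid_version_id(identifier):
    """Return True if `identifier` matches the NMTOKEN production.

    Hypothetical helper: '!', '=' and whitespace are not NameChar
    characters, so identifiers containing them are rejected.
    """
    return NMTOKEN_RE.fullmatch(identifier) is not None


candidates = ["edit1", "r1234", "1.0", "a=b", "two words", "urgent!", ""]
accepted = [c for c in candidates if is_valid_version_id(c)]
```

Unlike XML names, an NMTOKEN may begin with a digit, so plain revision numbers such as those from a version control system are acceptable.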

7. Differences from the deltaV2 format

  • Unlike the deltaV2 format, the delta versions are specified only on deltaxml:versionContent and deltaxml:versionAttribute elements.
  • The required top-level attribute deltaxml:content-type will have different values in Merge results. The value 'simplified-merge-concurrent' represents concurrent editing. Future versions of the Merge product will also support a 'travelling draft' model where there is not necessarily the concept of a common ancestor version. It is likely the value 'merge-consecutive' will be used for this algorithm. Other values may also be introduced for subsequent Merge developments.