Technical Specification

Overview

DeltaXML DITA Compare compares two well-formed Darwin Information Typing Architecture (DITA) inputs and returns a well-formed DITA output. Each input may be either a DITA map or a DITA topic.

The result from a comparison highlights the differences between the inputs using either DITA's own change markup attributes ('rev' and 'status') on elements or the tracked changes markup for the popular XML editors Arbortext, FrameMaker, oXygen, and XMetaL.

Comparison Types

Topic

When comparing two topics:

Differences are marked in the topic result as described in the Output Formats section below.

Topicset Map

When comparing two sets of topics, with each set referenced from a different map:

Differences are marked in each topic as described in the Output Formats section below.

Within the map, each topic reference uses the status attribute to indicate whether the topic it refers to (i.e. the referent) has been inserted, deleted, changed, or left unchanged. Changes within the topics themselves are presented using either DITA markup or an editor-specific tracked changes format.
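The four states a topic reference can report can be sketched as a simple classification. This is an illustrative sketch only, not DeltaXML's implementation: the function name topicref_status and the dictionaries mapping each href to its topic content are assumptions, and the status values are the ones listed for the status attribute in the Output Formats section below.

```python
def topicref_status(topics_a, topics_b):
    """Map each href to a status value, given {href: topic_text} for inputs A and B.

    Hypothetical helper for illustration; a real comparison works on XML
    structure, not on plain string equality.
    """
    statuses = {}
    for href in sorted(set(topics_a) | set(topics_b)):
        if href not in topics_a:
            statuses[href] = "new"        # referenced only from the second (B) map
        elif href not in topics_b:
            statuses[href] = "deleted"    # referenced only from the first (A) map
        elif topics_a[href] != topics_b[href]:
            statuses[href] = "changed"    # present in both, content differs
        else:
            statuses[href] = "unchanged"  # present in both, content identical
    return statuses
```

For example, a topic referenced only from the second map would be reported as "new", while one referenced from both maps with identical content would be reported as "unchanged".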

Mapfile Map

When comparing map files directly:

Within the map, topic reference elements from each map are aligned based on their keyref or href attributes. Attributes on the topic reference elements indicate whether the elements match or are found in only one of the input maps, as described in the Output Formats section below.
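The alignment idea can be sketched with Python's standard library. This is a simplified illustration under stated assumptions, not the product's algorithm: the function names are hypothetical, keyref is assumed to take precedence over href when both are present, only elements literally named topicref are considered (specialized topic references are ignored), and ordering or nesting changes are not taken into account.

```python
import xml.etree.ElementTree as ET


def topicref_keys(map_xml):
    """Return the alignment key of every topicref in a map, in document order."""
    root = ET.fromstring(map_xml)
    keys = []
    for ref in root.iter("topicref"):
        # Assumption: prefer keyref when present, otherwise fall back to href.
        key = ref.get("keyref") or ref.get("href")
        if key is not None:
            keys.append(key)
    return keys


def align_topicrefs(map_a_xml, map_b_xml):
    """Split topicref keys into those matched in both maps and those in one only."""
    keys_a = topicref_keys(map_a_xml)
    keys_b = topicref_keys(map_b_xml)
    set_a, set_b = set(keys_a), set(keys_b)
    return {
        "matched": [k for k in keys_a if k in set_b],
        "only_in_a": [k for k in keys_a if k not in set_b],
        "only_in_b": [k for k in keys_b if k not in set_a],
    }
```

Keys listed under only_in_a or only_in_b correspond to topic references found in just one of the input maps.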

Output Formats

DITA Markup

Textual changes are highlighted by wrapping the changed text in phrase elements (<ph> by default) that specify the rev and/or status attributes. Where the <ph> element is not valid, textual changes are optionally wrapped with delimiters to show where a change has occurred, e.g. -[[old-text]]-+[[new-text]]+
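The two ways of marking a textual change can be sketched as follows. This is a hedged illustration rather than the tool's actual output routine: the function name mark_text_change and its ph_allowed flag are hypothetical, and the attribute values used are the status values and default rev values described later in this section.

```python
from xml.sax.saxutils import escape


def mark_text_change(old_text, new_text, ph_allowed=True):
    """Return markup showing old_text replaced by new_text.

    Hypothetical helper: uses <ph> wrappers where they are valid and the
    delimiter form otherwise.
    """
    old_text, new_text = escape(old_text), escape(new_text)
    if ph_allowed:
        # Assumed attribute combination: status plus the default rev values.
        return (
            f'<ph status="deleted" rev="deltaxml-delete">{old_text}</ph>'
            f'<ph status="new" rev="deltaxml-add">{new_text}</ph>'
        )
    # Fallback delimiter form for contexts where <ph> is not valid.
    return f"-[[{old_text}]]-+[[{new_text}]]+"
```

Calling mark_text_change("old-text", "new-text", ph_allowed=False) reproduces the delimiter example given above.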

The status attribute may have these values: unchanged, changed, new, deleted. The status attribute is added with an appropriate value wherever it is allowed, and an option lets you control whether or not it is added at all. The status attribute is allowed on many different elements in the hierarchy: for example, a changed section would have status='changed', and within it each paragraph would have a status attribute with an appropriate value.

The rev attribute may have user-specified values, and you can change the default values that are built into the software. The default values are: deltaxml-add, deltaxml-delete. These will be added with appropriate values, where they are allowed, to show added or deleted text, paragraphs etc. An option lets you control whether or not the rev attribute is added. The ditaval file can be configured to provide different text decoration in your output pipeline, controlled by the value of this attribute.
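As a rough sketch of how a ditaval file can key text decoration on the rev values, the following builds a minimal ditaval document with one revprop entry per default rev value. It is not the deltaxml.ditaval sample shipped with the product: the function name build_ditaval is hypothetical, and the green and red colours are assumed here simply because they are the conventional choice for additions and deletions.

```python
import xml.etree.ElementTree as ET


def build_ditaval(add_rev="deltaxml-add", delete_rev="deltaxml-delete"):
    """Return a minimal ditaval document that flags added and deleted revisions."""
    val = ET.Element("val")
    # Each revprop matches one rev attribute value and flags it with a colour.
    ET.SubElement(val, "revprop", {"val": add_rev, "action": "flag", "color": "green"})
    ET.SubElement(val, "revprop", {"val": delete_rev, "action": "flag", "color": "red"})
    return ET.tostring(val, encoding="unicode")
```

If the default rev values are changed in the comparison configuration, the same values would need to be passed to build_ditaval so that the revprop entries still match.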

Tracked Changes

In addition to standard DITA Markup, the output format can be configured to use the tracked changes formats for the XML editors: Arbortext, FrameMaker, oXygen or XMetaL.

The Arbortext tracked changes output format uses special tracked change elements. Here DITA elements can contain Arbortext tracked change elements, and vice versa. One consequence of this approach is that the resulting tracked change document does not conform to the DITA standard. In order to return an Arbortext tracked change document back to the DITA standard all changes need to be accepted or rejected.

The oXygen and XMetaL tracked changes output formats use processing instructions to highlight the changes between the documents. Deleted content is contained within a processing instruction itself, whereas added content is identified by two processing instructions marking the start and end of the inserted section. One useful property of this way of tracking changes is that removing all the tracked change processing instructions leaves the second ('B') version of the document.
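The property that stripping the processing instructions recovers the 'B' version can be illustrated with a short sketch. It assumes oXygen-style processing instructions whose targets begin with oxy_ (for example oxy_delete, oxy_insert_start and oxy_insert_end); the function name is hypothetical, the pattern also removes other oxy_ processing instructions such as review comments, and XMetaL uses different targets that would need their own pattern.

```python
import re

# Assumption: oXygen-style change tracking PIs, all with targets starting "oxy_".
TRACKED_CHANGE_PI = re.compile(r"<\?oxy_.*?\?>", re.DOTALL)


def strip_tracked_change_pis(xml_text):
    """Remove tracked change processing instructions, leaving the 'B' version.

    Deleted content lives inside the PI and disappears with it; inserted
    content sits between two PIs and is kept.
    """
    return TRACKED_CHANGE_PI.sub("", xml_text)
```

A regular expression is used here only to keep the sketch short; a production pipeline would more likely filter processing instructions with an XML parser.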

The FrameMaker tracked changes output format is similar to that for oXygen and XMetaL. The main difference is that an XML comment is used to contain any deleted content and its markup.

Notes

  • Tracked change elements and processing instructions in the inputs (including those of oXygen and XMetaL) are removed before comparison begins.
  • When DITA Markup is used, a 'fixup' process is applied to keep the output a valid DITA document (e.g. conflicts in id attribute values are fixed).
  • When Tracked Changes is used, the 'fixup' process is not applied.
  • A DITA Compare comparison will fail if the inputs are not DITA files.

DITA versions

DeltaXML DITA Compare targets the language features of OASIS DITA 1.1. Other versions may compare with usable results, but result validity cannot be guaranteed. XML catalog support for DITA 1.1 and 1.2 is provided by the tool; for other versions or specializations, some configuration of the catalog system will be necessary.

Comparing DITA specializations

Input documents that are instances of a DITA specialization will be generalized before comparison using the generalization mechanism in the DITA OpenSource Toolkit (DOST). Once comparison has taken place, the result file will undergo specialization, again using the mechanism provided by DOST.

If the two inputs were instances of different specializations, the result file will be specialized using the typing of the second input file where possible. Comparing input files that are instances of different specializations is not recommended.

Result validity

If the inputs are valid instances of a DITA 1.1 map or topic, then the result will also be a valid DITA 1.1 map or topic respectively, so long as:

  • one of the standard specializations is used, and
  • the phrase element parameter is <ph> (the default).

Note: To avoid any possible confusion, pre-existing markup used to represent change (e.g. the status attribute) is removed prior to comparison.

Inputs that are instances of a specialization will produce a result that is an instance of the same specialization. However, because of the way generalization and specialization are performed during the comparison, the result may be invalid, particularly if the specialization restricts where the <ph> element or the rev and status attributes may be used.

When the DITA Markup output format is used (see the Output Formats section above), a 'fixup' process is applied to fix validity issues caused by conflicts between the two versions.

Tool Dependency Notes

Controlling PDF output using ditaval markup (when using the DITA Open Toolkit)

DITA-OT version 1.5.3 (or higher) is required for ditaval markup to take effect when generating XSL-FO for PDF output. The samples directory in the download contains a deltaxml.ditaval file that can be used to highlight addition and deletion markup (by default in green and red respectively).