How to Compare DocBook

1. Introduction

This "how to" guide describes how DeltaXML can be used to generate DocBook revisionflag data, which can then be used to produce inline change highlighting (red/green text, strike-through/italics) and 'ChangeBar' type output.

2. DocBook

DocBook [3] is an Open Source SGML and XML vocabulary, describing a document format that allows easy management of documentation. It can be used to create a wide variety of documentation types, including articles, books and book sets.

Out of the box, DocBook provides you with a common vocabulary. What it does not provide, by default, is a production system for converting DocBook into HTML, RTF, PDF and so on. The DocBook XSL Stylesheets [4] fill this gap: combined with an appropriate transformation processor, they transform DocBook documents into different types of output, such as HTML, XHTML and XSL-FO. The XSL-FO documents can then be processed with a Formatting Objects (FO) processor, such as XEP [6] or FOP [7], to produce PDF files.

This document describes a three-stage toolchain for processing DocBook with change identification:

  1. DeltaXML DocBook Compare (or, in DeltaXML Core version 5, a sample pipeline) takes two DocBook files as input and generates a DocBook file with revisionflag attributes.
  2. DocBook XSL, together with our extensions (available for download), generates XSL-FO with appropriate change identification.
  3. An appropriate renderer converts the XSL-FO into formats such as PDF.

3. Change Information in DocBook

It is not uncommon to need to track the changes between one version of a document and another. For example, many word processors, such as Microsoft Word and OpenOffice.org Writer, can highlight changes with visual cues such as change bars in the page margins, coloured text, strike-through and underlining.

DocBook documents are no exception. For example, if you have used DocBook to write a User Manual for a software tool, you may wish to review the changes made to the manual between versions 1 and 2.

DocBook provides the 'revisionflag' attribute to track changes within a DocBook document. The attribute indicates the revision status of an element; it is optional, and when it is absent the element is treated as unrevised. revisionflag is one of DocBook's common attributes, which occur on almost all elements (others include id and xreflabel), so almost any element can carry revision information. Its value is one of the following enumerated set:

  • changed
  • added
  • deleted
  • off

Note that this implies that the revisionflag attribute is only intended to highlight changes between two specific versions of a document. It is not intended to provide full document version information.
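Because revisionflag accepts only these four values, a quick check of a document (for example, a comparison result) can count the flags it carries and catch any invalid ones. The following is a minimal sketch, assuming a namespace-free DocBook 4.x document and Python's standard xml.etree.ElementTree; summarize_revisionflags is a hypothetical helper, not part of any DeltaXML or DocBook tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The enumerated values DocBook allows for revisionflag.
REVISIONFLAG_VALUES = {"changed", "added", "deleted", "off"}


def summarize_revisionflags(docbook_xml: str) -> tuple[Counter, list[str]]:
    """Count revisionflag values and list tags whose value is not allowed.

    Assumes a namespace-free DocBook 4.x document passed as a string.
    """
    root = ET.fromstring(docbook_xml)
    counts: Counter = Counter()
    invalid: list[str] = []
    for elem in root.iter():
        flag = elem.get("revisionflag")
        if flag is None:
            continue  # no attribute: the element is treated as unrevised
        counts[flag] += 1
        if flag not in REVISIONFLAG_VALUES:
            invalid.append(elem.tag)
    return counts, invalid


sample = """<article>
  <para revisionflag="added">New paragraph.</para>
  <para>Unchanged paragraph.</para>
  <para>Text with <phrase revisionflag="deleted">old</phrase> words.</para>
  <note revisionflag="moved"><para>Bad value.</para></note>
</article>"""

counts, invalid = summarize_revisionflags(sample)
print(dict(counts))  # {'added': 1, 'deleted': 1, 'moved': 1}
print(invalid)       # ['note']
```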

The revisionflag attribute can be added to elements manually as part of a document update process. However, this is laborious and error-prone: the author may forget to update the attribute or may use the wrong value. Additionally, because manually added revisionflags only highlight changes from one version of a document to the next, you cannot see changes across several versions or between major versions. DeltaXML DocBook Compare solves these problems by adding the revisionflag attributes automatically to show the changes between any two versions.

4. Using DeltaXML to Compare DocBook

DeltaXML DocBook Compare compares two versions of a DocBook document and produces a third DocBook document which contains all of the elements in the two inputs, with revisionflag attributes to show the changes. See the ReadMe file in the download for details about how to do this. You can also try this comparison with our online DocBook Compare demo [1].

4.1. Processing of input documents

This tool performs the following tasks on the two input DocBook documents (you do not need to understand the details here, but it may be useful to know that this happens):

  • Optionally copies all id attributes to a new attribute deltaxml:key. This helps the comparison algorithm to ensure that it is comparing the correct elements against each other.
  • Adds the xml:space="preserve" attribute to all elements listed in the specification as having "linespecific" formatting (i.e. the format attribute has the value linespecific). These include programlisting, literallayout and screen elements. White space within these elements is significant, so it must not be removed by other filters you may choose to use, such as NormalizeSpace.
  • Splits the contents of all linespecific elements into lines so that their content can be compared on a line-by-line basis.
  • Ensures that all elements that are not compared on a line-by-line basis are compared on a word-by-word basis.
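The preparation steps listed above can be approximated in a few lines of Python. This is an illustrative sketch only, not DeltaXML's implementation: it assumes a namespace-free DocBook 4.x input, uses a placeholder namespace URI for the deltaxml prefix (the real URI is not given in this guide), covers only the three linespecific elements named above, and, rather than restructuring the document as the tool does, simply returns the lines or words that would be compared. The names preprocess, tokens and DELTAXML_NS are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI for the deltaxml prefix; the URI the
# real tool uses is not stated in this guide.
DELTAXML_NS = "urn:example:deltaxml"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Some of the elements DocBook formats as linespecific. ElementTree does not
# apply DTD defaults, so the format attribute cannot be read; match by name.
LINESPECIFIC = {"programlisting", "literallayout", "screen"}


def preprocess(root: ET.Element, copy_ids: bool = True) -> None:
    """Prepare one DocBook input in place before comparison (illustrative)."""
    for elem in root.iter():
        # Optionally copy id to deltaxml:key to help align matching elements.
        if copy_ids and "id" in elem.attrib:
            elem.set(f"{{{DELTAXML_NS}}}key", elem.get("id"))
        # Keep white space in linespecific elements safe from later filters.
        if elem.tag in LINESPECIFIC:
            elem.set(XML_SPACE, "preserve")


def tokens(elem: ET.Element) -> list[str]:
    """Return comparison units: lines for linespecific elements, else words."""
    text = "".join(elem.itertext())
    if elem.tag in LINESPECIFIC:
        return text.splitlines()
    return text.split()


doc = ET.fromstring(
    '<article><section id="intro"><para>Some text here.</para>'
    "<programlisting>a = 1\nb = 2</programlisting></section></article>"
)
preprocess(doc)
section = doc.find("section")
listing = doc.find(".//programlisting")
print(section.get(f"{{{DELTAXML_NS}}}key"))  # intro
print(listing.get(XML_SPACE))                # preserve
print(tokens(listing))                       # ['a = 1', 'b = 2']
print(tokens(doc.find(".//para")))           # ['Some', 'text', 'here.']
```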

4.2. Processing of generated document

The generated result is processed as follows:

  • Add revisionflag attributes.
  • Wrap changed portions of text in a phrase element with the appropriate revisionflag attribute. This is one of the intended uses of the phrase element.
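To illustrate the phrase-wrapping step, the following sketch builds one paragraph whose changed words are wrapped in phrase elements. It substitutes Python's difflib word-level SequenceMatcher for DeltaXML's own comparison, so its alignment of changes may differ from the tool's output; marking the containing element as changed is also an assumption, and mark_word_changes is a hypothetical name.

```python
import difflib
import xml.etree.ElementTree as ET


def mark_word_changes(tag: str, old_text: str, new_text: str) -> ET.Element:
    """Compare two versions of an element's text word by word and return a
    new element whose changed words are wrapped in <phrase revisionflag=...>.

    difflib stands in for DeltaXML's comparison: deletions become
    revisionflag="deleted", insertions revisionflag="added", and a
    replacement becomes a deleted phrase followed by an added phrase.
    """
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    pieces = []  # (revisionflag or None, text) pairs in document order
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append((None, " ".join(new_words[j1:j2])))
        if op in ("delete", "replace"):
            pieces.append(("deleted", " ".join(old_words[i1:i2])))
        if op in ("insert", "replace"):
            pieces.append(("added", " ".join(new_words[j1:j2])))

    result = ET.Element(tag)

    def add_text(text: str) -> None:
        # Mixed content: text before the first child lives in .text,
        # text after a child lives in that child's .tail.
        if len(result) == 0:
            result.text = (result.text or "") + text
        else:
            result[-1].tail = (result[-1].tail or "") + text

    for index, (flag, text) in enumerate(pieces):
        if index:
            add_text(" ")  # restore the space between adjacent pieces
        if flag is None:
            add_text(text)
        else:
            ET.SubElement(result, "phrase", revisionflag=flag).text = text

    # ASSUMPTION: flag the containing element as changed if anything differs.
    if any(flag for flag, _ in pieces):
        result.set("revisionflag", "changed")
    return result


para = mark_word_changes("para", "The quick brown fox jumps",
                         "The quick red fox leaps over")
# "brown" -> "red" and "jumps" -> "leaps over" each become a deleted
# phrase followed by an added phrase.
print(ET.tostring(para, encoding="unicode"))
```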

5. Generating Output With Change Marking

The resulting DocBook document now contains the information you need to highlight the changes between the two documents when you produce viewable output. All that is needed is an adapted version of the stylesheets normally used to produce output. DocBook XSL [4] is a set of stylesheets commonly used to process DocBook into a presentable format, whether HTML (single page or chunked) or XSL-FO (for processing into PDF). At the time of writing, the current version of DocBook XSL is 1.76.1.

5.1. HTML Output

DocBook XSL includes a stylesheet called changebars.xsl (you can find it in the html directory). If the DocBook result from the comparison above is processed using this stylesheet, you can produce HTML output that highlights added, deleted or modified sections with different background colours (these are not strictly change bars, but the output is still useful).

You may wish to try out our online DocBook Compare Demo [1] or, if you have Saxon available, use the following example command:

java -jar saxon.jar -o result.html docbook-result.xml /path/to/docbook-xsl-1.72.0/html/changebars.xsl
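If the HTML step is part of a scripted build, the same command can be run from Python. This minimal sketch only rebuilds the argument list of the example command; the saxon.jar location and the DocBook XSL path are placeholders you would adjust, and the helper names are hypothetical.

```python
import subprocess


def saxon_changebars_command(
    source: str,
    output: str,
    saxon_jar: str = "saxon.jar",  # placeholder location of the Saxon jar
    docbook_xsl: str = "/path/to/docbook-xsl-1.72.0",  # placeholder install dir
) -> list[str]:
    """Return the example Saxon command line as an argument list."""
    stylesheet = f"{docbook_xsl}/html/changebars.xsl"
    return ["java", "-jar", saxon_jar, "-o", output, source, stylesheet]


def render_html(source: str, output: str, **paths: str) -> None:
    """Run the transformation; raises CalledProcessError if Saxon fails."""
    subprocess.run(saxon_changebars_command(source, output, **paths), check=True)


# Example: render_html("docbook-result.xml", "result.html",
#                      docbook_xsl="/opt/docbook-xsl")
```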

5.2. PDF Output

If you are producing PDF from your DocBook, you will most likely use DocBook XSL to generate XSL-FO and then process this to produce a PDF file. It is possible to add a customization layer to DocBook XSL that processes DocBook documents containing revision flags into XSL-FO with change marking, and we have produced an implementation of this. When it was under development, XSL-FO version 1.1 was still only a W3C Candidate Recommendation, so change bars in XSL-FO could only be produced with renderer-specific extensions. We chose RenderX [6] extensions because their change bar implementation best suited our needs. However, you can edit the stylesheets to use any change bar implementation you wish.

You can download our implementation of DocBook to XSL-FO with changebars [2]. Note that you will also need a copy of DocBook XSL [4] to use it.

It provides the following features:

  • Adds colour and text-styling to added, deleted and modified text.
  • Adds changebars to the page where changes occur.
  • Is fully configurable and has parameters for configuring all of the above.
  • Can be integrated with your existing DocBook XSL customization.
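The colour and text-styling feature can be pictured as a mapping from each revisionflag value to XSL-FO formatting properties. The sketch below applies such a mapping to a single DocBook phrase. It is written in Python for illustration rather than as an XSLT customization layer, the colours and decorations are hypothetical defaults rather than the parameters of our downloadable stylesheets, and change bars are not covered.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# HYPOTHETICAL styling per revisionflag value; the downloadable stylesheets
# define their own configurable parameters, which are not reproduced here.
STYLE_FOR_FLAG = {
    "added": {"color": "green", "text-decoration": "underline"},
    "deleted": {"color": "red", "text-decoration": "line-through"},
    "changed": {"color": "blue", "font-style": "italic"},
}


def phrase_to_fo_inline(phrase: ET.Element) -> ET.Element:
    """Render a DocBook <phrase> as an fo:inline with change styling.

    A missing flag or revisionflag="off" yields an unstyled fo:inline.
    Only the phrase's own text is copied; nested markup is ignored.
    """
    inline = ET.Element(f"{{{FO_NS}}}inline")
    style = STYLE_FOR_FLAG.get(phrase.get("revisionflag", "off"), {})
    for prop, value in style.items():
        inline.set(prop, value)
    inline.text = phrase.text
    return inline


ET.register_namespace("fo", FO_NS)
deleted = ET.fromstring('<phrase revisionflag="deleted">brown</phrase>')
# Serializes as an fo:inline with color="red" and text-decoration="line-through".
print(ET.tostring(phrase_to_fo_inline(deleted), encoding="unicode"))
```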

We have tested the following versions of DocBook, DocBook XSL and RenderX:

  • DocBook version 4.5
  • DocBook XSL version 1.72.0
  • RenderX version 4.7

It is also possible to use the stylesheets with RenderX changebars disabled to produce XSL-FO that has changes marked with text colouring and styling. This customization is also included in our DocBook Compare demo [1]. This could then be processed into PDF using another renderer such as Apache FOP.

5.3. Support

We welcome feedback on your experience of using these stylesheets. However, we cannot guarantee to fix any problems you may encounter as our stylesheets rely heavily on third-party software. Please contact us with any comments or suggestions.