How to Compare DocBook
Table of Contents
- 1. Introduction
- 2. DocBook
- 3. Change Information in DocBook
- 4. Using DeltaXML Core to Compare DocBook
- 5. Generating Output With Change Marking
- 6. References
1. Introduction
This how-to guide explains how DeltaXML can be used to generate DocBook revisionflag data, which can then be used to produce inline change highlighting (red/green, strike-through/italics) and change bar output.
2. DocBook
DocBook [3] is an Open Source SGML and XML vocabulary, describing a document format that allows easy management of documentation. It can be used to create a wide variety of types of documentation, including articles, books, book sets, etc.
Out of the box, DocBook provides a common vocabulary. What it does not provide by default is a production system for converting DocBook into HTML, RTF, PDF and so on. However, the DocBook XSL Stylesheets [4] do exactly this. Combined with an appropriate transformation processor, they provide a way of generating different types of output from DocBook documents, such as HTML/XHTML and XSL-FO documents. The XSL-FO documents can then be processed with a Formatting Objects ('FO') processor/renderer (such as XEP [5] or FOP [6]) to produce PDF files.
This document describes a three-stage toolchain for processing DocBook with change identification:
- A DeltaXML Core pipeline for DocBook takes two DocBook files as input and generates a DocBook file with revisionflags.
- With the extensions that are available for download [2], docbook-xsl can generate XSL-FO with appropriate change identification.
- An appropriate renderer can convert the XSL-FO into formats such as PDF.
3. Change Information in DocBook
It is not uncommon to need to track the changes between one version of a document and another. For example, many word processors, such as Microsoft Word and OpenOffice Writer, can highlight changes with visual cues such as lines in the page margins (change bars), colour changes to the text, strike-through or underlining.
DocBook documents are no exception. For example, if you have used DocBook to write a user manual for a software tool, you may wish to review the changes made to the manual between version 1 and version 2.
DocBook provides the 'revisionflag' attribute to track changes within a DocBook document. The revisionflag attribute indicates the revision status of an element. The default for this (optional) attribute is that the DocBook element hasn't been revised. The revisionflag attribute is one of the common attributes in DocBook that occur on all elements (others include ID, XrefLabel etc.) and thus all elements can be given revision information. The revisionflag attribute has an enumerated set of values, which are:
- changed
- added
- deleted
- off
Note that this implies that the revisionflag attribute is only intended to highlight changes between two specific versions of a document. It is not intended to provide full document version information.
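As a rough illustration of how revisionflag data can be inspected, the following Python sketch counts the revisionflag values in a small DocBook fragment. The fragment and the function name count_revisionflags are invented for this example and are not part of DocBook or DeltaXML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical DocBook 4 fragment, invented purely for illustration.
SAMPLE = """<section>
  <title>Installation</title>
  <para revisionflag="changed">Run the <phrase revisionflag="added">new </phrase>installer.</para>
  <para revisionflag="deleted">Insert the installation disk.</para>
  <para>Restart the machine.</para>
</section>"""


def count_revisionflags(xml_text):
    """Count how often each revisionflag value occurs in a document."""
    root = ET.fromstring(xml_text)
    return Counter(
        el.get("revisionflag")
        for el in root.iter()
        if el.get("revisionflag") is not None
    )


counts = count_revisionflags(SAMPLE)
print(dict(counts))
```

Elements without the attribute, such as the last para, are simply treated as unrevised, which matches the attribute's default.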
The revisionflag attribute can be added to elements manually as part of a document update process. However, this is very laborious and error-prone, as the author may forget to update the attribute or may use the wrong value. Additionally, because manually added revisionflags only highlight changes from one version of a document to the next, you cannot see changes over several versions or between major versions. These problems are solved by using DeltaXML Core to add the revisionflags automatically to show changes between any two versions.
4. Using DeltaXML Core to Compare DocBook
It is possible to use DeltaXML Core to compare two versions of a DocBook document and produce a third DocBook document which contains all of the elements in the two inputs but with revisionflag attributes to show changes. You can do this using the following command:
java -jar command.jar compare docbook4 docbook1.xml docbook2.xml docbook-result.xml
This uses a predefined pipeline, specified in the DXP file compare-docbook.dxp. This pipeline can be used directly with the command line tool as above, or the DXP pipeline can be used in Java code to configure a PipelinedComparator; see the Guide to DeltaXML Pipeline Configuration for more details.
The DocBook pipeline (compare.dxp) applies a pre-filter (docbook-infilter.xsl) to the inputs and a post-filter (docbook-outfilter.xsl) to the delta file. These filters are provided in the samples/xsl-filters folder of the DeltaXML Core zip file, and the tasks they perform are described below. Other pipeline options include word-by-word comparison (on by default), output DocType and output indentation. It is also possible to use this pipeline with our online DocBook comparison service [1].
4.1 DocBook Infilter
This filter performs the following tasks:
- Copies all id attributes to a new attribute, deltaxml:key. This helps the comparison algorithm to ensure that it is comparing the correct elements against each other.
- Adds the xml:space="preserve" attribute to all programlisting, literallayout and screen elements. This is because white space within these elements is significant and should not be removed by other filters you may choose to use, such as NormalizeSpace.
- Splits the contents of programlisting, literallayout and screen into lines so that their content can be compared on a line-by-line basis. This is achieved by wrapping each line in a deltaxml:line element that can be removed later on in the post-filter.
- Adds the deltaxml:word-by-word="true" attribute to all elements that are not compared on a line-by-line basis so that they can be compared on a word-by-word basis if desired. This requires the use of three more filters, WordByWordInfilter, WordByWordOutfilter1 and WordByWordOutfilter2, but can make the resultant file more specific in the changes that it identifies.
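The tasks above can be sketched in Python to make the transformation concrete. This is not the shipped docbook-infilter.xsl, only a simplified approximation: the namespace URI bound to the deltaxml prefix is an assumption that should be checked against your DeltaXML Core release, verbatim elements are assumed to contain plain text only, and the function name infilter is invented for this example.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI for the deltaxml prefix; verify against your release.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# DocBook 4 elements are not namespaced, so plain tag names are compared.
VERBATIM = {"programlisting", "literallayout", "screen"}

ET.register_namespace("deltaxml", DELTAXML_NS)


def infilter(root):
    """Approximate the four pre-filter tasks on an ElementTree element."""
    # Snapshot the element list because the loop adds new children.
    for el in list(root.iter()):
        # Task 1: copy id to deltaxml:key.
        if el.get("id") is not None:
            el.set(f"{{{DELTAXML_NS}}}key", el.get("id"))
        if el.tag in VERBATIM:
            # Task 2: mark white space as significant.
            el.set(f"{{{XML_NS}}}space", "preserve")
            # Task 3: wrap each line in deltaxml:line (text-only content assumed).
            if len(el) == 0 and el.text:
                lines = el.text.splitlines(keepends=True)
                el.text = None
                for line in lines:
                    wrapper = ET.SubElement(el, f"{{{DELTAXML_NS}}}line")
                    wrapper.text = line
        else:
            # Task 4: request word-by-word comparison elsewhere.
            el.set(f"{{{DELTAXML_NS}}}word-by-word", "true")
    return root


doc = ET.fromstring(
    '<section id="s1"><para>Some text.</para>'
    "<programlisting>a = 1\nb = 2\n</programlisting></section>"
)
infilter(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Keeping the line endings inside each deltaxml:line wrapper means that removing the wrappers later restores the original text unchanged.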
4.2 DocBook Outfilter
This filter performs the following tasks:
- Removes deltaxml:line elements that were added to programlisting, literallayout and screen.
- Converts deltaxml:delta attributes into the appropriate revisionflag attribute.
- Wraps changed portions of text in a phrase element with the appropriate revisionflag attribute. This is one of the intended uses of the phrase element.
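The attribute conversion step can be illustrated with a short Python sketch. It is a simplified stand-in for docbook-outfilter.xsl, not the filter itself: the namespace URI, the deltaxml:delta values (add, delete, WFmodify, unchanged) and their mapping to revisionflag values are assumptions made for illustration, and the unwrapping of deltaxml:line elements and the phrase wrapping are omitted.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: namespace URI for the deltaxml prefix; verify against your release.
DELTAXML_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_ATTR = f"{{{DELTAXML_NS}}}delta"

# ASSUMPTION: illustrative delta values and their revisionflag equivalents.
DELTA_TO_REVISIONFLAG = {
    "add": "added",
    "delete": "deleted",
    "WFmodify": "changed",
}


def convert_delta_attributes(root):
    """Replace deltaxml:delta attributes with DocBook revisionflag attributes."""
    for el in root.iter():
        # Remove the delta attribute whether or not it maps to a flag.
        delta = el.attrib.pop(DELTA_ATTR, None)
        flag = DELTA_TO_REVISIONFLAG.get(delta)
        if flag is not None:
            el.set("revisionflag", flag)
    return root


delta_doc = ET.fromstring(
    f'<section xmlns:deltaxml="{DELTAXML_NS}" deltaxml:delta="WFmodify">'
    '<para deltaxml:delta="add">New paragraph.</para>'
    '<para deltaxml:delta="unchanged">Same as before.</para>'
    "</section>"
)
convert_delta_attributes(delta_doc)
print(ET.tostring(delta_doc, encoding="unicode"))
```

Unchanged elements receive no revisionflag at all, relying on the attribute's default meaning of "not revised".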
We have observed a number of issues when using these filters with Xalan-J. One issue is stack overflow errors in interpreted Xalan-J; another is a problem with XSLT pattern matching in XSLTC (the default Transformer used in J2SE 5.0). We recommend the use of the Saxon XSLT processor with these filters.
5. Generating Output With Change Marking
The resultant DocBook document that you have now produced contains the information you need to highlight the changes between the two documents when you produce a viewable output document. All that is needed is an adapted version of the stylesheets normally used to produce output. DocBook XSL [4] is a set of stylesheets that is commonly used to process DocBook into a presentable format, whether that is HTML (single page and chunked) or XSL-FO (for processing into PDF). At the time of writing, the current version of DocBook XSL is 1.72.0.
5.1 HTML Output
DocBook XSL 1.72.0 includes a stylesheet called changebars.xsl (you can find it in the html directory). If the DocBook result from the pipeline above is processed using this stylesheet, you can produce HTML output that highlights changes using different background colours for added, deleted or modified sections (these are not strictly change bars, but the output format is still useful).
You may wish to try out our online DocBook service [1]. Alternatively, once DeltaXML Core and Saxon are installed and set up, the following example command could be used:
java -jar saxon.jar -o result.html docbook-result.xml /path/to/docbook-xsl-1.72.0/html/changebars.xsl
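The comparison command from section 4 and the Saxon command above can be chained from a script. The following Python sketch builds both command lines exactly as they appear in this guide; the function names build_commands and run_toolchain are invented for this example, and the jar locations and the /path/to/docbook-xsl-1.72.0 directory are placeholders to adjust for your installation.

```python
import shlex
import subprocess


def build_commands(old_doc, new_doc, delta_doc, html_out, docbook_xsl_dir):
    """Return the two argument lists: DocBook comparison, then HTML rendering."""
    compare_cmd = [
        "java", "-jar", "command.jar", "compare", "docbook4",
        old_doc, new_doc, delta_doc,
    ]
    html_cmd = [
        "java", "-jar", "saxon.jar", "-o", html_out,
        delta_doc, f"{docbook_xsl_dir}/html/changebars.xsl",
    ]
    return compare_cmd, html_cmd


def run_toolchain(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_commands(
    "docbook1.xml", "docbook2.xml", "docbook-result.xml",
    "result.html", "/path/to/docbook-xsl-1.72.0",
)
for cmd in commands:
    print(shlex.join(cmd))
# Call run_toolchain(commands) once Java and both jar files are available.
```

Passing the commands as argument lists rather than a single shell string avoids quoting problems when file names contain spaces.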
5.2 PDF Output
If you are producing PDF from your DocBook, it is likely that you will use DocBook XSL to generate XSL-FO and then process this to produce a PDF file. It is possible to add a customization layer to DocBook XSL that can be used to process DocBook documents containing revision flags to produce XSL-FO with change marking. We have produced an implementation of this. While it was under development, XSL-FO version 1.1 was still only a W3C Candidate Recommendation, so change bars in XSL-FO were only possible with renderer-specific extensions. We chose to use the RenderX [5] extensions because their change bar implementation best suited our needs. However, it is possible to edit the stylesheets to use any change bar implementation that you wish.
You can download our implementation of DocBook to XSL-FO with changebars [2]. Please note, you will also need to have a version of DocBook XSL [4] available to use this.
It provides the following features:
- Adds colour and text-styling to added, deleted and modified text.
- Adds changebars to the page where changes occur.
- Is fully configurable and has parameters for configuring all of the above.
- Can be integrated with your existing DocBook XSL customization.
We have tested the following versions of DocBook, DocBook XSL and RenderX:
- DocBook version 4.5
- DocBook XSL version 1.72.0
- RenderX version 4.7
It is also possible to use the stylesheets with RenderX changebars disabled to produce XSL-FO that has changes marked with text colouring and styling. This XSL-FO could then be processed into PDF using another renderer such as Apache FOP [6]. This customization is also included in our DocBook service [1].
5.3 Support
We welcome feedback on your experience of using these stylesheets. However, we cannot guarantee to fix any problems you may encounter as our stylesheets rely heavily on third-party software. Please contact us with any comments or suggestions.
6. References
- [1] DeltaXML DocBook Service: http://www.deltaxml.com/free/docbook/
- [2] DeltaXML DocBook XSL customizations: http://www.deltaxml.com/library/downloads.html
- [3] DocBook: The Definitive Guide, online book with definitions of all the DocBook elements: http://www.docbook.org/tdg/en/html/docbook.html
- [4] DocBook XSL Stylesheets: http://docbook.sourceforge.net/projects/xsl/ (documentation on the stylesheets: http://wiki.docbook.org/topic/DocBookXslStylesheetDocs)
- [5] RenderX's XEP: http://www.renderx.com/tools/xep.html
- [6] Apache FOP: http://xmlgraphics.apache.org/fop/