How to Use DeltaXML Markup with XHTML

(Version 1)

This version:

http://www.deltaxml.com/xhtml/intro-v1.html dated 12th February 2002

Previous versions:

None

Editor:

Robin La Fontaine, Monsell EDM Ltd.

Copyright © 2002 Monsell EDM Ltd All rights reserved.

Contents

Introduction

Guidelines for best results

Changes to Tables

Example of Forms

XHTML Filters

Conclusions

Introduction

This document explains how DeltaXML may be used to find changes between XHTML files and to display these changes as XHTML so that they can be viewed in a browser. DeltaXML will not work with HTML files directly because HTML files are generally not well-formed XML. However, once they have been converted to XHTML, e.g. using Tidy, DeltaXML can be applied and provides a very flexible way to display changes in electronic documents.

The DeltaXML web site provides a demonstration of this capability, but it is important to understand that the demonstration shows only one way to use it. If it does not do exactly what you need, it can be changed to meet your needs, because XSL stylesheets are used as input and output filters to produce the result shown. By changing these filters, changes can be displayed in many different ways.

The current demonstration shows basic textual differencing; it does not, for example, show changes to text attributes. However, by changing the output filter using standard XSL, such changes could be shown to the user. Using XML, in the form of XHTML, together with DeltaXML is therefore far more powerful than using a simple HTML differencer. The demonstration is based on XHTML Transitional; it is a demonstration only and may not work for all XHTML files.

Guidelines for best results

The DeltaXML comparison process can be controlled to ensure that the 'same' items are matched up in the two files being compared. This is particularly important where extensive changes have been made. For example, if a number of paragraphs have been deleted, new ones added and existing ones changed, then it is almost impossible for an automated process to find the correct matches between the original and the new document. There is a simple solution to this: ID attributes can be added to identify the sections and paragraphs, and these will be used by DeltaXML in matching the two files.

The first principle is therefore to use ID attributes as extensively as possible to enable the best results. These can be added to headings and paragraphs. XHTML and XML require IDs to be unique within the file, but DeltaXML only requires keys to be unique within their parent element. This means that if you have some other way to identify the elements, you can use that instead of ID attributes: all that is needed is to change the XSL input filter so that it copies these values into a deltaxml:key attribute.

To illustrate this effect, consider the two examples below.

Result without using keys

Para 1a: These paragraphs have no keys.

Para 1b: Therefore it is difficult to detect when a paragraph has been changed and when it has been deleted and a new one added.

Para 1c: This paragraph will be deleted, and a new one added after the next paragraph.

Para 1d: This paragraph will be modified.

Result using keys

Para 1a: These paragraphs do have keys.

Para 1b: Therefore it is easy to detect when a paragraph has been changed and when it has been deleted and a new one added.

Para 1c: This paragraph will be deleted, and a new one added after the next paragraph.

Para 1d: This paragraph will be modified.
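
The difference between the two results can be illustrated with a small Python sketch. This is not DeltaXML's algorithm: matching by position is a deliberately simplified stand-in for unkeyed comparison, and the function names, key values (p1a, p1b, ...) and the replacement paragraph text are all hypothetical. It only shows why a key makes "deleted", "added" and "modified" unambiguous.

```python
def match_by_position(old, new):
    """Pair paragraphs by index (simplified stand-in for unkeyed matching)."""
    statuses = []
    for i in range(max(len(old), len(new))):
        if i >= len(new):
            statuses.append("deleted")
        elif i >= len(old):
            statuses.append("added")
        elif old[i][1] == new[i][1]:
            statuses.append("unchanged")
        else:
            statuses.append("modified")
    return statuses


def match_by_key(old, new):
    """Pair paragraphs by key; returns a dict mapping key -> status."""
    old_by_key = dict(old)
    new_by_key = dict(new)
    statuses = {}
    for key, _text in old:
        if key not in new_by_key:
            statuses[key] = "deleted"
    for key, text in new:
        if key not in old_by_key:
            statuses[key] = "added"
        elif old_by_key[key] == text:
            statuses[key] = "unchanged"
        else:
            statuses[key] = "modified"
    return statuses


# Each paragraph is a (key, text) pair; keys and texts are illustrative only.
old_doc = [
    ("p1a", "These paragraphs do have keys."),
    ("p1b", "Therefore it is easy to detect changes."),
    ("p1c", "This paragraph will be deleted."),
    ("p1d", "This paragraph will be modified."),
]
new_doc = [
    ("p1a", "These paragraphs do have keys."),
    ("p1b", "Therefore it is easy to detect changes."),
    ("p1d", "This paragraph has been modified."),
    ("p1e", "This is a new paragraph."),
]

positional = match_by_position(old_doc, new_doc)
keyed = match_by_key(old_doc, new_doc)
```

In this sketch the positional comparison reports the last two paragraphs as modified, because it pairs the deleted paragraph with the changed one. The keyed comparison reports one deletion, one modification and one addition.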

Changes to Tables

DeltaXML will in many cases be able to detect where rows and columns have been added to tables, or where the text has been changed. In some cases formatting may be lost, for example where the <col> and <colgroup> elements have been used.

Consider a small table with some text as shown below.

Problem | Solution | Comment
Identifying the corresponding paragraphs in the two documents | Use ID attributes to enable DeltaXML to match these | You can use other attributes if you prefer provided they are unique within the parent element
Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL | You can do this selectively

Table 1

In this version of this document, the title line above is in normal text. In a modified version the title is in bold.

We will copy the same table again below, and in the second version make some modifications to the text.

Problem | Solution | Comment
Identifying the corresponding paragraphs in the two documents | Use ID attributes to enable DeltaXML to match these | You can use other attributes if you prefer provided they are unique within the parent element
Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL | You can do this selectively

Table 2

Again, we will copy the table and then remove a column.

Problem | Solution | Comment
Identifying the corresponding paragraphs in the two documents | Use ID attributes to enable DeltaXML to match these | You can use other attributes if you prefer provided they are unique within the parent element
Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL | You can do this selectively

Table 3

Example of Forms

Forms are more complex to handle because some of the items, for example each option in a menu, cannot contain underlined text. These items therefore need to be processed in a special way. Here we have chosen to flag the changes using -[[xx]]- to indicate a deletion and +[[xx]]+ to indicate an addition.
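
The flagging convention can be sketched in a few lines of Python. The function name and its arguments are hypothetical, and the demonstration itself does this in its XSL output filter rather than in Python. The sketch only shows how an option's plain text might be rewritten when markup such as underlining is not available.

```python
def flag_option_change(old_text, new_text):
    """Render a menu option as plain text with change flags.

    old_text is None when the option was added;
    new_text is None when the option was deleted.
    """
    if old_text == new_text:
        return new_text or ""
    parts = []
    if old_text:
        parts.append("-[[" + old_text + "]]-")  # deletion marker
    if new_text:
        parts.append("+[[" + new_text + "]]+")  # addition marker
    return " ".join(parts)


unchanged = flag_option_change("Mr", "Mr")
deleted = flag_option_change("Miss", None)
added = flag_option_change(None, "Dr")
changed = flag_option_change("Good", "Very good")
```

A changed option is shown here as a deletion followed by an addition. Other conventions are equally possible.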

This is an example of a simple form which will be modified.

Form1 Version 1

Enter your name:

Title:

Are you using DeltaXML? Yes

How do you rate DeltaXML?

The above example is not meant to illustrate the best or only way to identify changes - it simply illustrates one way in which the delta file can be processed to show a user how items have changed. Because the delta file represents a structured version of the two files, with changes clearly marked up, it is possible to show almost anything in the output.

XHTML Filters

The process uses input and output filters written in XSL. These apply some changes to the files to ensure that DeltaXML works in a way that is appropriate to XHTML.

XHTML Input filter

The input filter adds keys to the XHTML file based on the ID attributes. It also indicates to DeltaXML that the <head> element is unordered, i.e. its child elements may appear in any order in the two files. To assist in matching up corresponding elements, keys are added to each <meta> tag based on the value of its 'name' attribute. The filter also deletes <col> and <colgroup> elements so that changes to tables can be represented without causing errors in the XHTML file.
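
The steps above can be sketched as follows. The real filter is an XSL stylesheet; this is a Python approximation using the standard library, given only to make the four steps concrete. The namespace URI and the "ordered" attribute name are placeholder assumptions, not taken from the DeltaXML documentation, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: placeholder namespace URI and attribute names for illustration;
# consult the DeltaXML documentation for the real ones.
DELTA_NS = "urn:example:deltaxml"
KEY_ATTR = "{%s}key" % DELTA_NS
ORDERED_ATTR = "{%s}ordered" % DELTA_NS


def xhtml_tag(name):
    """Return the namespace-qualified tag name used by ElementTree."""
    return "{%s}%s" % (XHTML_NS, name)


def apply_input_filter(root):
    """Apply the four input-filter steps to an XHTML tree, in place."""
    # Step 1: delete <col> and <colgroup> elements.
    dropped = {xhtml_tag("col"), xhtml_tag("colgroup")}
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in dropped:
                parent.remove(child)
    for element in root.iter():
        # Step 2: mark <head> as unordered.
        if element.tag == xhtml_tag("head"):
            element.set(ORDERED_ATTR, "false")
        # Step 3: key each <meta> on its 'name' attribute.
        if element.tag == xhtml_tag("meta") and "name" in element.attrib:
            element.set(KEY_ATTR, element.get("name"))
        # Step 4: key every other element that has an ID on that ID.
        elif "id" in element.attrib:
            element.set(KEY_ATTR, element.get("id"))
    return root


doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta name="author" content="x"/><title>t</title></head>'
    '<body><table><colgroup><col/></colgroup><tr><td>a</td></tr></table>'
    '<p id="intro">Hello</p></body></html>'
)
apply_input_filter(doc)
```

To key elements on some attribute other than ID, only the last step would need to change.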

XHTML Output filter

The output filter, again written in XSL, converts the delta file into standard XHTML. This is achieved by processing each item according to one of the following

Note that within some elements, e.g. <style>, we do not do this: we simply output either just the new text or both the added and the deleted text. This behaviour can be changed according to actual needs. For attributes, the values in the 'new' file are used and there is no indication in the output file if these have changed. Again, the XSL could be modified so that changed attribute values are shown to the user.

Conclusions

The examples shown above illustrate how DeltaXML can be applied to show changes to XHTML files. Using XSL, the process can be configured to ignore changes of some types of data, e.g. text characteristics, and to show changes in any way a user may require. This therefore provides a much more flexible solution than a simple HTML differencer might provide.

Copyright © 2002 Monsell EDM Ltd All rights reserved. Patent applied for.