Using DeltaXML Markup with XHTML

(Version 2)

This version:

http://www.deltaxml.com/xhtml/intro-v2.html dated 14th February 2002

Previous versions:

http://www.deltaxml.com/xhtml/intro-v1.html dated 12th February 2002

Editor:

Robin La Fontaine, Monsell EDM Ltd.

Copyright © 2002 Monsell EDM Ltd All rights reserved.

Contents

Introduction

Guidelines for Best Results

Changes to Tables

Example of Forms

XHTML Filters

Conclusions

Introduction

This document explains how DeltaXML can be used to find changes between XHTML files and to display those changes as XHTML so that they can be viewed in a browser. DeltaXML will not work with HTML files directly because HTML files are generally not well-formed XML. However, if they are first converted to XHTML, e.g. using Tidy, then DeltaXML can be applied, and it provides a very flexible way to display changes in electronic documents.
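An XML parser rejects markup that is not well-formed, which is why HTML must be converted before it can be compared. The following is a minimal sketch in Python using only the standard library parser, not DeltaXML itself; the function name and the sample fragments are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML: <br> and <p> are left unclosed, which XML does not allow.
html_fragment = "<body><p>First line<br>Second line</body>"

# The same content as XHTML: every element is explicitly closed.
xhtml_fragment = "<body><p>First line<br/>Second line</p></body>"

print(is_well_formed(html_fragment))   # False
print(is_well_formed(xhtml_fragment))  # True
```

A check of this kind can be run on the output of the conversion step to confirm that a file is ready for comparison.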

The DeltaXML web site provides a demonstration of this capability, but it is important to understand that the demonstration shows only one way to use it. If it does not do exactly what you need, it can be changed: XSL stylesheets are used as input and output filters to produce the result shown, and by changing these filters, changes can be displayed in many different ways.

The current demonstration on the DeltaXML web site shows basic textual differencing; it does not, for example, show changes to text attributes. However, by changing the output filter using standard XSL it would be possible to show such changes to the user. Using XML, in the form of XHTML, with DeltaXML therefore gives a far more powerful result than a simple HTML differencer. The demonstration is based on "XHTML Transitional"; it is a demonstration only and may not work for all XHTML files.

Guidelines for Best Results

The DeltaXML comparison process can be controlled to ensure that the 'same' items are matched up in the two files being compared. This is particularly important where extensive changes have been made. For example, if a number of paragraphs have been deleted and new ones added, with existing ones changed, then it is almost impossible for an automated process to find the correct matches between the original and the new document. There is a simple solution to this: ID attributes can be added to identify the sections and paragraphs and these will be used by DeltaXML in matching the two files.

The first principle is therefore to use ID attributes as extensively as possible to enable the best results. These can be added to headings and paragraphs. XHTML and XML require these to be unique within the file but DeltaXML only requires keys to be unique within their parent element. This means that if you have some other way to identify the elements then you can use that instead of ID attributes - all that is needed is to change the XSL input filter to copy these values into a deltaxml:key attribute.
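The key-copying step can be sketched in Python as follows. This is an illustration of the idea only, not the XSL input filter itself: the namespace URI is a placeholder, the `title` attribute is an arbitrary stand-in for "some other way to identify the elements", and both function names are hypothetical. The second function checks the weaker uniqueness rule, that keys need only be unique among the children of one parent.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI for the deltaxml prefix; real files should use
# the URI given in the DeltaXML documentation.
DELTAXML_NS = "http://example.com/deltaxml-placeholder"
KEY_ATTR = f"{{{DELTAXML_NS}}}key"
ET.register_namespace("deltaxml", DELTAXML_NS)


def add_keys(root, source_attr="id"):
    """Copy source_attr into deltaxml:key wherever it is present; return the count."""
    count = 0
    for elem in root.iter():
        value = elem.get(source_attr)
        if value is not None:
            elem.set(KEY_ATTR, value)
            count += 1
    return count


def sibling_key_clashes(root):
    """Return (parent tag, key) pairs where two children of one parent share a key."""
    clashes = []
    for parent in root.iter():
        seen = set()
        for child in parent:
            key = child.get(KEY_ATTR)
            if key is None:
                continue
            if key in seen:
                clashes.append((parent.tag, key))
            seen.add(key)
    return clashes


# "summary" appears twice, but under different parents, so it is acceptable.
doc = ET.fromstring(
    '<body><div title="s1"><p title="summary">A</p></div>'
    '<div title="s2"><p title="summary">B</p></div></body>'
)
keyed = add_keys(doc, source_attr="title")
print(keyed)                     # 4
print(sibling_key_clashes(doc))  # []
```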

To illustrate this effect, consider the two examples below.

Result without using keys

Para 1a: These paragraphs have no keys.

Para 1b: Therefore it is difficult to detect when a paragraph has been changed and when it has been deleted and a new one added.

Para 1d: This paragraph will be modified - like this.

Para 2a: This paragraph has been added in version 2.

Result using keys

Para 1a: These paragraphs do have keys.

Para 1b: Therefore it is easy to detect when a paragraph has been changed and when it has been deleted and a new one added.

Para 1d: This paragraph will be modified - like this.

Para 2a: This paragraph has been added in version 2.
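The difference between the two results can be illustrated with a toy model. The sketch below is not DeltaXML's comparison algorithm: it simply pairs paragraphs by position when there are no keys and by key when there are. The keys p1a to p2a, the paragraph texts, and the assumption that a paragraph 1c was deleted in version 2 are all hypothetical.

```python
def diff_by_position(old, new):
    """Pair paragraphs by index, as a naive comparison without keys might."""
    changes = []
    for i in range(max(len(old), len(new))):
        if i >= len(old):
            changes.append(("added", new[i]))
        elif i >= len(new):
            changes.append(("deleted", old[i]))
        elif old[i] != new[i]:
            changes.append(("modified", new[i]))
    return changes


def diff_by_key(old, new):
    """Pair paragraphs by key; old and new map key -> text."""
    changes = []
    for key, text in old.items():
        if key not in new:
            changes.append(("deleted", key))
        elif new[key] != text:
            changes.append(("modified", key))
    for key in new:
        if key not in old:
            changes.append(("added", key))
    return changes


old_keyed = {"p1a": "Intro.", "p1b": "Detail.",
             "p1c": "To be removed.", "p1d": "Will change."}
new_keyed = {"p1a": "Intro.", "p1b": "Detail.",
             "p1d": "Will change - like this.", "p2a": "New in version 2."}

# Without keys: the deletion shifts everything, so two "modifications" appear.
print(diff_by_position(list(old_keyed.values()), list(new_keyed.values())))
# With keys: one deletion, one modification and one addition are reported.
print(diff_by_key(old_keyed, new_keyed))
```

In the positional case the deleted paragraph is paired with the modified one, and the modified one with the new one, so neither the deletion nor the addition is recognised.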

Changes to Tables

DeltaXML will in many cases be able to detect where rows and columns have been added to tables, or where the text has been changed. In some cases formatting may be lost, for example where the <col> and <colgroup> elements have been used.

Consider a small table with some text as shown below.

| Problem | Solution | Comment |
|---|---|---|
| Identifying the corresponding paragraphs in the two documents | Use ID attributes to enable DeltaXML to match these | You can use other attributes if you prefer provided they are unique within the parent element |
| Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL | You can do this selectively |

Table 1

In this version of this document, the title line above is in bold text. In the previous version it was in normal text.
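A formatting-only change of this kind can be ignored by stripping the formatting elements from both files before they are compared. The sketch below shows the idea in Python rather than XSL; the function name and the choice of `b`, `i` and `u` as the elements to strip are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

FORMATTING_TAGS = ("b", "i", "u")  # assumption: elements treated as formatting


def strip_formatting(elem, tags=FORMATTING_TAGS):
    """Return a copy of elem in which the given inline elements are unwrapped."""
    new = ET.Element(elem.tag, dict(elem.attrib))
    new.text = elem.text

    def append_text(text):
        # Attach text after the last child, or to the element itself if empty.
        if not text:
            return
        if len(new):
            new[-1].tail = (new[-1].tail or "") + text
        else:
            new.text = (new.text or "") + text

    for child in elem:
        cleaned = strip_formatting(child, tags)
        if child.tag in tags:
            # Keep the content of the formatting element, drop the element.
            append_text(cleaned.text)
            for grandchild in list(cleaned):
                new.append(grandchild)
        else:
            new.append(cleaned)
        append_text(child.tail)
    return new


old_row = ET.fromstring("<tr><th>Problem</th><th>Solution</th></tr>")
new_row = ET.fromstring("<tr><th><b>Problem</b></th><th><b>Solution</b></th></tr>")

print(ET.tostring(old_row) == ET.tostring(new_row))                    # False
print(ET.tostring(old_row) == ET.tostring(strip_formatting(new_row)))  # True
```

Passing a shorter `tags` tuple strips only the selected elements, which corresponds to doing this selectively.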

We will copy the same table again below, and in the second version make some modifications to the text.

| Problem | Solution | Comment |
|---|---|---|
| How can we be sure to identify the corresponding paragraphs in the two documents? | Use ID attributes to enable DeltaXML to match these | You can use other attributes if you prefer provided they are unique within the parent element |
| Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL | You can do this selectively, i.e. for elements you select |

Table 2

Again, we will copy the table and then remove a column.

| Problem | Solution |
|---|---|
| Identifying the corresponding paragraphs in the two documents | Use ID attributes to enable DeltaXML to match these |
| Ignoring changes to text formatting | You can strip out this information from both files as they are read in using XSL |

Table 3

In this version, there are just two columns in the table above rather than three as in the first version.

Example of Forms - Version 2

Forms are more complex to handle because some items, for example each option in a menu, cannot contain underlined text, so they need to be processed in a special way. Here we have chosen to flag the changes using -[[xx]]- to indicate a deletion and +[[xx]]+ to indicate an addition.
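The flagging convention can be sketched as a small Python function. The function name and the sample option values are hypothetical, and showing a modification as a deletion followed by an addition is an assumption rather than a description of the actual output filter.

```python
def flag_change(old, new):
    """Render a change as plain text for items that cannot carry markup.

    old or new is None when the item is absent from that version.
    """
    if old == new:
        return new or ""
    parts = []
    if old:
        parts.append(f"-[[{old}]]-")  # deleted text
    if new:
        parts.append(f"+[[{new}]]+")  # added text
    return " ".join(parts)


print(flag_change("Mr", "Mr"))    # Mr
print(flag_change("Miss", None))  # -[[Miss]]-
print(flag_change(None, "Dr"))    # +[[Dr]]+
print(flag_change("Mrs", "Ms"))   # -[[Mrs]]- +[[Ms]]+
```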

This is an example of a simple form which has been modified.

Changes are to the Title choices:

The "Are you using DeltaXML?" item has been changed from a check-box to radio buttons, with a "No" button added.

The "How do you rate DeltaXML" item has been modified and each item in the checklist has been changed. A default has been added.

Form1 Version 2

Enter your name:

Title:

Are you using DeltaXML? Yes No

How do you rate DeltaXML on a scale of 1-10?

The above example is not meant to illustrate the best or only way to identify changes - it simply illustrates one way in which the delta file can be processed to show a user how items have changed. Because the delta file represents a structured version of the two files, with changes clearly marked up, it is possible to show almost anything in the output.

XHTML Filters

The process uses input and output filters written in XSL. These apply some changes to the files to ensure that DeltaXML works in a way that is appropriate to XHTML.

XHTML Input filter

The input filter adds keys to the XHTML file based on the ID attributes. It also indicates to DeltaXML that the <head> element is unordered, i.e. its child elements may appear in any order in the two files. To assist in matching up corresponding elements, keys are added to each <meta> element based on the value of its 'name' attribute. The filter also deletes <col> and <colgroup> elements so that changes to tables can be represented without causing errors in the XHTML file.
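The steps performed by the input filter can be sketched in Python. This is not the XSL filter used in the demonstration: the deltaxml namespace URI is a placeholder, the attribute name `ordered` used to mark the <head> element as unordered is an assumption, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: placeholder URI; the 'key' and 'ordered' names are illustrative.
DELTAXML_NS = "http://example.com/deltaxml-placeholder"
ET.register_namespace("", XHTML_NS)
ET.register_namespace("deltaxml", DELTAXML_NS)


def q(ns, name):
    """Build a namespace-qualified name in ElementTree's {uri}name form."""
    return f"{{{ns}}}{name}"


def input_filter(root):
    # 1. Add keys based on the id attributes.
    for elem in root.iter():
        if elem.get("id") is not None:
            elem.set(q(DELTAXML_NS, "key"), elem.get("id"))
    # 2. Mark <head> as unordered and key each <meta> by its name attribute.
    for head in root.iter(q(XHTML_NS, "head")):
        head.set(q(DELTAXML_NS, "ordered"), "false")
        for meta in head.findall(q(XHTML_NS, "meta")):
            if meta.get("name") is not None:
                meta.set(q(DELTAXML_NS, "key"), meta.get("name"))
    # 3. Delete <col> and <colgroup> elements.
    unwanted = (q(XHTML_NS, "col"), q(XHTML_NS, "colgroup"))
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return root


sample = f"""<html xmlns="{XHTML_NS}">
<head><title>T</title><meta name="author" content="A"/></head>
<body><h1 id="intro">Intro</h1>
<table><colgroup><col/></colgroup><tr><td>x</td></tr></table></body></html>"""

doc = input_filter(ET.fromstring(sample))
print(ET.tostring(doc, encoding="unicode"))
```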

XHTML Output filter

The output filter, again written in XSL, converts the delta file into standard XHTML. This is achieved by processing each item according to one of the following

Note that within some elements, e.g. <style>, we do not do this: we simply output either just the new text or both the added and the deleted text. This can be changed according to actual needs. For attributes, the values in the 'new' file are used and there is no indication in the output file if these have changed. Again, the XSL could be modified so that changed attribute values are shown to the user.

Conclusions

The examples shown above illustrate how DeltaXML can be applied to show changes to XHTML files. Using XSL, the process can be configured to ignore changes to some types of data, e.g. text characteristics, and to show changes in any way a user may require. This provides a much more flexible solution than a simple HTML differencer.

The second version of this document contains modifications made to illustrate how differences are detected and shown. Both versions were created in Dreamweaver and then converted to XHTML using MacTidy.

Copyright © 2002 Monsell EDM Ltd All rights reserved. Patent applied for.