
DeltaXML Newsletter - May 2005

Welcome to our May newsletter. You may notice that we have introduced a new, more concise format. The aim is to bring you all the company, customer and product news you need in a much shorter form. At the same time we are expanding our technical articles to give more in-depth coverage.

This month we are announcing our new corporate name. Out goes the old name of Monsell EDM and in comes the much more focused DeltaXML Ltd.

In our New Faces section we meet software engineer Tristan Mitchell, who joined DeltaXML's engineering team in March this year.

This month's Technical Corner looks at comparing DocBook documents using DeltaXML.

    - The DeltaXML Team.

Contents

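In this newsletter:

    What's in a Name? Monsell EDM Ltd becomes DeltaXML Ltd
    New Faces at DeltaXML: Tristan Mitchell, Software Engineer
    Technical Corner: Highlighting DocBook document changes with DeltaXML
    Diary Dates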

What's in a Name? Monsell EDM Ltd becomes DeltaXML Ltd

Recognizing the company's position as a business focused on XML change, we have renamed ourselves from Monsell EDM to DeltaXML. A simple change, but one that reflects our dedication to this environment both now and for the future.

The name change reinforces the company's product line-up: DeltaXML Core and DeltaXML Sync. All future products will be named in the same fashion to maintain consistency and brand messaging.

Monsell EDM was established nearly fifteen years ago to focus on Electronic Data Management (EDM) projects that it undertook for the European Commission, the Electronic Industries Association (EIA, USA) and others who were interested in investigating standards for data exchange.

Led by Founder and Chief Executive Robin La Fontaine, the company changed its focus in the late 1990s to concentrate on the emerging XML standard and the difficulties associated with change in this environment. Over the last five years the company has gone from strength to strength in bringing XML change control to government bodies, large international corporations, OEMs, systems integrators and smaller companies with a need for very accurate, high-speed processing of XML deltas.

New Faces at DeltaXML: Tristan Mitchell, Software Engineer

The DeltaXML Software Engineering team is expanding with the hiring of new recruit Tristan Mitchell, aged 26. Tristan graduated with an MEng in Software Engineering from Aberystwyth University's highly respected Computer Science faculty. Since joining DeltaXML, Tristan has been putting his keen interest in Java and web applications to good use by converting DeltaXML's XSLT filters into Java to make them faster and more memory efficient. Tristan is also working with DeltaXML's lead software architect, Nigel Whitaker, on the next big release of DeltaXML Core, version 3.0.

Tristan has also been busy gaining new skills. He has been to IBM's Winchester facility for training in Eclipse, and has attended an in-house training course on Cocoon at DeltaXML's head office in Upton-upon-Severn, Worcestershire.

Technical Corner: Highlighting DocBook document changes with DeltaXML

Processing DocBook [1] files to display changes is a typical application for DeltaXML. We'll briefly describe how DeltaXML was customized to compare DocBook XML files and display the changes in HTML using the standard DocBook XSLT style sheet changebars.xsl, which ships with DocBook XSL [2]. This style sheet uses a 'revisionflag' attribute on certain elements to generate a display of changes in HTML. Our task was to use DeltaXML to generate these revision flags automatically on the appropriate elements in order to show changes between any two DocBook documents.

The approach is to compare the two DocBook files ('old' and 'new') using DeltaXML and then convert the resultant full delta file back into a standard DocBook document - including the revision flags to show updates.

The full delta file generated by DeltaXML contains all the information from each of the original documents, with each element annotated with a 'deltaxml:delta' attribute to show changes. Therefore we need to process each element and replace these attributes with 'revisionflag' attributes where appropriate. This is not quite as easy as it sounds because revisionflag attributes are not allowed on all elements, only on a few, e.g. <para> and <phrase>. Also, where text has been modified, we have to generate new <phrase> elements to span the new and old text - this is one of the intended uses of the <phrase> element.
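
The attribute replacement step can be illustrated with a minimal sketch. It is written in Python rather than XSLT purely for brevity and is not the style sheet described in this article. The namespace URI, the delta values ('add', 'delete', 'unchanged') and the set of flaggable elements are assumptions to check against your DeltaXML and DocBook versions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the delta attributes; verify for your release.
DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_ATTR = f"{{{DELTA_NS}}}delta"

# Assumed delta values mapped to DocBook revisionflag values.
FLAG_FOR_DELTA = {"add": "added", "delete": "deleted"}

# Illustrative subset of elements on which this sketch sets revisionflag.
FLAGGABLE = {"para", "phrase", "section", "listitem"}


def add_revision_flags(root):
    """Replace deltaxml:delta with revisionflag throughout the tree, in place."""
    for node in root.iter():
        # Always strip the delta attribute so the result is plain DocBook.
        delta = node.attrib.pop(DELTA_ATTR, None)
        if delta is None or delta == "unchanged" or node.tag not in FLAGGABLE:
            continue
        # Any value other than add/delete is treated here as a modification.
        node.set("revisionflag", FLAG_FOR_DELTA.get(delta, "changed"))
    return root
```

This sketch flags every modified ancestor as 'changed' and does not generate the <phrase> elements needed for modified text; a real filter would need to handle both.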

We also need to handle changes to attributes in the document. We cannot have two values for an attribute, so it is most appropriate to take the attribute values from the new document because we want to view the updated information in preference to the old information. This is done simply by selecting the new values and ignoring any attributes that are only present in the old document. For a different use case, a different strategy could have been used. This processing is achieved using an XSLT style sheet.
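
The attribute rule is simple enough to state as a small function. The sketch below works on plain dictionaries of attribute names and values rather than on the delta file itself, since the delta representation of attribute changes is not described here. The function name and the 'prefer' parameter are hypothetical.

```python
def resolve_attributes(old_attrs, new_attrs, prefer="new"):
    """Pick one value per attribute from the two versions of an element."""
    if prefer == "new":
        # New values win; attributes present only in the old document are dropped.
        return dict(new_attrs)
    if prefer == "old":
        # Alternative strategy for a use case that favours the original document.
        return dict(old_attrs)
    raise ValueError(f"unknown strategy: {prefer!r}")
```

With the default strategy, an attribute that appears only in the old document does not appear in the result at all.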

We can go a little further than this to improve the results by pre-processing the input files, again using XSLT. First, we can use any 'id' attributes as keys so that elements with the same 'id' in the two files will be considered to correspond. This is achieved by copying the 'id' attribute to a 'deltaxml:key' attribute. This can dramatically improve the accuracy of the comparison, yielding the results a human reviewer would expect, and so 'id' attributes should be used if possible.
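
As an illustration of the keying step, here is a minimal Python sketch (the article's own pre-processing is done in XSLT). The namespace URI bound to the 'deltaxml' prefix is an assumption to verify against your DeltaXML release.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the deltaxml prefix; verify for your release.
DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY_ATTR = f"{{{DELTA_NS}}}key"


def add_keys(root):
    """Copy each element's 'id' attribute to a deltaxml:key attribute, in place."""
    for node in root.iter():
        node_id = node.get("id")
        if node_id is not None:
            node.set(KEY_ATTR, node_id)
    return root
```

The same function would be applied to both the old and the new document before comparison, so that elements sharing an 'id' carry matching keys.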

Second, we can process elements such as <programlisting> and <blockquote> in a special way to enable a line-by-line comparison for these items. This is more appropriate and useful than comparing the whole block or using a word-by-word difference. To achieve this, we process the input files before comparison and add in extra elements to span each line. These allow DeltaXML to perform a line-by-line comparison, and are then removed, after the comparison, in the output filter described above.
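
The wrap-and-unwrap idea can be sketched as follows, again in Python for illustration only. The wrapper element name 'line' is hypothetical, and the sketch handles only listings that contain plain text with no inline markup.

```python
import xml.etree.ElementTree as ET

LINE_TAG = "line"                  # hypothetical wrapper element name
LINE_BY_LINE = {"programlisting"}  # elements to compare line by line


def wrap_lines(root):
    """Before comparison: wrap each line of a text-only listing in its own element."""
    for block in list(root.iter()):
        if block.tag not in LINE_BY_LINE or len(block) > 0 or not block.text:
            continue
        lines = block.text.splitlines(keepends=True)
        block.text = None
        for text in lines:
            ET.SubElement(block, LINE_TAG).text = text
    return root


def unwrap_lines(root):
    """After comparison: remove the wrappers and restore the listing text."""
    for block in list(root.iter()):
        if block.tag not in LINE_BY_LINE or len(block) == 0:
            continue
        if all(child.tag == LINE_TAG for child in block):
            block.text = "".join(child.text or "" for child in block)
            for child in list(block):
                block.remove(child)
    return root
```

Keeping the line endings inside each wrapper means the original text can be restored by simple concatenation once the wrappers are removed.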

The result of this is a customization of DeltaXML which has these benefits:

Similar customizations have been built for other custom document types. Please contact us if you would like to use these customization XSLT style sheets for DeltaXML, or would like to discuss a customization to your requirements.

Weblinks:
[1] DocBook: http://www.docbook.org/
[2] DocBook XSLT: http://docbook.sourceforge.net/projects/xsl/

Diary Dates

http://www.xtech-conference.org/ - XTech 2005, Amsterdam, 24-27 May 2005

http://www.extrememarkup.com/ - Extreme 2005 - August 1-5 - Montreal, Quebec, Canada

Weblinks:
DeltaXML news: http://www.deltaxml.com/news/

Please let us know whether this newsletter has been useful to you. We welcome any suggestions about information you'd like discussed in future editions. We'll be back next month with another edition.

Copyright (c) 2005 DeltaXML Ltd.

Newsletter archive: http://www.deltaxml.com/newsletters/
Newsletter subscription management: http://lists.deltaxml.com/mailman/listinfo/open-newsletter