
DeltaXML Newsletter - February 2004

Translation is this month's theme - using DeltaXML to simplify management of multilingual content. For larger sites, updating content in a dozen languages is an expensive challenge - one for which we are developing a powerful new solution (more below). We also look at performance and file size considerations: for many of our customers, our emphasis on speed and our ability to handle very large documents were key factors when choosing DeltaXML. And we welcome more customers who are using DeltaXML to develop innovative XML-driven products and services.

Progress with DeltaXML Sync, our generic XML synchronization technology, continues apace - an announcement is coming up next month. Combine concurrent edits, manage variants and merge master updates with custom edits - please let us know if you'd like to join the Early Access team and try out a pre-release version.

Please let us know also if you have anything you'd like to discuss about managing change in XML. If you prefer to research further before contacting us directly, free evaluation downloads are available at our website: www.deltaxml.com

    - The DeltaXML Team.

Contents

In this newsletter:

Customer Focus: Recent DeltaXML Customers

Recent customers include:

DeltaXML enables redlining in a documentation control server, powers generic XML comparison as an ASP service, controls synchronization with software control file updates and adds document comparison to Vignette V7.


DeltaNet Online XML Comparison

Bookmark this site and spread the word - freely available XML comparison, no registration required. If you choose to register, which is also free, you'll get access to FullDeltaNet, allowing documents to be specified as URLs and delivering results both as HTML and as an XML delta.

Weblink: DeltaNet at http://compare.deltaxml.com/

Beyond Babel - Managing Multilingual Content with XML

Multilingual content delivery is transforming access to information for the growing proportion of the online community for whom English is not their language of choice. Even four years ago, a Forrester report[1] showed 37% of Fortune 100 companies producing multilingual websites - today a multilingual site is seen by major corporates as key to reaching a global audience. XML publishing frameworks, with their separation of content from presentation, are becoming the tool of choice for managing such sites. Now new challenges are emerging - to manage change in the XML content and to manage it efficiently.

Consider, if you will, a vendor of end-user software. Their website content and all product documentation are written in English and stored in XML, perhaps using DocBook or a proprietary grammar. There are 100 distinct "pages" of information for this first release. A decision is taken to localize all this content to German, Spanish, French, Italian and Portuguese - now requiring maintenance of 600 pages. Versions 1.1, 1.2 and 1.3 appear - there are now 2400 distinct language/version combinations, at which point version 2 appears requiring a rewrite of some 20% of the original English pages. Maintenance of the documentation in just six languages is now requiring significant effort. Although there is a proven demand for further translations, the maintenance costs are prohibitive, and a plan to move to a further half-dozen languages is scrapped.

Machine translation, useful for understanding a paragraph in a foreign language, is still a long way from providing automatic translation, so our software vendor must rely on slow and costly hand translation and re-translation of every change in the "source language" documents. Traditionally, translators are simply given the 'old' and 'new' documents, and asked to re-translate. Companies such as Lionbridge, Weblation and Trados have built their businesses around the management of this complexity. With the spread of XML, though, there is a new and much more efficient approach available - automatic identification of changes at the desired level of granularity. By identifying changes at, say, the paragraph or word level, and by presenting translators with text for English v1.0, English v1.1 and German v1.0, with all changes marked up, custom translation tools can easily be built that allow much greater efficiency - and much more enjoyable work.
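As a rough illustration of paragraph-level change identification, here is a minimal sketch in Python. It uses the standard library's difflib rather than DeltaXML, and the function name, the task labels and the assumption that each German v1.0 paragraph lines up one-to-one with its English v1.0 paragraph are all ours, purely for illustration.

```python
from difflib import SequenceMatcher


def translation_tasks(en_old, en_new, de_old):
    """Pair each paragraph of the new English version with a translator action.

    Assumes de_old[i] is the existing translation of en_old[i].
    Returns a list of (action, new_english_text, previous_german_text_or_None).
    """
    tasks = []
    matcher = SequenceMatcher(a=en_old, b=en_new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Source paragraph unchanged: the existing translation can be reused.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                tasks.append(("reuse", en_new[j], de_old[i]))
        elif tag == "delete":
            # Paragraph dropped from the source: nothing to translate.
            continue
        else:
            # "replace" or "insert": changed or new source text. Offer the old
            # translation as a starting point where one exists.
            old_de = de_old[i1:i2]
            for k, j in enumerate(range(j1, j2)):
                previous = old_de[k] if k < len(old_de) else None
                tasks.append(("translate", en_new[j], previous))
    return tasks


en_v10 = ["Install the product.", "Run the setup wizard.", "Contact support."]
en_v11 = ["Install the product.", "Run the new setup wizard.",
          "Read the release notes.", "Contact support."]
de_v10 = ["Installieren Sie das Produkt.", "Starten Sie den Setup-Assistenten.",
          "Wenden Sie sich an den Support."]

for action, english, german in translation_tasks(en_v10, en_v11, de_v10):
    print(action, "|", english, "|", german)
```

Only the paragraphs marked "translate" need a translator's attention. A real tool would work on the XML structure itself and could mark changes down to the word level.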

Existing solutions to this problem rely on maintaining content in all its versions within a proprietary Change Management System - with all the disadvantages of vendor lock-in this implies. At the XML Europe 2004 conference in Amsterdam we'll be presenting a paper showing how DeltaXML technology can be used to create a pure XML "archive", which contains a "content" document in multiple versions and/or multiple languages. This "archive", or unified delta, contains everything a translator needs in order to find quickly what has changed between versions of the source-language document, to see the original translation and to modify it as appropriate for the new translation. All this is handled for generic XML, so you can use whatever tools you prefer - translators update content and re-submit it as XML. This base technology will allow application vendors to build "open" tools that manage a new generation of fully reusable, widely localized content across multiple versions.
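To make the idea of a single XML document holding several versions and languages more concrete, here is a small sketch in Python. The layout is hypothetical: the element names "archive", "para" and "variant", the "n" and "version" attributes and both function names are invented for this illustration and are not the DeltaXML unified delta format.

```python
import xml.etree.ElementTree as ET

# Clark-notation name for the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_archive(paragraphs):
    """Build one XML tree from a list of {(lang, version): text} dicts.

    All element and attribute names (apart from xml:lang) are hypothetical.
    """
    archive = ET.Element("archive")
    for number, variants in enumerate(paragraphs, start=1):
        para = ET.SubElement(archive, "para", {"n": str(number)})
        for (lang, version), text in sorted(variants.items()):
            variant = ET.SubElement(
                para, "variant", {XML_LANG: lang, "version": version})
            variant.text = text
    return archive


def missing_variants(archive, source, target, version):
    """Paragraph numbers with `source` text at `version` but no `target` text."""
    missing = []
    for para in archive.findall("para"):
        have = {(v.get(XML_LANG), v.get("version"))
                for v in para.findall("variant")}
        if (source, version) in have and (target, version) not in have:
            missing.append(int(para.get("n")))
    return missing


archive = build_archive([
    {("en", "1.0"): "Welcome.", ("de", "1.0"): "Willkommen.",
     ("en", "1.1"): "Welcome back."},
    {("en", "1.0"): "Goodbye.", ("de", "1.0"): "Auf Wiedersehen.",
     ("en", "1.1"): "Goodbye.", ("de", "1.1"): "Auf Wiedersehen."},
])
print(ET.tostring(archive, encoding="unicode"))
print(missing_variants(archive, "en", "de", "1.1"))
```

Because the result is ordinary XML, any XML-aware tool can read it, which is the point of avoiding a proprietary store.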

Weblinks:
http://www.xmleurope.com/2004/tuesday.asp - Beyond Babel - Simplifying Translation with XML, Amsterdam 20 April (see Diary Dates below for details).
[1] http://www.forrester.com/ER/Research/Report/Summary/0,1338,9513,FF.html  - Forrester report, The Multilingual Site Blueprint, June 2000.

Performance Tuning with DeltaXML

From the first release of DeltaXML, our users have demanded high performance even when processing very large documents. Improvements in algorithms and data have allowed us to tune the DeltaXML Core API considerably, and it now exceeds the expectations of most of our users. Metrics, though, can be pretty misleading - comparison times, in particular, depend heavily on the structure of the input documents and on the extent of the differences. Largely "flat" XML database exports, with thousands of elements within a single root element, will have very different performance characteristics to, say, complex legal documents of the same size but with a deeply nested tree structure. Changes at the end of a document may have different effects to changes at the beginning... and so on.
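If you are unsure whether your own documents are closer to the "flat" or the "deeply nested" end of that range, a quick profile can help. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name and the three figures it reports are our own choice for illustration and have nothing to do with the DeltaXML API.

```python
import xml.etree.ElementTree as ET


def shape(xml_text):
    """Return (element_count, max_depth, max_children) for an XML string."""
    root = ET.fromstring(xml_text)
    count = max_depth = max_children = 0
    stack = [(root, 1)]
    while stack:  # iterative walk, so very deep trees do not hit recursion limits
        node, depth = stack.pop()
        count += 1
        max_depth = max(max_depth, depth)
        max_children = max(max_children, len(node))  # len() counts direct children
        stack.extend((child, depth + 1) for child in node)
    return count, max_depth, max_children


flat = "<rows>" + "<row/>" * 1000 + "</rows>"
nested = "<a><b><c><d/></c></b></a>"
print(shape(flat))    # many siblings, shallow tree
print(shape(nested))  # few siblings, deeper tree
```

A large maximum child count with a small depth suggests a database-style export, while a greater depth with modest child counts is more typical of structured documents.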

With that caveat, though, a guideline "real-world" figure from customers' data may give some indication of realistic expectations. In a recent customer support query, we processed two 10 MB documents and generated a full context delta, showing changes in the context of the original documents. Processing time was 27 seconds - Sun v210, 1GHz CPU, 512 MB, Solaris 9 08/03, JDK 1.4.2.

This is so far ahead of any other available comparison technology that further optimization may seem redundant - but there is always room for improvement. Here are some tuning tips and best practices.

We use a highly optimized storage ("micro-DOM") to hold the documents being compared, which must be held entirely in memory for access by the comparison algorithms. These algorithms are tuned for best performance where changes are small relative to overall document size - the most common scenario we see. Let us know if your requirements are different and we'll see how we can help.

If you have practical experience of performance tuning DeltaXML, we'd love to hear from you. If you'd like to read more, try our "Comparing Large Documents" page below. Happy tuning!

Weblink: http://www.deltaxml.com/comparing-large-files.html

Diary Dates

Weblinks:
DeltaXML Latest Developments: http://www.deltaxml.com/news/latest-developments.html

Please let us know whether this newsletter has been useful to you - we welcome any suggestions about information you'd like discussed in future editions. We'll be back next month with another edition.

© 2004 DeltaXML and Monsell EDM Ltd.