Technical FAQ for the DeltaXML Core API

1. Technical FAQ for the DeltaXML Core API

This summary outlines some of the many advantages that can be gained by using the DeltaXML Core API to identify changes to XML data.

Further frequently asked questions related to the use of the DeltaXML command-line evaluation software and Java API to compare components of XML files are listed in our Support FAQ.

1.1. How can DeltaXML handle any change in XML?

Because DeltaXML records changes by making the minimal possible additions to the existing XML markup, using namespace-qualified attributes to record changes wherever possible, the effect of a comparison on the markup is always minimized. The only limitation on comparison is that the root elements of the two documents must be the same. Lexical changes are ignored, as is the order of attributes in the two files. Changes to attribute values are recorded. By assigning keys to elements you can even identify when an element has changed its position within the data structure without otherwise being updated.

1.2. Can DeltaXML handle namespaces?

Yes. DeltaXML will use the prefixes available in the input files or generate a prefix for each namespace. If you use prefixes inside attribute values you need to be sure that your data files are consistent in the use of the prefixes.

1.3. What if my files have DTDs?

No problem! These are handled by the standard SAX parser which reads the files in. Entities will be expanded and default attribute values applied. White space that can be ignored according to the DTD will be ignored. The file is not validated against the DTD.

1.4. How does DeltaXML handle white space?

White space traditionally gives rise to problems when comparing XML files - these are resolved by DeltaXML. If there is a DTD this is used to eliminate spurious white space. Otherwise it is safest to remove extra white space before the comparison process: an XSL filter is provided to do this. Without these precautions you will typically identify a lot of differences that are not significant. If white space is important, then leave it in and DeltaXML will compare it.
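As an illustration of the kind of input filter described above, the following is a minimal sketch and not the filter shipped with DeltaXML. It is a generic XSLT 1.0 identity transform that drops text nodes consisting only of white space, on the assumption that such nodes carry no meaning in your data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- copy everything by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop text nodes that contain nothing but white space -->
  <xsl:template match="text()[normalize-space(.) = '']"/>

</xsl:stylesheet>
```

This sketch does not honour xml:space="preserve" and leaves white space inside mixed text untouched, so it would need adapting if either matters for your files.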

1.5. What about attribute order?

Attribute order is ignored during file comparison, because the XML specification does not treat the order of attributes within a tag as significant.

1.6. Some of my data elements can appear in any order within their parent. Can DeltaXML handle that?

Yes. You can use XSL as an input filter to add attributes to tell DeltaXML to treat some elements as orderless, i.e. the child elements can appear in any order. You can also add keys to any child element to ensure that corresponding elements in the two files are always matched up.
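An input filter of this kind might look like the sketch below. The element name contacts is hypothetical, and the attribute name and value deltaxml:ordered="false" are an assumption about how orderless elements are marked; check the DeltaXML documentation for the exact form expected by your release.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                version="1.0">

  <!-- copy everything by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- 'contacts' is a hypothetical element whose children may appear in any order -->
  <xsl:template match="contacts">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- assumed attribute name and value for marking an orderless element -->
      <xsl:attribute name="deltaxml:ordered">false</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```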

1.7. How can I control the comparison process?

You can add keys to any element to ensure that corresponding elements in the two files are always matched up. You can use ID attribute values as the value of the keys - or you can use any other attribute, e.g. a "name", or even the contents of a sub-element, providing the value is unique among a set of siblings. If a combination of attributes forms the key, you can specify that also.
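The sketch below shows how keys could be added by an XSLT input filter. The element names person and item, the attributes id and type, and the name sub-element are all hypothetical, and the attribute name deltaxml:key is an assumption about how keys are expressed; treat it as an outline rather than the definitive filter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                version="1.0">

  <!-- copy everything by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- hypothetical 'person' element keyed by its 'id' attribute -->
  <xsl:template match="person[@id]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="@id"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- hypothetical 'item' element keyed by an attribute combined with a sub-element -->
  <xsl:template match="item">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="deltaxml:key">
        <xsl:value-of select="concat(@type, '|', name)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The separator in the combined key is arbitrary; any character that cannot occur in the component values keeps the key unique among siblings.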

1.8. Can I see changes in the context of the original?

The normal delta file contains just the changes between the two files, and unchanged data is omitted. In some situations it is more useful to have the unchanged data available in the delta file, and this option is provided as the 'full delta'. This is especially useful, for example, when you are displaying changes to a user because it is easier to understand changes when they are displayed in context with the original data.

1.9. How can I convert my output to HTML?

Just use XSL in the normal way, adding templates to deal with the specific delta elements and attributes. Example stylesheets are provided as part of the API, and we can help you with any specific requirements you have. You have complete flexibility here about how you display changes, and what changes you display.
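As a rough sketch of such a stylesheet (not one of the examples supplied with the API), the templates below wrap added content in HTML ins elements and deleted content in del elements. It relies only on the deltaxml:delta="add" and deltaxml:delta="delete" attributes and the deltaxml:PCDATAmodify, deltaxml:old and deltaxml:new elements described in this FAQ; the choice of ins and del, and flattening each changed element to its text, are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                exclude-result-prefixes="deltaxml"
                version="1.0">

  <xsl:output method="html"/>

  <!-- wrap the whole delta in a minimal HTML page -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- added and deleted elements, shown here as their text content only -->
  <xsl:template match="*[@deltaxml:delta='add']">
    <ins><xsl:value-of select="."/></ins>
  </xsl:template>

  <xsl:template match="*[@deltaxml:delta='delete']">
    <del><xsl:value-of select="."/></del>
  </xsl:template>

  <!-- changed text: old value struck out, new value inserted -->
  <xsl:template match="deltaxml:PCDATAmodify">
    <del><xsl:value-of select="deltaxml:PCDATAold"/></del>
    <ins><xsl:value-of select="deltaxml:PCDATAnew"/></ins>
  </xsl:template>

  <!-- text exchanged for an element, or the reverse -->
  <xsl:template match="deltaxml:old/text()">
    <del><xsl:value-of select="."/></del>
  </xsl:template>

  <xsl:template match="deltaxml:new/text()">
    <ins><xsl:value-of select="."/></ins>
  </xsl:template>

</xsl:stylesheet>
```

All other elements fall through to the built-in XSLT rules, so unchanged text is output as it stands; a real stylesheet would normally add templates that map your own element types to suitable HTML.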

1.10. I need to see changes to text on a word-by-word basis, not just that a block of text has changed. Is this possible?

Yes. The compare-detailed option in the DeltaXML command-line tool can be used to compare files on a word-by-word basis; sample code is also provided to show how a word-by-word pipeline can be constructed and included in your systems. White space is automatically normalized prior to comparison, and the output can be in the form of a displayable HTML file, such as our example of Using DeltaXML Markup with XHTML.

1.11. How can I integrate DeltaXML into my product?

DeltaXML is provided as a Java API using standard SAX and TrAX protocols. It typically only takes a day to integrate, depending on the complexity of your application. You need to be familiar with Java.

1.12. Can I use any XML parser with DeltaXML?

Yes, any JAXP 1.1 compliant parser is compatible. Xerces is provided as standard but you can use a different parser.

1.13. How does DeltaXML find the best match between elements?

First it looks for keys that correspond and matches these. Then it looks for whole elements (including their attributes and content) that are equal and matches these. Finally it matches elements with the same type and then proceeds down the tree to find where there are differences. Specialist algorithms are used to find the best match at any level in the files.

1.14. Can DeltaXML handle large files?

Yes, it has been tested and performs well on files of several hundred megabytes in size. There are some issues that need to be considered when comparing very large files: our web site provides advice on this subject.

1.15. Does it take a long time to compare large files?

DeltaXML has many optimizations that improve its speed and keep memory usage as low as possible. The algorithms employed are optimized for files that have only a small number of differences, and they have been designed so that memory use grows linearly with file size for larger files.

1.16. What is the purpose of the deltaxml:exchange element in the delta file?

A deltaxml:exchange element occurs when two data elements at the same position in the input documents are exchanged, i.e. one is deleted and the other added. This means that where there was one item in each of the input documents, there will be one item in the delta file. The rationale behind it is to preserve, as far as possible, the same structure in the delta as in the input documents.

For a deltaxml:exchange to occur in the delta file the two items in the input files must be of different types, i.e. different element types, elements of the same type but with different keys, or one an element and the other a PCDATA item. Two elements of the same type but with different keys are considered to be different because DeltaXML never considers that two elements with different keys correspond with each other, and so it will never report that a key has been changed.

The deltaxml:old and deltaxml:new elements are needed because the items exchanged may be PCDATA and therefore need a delimiter. For consistency, the delimiter is always used within deltaxml:exchange. In addition, an element inside the deltaxml:old will have a deltaxml:delta="delete" attribute and an element within deltaxml:new will have a deltaxml:delta="add" attribute. This is consistent with other elements that are added or deleted, and it makes some processing of the delta file easier.
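To make the description concrete, the fragment below sketches what an element-text exchange could look like in a delta file. The para element and its content are hypothetical, and the layout is inferred from the description above rather than copied from real output, which may carry further attributes.

```xml
<!-- hypothetical case: a <para> in the first document is replaced by plain text in the second -->
<deltaxml:exchange xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1">
  <deltaxml:old>
    <para deltaxml:delta="delete">Old paragraph</para>
  </deltaxml:old>
  <deltaxml:new>New plain text</deltaxml:new>
</deltaxml:exchange>
```

The indentation inside deltaxml:old is for readability only.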

If you prefer not to have an exchange element, additional XSLT templates can be added to any downstream filter to ignore the exchange and process the data within it, assuming you already have templates to handle elements with deltaxml:delta attributes. The following XSLT templates may be useful for this:

<xsl:template match="deltaxml:old/text()">
  <!-- whatever you want here to handle old text -->
</xsl:template>

<xsl:template match="deltaxml:new/text()">
  <!-- whatever you want here to handle new text -->
</xsl:template>

<xsl:template match="deltaxml:exchange">
  <xsl:apply-templates select="deltaxml:old/node()"/>
  <xsl:apply-templates select="deltaxml:new/node()"/>
</xsl:template>

1.17. How can I remove the deltaxml:exchange element from the delta file?

The following XSLT filter will remove the deltaxml:exchange elements from a delta file.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
                version="1.0">

  <!-- default match -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- match an element exchange -->
  <xsl:template match="deltaxml:exchange[deltaxml:old/*][deltaxml:new/*]">
    <xsl:apply-templates select="deltaxml:old/*"/>
    <xsl:apply-templates select="deltaxml:new/*"/>
  </xsl:template>

  <!-- match a text-element exchange -->
  <xsl:template match="deltaxml:exchange[not(deltaxml:old/*)][deltaxml:new/*]">
    <deltaxml:PCDATAmodify>
      <deltaxml:PCDATAold><xsl:value-of select="deltaxml:old"/></deltaxml:PCDATAold>
      <deltaxml:PCDATAnew/>
    </deltaxml:PCDATAmodify>
    <xsl:apply-templates select="deltaxml:new/*"/>
  </xsl:template>

  <!-- match an element-text exchange -->
  <xsl:template match="deltaxml:exchange[deltaxml:old/*][not(deltaxml:new/*)]">
    <xsl:apply-templates select="deltaxml:old/*"/>
    <deltaxml:PCDATAmodify>
      <deltaxml:PCDATAold/>
      <deltaxml:PCDATAnew><xsl:value-of select="deltaxml:new"/></deltaxml:PCDATAnew>
    </deltaxml:PCDATAmodify>
  </xsl:template>

</xsl:stylesheet>