Formatting Element Changes

1. Introduction

Many XML languages are available for describing documents, such as XHTML, DocBook and ODF. One common feature of these languages is that some of their elements do not define structure but mark text as having a certain format. Examples include <strong> and <em> in XHTML, <emphasis> in DocBook and <text:span> in ODF.

DeltaXML Core makes no distinction between structural elements and formatting elements when comparing two versions of a document. Because of this, a change to a formatting element can produce more change in the delta file than expected. Consider the following examples, written in a very simple document language.

Example 1: A simple XML document (input1.xml in the sample directory)

<document>
  <para>This paragraph will have words made bold in the following version.</para>
  <para>In this sentence, new words will be added and some made bold.</para>
</document>

Example 2: The same document with text changes and text formatting added (input2.xml in the sample directory)

<document>
  <para>This paragraph will have <bold>words made bold</bold> in the following version.</para>
  <para>In this sentence, <bold>new bold words</bold> will be added and some made bold.</para>
</document>

Comparing these files produces a delta file that shows a lot of change.

Example 3: The delta file without taking formatting elements into account (text is being compared word by word)

<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
          deltaxml:deltaV2="A!=B"
          deltaxml:version="2.0"
          deltaxml:content-type="full-context">
  <para deltaxml:deltaV2="A!=B">
    This paragraph will have 
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">words</deltaxml:text>
    </deltaxml:textGroup>
    <bold deltaxml:deltaV2="B">words made bold</bold>
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">made bold </deltaxml:text>
    </deltaxml:textGroup>
    in the following version.
  </para>
  <para deltaxml:deltaV2="A!=B">
    In this sentence, 
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">new</deltaxml:text>
    </deltaxml:textGroup>
    <bold deltaxml:deltaV2="B">new bold words</bold>
    <deltaxml:textGroup deltaxml:deltaV2="A">
      <deltaxml:text deltaxml:deltaV2="A">words </deltaxml:text>
    </deltaxml:textGroup>
    will be added and some made bold.
  </para>
</document>

Although this is a correct representation of what has changed between the two versions of the document, it may not be intuitive to somebody making changes in a WYSIWYG editor. For someone editing a document, the most important changes are textual changes, not format changes, and this delta file shows more text change than actually occurred. DeltaXML Core includes some XSLT filters that improve this result by taking into account those elements that are used only for textual formatting.

2. DocumentComparator

The com.deltaxml.cores9api.DocumentComparator is designed to handle structural changes such as these. For the comparator to identify formatting elements, they must be marked with a deltaxml:format="true" attribute. In the example documents above, the <bold> element needs to be marked in this way. The following XSLT template could be used to do this.

Example 4: an XSLT template to mark bold elements (defined in mark-formatting.xsl in the sample directory)

<xsl:template match="bold">
  <xsl:copy>
    <xsl:attribute name="deltaxml:format" select="'true'"/>
    <xsl:apply-templates select="node()"/>
  </xsl:copy>
</xsl:template>
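
For readers who want to see the effect of the marking step outside an XSLT pipeline, the following is a minimal sketch that performs the same marking with the JDK's DOM API. It is an illustration only and not part of the sample: the class FormatMarkerSketch and its methods are hypothetical names, and the namespace URI for the deltaxml prefix is assumed to be the one that appears in the delta files in this document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the DeltaXML API or the sample directory.
class FormatMarkerSketch {

    // Assumption: deltaxml:format uses the namespace bound to the deltaxml
    // prefix in the delta files shown in this document.
    public static final String DELTAXML_NS =
        "http://www.deltaxml.com/ns/well-formed-delta-v1";

    /** Parses an XML string into a namespace-aware DOM document. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Adds deltaxml:format="true" to every element with the given name and returns the count. */
    public static int markFormatting(Document doc, String elementName) {
        // Declare the prefix once on the root so the serialized output is namespace-valid.
        doc.getDocumentElement().setAttributeNS(
            XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:deltaxml", DELTAXML_NS);
        NodeList matches = doc.getElementsByTagName(elementName);
        for (int i = 0; i < matches.getLength(); i++) {
            ((Element) matches.item(i))
                .setAttributeNS(DELTAXML_NS, "deltaxml:format", "true");
        }
        return matches.getLength();
    }

    /** Serializes the document back to a string using the JDK identity transform. */
    public static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<document><para>This paragraph will have <bold>words made bold</bold>"
            + " in the following version.</para></document>");
        int marked = markFormatting(doc, "bold");
        System.out.println(marked + " element(s) marked");
        System.out.println(serialize(doc));
    }
}
```

The sample itself does not use this class; it relies on the XSLT template from Example 4, which fits directly into the comparator's filter pipeline.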

The mark-formatting.xsl stylesheet must be added to the DocumentComparator by assigning a FilterChain to the PRE_FLATTENING extension point. The following Java code snippet (from FormattingElementDemo.java in the sample directory) shows how to do this:

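 // Create the comparator and obtain a helper for building filter steps.
 DocumentComparator comparator = new DocumentComparator();
 FilterStepHelper fsh = comparator.newFilterStepHelper();

 // Wrap the marking stylesheet in a single-step filter chain.
 FilterChain formatMarker =
     fsh.newSingleStepFilterChain(new File("mark-formatting.xsl"), "format-marker");

 // Assign the chain to the PRE_FLATTENING extension point.
 comparator.setExtensionPoint(ExtensionPoint.PRE_FLATTENING, formatMarker);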

Note that the equivalent C# code for the .NET API is broadly similar to the Java above and can be viewed in FormattingElementDemo.cs in the sample directory.

The final result for the example files above is as follows:

Example 5: the final result

<document xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
          deltaxml:deltaV2="A!=B"
          deltaxml:version="2.0"
          deltaxml:content-type="full-context">
  <para deltaxml:deltaV2="A=B">
    This paragraph will have <bold>words made bold</bold> in the following version.
  </para>
  <para deltaxml:deltaV2="A!=B">
    In this sentence, <bold deltaxml:deltaV2="A!=B">new 
    <deltaxml:textGroup deltaxml:deltaV2="B">
      <deltaxml:text deltaxml:deltaV2="B">bold </deltaxml:text>
    </deltaxml:textGroup>
    words</bold> will be added and some made bold.
  </para>
</document>

Note that the first paragraph is marked as unchanged because there have been no textual changes. The second paragraph shows the new word added within the newly bolded text.
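
The reasoning behind this result can be checked with a small sketch that flattens the formatting away and compares only the text of each paragraph. This is purely illustrative and is not how DeltaXML Core performs its comparison; the class TextOnlyCompareSketch and its methods are hypothetical names, and the two inputs are the documents from Examples 1 and 2 with indentation removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: this is not how DeltaXML Core compares documents.
class TextOnlyCompareSketch {

    // The document from Example 1.
    public static final String INPUT1 =
        "<document>"
        + "<para>This paragraph will have words made bold in the following version.</para>"
        + "<para>In this sentence, new words will be added and some made bold.</para>"
        + "</document>";

    // The document from Example 2.
    public static final String INPUT2 =
        "<document>"
        + "<para>This paragraph will have <bold>words made bold</bold>"
        + " in the following version.</para>"
        + "<para>In this sentence, <bold>new bold words</bold>"
        + " will be added and some made bold.</para>"
        + "</document>";

    /** Returns the text of each para element, with inline markup such as bold flattened away. */
    public static List<String> paragraphTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList paras = doc.getElementsByTagName("para");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < paras.getLength(); i++) {
                // getTextContent() concatenates all descendant text nodes.
                texts.add(paras.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** For each paragraph position present in both documents, reports whether the text is identical. */
    public static List<Boolean> sameText(String xmlA, String xmlB) {
        List<String> a = paragraphTexts(xmlA);
        List<String> b = paragraphTexts(xmlB);
        List<Boolean> result = new ArrayList<>();
        for (int i = 0; i < Math.min(a.size(), b.size()); i++) {
            result.add(a.get(i).equals(b.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Boolean> same = sameText(INPUT1, INPUT2);
        for (int i = 0; i < same.size(); i++) {
            System.out.println("para " + (i + 1) + ": "
                + (same.get(i) ? "text unchanged" : "text changed"));
        }
    }
}
```

With formatting ignored, the first paragraph has the same text in both versions and only the second differs, which matches the A=B and A!=B markers on the two <para> elements in Example 5.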

3. Summary

  • Changes to formatting elements are shown as structural changes unless the elements are marked.
  • Formatting elements can be marked using the deltaxml:format="true" attribute.
  • The result is a delta file that focuses on textual change.

4. Running the sample

The sample can be run from the FormattingElements sample directory included in the DeltaXML Core distribution.

4.1. DCP Sample

If you have Apache Ant installed, use the build script provided to run the sample. Simply type the following command to run the pipeline and produce the output files result/DCP/no-format-result.xml and result/DCP/format-result.xml.

ant run-dcp

This sample uses Ant to drive DeltaXML Core's built-in command-line interface along with the Document Comparator Pipelines (DCP) configuration file formatting-elements.dcp.

4.2. Java Sample

If you have Ant installed, use the build script provided to run the sample. Simply type the following command to run the pipeline and produce the output files no-format-result.xml and format-result.xml.

ant run-dc

To clean up the sample directory, run the following Ant command.

ant clean

4.3. C# Sample

A Visual Studio solution (.sln) file for the C# sample is provided in the dotnet-api directory, and a Visual Studio project (.csproj) file can be found in the FormattingElementDemo sample directory.

Alternatively, the sample can be built and run without Visual Studio by running the rundemo.bat batch file, either from the command line or by double-clicking it in Windows File Explorer.