Guide to DeltaXML Pipeline Configuration

1. Introduction

1.1. The Benefits of Pipelines

The use of Processing Pipelines allows complex systems for XML processing to be composed from a number of smaller, simpler components. The underlying concepts were initially developed with the SAX parsing and filtering APIs and have subsequently been adopted by XSLT and other standards. For further information and background on JAXP and SAX event pipelining please refer to Powering Pipelines with JAXP, a paper presented at XML 2004.

1.2. What is DXP?

The DXP (Delta Xml Pipelines) language defines a processing pipeline in XML. DXP describes XML processing pipelines to prepare data prior to DeltaXML comparison and to process data after comparison. It is an XML language and can be used by anyone familiar with XML data; it does not require any knowledge of Java programming.

DXP is not a general-purpose XML pipelining language[1]; it is optimized for pipelines containing a DeltaXML Comparator. Unlike general-purpose pipelining languages, it provides no mechanism for specifying the location of the two input sources or documents, or for specifying where the pipeline result will be located or produced. With DXP these are capabilities of the tool into which DXP has been embedded. For example, a GUI tool such as DeltaWing may provide GUI widgets for selecting input files. Another feature specific to DXP is that the two input pipelines into the comparator are identical or 'symmetrical': the same set of filters is applied to each of the comparator inputs.

DXP can also be considered a tool extension language, and this is indeed how it is used in the DeltaWing and command.jar applications included in the DeltaXML Core from release 3.1. The ability to embed DXP processing is also available for you to use in your applications. The com.deltaxml.core.DXPConfiguration class is provided to include DXP capabilities in a wide range of Java applications. This will simplify configuration and enable flexibility in the use of the DeltaXML Comparator.

1.3. Summary of DXP

Here is a quick summary of DXP:

2. The Pipeline Model

2.1. An introductory example

This diagram of an example pipeline provides a good introduction to the concepts described in this section.

[Figure: A Pictorial Pipeline (PIC-dxp-illustration.jpg)]

At the centre of a pipeline is a comparator (the triangle). The inputs to the comparator are processed by an ordered sequence of one or more input filters (the rectangles), and the comparator output is also fed through a sequence of filters. Any particular filter may be optional, indicated by the bypass arrow in the diagram. This optionality is controlled by a boolean pipeline parameter, named 'detailed' in the example. When 'detailed' has the value false, four of the filters are bypassed.

The diagram also illustrates a comparator feature, called 'full'. For this pipeline a full delta (which includes the unchanged data) is always required, so a literal value of true is always used.

The final filter in the pipeline has a parameter called 'colour1'; its value determines the HTML/CSS colour used to represent certain types of changes. The user can specify the colour by setting a pipeline parameter. If the user chooses not to do this, the default colour of green is passed to the filter.

The text of this pipeline is included in the following example. It may not make complete sense at this point; the features and concepts will be described in later sections of this document.

Example 1. DXP for example pipeline

<!DOCTYPE comparatorPipeline SYSTEM "dxp.dtd">
<comparatorPipeline id="xhtml" description="XHTML Comparison" >
  <pipelineParameters>
    <booleanParameter name='detailed' defaultValue="true"/>
    <stringParameter name='add-colour' defaultValue="green"/>
  </pipelineParameters>
  <inputFilters>
    <filter>
      <resource name="xhtmli.xsl"/>
    </filter>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordInfilter"/>
    </filter>
  </inputFilters>
  <outputFilters>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordOutfilter1"/>
    </filter>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordOutfilter2"/>
    </filter>
    <filter>
      <resource name="xhtmlo.xsl"/>
      <parameter name="colour1" parameterRef="add-colour"/>
    </filter>
  </outputFilters>
  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
  </comparatorFeatures>
</comparatorPipeline>

2.2. Chains of Filters

The two elements inputFilters and outputFilters specify a chain of filters within a comparatorPipeline.

Example 2. DXP Grammar for Pipelines and Filters

<!ELEMENT comparatorPipeline (fullDescription?, pipelineParameters?, 
         inputFilters?, outputFilters?, 
         outputProperties?,outputFileExtension?,
         parserFeatures?, comparatorFeatures?)>
<!ATTLIST comparatorPipeline
          id CDATA #REQUIRED
          description CDATA #REQUIRED>
<!ELEMENT inputFilters (filter+)>
<!ELEMENT outputFilters (filter+)>

Both of the XML inputs to a comparison are passed through a list of input filters. These filters can add, remove or change information as data passes through them. Each filter operates by modifying a stream of SAX events (or callbacks to a SAX ContentHandler). The operation of these filters can be defined using Java or XSLT.

Similarly a sequence of filters can be applied to the output of the comparator. These filters could be designed to operate in conjunction with certain input filters (e.g. Word-by-Word or XHTML) or be stand-alone filters to clean up the output or generate a report showing changes.

2.3. Pipeline Parameters

Often the operation of a pipeline should be influenced by the user. Rather than constructing several similar pipeline definitions, it may be more convenient, and better practice, to parameterize the pipeline.

Example 3. DXP Grammar for Pipeline Parameters

<!ELEMENT pipelineParameters (booleanParameter | stringParameter)+>
<!ELEMENT booleanParameter EMPTY>
<!ATTLIST booleanParameter 
          name CDATA #REQUIRED
          defaultValue (true|false) #REQUIRED>
<!ELEMENT stringParameter EMPTY>
<!ATTLIST stringParameter
          name CDATA #REQUIRED
          defaultValue CDATA #REQUIRED>

Here are some examples of how pipeline parameters could be used:

2.3.1. Pipeline Parameters

Parameters of a pipeline are similar to the formal parameters of a programming language method or function.

These formal parameters allow the environment or system which is running the pipeline to query their values/settings from the user and then pass them to the pipeline. The application invoking the pipeline can give the user information about the parameters and/or a means to specify their values, for example, in a GUI application, a set of widgets such as tick-boxes and text areas.

2.3.2. Parameter Types

Two types of parameter are supported: boolean parameters and string parameters. Each must be defined with a default value, which is used when the user does not specify a value. Using our previous example, the first part of a pipeline definition may look like this:

Example 4. Parameter example

<comparatorPipeline description="Differences Report" id="diffrep">
  <pipelineParameters>
    <booleanParameter name="normalize_whitespace" defaultValue="false"/>
    <stringParameter name="delete_colour" defaultValue="red"/>
    <stringParameter name="add_colour" defaultValue="green"/>
  </pipelineParameters>

Our model of parameters here is much simpler than, for example, that provided by XSLT processors which often allow Java objects to be passed as parameters and then converted into appropriate XSLT types.

2.3.3. Use of parameters

Uses of the parameters will be introduced later in the document, but a brief list of their uses includes:

2.4. Filters

A filter is a component in a pipeline which processes the data in some way.

Example 5. DXP Grammar for Filters

<!ELEMENT filter ((class | resource | http | file), parameter*) >
<!ATTLIST filter
          if CDATA #IMPLIED
          unless CDATA #IMPLIED>

<!ELEMENT class EMPTY>
<!ATTLIST class name CDATA #REQUIRED>

<!ELEMENT resource EMPTY>
<!ATTLIST resource name CDATA #REQUIRED>

<!ELEMENT http EMPTY>
<!ATTLIST http url CDATA #REQUIRED>

<!ELEMENT file EMPTY>
<!ATTLIST file path CDATA #REQUIRED>

Input and output filters can be implemented using XSLT or Java. The use of Java for output filtering is facilitated by the use of the XMLOutputFilter class and associated adapters provided in the DeltaXML Core API. These supplant the JAXP mechanism and are described in more detail in Powering Pipelines with JAXP.

2.4.1. Java filters

A Java filter is one which implements the org.xml.sax.XMLFilter interface, typically by extending the XMLFilterImpl class. It is used in compiled form: the associated class file must be available to the classloader of the application. To use a Java filter, its fully qualified class name is specified as in the following example. This example demonstrates the use of one of the filters included in the deltaxml.jar file included in the release.

Example 6. Using a Java filter

<filter>
  <class name="com.deltaxml.pipe.filters.WordByWordInfilter"/>
</filter>

2.4.2. XSLT filters

There are a number of ways to locate an XSLT filter, including:

HTTP URL support is based on the java.net.URL class. The following example shows how a filter can be addressed using a URL.

Example 7. Referring to an XSLT filter by HTTP URL

<filter>
  <http url="http://www.example.com/samples/filter.xsl"/>
</filter>

Files can also be used to specify XSLT filter locations. Relative file specifications should be avoided, as the current working directory could easily be changed when invoking a DXP-compatible tool. The underlying support for this type of filter specification is based on the java.io.File class, and any file specifications should be compatible with the pathnames used with this Java class. See the following for an example.

Example 8. Referring to an XSLT filter by File location

<filter>
  <file path="/usr/local/deltaxml/DeltaXMLCore-3_0/samples/xsl-filters/pi2xml.xsl"/>
</filter>

The final way of locating XSLT scripts is the resource mechanism. This allows XSLT files to be located on the classpath, and in particular in .jar files. The path used is the location of the XSLT script within the jar file, and more precisely is the path used as an argument to the ClassLoader.getResource(String) method.

This mechanism is provided so that you can deliver, to an end-user, a single jar file containing both code and data for one or more DXP pipelines. See the following for an example of referring to a filter located in a jar file.

Example 9. Referring to an XSLT filter inside a Jar File

<filter>
  <resource name="/xsl/deltaxml-folding-html.xsl"/>
</filter>

2.4.3. Filter Parameters

The operation of a filter may be controlled by parameters passed to the filter.

Example 10. DXP Grammar for Filter Parameters

<!ELEMENT filter ((class | resource | http | file), parameter*) >

<!ELEMENT parameter EMPTY>
<!ATTLIST parameter
          name CDATA #REQUIRED
          parameterRef CDATA #IMPLIED
          literalValue CDATA #IMPLIED>

The parameter values may come from a number of sources including:

When an XSLT filter is being used any parameters should be declared using the <xsl:param> element in XSLT.
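As a minimal sketch of the XSLT side, the following stylesheet declares a parameter named colour1, matching the filter parameter passed to xhtmlo.xsl in Example 1. It is an illustration only, not the contents of the shipped xhtmlo.xsl: the matched element name (added) and the generated span markup are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Receives the value supplied by the DXP <parameter name="colour1" .../> element.
       The select value is only a fallback used when no value is passed in. -->
  <xsl:param name="colour1" select="'green'"/>

  <!-- Identity template: copy everything else through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical element name 'added', used purely for illustration -->
  <xsl:template match="added">
    <span style="color: {$colour1}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

The parameter name in the xsl:param declaration has to match the name attribute of the DXP parameter element exactly, otherwise the value supplied by the pipeline has no effect on the stylesheet.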

To supply parameters to Java filters, a parameter-setting (set) method should be provided. This method must conform to certain requirements: its name must be the string set followed by the exact DXP parameter name string, and it should take a single boolean or String parameter.

Please consult the sample filters and pipelines provided in the release for examples.

The following example shows legal and illegal parameter use. Note that providing both the literalValue and parameterRef attributes in the same parameter element is disallowed.

Example 11. Examples of Filter Parameters

<filter>
  <class name="com.deltaxml.pipe.filters.PreserveWhitespace"/>
  <parameter name="preserve-mixed"
             parameterRef="preserve-ws"/>  <!-- legal, refers to a formal parameter 
                                                of the pipeline -->
  <parameter name="remove-non-mixed-ws"
             literalValue="yes"/>          <!-- legal, a literal value -->
  <parameter name="normalize-attrs"
             literalValue="yes"
             parameterRef="preserve-ws"/>  <!-- illegal: cannot use both literal and 
                                                formal together-->
</filter>

2.4.4. Filter Optionality

Boolean pipeline parameters can also be used to control the operation or bypassing of certain pipeline stages. For example, to avoid any normalization of input whitespace we could simply remove a normalization filter from the list of input filters; making the filter optional achieves the same effect without editing the pipeline.

Example 12. DXP Grammar for Filter Optionality

<!ELEMENT filter ((class | resource | http | file), parameter*) >
<!ATTLIST filter
          if CDATA #IMPLIED
          unless CDATA #IMPLIED>

Two attributes, if and unless, may be added to any pipeline stage. Their values should refer to one boolean formal parameter by name. In the case of the if attribute, the filter is applied when the associated parameter is true. Conversely, the unless attribute applies the filter when the referenced parameter is false. If both attributes are used (and hopefully refer to different parameters!), the application of the pipeline stage is determined by the boolean-and of both conditions.

The following example shows how the application of an input filter can be controlled by a pipeline parameter.

Example 13. Filter Optionality example

<comparatorPipeline description="Differences Report" id="diffreport">
  <pipelineParameters>
    <booleanParameter name="normalize_whitespace" defaultValue="false"/>
    ...
  </pipelineParameters>
  <inputFilters>
    <filter if="normalize_whitespace">
      <class name="com.deltaxml.pipe.filters.NormalizeSpace"/>
    </filter>
    ...
  </inputFilters>
  ...
</comparatorPipeline>
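The combined use of if and unless described above can be sketched as follows. The pipeline id, its description and the 'quick' parameter are hypothetical names introduced for this illustration; the filter class is the one used in Example 1.

```xml
<comparatorPipeline description="Combined conditions sketch" id="ifunless">
  <pipelineParameters>
    <booleanParameter name="detailed" defaultValue="true"/>
    <!-- 'quick' is a hypothetical parameter used only in this sketch -->
    <booleanParameter name="quick" defaultValue="false"/>
  </pipelineParameters>
  <inputFilters>
    <!-- Applied only when 'detailed' is true AND 'quick' is false -->
    <filter if="detailed" unless="quick">
      <class name="com.deltaxml.pipe.filters.WordByWordInfilter"/>
    </filter>
  </inputFilters>
</comparatorPipeline>
```

With the default values shown, the filter is applied. Setting 'detailed' to false or 'quick' to true bypasses it.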

2.5. Other features

This section describes some other aspects of a pipeline which can be configured or parameterized.

2.5.1. Parser Features

Parser features provide control of the XML parsers used to read the input data. The supported features are those provided by the PipelinedComparator.setParserFeature(String, boolean) method, which can include standard JAXP/SAX features or parser-specific features. Some example feature settings are shown in the following example.

Example 14. Parser features example

<parserFeatures>
  <feature name="http://xml.org/sax/features/validation"  parameterRef="validate-inputs"/>
  <feature name="http://apache.org/xml/features/validation/schema" literalValue="true"/>
</parserFeatures>

2.5.2. Comparator Features

Comparator features control the features of the DeltaXML Core comparator, e.g. to select between full-context delta output or a minimal, changes-only delta.

Example 15. Comparator features example

<comparatorFeatures>
  <feature name="http://deltaxml.com/api/feature/isFullDelta"  literalValue="true"/>
</comparatorFeatures>

2.5.3. Output properties

Output properties control the operation of the serializer which is responsible for generating the textual XML (or HTML depending upon the filters used) results. In DXP, output properties are string values. Some examples, including one specific to the use of Saxon, are demonstrated in the following example.

Example 16. Output properties example

<outputProperties>
  <property name="indent"  literalValue="true"/>
  <property name="{http://saxon.sf.net}indent-spaces"  literalValue="2"/>
  <property name="doctype-public"  literalValue="-//W3C//DTD SVG 1.1//EN"/>
  <property name="doctype-system" 
            literalValue="http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"/>
</outputProperties>

2.5.4. Output file extension

The outputFileExtension element provides a hint for how an application should handle the pipeline results. Depending upon whether the final filter in the output pipeline is producing XML or HTML output, the tool may need to take different actions. This element provides a mechanism for tools that use DXP to determine the output data type. For example, the DeltaWing 2.0 application has different settings (user preferences) for displaying raw XML output and HTML output. The following is an example of XHTML generation.

Example 17. Output file extension example

<outputFileExtension extension="xhtml"/>

2.5.5. Descriptions and Ids

There are some final housekeeping attributes and elements needed on a pipeline in order for it to be embedded in an application.

2.6. Differences between the DXP and PipelinedComparator Models

The two share common roots and a similar processing model, but there are some differences between DXP and the PipelinedComparator Java class. Some of these include:

3. Using DXP

3.1. How to customize DXP pipelines

A number of DXP files are included in the samples/dxp directory included in the DeltaXML Core releases.

A tool may, in addition to inbuilt DXP files, provide mechanisms for locating and using 'extension' DXP files, for example, looking in certain directories for files with a .dxp extension. In this way a tool becomes user-extensible, and DeltaWing is an example of this.

The precise details of tool extensibility should be documented by the respective tools, including details of any override mechanisms, based on ids or other mechanisms.

3.1.1. How to write DXP

The code which reads and processes DXP files requires them to be valid. We strongly suggest that all DXP files refer to the DXP DTD, which is included as samples/dxp/dxp.dtd in the DeltaXML Core releases and is also available in other locations, such as embedded in .jar files. To ensure validity, we suggest using an XML editor which can process DTDs and check XML file validity.
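As a starting point, a DXP file might be laid out as in the following sketch. It assumes that dxp.dtd sits in the same directory as the DXP file (adjust the system identifier otherwise), and the id and description values are placeholders. The child elements appear in the order required by the comparatorPipeline content model in Example 2, since a DTD sequence makes the order significant for validity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: dxp.dtd is located alongside this file -->
<!DOCTYPE comparatorPipeline SYSTEM "dxp.dtd">
<comparatorPipeline id="skeleton" description="Skeleton pipeline">
  <pipelineParameters>
    <booleanParameter name="detailed" defaultValue="true"/>
  </pipelineParameters>
  <inputFilters>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordInfilter"/>
    </filter>
  </inputFilters>
  <outputFilters>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordOutfilter1"/>
    </filter>
    <filter if="detailed">
      <class name="com.deltaxml.pipe.filters.WordByWordOutfilter2"/>
    </filter>
  </outputFilters>
  <outputProperties>
    <property name="indent" literalValue="true"/>
  </outputProperties>
  <outputFileExtension extension="xml"/>
  <comparatorFeatures>
    <feature name="http://deltaxml.com/api/feature/isFullDelta" literalValue="true"/>
  </comparatorFeatures>
</comparatorPipeline>
```

All of the child elements are optional according to the grammar, so sections that are not needed can simply be omitted, provided the remaining ones keep their relative order.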

4. Future Developments/Directions

Please contact us with any comments, bug-reports or suggestions about the current DXP language/system or our future plans and enhancements. Any input would be most welcome.

[1] Integration with more general-purpose pipelining languages and systems may be considered for future releases.