
Guide to writing Java XML Filters for DeltaXML

Preface

Abstract

This tutorial focusses on writing Java XML filters for pre-processing and post-processing data within a DeltaXML pipeline. It presents the SAX XMLFilter interface and shows how it can be used within Java XML processing programs. It then explores how Java XML Filters can be used with DeltaXML and presents some guidelines for writing such filters.

Chapter 1. Introduction

In this tutorial we will step through the facilities that the SAX API for XML processing provides for creating Java XML Filter objects. These are objects that can filter (or process) data within a pipeline of filters. This is a standard API, provided as part of the Java environment since Java 2 SDK 1.4.0, and can be used in a wide range of XML processing applications. In this tutorial we will focus on its use with the DeltaXML toolset, in particular as part of a DeltaXML comparison pipeline.

In the remainder of this tutorial we will first introduce the concept of Java XML filters, examine how a generic Java XML Filter can be written, explore how such filters can work with DeltaXML and present some guidelines on writing Java XML Filters. We conclude by illustrating how to mix Java and XSL based filters within a DeltaXML comparison pipeline.

Chapter 2. Java XML Filters

2.1. The XMLFilter interface

The XMLFilter interface is part of the SAX API for processing XML documents, although it is ignored by many Java XML tutorials and overlooked by many Java developers. XMLFilter is a sub-interface of the XMLReader interface; as such it is very like an XMLReader except that it obtains its events from another XML reader rather than from a primary source such as an XML document, file or database. This makes it a primary component within the JAXP pipeline architecture: it can sit within a pipeline, receiving XML data from another XML processing element and, if required, passing its data on to another XML processing element.

Thus XML Filters can modify a stream of (XML) events as they pass on to the final application. Indeed, this is the primary construct in the event-driven, serial-access, pipelining architecture described earlier.

If you have a distribution of SAX, or a version of Java containing the XML APIs, look at the included classes; the one you want is org.xml.sax.XMLFilter. You should also ensure that you have the SAX helper classes, found in the org.xml.sax.helpers package. In that package, you will want to focus on the org.xml.sax.helpers.XMLFilterImpl class.

Together, the XMLFilter interface and its XMLFilterImpl implementation provide powerful filtering for any SAX-based application, and a major architectural advantage when using DeltaXML itself.

2.2. The XMLFilter interface methods

If you examine the XMLFilter, you'll find that it extends the org.xml.sax.XMLReader interface, and adds two new methods:

  1. public void setParent(XMLReader parent); This method allows the application to link the filter to a parent reader (which may be another filter). The argument may not be null.

  2. public XMLReader getParent(); This method allows the application to query the parent reader (which may be another filter). It is generally a bad idea to perform any operations on the parent reader directly: they should all pass through this filter.

This probably doesn't look like much. However, the XMLFilterImpl helper class also implements the SAX event-handler interfaces (ContentHandler among them), so a filter receives callbacks such as startElement(), endElement(), etc. In each of these methods, you can operate upon the input XML data before an application gets to it. In the case of DeltaXML this is extended such that this can be done both before and after DeltaXML processes the comparison.

To put this in perspective, note that application code often doesn't start to work on the XML data from a SAX parse until that parse has completed. However, you can insert an XMLFilter into the processing chain ahead of the application, meaning you get to modify data before the application gets that data (and, for example, outputs it). Since a filter based on XMLFilterImpl receives all of the SAX callbacks, you can work with the elements, the attributes, the prefix mappings, and anything else that SAX can work with.
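As a minimal sketch of how setParent and getParent link filters together, the following program chains two pass-through XMLFilterImpl objects in front of a JAXP parser. The class name ParentChainDemo and its helper methods are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class: not part of SAX or DeltaXML.
class ParentChainDemo {

    // Builds parser <- first <- second and returns the last filter.
    static XMLFilterImpl buildChain() {
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLFilterImpl first = new XMLFilterImpl();
            first.setParent(parser);   // first takes its events from the parser
            XMLFilterImpl second = new XMLFilterImpl();
            second.setParent(first);   // second takes its events from first
            return second;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses xml through the chain and counts the start tags that arrive.
    static int countElements(String xml) {
        final int[] count = {0};
        XMLFilterImpl last = buildChain();
        last.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        try {
            // Drive the parse from the last filter, never from its parent
            last.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        XMLFilterImpl last = buildChain();
        System.out.println(last.getParent() instanceof XMLFilterImpl); // true
        System.out.println(countElements("<a><b/><c/></a>"));          // 3
    }
}
```

Because neither filter overrides any callback, every event reaches the handler unchanged; the point is only that the application talks to the last filter and each filter talks to its parent.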

2.3. XMLFilter methods to override

The following methods can be overridden to perform custom actions when the associated parser events occur. Always be sure to call the corresponding super method (for example super.startElement) if the event is to be passed on to the next stage of the pipeline.

  1. public void startDocument() throws SAXException

    • This event occurs at the start of a document.

  2. public void endDocument() throws SAXException

    • This event occurs at the end of a document.

  3. public void startPrefixMapping(String prefix, String uri) throws SAXException

    • This event occurs when a namespace declaration occurs. It will occur before the startElement event for the element where the namespace comes into scope.

  4. public void endPrefixMapping(String prefix) throws SAXException

    • This event occurs when a declared prefix goes out of scope. It will occur after the endElement event for the element where the namespace goes out of scope.

  5. public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException

    • This event occurs when a start element tag is encountered.

  6. public void endElement(String uri, String localName, String qName) throws SAXException

    • This event occurs when an end element tag is encountered.

  7. public void characters(char[] ch, int start, int length) throws SAXException

    • This event occurs when a text node is encountered. This includes inter-element whitespace, except where a DTD causes it to be treated as ignorable whitespace.

  8. public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException

    • This event occurs when a DTD is present and inter-element whitespace is encountered.

  9. public void processingInstruction(String target, String data) throws SAXException

    • This event occurs when a processing instruction is encountered.

Note: The Attributes object in startElement() is declared as the org.xml.sax.Attributes interface. A general-purpose, modifiable implementation of that interface is provided by org.xml.sax.helpers.AttributesImpl.
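To make the ordering of these events concrete, here is a small sketch of a filter that records each callback before delegating to super. The class name EventLogFilter and the sample namespace URI urn:example are assumptions made for this illustration, and the parser is assumed to be namespace-aware so that local names and prefix mappings are reported.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that logs events and passes them on unchanged.
class EventLogFilter extends XMLFilterImpl {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() throws SAXException {
        events.add("startDocument");
        super.startDocument();
    }

    @Override
    public void endDocument() throws SAXException {
        events.add("endDocument");
        super.endDocument();
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        events.add("startPrefixMapping " + prefix);
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        events.add("endPrefixMapping " + prefix);
        super.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        events.add("startElement " + localName);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        events.add("endElement " + localName);
        super.endElement(uri, localName, qName);
    }

    // Parses xml through the filter and returns the recorded events.
    static String trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            EventLogFilter filter = new EventLogFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xml)));
            return String.join(", ", filter.events);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<a xmlns:p=\"urn:example\"><p:b/></a>"));
    }
}
```

Running this prints the prefix mapping for p before the start of element a and its end after the end of element a, matching the descriptions in the list above.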

2.4. Working with attributes

As mentioned above, element attributes are accessed through the Attributes object atts in the startElement method. This object can be used as it is if only read access to the attributes is required. However, to edit attributes, the Attributes object must be converted to an AttributesImpl object. This cannot be done by explicit casting. Instead, either pass the Attributes object to the constructor of AttributesImpl, for example:

AttributesImpl attImpl = new AttributesImpl(atts);

or pass it to the setAttributes method on AttributesImpl, for example:

attImpl.setAttributes(atts);

Once the Attributes have been converted to an AttributesImpl object, you can now use methods such as addAttribute, setAttribute or removeAttribute to edit the list.

To determine the presence of a certain attribute within an Attributes object, the getIndex method can be used. The attribute can be referenced by either its qName or by both uri and localName, for example:

int index = atts.getIndex("foo:bar");

or

int index = atts.getIndex("http://foobar.com", "bar");

The returned int is either the index of the specified attribute, or -1 if it is not present. Similarly, the getValue method can be used. This returns either a String representing the value of the specified attribute, or null if it is not present.
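The copy-then-edit approach described in this section can be sketched as follows. The class AttributeEditDemo and the attribute names id, lang and checked are hypothetical; only the AttributesImpl calls themselves come from the SAX helper API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class showing read-only lookup and editing of attributes.
class AttributeEditDemo {

    // Builds a small attribute list to stand in for what a parser would pass.
    static AttributesImpl sample() {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", "42");
        atts.addAttribute("", "lang", "lang", "CDATA", "en");
        return atts;
    }

    // Returns an edited copy: "id" removed, "checked" set to "yes".
    static Attributes edit(Attributes atts) {
        AttributesImpl copy = new AttributesImpl(atts); // copy, do not cast
        int idIndex = copy.getIndex("id");
        if (idIndex != -1) {
            copy.removeAttribute(idIndex);
        }
        int checkedIndex = copy.getIndex("checked");
        if (checkedIndex != -1) {
            copy.setValue(checkedIndex, "yes");
        } else {
            copy.addAttribute("", "checked", "checked", "CDATA", "yes");
        }
        return copy;
    }

    // Formats the attributes as space-separated name=value pairs.
    static String describe(Attributes atts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < atts.getLength(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(edit(sample()))); // lang=en checked=yes
    }
}
```

Inside a real filter, the result of edit(atts) would be handed to super.startElement in place of the original atts.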

Chapter 3. Implementing Java XML Filters

In this chapter we will look at how you can write your own Java XML Filter classes. To do this we will create a very simple application that we will then build on and use with DeltaXML in the next chapter of the tutorial.

3.1. A simple filter

Let's look at a concrete example of an XMLFilter, so that you can start to get an idea of what must be implemented to create your own filter. The following listing shows a very simple SAX filter that changes all postalcode elements into postcode elements. This effectively pre-processes the input document to modify one element, while allowing all other elements to pass through unchanged.

Example 3.1. A Simple XMLFilter class

import org.xml.sax.Attributes; 
import org.xml.sax.SAXException; 
import org.xml.sax.helpers.XMLFilterImpl; 
 
public class SimpleXMLFilter extends XMLFilterImpl { 
 
    public void startElement(String uri,  
                             String localName,  
                             String qName, 
                             Attributes atts) throws SAXException { 
        // Map postalcode to postcode 
       if ((localName.equals("postalcode")) ||
           (qName.equals("postalcode"))) { 
              System.out.println("SXF: " + qName); 
              qName = "postcode"; 
              localName = "postcode"; 
       } 
       // Delegate on to inherited behaviour 
       super.startElement(uri, localName, qName, atts); 
    } 
 
    public void endElement(String uri, String localName, String qName) 
            throws SAXException { 
       // Map postalcode to postcode 
       if ((localName.equals("postalcode")) ||
           (qName.equals("postalcode"))) { 
              System.out.println("SXF: " + qName); 
              qName = "postcode"; 
              localName = "postcode"; 
       } 
       super.endElement(uri, localName, qName); 
    } 
 
}

To be a Java XML filter a class must either implement the XMLFilter interface or extend the XMLFilterImpl class. In our case we are extending the XMLFilterImpl class from the org.xml.sax.helpers package, as this means that we only need to implement those methods that will actually do something (all other methods required by the XMLFilter interface are inherited from the XMLFilterImpl class). That is, XMLFilterImpl provides a default version of all the required methods, allowing us to override only those methods that need customized behavior. This keeps our code cleaner, and requires less work on our part.

We are now free to implement only the methods startElement and endElement. Be careful to make sure that the method signatures are the same as those in the ContentHandler interface (which XMLFilterImpl implements); otherwise you will be overloading the methods rather than overriding them, which means that your code will not be called.

Our startElement and endElement methods change the localName and the qName (qualified name) of the element to postcode if the element postalcode is found. Otherwise they just pass the data through unaltered.

You may wonder why we set both values. This has to do with rules regarding SAX processing and the state of the following two SAX parser settings:

http://xml.org/sax/features/namespaces

http://xml.org/sax/features/namespace-prefixes

Essentially, these rules say that:

  1. the Namespace URI and local name are required when the namespaces property is true (the default), and are optional when the namespaces property is false (if one is specified, both must be);

  2. the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default).

To handle these situations we are setting both parameters to the new element name. This is also why we test both parameters.

The key thing to remember here is that the filter receives the XML data before it is passed on to the next thing in the XML processing pipeline. Thus, the output of this filter becomes the input of the next thing in the pipeline. In turn the next thing in the pipeline views the data it receives as its input (i.e. it never sees the original data). Thus if we change the element postalcode to postcode, then the next processor in the pipeline only sees the element postcode.

The key to understanding how this is achieved in a simple application with a single filter and a content handler is that your XMLFilter receives the data before the ContentHandler. If you want to process that data you can; if you want to pass it on to the content handler, you can do so by calling the parent class's version of the method (the one defined in the XMLFilterImpl class), which will do just that. In fact, all of the default methods in XMLFilterImpl do just that. To prevent data from being seen by the content handler at all, simply avoid delegating to the inherited methods.

However, a word of warning if you use a filter to remove some elements from the data passed to the next element in the pipeline: it is all too easy to pollute the data being sent on. For example, consider the case where you don't delegate in the startElement() method for certain data, but forget to do the same in endElement(). The result would be that some elements would never be reported as starting, but would be reported to the next stage as ending. This would cause, in the best case, program errors, and in the worst case, data loss or corruption in your application.
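One way to keep start and end events balanced is to track a depth counter, so that everything between the dropped start tag and its matching end tag is suppressed together. The following is a sketch under that assumption; the class SubtreeSkippingFilter, the element name secret and the recording handler are all hypothetical, and the parser is assumed to be namespace-aware.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that drops an element together with all its content.
class SubtreeSkippingFilter extends XMLFilterImpl {
    private final String unwanted;  // local name of the element to drop
    private int skipDepth = 0;      // greater than 0 while inside a dropped subtree

    SubtreeSkippingFilter(String unwanted) {
        this.unwanted = unwanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (skipDepth > 0 || localName.equals(unwanted)) {
            skipDepth++;            // entering (or nesting within) the subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;            // leaving one level of the dropped subtree
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Parses xml through the filter and echoes what the handler receives.
    static String run(String xml, String unwanted) {
        final StringBuilder out = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SubtreeSkippingFilter filter = new SubtreeSkippingFilter(unwanted);
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    out.append('<').append(qName).append('>');
                }
                @Override
                public void endElement(String uri, String localName,
                                       String qName) {
                    out.append("</").append(qName).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(
            "<list><item>keep</item><secret><item>drop</item></secret></list>",
            "secret")); // <list><item>keep</item></list>
    }
}
```

Because the same counter guards startElement, endElement and characters, the next stage never sees an end tag without its start tag.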

Note

In the SimpleXMLFilter class presented earlier we ignored namespaces. It should be noted that the code as it stands will remove a namespaced postalcode element from its namespace as well as renaming it to postcode, e.g. both <deltaxml:postalcode/> and <postalcode/> would be output as <postcode/>. We can overcome this limitation by modifying the behaviour of the code; for example, we could change the startElement method thus:

public void startElement(String uri, 
                         String localName, 
                         String qName, 
                         Attributes atts) throws SAXException { 
    // Map postalcode to postcode 
    if (localName.equals("postalcode")) { 
        System.out.println("SXF: " + localName); 
        // Now check to see if a name space is being used 
        int index=  qName.indexOf(":"); 
        if (index != -1) { 
            // If it is then we want to keep the prefix (including its colon)
            qName = qName.substring(0, index + 1) + "postcode"; 
        } else {                 
            qName = "postcode"; 
        } 
        localName = "postcode"; 
    } 
    // Delegate on to inherited behaviour 
    super.startElement(uri, localName, qName, atts); 
}

This version of the startElement method will retain any namespace information provided. However, it is slightly more complex and thus we will leave the startElement method as it is.
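If the namespace-preserving variant were adopted, endElement would need exactly the same mapping, so it may help to factor the prefix handling into one helper. The following is a sketch of such a helper; the class name QNameRenamer and the method withLocalName are hypothetical.

```java
// Hypothetical helper for renaming an element while keeping its prefix.
class QNameRenamer {

    // Replaces the local part of qName, keeping any "prefix:" in front of it.
    static String withLocalName(String qName, String newLocalName) {
        int colon = qName.indexOf(':');
        if (colon == -1) {
            return newLocalName;                    // no prefix present
        }
        return qName.substring(0, colon + 1) + newLocalName;
    }

    public static void main(String[] args) {
        System.out.println(withLocalName("postalcode", "postcode"));
        // postcode
        System.out.println(withLocalName("deltaxml:postalcode", "postcode"));
        // deltaxml:postcode
    }
}
```

Both startElement and endElement could then call withLocalName(qName, "postcode"), which keeps the two methods from drifting apart.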

3.2. A sample ContentHandler

In order to give our SimpleXMLFilter class some meaning, let us look at a simple XML content handler. In this simple example, the following class represents our application. It is this class which will receive the output generated by our filter.

Example 3.2. The TestHandler content handler

import org.xml.sax.Attributes; 
import org.xml.sax.SAXException; 
import org.xml.sax.helpers.DefaultHandler; 
 
public class TestHandler extends DefaultHandler { 
 
    public void startElement(String uri, 
                             String localName, 
                             String qName, 
                             Attributes atts) 
                                  throws SAXException { 
        println("<" + qName + ">"); 
    } 
 
    public void endElement(String uri, 
                           String localName, 
                           String qName) 
                                  throws SAXException { 
        println("</" + qName + ">"); 
    } 
 
    private void println(String s) throws SAXException { 
        System.out.println("   " + s); 
    } 
 
}

All that this class does is print out the XML tags it receives, as an echo of its input. This will allow us to see what data it receives. It is a very simple class that merely extends the org.xml.sax.helpers.DefaultHandler class, which in turn implements the org.xml.sax.ContentHandler interface. This allows it to be used within the XML processing chain.

3.3. Setting up a processing chain

Once you have your filter set up and compiled, you need to create a pipeline for processing your XML. This should move from input document to filter to reader. You may even have multiple filters, stacked upon each other. As long as input comes first, and your reader (with application-specific callbacks) comes last, things work fine. However, you may have a particular order for your filters, and you should pay attention to that closely. The following listing shows how to set up your program for using filters in the general case (we will look at how much easier it is with DeltaXML in the next chapter).

Example 3.3. Setting up the processing pipeline

import java.io.IOException; 
 
import javax.xml.parsers.ParserConfigurationException; 
import javax.xml.parsers.SAXParser; 
import javax.xml.parsers.SAXParserFactory; 
 
import org.xml.sax.InputSource; 
import org.xml.sax.SAXException; 
import org.xml.sax.XMLReader; 
 
public class XMLFilterTest  { 
 
    public static void main(String[] args) 
                          throws IOException, 
                                 ParserConfigurationException, 
                                 SAXException { 
        SAXParserFactory spf = SAXParserFactory.newInstance(); 
        SAXParser parser = spf.newSAXParser(); 
        XMLReader reader = parser.getXMLReader(); 
        SimpleXMLFilter filter = new SimpleXMLFilter(); 
 
        filter.setParent(reader); 
 
        TestHandler handler = new TestHandler(); 
        filter.setContentHandler(handler); 
        filter.setErrorHandler(handler); 
 
        InputSource inputSource = new InputSource("test.xml"); 
        filter.parse(inputSource); 
    } 
 
}

Notice that because the filter (or filters) must sit between the input source and the application's handler, all the operations that you would normally invoke on the reader are invoked on the filter instead. The filter then delegates any data that passes through it to the handler, as you saw in SimpleXMLFilter. If this is not 100% clear to you, do not worry, as DeltaXML will make it very easy for you to create a pipeline without needing to get down to this level.
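To illustrate why the order of stacked filters matters, here is a sketch with two renaming filters whose effects depend on which one sits closer to the parser. The class RenameFilter and the element names a, b and c are hypothetical; for simplicity the sketch assumes unprefixed element names and a namespace-aware parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that renames one (unprefixed) element to another.
class RenameFilter extends XMLFilterImpl {
    private final String from;
    private final String to;

    RenameFilter(XMLReader parent, String from, String to) {
        super(parent);
        this.from = from;
        this.to = to;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (localName.equals(from)) {
            super.startElement(uri, to, to, atts);
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equals(from)) {
            super.endElement(uri, to, to);
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    // Stacks one RenameFilter per {from, to} pair, first pair nearest the
    // parser, and returns the root element name the handler finally sees.
    static String rootNameAfter(String xml, String[][] renames) {
        final String[] root = {null};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            for (String[] rename : renames) {
                reader = new RenameFilter(reader, rename[0], rename[1]);
            }
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        // a->b runs first, then b->c sees the renamed element
        System.out.println(rootNameAfter("<a/>",
            new String[][] {{"a", "b"}, {"b", "c"}}));   // c
        // b->c runs first and finds nothing, then a->b renames
        System.out.println(rootNameAfter("<a/>",
            new String[][] {{"b", "c"}, {"a", "b"}}));   // b
    }
}
```

The same two filters give different results depending on their position, which is why the stacking order deserves attention.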

3.4. Running the processing chain

We are now in a position to execute our filtered pipeline. We will run this on the content of the XML file test.xml. The content of this file is presented below:

Example 3.4. The test.xml data file

<AddressList> 
 <person> 
  <name>John Smith</name> 
  <street>10 Grays Inn Road</street> 
  <city>London</city> 
  <postalcode>WC1X 8TX</postalcode> 
 </person> 
</AddressList>

This is a very simple XML file containing some basic address-list data. Note that the element holding the PCDATA WC1X 8TX is called postalcode.

We run the main method of the XMLFilterTest class to process this XML file using our new filter. The result of running this class is presented below:

   <AddressList> 
   <person> 
   <name> 
   </name> 
   <street> 
   </street> 
   <city> 
   </city> 
SXF: postalcode 
   <postcode> 
SXF: postalcode 
   </postcode> 
   </person> 
   </AddressList>

In the above output, the printouts from the SimpleXMLFilter are prefixed by SXF: and the printouts from the TestHandler are prefixed by three spaces. Thus you can easily see the output generated by each. From this, it is clear that the SimpleXMLFilter passes all XML through unchanged except for the element postalcode, which it converts into the element postcode. Thus the TestHandler knows nothing about the element postalcode as it never receives it, and thus the data has been filtered before being received by the TestHandler.

Chapter 4. Using Java XML Filters with DeltaXML

4.1. DeltaXML and Java Filters

Writing filters to work with DeltaXML is exactly like writing any Java XML filter. Thus we can use the filter we just created to process two XML files before they are compared by DeltaXML.

This might be useful if the Schema or DTD used for the XML file has changed, such that the element is now called postcode in a new or modified file. This might happen due to changing requirements imposed by, for example, third parties.

As the incorporation of filters into the processing pipeline, both before and after the execution of DeltaXML, is so important, DeltaXML provides a convenience class which greatly simplifies the incorporation of filters (whether Java based or XSL based). This class is the PipelinedComparator class in the com.deltaxml.core package. This class allows a list of input and output filters to be specified and then a comparison performed. If the filters are implemented in Java, as SimpleXMLFilter is, then the class object is passed to the PipelinedComparator. A class object can be obtained in a number of ways, for example, by using the class Class and its method forName:

Class.forName("com.foo.EchoFilter");

or by using a class literal (the .class suffix on the class name):

com.foo.EchoFilter.class

This is illustrated in the sample programs presented in this chapter.
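As a small sketch of the two ways of obtaining a class object, the following program uses the standard XMLFilterImpl class as a stand-in for a real filter class (com.foo.EchoFilter above is only a placeholder name). The class ClassObjectDemo is hypothetical.

```java
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class comparing Class.forName with a class literal.
class ClassObjectDemo {

    // Returns true if both lookups yield the same Class object.
    static boolean sameClassObject() {
        try {
            // Lookup by name: resolved at run time, may fail if misspelt
            Class<?> byName =
                Class.forName("org.xml.sax.helpers.XMLFilterImpl");
            // Class literal: checked by the compiler
            Class<?> byLiteral = XMLFilterImpl.class;
            return byName == byLiteral;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An array of class objects, in the shape a filter list would take
        Class<?>[] filters = { XMLFilterImpl.class };
        System.out.println(filters[0].getName());
        System.out.println(sameClassObject()); // true
    }
}
```

The class literal is usually preferable when the filter class is known at compile time, since a misspelt name is then caught by the compiler rather than at run time.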

4.2. Using a Java XML Filter

In this section we will take the generic Java XML Filter written in the last chapter and use it as an input filter for DeltaXML. Thus all data to be compared will first pass through the SimpleXMLFilter before being passed to DeltaXML.

The following program illustrates the use of the PipelinedComparator with the SimpleXMLFilter presented earlier.

Example 4.1. Using SimpleXMLFilter with DeltaXML

import java.io.File; 
 
import com.deltaxml.core.PipelinedComparator; 
import com.deltaxml.core.PipelinedComparatorException;
 
public class PCXMLFilterTest { 
 
  public static void main(String[] args)  
    throws PipelinedComparatorException
  { 
    PipelinedComparator pc= new PipelinedComparator(); 
    // Set up the input filter 
    pc.setInputFilters(new Class[]{SimpleXMLFilter.class}); 
    // Now run the DeltaXML comparison 
    pc.compare(new File("test.xml"),  
            new File("new-test.xml"),  
            new File("changes.xml"));    
  } 
}
        

Notice that we create a new instance of the PipelinedComparator class and set an array of input filters on it (you can also set an array of output filters). Once this is done we call the compare method with the two files to compare and the output file for the delta.

We will reuse the test.xml file from the previous example, but create a new-test.xml file with the following content:

Example 4.2. The new-test.xml file

<AddressList> 
 <person> 
  <name>John Smith</name> 
  <street>12 Grays Inn Road</street> 
  <city>London</city> 
  <postcode>WC1X 8TX</postcode> 
 </person> 
</AddressList>

As you can see from this file, this XML document has an element postcode. It also has a change to the PCDATA of the street element. Although the element <postcode> is not present in the original file, the input filter will convert <postalcode> to <postcode> and thus allow the two files to be compared appropriately. Running the PCXMLFilterTest generates the following delta file:

Example 4.3. Initial Delta File for test.xml and new-test.xml

<?xml version="1.0" encoding="UTF-8"?>
<AddressList 
    xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
    deltaxml:delta="WFmodify"> 
 <person deltaxml:delta="WFmodify"> 
  <name deltaxml:delta="unchanged"/> 
  <street deltaxml:delta="WFmodify">
    <deltaxml:PCDATAmodify>
      <deltaxml:PCDATAold>
        10 Grays Inn Road
      </deltaxml:PCDATAold>
      <deltaxml:PCDATAnew>
        12 Grays Inn Road
      </deltaxml:PCDATAnew>
    </deltaxml:PCDATAmodify>
  </street> 
  <city deltaxml:delta="unchanged"/> 
  <postcode deltaxml:delta="unchanged"/> 
 </person> 
</AddressList>

This illustrates that only the PCDATA value of the <street> element has changed; the comparison did not see the change from <postalcode> to <postcode>.

4.3. Using Multiple Java XML Filters

We can improve on the example presented in the last section by using the Word-by-Word filters provided as examples with the DeltaXML distribution. These can be found under src and are in the package com.deltaxml.pipe.filters. There are three filters: an input filter called WordByWordInfilter, and two output filters called WordByWordOutfilter1 and WordByWordOutfilter2. Output filters can be set on the PipelinedComparator class using the setOutputFilters method. The modified program, which exploits these filters, looks like this:

Example 4.4. Multiple filters for DeltaXML

import java.io.File; 

import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter; 
import com.deltaxml.pipe.filters.WordByWordOutfilter1; 
import com.deltaxml.pipe.filters.WordByWordOutfilter2; 
 
 
public class PCXMLFilterTest2 { 
 
  public static void main(String[] args) 
    throws PipelinedComparatorException 
  { 
    PipelinedComparator pc= new PipelinedComparator(); 
    // Set up the input filters 
    pc.setInputFilters(new Class[]{SimpleXMLFilter.class, 
                                   WordByWordInfilter.class}); 
    // Set up the output filters 
    pc.setOutputFilters(new Class[] {WordByWordOutfilter1.class, 
                                     WordByWordOutfilter2.class}); 
    // Now run the DeltaXML comparison 
    pc.compare(new File("test.xml"),  
               new File("new-test.xml"),  
               new File("diff.xml"));    
  } 
}
          

As you can see from this PCXMLFilterTest2 class, it only differs from the PCXMLFilterTest class in having two input filters and two output filters listed. All these filters are implemented as Java classes that implement the XMLFilter interface.

The result of running this program on the two XML files is presented below:

Example 4.5. A Revised delta file

<?xml version="1.0" encoding="UTF-8"?> 
<AddressList 
      xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" 
      deltaxml:delta="WFmodify">
   <person deltaxml:delta="WFmodify"> 
      <name deltaxml:delta="unchanged"/> 
      <street deltaxml:delta="WFmodify">
        <deltaxml:PCDATAmodify>
          <deltaxml:PCDATAold>
            10
          </deltaxml:PCDATAold>
          <deltaxml:PCDATAnew>
            12
          </deltaxml:PCDATAnew>
        </deltaxml:PCDATAmodify>  
        Grays Inn Road 
      </street> 
    <city deltaxml:delta="unchanged"/> 
    <postcode deltaxml:delta="unchanged"/> 
  </person> 
</AddressList>

This has now refined the delta produced such that we can see that the only change is that the number 10 has become the number 12.

We could go on and define further filters to meet whatever pre- and post-processing requirements we may have. However, these examples illustrate the way in which you can implement your own filters and then use them as either pre- or post-processors with DeltaXML.

If you wish to get under the hood of the PipelinedComparator and create your own pipelines and call the DeltaXML compare API directly, then you will need to look at the Building Pipelines tutorial.

Chapter 5. Java XML Filter writing guidelines

The guidelines for writing a Java XML Filter presented in this section are not necessarily specific to DeltaXML, and are really guidelines for all Java XML Filters. However, they have particular resonance with DeltaXML and are presented here to help you create your own XML filters.

5.1. Storing data

Sometimes it may be necessary to store some of the data that has been parsed. This could be for various reasons: you may need to store details of a start element because its output depends on something that may come later in the document, or you may need to store the current value of an attribute because it changes the way that child elements are output. If too much data is being stored and a large amount of memory is taken up, then the advantage of using a serial-access method of processing is lost; it may be more sensible to use another method such as DOM or XSL stylesheets.

5.1.1. Attribute values

If it is necessary to know the value of an attribute of a parent element when processing another element, it can be useful to use a Stack to store this. Suppose an XML document stores text which must be output either in upper-case or lower-case depending on an attribute value in the text node's parent element.

<root caps="false">
   <elem1 caps="true">
      This should be output in upper-case
      <elem1child caps="false">
         But this should be lower-case
      </elem1child>
      <elem1child2>
         No value was specified, should be upper-case because of parent
      </elem1child2>
      This should be upper-case
   </elem1>
   But this is in the root element and so should be lower-case
</root>

The characters to output will be accessible from the characters method. However, when this method is called, we have no access to the Attributes object that was passed by any of the startElement methods. The value of the caps attribute should be pushed onto a Stack at the beginning of every startElement method and popped at the end of every endElement method. If the attribute is not set in a particular element (as in elem1child2 above), the value currently at the top of the Stack should be pushed on again. This would cause problems if the root element didn't have a value specified, as peeking at the Stack would throw an EmptyStackException. In this case, a default value should be pushed onto the Stack. Example code for this can be seen below:

import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.*;
import java.util.Stack;
import java.util.EmptyStackException;

public class RandomClass extends XMLFilterImpl {
   Stack capsVal = new Stack();
   String defaultValue = "false";

   public void startElement(String uri, 
                            String localName, 
                            String qName, 
                            Attributes atts) 
                                     throws SAXException {
      int index = atts.getIndex("caps");
      if(index != -1) {
         capsVal.push(atts.getValue(index));
      } else {
         String value;
         try {
            value = (String)capsVal.peek();
         } catch(EmptyStackException e) {
            value = defaultValue;
         }
         capsVal.push(value);
      }
      super.startElement(uri, localName, qName, atts);
   }

   public void endElement(String uri, 
                          String localName, 
                          String qName) 
                                   throws SAXException {
      capsVal.pop();
      super.endElement(uri, localName, qName);
   }

   public void characters(char[] ch, int start, int length) 
                                              throws SAXException{
      String output = new String(ch, start, length);
      if(((String)capsVal.peek()).equals("true")) {
         output = output.toUpperCase();
      } else {
         output = output.toLowerCase();
      }
      super.characters(output.toCharArray(), 0, output.length());
   }
}

5.1.2. Elements

If it is necessary to store only one element at a time, this can be achieved using Strings for the uri, localName and qName and an AttributesImpl object for the atts. If many elements need to be stored, it may become necessary to write a custom Element class which could then be pushed onto a Stack.
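A custom element class of the kind described above might look like the following sketch. The names StoredElement and StoredElementDemo are hypothetical, and an ArrayDeque is used here as the stack, although java.util.Stack would serve equally well. The important detail is that the attributes are copied: SAX only guarantees that the Attributes object passed to startElement is valid during that call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical value class holding everything startElement was given.
final class StoredElement {
    final String uri;
    final String localName;
    final String qName;
    final Attributes atts;

    StoredElement(String uri, String localName, String qName,
                  Attributes atts) {
        this.uri = uri;
        this.localName = localName;
        this.qName = qName;
        // Take a private copy; the parser may reuse the original object
        this.atts = new AttributesImpl(atts);
    }
}

// Hypothetical demo showing that the stored copy survives reuse.
class StoredElementDemo {

    static int storedAttributeCountAfterReuse() {
        AttributesImpl live = new AttributesImpl();
        live.addAttribute("", "caps", "caps", "CDATA", "true");

        Deque<StoredElement> open = new ArrayDeque<>();
        open.push(new StoredElement("", "elem1", "elem1", live));

        live.clear(); // simulate the parser recycling its attribute list
        return open.peek().atts.getLength();
    }

    public static void main(String[] args) {
        System.out.println(storedAttributeCountAfterReuse()); // 1
    }
}
```

A filter would push a StoredElement in startElement and pop it in endElement, replaying the stored values through super.startElement once the deferred decision can be made.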

5.1.3. Attributes

Again with attributes, if it is only necessary to store one attribute at a time, this can be achieved with Strings for the uri, localName, qName and value. If it is necessary to store many attributes, a custom class may need to be written, which again could be stored in a Vector, ArrayList or similar data structure.

5.1.4. Namespace declarations

If it is necessary to declare new namespaces within an XML document, it would be useful to know if that namespace has already been declared earlier in the document, as declaring a prefix that is already in scope is at best redundant and can produce a duplicate declaration on the same element, which is not well-formed. This can be achieved easily using a flag which is set in the startPrefixMapping method and cleared in the endPrefixMapping method.

boolean foobarPrefixDefined = false; 
public void startPrefixMapping(String prefix, 
                               String uri) 
                                   throws SAXException {
   if(prefix.equals("foobar")) {
      foobarPrefixDefined = true;
   }
   super.startPrefixMapping(prefix, uri);
}

public void endPrefixMapping(String prefix) throws SAXException {
   if(prefix.equals("foobar")) {
      foobarPrefixDefined = false;
   }
   super.endPrefixMapping(prefix);
}

The boolean value can then be tested before declaring the namespace to see if it is necessary or not.
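As a sketch of how such a check might be used, the filter below declares the foobar prefix on the root element only when the input has not already done so. The class name, the placeholder URI and the mappingsFor helper are assumptions made for illustration. It keeps a counter rather than a boolean so that nested redeclarations are handled, and it assumes Java 5 or later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Declares the foobar prefix on the root element unless the input already does.
class FoobarDeclaringFilter extends XMLFilterImpl {
    static final String PREFIX = "foobar";
    static final String URI = "http://example.com/foobar"; // placeholder URI

    private int prefixDepth = 0;           // foobar declarations in scope
    private int elementDepth = 0;          // number of open elements
    private boolean addedByFilter = false; // true if we declared the prefix

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        if (PREFIX.equals(prefix)) {
            prefixDepth++;
        }
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        if (PREFIX.equals(prefix)) {
            prefixDepth--;
        }
        super.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Mappings for an element are reported before its startElement event,
        // so the counter already reflects declarations on the root itself.
        if (elementDepth == 0 && prefixDepth == 0) {
            super.startPrefixMapping(PREFIX, URI);
            addedByFilter = true;
        }
        elementDepth++;
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        elementDepth--;
        if (elementDepth == 0 && addedByFilter) {
            super.endPrefixMapping(PREFIX);
            addedByFilter = false;
        }
    }

    // Demo helper: lists the prefix mappings that leave the filter.
    static List<String> mappingsFor(String xml) {
        final List<String> seen = new ArrayList<String>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            FoobarDeclaringFilter filter = new FoobarDeclaringFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    seen.add(prefix + "=" + uri);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Calling FoobarDeclaringFilter.mappingsFor with a document that has no foobar declaration should report the mapping added by the filter, while a document that already declares the prefix should pass through with only its own mapping.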

5.2. Removing elements of a certain type

The following example filter removes the start and end tags of any element with the local name "remove" from the XML document. Everything else is copied to the output document, including the contents of the removed elements.

Example 5.1. Removing elements

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class ExampleFilter extends XMLFilterImpl {

   // Pass the start tag on unless the element is called "remove"
   public void startElement(String uri,
                            String localName,
                            String qName,
                            Attributes atts)
                                 throws SAXException {
      if(!localName.equals("remove")) {
         super.startElement(uri, localName, qName, atts);
      }
   }

   // Likewise drop the matching end tag
   public void endElement(String uri,
                          String localName,
                          String qName)
                              throws SAXException {
      if(!localName.equals("remove")) {
         super.endElement(uri, localName, qName);
      }
   }
}
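If the contents of the removed elements should be dropped as well, a depth counter is one way to do it. The following is a sketch only: the class name RemoveSubtreeFilter and the filter helper are hypothetical, and the helper uses a namespace-aware parser so that localName is reported.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Drops every "remove" element together with everything inside it.
class RemoveSubtreeFilter extends XMLFilterImpl {
    private int skipDepth = 0; // greater than zero while inside a removed element

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (skipDepth > 0 || "remove".equals(localName)) {
            skipDepth++; // entering, or nesting deeper within, a removed subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Demo helper: runs the filter and writes the surviving events as text.
    static String filter(String xml) {
        final StringBuilder out = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be reported
            RemoveSubtreeFilter filter = new RemoveSubtreeFilter();
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(q).append('>');
                }
                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(q).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append(ch, start, length);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

The counter rather than a boolean is what allows remove elements nested inside other remove elements to be handled correctly.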

5.3. Comments

XMLFilterImpl does not handle comments: comment events are delivered through the LexicalHandler interface, which XMLFilterImpl does not implement, so comments are lost when a document is passed through the filter. If it is necessary to include the original comments in the output document, the

http://xml.org/sax/properties/lexical-handler property

must be set on the filter's parent XMLReader with an object that implements LexicalHandler as the value. The easiest way to do this is to pass the TransformerHandler as the value, as this implements the LexicalHandler interface.

TransformerHandler th = saxFactory.newTransformerHandler();  
filter.getParent().setProperty("http://xml.org/sax/properties/lexical-handler", 
                               th);
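Setting the property on the parent reader sends comments straight to the TransformerHandler, bypassing the filter. If the filter itself needs to see comments, one possible approach is for it to implement LexicalHandler and register itself with its parent. The sketch below assumes this design: the names CommentAwareFilter, setLexicalHandler and commentsIn are hypothetical, and the code assumes Java 5 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// A filter that also receives lexical events and forwards them downstream.
class CommentAwareFilter extends XMLFilterImpl implements LexicalHandler {
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    private LexicalHandler next; // downstream receiver, may be null

    void setLexicalHandler(LexicalHandler next) {
        this.next = next;
    }

    @Override
    public void parse(InputSource input) throws SAXException, IOException {
        // Register this filter with its parent (assumes a parent has been set).
        getParent().setProperty(LEXICAL_HANDLER, this);
        super.parse(input);
    }

    public void comment(char[] ch, int start, int length) throws SAXException {
        // A subclass could inspect or drop the comment here.
        if (next != null) next.comment(ch, start, length);
    }

    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
        if (next != null) next.startDTD(name, publicId, systemId);
    }

    public void endDTD() throws SAXException {
        if (next != null) next.endDTD();
    }

    public void startEntity(String name) throws SAXException {
        if (next != null) next.startEntity(name);
    }

    public void endEntity(String name) throws SAXException {
        if (next != null) next.endEntity(name);
    }

    public void startCDATA() throws SAXException {
        if (next != null) next.startCDATA();
    }

    public void endCDATA() throws SAXException {
        if (next != null) next.endCDATA();
    }

    // Demo helper: returns the comments that pass through the filter.
    static List<String> commentsIn(String xml) {
        final List<String> found = new ArrayList<String>();
        try {
            CommentAwareFilter filter = new CommentAwareFilter();
            filter.setParent(SAXParserFactory.newInstance().newSAXParser().getXMLReader());
            filter.setLexicalHandler(new DefaultHandler2() {
                @Override
                public void comment(char[] ch, int start, int length) {
                    found.add(new String(ch, start, length));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }
}
```

In a real pipeline the object passed to setLexicalHandler would typically be the TransformerHandler that also acts as the filter's content handler.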

5.4. Parsers

5.4.1. Parser Differences

It is important to know how the particular parser implementation you are using reports the different events, as implementations sometimes differ.

For example, a string of characters ending in a newline character (e.g. abcdefg\n) will cause a different number of characters events depending on the parser. If Crimson (the default for JDK 1.4.x) is used, only one characters event will occur. However, Xerces-J creates two consecutive events, one for the letters (abcdefg in the example above) and one for the newline character at the end. SAX allows a parser to split contiguous character data into any number of characters events, so an XMLFilter should never assume that a run of text arrives in a single call.

Individual parser documentation should give you information on this behaviour.
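One common way to make a filter independent of how a parser splits text is to buffer character data and pass it on as a single event when the next non-text event arrives. The following is a minimal sketch of that idea. The class name TextBufferingFilter and the demo helper are hypothetical, and the helper feeds events by hand instead of using a real parser so that the split is reproducible.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Merges consecutive characters events into one before passing them on.
class TextBufferingFilter extends XMLFilterImpl {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // hold back until the text run ends
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        flush(); // keep text and processing instructions in document order
        super.processingInstruction(target, data);
    }

    // Sends any buffered text downstream as a single characters event.
    private void flush() throws SAXException {
        if (text.length() > 0) {
            char[] chars = text.toString().toCharArray();
            text.setLength(0);
            super.characters(chars, 0, chars.length);
        }
    }

    // Demo helper: delivers each chunk as a separate characters event inside
    // one element and returns the text events seen downstream.
    static List<String> demo(String... chunks) {
        final List<String> received = new ArrayList<String>();
        TextBufferingFilter filter = new TextBufferingFilter();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                received.add(new String(ch, start, length));
            }
        });
        try {
            filter.startElement("", "p", "p", new AttributesImpl());
            for (String chunk : chunks) {
                filter.characters(chunk.toCharArray(), 0, chunk.length());
            }
            filter.endElement("", "p", "p");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

With this filter in place, the Crimson-style single event and the Xerces-J-style pair of events from the example above both reach later filters as one characters event containing abcdefg followed by the newline.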

5.4.2. Parser Features

There are two features that can be set on the SAXParserFactory that change the information received from a parser event.

  1. http://xml.org/sax/features/namespaces - defaults to true

    • Setting this feature to true (or not setting it at all as the default is true) causes element and attribute names to be reported with a uri, localName and qName. For example the element <foo:bar> where foo is declared as xmlns:foo="http://foobar.com" would be reported with a uri of "http://foobar.com", a localName of "bar" and a qName of "foo:bar".

When the feature is set to false, only the qName is reported. This will cause problems when using Saxon as a transformer as it throws an Exception if it receives any element or attribute with no localName specified.

  2. http://xml.org/sax/features/namespace-prefixes - defaults to false

    • Setting this feature to true causes namespace declarations to be included in the attribute list in the startElement event. They have no uri or localName, a qName of xmlns:prefix and a value of the namespace URI. The lack of localName can cause problems if using Saxon as a transformer because it will throw an Exception if it encounters any elements or attributes without a localName. Therefore, if this feature is set to true and Saxon is being used, the namespace declaration attributes should be removed from the attribute list or have a localName added.
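The two features and the suggested clean-up of the attribute list can be sketched as follows. The class name NamespaceAttributeStripper and its method names are hypothetical. The filter removes the namespace declaration attributes before passing the element on, which is one of the two remedies mentioned above.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Removes xmlns and xmlns:prefix attributes from every startElement event.
class NamespaceAttributeStripper extends XMLFilterImpl {

    // Copies an attribute list, leaving out namespace declarations.
    static Attributes withoutDeclarations(Attributes atts) {
        AttributesImpl result = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            String qName = atts.getQName(i);
            if (qName.equals("xmlns") || qName.startsWith("xmlns:")) {
                continue; // a namespace declaration, not a real attribute
            }
            result.addAttribute(atts.getURI(i), atts.getLocalName(i), qName,
                                atts.getType(i), atts.getValue(i));
        }
        return result;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        super.startElement(uri, localName, qName, withoutDeclarations(atts));
    }

    // Builds a factory with both features set explicitly.
    static SAXParserFactory newFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/namespaces", true);
        factory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        return factory;
    }
}
```

The prefix mappings themselves are still reported through startPrefixMapping and endPrefixMapping, so no namespace information is lost by stripping the attributes.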

Chapter 6. Combining Java XML Filters and XSL Stylesheet filters

In this tutorial we have illustrated how you can use Java XML Filters with DeltaXML and also how to use XSL Filters with DeltaXML. In fact you can mix the two if you wish. Indeed you can provide a mixed list of Java classes and XSL stylesheets in any order both for input and output purposes.

As an example, consider the following program:

import java.io.File; 
import java.io.FileNotFoundException; 
import java.util.ArrayList; 
import java.util.List; 
 
import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter; 
import com.deltaxml.pipe.filters.WordByWordOutfilter1; 
import com.deltaxml.pipe.filters.WordByWordOutfilter2; 
 
public class PCXMLFilterTest3 { 
 
  public static void main(String[] args) 
    throws PipelinedComparatorException, 
           FileNotFoundException 
  { 
    PipelinedComparator pc= new PipelinedComparator(); 
    // Set up the input filter 
    List inFilters = new ArrayList(); 
    inFilters.add(new File("normalize-space.xsl")); 
    inFilters.add(SimpleXMLFilter.class); 
    inFilters.add(WordByWordInfilter.class); 
    pc.setInputFilters(inFilters); 
    // Now setup the output filters 
    List outFilters= new ArrayList(); 
    outFilters.add(WordByWordOutfilter1.class); 
    outFilters.add(WordByWordOutfilter2.class); 
    outFilters.add(new File("deltaxml-tables.xsl")); 
    pc.setOutputFilters(outFilters); 
    // Now run the comparison 
    pc.compare(new File("test.xml"),  
               new File("new-test.xml"),  
               new File("diff.html"));    
  } 
}
    

In the above program (PCXMLFilterTest3) we have used three input filters and three output filters. In each case one of the filters is implemented as an XSL script and the other two are implemented as Java XML Filters. However, DeltaXML does not need to be concerned with the actual implementation; both approaches work as filters. Indeed both the Normalize Space and the Word-By-Word filters are available as XSL files or as Java XML Filters. You could try changing the type used and seeing the results (they should be exactly the same).

The actual result of running this program is that the diff.html file is generated which is presented below:

PIC-filtered-result.gif
Delta file generated from heterogeneous filters

Note that in general the Java XML filters are faster and have lower memory overheads and are thus often preferable to their XSL equivalents.