Guide to writing Java XML Filters for DeltaXML
Table of Contents
- Preface
- 1. Introduction
- 2. Java XML Filters
- 2.1. The XMLFilter interface
- 2.2. The XMLFilter interface methods
- 2.3. XMLFilter methods to override
- 2.4. Working with attributes
- 3. Implementing Java XML Filters
- 3.1. A simple filter
- 3.2. A sample ContentHandler
- 3.3. Setting up a processing chain
- 3.4. Running the processing chain
- 4. Using Java XML Filters with DeltaXML
- 5. Java XML Filter writing guidelines
- 6. Combining Java XML Filters and XSL Stylesheet filters
List of Examples
- 3.1. A Simple XMLFilter class
- 3.2. The TestHandler content handler
- 3.3. Setting up the processing pipeline
- 3.4. The test.xml data file
- 4.1. Using SimpleXMLFilter with DeltaXML
- 4.2. The new-test.xml file
- 4.3. Initial Delta File for test.xml and new-test.xml
- 4.4. Multiple filters for DeltaXML
- 4.5. A Revised delta file
- 5.1. Removing elements
Preface
Abstract
This tutorial focuses on writing Java XML filters to pre-process and post-process data within a DeltaXML pipeline. It presents the SAX XMLFilter interface and shows how it can be used within Java XML processing programs. It then explores how Java XML filters can be used with DeltaXML and presents some guidelines for writing such filters.
Chapter 1. Introduction
In this tutorial we will step through the facilities that the SAX API for XML processing provides for creating Java XML Filter objects. These are objects that can filter (or process) data within a pipeline of filters. This is a standard API, provided as part of the Java environment since Java 2 SDK 1.4.0, and can be used in a wide range of XML processing applications. Here we focus on its use with the DeltaXML toolset, in particular as part of a DeltaXML comparison pipeline.
In the remainder of this tutorial we will first introduce the concept of Java XML filters, examine how a generic Java XML Filter can be written, explore how such filters can work with DeltaXML and present some guidelines on writing Java XML Filters. We conclude by illustrating how to mix Java and XSL based filters within a DeltaXML comparison pipeline.
Chapter 2. Java XML Filters
2.1. The XMLFilter interface
The XMLFilter interface is part of the SAX API for processing
XML documents, although it is ignored by many Java XML tutorials and
overlooked by many Java developers. XMLFilter is a sub-interface
of the XMLReader interface; as such it is very like an
XMLReader, except that it obtains its events from another XML reader
rather than from a primary source such as an XML document, file or database. This
makes it a key component of a SAX/JAXP pipeline: it can
sit within a pipeline, receiving XML data from one XML processing element
and, if required, passing its data on to the next.
Thus XML filters can modify a stream of (XML) events as they pass on to the final application. Indeed, this is the primary construct in SAX's event-driven, serial-access, pipelining architecture.
Assuming you have a distribution of SAX, or a version of Java containing the
XML APIs, look at the included classes; the one you want is
org.xml.sax.XMLFilter. You should also ensure that you have the SAX
helper classes, found in the org.xml.sax.helpers package. In that
package, you will want to focus on the
org.xml.sax.helpers.XMLFilterImpl class.
Together, the XMLFilter interface and its XMLFilterImpl
implementation provide powerful filtering for any SAX-based application,
and give a major architectural advantage when using DeltaXML itself.
2.2. The XMLFilter interface methods
If you examine XMLFilter, you'll find that it extends the
org.xml.sax.XMLReader interface and adds two new methods:
- public void setParent(XMLReader parent);
  This method allows the application to link the filter to a parent reader (which may be another filter). The argument may not be null.
- public XMLReader getParent();
  This method allows the application to query the parent reader (which may be another filter). It is generally a bad idea to perform any operations on the parent reader directly: they should all pass through this filter.
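Because setParent is what links one stage to the next, a chain of filters is simply a sequence of setParent calls, with parsing started on the last filter. The minimal sketch below assumes two instances of a hypothetical RenameFilter class (not part of SAX or DeltaXML) and shows that the second filter only ever sees the output of the first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {

    /** Hypothetical filter that renames elements called 'from' to 'to'. */
    static class RenameFilter extends XMLFilterImpl {
        private final String from;
        private final String to;

        RenameFilter(String from, String to) {
            this.from = from;
            this.to = to;
        }

        private String map(String name) {
            return name.equals(from) ? to : name;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, map(localName), map(qName), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, map(localName), map(qName));
        }
    }

    /** Parses xml through parser -> first -> second; returns the names the handler sees. */
    public static List<String> run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        RenameFilter first = new RenameFilter("a", "b");
        RenameFilter second = new RenameFilter("b", "c");
        first.setParent(parser);   // first reads events from the parser
        second.setParent(first);   // second reads events from the first filter

        final List<String> names = new ArrayList<>();
        second.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                names.add(qName);
            }
        });
        // Parsing is started on the last filter, which asks its parent to parse.
        second.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<root><a/><b/></root>"));
    }
}
```

Parsing `<root><a/><b/></root>` reports root, c, c to the handler: a is renamed to b by the first filter, and then every b is renamed to c by the second.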
This probably doesn't look like much. The real work happens in the
ContentHandler callbacks, such as startElement() and
endElement(), which XMLFilterImpl implements alongside the inherited
XMLReader methods such as parse(). In each of these callbacks, you can operate upon
the input XML data before an application gets to it. In the case of DeltaXML
this is extended so that it can be done both before and after DeltaXML performs
the comparison.
To put this in perspective, note that in SAX the application receives each
event (the start of an element, a run of character data, and so on) as the parser reports it.
By inserting an XMLFilter into the processing chain between the parser and
the application, you get to modify data before the application
gets that data (and, for example, outputs it). Since a filter receives all of the SAX
callbacks that a ContentHandler does, you can work with the
elements, the attributes, the prefix mappings, and anything else that SAX can
work with.
2.3. XMLFilter methods to override
The following methods can be overridden to perform custom actions when the
associated parser events occur. Always be sure to call super.method(...)
if the event is to be passed on to the next stage of the pipeline (for example, a transformer).
- public void startDocument() throws SAXException
  This event occurs at the start of a document.
- public void endDocument() throws SAXException
  This event occurs at the end of a document.
- public void startPrefixMapping(String prefix, String uri) throws SAXException
  This event occurs when a namespace declaration occurs. It will occur before the startElement event for the element where the namespace comes into scope.
- public void endPrefixMapping(String prefix) throws SAXException
  This event occurs when a declared prefix goes out of scope. It will occur after the endElement event for the element where the namespace goes out of scope.
- public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException
  This event occurs when a start tag is encountered.
- public void endElement(String uri, String localName, String qName) throws SAXException
  This event occurs when an end tag is encountered. An empty element such as <a/> produces both a startElement and an endElement event.
- public void characters(char[] ch, int start, int length) throws SAXException
  This event occurs when character data (text) is encountered. This includes inter-element whitespace, except where a DTD causes it to be treated as ignorable whitespace. A parser may report a single text node in several consecutive characters events.
- public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
  This event occurs when a DTD declares that an element contains only child elements, and whitespace is encountered between those children.
- public void processingInstruction(String target, String data) throws SAXException
  This event occurs when a processing instruction is encountered.
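To see what not calling super does to the stream, here is a minimal sketch of a hypothetical StripPIFilter that overrides processingInstruction and deliberately does not delegate, so processing instructions never reach the next stage. To make the result visible, it feeds the filter's output into a JAXP identity TransformerHandler, which serializes the events back to XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical example filter: drops every processing instruction.
public class StripPIFilter extends XMLFilterImpl {

    @Override
    public void processingInstruction(String target, String data) {
        // Deliberately no call to super.processingInstruction(...):
        // the event stops here and never reaches the next stage.
    }

    /** Runs xml through this filter into an identity transformer; returns the XML text. */
    public static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        StripPIFilter filter = new StripPIFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());

        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th = stf.newTransformerHandler(); // identity transform
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        filter.setContentHandler(th);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><?keep-out data?><p>text</p></doc>"));
    }
}
```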
Note: The atts parameter of startElement() is declared as the read-only
org.xml.sax.Attributes interface. The SAX helper class
org.xml.sax.helpers.AttributesImpl provides a modifiable implementation of that interface.
2.4. Working with attributes
As mentioned above, element attributes are accessed through the
Attributes object atts in the
startElement method. This object can be used as it is if only
read access to the attributes is required. However, to edit
attributes, the contents of the Attributes object must be copied into an
AttributesImpl object. This cannot be done with a cast (the object the
parser passes is not guaranteed to be an AttributesImpl); instead, either pass the Attributes object to the
constructor of AttributesImpl, for example:
AttributesImpl attImpl = new AttributesImpl(atts);
or pass it to the setAttributes method on
AttributesImpl, for example:
attImpl.setAttributes(atts);
Once the attributes have been copied into an
AttributesImpl object, you can use methods such as
addAttribute, setAttribute, setValue or
removeAttribute to edit the list, and then pass the edited object on to super.startElement().
To determine the presence of a certain attribute within an
Attributes object, the getIndex method can be used.
The attribute can be referenced by either its qName or by both
uri and localName, for example:
int index = atts.getIndex("foo:bar");
or
int index = atts.getIndex("http://foobar.com", "bar");
The returned int is either the index of the specified attribute,
or -1 if it is not present. Similarly, the getValue
method (which also accepts either a qName or a uri and localName) can be used. It returns either a String representing the
value of the specified attribute, or null if it is not present.
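Putting these calls together, the following sketch (the class name AttributeEditFilter and the attribute names internal and checked are hypothetical) copies atts into an AttributesImpl, removes the internal attribute if present, adds a checked attribute, and hands the edited list to super.startElement(). "CDATA" is the type SAX reports for attributes that have no DTD declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical example filter that edits the attribute list of every element.
public class AttributeEditFilter extends XMLFilterImpl {

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Copy the read-only Attributes into a modifiable AttributesImpl.
        AttributesImpl attImpl = new AttributesImpl(atts);

        // Remove the illustrative "internal" attribute if present (-1 means absent).
        int index = attImpl.getIndex("internal");
        if (index != -1) {
            attImpl.removeAttribute(index);
        }
        // Add a "checked" attribute unless the element already has one.
        if (attImpl.getIndex("checked") == -1) {
            attImpl.addAttribute("", "checked", "checked", "CDATA", "yes");
        }
        super.startElement(uri, localName, qName, attImpl);
    }

    /** Runs xml through this filter and returns the serialized result. */
    public static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        AttributeEditFilter filter = new AttributeEditFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());

        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th = stf.newTransformerHandler();
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        filter.setContentHandler(th);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<person internal=\"x\" name=\"J\"/>"));
    }
}
```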
Chapter 3. Implementing Java XML Filters
In this section we will look at how you can write your own Java XML Filter classes. To do this we will create a very simple application that we will then build on and use with DeltaXML in the next section of the tutorial.
3.1. A simple filter
Let's look at a concrete example of an XMLFilter, so that you
can start to get an idea of what must be implemented to create your own filter.
The following listing shows a very simple SAX filter that changes all
postalcode elements into postcode elements. This
effectively pre-processes the input document to modify one element, while
allowing all other elements to pass through unchanged.
Example 3.1. A Simple XMLFilter class
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;
public class SimpleXMLFilter extends XMLFilterImpl {
public void startElement(String uri,
String localName,
String qName,
Attributes atts) throws SAXException {
// Map postalcode to postcode
if ((localName.equals("postalcode")) ||
(qName.equals("postalcode"))) {
System.out.println("SXF: " + qName);
qName = "postcode";
localName = "postcode";
}
// Delegate on to inherited behaviour
super.startElement(uri, localName, qName, atts);
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
// Map postalcode to postcode
if ((localName.equals("postalcode")) ||
(qName.equals("postalcode"))) {
System.out.println("SXF: " + qName);
qName = "postcode";
localName = "postcode";
}
super.endElement(uri, localName, qName);
}
}
To be a Java XML filter a class must either implement the
XMLFilter interface or extend the XMLFilterImpl
class. In our case we are extending the XMLFilterImpl class from
the org.xml.sax.helpers package, as this means that we only need to
implement those methods that will actually do something (all other methods
required by the XMLFilter interface are inherited
from the XMLFilterImpl class). That is, XMLFilterImpl
provides a default version of all the required methods, allowing us to implement
(override) only those methods that have customized behavior. This keeps our code
cleaner, and requires less work on our part.
We are now free to implement only the methods startElement and
endElement. Be careful to make sure that the method signatures are
the same as those in the ContentHandler interface; otherwise you will be
overloading the methods rather than overriding them, and your
code will never be called. Adding the @Override annotation (Java 5 and later) makes the compiler report such a mismatch.
Our startElement and endElement methods change the
localName and the qName (qualified name) of the
element to postcode if the element postalcode is
found. Otherwise they just pass the data through unaltered.
You may wonder why we set both values. This has to do with the rules regarding
SAX processing and the state of the following two SAX features, which can be
set through the SAXParserFactory:
http://xml.org/sax/features/namespaces and
http://xml.org/sax/features/namespace-prefixes.
Essentially, these rules say that:
- the Namespace URI and local name are required when the namespaces feature is true (the SAX2 default for an XMLReader), and are optional when it is false (if one is specified, both must be);
- the qualified name is required when the namespace-prefixes feature is true, and is optional when it is false (the default).
Note that a JAXP SAXParserFactory is not namespace-aware by default, so a reader obtained from it has the namespaces feature set to false unless setNamespaceAware(true) is called. To handle all of these situations we set both parameters to the new element name. This is also why we test both parameters.
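The effect of the namespaces feature is easy to observe. This sketch (the class NameReportingDemo is hypothetical) parses the same document with a JAXP factory that is and is not namespace-aware, and records what the first startElement call receives. With the parser bundled in the JDK, the non-namespace-aware run typically reports an empty localName, which is why SimpleXMLFilter tests qName as well; other parsers may differ, since SAX only makes the local name optional in that mode.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NameReportingDemo {

    /** Returns "localName|qName" as reported for the first element of xml. */
    public static String firstElementNames(String xml, boolean namespaceAware)
            throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // setNamespaceAware(true) switches the .../features/namespaces feature on
        spf.setNamespaceAware(namespaceAware);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        final StringBuilder result = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            private boolean seen = false;

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                if (!seen) {
                    result.append(localName).append('|').append(qName);
                    seen = true;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<postalcode>WC1X 8TX</postalcode>";
        System.out.println("namespace-aware:     " + firstElementNames(xml, true));
        System.out.println("not namespace-aware: " + firstElementNames(xml, false));
    }
}
```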
The key thing to remember here is that the filter receives the XML data
before it is passed onto the next thing in the XML processing pipeline. Thus,
the output of this filter becomes the input of the next thing in the pipeline.
In turn the next thing in the pipeline views the data it receives as its input
(i.e. it never sees the original data). Thus if we change the element
postalcode to postcode, then the next processor in the
pipeline only sees the element postcode.
The key to understanding how this is achieved in a simple application with a
single filter and a content handler is that your XMLFilter
receives the data before the ContentHandler. You can process
that data if you want to, and if you want to pass it on to the content handler you can
do so by calling the parent class's version of the method (the one defined in the
XMLFilterImpl class), which will do just that. In fact, all of the
default methods in XMLFilterImpl simply pass the event on. To prevent the
content handler from seeing an event at all, simply don't delegate to the superclass
method.
However, a word of warning if you use a filter to remove some elements from
the data passed to the next element in the pipeline: it is all too easy to
corrupt the data being sent on. For example, consider the case where you don't
delegate in the startElement() method for certain data, but forget
to do the same in endElement(). The result would be that some
elements would never be reported as starting, but would be reported to the
content handler as ending. This would cause, in the best case, program errors, and in the
worst case, data loss or corruption in your application.
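One way to avoid this mistake is to let a single counter drive both methods. The sketch below (the class RemoveSubtreeFilter is hypothetical) drops a named element together with everything inside it: startElement increments a depth counter when it enters a removed element or is already inside one, and endElement decrements it, so start and end events are always suppressed in matching pairs. It assumes a namespace-aware parser (so localName is populated) and does not handle namespace declarations made on the removed elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: removes named elements and their entire content.
public class RemoveSubtreeFilter extends XMLFilterImpl {

    private final String target; // local name of the elements to drop
    private int skipDepth = 0;   // greater than 0 while inside a dropped element

    public RemoveSubtreeFilter(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (skipDepth > 0 || localName.equals(target)) {
            skipDepth++; // suppressed start tag: its end tag must be suppressed too
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (skipDepth > 0) {
            skipDepth--; // matches a start tag suppressed above
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.ignorableWhitespace(ch, start, length);
    }

    @Override
    public void processingInstruction(String piTarget, String data) throws SAXException {
        if (skipDepth == 0) super.processingInstruction(piTarget, data);
    }

    /** Runs xml through the filter and returns the serialized result. */
    public static String run(String xml, String target) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        RemoveSubtreeFilter filter = new RemoveSubtreeFilter(target);
        filter.setParent(spf.newSAXParser().getXMLReader());

        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th = stf.newTransformerHandler();
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        filter.setContentHandler(th);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><remove><b>x</b></remove><c>y</c></a>", "remove"));
    }
}
```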
Note
In the SimpleXMLFilter class presented earlier we ignored
namespaces. It should be noted that the code as it stands will strip the prefix
from a namespaced postalcode element as well as
renaming it to postcode, e.g. both
<deltaxml:postalcode/> and <postalcode/>
would be output as <postcode/>. We can overcome this
limitation by modifying the behaviour of the code; for example, we could change
the startElement method thus:
public void startElement(String uri,
String localName,
String qName,
Attributes atts) throws SAXException {
// Map postalcode to postcode
if (localName.equals("postalcode")) {
System.out.println("SXF: " + localName);
// Now check to see if a name space is being used
int index= qName.indexOf(":");
if (index != -1) {
// If it is, keep the prefix (including the colon) and change only the local part
qName= qName.substring(0, index + 1) + "postcode";
} else {
qName = "postcode";
}
localName = "postcode";
}
// Delegate on to inherited behaviour
super.startElement(uri, localName, qName, atts);
}
This version of the startElement method will retain any
namespace prefix provided (endElement would need the same change). However, it is slightly more complex, so for the rest of this tutorial we
will keep the simpler version of SimpleXMLFilter.
3.2. A sample ContentHandler
In order to give our SimpleXMLFilter class some meaning, let us look at a simple XML content handler. In this simple example, the following class represents our application. It is this class which will receive the output generated by our filter.
Example 3.2. The TestHandler content handler
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class TestHandler extends DefaultHandler {
public void startElement(String uri,
String localName,
String qName,
Attributes atts)
throws SAXException {
println("<" + qName + ">");
}
public void endElement(String uri,
String localName,
String qName)
throws SAXException {
println("</" + qName + ">");
}
private void println(String s) throws SAXException {
System.out.println("   " + s);
}
}
All that this class does is to print out the XML tags it receives, as an echo
of its input. This will allow us to see what data it receives. It is a very
simple class that merely extends the
org.xml.sax.helpers.DefaultHandler class which in turn implements
the org.xml.sax.ContentHandler interface. This allows it to be used
within the XML processing chain.
3.3. Setting up a processing chain
Once you have your filter written and compiled, you need to create a pipeline for processing your XML. Data should flow from the input document, through the filter, to the content handler. You may even have multiple filters, stacked upon each other. As long as the input comes first and your content handler (with its application-specific callbacks) comes last, things work fine. However, the order of your filters may matter, so pay close attention to it. The following listing shows how to set up your program to use filters in the general case (we will see how much easier this is with DeltaXML in the next chapter).
Example 3.3. Setting up the processing pipeline
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
public class XMLFilterTest {
public static void main(String[] args)
throws IOException,
ParserConfigurationException,
SAXException {
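SAXParserFactory spf = SAXParserFactory.newInstance();
// Report namespace URIs and local names (JAXP factories are not namespace-aware by default)
spf.setNamespaceAware(true);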
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
SimpleXMLFilter filter = new SimpleXMLFilter();
filter.setParent(reader);
TestHandler handler = new TestHandler();
filter.setContentHandler(handler);
filter.setErrorHandler(handler);
InputSource inputSource = new InputSource("test.xml");
filter.parse(inputSource);
}
}
Notice that because one or more filters must sit between the input source and
the content handler, all the operations that you would normally invoke on the reader are
invoked on the filter instead. The filter asks its parent reader to parse the input and passes
any data that flows through it on to the content handler, as you saw in SimpleXMLFilter. If this is not 100%
clear to you, do not worry: DeltaXML makes it very easy for you to create a
pipeline without needing to get down to this level.
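If you want the filtered result as XML rather than as printed tags, one standard JAXP technique is to wrap the filter in a SAXSource and run an identity Transformer over it; the transformer then takes the place of TestHandler. The sketch below inlines a compact, hypothetical RenameFilter equivalent to SimpleXMLFilter so that it compiles on its own, and reads from a string rather than from test.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterToXml {

    /** Hypothetical compact stand-in for SimpleXMLFilter: postalcode -> postcode. */
    static class RenameFilter extends XMLFilterImpl {
        private static String map(String name) {
            return name.equals("postalcode") ? "postcode" : name;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, map(localName), map(qName), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, map(localName), map(qName));
        }
    }

    /** Parses xml through the filter and serializes the filtered events as XML text. */
    public static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLFilterImpl filter = new RenameFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());

        // SAXSource presents the filter to the transformer as an XMLReader; the
        // transformer installs itself as the filter's ContentHandler and calls parse().
        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<AddressList><person><postalcode>WC1X 8TX</postalcode>"
                + "</person></AddressList>"));
    }
}
```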
3.4. Running the processing chain
We are now in a position to execute our filtered pipeline. We will run this
on the content of the XML file test.xml. The content of this file
is presented below:
Example 3.4. The test.xml data file
<AddressList>
  <person>
    <name>John Smith</name>
    <street>10 Grays Inn Road</street>
    <city>London</city>
    <postalcode>WC1X 8TX</postalcode>
  </person>
</AddressList>
This is a very simple XML file containing some basic address-list data.
Note that the element holding the PCDATA WC1X 8TX is called
postalcode.
We run the main method of the XMLFilterTest
class to process this XML file using our new filter. The result of running this
class is presented below:
   <AddressList>
   <person>
   <name>
   </name>
   <street>
   </street>
   <city>
   </city>
SXF: postalcode
   <postcode>
SXF: postalcode
   </postcode>
   </person>
   </AddressList>
In the above output, the printouts from the SimpleXMLFilter are
prefixed by SXF: and the printouts from the
TestHandler are prefixed by three spaces. Thus you can
easily see the output generated by each. From this, it is clear that the
SimpleXMLFilter passes all XML through unchanged except for the element
postalcode, which it converts into the element
postcode. The TestHandler knows nothing about the
element postalcode as it never receives it: the data has
been filtered before being received by the TestHandler.
Chapter 4. Using Java XML Filters with DeltaXML
4.1. DeltaXML and Java Filters
Writing filters to work with DeltaXML is exactly like writing any Java XML filter. Thus we can use the filter we just created to process two XML files before they are compared by DeltaXML.
This might be useful if the Schema or DTD used for the XML file has changed,
such that the element is now called postcode in the new or modified
file. This might happen due to changing requirements imposed by, for
example, third parties.
As incorporating filters into the processing pipeline, both
before and after the execution of DeltaXML, is so important,
DeltaXML provides a convenience class which greatly simplifies the incorporation
of filters (whether Java based or XSL based). This class is the
PipelinedComparator class in the com.deltaxml.core
package. This class allows a list of input and output filters to be specified
and then a comparison performed. If the filters are implemented in Java, as
SimpleXMLFilter is, then the class object is passed to the
PipelinedComparator. A class object can be obtained in a number of ways, for
example, by using the class Class and its method forName:
Class.forName("com.foo.EchoFilter");
or by using the .class extension on the name:
com.foo.EchoFilter.class
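This tutorial does not show how PipelinedComparator turns these Class objects into filters. One plausible approach, sketched here purely as an assumption and not as DeltaXML's actual code, is to instantiate each class through its public no-argument constructor and link the instances with setParent; under that assumption, each filter class would need such a constructor. The helper name buildChain is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainBuilder {

    /**
     * Hypothetical helper: creates one instance of each class via its public
     * no-argument constructor and links them so that each filter's parent is
     * the stage before it. Returns the last filter, on which parse() is called.
     */
    public static XMLFilter buildChain(XMLReader source, Class<?>... filterClasses)
            throws ReflectiveOperationException {
        if (filterClasses.length == 0) {
            throw new IllegalArgumentException("at least one filter class is required");
        }
        XMLReader parent = source;
        XMLFilter last = null;
        for (Class<?> c : filterClasses) {
            // asSubclass fails fast with ClassCastException if c is not an XMLFilter
            XMLFilter filter = c.asSubclass(XMLFilter.class)
                                .getDeclaredConstructor()
                                .newInstance();
            filter.setParent(parent);
            parent = filter;
            last = filter;
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        Class<?> byName = Class.forName("org.xml.sax.helpers.XMLFilterImpl"); // forName style
        Class<?> byLiteral = XMLFilterImpl.class;                            // .class style
        XMLFilter chain = buildChain(reader, byName, byLiteral);
        XMLFilter first = (XMLFilter) chain.getParent();
        System.out.println(first.getParent() == reader); // the parser is the first stage
    }
}
```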
This is illustrated in the sample programs presented in this chapter.
4.2. Using a Java XML Filter
In this section we will take the generic Java XML Filter written in the last chapter and use it as an input filter for DeltaXML. Thus all data to be compared will first pass through the SimpleXMLFilter before being passed to DeltaXML.
The following program illustrates the use of the
PipelinedComparator with the SimpleXMLFilter
presented earlier.
Example 4.1. Using SimpleXMLFilter with DeltaXML
import java.io.File;
import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
public class PCXMLFilterTest {
public static void main(String[] args)
throws PipelinedComparatorException
{
PipelinedComparator pc= new PipelinedComparator();
// Set up the input filter
pc.setInputFilters(new Class[]{SimpleXMLFilter.class});
// Now run the DeltaXML comparison
pc.compare(new File("test.xml"),
new File("new-test.xml"),
new File("changes.xml"));
}
}
Notice that we create a new instance of the PipelinedComparator
class and set an array of input filters on it (you can also set an array
of output filters). Once this is done we call the compare method with
the two files to compare and the output file for the delta.
We will reuse the test.xml file from the previous example, but
create a new-test.xml file with the following content:
Example 4.2. The new-test.xml file
<AddressList>
  <person>
    <name>John Smith</name>
    <street>12 Grays Inn Road</street>
    <city>London</city>
    <postcode>WC1X 8TX</postcode>
  </person>
</AddressList>
As you can see from this file, this XML document has an element
postcode. It also has a change to the PCDATA of the street
element. Although the element <postcode> is not present in the
original file, the input filter converts <postalcode> into
the element <postcode>, allowing the two files to be
compared appropriately. Running the PCXMLFilterTest generates the
following delta file:
Example 4.3. Initial Delta File for test.xml and new-test.xml
<?xml version="1.0" encoding="UTF-8"?>
<AddressList
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:delta="WFmodify">
<person deltaxml:delta="WFmodify">
<name deltaxml:delta="unchanged"/>
<street deltaxml:delta="WFmodify">
<deltaxml:PCDATAmodify>
<deltaxml:PCDATAold>
10 Grays Inn Road
</deltaxml:PCDATAold>
<deltaxml:PCDATAnew>
12 Grays Inn Road
</deltaxml:PCDATAnew>
</deltaxml:PCDATAmodify>
</street>
<city deltaxml:delta="unchanged"/>
<postcode deltaxml:delta="unchanged"/>
</person>
</AddressList>
This illustrates that only the PCDATA of the <street> element
has changed; the comparison never sees the change from
<postalcode> to <postcode>.
4.3. Using Multiple Java XML Filters
We can improve on the example presented in the last section by using the
Word-by-Word filters provided as examples with the DeltaXML
distribution. These can be found under the src directory and are in the
package com.deltaxml.pipe.filters. There are three filters: an
input filter called WordByWordInfilter, and two output filters
called WordByWordOutfilter1 and WordByWordOutfilter2.
Output filters can be set on the PipelinedComparator class using
the setOutputFilters method. The modified program that exploits
these filters looks like this:
Example 4.4. Multiple filters for DeltaXML
import java.io.File;
import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter;
import com.deltaxml.pipe.filters.WordByWordOutfilter1;
import com.deltaxml.pipe.filters.WordByWordOutfilter2;
public class PCXMLFilterTest2 {
public static void main(String[] args)
throws PipelinedComparatorException
{
PipelinedComparator pc= new PipelinedComparator();
// Set up the input filters
pc.setInputFilters(new Class[]{SimpleXMLFilter.class,
WordByWordInfilter.class});
// Set up the output filters
pc.setOutputFilters(new Class[] {WordByWordOutfilter1.class,
WordByWordOutfilter2.class});
// Now run the DeltaXML comparison
pc.compare(new File("test.xml"),
new File("new-test.xml"),
new File("diff.xml"));
}
}
As you can see from this PCXMLFilterTest2 class, it differs
from the PCXMLFilterTest class only in listing two input filters and
two output filters (and in writing the delta to diff.xml). All these filters are implemented as Java classes that
implement the XMLFilter interface.
The result of running this program on the two XML files is presented below:
Example 4.5. A Revised delta file
<?xml version="1.0" encoding="UTF-8"?>
<AddressList
xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:delta="WFmodify">
<person deltaxml:delta="WFmodify">
<name deltaxml:delta="unchanged"/>
<street deltaxml:delta="WFmodify">
<deltaxml:PCDATAmodify>
<deltaxml:PCDATAold>
10
</deltaxml:PCDATAold>
<deltaxml:PCDATAnew>
12
</deltaxml:PCDATAnew>
</deltaxml:PCDATAmodify>
Grays Inn Road
</street>
<city deltaxml:delta="unchanged"/>
<postcode deltaxml:delta="unchanged"/>
</person>
</AddressList>
This has now refined the delta so that we can see that the only value that has changed is the number 10, which has become 12.
We could go on and define further filters to meet whatever pre- and post-processing requirements we may have. However, these examples illustrate the way in which you can implement your own filters and then use them as either pre- or post-processors with DeltaXML.
If you wish to get under the hood of the PipelinedComparator
and create your own pipelines and call the DeltaXML compare API directly, then
you will need to look at the Building Pipelines tutorial.
Chapter 5. Java XML Filter writing guidelines
The guidelines for writing a Java XML Filter presented in this section are not necessarily specific to DeltaXML, and are really guidelines for all Java XML Filters. However, they have particular resonance with DeltaXML and are presented here to help you create your own XML filters.
5.1. Storing data
Sometimes it may be necessary to store some of the data that has been parsed. There are various reasons for this: you may need to store details of a start element because its output depends on something that comes later in the document, or you may need to store the current value of an attribute because it changes the way that child elements are output. If too much data is stored and a large amount of memory is taken up, then the advantage of using a serial-access method of processing is lost; it may be more sensible to use another method such as DOM or XSL stylesheets.
5.1.1. Attribute values
If it is necessary to know the value of an attribute of a parent element when
processing another element, it can be useful to use a Stack to
store this. Suppose an XML document stores text which must be output either in
upper-case or lower-case depending on an attribute value in the text node's
parent element.
<root caps="false">
<elem1 caps="true">
This should be output in upper-case
<elem1child caps="false">
But this should be lower-case
</elem1child>
<elem1child2>
No value was specified, should be upper-case because of parent
</elem1child2>
This should be upper-case
</elem1>
But this is in the root element and so should be lower-case
</root>
The characters to output will be accessible from the characters method.
However, when this method is called, we have no access to the
Attributes object that was passed by any of the
startElement methods. The value of the caps attribute
should be pushed onto a Stack at the beginning of every
startElement method and popped at the end of every
endElement method. If the attribute is not set in a particular
element (as in elem1child2 above), the value currently at the top
of the Stack should be pushed on again. This would cause problems
if the root element didn't have a value specified, as peeking at the
Stack would throw an EmptyStackException. In this
case, a default value should be pushed onto the Stack. Example code
for this can be seen below:
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.*;
import java.util.Stack;
import java.util.EmptyStackException;
public class RandomClass extends XMLFilterImpl {
Stack<String> capsVal = new Stack<String>();
String defaultValue = "false";
public void startElement(String uri,
String localName,
String qName,
Attributes atts)
throws SAXException {
int index = atts.getIndex("caps");
if(index != -1) {
capsVal.push(atts.getValue(index));
} else {
String value;
try {
value = capsVal.peek();
} catch(EmptyStackException e) {
value = defaultValue;
}
capsVal.push(value);
}
super.startElement(uri, localName, qName, atts);
}
public void endElement(String uri,
String localName,
String qName)
throws SAXException {
capsVal.pop();
super.endElement(uri, localName, qName);
}
public void characters(char[] ch, int start, int length)
throws SAXException{
String output = new String(ch, start, length);
if (capsVal.peek().equals("true")) {
output = output.toUpperCase();
} else {
output = output.toLowerCase();
}
super.characters(output.toCharArray(), 0, output.length());
}
}
5.1.2. Elements
If it is necessary to store only one element at a time, this can be achieved
using Strings for the uri, localName and
qName and an AttributesImpl object for the
atts. If many elements need to be stored, it may become necessary
to write a custom Element class which could then be pushed onto a
Stack.
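As an illustration, the sketch below defines a hypothetical StoredElement class and keeps a stack of the currently open elements, which it uses to record the path to each piece of text. It copies the attributes with new AttributesImpl(atts), which the AttributesImpl documentation describes as taking a persistent snapshot, since the Attributes object passed to startElement is not guaranteed to remain valid after the call returns. It uses ArrayDeque, the modern replacement for java.util.Stack; either would do.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that remembers every open element.
public class ElementStackFilter extends XMLFilterImpl {

    /** Hypothetical holder for one start tag. */
    static final class StoredElement {
        final String uri;
        final String localName;
        final String qName;
        final AttributesImpl atts;

        StoredElement(String uri, String localName, String qName, Attributes atts) {
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
            this.atts = new AttributesImpl(atts); // persistent copy of the attributes
        }
    }

    private final Deque<StoredElement> open = new ArrayDeque<>();
    private final List<String> textPaths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        open.push(new StoredElement(uri, localName, qName, atts));
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        open.pop();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            textPaths.add(currentPath() + "=" + text);
        }
        super.characters(ch, start, length);
    }

    /** Builds "/outer/.../inner" from the stack, root element first. */
    private String currentPath() {
        StringBuilder path = new StringBuilder();
        Iterator<StoredElement> it = open.descendingIterator(); // oldest (root) first
        while (it.hasNext()) {
            path.append('/').append(it.next().qName);
        }
        return path.toString();
    }

    /** Parses xml and returns "path=text" for each non-blank text event. */
    public static List<String> run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        ElementStackFilter filter = new ElementStackFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        filter.parse(new InputSource(new StringReader(xml))); // no downstream handler needed
        return filter.textPaths;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<AddressList><person><city>London</city></person></AddressList>"));
    }
}
```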
5.1.3. Attributes
Again with attributes, if it is only necessary to store one attribute at a
time, this can be achieved with Strings for the uri,
localName, qName and value. If it is
necessary to store many attributes, a custom class may need to be written, which
again could be stored in a Vector, ArrayList or
similar data structure.
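A matching sketch for attributes: a hypothetical StoredAttribute value class holding the four strings, collected into an ArrayList by a filter that walks each Attributes list by index using getLength(), getURI(), getLocalName(), getQName() and getValue(). Only the strings are kept, never the Attributes object itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that records every attribute it sees.
public class AttributeCollectorFilter extends XMLFilterImpl {

    /** Hypothetical value class holding the details of one attribute. */
    public static final class StoredAttribute {
        final String uri;
        final String localName;
        final String qName;
        final String value;

        StoredAttribute(String uri, String localName, String qName, String value) {
            this.uri = uri;
            this.localName = localName;
            this.qName = qName;
            this.value = value;
        }

        @Override
        public String toString() {
            return qName + "=" + value;
        }
    }

    private final List<StoredAttribute> collected = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Copy each attribute's strings now; the Attributes object is not kept.
        for (int i = 0; i < atts.getLength(); i++) {
            collected.add(new StoredAttribute(atts.getURI(i), atts.getLocalName(i),
                                              atts.getQName(i), atts.getValue(i)));
        }
        super.startElement(uri, localName, qName, atts);
    }

    public List<StoredAttribute> getCollected() {
        return collected;
    }

    /** Parses xml and returns the collected attributes as "qName=value" strings. */
    public static List<String> run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        AttributeCollectorFilter filter = new AttributeCollectorFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        filter.parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        for (StoredAttribute a : filter.getCollected()) {
            result.add(a.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a x=\"1\"><b y=\"2\"/></a>"));
    }
}
```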
5.1.4. Namespace declarations
If it is necessary to declare new namespaces within an XML document, it is
useful to know whether that namespace has already been declared earlier in the
document, since redeclaring a prefix that is already in scope with the same URI is redundant.
This can be tracked using a flag which is set in the
startPrefixMapping and endPrefixMapping methods.
boolean foobarPrefixDefined = false;
public void startPrefixMapping(String prefix,
String uri)
throws SAXException {
if(prefix.equals("foobar")) {
foobarPrefixDefined = true;
}
super.startPrefixMapping(prefix, uri);
}
public void endPrefixMapping(String prefix) throws SAXException {
if(prefix.equals("foobar")) {
foobarPrefixDefined = false;
}
super.endPrefixMapping(prefix);
}
The boolean value can then be tested before declaring the namespace to see if it is necessary or not.
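A single boolean goes wrong if the same prefix is declared again on a nested element: the inner endPrefixMapping call resets the flag even though the outer declaration is still in scope. A minimal sketch of one fix, counting open declarations per prefix, is shown below. The class name PrefixScopeFilter is hypothetical, the foobar prefix is carried over from the snippet above, and SAX also offers org.xml.sax.helpers.NamespaceSupport for fuller bookkeeping.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter that tracks which prefixes are currently in scope.
public class PrefixScopeFilter extends XMLFilterImpl {

    // Number of currently open declarations for each prefix.
    private final Map<String, Integer> openDeclarations = new HashMap<>();
    private final List<String> report = new ArrayList<>();

    public boolean isInScope(String prefix) {
        return openDeclarations.containsKey(prefix);
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        openDeclarations.merge(prefix, 1, Integer::sum);
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        // Decrement; drop the entry when the last declaration goes out of scope.
        openDeclarations.computeIfPresent(prefix, (p, n) -> n == 1 ? null : n - 1);
        super.endPrefixMapping(prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        report.add(qName + "=" + isInScope("foobar"));
        super.startElement(uri, localName, qName, atts);
    }

    /** Returns "element=inScope" for each element, for the prefix foobar. */
    public static List<String> run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // prefix-mapping events need namespace processing
        PrefixScopeFilter filter = new PrefixScopeFilter();
        filter.setParent(spf.newSAXParser().getXMLReader());
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.report;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a xmlns:foobar=\"urn:x\"><b xmlns:foobar=\"urn:x\"/><c/></a>"));
    }
}
```

With a single boolean, element c in the example above would wrongly be reported as out of scope; with the counter it is reported as in scope.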
5.2. Removing elements of a certain type
The following example filter removes the start and end tags of any element with the local name
"remove" (this relies on a namespace-aware parser, which reports local names). Everything else is copied to
the output document, including any contents of the remove
elements.
Example 5.1. Removing elements
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;
public class ExampleFilter extends XMLFilterImpl {
public void startElement(String uri,
String localName,
String qName,
Attributes atts)
throws SAXException {
if(!localName.equals("remove"))
super.startElement(uri, localName, qName, atts);
}
public void endElement(String uri,
String localName,
String qName)
throws SAXException {
if(!localName.equals("remove"))
super.endElement(uri, localName, qName);
}
}
5.3. Comments
XMLFilterImpl does not implement the org.xml.sax.ext.LexicalHandler interface, so it never
receives comment events, and comments are therefore lost when a document passes through the
filter. If it is necessary to include the original comments in the output
document, the
http://xml.org/sax/properties/lexical-handler property
must be set on the filter's parent XMLReader with a valid LexicalHandler object as the
value. The easiest way to do this is to pass the TransformerHandler
as the value, as it implements the LexicalHandler interface. The comments then go
directly from the parser to the TransformerHandler, bypassing the filter.
TransformerHandler th = saxFactory.newTransformerHandler();
filter.getParent().setProperty("http://xml.org/sax/properties/lexical-handler",
th);
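If the filter itself needs to see or change the comments, it can implement LexicalHandler and sit between the parser and the next stage. Below is a minimal sketch with the hypothetical class name CommentPassingFilter. When parsing starts, it registers itself as the parser's lexical handler. Any lexical handler set on the filter is kept as the next stage instead of being passed to the parser. Comments are forwarded unchanged here; a real filter would alter them in comment().

```java
import java.io.IOException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: routes lexical events (including comments) through itself
class CommentPassingFilter extends XMLFilterImpl implements LexicalHandler {
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Next stage for lexical events, e.g. a TransformerHandler
    private LexicalHandler next;

    // Keep the downstream lexical handler instead of handing it to the parser
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (LEXICAL_HANDLER.equals(name)) {
            next = (LexicalHandler) value;
        } else {
            super.setProperty(name, value);
        }
    }

    // Register this filter as the parser's lexical handler, then parse as usual
    public void parse(InputSource input) throws SAXException, IOException {
        getParent().setProperty(LEXICAL_HANDLER, this);
        super.parse(input);
    }

    // A real filter could inspect or rewrite the comment text here
    public void comment(char[] ch, int start, int length) throws SAXException {
        if (next != null) next.comment(ch, start, length);
    }

    // The remaining lexical events are forwarded unchanged
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
        if (next != null) next.startDTD(name, publicId, systemId);
    }
    public void endDTD() throws SAXException {
        if (next != null) next.endDTD();
    }
    public void startEntity(String name) throws SAXException {
        if (next != null) next.startEntity(name);
    }
    public void endEntity(String name) throws SAXException {
        if (next != null) next.endEntity(name);
    }
    public void startCDATA() throws SAXException {
        if (next != null) next.startCDATA();
    }
    public void endCDATA() throws SAXException {
        if (next != null) next.endCDATA();
    }
}
```

With this filter the TransformerHandler is set on the filter itself, for example filter.setProperty("http://xml.org/sax/properties/lexical-handler", th), rather than on filter.getParent().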
5.4. Parsers
5.4.1. Parser Differences
Parser implementations sometimes differ in how they report events, so it is important to know how the particular parser you are using behaves.
For example, a string of characters ending in a newline character (e.g. abcdefg\n) can produce different numbers of characters events depending on the parser. If Crimson (the default for JDK 1.4.x) is used, only one event occurs. Xerces-J, however, creates two consecutive events: one for the letters (abcdefg in the example above) and one for the newline character at the end. The SAX specification allows a parser to split contiguous character data into any number of characters events. Filters should therefore not assume that a piece of text arrives in a single event, whichever parser is used.
The documentation for each parser should describe these details.
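One way to make a filter independent of how a parser splits text is to collect consecutive characters events and pass them on as a single event. The collected text is sent just before the next element, processing instruction or end of document. Here is a minimal sketch, with the hypothetical class name CharacterBufferingFilter:

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: merges adjacent characters() events into one event
class CharacterBufferingFilter extends XMLFilterImpl {
    private final StringBuilder buffer = new StringBuilder();

    // Collect text instead of passing it on straight away
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    // Pass any collected text downstream as a single characters() event
    private void flushText() throws SAXException {
        if (buffer.length() > 0) {
            char[] text = buffer.toString().toCharArray();
            buffer.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flushText();
        super.startElement(uri, localName, qName, atts);
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flushText();
        super.endElement(uri, localName, qName);
    }

    public void ignorableWhitespace(char[] ch, int start, int length)
            throws SAXException {
        flushText();
        super.ignorableWhitespace(ch, start, length);
    }

    public void processingInstruction(String target, String data)
            throws SAXException {
        flushText();
        super.processingInstruction(target, data);
    }

    public void endDocument() throws SAXException {
        flushText();
        super.endDocument();
    }
}
```

Text is held back until the next structural event. Anything that reaches the downstream handler by another route can therefore end up out of order relative to that text. One example is comments sent straight from the parser to a lexical handler.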
5.4.2. Parser Features
There are two features that can be set on the SAXParserFactory that change the information received in parser events.
- http://xml.org/sax/features/namespaces (defaults to true)
  - Setting this feature to true (or not setting it at all, as the SAX default is true) causes element and attribute names to be reported with a uri, localName and qName. For example, the element <foo:bar>, where foo is declared as xmlns:foo="http://foobar.com", would be reported with a uri of "http://foobar.com", a localName of "bar" and a qName of "foo:bar". Note that a JAXP SAXParserFactory is not namespace-aware by default. With JAXP, this feature is only switched on by calling setNamespaceAware(true) on the factory.
  - When the feature is set to false, only the qName is reported. This causes problems when using Saxon as a transformer, as it throws an exception if it receives any element or attribute with no localName specified.
- http://xml.org/sax/features/namespace-prefixes (defaults to false)
  - Setting this feature to true causes namespace declarations to be included in the attribute list passed to the startElement event. They have no uri or localName, a qName of xmlns:prefix and a value of the namespace URI. The missing localName can cause problems when using Saxon as a transformer, because it throws an exception if it encounters any element or attribute without a localName. Therefore, if this feature is set to true and Saxon is being used, the namespace declaration attributes should be removed from the attribute list or be given a localName.
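The removal recommended above can be sketched as a filter with the hypothetical class name XmlnsAttributeStrippingFilter. It rebuilds the attribute list with org.xml.sax.helpers.AttributesImpl, leaving out any attribute whose qName is xmlns or starts with xmlns:. The helper newPrefixReportingReader (also hypothetical) shows how the two features can be set through JAXP. Call setNamespaceAware(true) on the factory for namespaces, and XMLReader.setFeature for namespace-prefixes.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: drops namespace declaration attributes (xmlns, xmlns:*)
class XmlnsAttributeStrippingFilter extends XMLFilterImpl {

    // Hypothetical helper: a parser with both namespaces and namespace-prefixes on
    static XMLReader newPrefixReportingReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // turns on http://xml.org/sax/features/namespaces
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        return reader;
    }

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        AttributesImpl kept = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            boolean isDeclaration = name.equals("xmlns") || name.startsWith("xmlns:");
            if (!isDeclaration) {
                kept.addAttribute(atts.getURI(i), atts.getLocalName(i), name,
                                  atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, localName, qName, kept);
    }
}
```

Namespace information is not lost by this. It is still reported through the startPrefixMapping events and through the uri of each element and attribute.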
Chapter 6. Combining Java XML Filters and XSL Stylesheet filters
In this tutorial we have illustrated how to use Java XML Filters with DeltaXML, and also how to use XSL filters with DeltaXML. The two can in fact be mixed. You can provide a list containing both Java classes and XSL stylesheets, in any order, for both the input and the output filters.
As an example, consider the following program:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import com.deltaxml.core.PipelinedComparator;
import com.deltaxml.core.PipelinedComparatorException;
import com.deltaxml.pipe.filters.WordByWordInfilter;
import com.deltaxml.pipe.filters.WordByWordOutfilter1;
import com.deltaxml.pipe.filters.WordByWordOutfilter2;
public class PCXMLFilterTest3 {
    public static void main(String[] args)
            throws PipelinedComparatorException,
                   FileNotFoundException {
        PipelinedComparator pc = new PipelinedComparator();
        // Set up the input filters: an XSL stylesheet (given as a File)
        // followed by two Java XML Filters (given as Class objects)
        List inFilters = new ArrayList();
        inFilters.add(new File("normalize-space.xsl"));
        inFilters.add(SimpleXMLFilter.class);
        inFilters.add(WordByWordInfilter.class);
        pc.setInputFilters(inFilters);
        // Now set up the output filters: two Java XML Filters
        // followed by an XSL stylesheet
        List outFilters = new ArrayList();
        outFilters.add(WordByWordOutfilter1.class);
        outFilters.add(WordByWordOutfilter2.class);
        outFilters.add(new File("deltaxml-tables.xsl"));
        pc.setOutputFilters(outFilters);
        // Now run the comparison
        pc.compare(new File("test.xml"),
                   new File("new-test.xml"),
                   new File("diff.html"));
    }
}
In the above program (PCXMLFilterTest3) we have used three input filters and three output filters. In each case one of the filters is implemented as an XSL stylesheet and the other two are implemented as Java XML Filters. DeltaXML does not need to be concerned with how a filter is implemented, because both kinds work as filters in the pipeline. Both the Normalize Space and the Word-By-Word filters are available either as XSL files or as Java XML Filters. You could try swapping one implementation for the other and comparing the results (they should be exactly the same).
Running this program generates the diff.html file, which is presented below:
[Figure: Delta file generated from heterogeneous filters]
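In general, Java XML filters are faster and have lower memory overheads than their XSL equivalents. A Java filter processes SAX events as they stream past, whereas an XSLT processor typically builds a tree of the whole document in memory before transforming it. Java XML filters are therefore often preferable.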
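The same kind of mixing is possible outside DeltaXML with plain JAXP, which may help when trying out a filter before adding it to a comparison pipeline. The sketch below assumes the default TransformerFactory is also a SAXTransformerFactory, as the JDK's built-in processor is, and checks for this before casting. SAXTransformerFactory.newXMLFilter compiles a stylesheet into an XMLFilter, which is then chained after a Java XMLFilterImpl. The class names, the renaming filter and the inline stylesheet are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.helpers.XMLFilterImpl;

class MixedFilterChain {
    // Hypothetical stylesheet: identity copy that drops <note> elements and their content
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='note'/>"
        + "</xsl:stylesheet>";

    // Hypothetical Java filter: renames <old> elements (no namespace) to <new>
    static class RenameFilter extends XMLFilterImpl {
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (localName.equals("old")) {
                super.startElement(uri, "new", "new", atts);
            } else {
                super.startElement(uri, localName, qName, atts);
            }
        }

        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (localName.equals("old")) {
                super.endElement(uri, "new", "new");
            } else {
                super.endElement(uri, localName, qName);
            }
        }
    }

    // Runs parser -> Java filter -> XSL filter -> serializer on an XML string
    static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processors need localName values
        XMLFilter javaFilter = new RenameFilter();
        javaFilter.setParent(spf.newSAXParser().getXMLReader());

        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("TransformerFactory cannot create XMLFilters");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;
        XMLFilter xslFilter = stf.newXMLFilter(new StreamSource(new StringReader(XSL)));
        xslFilter.setParent(javaFilter);

        // An identity Transformer serializes the events leaving the last filter
        Transformer serializer = stf.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new SAXSource(xslFilter, new InputSource(new StringReader(xml))),
                             new StreamResult(out));
        return out.toString();
    }
}
```

To reorder the stages, change which filter is passed to each setParent call.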