How to Preserve Processing Instructions and Comments
1 Introduction
XML documents often contain processing instructions (PIs) and comments in addition to elements and attributes. Although the parser reports these parts of the document, they are not processed by default during a comparison. This is not always an issue, but sometimes comments and processing instructions must be included in the result. To achieve this, they need to be converted into XML elements before the comparison and converted back again afterwards. This sample explains how to do this using filters provided in the DeltaXML product.
2 Converting Processing Instructions and Comments into XML
The first step in preserving PIs and comments is to convert them into XML elements. The following example shows an XML document that contains PIs and comments.
Example 1: an XML file containing PIs and comments (input1.xml in the sample directory)
<!-- document comment outside of the root element -->
<?pi_target pre-root processing instruction ?>
<root>
  <!-- the following paragraph is a pangram -->
  <para>The quick brown fox jumps over the lazy dog.</para>
  <?pi_target processing instructions may be modified ?>
  <para>A quick movement of the enemy will jeopardize six gunboats.</para>
  <!-- comments may be deleted -->
  <para>A final paragraph</para>
</root>
<!-- comments can appear after the root element -->
<?pi_target so can processing instructions ?>
The DeltaXML Core product contains filters for PI and comment processing. The following example shows the same file after being processed by the pi2xml.xsl XSLT stylesheet. Notice that the PIs and comments that appeared outside of the root element have been moved inside it, wrapped in special container elements highlighting the fact.
N.B. The pi2xml.xsl input filter must not appear after a filter that does not process PIs and comments, otherwise it will not be able to convert them into elements. It is therefore advisable to place it as near to the start of the input filter chain as possible.
Example 2: the XML file after passing through the pi2xml input filter
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      xmlns:pi="http://www.deltaxml.com/ns/processing-instructions">
  <deltaxml:preceding-root-comments-and-pis>
    <deltaxml:comment deltaxml:word-by-word="false"> document comment outside of the root element </deltaxml:comment>
    <pi:pi_target deltaxml:word-by-word="false">pre-root processing instruction </pi:pi_target>
  </deltaxml:preceding-root-comments-and-pis>
  <deltaxml:comment deltaxml:word-by-word="false"> the following paragraph is a pangram </deltaxml:comment>
  <para>The quick brown fox jumps over the lazy dog.</para>
  <pi:pi_target deltaxml:word-by-word="false">processing instructions may be modified </pi:pi_target>
  <para>A quick movement of the enemy will jeopardize six gunboats.</para>
  <deltaxml:comment deltaxml:word-by-word="false"> comments may be deleted </deltaxml:comment>
  <para>A final paragraph</para>
  <deltaxml:following-root-comments-and-pis>
    <deltaxml:comment deltaxml:word-by-word="false"> comments can appear after the root element </deltaxml:comment>
    <pi:pi_target deltaxml:word-by-word="false">so can processing instructions </pi:pi_target>
  </deltaxml:following-root-comments-and-pis>
</root>
These elements can now be compared as part of the comparison and will appear in the delta file.
If you only wish to convert comments and not PIs (or vice versa), you can
make use of the filter parameters defined on the input filter. These are
'convert-pi' and 'convert-comments' and should be set
to either 'yes' or 'no'.
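The conversion described in this section can be illustrated outside the product. The following is a minimal Python sketch, not the pi2xml.xsl stylesheet itself: it uses the standard xml.dom.minidom module to reproduce the element layout shown in Example 2. The function names are hypothetical, and the two boolean arguments are stand-ins for the 'convert-pi' and 'convert-comments' parameters.

```python
from xml.dom import minidom

# Namespaces copied from Example 2.
DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
PI_NS = "http://www.deltaxml.com/ns/processing-instructions"


def to_element(doc, node, convert_pi, convert_comments):
    """Build the element that stands in for a comment or PI, or return None."""
    if node.nodeType == node.COMMENT_NODE and convert_comments:
        elem = doc.createElementNS(DELTA_NS, "deltaxml:comment")
    elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE and convert_pi:
        # The PI target becomes the local name of the element.
        elem = doc.createElementNS(PI_NS, "pi:" + node.target)
    else:
        return None
    elem.setAttributeNS(DELTA_NS, "deltaxml:word-by-word", "false")
    elem.appendChild(doc.createTextNode(node.data))
    return elem


def comments_and_pis_to_elements(xml_text, convert_pi=True, convert_comments=True):
    """Hypothetical helper: return a DOM in which comments and PIs are elements."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # minidom does not write namespace declarations by itself, so add them.
    root.setAttribute("xmlns:deltaxml", DELTA_NS)
    root.setAttribute("xmlns:pi", PI_NS)

    # 1. Comments and PIs inside the root are replaced in place.
    pending = [root]
    while pending:
        parent = pending.pop()
        for child in list(parent.childNodes):
            if child.nodeType == child.ELEMENT_NODE:
                pending.append(child)
            else:
                elem = to_element(doc, child, convert_pi, convert_comments)
                if elem is not None:
                    parent.replaceChild(elem, child)

    # 2. Comments and PIs outside the root move into container elements.
    preceding = doc.createElementNS(
        DELTA_NS, "deltaxml:preceding-root-comments-and-pis")
    following = doc.createElementNS(
        DELTA_NS, "deltaxml:following-root-comments-and-pis")
    container = preceding
    for node in list(doc.childNodes):
        if node is root:
            container = following
            continue
        elem = to_element(doc, node, convert_pi, convert_comments)
        if elem is not None:
            doc.removeChild(node)
            container.appendChild(elem)
    if preceding.hasChildNodes():
        root.insertBefore(preceding, root.firstChild)
    if following.hasChildNodes():
        root.appendChild(following)
    return doc


sample = "<!-- before --><root><?pi_target inside ?><para>text</para></root>"
print(comments_and_pis_to_elements(sample).documentElement.toxml())
```

Passing convert_pi=False leaves every PI untouched while still converting comments, which mirrors setting 'convert-pi' to 'no' on the real filter.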
3 Converting back after comparison
The following example shows the delta file produced after comparing input1.xml and input2.xml from the sample directory.
Example 3: a delta file showing changes to PIs and comments
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      xmlns:pi="http://www.deltaxml.com/ns/processing-instructions"
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
  <deltaxml:preceding-root-comments-and-pis deltaxml:deltaV2="A=B">
    <deltaxml:comment deltaxml:word-by-word="false"> document comment outside of the root element </deltaxml:comment>
    <pi:pi_target deltaxml:word-by-word="false">pre-root processing instruction </pi:pi_target>
  </deltaxml:preceding-root-comments-and-pis>
  <deltaxml:comment deltaxml:deltaV2="A!=B" deltaxml:word-by-word="false">
    <deltaxml:textGroup deltaxml:deltaV2="A!=B">
      <deltaxml:text deltaxml:deltaV2="A"> the following paragraph is a pangram </deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="B"> the following two paragraphs are pangrams </deltaxml:text>
    </deltaxml:textGroup>
  </deltaxml:comment>
  <para deltaxml:deltaV2="A=B">The quick brown fox jumps over the lazy dog.</para>
  <pi:pi_target deltaxml:deltaV2="A!=B" deltaxml:word-by-word="false">
    <deltaxml:textGroup deltaxml:deltaV2="A!=B">
      <deltaxml:text deltaxml:deltaV2="A">processing instructions may be modified </deltaxml:text>
      <deltaxml:text deltaxml:deltaV2="B">processing instructions may be changed </deltaxml:text>
    </deltaxml:textGroup>
  </pi:pi_target>
  <para deltaxml:deltaV2="A=B">A quick movement of the enemy will jeopardize six gunboats.</para>
  <deltaxml:comment deltaxml:deltaV2="A" deltaxml:word-by-word="false"> comments may be deleted </deltaxml:comment>
  <para deltaxml:deltaV2="A=B">A final paragraph</para>
  <deltaxml:following-root-comments-and-pis deltaxml:deltaV2="A=B">
    <deltaxml:comment deltaxml:word-by-word="false"> comments can appear after the root element </deltaxml:comment>
    <pi:pi_target deltaxml:word-by-word="false">so can processing instructions </pi:pi_target>
  </deltaxml:following-root-comments-and-pis>
</root>
Because PIs and comments are not represented as elements in the final document, they cannot carry a delta attribute, so there is no way to mark them as changed. When one has been modified, we therefore need to decide which version is going to be output. There are two ways of achieving this: using the parameters on the xml2pi.xsl output filter, or using the generic ignore-changes mechanism.
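To make the decision concrete, here is a minimal Python sketch of choosing one version of a modified comment from the delta in Example 3. It is an illustration only, not the xml2pi.xsl filter: the function name resolve_text and its version argument (taking 'old' for the A text and 'new' for the B text) are hypothetical, and the sketch deliberately leaves out comments and PIs that exist in only one of the two documents.

```python
import xml.etree.ElementTree as ET

DELTA_NS = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_V2 = "{%s}deltaV2" % DELTA_NS


def resolve_text(elem, version="new"):
    """Return the 'old' (A) or 'new' (B) text of a converted comment or PI."""
    if version not in ("old", "new"):
        raise ValueError("version must be 'old' or 'new'")
    group = elem.find("{%s}textGroup" % DELTA_NS)
    if group is None:
        # Unchanged content: the text sits directly in the element.
        return elem.text or ""
    wanted = "A" if version == "old" else "B"
    for text in group.findall("{%s}text" % DELTA_NS):
        if text.get(DELTA_V2) == wanted:
            return text.text or ""
    return ""  # the wanted version carries no text


# The modified comment from Example 3, written without extra whitespace.
MODIFIED_COMMENT = (
    '<deltaxml:comment'
    ' xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"'
    ' deltaxml:deltaV2="A!=B" deltaxml:word-by-word="false">'
    '<deltaxml:textGroup deltaxml:deltaV2="A!=B">'
    '<deltaxml:text deltaxml:deltaV2="A">'
    ' the following paragraph is a pangram </deltaxml:text>'
    '<deltaxml:text deltaxml:deltaV2="B">'
    ' the following two paragraphs are pangrams </deltaxml:text>'
    '</deltaxml:textGroup></deltaxml:comment>'
)

comment = ET.fromstring(MODIFIED_COMMENT)
print("<!--" + resolve_text(comment, "new") + "-->")
# prints: <!-- the following two paragraphs are pangrams -->
```

Whichever version is chosen, the other one is discarded, which is why the result can no longer record that the comment changed.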
3.1 Using output filter parameters
The xml2pi.xsl output filter has two parameters,
'pi-modified-version' and 'comment-modified-version'.
Each takes the value 'new' or 'old' and selects which version of a modified
PI or comment is output. The downside of this mechanism is that if the only
changes to a document are in its PIs and comments, the delta file produced by
this output filter is invalid: it still states that there are changes although
none remain visible. See the following example delta (the
result of processing the previous example delta with a value of 'new' for both
parameters).
Example 4: A delta file with a root delta value of A!=B but with no changes
<!-- document comment outside of the root element -->
<?pi_target pre-root processing instruction ?>
<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
      deltaxml:deltaV2="A!=B"
      deltaxml:version="2.0"
      deltaxml:content-type="full-context">
  <!-- the following two paragraphs are pangrams -->
  <para deltaxml:deltaV2="A=B">The quick brown fox jumps over the lazy dog.</para>
  <?pi_target processing instructions may be changed ?>
  <para deltaxml:deltaV2="A=B">A quick movement of the enemy will jeopardize six gunboats.</para>
  <para deltaxml:deltaV2="A=B">A final paragraph</para>
</root>
<!-- comments can appear after the root element -->
<?pi_target so can processing instructions ?>
3.2 Using ignore changes
A better way of handling changed PIs and comments is to use the generic ignore-changes mechanism. It post-processes the resulting delta so that the delta values are adjusted to show only the remaining changes. See "How to ignore changes" for more details.
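The inconsistency that this adjustment removes can be detected mechanically. The sketch below is not the ignore-changes implementation; it only flags elements that claim A!=B although none of their child elements records a change, which is the situation in Example 4. It assumes that every remaining change is visible as a child element whose deltaV2 value is something other than A=B, and the function name stale_change_markers is hypothetical.

```python
import xml.etree.ElementTree as ET

DELTA_V2 = "{http://www.deltaxml.com/ns/well-formed-delta-v1}deltaV2"


def stale_change_markers(root):
    """List elements marked A!=B whose child elements are all A=B."""
    stale = []
    for elem in root.iter():
        if elem.get(DELTA_V2) != "A!=B":
            continue
        # The default parser drops comments and PIs, so only elements remain.
        if all(child.get(DELTA_V2, "A=B") == "A=B" for child in elem):
            stale.append(elem)
    return stale


# Example 4: the root says A!=B, yet every paragraph is A=B.
EXAMPLE_4 = (
    '<!-- document comment outside of the root element -->'
    '<?pi_target pre-root processing instruction ?>'
    '<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"'
    ' deltaxml:deltaV2="A!=B" deltaxml:version="2.0"'
    ' deltaxml:content-type="full-context">'
    '<!-- the following two paragraphs are pangrams -->'
    '<para deltaxml:deltaV2="A=B">The quick brown fox jumps over the lazy dog.</para>'
    '<?pi_target processing instructions may be changed ?>'
    '<para deltaxml:deltaV2="A=B">A quick movement of the enemy will jeopardize six gunboats.</para>'
    '<para deltaxml:deltaV2="A=B">A final paragraph</para>'
    '</root>'
)

print([elem.tag for elem in stale_change_markers(ET.fromstring(EXAMPLE_4))])
# prints: ['root']
```

A delta that has been through the ignore-changes mechanism should, under the same assumption, produce an empty list here.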
4 Running the sample
If you have Ant installed, use the build script provided to run the sample.
Type the following command to run the pipeline and produce the output
file result.xml.
ant run
If you don't have Ant installed, you can run the sample from a command line by issuing the following command from the sample directory (ensuring that you use the correct slashes for your operating system).
java -jar ../../command.jar compare preserve input1.xml input2.xml result.xml
To clean up the sample directory, run the following command in Ant.
ant clean
