com.deltaxml.odf.comp
Class OdtComparator

java.lang.Object
  extended by com.deltaxml.odf.comp.OdtComparator

public class OdtComparator
extends Object

Compares two ODT (ISO/IEC 26300 v1.1) files and generates a result which describes how they differ.

NB: the ODT Comparator only supports ODF packages, not single-file ODF XML documents.


Constructor Summary
OdtComparator()
          Constructs an OdtComparator.
 
Method Summary
 void compare(File file1, File file2, File result)
          Compares two ODT files and produces an ODT result which describes the differences between the two input files.
 void compare(File file1, File file2, File result, ClassLoader cl)
          Compares two ODT files and produces an ODT result which describes the differences between the two input files.
 void compare(File file1, File file2, OutputStream result, ClassLoader cl)
          Compares two ODT files and produces an ODT result which describes the differences between the two input files.
static String getVersion()
          Returns the version of the OdtComparator currently in use.
 void setAddedBackground(Color color)
          Sets the background color of the text marked as added in the result document.
 void setAddedColor(Color color)
          Sets the color of the text marked as added in the result document.
 void setAddedDecoration(FontDecoration decoration)
          Sets the font decoration of the text marked as added in the result document.
 void setAddedEmptyParaColor(Color color)
          Sets the Color to use for the background color of an added empty paragraph.
 void setAddedImageBorder(Color color)
          Sets the border color of images marked as added in the result document.
 void setAddedObjectBackground(Color color)
          Sets the background color of OLE objects marked as added in the result document.
 void setAddedStyle(FontStyle style)
          Sets the font style of the text marked as added in the result document.
 void setAddedWeight(FontWeight weight)
          Sets the font weight of the text marked as added in the result document.
 void setConvertListsToParagraphs(boolean value)
          Sets whether or not to convert list items into paragraphs.
 void setDeletedBackground(Color color)
          Sets the background color of the text marked as deleted in the result document.
 void setDeletedColor(Color color)
          Sets the color of the text marked as deleted in the result document.
 void setDeletedDecoration(FontDecoration decoration)
          Sets the font decoration of the text marked as deleted in the result document.
 void setDeletedEmptyParaColor(Color color)
          Sets the Color to use for the background color of a deleted empty paragraph.
 void setDeletedImageBorder(Color color)
          Sets the border color of images marked as deleted in the result document.
 void setDeletedObjectBackground(Color color)
          Sets the background color of OLE objects marked as deleted in the result document.
 void setDeletedStyle(FontStyle style)
          Sets the font style of the text marked as deleted in the result document.
 void setDeletedWeight(FontWeight weight)
          Sets the font weight of the text marked as deleted in the result document.
 void setHighlightEmptyChangedParas(boolean value)
          Sets whether or not to add a background color to added or deleted empty paragraphs.
 void setIgnoreCarriageReturnChanges(boolean value)
          Sets whether or not to ignore carriage return changes in the document.
 void setIgnoreImageChanges(boolean value)
          Sets whether or not to ignore image changes.
 void setIgnoreTableCellCarriageReturns(boolean value)
          Sets whether or not to ignore carriage return changes within table cells.
 void setKeyedComparison(boolean value)
          Sets whether or not to use keys to align paragraphs in the comparison.
 void setKeyTablesByName(boolean value)
          Sets whether or not to use table names as keys for table alignment.
 void setMaximumOrphanedWordCount(int wordCount)
          Sets the maximum number of consecutive words to consider as an orphaned sequence.
 void setMinimumCommonText(int percentage)
          Sets the value of minimumCommonText, used to determine how to represent certain changed items.
 void setModifiedImageBorder(Color color)
          Sets the border color of images marked as modified in the result document.
 void setModifiedObjectBackground(Color color)
          Sets the background color of OLE objects marked as modified in the result document.
 void setOrphanedPercentage(int percentage)
          Sets the orphaned percentage parameter that is used in determining whether a sequence of words is orphaned or not.
 void setRemoveEmptyParasBeforePageBreak(boolean value)
          Sets whether to remove empty paragraphs that precede a page break in the result.
 void setUncomparableImageBorder(Color color)
          Sets the border color of images that cannot be compared.
 void setUncomparableObjectBackground(Color color)
          Sets the background color of OLE objects that cannot be compared.
 void setWhitespaceOnlyChangeOutput(WhitespaceOnlyChangeOutput value)
          Sets the output rule for changes that only involve whitespace characters.
 void suppressTripleTableOutput(boolean value)
          Sets whether or not to suppress the triple table output.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

OdtComparator

public OdtComparator()
              throws OdtComparatorConfigurationException
Constructs an OdtComparator. If necessary, configure the object by calling the various set methods with the desired parameters before calling one of the compare methods.

Throws:
OdtComparatorConfigurationException - when there is a problem configuring the OdtComparator. This could be caused by the wrong version of Saxon being available on the classpath.

Method Detail

getVersion

public static String getVersion()

Returns the version of the OdtComparator currently in use.

Returns:
a String representing the version number of this version of the OdtComparator

setKeyedComparison

public void setKeyedComparison(boolean value)

Sets whether or not to use keys to align paragraphs in the comparison.

If the input documents contain keys in hidden text fields, they can be used to help match paragraphs in the comparison. However, in certain circumstances this may produce a less than optimal result, in which case ignoring the keys during the comparison may be desirable. Calling this method with a value of false causes key values to be ignored.

Default: true

Parameters:
value - whether or not to use keys in the comparison
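The benefit of keys can be illustrated with a small sketch. This is not DeltaXML's matching algorithm: it assumes each paragraph carries an optional key string (the hypothetical `Para` class, with `null` meaning "no key") and simply pairs paragraphs from the two inputs that share a key, regardless of where they appear.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of key-based paragraph alignment (not the OdtComparator's internals). */
final class KeyedAlignment {

    /** A paragraph with an optional key, e.g. taken from a hidden text field; key may be null. */
    static final class Para {
        final String key;
        final String text;

        Para(String key, String text) {
            this.key = key;
            this.text = text;
        }
    }

    /**
     * Pairs paragraphs from the two inputs that share the same key.
     * Returns {textFromFile1, textFromFile2} pairs in file-1 order; unkeyed or
     * unmatched paragraphs are left for ordinary (unkeyed) alignment.
     */
    static List<String[]> alignByKey(List<Para> file1, List<Para> file2) {
        Map<String, Para> byKey = new HashMap<>();
        for (Para p : file2) {
            if (p.key != null) {
                byKey.putIfAbsent(p.key, p); // first occurrence wins if a key repeats
            }
        }
        List<String[]> pairs = new ArrayList<>();
        for (Para p : file1) {
            Para match = (p.key == null) ? null : byKey.get(p.key);
            if (match != null) {
                pairs.add(new String[] {p.text, match.text});
            }
        }
        return pairs;
    }
}
```

Because matching is by key rather than by position, a paragraph that moved in file 2 can still be paired with its original; the flip side, as noted above, is that keys that do not reflect the intended correspondence can lead to a worse alignment.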

setIgnoreCarriageReturnChanges

public void setIgnoreCarriageReturnChanges(boolean value)

Sets whether or not to ignore carriage return changes in the document.

Typically, adding or deleting a carriage return (not a line break) can cause a large textual change in a document. This is due to the paragraph structure of the document's underlying XML; the result is not incorrect, but it can be seen as less than optimal. Passing a value of true to this method causes these kinds of changes to be ignored in the result where possible.

Default: true

Parameters:
value - whether or not to ignore carriage return changes
See Also:
setIgnoreTableCellCarriageReturns(boolean)
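Why a single carriage return can look like a large change is easiest to see by comparing at two granularities. The sketch below is illustrative only and models paragraphs as plain strings (an assumption; real ODT paragraphs are XML elements): splitting one paragraph into two makes the paragraph lists differ, even though the word sequence across paragraph boundaries is unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of why a paragraph split is a large change at paragraph level. */
final class CarriageReturnDemo {

    /** Flattens paragraphs into one word sequence, discarding paragraph boundaries. */
    static List<String> words(List<String> paragraphs) {
        List<String> out = new ArrayList<>();
        for (String p : paragraphs) {
            for (String w : p.trim().split("\\s+")) {
                if (!w.isEmpty()) { // an empty paragraph splits to a single "" token
                    out.add(w);
                }
            }
        }
        return out;
    }

    /** True if the paragraph lists differ only in where the paragraph breaks fall. */
    static boolean onlyCarriageReturnsDiffer(List<String> a, List<String> b) {
        return !a.equals(b) && words(a).equals(words(b));
    }
}
```

A paragraph-level comparison would report the original paragraph as deleted and two new paragraphs as added; ignoring carriage return changes corresponds to looking past those boundaries where possible.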

setIgnoreTableCellCarriageReturns

public void setIgnoreTableCellCarriageReturns(boolean value)

Sets whether or not to ignore carriage return changes within table cells.

Note: This parameter only has an effect if carriage returns are being ignored elsewhere in the document (see setIgnoreCarriageReturnChanges(boolean)).

Default: false

Parameters:
value - whether or not to ignore carriage return changes within table cells
See Also:
setIgnoreCarriageReturnChanges(boolean)

setDeletedColor

public void setDeletedColor(Color color)

Sets the color of the text marked as deleted in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: Color.RED

Parameters:
color - the Color whose RGB value will be used for deleted text
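What "only the RGB value is used" means can be shown with `java.awt.Color` alone. This is a minimal sketch, not the comparator's own code: `getRGB()` packs a color as `0xAARRGGBB`, so masking with `0xFFFFFF` drops the alpha byte, and two Colors that differ only in transparency reduce to the same value. The six-digit hex form matches the RRGGBB notation used for the defaults in this document.

```java
import java.awt.Color;

/** Illustrates reducing a java.awt.Color to its RGB part, ignoring alpha. */
final class RgbOnly {

    /** Returns the color as a six-digit RRGGBB hex string, dropping the alpha component. */
    static String toRgbHex(Color color) {
        // getRGB() returns 0xAARRGGBB; the mask keeps only RRGGBB.
        return String.format("%06X", color.getRGB() & 0xFFFFFF);
    }
}
```

For example, `Color.RED` and a semi-transparent `new Color(255, 0, 0, 64)` both give `FF0000`, and `new Color(0x00F000)` gives `00F000`.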

setDeletedWeight

public void setDeletedWeight(FontWeight weight)

Sets the font weight of the text marked as deleted in the result document.

Default: none, i.e. the deleted text takes its font weight from its style as defined in input file 1.

Parameters:
weight - the FontWeight to use for the deleted text

setDeletedStyle

public void setDeletedStyle(FontStyle style)

Sets the font style of the text marked as deleted in the result document.

Default: none, i.e. the deleted text takes its font style from its style as defined in input file 1.

Parameters:
style - the FontStyle to use for the deleted text

setDeletedDecoration

public void setDeletedDecoration(FontDecoration decoration)

Sets the font decoration of the text marked as deleted in the result document.

Default: FontDecoration.STRIKETHROUGH

Parameters:
decoration - the FontDecoration to use for deleted text

setDeletedBackground

public void setDeletedBackground(Color color)

Sets the background color of the text marked as deleted in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: none, i.e. the deleted text takes its background color from its style as defined in input file 1.

Parameters:
color - the Color whose RGB value will be used for the background to deleted text

setAddedColor

public void setAddedColor(Color color)

Sets the color of the text marked as added in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: none, i.e. the added text takes its color from its style as defined in input file 2.

Parameters:
color - the Color whose RGB value will be used for added text

setAddedWeight

public void setAddedWeight(FontWeight weight)

Sets the font weight of the text marked as added in the result document.

Default: none, i.e. the added text takes its font weight from its style as defined in input file 2.

Parameters:
weight - the FontWeight to use for the added text

setAddedStyle

public void setAddedStyle(FontStyle style)

Sets the font style of the text marked as added in the result document.

Default: none, i.e. the added text takes its font style from its style as defined in input file 2.

Parameters:
style - the FontStyle to use for the added text

setAddedDecoration

public void setAddedDecoration(FontDecoration decoration)

Sets the font decoration of the text marked as added in the result document.

Default: FontDecoration.UNDERLINE

Parameters:
decoration - the FontDecoration to use for added text

setAddedBackground

public void setAddedBackground(Color color)

Sets the background color of the text marked as added in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: none, i.e. the added text takes its background color from its style as defined in input file 2.

Parameters:
color - the Color whose RGB value will be used for the background to added text

setAddedImageBorder

public void setAddedImageBorder(Color color)

Sets the border color of images marked as added in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value 00F000

Parameters:
color - the Color whose RGB value will be used for the border of added images

setDeletedImageBorder

public void setDeletedImageBorder(Color color)

Sets the border color of images marked as deleted in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value F00000

Parameters:
color - the Color whose RGB value will be used for the border of deleted images

setModifiedImageBorder

public void setModifiedImageBorder(Color color)

Sets the border color of images marked as modified in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value 0000F0

Parameters:
color - the Color whose RGB value will be used for the border of modified images

setUncomparableImageBorder

public void setUncomparableImageBorder(Color color)

Sets the border color of images that cannot be compared.

An image may be marked as uncomparable if its file is missing (when the image is external to the ODT document) or if its format is one that the OdtComparator cannot process.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value FFFF00

Parameters:
color - the Color whose RGB value will be used for the border of uncomparable images

setAddedObjectBackground

public void setAddedObjectBackground(Color color)

Sets the background color of OLE objects marked as added in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value 80F080

Parameters:
color - the Color whose RGB value will be used for the background of added OLE objects

setDeletedObjectBackground

public void setDeletedObjectBackground(Color color)

Sets the background color of OLE objects marked as deleted in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value F08080

Parameters:
color - the Color whose RGB value will be used for the background of deleted OLE objects

setModifiedObjectBackground

public void setModifiedObjectBackground(Color color)

Sets the background color of OLE objects marked as modified in the result document.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value 90C0F0

Parameters:
color - the Color whose RGB value will be used for the background of modified OLE objects

setUncomparableObjectBackground

public void setUncomparableObjectBackground(Color color)

Sets the background color of OLE objects that cannot be compared.

Objects are marked as uncomparable when it is not possible to determine whether they are equal or not. This could be due to a missing file (if the object in the document is linked to an external file), or an object format that the OdtComparator cannot process.

Note: only the RGB value of the Color will be used; any alpha component will be ignored.

Default: the Color with RGB value FFFF00

Parameters:
color - the Color whose RGB value will be used for the background of uncomparable OLE objects

setIgnoreImageChanges

public void setIgnoreImageChanges(boolean value)

Sets whether or not to ignore image changes.

Changed images and OLE objects, whether the change is to the image/object itself or to its attributes (e.g. position or z-index), are shown with a colored frame (for images) or background (for OLE objects). If you wish these kinds of changes to be ignored in the result document, pass a value of true to this method. The result document will then contain the image or object present in input file 2 with no extra frame or background. Added and deleted images will still be marked in the result.

Default: false

Parameters:
value - whether or not to ignore image changes
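The interaction between this setting and the image border defaults can be summarized as a small decision table. This is a hypothetical sketch, not the comparator's implementation: `ImageChange` is an illustrative enum, the colors are the documented defaults written as RRGGBB strings, and uncomparable images are deliberately left out because this section does not say how they interact with the setting.

```java
/** Hypothetical decision table for image borders, using the documented default colors. */
final class ImageBorderRule {

    enum ImageChange { ADDED, DELETED, MODIFIED, UNCHANGED }

    /**
     * Returns the default border color (RRGGBB) for an image, or null for no border.
     * ignoreImageChanges mirrors setIgnoreImageChanges(boolean): it suppresses only the
     * MODIFIED marking, while added and deleted images are still marked.
     */
    static String defaultBorder(ImageChange change, boolean ignoreImageChanges) {
        switch (change) {
            case ADDED:
                return "00F000";
            case DELETED:
                return "F00000";
            case MODIFIED:
                return ignoreImageChanges ? null : "0000F0";
            default:
                return null; // unchanged images get no border
        }
    }
}
```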

setConvertListsToParagraphs

public void setConvertListsToParagraphs(boolean value)

Sets whether or not to convert list items into paragraphs.

Each resulting paragraph includes the item number or bullet and is indented as the list item would have been. This gives a better result when comparing text in lists and also highlights changes to list numbering.

Default: true

Parameters:
value - whether or not to convert lists to paragraphs
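Why converting list items to paragraphs exposes numbering changes can be sketched with plain strings. This is an illustration under simplifying assumptions (top-level numbered items only, no bullets, nesting or indentation); `ListFlattener` is a hypothetical helper, not part of the API. Once the number is part of the paragraph text, renumbering becomes an ordinary text difference.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of turning numbered list items into plain paragraphs. */
final class ListFlattener {

    /** Converts top-level numbered items into paragraphs whose text starts with the item number. */
    static List<String> toParagraphs(List<String> items, int startNumber) {
        List<String> paragraphs = new ArrayList<>();
        int n = startNumber;
        for (String item : items) {
            paragraphs.add(n + ". " + item); // the number becomes ordinary, comparable text
            n++;
        }
        return paragraphs;
    }
}
```

For example, inserting "Download" before "Install" turns the paragraph "1. Install" into "2. Install", so a text comparison reports the number change.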

setRemoveEmptyParasBeforePageBreak

public void setRemoveEmptyParasBeforePageBreak(boolean value)

Sets whether to remove empty paragraphs that precede a page break in the result.

If empty paragraphs remain before page breaks, they can sometimes cause the result to contain unwanted empty pages. Passing a value of true to this method causes them to be removed from the result, thus preventing the empty pages from appearing.

Default: false

Parameters:
value - whether or not to remove empty paragraphs that immediately precede a page break
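A minimal sketch of the clean-up, under assumptions that are not from this documentation: paragraphs are modelled as strings, an empty string is an empty paragraph, and a hypothetical `PAGE_BREAK` marker stands in for a page break (in ODT, breaks are normally paragraph formatting properties). The sketch also assumes that every consecutive empty paragraph directly before a break is removed, while empty paragraphs after the break are kept.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: drop empty paragraphs that immediately precede a page break. */
final class PageBreakCleanup {

    /** Marker standing in for a page break in this simplified model. */
    static final String PAGE_BREAK = "<page-break>";

    static List<String> removeEmptyParasBeforeBreaks(List<String> paragraphs) {
        List<String> out = new ArrayList<>();
        for (String p : paragraphs) {
            if (PAGE_BREAK.equals(p)) {
                // Remove every empty paragraph directly before this break.
                while (!out.isEmpty() && out.get(out.size() - 1).isEmpty()) {
                    out.remove(out.size() - 1);
                }
            }
            out.add(p);
        }
        return out;
    }
}
```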

setHighlightEmptyChangedParas

public void setHighlightEmptyChangedParas(boolean value)

Sets whether or not to add a background color to added or deleted empty paragraphs.

If an empty paragraph is added or deleted between input documents, it can be difficult to identify in the result document unless it is given a background color. However, if you wish for these paragraphs to have no background color, pass a value of false to this method.

The Color to use for highlighting the paragraphs can be changed using setAddedEmptyParaColor(Color) and setDeletedEmptyParaColor(Color).

Default: true

Parameters:
value - whether or not to highlight empty added or deleted paragraphs
See Also:
setAddedEmptyParaColor(Color), setDeletedEmptyParaColor(Color)

setDeletedEmptyParaColor

public void setDeletedEmptyParaColor(Color color)

Sets the Color to use for the background color of a deleted empty paragraph.

Note: This value will only be used if setHighlightEmptyChangedParas(boolean) is set to true.

Default: Color.RED

Parameters:
color - the Color to use as the background color for deleted empty paragraphs
See Also:
setHighlightEmptyChangedParas(boolean)

setAddedEmptyParaColor

public void setAddedEmptyParaColor(Color color)

Sets the Color to use for the background color of an added empty paragraph.

Note: This value will only be used if setHighlightEmptyChangedParas(boolean) is set to true.

Default: The Color that has the RGB value 00AE00

Parameters:
color - the Color to use as the background color for added empty paragraphs
See Also:
setHighlightEmptyChangedParas(boolean)

setMaximumOrphanedWordCount

public void setMaximumOrphanedWordCount(int wordCount)

Sets the maximum number of consecutive words to consider as an orphaned sequence.

Words within paragraphs are compared on a word-by-word basis. This means that some common words (for example, 'the', 'and' and 'a' in English) are likely to match up in places where they break up a sequence of added or deleted words. This can make the resulting paragraphs look very cluttered with added, deleted and unchanged words.

Words that remain unchanged in between added or deleted words are termed 'orphaned' words. It is possible to tidy up the appearance of cluttered paragraphs by changing the markup of these words so that they don't break added or deleted sequences. It should be noted that an unchanged word that breaks a deleted sequence will be turned into a deleted word but, for correctness, will also appear as an added word somewhere else in the paragraph.

The maximum orphaned word count parameter is used in conjunction with the orphaned percentage parameter to control this behaviour. The value passed to this method is used to determine the maximum number of orphaned words to merge into the surrounding added or deleted sequence.

Note: assigning a value of 0 to this parameter has the effect of turning off the behaviour described here, no matter what value of orphaned percentage may be set.

Default: 2

Parameters:
wordCount - the maximum number of consecutive words to consider as an orphaned sequence
Throws:
IllegalArgumentException - if a value less than zero is passed as a parameter
See Also:
setOrphanedPercentage(int)
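The word-count part of this behaviour can be sketched as finding candidate runs. This is a hypothetical illustration, not the comparator's code: each word is encoded as one character ('A' added, 'D' deleted, 'U' unchanged), and a candidate is a run of unchanged words with a changed word on both sides whose length does not exceed the maximum. The orphaned percentage test would then be applied to each candidate. As documented, a maximum of 0 disables the behaviour and negative values are rejected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of locating orphan candidates; the percentage test is applied separately. */
final class OrphanCandidates {

    /**
     * marks holds one character per word: 'A' added, 'D' deleted, 'U' unchanged.
     * Returns {start, length} for each run of unchanged words that has a changed word
     * on both sides and is no longer than maxWordCount. A maxWordCount of 0 returns nothing.
     */
    static List<int[]> find(String marks, int maxWordCount) {
        if (maxWordCount < 0) {
            throw new IllegalArgumentException("maxWordCount must not be negative");
        }
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < marks.length()) {
            if (marks.charAt(i) != 'U') {
                i++;
                continue;
            }
            int start = i;
            while (i < marks.length() && marks.charAt(i) == 'U') {
                i++;
            }
            int length = i - start;
            boolean changedBefore = start > 0;        // the word before a run is never 'U'
            boolean changedAfter = i < marks.length(); // the loop stopped on a changed word
            if (changedBefore && changedAfter && length <= maxWordCount) {
                runs.add(new int[] {start, length});
            }
        }
        return runs;
    }
}
```

For example, with the default maximum of 2, the sequence `DDUDDAUUUA` yields one candidate (the single unchanged word at index 2); the run of three unchanged words is too long.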

setOrphanedPercentage

public void setOrphanedPercentage(int percentage)

Sets the orphaned percentage parameter that is used in determining whether a sequence of words is orphaned or not.

The value of this parameter is used in the following calculation:
(unchanged-word-count / (preceding-changed-word-count + unchanged-word-count + following-changed-word-count)) * 100 <= orphanedPercentage
where unchanged-word-count must be less than or equal to the value of maximumOrphanedWordCount.

This means that, if a value of 20% is used, a single unchanged word must be surrounded by at least 4 added/deleted words (so that it makes up no more than 1 in 5 of the words counted) to be considered an orphaned word.

Note: assigning a value of 100% to this parameter effectively removes any minimum on the number of added/deleted words surrounding an orphaned word, although at least one added/deleted word is still required on either side for it to be considered an orphan.

Default: 20

Parameters:
percentage - the percentage of unchanged words relative to surrounding modified words
Throws:
IllegalArgumentException - if the parameter value is not in the range 0-100
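The inequality above can be encoded directly. This is a sketch of the documented rule rather than the comparator's source: `OrphanTest.isOrphaned` is a hypothetical helper that rearranges the percentage test into integer arithmetic to avoid rounding, and adds the conditions stated in the text (the run must not exceed maximumOrphanedWordCount, and at least one changed word is required on each side).

```java
/** Hypothetical sketch of the orphaned-percentage test described above. */
final class OrphanTest {

    /**
     * Returns true if a run of unchanged words counts as orphaned:
     * (unchanged / (preceding + unchanged + following)) * 100 <= orphanedPercentage,
     * with unchanged <= maxOrphanedWordCount and at least one changed word on each side.
     */
    static boolean isOrphaned(int preceding, int unchanged, int following,
                              int maxOrphanedWordCount, int orphanedPercentage) {
        if (orphanedPercentage < 0 || orphanedPercentage > 100) {
            throw new IllegalArgumentException("orphanedPercentage must be in the range 0-100");
        }
        if (unchanged < 1 || unchanged > maxOrphanedWordCount || preceding < 1 || following < 1) {
            return false;
        }
        int total = preceding + unchanged + following;
        // Integer form of the inequality: unchanged * 100 <= percentage * total.
        return unchanged * 100 <= orphanedPercentage * total;
    }
}
```

With the default of 20, a single unchanged word with two changed words on each side gives 1/5 = 20% and qualifies, whereas two changed words before and one after give 1/4 = 25% and do not.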

setMinimumCommonText

public void setMinimumCommonText(int percentage)

Sets the value of minimumCommonText, used to determine how to represent certain changed items.

This parameter affects the way modified paragraphs and table rows are displayed. If these items contain mostly modified (added/deleted) text with only a small amount that is unchanged, it is sometimes better to display the change as the deleted version and the added version of the entire paragraph or table row rather than show a single item with changes in place. This parameter controls this behaviour by specifying the minimum amount of common text (as a percentage) that should be present to avoid splitting the paragraph or table row into an added and a deleted version.

Note: a value of 0 has the effect of turning this behaviour off, whereas a value of 100 causes ALL paragraphs and table rows containing changes to be output as the added/deleted versions.

Default: 20

Parameters:
percentage - the percentage of 'unchanged' text required in a paragraph or table row to avoid splitting it
Throws:
IllegalArgumentException - if the parameter value is not in the range 0-100
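The decision this parameter controls can be sketched as a single comparison. This is a hypothetical helper, and measuring text in characters is an assumption of the sketch (the documentation does not say how the amount of common text is measured). It reproduces the documented extremes: 0 never splits, and 100 splits every item that contains a change.

```java
/** Hypothetical sketch of the minimumCommonText decision for a changed paragraph or table row. */
final class MinimumCommonText {

    /**
     * Returns true if a changed item should be output as separate deleted and added versions.
     * commonChars is the amount of unchanged text and totalChars the total text in the item
     * (measured in characters, which is an assumption of this sketch).
     */
    static boolean splitIntoDeletedAndAdded(int commonChars, int totalChars, int minimumCommonText) {
        if (minimumCommonText < 0 || minimumCommonText > 100) {
            throw new IllegalArgumentException("minimumCommonText must be in the range 0-100");
        }
        if (totalChars <= 0 || commonChars >= totalChars) {
            return false; // empty or unchanged items are never split
        }
        // Split when the common text is below the minimum: common / total * 100 < minimum.
        return commonChars * 100L < (long) minimumCommonText * totalChars;
    }
}
```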

suppressTripleTableOutput

public void suppressTripleTableOutput(boolean value)

Sets whether or not to suppress the triple table output. This output is generated when a table is changed structurally. Because the table showing modifications may not always be easy to read, the old and new versions of the table are also output.

Passing a value of true to this method will stop the triple table output from being generated and the result will include a single modified table.

Default: false

Parameters:
value - whether to suppress the triple table output or not

setKeyTablesByName

public void setKeyTablesByName(boolean value)

Sets whether or not to use table names as keys for table alignment.

Parameters:
value - whether to use table names as keys

setWhitespaceOnlyChangeOutput

public void setWhitespaceOnlyChangeOutput(WhitespaceOnlyChangeOutput value)

Sets the output rule for changes that only involve whitespace characters.

If you do not wish to see changes that only involve whitespace characters, passing a value of WhitespaceOnlyChangeOutput.OutputA or WhitespaceOnlyChangeOutput.OutputB to this method prevents them from being displayed. The two values determine which input's version of a modified whitespace string is used in the result. Whitespace strings that were deleted or added (as opposed to modified) are output without markup, as they appeared in input A (if deleted) or input B (if added).

Default: WhitespaceOnlyChangeOutput.ShowChanges

Parameters:
value - the whitespace output type to use for whitespace only changes
See Also:
WhitespaceOnlyChangeOutput
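The output rule can be sketched as follows. This is a hypothetical illustration, not the comparator's code: `WsMode` mirrors, but is not, the `WhitespaceOnlyChangeOutput` type, the `[-…-]{+…+}` markup notation is invented for the sketch, and it assumes that OutputA selects the input A version and OutputB the input B version of a modified string. It also assumes that the unmarked treatment of deleted and added whitespace applies only when changes are hidden.

```java
/** Hypothetical sketch of the whitespace-only output rule; WsMode mirrors, but is not, WhitespaceOnlyChangeOutput. */
final class WhitespaceRule {

    enum WsMode { SHOW_CHANGES, OUTPUT_A, OUTPUT_B }

    /**
     * Returns the text emitted for a whitespace-only change, where a is the version in input A
     * and b the version in input B (null when absent). Changes are shown as [-deleted-]{+added+}.
     */
    static String render(String a, String b, WsMode mode) {
        boolean deleted = b == null;
        boolean added = a == null;
        if (mode == WsMode.SHOW_CHANGES) {
            return (added ? "" : "[-" + a + "-]") + (deleted ? "" : "{+" + b + "+}");
        }
        if (deleted) {
            return a; // deleted whitespace: output unmarked, as in input A
        }
        if (added) {
            return b; // added whitespace: output unmarked, as in input B
        }
        return mode == WsMode.OUTPUT_A ? a : b; // modified whitespace: choose one version
    }
}
```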

compare

public void compare(File file1,
                    File file2,
                    File result)
             throws FileNotFoundException,
                    OdtComparatorException,
                    InvalidOdtFileException,
                    OdtInputException,
                    OdtOutputException,
                    StylesheetLoadException,
                    OdfLicensingException
Compares two ODT files and produces an ODT result which describes the differences between the two input files. Uses the context ClassLoader (the one returned by Thread.currentThread().getContextClassLoader()) to locate JAR resources used during comparison. If an alternative ClassLoader is required because of application server or IDE restrictions, use one of the other compare methods.

Parameters:
file1 - the first, or older, ODT input file
file2 - the second, or newer, ODT input file
result - the ODT file which will be created to describe the differences between the two input files
Throws:
FileNotFoundException - if the result file cannot be created
OdtComparatorException - if there is a problem during comparison, see the nested cause for more information
OdtInputException - if there is a problem reading one of the input files
OdtOutputException - if there is a problem writing to the result file
StylesheetLoadException - if a required stylesheet cannot be loaded. See the exception documentation for more details
InvalidOdtFileException - if one of the input files is not an ODT file in the expected packaging format
OdfLicensingException - if there is a problem with the software license. NB the actual exception thrown will be a sub-type of OdfLicensingException. See the exception documentation for more details
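When calling one of the four-argument compare methods, the caller has to choose a ClassLoader. The helper below is a hypothetical sketch using only standard JDK calls: it returns the thread's context ClassLoader (the one the three-argument method is documented to use) and falls back to the ClassLoader of a class from your own application when no context loader is set. The fallback policy is an assumption, not documented behaviour.

```java
/** Hypothetical helper for choosing the ClassLoader to pass to compare(..., ClassLoader). */
final class LoaderChoice {

    /**
     * Returns the current thread's context ClassLoader, or the loader of the given
     * class when the context loader has not been set (is null).
     */
    static ClassLoader pick(Class<?> anchor) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        return (context != null) ? context : anchor.getClassLoader();
    }
}
```

The result would then be passed as the `cl` argument, for example `comparator.compare(file1, file2, result, LoaderChoice.pick(MyApp.class))`, where `MyApp` stands for any class loaded by your application.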

compare

public void compare(File file1,
                    File file2,
                    File result,
                    ClassLoader cl)
             throws FileNotFoundException,
                    OdtComparatorException,
                    InvalidOdtFileException,
                    OdtInputException,
                    OdtOutputException,
                    StylesheetLoadException,
                    OdfLicensingException
Compares two ODT files and produces an ODT result which describes the differences between the two input files.

Parameters:
file1 - the first, or older, ODT input file
file2 - the second, or newer, ODT input file
result - the ODT file which will be created to describe the differences between the two input files
cl - the ClassLoader used to locate JAR resources during comparison
Throws:
FileNotFoundException - if the result file cannot be created
OdtComparatorException - if there is a problem during comparison, see the nested cause for more information
OdtInputException - if there is a problem reading one of the input files
OdtOutputException - if there is a problem writing to the output file
StylesheetLoadException - if a required stylesheet cannot be loaded. See the exception documentation for more details
InvalidOdtFileException - if one of the input files is not an ODT file in the expected packaging format
OdfLicensingException - if there is a problem with the software license. NB the actual exception thrown will be a sub-type of OdfLicensingException. See the exception documentation for more details

compare

public void compare(File file1,
                    File file2,
                    OutputStream result,
                    ClassLoader cl)
             throws OdtComparatorException,
                    OdtInputException,
                    OdtOutputException,
                    StylesheetLoadException,
                    InvalidOdtFileException,
                    OdfLicensingException
Compares two ODT files and produces an ODT result which describes the differences between the two input files.

Parameters:
file1 - the first, or older, ODT input file
file2 - the second, or newer, ODT input file
result - the OutputStream which will be wrapped in a ZipOutputStream and used to write the ODT result describing the differences between the two input files
cl - the ClassLoader used to locate JAR resources during comparison
Throws:
OdtComparatorException - if there is a problem during comparison, see the nested cause for more information
OdtInputException - if there is a problem reading one of the input files
OdtOutputException - if there is a problem writing to the output
StylesheetLoadException - if a required stylesheet cannot be loaded. See the exception documentation for more details
InvalidOdtFileException - if one of the input files is not an ODT file in the expected packaging format
OdfLicensingException - if there is a problem with the software license. NB the actual exception thrown will be a sub-type of OdfLicensingException. See the exception documentation for more details
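Because this variant writes a ZIP-based ODT package to any OutputStream, the result can be captured in memory, for example with a `java.io.ByteArrayOutputStream`, and checked before it is stored or sent on. The sketch below uses only JDK classes and relies on the ODF packaging convention that the first entry of a package is named `mimetype` and holds the media type `application/vnd.oasis.opendocument.text`. `OdtPackageCheck` is a hypothetical helper, not part of this API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical check that captured bytes look like an ODT package. */
final class OdtPackageCheck {

    static final String ODT_MIMETYPE = "application/vnd.oasis.opendocument.text";

    /** Returns true if the first ZIP entry is "mimetype" containing the ODT media type. */
    static boolean looksLikeOdt(byte[] bytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            ZipEntry first = zip.getNextEntry();
            if (first == null || !"mimetype".equals(first.getName())) {
                return false;
            }
            // readAllBytes() on a ZipInputStream reads to the end of the current entry.
            String content = new String(zip.readAllBytes(), StandardCharsets.US_ASCII);
            return ODT_MIMETYPE.equals(content);
        } catch (IOException e) {
            return false; // not readable as a ZIP archive
        }
    }
}
```

A typical pattern would be to create a `ByteArrayOutputStream buffer`, call `comparator.compare(file1, file2, buffer, cl)`, and then test `OdtPackageCheck.looksLikeOdt(buffer.toByteArray())`.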


© 2001-2009 DeltaXML Ltd. All Rights Reserved.