DeltaXML Support Forums

DeltaXML Core > NonMatchingPCDataCombineException
Joined: 27-March-2007
Posts: 32
Location: Malvern, United Kingdom
Posted: 07-July-2008 12:11
Recombined Deltas must be 'raw'
Hello Giovani,

Without seeing the delta input to the recombine operation it is hard to give a definitive answer to this problem. However, I can make an educated guess: there is a good chance that the delta you are passing to the combine operation has been modified in some way.

The recombine operation requires 'raw' delta files, and in particular you should not:
* Indent the output delta ("Indent=yes" on the command-line)
* Remove whitespace with the NormalizeSpace filter prior to comparison
   (add "Preserve Whitespace=true" to the command-line tools).

We added these options to various comparison pipelines because they give users flexibility for different types of comparison and post-processing. However, they do not play well with the recombination operations.

If you are familiar with the UNIX command-line tools diff and patch, this would be similar to reformatting a diff output file and then expecting patch to operate on it.
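
To see why re-indented or whitespace-normalised data stops matching, here is a minimal Python sketch. It uses only the standard-library ElementTree, not the DeltaXML API, and the whitespace-stripped copy of a.xml is a hypothetical stand-in for what a normalising filter might produce. It compares the character data that sits directly inside /videocollection, the location named in the exception.

```python
import xml.etree.ElementTree as ET

# a.xml exactly as posted in this thread (whitespace is part of the PCDATA).
RAW = "<videocollection>\n  <title>Tootsie</title>\n</videocollection>"
# Hypothetical copy after whitespace has been stripped or re-indented away.
STRIPPED = "<videocollection><title>Tootsie</title></videocollection>"


def direct_pcdata(xml_text):
    """Return the text chunks that are direct children of the root element."""
    root = ET.fromstring(xml_text)
    chunks = [root.text or ""]
    chunks.extend(child.tail or "" for child in root)
    return chunks


raw_chunks = direct_pcdata(RAW)
stripped_chunks = direct_pcdata(STRIPPED)
pcdata_matches = raw_chunks == stripped_chunks

print(repr(raw_chunks))       # ['\n  ', '\n']
print(repr(stripped_chunks))  # ['', '']
print(pcdata_matches)         # False
```

The two documents look equivalent to a human reader, but the PCDATA under the root element differs, which is the kind of mismatch a strict recombine step would reject.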

We document this for the command-line tools, both in the release and online at:
  http://www.deltaxml.com/core/5.0/docs/command-processor.html#commands

From the exception stack-trace I see that you are using the API.  You may have noticed that there isn't a PipelinedCombiner as pipelining the combine operation doesn't make sense (there are no useful input or output filters that can be applied).  In general if you are using the com.deltaxml.api.XMLCombiner it makes sense to  also use the comparator from the same package, the com.deltaxml.api.XMLComparator.

I think we need to make our documentation clearer on these aspects of the Combiner API usage (similar to the command-line documentation) and we will do this for our 5.1 release.

Thanks for the feedback and apologies for any confusion this has caused.

Nigel
DeltaXML Core > NonMatchingPCDataCombineException
Joined: 04-July-2008
Posts: 1
Posted: 07-July-2008 20:00
NonMatchingPCDataCombineException
Hi
I was able to create a delta file without problems but when trying to recombine the delta with one of the original files to get a third one I receive the following exception:

com.deltaxml.api.NonMatchingPCDataCombineException: The PCData at /videocollection in the delta file does not match the PCData at /videocollection in the input file.
   at com.deltaxml.c_b.c_cb.c_b(c_cb.java:240)
   at com.deltaxml.api.CombinerImpl.combine(CombinerImpl.java:8)
   at deltaxml.DeltaXMLTest.reverseXML(DeltaXMLTest.java:77)
   at deltaxml.DeltaXMLTest.main(DeltaXMLTest.java:93)

Here are the XML files

a.xml
<videocollection>
  <title>Tootsie</title>
</videocollection>


b.xml
<videocollection>
  <title>Tootsie 2</title>
</videocollection>


Any idea?
DeltaXML Core > want to know how the comparison logic works
Joined: 27-March-2007
Posts: 32
Location: Malvern, United Kingdom
Posted: 07-July-2008 16:49
Comparison - subsequences vs. substrings
Hello Skrish,


Sorry it took a while to examine/debug this one (we spent some hours looking at this yesterday).

You only get this odd result using the 'enhanced' or 'document centric' matcher.

If you turn it off (it defaults to 'on' in the sandbox and command line), for example:

$ java -jar /usr/local/DeltaXMLCore-5_0/command.jar compare delta f1.xml f2.xml f12.xml "Enhanced Match 1=false"


You will see an improved result:

<root xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" xmlns:dxx="http://www.deltaxml.com/ns/xml-namespaced-attribute" xmlns:dxa="http://www.deltaxml.com/ns/non-namespaced-attribute" deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context">
  <a deltaxml:deltaV2="B">value0</a>
  <a deltaxml:deltaV2="A=B" />
  <a deltaxml:deltaV2="A=B">value3</a>
</root>


The 'enhanced matcher' result is 'correct' but, as you observed, it is not obvious or simple.  It is designed to optimize the matching in document-centric XML (think of multiple paragraphs of text with some paras added/deleted/modified).  It takes into account all of the PCDATA in an element's subtree (which can be a large number of items when word-by-word splitting/reconstruction is used).

In both cases the optimization function used in the matching or alignment is the 'Longest Common Subsequence' (also known as the 'edit path', and closely related to the 'Levenshtein distance').

It is subtly different from the Longest Common 'Substring'.  Wikipedia has a good example of the difference:

http://en.wikipedia.org/wiki/Substring

http://en.wikipedia.org/wiki/Longest_common_substring_problem

http://en.wikipedia.org/wiki/Levenshtein_distance
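
The subsequence/substring distinction, and the fact that several equally long subsequences can exist, can be sketched in a few lines of Python. This is a textbook dynamic-programming illustration on short strings, not DeltaXML's matcher; the function names are made up for this example.

```python
from functools import lru_cache


def lcs_length(a, b):
    """Length of the longest common subsequence (items need not be adjacent)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a, b):
    """Length of the longest common substring (items must be adjacent)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            run = prev[j - 1] + 1 if x == y else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def all_lcs(a, b):
    """Every distinct longest common subsequence of two strings."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == len(a) or j == len(b):
            return frozenset({""})
        if a[i] == b[j]:
            return frozenset(a[i] + rest for rest in solve(i + 1, j + 1))
        left, right = solve(i + 1, j), solve(i, j + 1)
        longest = max(len(next(iter(left))), len(next(iter(right))))
        return frozenset(s for s in left | right if len(s) == longest)

    return set(solve(0, 0))


print(lcs_length("AXBYC", "ABC"))                       # 3 ("ABC")
print(longest_common_substring_length("AXBYC", "ABC"))  # 1
print(sorted(all_lcs("AB", "BA")))                      # ['A', 'B']
```

The last line is the interesting one for this thread: "AB" and "BA" have two optimal subsequences of equal length, and an algorithm is free to return either.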

In the case of your test-data, there are 2 optimal equal length subsequences when processing the flattened  data-structure used by the enhanced matcher.  Unfortunately, the algorithm returned the subsequence which leads to the non-intuitive result.  While this result may appear complicated it is however 'correct' and more generally there can be multiple 'correct' answers for any pair of inputs.  It is always possible to generate either input from a full-context delta.
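
As a rough check that both results describe the same pair of inputs, the following Python sketch rebuilds the 'A' and 'B' documents from each delta. It is based only on the markup visible in this thread (the deltaV2 attribute, deltaxml:textGroup and deltaxml:text), it assumes no mixed content, and it is not the DeltaXML combiner. The deltas are written without indentation, use "value3" in both cases for comparability, and add the namespace declaration that the posted enhanced result omits.

```python
import xml.etree.ElementTree as ET

DX = "http://www.deltaxml.com/ns/well-formed-delta-v1"
DELTA_ATTR = "{%s}deltaV2" % DX
TEXT_GROUP = "{%s}textGroup" % DX


def versions(elem):
    """Versions ('A', 'B') an element belongs to; unmarked means both."""
    marker = elem.get(DELTA_ATTR)
    if marker is None:
        return {"A", "B"}
    return set(marker.replace("!=", "=").split("="))


def extract(elem, version):
    """Rebuild elem as it looked in one input (assumes no mixed content)."""
    kept = {k: v for k, v in elem.attrib.items() if not k.startswith("{%s}" % DX)}
    out = ET.Element(elem.tag, kept)
    text = elem.text or ""
    for child in elem:
        if version not in versions(child):
            continue  # this child only exists in the other input
        if child.tag == TEXT_GROUP:
            text += "".join(t.text or "" for t in child if version in versions(t))
        else:
            out.append(extract(child, version))
    out.text = text or None
    return out


def child_texts(delta_xml, version):
    """Text of each child of the rebuilt root; None marks an empty element."""
    return [child.text for child in extract(ET.fromstring(delta_xml), version)]


HEAD = '<root xmlns:deltaxml="%s" deltaxml:deltaV2="A!=B">' % DX
PLAIN = HEAD + ('<a deltaxml:deltaV2="B">value0</a>'
                '<a deltaxml:deltaV2="A=B"/>'
                '<a deltaxml:deltaV2="A=B">value3</a></root>')
ENHANCED = HEAD + ('<a deltaxml:deltaV2="A!=B">'
                   '<deltaxml:textGroup deltaxml:deltaV2="B">'
                   '<deltaxml:text deltaxml:deltaV2="B">value0</deltaxml:text>'
                   '</deltaxml:textGroup></a>'
                   '<a deltaxml:deltaV2="B"/>'
                   '<a deltaxml:deltaV2="A=B">value3</a></root>')

for name, delta in (("plain", PLAIN), ("enhanced", ENHANCED)):
    print(name, child_texts(delta, "A"), child_texts(delta, "B"))
```

Under these assumptions both deltas yield the same two documents, which is what 'correct' means here even though the enhanced alignment is harder to read.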

So in summary:

  - the enhanced matcher works well with document-like data; your data works better with this setting turned off.

  - the result is correct, if not obvious

  - it should work better with larger amounts of PCData and word-by-word turned on.  Your example exhibits some of the problems of microbenchmarks.  We'd be reluctant to look at optimizations/improvements unless you can show us a more realistic or larger example of mismatching.

I hope this answers your question.  I would suggest experimenting with larger test data and if you see any similar problems please get back to us.

Thanks,

Nigel
Filters > problem in deleting a element in the comparison
Joined: 28-March-2007
Posts: 9
Location: Salisbury, UK
Posted: 07-July-2008 12:51
RE: problem in deleting a element in the comparison
Hi Arul,

Firstly, from the example output you have included, it looks like you are using an older version of DeltaXML Core. We have recently released version 5.0 which, among other things, includes a new improved result format. You can download this at http://www.deltaxml.com/library/downloads.html and request a new evaluation key to use with this version.

I have tried to reproduce the result that you are getting from this comparison and I am unable to. Firstly, I had to edit the XML documents you included above as they were not well-formed: the </sec> closing tags needed to be changed to </sect1> closing tags. Comparing the corrected inputs with various settings always gave me the result you were expecting, with the first row being deleted. Are you using any input filters on the documents before you compare them?

You are using the PipelinedComparator to compare your documents. One setting that you may not be aware of is the Enhanced Matcher. This improves the matching of elements by evaluating the PCDATA in their subtrees. In your example, the Enhanced Matcher would cause the row in the second input to match with the second row of the first input based on the PCDATA inside it. To ensure the Enhanced Matcher is on, use the following line in your code:

pc.setComparatorFeature("http://deltaxml.com/api/feature/enhancedMatch1", true)

assuming your PipelinedComparator instance is called 'pc'
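
The idea of matching elements by the text in their subtrees can be illustrated with a small Python sketch. It is only an analogy built on the standard library's difflib, not the Enhanced Matcher's algorithm, and the rows are shortened versions of the ones in this thread.

```python
import difflib
import xml.etree.ElementTree as ET

# Shortened versions of the two rows in the old document.
OLD_ROWS = ET.fromstring(
    "<tbody>"
    "<row><entry>Michael H. Koonce<sbr/> Secretary</entry></row>"
    "<row><entry>James Angelos<sbr/> Chief Compliance Officer</entry></row>"
    "</tbody>"
)
# The single row that remains in the new document.
NEW_ROW = ET.fromstring(
    "<row><entry>James Angelos<sbr/> Chief Compliance Officer</entry></row>"
)


def subtree_text(elem):
    """All character data below elem, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def best_match(candidates, target):
    """Index of the candidate whose subtree text is most similar to target."""
    target_text = subtree_text(target)
    scores = [
        difflib.SequenceMatcher(None, subtree_text(c), target_text, autojunk=False).ratio()
        for c in candidates
    ]
    return max(range(len(scores)), key=scores.__getitem__), scores


index, scores = best_match(list(OLD_ROWS), NEW_ROW)
print(index, scores)
```

Here the remaining row pairs with the second old row (index 1), so the first old row is free to be reported as deleted rather than modified.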

To further improve the matching when using the Enhanced Matcher, word by word comparison can be used. This requires the addition of an input filter and some output filters to the pipeline. If you are using Core 5.0, add com.deltaxml.pipe.filters.dx2.wbw.WordInfilter as the last input filter, and com.deltaxml.pipe.filters.dx2.wbw.WordSpaceFixup and com.deltaxml.pipe.filters.dx2.wbw.WordOutfilter as the first two output filters.
Take a look at http://www.deltaxml.com/free/examples/core-features/word-by-word for an explanation of word by word comparison.
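
For a feel of what word-level granularity buys you, here is a minimal Python sketch using difflib from the standard library. It is an illustration of the general technique, not the DeltaXML word-by-word filters, and `word_diff` is a name invented for this example.

```python
import difflib


def word_diff(old_text, new_text):
    """Compare two strings word by word.

    Returns a list of (operation, old_words, new_words) tuples, where
    operation is one of 'equal', 'replace', 'delete' or 'insert'.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


for op in word_diff("Tootsie", "Tootsie 2"):
    print(op)
# ('equal', ['Tootsie'], ['Tootsie'])
# ('insert', [], ['2'])
```

Without word splitting, "Tootsie" changing to "Tootsie 2" is one modified text item; split into words, only the added "2" is reported as a change.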

Hopefully this should help you solve your problem. If you are using your own input filters, try the comparison without them first to see if they are the cause of the problem. Do ask further questions if you are still having problems getting the correct result.

Tristan
DeltaXML Core > "ERROR: Could not find dxp file with id 'docbook'"
Joined: 15-April-2008
Posts: 8
Location: Canada
Posted: 07-July-2008 18:24
Working now, with issues.
Did it again, in reverse. This time I created a thread where I meant to create a reply. Can you delete the other thread?

Thanks Nigel.

It's all working now; although the duplicate-id removal script does not seem to function for docbook 5 documents, I just modified my docbook customization layer to detect and rename duplicate IDs and it publishes out now just fine.

I'm also trying to remove the DOCTYPE declaration from the output xml file, but although the output file claims <!--Generated by docbook-outfilter.xsl-->, methinks it's a liar, because there is no such XSL file in the 5.0 distribution that I can find.
DeltaXML Core > docbookinfilter.xsl, docbookoutfilter.xm, and Docbook 5 (namespaces)
Joined: 27-March-2007
Posts: 32
Location: Malvern, United Kingdom
Posted: 07-July-2008 17:56
Status update
Apologies for the delay - a mixture of good and bad news:

Our docbook5 comparison pipeline is ready to be included in our 5.1 release.  This part of the process takes two docbook5 files and produces a docbook5 result (with namespaced elements and revisionflags).

We've tested and validated our result files.

One good improvement in 5 is that d:phrase is allowed in many more places than previously.  This means we can provide added/deleted revisionflags rather than just changed revisionflags.  In terms of FO/PDF output this means more red/green rather than blue differences.

What has taken much longer than expected is updating the docbook-xsl customization layer - this is not really an issue to do with the introduction of the namespaces, but rather as a result of understanding and then supporting the changes in the various docbook-xsl releases over the last year or so.

If you (or any forum reader) just need the docbook5 pipeline and are not using our customization layer, we are happy to provide the DXP and XSLT filter files now.  We're still in the process of testing (we generate PDFs for 120 pairs of test files and 'eyeball' them for correctness).  This tests the docbook-xsl customization, but also to some extent the whole 'publishing pipeline' including our 'comparison pipeline', so we would rather release the complete docbook5 pipeline when we're done.

Thanks, and apologies for the delay,

Nigel
DeltaXML Core > "ERROR: Could not find dxp file with id 'docbook'"
Joined: 27-March-2007
Posts: 32
Location: Malvern, United Kingdom
Posted: 07-July-2008 17:35
Some renaming in Core 5.0
Hello,

Likely should have made this a new thread in the first place....


No problem - I need to go back to the previous thread soon anyway to provide an update...

I'm trying to evaluate Core 5.0 now, and getting the "ERROR: Could not find dxp file with id 'docbook'" error message when I run:

"java -jar command.jar compare docbook ../../../docs/xmllint_temporary1.xml ../../../docs/xmllint_temporary2.xml docbook-result.xml ../../../docs/outfile.xml"


This is because we renamed the sub-command in our Core 5.0 release (in the expectation of docbook5 support to be added).

If you run the command without arguments (for example: java -jar command.jar), it reports the available sub-commands, which now include docbook4.

So your command above just needs to change docbook to docbook4

As an aside:  I did explore the possibility of keeping a generic docbook pipeline (which would output namespaced or non-namespaced docbook depending upon the inputs), but it was complicated and I expected the subsequent processing (e.g. selection of docbook-xsl vs. docbook-xsl-ns) would require/expect one or the other form of output.

The xmllint files are just files that have had their Xinclude elements consolidated by XMLlint.


OK - understood.  When we add our docbook5 pipeline/target to command.jar we will configure it to make use of Xerces' Xinclude support.

I've downloaded the latest evaluation version of Core. I suspect that this is just a config issue.


It looks like we may have web-pages or other documentation which still says 'compare docbook'.  I will try to hunt it out and change it now.

Thanks,

Nigel
DeltaXML Core > want to know how the comparison logic works
Joined: 03-July-2008
Posts: 1
Posted: 07-July-2008 17:01
want to know how the comparison logic works
I tested the two samples of xml. Samples are given below.
In sample 1, document 1 has an empty element. Document 2 has a new element added in the first position. The empty element (of document 1) is present in document 2 at the second position. The DeltaXML comparison result marks the changes as:
a) first element is a new (added) element
b) second element is unchanged (the empty element)

I created another sample by adding a new element to both documents (document 1 and 2). The value of the new element is the same in both documents. If I compare documents 1 and 2, the result changes: the first element is now marked as "modified", whereas in the previous comparison it was marked as "new". I could not understand how adding the same new element to both documents changes the result. I need help in understanding the logic behind the comparison.

Sample 1 - Document 1
-------------------------

<root>
<a></a>
</root>

Sample 1 - Document 2
-----------------------

<root>
<a>value0</a>
<a></a>
</root>

Sample 1 - Result
------------------

<root deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context">
<a deltaxml:deltaV2="B">value0</a>
<a deltaxml:deltaV2="A=B" />
</root>


Sample 2 - Document 1
---------------------

<root>
<a></a>
<a>value3></a>
</root>

Sample 2 - Document 2
---------------------

<root>
<a>value0</a>
<a></a>
<a>value3></a>
</root>

Sample 2 - Result
------------------

<root deltaxml:deltaV2="A!=B" deltaxml:version="2.0" deltaxml:content-type="full-context">
<a deltaxml:deltaV2="A!=B">
<deltaxml:textGroup deltaxml:deltaV2="B">
<deltaxml:text deltaxml:deltaV2="B">value0</deltaxml:text>
</deltaxml:textGroup>
</a>
<a deltaxml:deltaV2="B" />
<a deltaxml:deltaV2="A=B">value3&gt;</a>
</root>
Filters > problem in deleting a element in the comparison
Joined: 19-June-2008
Posts: 4
Posted: 07-July-2008 16:13
problem in deleting a element in the comparison
old xml

<book>
   <chapter id="chapter_4">
      <sect1 condition="trusteesandofficers" continued="true" id="chapter_4-sect1_1">
         <table cols="2" datatype="2" primary="" render="0" rows="17" style="" tabletype="8">
               <tbody>
                  <row condition="TT1L">
                     <entry> Michael H. Koonce<sbr/>  Secretary<sbr/>  DOB: 4/20/1960<sbr/>  Term of office since: 2000 </entry>
                     <entry> Principal occupations: Senior Vice President and General Counsel, Evergreen Investment Services, Inc.; Secrtary, Senior Vice President and General Counsel, Evergreen InvestmentManagement Company, LLC and Evergreen Service Company, LLC; Senior Vice President and Assistant General Counsel, Wachovia Corporation
                           <footnoteref alt="1" idref="577390002" label="3" order="1" size="1" type="button" value="577390002"/>
                           <footnoteref alt="1" idref="577400002" label="4" order="1" size="1" type="button" value="577400002"/>
                           <footnoteref alt="1" idref="577410002" label="5" order="1" size="1" type="button" value="577410002"/> </entry>
                  </row>
                  <row condition="TT1L">
                     <entry> James Angelos<sbr/>  Chief Compliance Officer<sbr/>  DOB: 9/2/1947<sbr/>  Term of office since: 2004 </entry>
                     <entry> Principal occupations: Chief Compliance Officer, Evergreen Funds and Senior Vice President of Evergreen Investments Co, Inc; Former Director of Compliance, Evergreen Investment Services,Inc
                        <footnoteref alt="1" idref="577420002" label="6" order="1" size="1" type="button" value="577420002"/>
                        <footnoteref alt="1" idref="57743" label="7" order="1" size="1" type="button" value="57743"/>
                        <footnoteref alt="1" idref="57744" label="8" order="1" size="1" type="button" value="57744"/> </entry>
                  </row>
               </tbody>
         </table>
      </sec>
   </chapter>
</book>

New xml

<book>
   <chapter id="chapter_4">
      <sect1 condition="trusteesandofficers" continued="true" id="chapter_4-sect1_1">
         <table cols="2" datatype="2" primary="" render="0" rows="17" style="" tabletype="8">
               <tbody>                  
                  <row condition="TT1L">
                     <entry> James Angelos<sbr/>  Chief Compliance Officer<sbr/>  DOB: 9/2/1947<sbr/>  Term of office since: 2004 </entry>
                     <entry> Principal occupations: Chief Compliance Officer, Evergreen Funds and Senior Vice President of Evergreen Investments Co, Inc; Former Director of Compliance, Evergreen Investment Services,Inc
                        <footnoteref alt="1" idref="577420002" label="6" order="1" size="1" type="button" value="577420002"/>
                        <footnoteref alt="1" idref="57743" label="7" order="1" size="1" type="button" value="57743"/>
                        <footnoteref alt="1" idref="57744" label="8" order="1" size="1" type="button" value="57744"/> </entry>
                  </row>
               </tbody>
         </table>
      </sec>
   </chapter>
</book>

got Result

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:delta="WFmodify" >
   <chapter deltaxml:delta="WFmodify" id="chapter_4">
      <sect1 deltaxml:delta="WFmodify" condition="trusteesandofficers" continued="true" id="chapter_4-sect1_1">
         <table rows="16" deltaxml:delta="WFmodify" cols="2" datatype="2" primary="" render="0" style="" tabletype="8">
            <tbody>
                  <row deltaxml:delta="WFmodify" condition="TT1L">
                     <entry deltaxml:delta="WFmodify"> <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>Michael H. Koonce</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>James Angelos </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify>
                     <sbr deltaxml:delta="unchanged"/> <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>Secretary  </deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>Chief Compliance Officer</deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify>
                     <sbr deltaxml:delta="unchanged"/> DOB: <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>4/20/1960</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>9/2/1947</deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify>
                     <sbr deltaxml:delta="unchanged"/> Term of office since: <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>2000 </deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>2004 </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify>
                     </entry>
                     <entry deltaxml:delta="WFmodify"> Principal occupations: <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>Senior Vice President and General Counsel,</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>Chief Compliance Officer,   </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify> Evergreen <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>Investment Services, Inc.; Secrtary,</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>Funds and  </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify> Senior Vice President <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>and General Counsel,</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>of  </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify> Evergreen <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>InvestmentManagement Company, LLC and   </deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>Investments Co, Inc; Former Director of Compliance,</deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify> Evergreen <deltaxml:PCDATAmodify>
                     <deltaxml:PCDATAold>Service Company, LLC; Senior Vice President and Assistant General Counsel, Wachovia Corporation</deltaxml:PCDATAold>
                     <deltaxml:PCDATAnew>Investment Services,Inc          </deltaxml:PCDATAnew>
                     </deltaxml:PCDATAmodify>
                     <footnoteref idref="577420002" value="577420002" deltaxml:delta="WFmodify" alt="1" label="3" order="1" size="1" type="button"/>
                     <footnoteref idref="57743" value="57743" deltaxml:delta="WFmodify" alt="1" label="4" order="1" size="1" type="button"/>
                     <footnoteref idref="57744" value="57744" deltaxml:delta="WFmodify" alt="1" label="5" order="1" size="1" type="button"/>
                     </entry>
                     </row>
                           <row deltaxml:delta="delete" condition="TT1L">
                     <entry> James Angelos<sbr/> Chief Compliance Officer<sbr/> DOB: 9/2/1947<sbr/> Term of office since: 2004 </entry>
                     <entry> Principal occupations: Chief Compliance Officer, Evergreen Funds and Senior Vice President of Evergreen Investments Co, Inc; Former Director of Compliance, Evergreen Investment Services,Inc<footnoteref alt="1" idref="577420002" label="6" order="1" size="1" type="button" value="577420002"/>
                     <footnoteref alt="1" idref="57743" label="7" order="1" size="1" type="button" value="57743"/>
                     <footnoteref alt="1" idref="57744" label="8" order="1" size="1" type="button" value="57744"/>
                     </entry>
                  </row>
            </tbody>
         </table>
      </sec>
   </chapter>
</book>

Expected Result

<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1" deltaxml:delta="WFmodify" >
   <chapter deltaxml:delta="WFmodify" id="chapter_4">
      <sect1 deltaxml:delta="WFmodify" condition="trusteesandofficers" continued="true" id="chapter_4-sect1_1">
         <table rows="16" deltaxml:delta="WFmodify" cols="2" datatype="2" primary="" render="0" style="" tabletype="8">
            <tbody>
                  <row deltaxml:delta="delete" condition="TT1L">
                     <entry> Michael H. Koonce<sbr/>  Secretary<sbr/>  DOB: 4/20/1960<sbr/>  Term of office since: 2000 </entry>
                     <entry> Principal occupations: Senior Vice President and General Counsel, Evergreen Investment Services, Inc.; Secrtary, Senior Vice President and General Counsel, Evergreen InvestmentManagement Company, LLC and Evergreen Service Company, LLC; Senior Vice President and Assistant General Counsel, Wachovia Corporation
                           <footnoteref alt="1" idref="577390002" label="3" order="1" size="1" type="button" value="577390002"/>
                           <footnoteref alt="1" idref="577400002" label="4" order="1" size="1" type="button" value="577400002"/>
                           <footnoteref alt="1" idref="577410002" label="5" order="1" size="1" type="button" value="577410002"/> </entry>
               
                     </row>
                     <row condition="TT1L">
                        <entry> James Angelos<sbr/> Chief Compliance Officer<sbr/> DOB: 9/2/1947<sbr/> Term of office since: 2004 </entry>
                        <entry> Principal occupations: Chief Compliance Officer, Evergreen Funds and Senior Vice President of Evergreen Investments Co, Inc; Former Director of Compliance, Evergreen Investment Services,Inc<footnoteref alt="1" idref="577420002" label="6" order="1" size="1" type="button" value="577420002"/>
                           <footnoteref alt="1" idref="57743" label="7" order="1" size="1" type="button" value="57743"/>
                           <footnoteref alt="1" idref="57744" label="8" order="1" size="1" type="button" value="57744"/>
                        </entry>
                  </row>
            </tbody>
         </table>
      </sec>
   </chapter>
</book>

Note: I am using PipelinedComparator to compare the files. Comparing the expected result with the result I got, the one I got looks like collapsed blacklining.

In the old xml I have a table with two rows containing footnoteref elements, and in the new xml I have deleted the first row. I compared the two xmls. In the diff xml, the first row is marked as modified and the second row as deleted.
What can I do to resolve this problem? I am stuck. Could you tell me how to fix it?