DeltaXML Support Forums
Performance impact of deltaxml:key
Posted:
03-July-2007 009:13

Hello,

In my code, I've observed that deltaxml:key severely impacts performance. The situation is as follows: the XML documents to compare are approx. 4 MB each. Diffing takes about 45 sec. with the raw documents, but a whopping 50 minutes(!) when I add keys using a custom filter (takes about 5 seconds).

Is this to be expected, or is it due to an error on my side? I have already checked that the keys are unique (indeed they are). Also, everything is compared as ordered. I'm wondering, however, whether the nested assignment of the keys may have something to do with this... Is it recommended to use keys that way in an ordered comparison?

Any hints are appreciated.

Thanks,
Philipp
Re: Performance impact of deltaxml:key
Posted:
03-July-2007 16:22 Hello Philipp,
That's a problem we would certainly like to take a look at. Using keys should make things faster: when the matching code tries to align two lists of sibling nodes at a particular level of the tree, it uses keys as a shortcut, and it won't try things like exact subtree equality if it finds matching keys.
I can guess at one or two possibilities here:
- Adding the filter causes the whole process/pipeline to nearly fill the available memory, so most of the 50 minutes is taken up with garbage collection.
- The key values that you are using are somehow causing the hash functions in the java.util collections code to behave poorly.
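To test the second guess, one rough check is to see whether many distinct key values share the same String.hashCode(). The sketch below is only an illustration: the class and method names are hypothetical, it assumes the key values have already been collected into a collection of strings, and it only detects exact hash-code collisions, not other forms of poor bucket distribution.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

// Hypothetical helper, not part of any DeltaXML API.
final class KeyHashCheck {

    // Returns the size of the largest group of distinct keys
    // that share one String.hashCode() value.
    static int largestCollisionGroup(Collection<String> keys) {
        Map<Integer, Integer> countPerHash = new HashMap<>();
        int worst = 0;
        // De-duplicate first so that repeated keys are not counted as collisions.
        for (String key : new HashSet<>(keys)) {
            int count = countPerHash.merge(key.hashCode(), 1, Integer::sum);
            worst = Math.max(worst, count);
        }
        return worst;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a well-known pair with equal hash codes.
        System.out.println(largestCollisionGroup(Arrays.asList("Aa", "BB", "C")));
    }
}
```

A result close to 1 for the real key set would suggest that hash collisions are not the cause; a large value would make this guess worth pursuing.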
Keys can be nested, but don't need to be - the matching process is recursive. The rule is: keys must be unique amongst sibling elements of the same name or 'type' (equal local-name and NS URI).

I presume you've seen: http://www.deltaxml.com/library/how-to-compare-orderless-elements.html, in particular section 4 (we should/must give this document a better title!)
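The uniqueness rule above can be checked mechanically. The following is a minimal sketch using the standard Java DOM API, not DeltaXML code: the class and method names are hypothetical, and the namespace URI of the key attribute is passed in as a parameter, so use whatever URI the deltaxml prefix is bound to in your documents. It reports every key that repeats among sibling elements with the same local name and namespace URI, and recurses into children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any DeltaXML API.
final class SiblingKeyCheck {

    // Collects "{nsURI}localName#key" for each key that repeats among
    // siblings of the same element type, at every level below parent.
    static List<String> findDuplicates(Element parent, String keyNamespace) {
        List<String> duplicates = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // keys seen among this parent's children only
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element child = (Element) n;
            if (child.hasAttributeNS(keyNamespace, "key")) {
                String id = "{" + child.getNamespaceURI() + "}" + child.getLocalName()
                        + "#" + child.getAttributeNS(keyNamespace, "key");
                if (!seen.add(id)) {
                    duplicates.add(id);
                }
            }
            // Keys may be nested, so check each child's own children too.
            duplicates.addAll(findDuplicates(child, keyNamespace));
        }
        return duplicates;
    }

    // Parses an XML string with namespace awareness switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For example, calling findDuplicates(doc.getDocumentElement(), ns) on the filtered input should return an empty list if the keys satisfy the rule.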
1. Do the filtering separately and then compare the filtered file - perhaps this is your 5s time?
2. Experiment with the heap size (java -Xmx ...)
3. Compare a file with itself - we don't do any shortcuts in this case, but it should be the optimal O(n) performance case.
4. If it is possible, please let us see the data or a small sample of it (either with the keys or the raw data with the keying code). Our forum system doesn't support attachments; you could try pasting the data/code, or providing some URIs. If you're concerned about privacy you could also use email: support@deltaxml.com

I'm curious/concerned about this problem (the keyed matching was code I wrote/tested) and I will be happy to help further.

Cheers,
Nigel
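For suggestion 2, it can help to measure how much of the run time is actually spent in garbage collection before and after changing -Xmx. The sketch below uses the standard java.lang.management API; the class and method names are hypothetical, and the workload in main is only a placeholder to be replaced by the code that runs the comparison.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of any DeltaXML API.
final class GcTimer {

    // Sums the approximate accumulated GC time (ms) over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Runs the task and returns { wall-clock ms, GC ms accumulated during the run }.
    static long[] measure(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { wallMillis, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap (MB): " + maxHeapMb);
        long[] result = measure(() -> {
            // Placeholder workload; replace with the call that runs your comparison.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("Wall time (ms): " + result[0] + ", GC time (ms): " + result[1]);
    }
}
```

If the GC time is a large fraction of the wall time and drops sharply with a bigger heap, memory pressure is the likely culprit.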
Posted:
04-July-2007 14:45

Hi Nigel,

thanks for your quick reply and your suggestions. I've sent in the pre-processed data to support@deltaxml.com. Thanks in advance for looking into this :-)

Cheers,
Philipp
