Content Architect and XML Consultant, Karnov Group
For over 150 years, Karnov Group has provided information on law, tax, auditing and accounting in several countries. Every day it serves both private and public-sector customers while supporting legal certainty. Its solutions are based on reliable knowledge, authored by subject-matter experts and delivered through technology.
It is essential that organisations which publish reference material relied upon by professionals ensure that their content is accurate and up-to-date. This is particularly vital in fields such as law, engineering, and medicine. It is desirable, then, that such material should be stored in a single repository for ease of updating, version control, and dissemination.
Karnov Group Denmark is a Danish legal publishing house that provides a legislative overview to the legal profession, courts of law, accountants and local government employees in Denmark. It was established over 150 years ago and currently has 150 employees at its Copenhagen headquarters, along with 550 associated professionals. It is part of Karnov Group AB, based in Sweden, which had a turnover of SEK 878M in its last financial year. Every year it publishes Karnov's Law Books. In addition, Ugeskrift for Retsvæsen (Weekly Journal for the Judiciary), other professional journals and casebooks are published weekly or annually.
In 2017, Karnov Group Denmark bought Norstedts Juridik, a Swedish legal publisher, and set about merging their document sources. Both companies published the Swedish legislation, known as the Swedish Code of Statutes or SFS, online and in print. Both also maintained in-house XML sources based on PDFs provided by Regeringskansliet, the Government Offices of Sweden. However, the companies used separate tag sets and their own interpretations of the semantics in the PDFs. They also enriched the basic law text with extensive annotations, links to (and from) case law, and so on. Norstedts, in addition, continually updated its XML source to support its flagship product, the so-called Blue Book: a printed edition of the entire in-force SFS, updated yearly as the law itself changes.
It made little sense, of course, to keep maintaining two separate SFS sources. What was needed was essentially the union of the two. Annotations and versions missing from one source would be filled in from the other, since there were significant gaps in the version histories. There were 8,000 to 9,000 separate statutes, spread across about 16,000 files. These ranged in size from a few Kbytes to 10 or 12 Mbytes, with an average of around 200 Kbytes. Merging the two datasets was therefore an enormous task that needed to be automated.
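As a rough illustration of the scale, assume the stated average of around 200 Kbytes holds across all of the roughly 16,000 files, and take 1 Gbyte as 10^6 Kbytes. On those assumptions the combined corpus comes to a few gigabytes of XML:

```latex
16\,000\ \text{files} \times 200\ \text{Kbytes/file} \approx 3.2 \times 10^{6}\ \text{Kbytes} \approx 3.2\ \text{Gbytes}
```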
At first glance, it might seem enough to pick one of the XML sources and use it as the single source of legislative information. However, both companies had existing online systems and customer bases with differing product offerings. As noted above, the sources described the same thing, the Swedish Code of Statutes, but the sets were not an exact match. Sometimes one company had SFS documents that the other lacked, and there were often significant gaps in the older versions of individual chapters and paragraphs. It made little sense to throw any of that away just to make the merge simpler.
There was also the matter of Norstedts' flagship product, the printed law book, which had to be included in any future offerings, so anything written specifically for that book had to be preserved.
Similarly, Karnov included extended notes in its SFS content and made them available as a separate product, so these notes would have to be preserved, too. One way or another, therefore, the two SFS sources would have to be merged.
It would obviously have been a huge undertaking to carry out this merge manually. So in early 2018, Karnov Group recruited Ari Nordstrom as its XML guru to address the problem. Ari is a content architect and XML expert with over twenty years of experience in single-source document management and publishing. His expertise covers most of the XML standards in use today, from schema languages to XSLT, XQuery and XProc. His past clients include Volvo Cars, The Swedish Federation of Farmers and LexisNexis, among many others. Ari was tasked with managing the project to merge the two disparate SFS sources. He was then to devise a repeatable method for updating the documents as laws are made, revised or repealed.
The plan for merging the SFS content into a single XML source had four parts. First, both sources would be converted into a single exchange format. The converted versions would then be compared using DeltaXML's XML Compare, and the actual merge would be carried out based on the resulting difference file. Finally, the merged content would be converted into a future editing format. This process was to be done in six stages:
The various transformations were performed using XSLT pipelines managed by XProc. Once both sources were in the exchange format, they needed to be compared with each other. For this, Ari chose what many see as the industry standard for comparing XML: the XML Compare tool from DeltaXML Ltd. Ari was already familiar with XML Compare when he joined Karnov. He had heard DeltaXML and others discuss the product, and XML differencing generally, at XML Prague and Balisage. It was one of those tools he really wanted to use in a big project.
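In the project, the conversion into a shared exchange format was done with XSLT. The Python sketch below only illustrates the idea under a simplifying assumption: that the two sources differ purely in element names, so one renaming table per source is enough. The tag names in the hypothetical `A_TO_EXCHANGE` and `B_TO_EXCHANGE` tables are invented for illustration. A real conversion would also have to reconcile each publisher's differing interpretations of the PDF semantics.

```python
import xml.etree.ElementTree as ET

# Hypothetical renaming tables: each publisher's in-house element names are
# mapped onto one shared exchange vocabulary. All tag names are invented.
A_TO_EXCHANGE = {"lag": "statute", "kapitel": "chapter", "paragraf": "section"}
B_TO_EXCHANGE = {"statute": "statute", "chap": "chapter", "para": "section"}


def to_exchange(elem: ET.Element, mapping: dict[str, str]) -> ET.Element:
    """Return a new tree with element names rewritten via `mapping`.

    Unmapped names are kept as-is; attributes, text and tails are copied.
    """
    new = ET.Element(mapping.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    for child in elem:
        new.append(to_exchange(child, mapping))
    return new


# Two tiny sources describing the same statute with different tag sets.
source_a = ET.fromstring(
    '<lag id="1962:700"><kapitel nr="1"><paragraf nr="1">Text</paragraf></kapitel></lag>'
)
source_b = ET.fromstring(
    '<statute id="1962:700"><chap nr="1"><para nr="1">Text</para></chap></statute>'
)

exchange_a = to_exchange(source_a, A_TO_EXCHANGE)
exchange_b = to_exchange(source_b, B_TO_EXCHANGE)

# Once both are in the exchange format they can be compared node for node.
print(ET.tostring(exchange_a) == ET.tostring(exchange_b))  # True
```

Keeping one mapping per source makes each conversion independent. When a source's tag set changes, only its own table (or, in practice, its own XSLT step) needs updating.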
XML Compare compares two XML files, "A" and "B", according to predefined rules and inserts differencing markup to show where the differences lie. It can optionally output an HTML representation of the differenced A and B files, which proved helpful in this project when discussing the merge with various stakeholders. Crucially, it was in Step 4 that XML Compare's distinctive functionality proved invaluable. For example, when comparing sources that supposedly share the same base but have some definable differences, it is useful to tell the comparison that certain nodes are intended to be the same. In XML Compare, you do this by adding deltaxml:key attributes, with matching values, to the corresponding nodes in both sources.
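To make the role of keys concrete, here is a small Python sketch of keyed alignment, followed by a merge driven by the resulting delta. It does not use XML Compare or its API, and is not how the product works internally; it only shows why keys matter. If children were paired purely by position, inserting section `s0` at the start of B would shift every pairing and make every section look changed. Matching on the key instead pairs `s1` with `s1` and `s2` with `s2`. Several details are assumptions for illustration: the namespace URI, the status labels (loosely modelled on the A/B labels of a two-way delta) and the hypothetical `prefer` conflict policy.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace of the deltaxml:key attribute. Assumed here; confirm it against
# the XML Compare documentation for the version you use.
DXML = "http://www.deltaxml.com/ns/well-formed-delta-v1"
KEY = f"{{{DXML}}}key"


def same(x: ET.Element, y: ET.Element) -> bool:
    """Deep equality on tag, attributes, trimmed text and children (tails ignored)."""
    return (
        x.tag == y.tag
        and x.attrib == y.attrib
        and (x.text or "").strip() == (y.text or "").strip()
        and len(x) == len(y)
        and all(same(p, q) for p, q in zip(x, y))
    )


def keyed_compare(a: ET.Element, b: ET.Element) -> list[tuple[str, str]]:
    """Align children of a and b by key, not by position.

    Each key gets a status loosely modelled on a two-way delta (assumed labels):
    'A=B' (unchanged), 'A!=B' (changed), 'A' (only in A), 'B' (only in B).
    """
    a_kids = {c.get(KEY): c for c in a}
    b_kids = {c.get(KEY): c for c in b}
    delta = []
    for k, ca in a_kids.items():
        if k not in b_kids:
            delta.append((k, "A"))
        else:
            delta.append((k, "A=B" if same(ca, b_kids[k]) else "A!=B"))
    delta.extend((k, "B") for k in b_kids if k not in a_kids)
    return delta


def merge(a, b, delta, prefer="A"):
    """Keep every keyed child found in either source; on 'A!=B' take `prefer`.

    This conflict policy is a hypothetical placeholder, and B-only children
    are simply appended after the others rather than placed in position.
    """
    a_kids = {c.get(KEY): c for c in a}
    b_kids = {c.get(KEY): c for c in b}
    merged = ET.Element(a.tag, dict(a.attrib))
    for k, status in delta:
        take_a = status in ("A", "A=B") or (status == "A!=B" and prefer == "A")
        merged.append(copy.deepcopy(a_kids[k] if take_a else b_kids[k]))
    return merged


ns = f'xmlns:deltaxml="{DXML}"'
a = ET.fromstring(
    f'<chapter {ns}><section deltaxml:key="s1">Old</section>'
    '<section deltaxml:key="s2">Same</section></chapter>'
)
b = ET.fromstring(
    f'<chapter {ns}><section deltaxml:key="s0">Added</section>'
    '<section deltaxml:key="s1">New</section>'
    '<section deltaxml:key="s2">Same</section></chapter>'
)

delta = keyed_compare(a, b)
print(delta)  # [('s1', 'A!=B'), ('s2', 'A=B'), ('s0', 'B')]
merged = merge(a, b, delta, prefer="B")
print([s.text for s in merged])  # ['New', 'Same', 'Added']
```

Separating the comparison from the merge mirrors the order described above: the difference result is produced once, and different merge policies can then be applied to it without re-running the comparison.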
“I have close to 30 years in the field and in my experience XML Compare doesn’t have an equal at what it does. Again, I’d tested it but hadn’t had the chance to use it in a real project. Basically, I told Karnov what I needed, and they were good enough to trust me. I’d used oXygen’s built-in diffing, of course, but it’s nowhere near as powerful. The various software diffing tools – there are a few decent ones – aren’t XML-aware, so they were never an option.”
Ari Nordstrom, Content Architect and XML Consultant, Karnov Group
Ari worked full-time on this project for six months. By the end, the two disparate systems had been successfully merged, while both companies' regular publishing activities continued undisturbed.
DeltaXML's XML Compare formed a vital component of this project. Because new laws were being written all the time, the process had to be repeatable, which made most other approaches unworkable. Although there is legislation from at least 1750 onwards, most of the code was written recently, within the last decade or two. A manual approach would not have worked.
The advantages of using XML Compare for this project can be summarised as follows:
DeltaXML's products are used throughout the world, from SMEs to global enterprises. Our comparison and merging software transforms the way our customers handle change in their XML documents, XML data and JSON files. If the above story resonates with your own challenges, get in touch with us by booking a demo or filling in our contact form.