Posted 1 February 2019
by DeltaXML

Git merge – where do we fit in?

This post examines the ways in which the Git merge process can be extended and explains why we suggest that it’s more appropriate to integrate our tree-based merge tools as merge drivers rather than taking the more common route of providing a mergetool. The merge (and also rebase) process in Git involves a number of components:

The ‘merge strategy’ is responsible for looking at all of the files and directories with an understanding of moves and renames, matching up the corresponding files, determining the appropriate ancestor and calling the merge driver on triples of files. In some cases the strategy will determine that a full merge is unnecessary and may, for example, perform a fast-forward merge. It is also possible to specify a strategy such as ‘ours’, which produces a result that takes all of the files from the current branch. In these cases it is not really a full merge and the merge driver may not be invoked.

The ‘merge driver’ receives three files corresponding to the ancestor and the two branches, loads the content into memory and is responsible for aligning their content and identifying any conflicts. Using a return code, it signals to the invoking code whether there are any conflicts. Git then usually reports a message to the user, often using a line starting with “CONFLICT”. The diff3 merge driver in Git represents conflicts using a textual, line-based format consisting of marker lines made up of angle-bracket, equals or vertical-bar characters. The git config command can be used to configure different merge drivers and the .gitattributes file can then select between them using the filenames or extensions of the files being merged. For example:

*.xml     merge=xmlmerge
*.xsl     merge=xmlmerge
*.json    merge=jsonmerge
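
To make the driver contract described above more concrete, here is a minimal sketch of a merge driver written in Python. It assumes a registration along the lines of `git config merge.xmlmerge.driver "python3 xmlmerge_driver.py %O %A %B"`, where the script name is hypothetical. Git substitutes the ancestor, current-branch and other-branch files for `%O`, `%A` and `%B`, expects the result to be written over the `%A` file, and treats a non-zero exit status as a conflict. The merge logic is a deliberately trivial whole-file comparison standing in for a real tree-based algorithm; it is not how our products work.

```python
import sys
from pathlib import Path


def _with_newline(text: str) -> str:
    """Make sure non-empty content ends with a newline so markers start a line."""
    return text if not text or text.endswith("\n") else text + "\n"


def merge_files(ancestor: str, ours: str, theirs: str) -> int:
    """Whole-file three-way merge; a placeholder for an XML/JSON-aware merge.

    Writes the result over `ours` (Git's %A) and returns the exit status:
    0 for a clean merge, 1 when the two branches conflict.
    """
    base = Path(ancestor).read_text()
    mine = Path(ours).read_text()
    other = Path(theirs).read_text()

    if mine == other or other == base:
        # Both sides agree, or only our side changed: keep our content.
        return 0
    if mine == base:
        # Only the other side changed: take its content.
        Path(ours).write_text(other)
        return 0

    # Both sides changed differently: write marker lines and signal a conflict.
    Path(ours).write_text(
        "<<<<<<< ours\n" + _with_newline(mine)
        + "||||||| ancestor\n" + _with_newline(base)
        + "=======\n" + _with_newline(other)
        + ">>>>>>> theirs\n"
    )
    return 1


# Act as a driver only when the three paths (%O %A %B) have been supplied.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(merge_files(sys.argv[1], sys.argv[2], sys.argv[3]))
```

A real driver would replace the body of `merge_files` with a proper alignment of the three inputs; the calling convention and the exit status are the parts Git relies on.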

The user typically needs to resolve any conflicts in each file before the merge operation can be completed and committed. It is possible to take the file with the markers produced by the driver, resolve the conflicts by editing that output in a text editor and then report that the file has been resolved (for example with git add). A ‘mergetool’ provides a graphical user interface to assist with the conflict resolution process, often allowing the user to select content from one of the branches or possibly the ancestor for each of the conflicting regions in the file.

There are two common usage or interaction patterns we have found relating to the use of merge drivers and merge tools:

The merge driver produces the line-based conflict markers and then the merge tool reads the result file from the driver, interprets the markers and provides the user with selection capabilities based on this interpretation. We know of two merge tools which take this approach: MS Visual Studio Code and TkDiff. Please do add comments if you know any others.
The merge tool, when it is invoked, is supplied with the filename of the driver result, but also the names of the original inputs to the merge driver. It can then re-merge the inputs and perhaps base its user interface on internal data structures from its own merge algorithm. Examples of tools which re-merge and do not seem to use the driver results include: Araxis Merge, P4Merge, and OxygenXML.
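
As a rough illustration of the first pattern, where the merge tool interprets the driver's markers, the following Python sketch collects the conflicting regions from a result file. The `Conflict` class and `parse_conflicts` function are hypothetical names, and the parsing is simplified: it assumes marker lines never occur as ordinary content inside a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class Conflict:
    ours: list[str] = field(default_factory=list)
    ancestor: list[str] = field(default_factory=list)
    theirs: list[str] = field(default_factory=list)


def parse_conflicts(text: str) -> list[Conflict]:
    """Collect the conflicting regions from a result that uses Git's marker lines."""
    conflicts: list[Conflict] = []
    current = None  # the conflict being read, if any
    target = None   # the list receiving lines of the current section
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            current = Conflict()
            target = current.ours
        elif current is not None and line.startswith("|||||||"):
            target = current.ancestor
        elif current is not None and line.startswith("======="):
            target = current.theirs
        elif current is not None and line.startswith(">>>>>>>"):
            conflicts.append(current)
            current = target = None
        elif current is not None:
            target.append(line)
    return conflicts


sample = """<doc>
<<<<<<< HEAD
<p id="a">ours</p>
=======
<p id="a">theirs</p>
>>>>>>> feature
</doc>
"""
regions = parse_conflicts(sample)
```

A tool following the second pattern would skip this step entirely and run its own merge over the three original inputs.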

It’s possible to integrate our XML and JSON aware merge products into the Git merge process as either a merge driver or a merge tool. We believe that the best approach is to integrate as a merge driver, and offer the following arguments in support.

Avoiding conflict confusion

The merge driver and merge tool should identify the same conflicts, i.e. behave in a consistent way. When processing XML with a line-based algorithm (such as diff3), changes such as those to attribute order might cause a conflict in the merge driver. In many workflows a conflicting file would cause the merge tool to be invoked in order to resolve the conflict. But if the merge tool then uses a tree-based algorithm that is XML or JSON aware, it would not identify these apparent conflicts, and the file may turn out to have no conflicts at all. The unnecessary invocation of the merge tool may confuse the user.
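
The attribute-order case is easy to reproduce. The sketch below uses Python's standard `xml.etree.ElementTree` module, purely as an illustration and not our own algorithm, to compare two serialisations of the same element. The sample elements and the `same_element` helper are made up for this example, and the comparison is simplified in that it ignores tail text.

```python
import xml.etree.ElementTree as ET

ours = '<item id="42" status="draft"/>'
theirs = '<item status="draft" id="42"/>'


def same_element(a: ET.Element, b: ET.Element) -> bool:
    """Compare tag, attributes (in any order), text and children recursively."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib  # dict comparison ignores attribute order
        and (a.text or "").strip() == (b.text or "").strip()
        and len(a) == len(b)
        and all(same_element(x, y) for x, y in zip(a, b))
    )


line_based_equal = ours == theirs
tree_based_equal = same_element(ET.fromstring(ours), ET.fromstring(theirs))
print(line_based_equal, tree_based_equal)  # prints: False True
```

A line-based driver sees two different lines and may report a conflict, whereas a tree-based comparison sees one unchanged element.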

Improved non-conflicting results

A tree-based merge algorithm which is XML or JSON aware would normally produce well-formed XML or JSON results. However, this is not true of a line-based merge such as diff3, where the result may, for example, have mismatched element tags. These bad results will not necessarily be associated with a conflict; the mismatched tag may be non-conflicting. If the tree-aware algorithm is only used in the merge tool, it may never be invoked unless there is a conflict, and it is therefore possible for a bad result to go unnoticed.
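
One way to see the problem is to test a merge result for well-formedness. This is a small sketch using Python's standard XML parser; the `merged` string is a hypothetical result of a line-based merge that reported no conflict, not output captured from a real repository.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical conflict-free merge result with a mismatched end tag.
merged = "<doc>\n<note>\nSome text\n</para>\n</doc>\n"
print(is_well_formed(merged))  # prints: False
```

With a line-based driver, a check like this would have to be run separately after the merge, since nothing in the merge itself flags the broken file.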

An algorithm with a better understanding of the data and its semantics can make better alignment decisions. Again, in non-conflicting situations it makes sense to have this better alignment performed at the merge driver stage.

Simpler software design

The separation of the merge algorithm and a conflict-resolving GUI can lead to simpler software design. It may be that merge tools find the textual markers insufficient for their needs and can provide a better experience by re-running a merge algorithm, but the merge architecture would be simpler if this were not necessary. This would avoid duplicated code and reduce the processing and I/O required.

Let’s finish the post with a screenshot. Here’s one of our test files for an attribute conflict. We’ve used an XML aware merge process in the merge driver and it identifies the attribute conflict. In this case we’re using the same textual markers to annotate the conflict as diff3 uses, but we have reformatted the conflict so that it includes just the conflicting part of the XML tree. If Visual Studio Code is used as a merge tool, it then provides the conflict handling capabilities shown just above each of the conflicting areas. The screenshot is part of ongoing experimentation with change representations that can be used to communicate between the merge driver and mergetool; we’re also looking at XML and JSON based markup and we are planning to discuss this more in future blog posts or conference papers.
