Ian Varley

What are some pros and cons for textual, syntactical, and semantical levels of integration in source control systems?

One pro for textual integration is its universality. Text is more or less the same everywhere (localization aside), and nearly every programming environment in existence uses text as its main storage format. Algorithms that work at a purely textual level (like DIFF) are therefore widely applicable, whereas more sophisticated syntactical or semantical integration is by its nature tied to a particular language and so has a much narrower base of applicability. It also means that tools working with text have much broader support and more longstanding stability.
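As a rough illustration of that universality, here is a minimal sketch using Python's standard `difflib` module. The helper name `text_diff` and the sample C and SQL snippets are made up for the example; the point is only that the same line-based comparison runs on either without knowing anything about the language.

```python
import difflib


def text_diff(old: str, new: str) -> list[str]:
    """Return unified-diff lines for two texts, with no knowledge of their syntax."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# Hypothetical inputs in two unrelated languages.
c_old = "int x = 1;\nint y = 2;\n"
c_new = "int x = 1;\nint y = 3;\n"
sql_old = "SELECT a\nFROM t;\n"
sql_new = "SELECT a, b\nFROM t;\n"

if __name__ == "__main__":
    for line in text_diff(c_old, c_new) + text_diff(sql_old, sql_new):
        print(line)
```

The same function handles both inputs because it only ever compares lines of characters.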

That said, tools that use deeper levels of analysis can be tantalizing, in that they promise to automate merges that would require human intervention in a plain text tool. Working from the abstract syntax tree allows them to ensure that integrated programs remain syntactically valid in the programming language. A syntactically aware DIFF engine could also ignore changes that have no effect on the program. For example, extra whitespace in C programs could be ignored, or at least suppressed during file diffing operations, so that indenting a large block of code wouldn't unintentionally obscure the one important conflicting change buried within the long list of removed and added lines.
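A minimal sketch of the idea, assuming Python source rather than C so that the standard `ast` module can stand in for a syntax-aware engine. The function names `same_syntax` and `text_changed` are hypothetical. Two versions are treated as syntactically equal when their abstract syntax trees match, even though a line-based diff reports changes.

```python
import ast
import difflib


def same_syntax(a: str, b: str) -> bool:
    """True when both sources parse to the same abstract syntax tree.

    ast.dump() omits line and column numbers by default, so spacing,
    blank lines, and comments do not affect the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


def text_changed(a: str, b: str) -> bool:
    """True when a plain line-based diff reports any difference."""
    return any(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


# Hypothetical versions of one function.
old = "def f(a, b):\n    return a+b\n"
reformatted = "def f(a,  b):\n\n    return a + b  # sum\n"
changed = "def f(a, b):\n    return a-b\n"

if __name__ == "__main__":
    print(text_changed(old, reformatted), same_syntax(old, reformatted))
    print(text_changed(old, changed), same_syntax(old, changed))
```

Only the second pair differs at the syntax level; the first is a formatting-only edit that a textual tool would still flag.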

Of course, a con of syntactic integration is that each kind of syntax needs its own DIFF engine, and there are many more decisions to be made (for example, should reordering functions be considered a syntactic change or not?). That level of complexity probably accounts for why such tools aren't in more common use.

Semantic integration, of course, is one level deeper even than syntactic integration: it considers not only notational correctness but also functional correctness. Using program slicing, the algorithm can determine whether two changes are "disjoint", meaning that, based on what else they touch in the program, they cannot affect each other. If they can't be in conflict, then it's a very safe merge; if they could, the merge may be unsafe. This could be very useful for trying to automate more complicated merge procedures, such as those in public projects with large numbers of developers.
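Real program slicing follows data and control dependences through the whole program. The sketch below is a deliberately crude stand-in, not a slicer: it only compares the variable names that two changed statements read and write, and flags a possible conflict when one change writes a name the other uses. The function names and the sample statements are hypothetical, and Python is assumed as the language being merged.

```python
import ast


def reads_writes(stmt: str) -> tuple[set[str], set[str]]:
    """Collect the variable names a statement reads and the names it writes."""
    reads: set[str] = set()
    writes: set[str] = set()
    for node in ast.walk(ast.parse(stmt)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # Store or Del contexts both modify the name
                writes.add(node.id)
    return reads, writes


def may_conflict(change_a: str, change_b: str) -> bool:
    """Conservative check: True if either change writes a name the other touches."""
    reads_a, writes_a = reads_writes(change_a)
    reads_b, writes_b = reads_writes(change_b)
    return bool(writes_a & (reads_b | writes_b)) or bool(writes_b & reads_a)


if __name__ == "__main__":
    # Disjoint: the two changes share no names.
    print(may_conflict("total = price * qty", "label = name.upper()"))
    # Overlapping: the second change rewrites a value the first one reads.
    print(may_conflict("total = price * qty", "price = price * 1.1"))
```

A check like this errs on the side of caution and ignores aliasing, function calls, and control flow, which is part of why a real semantic merge tool is so much more involved.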

Again, the con here is the specificity and added complexity. In practice, because text integration works so well in real-life situations, there's little to warrant adding this level of complexity, at least in most cases.

