2011-04-14

Source Code Symmetry and Transcendent Programming Tools

The more experienced you become as a coder, the more you look at patterns, shapes and symmetry, rather than reading code one character or token at a time.  This is an extremely desirable programming skill, because:
  1. The bandwidth and pattern recognition capabilities of the subconscious layers of the human visual system far exceed those of the conscious reasoning brain.
  2. Cross-checking code symmetries can effectively ensure completeness and correctness of code in many situations.
By analogy, when a beginning chess player is evaluating their next move, they search the space of moves in the immediate neighborhood of the current board configuration. A really good chess player, however, does not spend much time thinking about individual moves, but rather thinks at a much higher level of abstraction, dealing with patterns and strategies.  The advanced player is in effect able to compress a huge amount of information into a relatively small number of concepts, and is able to employ much more powerful reasoning tools over these concepts.

A really simple example of source code symmetry is:

a[i].x += a[i - 1].x;
a[i].y += a[i - 1].y;

The visual symmetry here is that 'x' and 'y' each run across their own row, while 'i' and 'i - 1' each run down their own column. If that's the pattern you expect, and at a glance (without even thinking about what the code means) you don't see those things lining up, then you probably have a copy/paste error.

There are many more complex, abstract examples of code symmetry, and very few advanced programmers would even be able to articulate the subconscious tools they employ daily to analyze visual and logical symmetries when writing code.  Some vague examples include:
  • Symmetries in the wavy line of indentation (indicating symmetries in scope and control flow);
  • Relative differences in the structure of nested function call applications between two expressions;
  • Differences in Boolean operators used between a block of related but different complex Boolean expressions.
In general, the presence of visual symmetry indicates the presence of underlying structural symmetry in the program. The converse is not necessarily true, however -- programming language syntax may obfuscate the underlying symmetry of a program if the syntax is not designed to render functional symmetries in a visually symmetric way.

I have believed for a long time that one day we will be able to write really, really good programming tools that alert you to broken symmetries (e.g. copy/paste errors where you copied code for x but forgot to change an x to a y in the second version) -- or even suggest or write code for you, based on predicted symmetries or on symmetries that are detected to be incomplete. Tools like this could catch many of the bugs you get from confusing two similarly-named variables. And I suspect programming with this level of integration between syntax, logical structure and IDE functionality would take programming to a completely new, transcendent level.
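
To make the idea of a broken-symmetry alert a little more concrete, here is a very rough sketch of the simplest check I can think of: tokenize two lines that are supposed to be symmetric, and flag any token whose renaming is inconsistent with an earlier renaming of the same token. The names tokenize and brokenSymmetries are made up for this illustration, the tokenizer is deliberately naive (identifiers, numbers and single punctuation characters only), and a real tool would need to work on the syntax tree rather than on raw text.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Naive tokenizer (an assumption of this sketch): runs of letters, digits and
// underscores form one token; every other non-space character is its own token.
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) {
                ++i;
            }
            tokens.push_back(line.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, line[i]));
            ++i;
        }
    }
    return tokens;
}

// Returns the token positions at which `second` stops mirroring `first`.
// Rule: the first time a token of `first` is seen, remember what it became in
// `second`; every later occurrence must become the same thing. Positions past
// the end of the shorter line are also reported. Note that this flags the place
// where the inconsistency is detected, which is not always the token that is wrong.
std::vector<std::size_t> brokenSymmetries(const std::string& first,
                                          const std::string& second) {
    const std::vector<std::string> a = tokenize(first);
    const std::vector<std::string> b = tokenize(second);
    std::map<std::string, std::string> renamed;
    std::vector<std::size_t> broken;
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= a.size() || i >= b.size()) {
            broken.push_back(i);  // one line has tokens the other lacks
            continue;
        }
        auto it = renamed.find(a[i]);
        if (it == renamed.end()) {
            renamed[a[i]] = b[i];
        } else if (it->second != b[i]) {
            broken.push_back(i);  // same source token, different renaming
        }
    }
    return broken;
}
```

Calling brokenSymmetries("a[i].x += a[i - 1].x;", "a[i].y += a[i - 1].x;") reports the position of the final x, because x was already seen turning into y earlier on the line. The correctly mirrored pair from the example above produces no reports at all.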

In the general case, detecting all reasonable symmetries for an arbitrary programming language may be uncomputable or at least intractable. A better approach would be to bend a language's syntax to support symmetry explicitly as a top-level feature of the language. This would involve identifying the types of symmetry you typically find in a program and finding syntactic ways of binding the symmetrical parts together in useful ways. Functions are already a weak form of this, since they allow for the common parts of a symmetry to be abstracted away and parameterized. But I think it could go a lot deeper than that.
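
As an illustration of functions as a weak form of symmetry binding, here is how the x/y example from earlier might be collapsed in C++ using a pointer-to-member, so that the varying part of the symmetry (the field) becomes a parameter and the common part is written only once. The Point struct, the surrounding loop over i, and the function names are all assumptions made up for this sketch; the original two lines don't say what type a[i] is or where i comes from.

```cpp
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical element type with the two symmetric fields.
struct Point {
    double x;
    double y;
};

// The common part of the symmetry, written once: add each element's
// predecessor into it, for whichever field the caller selects.
void prefixSumField(std::vector<Point>& a, double Point::*field) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i].*field += a[i - 1].*field;
    }
}

// The varying part of the symmetry, listed explicitly in one place.
void prefixSumAllFields(std::vector<Point>& a) {
    for (double Point::*field : {&Point::x, &Point::y}) {
        prefixSumField(a, field);
    }
}
```

With this shape it is no longer possible to change the x to a y on one side of the += and forget the other side, since the field is named only once per pass. The price is that the visual symmetry of the original two lines is gone, which is part of why I think language-level support could go further than plain functions do.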