2011-04-14

Source Code Symmetry and Transcendent Programming Tools

The more experienced you become as a coder, the more you look at patterns and shapes and symmetry, rather than just reading code a character or token at a time to understand a piece of code.  This is an extremely desirable programming skill, because:
  1. The bandwidth and pattern recognition capabilities of the subconscious layers of the human visual system far exceed those of the conscious reasoning brain.
  2. Cross-checking code symmetries can effectively ensure completeness and correctness of code in many situations.
By analogy, when a beginning chess player is evaluating their next best move, they search the space of moves in the immediate neighborhood of the current board configuration. A really good chess player however does not spend much time thinking of individual moves, but rather thinks at a much higher level of abstraction, dealing with patterns and strategies.  The advanced player is in effect able to compress a huge amount of information into a relatively small number of concepts, and is able to employ much more powerful reasoning tools over these concepts.

A really simple example of source code symmetry is:

a[i].x += a[i - 1].x;
a[i].y += a[i - 1].y;

The visual symmetry here is that 'x' and 'y' run across the rows, but 'i' and 'i - 1' run down the columns. If you know that's what you're expecting and you don't see those things lining up without even thinking about what they mean, then you probably have a copy/paste error.

There are many complex abstract examples of code symmetry, and very few advanced programmers would even be able to enunciate the subconscious tools they employ daily to analyze visual and logical symmetries when writing code.  Some vague examples include:
  • Symmetries in the wavy line of indentation (indicating symmetries in scope and control flow);
  • Relative differences in the structure of nested function call applications between two expressions;
  • Differences in Boolean operators used between a block of related but different complex Boolean expressions.
In general, the presence of visual symmetry indicates the presence of underlying structural symmetry in the program. The converse is not necessarily true however -- programming language syntax may obfuscate the underlying symmetry of a program if the syntax is not designed to render functional symmetries in a visually symmetric way.

I have believed for a long time that one day we will be able to write really, really good programming tools that alert you to broken symmetries (e.g. copy/paste errors where you copied code for x but forgot to change the x to a y for the second version) -- or even suggest code or write code for you based on predicted symmetries or symmetries that are detected to be incomplete. This sort of power could catch a lot of the sorts of bugs you get from confusing two similarly-named variables etc. And I suspect programming with this level of integration between syntax, logical structure and IDE functionality would take programming to a completely new, transcendent level.

In the general case, detecting all reasonable symmetries for an arbitrary programming language may be uncomputable or at least intractable. A better approach would be to bend a language's syntax to support symmetry explicitly as a top-level feature of the language. This would involve identifying the types of symmetry you typically find in a program and finding syntactic ways of binding the symmetrical parts together in useful ways. Functions are already a weak form of this, since they allow for the common parts of a symmetry to be abstracted away and parameterized. But I think it could go a lot deeper than that.

4 comments:

  1. Presence of symmetry in source code is evidence of inefficient expression or insufficient expressivity, and should be removed by the use of abstraction. (Symmetry in machine code, of course, is another matter; sometimes it can be beneficial, as long as your application is not limited by the instruction cache.)

    ReplyDelete
  2. I think there are two forms of abstraction that let programmers explicitly write out their "symmetries":

    (1) Higher-order function calls.
    (2) Macros.

    A higher-order function allows programmers to abstract out common patterns of code while changing controlled sub-parts; they say, "This code is the same shape as that, but switch part (a) for part (b)."

    Macros---think Scheme, not CPP---are far more drastic. They allow programmers to switch out not just patterns of control, but evaluation strategies and definitions. In most languages, a higher-order function evaluates its arguments before running the function. A macro allows a programmer to say, "This code is like that, but evaluate this part lazily." Macros can also abstract over typically unabstractable parts of the language---datatype definitions, for example.

    The problem with abstraction is that it incurs a cost for understanding (hopefully low---if your abstractions are hard to understand, you should choose better ones!) and for maintenance. To me, the cost is worth it for higher-order functions, but macros take things too far.

    ReplyDelete
  3. Agree with previous commenters. Code symmetry is evil. It offen leads to subtle errors due to breaking of this symmetry :)

    Considering the example provided: it should be written as a[i] += a[i - 1], where a[i] is the instance of some class which support += operator.

    ReplyDelete
  4. David's comment is an interesting one -- he is basically phrasing symmetry as redundancy, and stating that redundancy should be removed -- in other words program source should be as highly compressed as possible, in other words we should maximize the entropy and minimize the encoded size of program source. I completely agree that redundancy and lack of code reuse is a major source of breakage and programmer error. Great point David!

    That opens up a whole other interesting kettle of fish -- Kolmogorov complexity. You can only slightly compress the digits of pi because the entropy is quite high (and it's an infinite sequence) but you can write a program to generate an infinite number of digits of pi in only seven lines of code (given language support for arbitrary precision math). So algorithmic compression is far more powerful in the limit than sequence compression. This must apply to source code compression too. I think Michael Greenberg's comment about Scheme macros apply directly in this case: they provide Kolmogorov-type compression on source code. However it feels like only a baby step in the right direction, I have a gut feeling they're not quite abstract enough, powerful enough or simple enough for the functionality I'm getting at.

    I also agree with Michael that abstraction incurs a cost for understanding. Although the best abstractions are the simplest and "purest" in some sense. (You can add complexity but you can't add simplicity, etc.) Designing a simple elegant abstraction is far harder than designing a system that is complex (or even complex to use) that is simply labeled "an abstraction" to make the designer feel good about their accomplishment ;)

    ReplyDelete