Tuesday, July 6, 2010

Syntactic Noise

When programming, we usually know in some simplistic terms the behavior we expect from the computer. We can easily express this notion in some vague non-standard pseudo-code-like notation, such as:

for all bonds in the index
    calculate the yield
    add the weighted sum to the total
divide total by the number of bonds

This is, in a sense, the essence of the instructions that we want to perform, in a manner that is somewhat ambiguous to the computer, but clear enough that we can understand it.

By the time we’ve coded this in a format precise enough for the computer to correctly understand, the meaning has been obscured by a lot of strange symbols, odd terms and inelegant syntax:

public double calcIndex() {
   double weightedSum = 0.0;
   int bondCount = 0;
   for (int i=0;i< ... ) {
       status = yieldCalc(bondInfo, Calculate.YIELD,
           true, bondFacts);

       if (status != true) {
           throw new ...
       }
       weightedSum += calcWeight(bondFacts, Weight.NORMAL);
   }
   return weightedSum / bondCount;
}

This is a considerably longer and more complex representation of the steps we want the computer to perform.

In the above contrived example, with the exception of the error handling and the flags for the calculation options, the code retains a close similarity to the pseudo-code. Still, even as close as it is, the added characters for the blocks, the function calls and the operators somewhat obscure the original intent of the code.

Programmers learn to see through this additional syntactic noise in their own code, but it becomes a significant factor in making their work less readable to others.

Now, my example above is pretty light in this regard. Most computer languages allow users to express and compact their code using an unlimited amount of noise. The ultimate example is the Obfuscated C contest, where entries often make extreme use of the language’s preprocessor, cpp.

Used well, cpp can allow programmers to encapsulate syntactic noise into a simple macro that gets expanded before the code is compiled. Used poorly, macros can contribute to strange bugs and misleading error messages.

For example, a decade (or so) ago, I used something like:

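/* BEGIN declares 'last' at function scope so that RETURN can see it. */
#define BEGIN(function) \
    const char last[] = #function;\
    TRACE("Entered", __LINE__, __FILE__, #function)

/* No trailing semicolon, so RETURN(x); acts as a single statement. */
#define RETURN(value) do {\
    TRACE("Exited", last);\
    return value;\
    } while(0)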

in C for each function, so that the code could be easily profiled (the original was quite a bit more complex; I’ve forgotten most of what was there) and each function used the same consistent method for handling its tracing.

The cost was the requirement for all code to look like:

int functionCall(int x, int y) {
    BEGIN(functionCall);
    ...
    RETURN(value);
}

But for the minor extra work and discipline, the result was the ability to easily and quietly encapsulate a lot of really powerful tracing ‘scaffolding’ into the code as needed. More importantly, the macros hide a lot of the ugly syntactic noise, such as predefined macros like __FILE__.

In a language like Java, the programmer doesn’t have a powerful tool like a preprocessor, but there are still lots of options. Developers could use an external preprocessor like m4, but it would likely interfere with the cushy IDEs that most programmers have become addicted to. Still, there are always ways to increase or decrease the noise in the code:

return new String[] { new Data().modifyOnce().andAgain().toString(),
"value", ( x == 2 ? new NameString().toString() : null )};

While the above is a completely contrived example -- hopefully people aren’t doing horrible things like this -- it shows that it is fairly easy to “build up” very noisy statements and blocks. However:

String modified = twiceModified();
String name = nameString(x);

return stringArray(modified, "value", name);

accomplishes the same result, at the cost of having to create three methods to encapsulate some of the logic (you can guess what they look like). The noise in the initial example is completely unnecessary. What we want to capture are the three values that are returned; how they are generated is a problem that can be encapsulated elsewhere in the code.
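For anyone who would rather not guess, here is one possible shape for those three methods. It is only a sketch: Data and NameString are hypothetical stand-ins invented so that it compiles, the strings they produce are made up, and the values method exists only to give the quiet version somewhere to live.

```java
class NoiseExample {
    /** Hypothetical stand-in for Data; the real class is not shown here. */
    static final class Data {
        private final StringBuilder text = new StringBuilder("data");

        Data modifyOnce() { text.append("-once"); return this; }
        Data andAgain() { text.append("-again"); return this; }

        @Override
        public String toString() { return text.toString(); }
    }

    /** Hypothetical stand-in for NameString. */
    static final class NameString {
        @Override
        public String toString() { return "name"; }
    }

    // Hides the chained calls behind a single name.
    static String twiceModified() {
        return new Data().modifyOnce().andAgain().toString();
    }

    // Hides the ternary; returns null unless x is 2, as in the noisy version.
    static String nameString(int x) {
        return x == 2 ? new NameString().toString() : null;
    }

    // Hides the array-literal syntax.
    static String[] stringArray(String... values) {
        return values;
    }

    // The quiet version, wrapped in a method so that it can run.
    static String[] values(int x) {
        String modified = twiceModified();
        String name = nameString(x);

        return stringArray(modified, "value", name);
    }

    public static void main(String[] args) {
        // prints: data-once-again, value, name
        System.out.println(String.join(", ", values(2)));
    }
}
```

None of the noise has gone away, it has just been moved to places where each piece can be read on its own.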

Along with excess syntactic noise there are a couple of other related issues.

First, while try/catch blocks can be useful, they actually form a secondary flow of logic in and on top of the base code, and that generates confusion. Recently a friend was convinced that the JVM was acting up (objects were apparently only getting ‘partially initialized’), but it was actually just a combination of sloppy error handling and threading problems.
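One way to keep the two flows apart, sketched under assumptions: push the try/catch into one small method that knows what can go wrong, and let the base code read straight through. The class and method names here (QuoteTotals, parsePrice, totalPrice) are hypothetical, and rethrowing as IllegalArgumentException is just one reasonable choice, not the only one.

```java
class QuoteTotals {
    // The secondary flow lives here, in the one place that knows what can fail.
    static double parsePrice(String text) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            // Rethrow with context instead of quietly swallowing the failure.
            throw new IllegalArgumentException("not a price: '" + text + "'", e);
        }
    }

    // The base flow reads straight through, with no try/catch in sight.
    static double totalPrice(String[] quotes) {
        double total = 0.0;
        for (String quote : quotes) {
            total += parsePrice(quote);
        }
        return total;
    }

    public static void main(String[] args) {
        // prints 200.0
        System.out.println(totalPrice(new String[] { "99.5", "100.5" }));
    }
}
```

A bad quote still stops the calculation, but the failure now carries the offending text with it, which makes the sloppy-error-handling kind of mystery much harder to create.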

Naming, too, is a big factor in making the code readable. Data are usually nouns, and methods are usually verbs. The shortest possible “full name” at the given level of abstraction should match either the business domain, or the technical one. If you don’t immediately know what it is called, it’s time for some research, not just a random guess.

And finally, method calls are essentially free. OK, they are not, but unless you’re coding against some ultra-super extreme specs, they’re not going to make any significant difference, so use them wherever and whenever possible. Big fat dense code is extremely hard to extend, which increases the likelihood that the next coder is going to trash your work. You don’t want that, so to preserve your labors, make it easy to extend the code.

Syntactic noise is one of the easiest problems to remove from a code base, but it seems to also be one of the more common ones. Programmers get used to ugly syntax and just assume that a) it is impossible to remove and b) everyone else will tolerate it. However, with a bit of thinking and a small amount of refactoring it can be quickly reduced, and often even nearly eliminated.
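As a rough illustration of that kind of refactoring, here is the bond index from the start of this post with most of the noise pushed into small, named methods. Everything specific in it is an assumption made for the sketch: the Bond class, its fields, and the simple coupon-over-price formula are invented stand-ins, not the real calculation.

```java
class BondIndex {
    /** Hypothetical bond carrying only what this sketch needs. */
    static final class Bond {
        final double annualCoupon;
        final double price;
        final double weight;

        Bond(double annualCoupon, double price, double weight) {
            this.annualCoupon = annualCoupon;
            this.price = price;
            this.weight = weight;
        }
    }

    // Assumed stand-in for the real yield calculation: coupon over price.
    static double yieldOf(Bond bond) {
        return bond.annualCoupon / bond.price;
    }

    static double weightedYield(Bond bond) {
        return yieldOf(bond) * bond.weight;
    }

    // Reads close to the pseudo-code: for all bonds, total the weighted
    // yields, then divide by the number of bonds.
    static double indexYield(Bond[] bonds) {
        if (bonds.length == 0) {
            throw new IllegalArgumentException("index has no bonds");
        }
        double total = 0.0;
        for (Bond bond : bonds) {
            total += weightedYield(bond);
        }
        return total / bonds.length;
    }

    public static void main(String[] args) {
        Bond[] index = { new Bond(5.0, 100.0, 1.0), new Bond(10.0, 200.0, 1.0) };
        System.out.println(indexYield(index));
    }
}
```

The loop in indexYield is about as long as the pseudo-code, and the flags and status checks have nowhere left to hide.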

Good, readable code blocks have always been able to minimize the noise, thus allowing the intent of the code to shine through. As for other programmers, a few may tolerate or even contribute to the mess, but most will seize the opportunity to send it to The Daily WTF and then rewrite it from scratch. Great programmers can do more than just get their code to work; they can also build a foundation that allows their efforts to be extended.