Friday, October 29, 2010

Code Validation

Programmers often spend a significant amount of time writing the initial code, so they tend to see that work as the most important part of the process. But on large projects, especially as they grow, most of our time gets sunk into fixing or extending what is already there. That is why good coding habits that make this second half of the process far easier are crucial to keeping up development momentum.

Consistency, self-discipline, refactoring and low technical debt are important ways to avoid a lot of unnecessary labour, but ultimately the best approach is to use our tools to their fullest to help validate the code. Elegant code is always easier to debug than a mass of spaghetti or some eclectic arrangement. Once you have identified the rogue behavior and found the right location in the source, the solution to the problem should be obvious.

There are six different levels of certainty when dealing with the validation of any code:

1. The compiler can validate the code.
2. It can be validated by looking at a line or two.
3. It can be validated by looking at the whole file.
4. It can be validated by cross-referencing a bunch of files.
5. It requires a debugger to validate it.
6. It cannot be validated, and running it in an environment only reduces the odds that it is wrong.

Obviously the first level is best. A lot of work has gone into modern languages to increase the kinds of checks that can be fully automated. Precise syntax errors and warnings about sloppy practices help greatly. Still, most of the code we write doesn't fit into this category: computers are only as intelligent as we make them, and most language mechanics have been generalized to suit the widest possible audience.
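The post doesn't name a language, so Python is used here purely as an illustration. This is a minimal sketch, assuming CPython's built-in `compile()`, of where fully automated checking stops: a syntax error is rejected before anything runs, but a misspelled name in a rarely taken branch passes the check and only fails when that branch executes. The `area` function and the helper names are hypothetical. A linter such as pyflakes would flag the undefined name, which is exactly the kind of "warning about sloppy practices" that pushes more code into the first level.

```python
# Level 1 in Python: compile() checks syntax without running anything,
# but it cannot catch a misspelled name hidden in a branch that never ran.

BROKEN_SYNTAX = "def area(w, h)\n    return w * h\n"  # missing colon

HIDDEN_BUG = (
    "def area(w, h):\n"
    "    if w < 0 or h < 0:\n"
    "        return errror_value\n"  # deliberate typo: never defined
    "    return w * h\n"
)


def compiles(source: str) -> bool:
    """Return True if the source passes Python's syntax check."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


def load_area(source: str):
    """Execute the source in a fresh namespace and return its area() function."""
    namespace: dict = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["area"]


print(compiles(BROKEN_SYNTAX))  # False: rejected before it can ever run
print(compiles(HIDDEN_BUG))     # True: the typo slips past the syntax check

area = load_area(HIDDEN_BUG)
print(area(3, 4))               # 12: the happy path works
try:
    area(-1, 4)                 # only now does the typo surface
except NameError as err:
    print("NameError:", err)
```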

Encapsulation and abstraction are essential techniques for getting complex systems finished. The next two levels really mean that this has been done well, or at least reasonably well. If there is a bug and you can go straight to a line of code and see on inspection that it is incorrect, that is not only a great help, it is also a sign of elegance. If you have to root around in the same file, that is OK, but a somewhat lesser accomplishment. If you have to jump all over, that is a sign of spaghetti. Of course, to find problems in complex code by simple inspection, the programmer must understand both the code and the abstraction used to encapsulate it. A junior programmer might lack the experience or the familiarity with the underlying techniques, but that is a lack of skill rather than an issue with the source.
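To make the second and third levels a little more concrete, here is a minimal sketch of encapsulation that keeps an invariant in one place. The `Interval` class and its methods are hypothetical, not from the post, and use plain integers as day numbers for simplicity. Because the constructor is the only place `start` and `end` are set, a reader can validate `length()` or `overlaps()` by looking at a line or two, rather than cross-referencing every caller to see whether someone built a backwards range.

```python
class Interval:
    """Hypothetical closed range of day numbers; start <= end is enforced once."""

    def __init__(self, start: int, end: int) -> None:
        # The invariant is checked here, so every Interval that exists is valid.
        if start > end:
            raise ValueError(f"start {start} is after end {end}")
        self._start = start
        self._end = end

    @property
    def start(self) -> int:
        return self._start

    @property
    def end(self) -> int:
        return self._end

    def length(self) -> int:
        # Valid by inspection: the invariant guarantees a non-negative result.
        return self._end - self._start

    def overlaps(self, other: "Interval") -> bool:
        # Two closed ranges overlap unless one ends before the other starts.
        return self._start <= other._end and other._start <= self._end


a = Interval(1, 5)
print(a.length())                  # 4
print(a.overlaps(Interval(5, 9)))  # True: both contain day 5
try:
    Interval(9, 1)                 # a backwards range is rejected at construction
except ValueError as err:
    print("rejected:", err)
```

The design choice is that a bad value is stopped at the boundary, so no code that later receives an `Interval` has to re-check it.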

If you have to bounce around the files, the time increases significantly, depending on how distributed the data and logic are. Besides time, the problem also becomes trying to keep all of the little pieces in your head. With enough detail spread across enough files, it becomes impossible to know whether the code is correct without guessing. There is a huge difference between thinking that the code will work and knowing it for sure. Proper use of encapsulation should save people from this fate.

Modern debuggers are great tools and can often help a programmer find tricky issues, but they are in no way a substitute for a well-written program. They provide a snapshot of the state of the code at one specific moment, and that can be misleading. Just because the code works for one set of values doesn't mean it will handle all of the corner cases. A debugger is not a means for fully understanding the code or for ensuring that it won't break easily while running, and reliance on one tends toward fragile implementations.
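As an illustration of the snapshot problem, here is a small sketch using leap years as an assumed example (the post does not give one). The naive check returns the right answer for every year a programmer in 2010 would likely step through in a debugger, yet it is wrong for most century years. Reading the rule itself, rather than watching a few runtime values, is what exposes the bug. Both function names are hypothetical.

```python
def is_leap_naive(year: int) -> bool:
    # Looks right in a debugger for 2008 or 2012, but ignores the century rules.
    return year % 4 == 0


def is_leap(year: int) -> bool:
    # Gregorian rule: every 4th year, except centuries, except every 400th year.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The values you would likely inspect in 2010 agree...
for y in (2008, 2010, 2012):
    assert is_leap_naive(y) == is_leap(y)

# ...but the corner cases do not.
for y in (1900, 2000, 2100):
    print(y, is_leap_naive(y), is_leap(y))
# 1900 True False
# 2000 True True
# 2100 True False
```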

The final level is probably the one that we rely on the most: run it, and hope that it works. Of course, this is primarily why modern software is so flaky. We rely far too much on this undependable technique when we could rely more heavily on the top three.

Well-written code is a pleasure both to read and to extend. If more of our works were written better, we'd have way less code, which in turn would have far fewer bugs, security vulnerabilities, weird functions, confusion and a whole lot of other negatives. We've known for a long time how to build better systems; we just frequently ignore it.