Friday, October 29, 2010

Code Validation

Programmers often spend a significant amount of time writing the initial code, so they usually focus on that work as the most significant part of the process. But on large projects, especially as they grow, most of our time gets sunk into fixing or extending what is already there. Because of this, good coding habits that make that second half of the process far easier are crucial to keeping up the development momentum.

Consistency, self-discipline, refactoring and low technical debt are important ways to avoid getting caught up in a lot of unnecessary labour, but ultimately the best thing is to leverage the available tools as fully as possible in helping to validate the code. Elegant code is always easier to debug than a mass of spaghetti or some eclectic arrangement. Once you know the rogue behavior and get to the right location in the source, the solution to the problem should be obvious.

There are six different levels of certainty when dealing with the validation of any code:

1. The compiler can validate the code.
2. It can be validated by looking at a line or two.
3. It can be validated by looking at the whole file.
4. It can be validated by cross-referencing a bunch of files.
5. It requires a debugger to validate it.
6. It cannot be validated; running it in an environment only increases the odds that it is correct.

Obviously the first level is best. A lot of work has gone into modern languages to increase the range of things that can be validated entirely automatically. Precise syntax errors and warnings about sloppy practices help greatly. Still, most of the code we write doesn't fit into this category: computers are only as intelligent as we make them, and most language mechanics have been generalized to suit the widest possible audience.
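To make the distinction concrete, here is a minimal sketch in Java (the class and its names are invented for illustration). The compiler will flatly reject a call with the wrong type, but it has nothing to say about the off-by-one hiding in the loop; that one has to be caught by inspection.

    public class ValidationLevels {
        public static void main(String[] args) {
            int[] prices = {3, 5, 7};
            // Level 1: the compiler checks the types for us; a call like
            // sum("prices") would not even compile.
            System.out.println(sum(prices));
        }

        // Level 2: the compiler happily accepts this, but a quick read of the
        // loop condition shows it quietly skips the last element -- an
        // off-by-one that no compiler will flag, yet it is visible on inspection.
        static int sum(int[] values) {
            int total = 0;
            for (int i = 0; i < values.length - 1; i++) {
                total += values[i];
            }
            return total;
        }
    }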

Encapsulation and abstraction are huge techniques for getting complex systems finished. The next two levels really mean that this has been done well, or at least reasonably well. If there is a bug and you can go straight to a line of code and see on inspection that it is incorrect, then besides being a great help, it is also a sign of elegance. If you have to root around in the same file, that is OK, but a somewhat lesser accomplishment. If you have to jump all over, that is a sign of spaghetti. Of course, to determine the problems in complex code by simple inspection, the programmer must understand both the code and the abstraction used to encapsulate it. A junior programmer might not be that experienced, or familiar with the underlying techniques, but that is a lack of skill rather than an issue with the source.
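As a rough sketch of what inspection-level validation can look like (the Invoice class below is made up for the example), everything needed to check the method is visible in one place: the state is set once, never changes afterward, and the total is just the sum of its entries.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // A hypothetical class, invented for this sketch.
    class Invoice {
        private final List<Integer> priceCents;

        Invoice(List<Integer> priceCents) {
            // Defensive copy: nothing outside this class can alter the state later.
            this.priceCents = Collections.unmodifiableList(new ArrayList<>(priceCents));
        }

        // Everything needed to validate this method is right here: the list
        // never changes, and the total is simply the sum of its entries.
        int totalCents() {
            int total = 0;
            for (int price : priceCents) {
                total += price;
            }
            return total;
        }
    }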

If you have to bounce around the files, the time increases significantly, depending on how distributed the data and logic are. Besides the time, the problem becomes trying to keep all of the little pieces in your head. With enough detail and enough files, it becomes impossible to know whether the code is correct without guessing. There is a huge difference between thinking that the code will work and knowing it for sure. Proper use of encapsulation should save people from this fate.
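A crude illustration of the difference, again with invented names: when state is shared through something like a public static field, every file that writes to it has to be cross-referenced before the code can be trusted, whereas pulling the invariant behind one small class answers the same question at a glance.

    // Scattered version: to know whether the discount is ever out of range,
    // you have to read every file that touches this field.
    class Globals {
        public static int discountPercent;
    }

    // Encapsulated version: the invariant lives in one place, so a single
    // glance at this class settles the question.
    class Discount {
        private int percent;

        void setPercent(int percent) {
            if (percent < 0 || percent > 100) {
                throw new IllegalArgumentException("percent out of range: " + percent);
            }
            this.percent = percent;
        }

        int getPercent() {
            return percent;
        }
    }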

Modern debuggers are a great tool and can often help a programmer find tricky issues, but they are in no way a substitute for a well-written program. They provide a snapshot of the state of the code at one specific moment, which can be misleading: just because the code runs for one set of variables doesn't mean all of the corner cases are covered. Debuggers are extremely useful, but they are not a means of fully understanding the code or ensuring that it won't break easily while running. Reliance on them tends toward fragile implementations.

The final level is probably the one that we rely on the most: run it, and hope that it works. It is primarily for this reason that modern software is so flaky. We rely far too much on this undependable technique when we could rely more heavily on the top three.

Well-written code is a pleasure both to read and to extend. If more of our work were written well, we'd have far less code, which in turn would have far fewer bugs, security vulnerabilities, weird functions, confusion and a whole lot of other negatives. We've known for a long time how to build better systems; we just ignore it frequently.

4 comments:

  1. could you comment on how tests and similar practices fit in with the points you have raised?

    thanks.

  2. Hi Criador,

    Thanks for your comment. That is a great question, but I think a real answer would be too large for a comment, so I'll write up another post discussing it in detail. The short answer is that there are techniques you can use to both reduce testing and make it easier to spot bugs. All programs require testing, but the more elegant ones require less, are easier to fix and are less likely to have long-term problems. Hopefully I can get a chance to elaborate on this sometime this week.

    Paul.

  3. "Elegant code is always easier to debug..."

    Polymorphism, overloading, inheritance or just function pointers are techniques that may allow one to write elegant code, but on the other hand may make debugging more difficult.

  4. Hi Astrobe,

    Thanks, as always for the comment :-)

    Yes, I agree. Any of the abstraction techniques require way more thought, but on the other hand, if you can leverage the work, the increase in effort is offset by a huge decrease in testing and redundant coding.

    If you've applied some abstraction consistently, and you understand it well, then the bugs come from unexpected corner-cases. In that sense, it is easier to enhance a reasonable abstraction to deal with a missed corner-case than it is to, say, fix all of the inconsistencies in spaghetti code. In the first case (assuming your abstraction fits), you're just short by a few conditions/loops (maybe a bit of refactoring). In the second case, you have to reconsider the whole mess because you're not sure that the changes won't have a cascading effect. Locally the code is simpler, but you have to consider far more of it to be certain.

    In elegant code the fixes are often just one or two liners. And once you get the corner-case, the fix is obvious.

    In an inelegant mess, a fix is often just the start of what I like to call "stuffing straw into a burlap bag full of holes". As you stuff the straw in one hole, it falls out the others. It can be a never-ending time sink. And you're never really sure if you got it right.

    You might have to think about it a lot more if the code is abstract, but overall it is way less work and far more certain to not cause other problems.

    Paul.

