Tuesday, February 19, 2008

Fundamental Coding Issues

Everybody loves a short summary, one that can easily compress a complicated idea into a simple concept. They are hard to find, but always very useful.

In software development there are many complex issues surrounding our computer language 'code' and the various conventions we use for programming. There are lots of different attributes that a program can have beyond just working, but ultimately there are only a few things that easily define a great program. With that in mind, I find the following summary quite reasonable:

The secret to great code is to get the smallest program possible without resorting to being clever, while generalizing to as large a problem space as possible given the time constraints. Make all of the broad strokes explicit, while making all of the details implicit. Get this right, keep it clean and consistent, and you've got elegance.

It is simple, as it should be, but still the underlying parts require deeper explanation.

THE SMALLEST CODE BASE

Software developers don't like to admit it, but the amount of code in a system is a hugely significant issue, for all sorts of reasons.

Programming is a massive amount of work. If you look at the various parts of development: analysis, implementation, testing and deployment, the second one -- implementation, which is the act of writing the code to implement the solution -- doesn't appear all that large in the overall process. It is only one of four basic steps. However, despite appearances, the size of the code drives issues with all of the other work, including the analysis. The code is the anchor for everything else; bigger code means more work and bigger problems.

The problem starts with having to convert your understanding of the user's needs into some set of interfaces. Complicated problems produce complicated interfaces, which feed on themselves. It is far more work to add a new function to Microsoft Word, for example, than it is to add it to some really simple interface. The analysis doesn't just cover the problem space; it also includes how to adapt the proposed solution to a specific tool. Adding a function to a small tool is far less work than adding it to some huge existing monolith. Because of this, the analysis changes depending on the target infrastructure. More code means more work integrating any new functionality.

The size of the code itself causes its own problems. If your intended solution requires 150,000 lines of code, you'd be looking at a small team of programmers for over a year. If it requires 1,000,000 lines of code, it will take a huge team many years to build. Once it is built, if there is a problem with the 150,000 lines of code, refactoring chunks of it is significant work, but not dangerous to the project. With 1,000,000 lines you are committed to your code base, for good or bad. It, like the Titanic, is slow to turn, so if there are any obstacles along the way, such as icebergs, you are in serious trouble. Why use 1,000,000 lines if 150,000 will do?

With each line of code, you commit to something you've got to maintain throughout its lifetime. It requires work to keep it updated; it even requires work to delete it. The more code there is, the more work is required just to figure out the available options. Big code bases are unwieldy beasts that are expensive and difficult to handle. Developers often try to get away with just bolting a bit more code onto the side, but that rather obvious tactic is limited and ugly.

All code contains latent unwanted behaviors (bugs), including mature code that has been running in production for years. These 'problems' sit there like little land mines waiting for unsuspecting people to pass by and trigger some functionality. Testing is the act of mine-sweeping, where the testers spend as much time as they have trying to detect as many of these problems as possible. Most current test processes are horribly inefficient; they end up spending way too much time on the easy issues and way too little time on the hard ones. Once the time is up, the code is released in whatever state it is in. So often you'll find that code works well for its simple common usages, but becomes increasingly unstable the more you try to push its envelope. Not surprisingly, that matches the testing.

Testing seems to follow some type of square law: it likely takes four times as much effort to test twice as much code. By committing to a bigger code base you are committing to a huge increase in the amount of testing or, as is more often the case, you are actually diminishing your existing testing by a significant degree. So often the code base doubles but the testing resources remain the same, only now they are 25% as effective.

With the increase in testing requirements being ignored, most big software packages get more operational problems and issues. For all software, there is a considerable support cost even if it is just an installer, an operator, a simple help desk and some beepers for the programmers. For large commercial projects support can include an entire division of the company.

Running code is hugely expensive, but most programmers don't seem to understand this. They just hand off their work and don't think about the consequences. There are the fixed costs, but also those driven by behavioral problems. A particularly bad bug in a commercial setting could cost millions of dollars and really hurt the reputation of a company. Even in an in-house setting, the code going down could delay other significant events, costing money or causing bad publicity. The more useful the code, the more expensive the failures.

Bigger code 'will' fail more often. From experience it is clear that with twice as much code come twice as many bugs. Programmers have a nearly constant rate of adding bugs to their code, one that is mostly independent of the actual programmer or testing. Better programmers often have fewer bugs per line of code, but they still have them. Because they tend to work on the more complicated sections of the system, or write more code, it is not uncommon for their bug count to be higher. It stands to reason that if there is twice as much code, there are at least twice as many bugs, so hitting a bug is twice as likely.
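
A back-of-the-envelope sketch of the linear claim follows, assuming a constant defect density per thousand lines. The density value is a hypothetical placeholder, not a measured figure. The 150,000 and 1,000,000 line counts are the ones from the Titanic comparison earlier in the post.

```python
def expected_bugs(lines_of_code, bugs_per_kloc):
    """Latent bugs under an assumed constant defect density (bugs per 1,000 lines)."""
    return lines_of_code / 1000 * bugs_per_kloc


ASSUMED_DENSITY = 5.0  # hypothetical: bugs per 1,000 lines, for illustration only

if __name__ == "__main__":
    for size in (150_000, 300_000, 1_000_000):
        print(f"{size:>9,} lines -> about {expected_bugs(size, ASSUMED_DENSITY):,.0f} latent bugs")

    # A "better" programmer (lower density) writing twice as much code can still log more bugs.
    print(expected_bugs(40_000, 3.0) > expected_bugs(20_000, 5.0))  # True
```

The density only sets the scale. Whatever value is assumed, the count doubles when the code doubles.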

Finally, bigger code bases also mean bigger algorithms, and a lot more of them. More complex code is harder to use and has more bugs, but it also means more documentation work. Usually the 'rules' for the functional behavior of the system start to become really 'sophisticated'. Well, sophisticated isn't the right word; overcomplicated is probably more appropriate. Once the functionality bloats, it takes 'essays' to explain even the simplest usage of the system, and the support costs go through the roof. A very simple program that does one thing, strictly following the conventions in an obvious way, probably doesn't need any help or tutorials. Once you grow to do fancier tasks with lots of customization, online help becomes a necessity. Open up the system so that users can adapt it to themselves, and you need lots of tutorials, courses and books. A whole industry springs forth from really complex software. Photoshop is the classic example. It has a phenomenal secondary industry devoted to helping people utilize the internals by translating between the language of the programmers and the language of the users (such as "how do I remove red-eye from this photo?").


NOT CLEVER

Given all of the above problems with big code bases, programmers shouldn't rush out to compact their code into the tiniest possible pieces. Size is an issue, but readable code is more important. If you get too clever and pack your code really tightly, it can be impossible to understand. Clever is bad. Very bad.

The Obfuscated C Code Contest is a great example of really clever ways to dramatically alter or reduce the size of code. While it is entertaining, in practice that style is extremely dangerous. Clever code packs way too much complexity into too small a package. If we wrote code once and never touched it again, that would be fine, but the lifespan of code is huge. During that life, there are always periods where people make fast judgment calls about how the code functions. Code needs constant, never-ending fixing and updating. Stress is a part of software because the amount of work is always larger than the resources. Clever code just sets itself up to cause problems in the future. It is yet another land mine waiting to go off, no different from a bug.

If you get away with some clever trick of the language, or some other weird way of getting your results, it will easily be missed by someone else. That code is dangerous, and definitely not elegant.
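
As an illustration, here are two versions of the same small task. The task is the classic FizzBuzz exercise, chosen here purely as an example and not taken from the post. The first version leans on Python tricks: it multiplies a string by a boolean and relies on `or` returning its second operand when the first is an empty string. The second version spells each case out. Both produce the same results, but only one can be read at a glance.

```python
def fizzbuzz_clever(n):
    # Relies on str * bool and on "" being falsy, so that `or` falls through to str(n).
    return "Fizz" * (n % 3 == 0) + "Buzz" * (n % 5 == 0) or str(n)


def fizzbuzz_plain(n):
    # Each rule is stated explicitly, in the order a reader would check them.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    for i in range(1, 16):
        print(fizzbuzz_plain(i))
```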

Code is complicated by its very nature. All problem spaces have huge amounts of complexity, but we really want to lay out each and every line of code in the simplest, most straightforwardly reusable manner possible. That also means not commenting too much, or too little. Taking away the readability of code is always asking for trouble. If you can give it to someone with a very light coding background and they get 'it' immediately, then it is probably close enough to elegant. Programming students, for example, should be able to easily understand the nature and purpose of the code. You shouldn't need an extensive background to read good code, although you definitely need one to write it.

The formatting, variable names, comments and all of the other syntactic attributes are very important in producing something that is easy to read. In big languages such as Perl, showing a tremendous amount of discipline in 'not' using some of the more esoteric features of the language is the industrial-strength way of coding. In Java, not following weak design-pattern and bean conventions will produce code that is more readable and easily understood. Obscuring the underlying functioning of the code by focusing on its structure just makes it harder to understand. The magic comes from generalizing it, not from being clever.

Some of the 'movements' in programming over the years have been counterproductive. They get so obsessed with the 'right way' that they fail to realize they are doing extreme damage to the 'readability' of their code base, a factor that is far more important than being right. If it is readable and consistent, it can't be too far away from elegant.


LARGEST PROBLEM SPACE

When you are solving a problem, the range of possible solutions extends from pounding out each and every instruction using a 'brute-force' approach, to writing some extremely generalized, configurable program to solve a huge number of similar problems. At the brute-force end of this spectrum, there is a tremendous amount of work in typing and maintaining a very fragile and stiff code base. The solution is static and brittle, needing lots of fixes. While it is probably huge, each sub-section of it is very straightforward, as it is just a long sequence of instructions to follow: get this file, open it, read the contents, put them in this structure, add these numbers, do these manipulations, save them in this format, etc. Most computer languages provide some higher level of abstraction, so at least each instruction stands for many lines of assembler, but the code is still rigid and explicit.

Adding more degrees of freedom to the instructions, by generalizing them or making them dynamic, means that the resulting code can be used to solve more and more similar problems. The problem space opens up, and with additional configuration information, the range of uses for the code becomes huge.

As we shift the dynamic behavior away from the static lines of code, we have to provide some 'meta-data' for the generalized version of the code to work on a specific problem. In a very philosophical sense, the underlying details must always exist, no matter how general the solution. When we generalize, however, the details stop being statically embedded in the code in a fragile manner. Instead they become either implicit, or explicitly held in the meta-data affecting the behavior.
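
Here is a minimal sketch of that shift, under assumed details. A hypothetical CSV file has an `amount` column whose values get summed. In the brute-force version, the file name, column and output format are all hard-coded; that function is shown only for contrast and expects a file that does not exist here. In the generalized version, exactly the same details arrive as a small meta-data dictionary (its keys are illustrative), so one routine can serve many similar jobs.

```python
import csv
import io


def brute_force_total():
    # Every detail is frozen into the code: the file, the column, the output format.
    with open("sales_jan.csv", newline="") as f:
        rows = csv.DictReader(f)
        total = sum(float(row["amount"]) for row in rows)
    return f"January total: {total:.2f}"


def generalized_total(source, meta):
    # The same details now live in `meta`; the code only knows the shape of the job.
    rows = csv.DictReader(source, delimiter=meta.get("delimiter", ","))
    total = sum(float(row[meta["column"]]) for row in rows)
    return meta["template"].format(label=meta["label"], total=total)


if __name__ == "__main__":
    data = io.StringIO("region;amount\nnorth;10.50\nsouth;4.25\n")
    meta = {"column": "amount", "delimiter": ";", "label": "January",
            "template": "{label} total: {total:.2f}"}
    print(generalized_total(data, meta))  # January total: 14.75
```

None of the details disappeared. They moved from the code into the configuration.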

Like energy, the primary problem domain details* can neither be created nor destroyed. You just shift them around. They move from being explicitly embedded in the code to existing somewhere else, either implicitly or in the configuration data.

*We can and do create massive amounts of artificial complexity, which creates artificial details, which can be refactored out of the code. If you can delete details without altering the functionality, then it was clearly artificial.

Way off at the very end of the spectrum, one might imagine some very complicated, all-purpose piece of code that can do everything. Funnily enough, that 'code' exists: it is the computer itself, the ultimate general solution.

Building and maintaining a system is a huge amount of work that is often underestimated. Usually, wherever a specific software-based tool can be used to solve a problem, there is an abundance of similar problems that also need to be solved. Programmers love to just bite off a piece and chew on that, ignoring the whole space, but it is far more effective to solve a collection of problems all at once. All it takes is the ability to step back and look at the specific problems in their larger context.

Which of the business rules seem to bend, and how many other problems in the company have the same feel to them? The larger the problem space, the cheaper the solution. If ten departments in a company all need customized phone books, solving each one by itself is considerably more work than solving them all together. If one group needs approvals for its documents, that same problem likely spans a huge number of other groups, and they will all benefit from a common solution.
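
Here is a minimal sketch of the phone-book case, using hypothetical employee records and department names. Instead of ten department-specific programs, one routine takes each department's differences as configuration: which fields to show and how to sort them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    extension: str
    office: str


def phone_book(employees, department, fields=("name", "extension"), sort_by="name"):
    """One generalized routine; each department differs only in its arguments."""
    members = [e for e in employees if e.department == department]
    members.sort(key=lambda e: getattr(e, sort_by))
    return [" | ".join(getattr(e, f) for f in fields) for e in members]


STAFF = [
    Employee("Lee", "Sales", "2041", "B-12"),
    Employee("Ahmed", "Sales", "2017", "B-03"),
    Employee("Ortiz", "Legal", "3110", "C-01"),
]

# Per-department customization is data, not another copy of the program.
CONFIG = {
    "Sales": {"fields": ("name", "extension"), "sort_by": "name"},
    "Legal": {"fields": ("name", "extension", "office"), "sort_by": "office"},
}

if __name__ == "__main__":
    for dept, options in CONFIG.items():
        print(dept, phone_book(STAFF, dept, **options))
```

Adding an eleventh department means adding one entry to `CONFIG`, not writing an eleventh program.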

Generalizing makes the solution cheaper and reduces the overall work. Also, strangely enough, generalized code is always smaller than brute force. That means that the long-term costs of maintaining the code base are lower as well. The size issue on its own is significant enough that, from a long-run perspective, it is always worth generalizing to bring down the size of the code, even if there are no similar problems available. Generalizing can shrink the code base enough to fit it into the available development window. The technique can be applied to control the costs of the project and build more efficiently.

It also helps in maintaining consistency. If one routine is responsible for rendering multiple screens in a system, then by virtue of its usage it enforces consistency.
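
Here is a small sketch of that idea, assuming hypothetical screen definitions. Every screen is drawn by the same routine, so the title style, label alignment and display of missing values cannot drift apart between screens. There is only one place that decides them.

```python
def render_screen(title, fields, record):
    """Single shared renderer: every screen gets the same layout rules."""
    width = max(len(label) for label, _ in fields)
    lines = [title.upper(), "=" * len(title)]
    for label, key in fields:
        value = record.get(key)
        shown = "(none)" if value in (None, "") else str(value)
        lines.append(f"{label.ljust(width)} : {shown}")
    return "\n".join(lines)


CUSTOMER_FIELDS = [("Name", "name"), ("Phone", "phone")]
ORDER_FIELDS = [("Order number", "id"), ("Customer", "customer"), ("Total", "total")]

if __name__ == "__main__":
    print(render_screen("Customer", CUSTOMER_FIELDS, {"name": "Ada", "phone": ""}))
    print()
    print(render_screen("Order", ORDER_FIELDS, {"id": 17, "customer": "Ada", "total": 42.5}))
```

Changing how empty values look, for instance, is a one-line edit that every screen picks up at once.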


TIME CONSTRAINTS

Everybody underestimates the amount of time it takes to build software. They also underestimate the amount of effort it takes to keep it running. Like an iceberg, only a small portion of the code is actually visible to the users so they see it as a rather small malleable thing that can be easily changed. Any significant set of functionality gets locked in by its own size and complexity.

If you make frequent quick changes to your code base, you'll soon find out that you are destabilizing it. The more fast hacks you add, the worse the code becomes. There is always an expensive trade-off to be made between getting it done quickly and getting it done correctly. Many projects make the wrong choices, although the damage often takes years to fully show. Getting out one good release is easy; getting one out time and time again is difficult.

Without a doubt, time is the biggest problem encountered in software. There is never enough. That means that any 'technique' that helps reduce the time taken to do some work is probably good if it helps both in the short run and the long run. Any technique that adds extra work is probably bad. We should always keep in mind that sometimes you need to add a little extra effort in the short run to save time in the long run.

For instance, keeping the code clean and neat makes it easy to add stuff later. A well-maintained project takes more time, but is worth it. Sloppy is the friend of complexity.

Optimization is always about doing something extra now that results in a gain later, because the results get re-used over and over. Clean code saves time in understanding and modifying it; a little bit of extra work and discipline pays off. Design saves time that would otherwise be lost later. With a solid design you can fit in the code you need right away, instead of wasting a lot of time wandering around trying to guess at what might work.

Time is an all-important quantity in software development. Wasting it is a bad idea. Some of the newer programming development techniques seek to make 'games' out of programming. It is an immature way of trying to remove some of the natural tediousness from coding. We want to build things and get to the end as fast as possible. Playing around is avoiding the issue.

If you want to build great tools, at times it will be painful; that comes with the territory. No job is 100% entertaining. They all have their downsides, which is why it is called work and we have to be paid to do it. There is nothing wrong with hobbyist programmers playing games and competing, but it is not appropriate for the workplace.

Testing is one area where people waste massive amounts of time without getting any additional benefit. Again, some of the newer testing techniques add extra work, but in exchange they claim to reduce the number of bugs. If that works, it is great, but you have to be sceptical of most claims. Thorough component testing is good, but if it cannot assure the interaction of the components with each other, then there is always some minimal level of final testing that still needs to be done.

That final testing is immutable, and as such it cannot be optimized away. If you must test the final product as a whole coherent piece, then you cannot skip those tests, no matter how much work you have done on the sub-components. If you are not skipping the tests, then you are not saving any time. If you downgrade the final tests, then you raise the risk of letting problems through. This holds as a basic principle for any process. In-stream testing is there to reduce the amount of bouncing around between states, but it significantly bumps up the amount of testing work without increasing the quality. If the bouncing between states is not significant, then reducing it doesn't add much extra value. It doesn't negate any of the final testing, which still needs to be done.

In the same way that one cannot solve the halting problem, the amount of testing required to achieve 100% certainty that there are no bugs is 'infinite'. It is an easy proof, once you accept the possibility that some sequence of input causes the software to get into a state that can break it. All non-trivial software maintains some form of internal state. So to achieve absolute certainty that there are no bugs, you have to test every combination of possible input. Since there is no limit to the size of the input, the number of possible test scenarios is infinite, and it would take forever to create and apply them. Given our restrictions on time, unless we do our development on the edge of a black hole, there will always be some possibility of bugs with any release.
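
Here is a toy sketch of that argument. `ToyParser` is a hypothetical stateful component that 'breaks' only after seeing the inputs `a`, `b`, `a` in a row. The only way to be sure of finding such a state-dependent bug is to enumerate input sequences. The number of sequences up to length n over k symbols is k + k² + ... + kⁿ, which keeps growing as long as the input length is unbounded.

```python
from itertools import product


class ToyParser:
    """Hypothetical stateful component with a latent bug hidden behind one input sequence."""
    TRIGGER = ("a", "b", "a")

    def __init__(self):
        self.history = []

    def feed(self, symbol):
        self.history.append(symbol)
        if tuple(self.history[-3:]) == self.TRIGGER:
            raise RuntimeError("latent bug triggered")


def scenario_count(alphabet_size, max_length):
    # Number of distinct input sequences of length 1..max_length.
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def exhaustive_search(alphabet, max_length):
    """Try every sequence up to max_length; return (failing sequence or None, sequences tried)."""
    tried = 0
    for length in range(1, max_length + 1):
        for seq in product(alphabet, repeat=length):
            tried += 1
            parser = ToyParser()
            try:
                for symbol in seq:
                    parser.feed(symbol)
            except RuntimeError:
                return seq, tried
    return None, tried


if __name__ == "__main__":
    alphabet = ("a", "b", "c")
    for max_len in (2, 3, 10):
        print(max_len, scenario_count(len(alphabet), max_len))
    print(exhaustive_search(alphabet, 3))
```

Testing every sequence up to length 2 finds nothing. The bug only shows up once length 3 is reached, and a real system has far more symbols and no fixed upper bound on length.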

You may put in significant effort for a series of releases to really produce stellar quality, but you always need to remember that software development projects never really end. Sooner or later a bug is going to escape. In practice it is always sooner, and it always happens way more than most programmers anticipate. Although one person called it 'defeatist', it is a really good idea to plan for sending out patches to fix the code. Just assume there will be bugs and deal with it. If you build this into distribution and deployment, then when it is necessary it is just part of the system, otherwise the 'crisis' will eat up huge amounts of time.

Unexpected, 'expected' events cause delays, morale issues and scheduling conflicts. When we continually see the same problems happen over and over again, we need to accept them as part of the process, rather than try to ignore them or rail against how unfair they are. If we anticipate problems with the deployment of the software, we can build in the mechanisms to make dealing with them easier. The best way to build practical solutions that can withstand significant shifts in their environments is to take a broad approach to development, one that includes development, testing and operations in the problem domain.

The time issues never go away, and ignoring them only makes them worse and more disruptive.


BROAD STROKES AND DETAILS

There are lots of arguments over strongly typed versus loosely typed languages. Different camps of programmers feel that one or the other is the perfect approach. After lots of experience on both sides, I've come to realize that some problems are best handled with strongly typed approaches, while others are best handled with very loose ones. The difference comes down to whether or not you need to be pedantic about the details.

For some types of solutions, the underlying correctness of the data is the key to making it work. In those cases you want to build up very specific internal structures while programming, and during the process you want to ensure that the correct structure is actually built.

If you're writing some code to write out a specific file format, for example, you'd like the internal structure to mirror the file format. That way, depending on the format, the structure and syntax are highly restricted. As the various calls in the system go about building up the structure, there is also code to make sure that the structure stays consistent and correct. With many types of errors, the sooner the program stops after the first error, the easier it is to diagnose. When running in a development environment, a program that is strongly typed and checks its data can stop the moment it deviates from the prescribed plan. That makes it really easy to find and fix problems, which also helps to minimize testing.
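
Here is a minimal sketch of the strict side, assuming a hypothetical fixed-width record format: an 8-character account code, a 10-digit amount in cents and a one-letter status. The internal structure mirrors the format. Construction fails immediately on the first deviation, which is the fail-fast behavior the paragraph describes.

```python
from dataclasses import dataclass

VALID_STATUSES = {"A", "C", "P"}  # hypothetical codes: active, closed, pending


@dataclass(frozen=True)
class AccountRecord:
    account: str        # exactly 8 alphanumeric characters
    amount_cents: int   # non-negative, fits in 10 digits
    status: str         # one of VALID_STATUSES

    def __post_init__(self):
        # Check everything up front, so a bad record stops the program at its source.
        if not (isinstance(self.account, str) and len(self.account) == 8 and self.account.isalnum()):
            raise ValueError(f"account must be 8 alphanumeric characters: {self.account!r}")
        if not isinstance(self.amount_cents, int) or isinstance(self.amount_cents, bool):
            raise TypeError(f"amount_cents must be an int: {self.amount_cents!r}")
        if not 0 <= self.amount_cents < 10**10:
            raise ValueError(f"amount_cents out of range: {self.amount_cents}")
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def to_line(self):
        # The layout is fixed: 8 + 10 + 1 = 19 characters per line.
        return f"{self.account}{self.amount_cents:010d}{self.status}"


if __name__ == "__main__":
    print(AccountRecord("ACCT0001", 12345, "A").to_line())
    try:
        AccountRecord("ACCT0002", -1, "A")
    except ValueError as err:
        print("stopped at first bad record:", err)
```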

For some solutions, if you are performing a large number of steps and you want the program to be as tolerant as possible, then loose typing is way better. It doesn't matter, for example, how the data got into the program; what matters is how it will be transformed. High-level programs, scripting, data filters and any other program that needs to deal with a wide range of unexpected inputs fit well into this type of circumstance. The range of the data is massive, but the transformations are pretty trivial. If the program stopped each and every time the input was an unknown combination, the code would be so fragile that it would be useless. Loosely typing the data in this circumstance puts the emphasis on the set of instructions that need to be completed; the data is only secondary. Scripting in particular requires this.
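
By contrast, here is a sketch of the tolerant side, assuming hypothetical messy input lines of `key=value` pairs. The filter's job is to finish the transformation, in this case totalling any numeric `qty` values. It skips or tallies whatever it does not recognize instead of stopping.

```python
def total_quantity(lines):
    """Sum every qty=... value it can make sense of; ignore everything else."""
    total = 0.0
    skipped = 0
    for line in lines:
        for token in line.replace(",", " ").split():
            key, sep, value = token.partition("=")
            if not sep or key.strip().lower() != "qty":
                continue  # not a qty field: not this filter's concern
            try:
                total += float(value)
            except ValueError:
                skipped += 1  # malformed value: count it and keep going
    return total, skipped


if __name__ == "__main__":
    messy = [
        "item=bolt qty=12",
        "QTY=3, colour=red",
        "# a comment line",
        "item=nut qty=lots",
        "",
    ]
    print(total_quantity(messy))  # (15.0, 1)
```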

This dichotomy is true for all software. For some goals the data is the most important thing and it needs to be structured correctly. For some goals, it is the sequence of instructions and the final output that is significant. It doesn't matter how it got there.

So, for example, we can use a typed language like Java to perform the algorithmic combinations, but a basically untyped tool like Ant to ensure that the code was built and deployed correctly. That the shell scripting languages in Unix are mostly loosely typed is no accident. The two approaches are needed for two very different problems.

It is also true that within all systems, the sequence of instructions matters more at the higher levels, while the structure of the data matters more at the lower levels. The nice part is realizing that the depth of the code makes a big difference. If you are building a complex system, the broad strokes of the code should be loosely typed, because that makes them more flexible and less rigid. The detailed calculations should be strongly typed, because the accuracy of the data is often the key. Loose typing at the higher level also helps in decoupling the architecture and splitting off the presentation from the underlying data. All these things work together at the different levels within any system.
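
Here is a compact sketch of that layering, with hypothetical step and field names. The top-level pipeline is just a list of loosely typed steps that pass a plain dictionary along, so steps can be added, removed or reordered freely, and unknown keys simply ride through. The detailed calculation at the bottom is strict: it uses exact `Decimal` arithmetic with explicit rounding and rejects inputs of the wrong type.

```python
from decimal import Decimal, ROUND_HALF_UP


# Low level: strict. Exact decimal arithmetic, explicit rounding, no silent coercion.
def line_total(unit_price: Decimal, quantity: int, tax_rate: Decimal) -> Decimal:
    if not isinstance(unit_price, Decimal) or not isinstance(tax_rate, Decimal):
        raise TypeError("prices and rates must be Decimal")
    if not isinstance(quantity, int) or quantity < 0:
        raise ValueError("quantity must be a non-negative int")
    gross = unit_price * quantity * (1 + tax_rate)
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# High level: loose. Each step takes and returns a dict; unknown keys just pass through.
def parse(ctx):
    ctx["price"] = Decimal(str(ctx.get("price", "0")))
    ctx["qty"] = int(ctx.get("qty", 0))
    return ctx


def price(ctx):
    ctx["total"] = line_total(ctx["price"], ctx["qty"], Decimal(str(ctx.get("tax", "0"))))
    return ctx


def present(ctx):
    ctx["display"] = f"{ctx.get('item', 'item')}: {ctx['total']}"
    return ctx


PIPELINE = [parse, price, present]


def run(ctx, steps=PIPELINE):
    for step in steps:
        ctx = step(ctx)
    return ctx


if __name__ == "__main__":
    order = {"item": "widget", "price": "19.99", "qty": "3", "tax": "0.13", "note": "rush"}
    print(run(order)["display"])  # widget: 67.77
```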

Clearly any new language that covers the whole domain of building and deploying complex systems will cover the whole spectrum and be both strongly and loosely typed. Working both into the same syntax will be one of those key things that will bump us forward in complexity handling.


ELEGANCE

Clean and consistent code is a great secret in getting things launched, but it is exceptionally difficult to get a group of programmers to follow some underlying conventions. There are cultural issues that make it nearly impossible for big teams to synchronize their working practices. That could be why the initial version of most commercial software is built by small teams and later handed off to big teams for maintenance. We still don't know how to coordinate our efforts correctly on a large scale. Our most common official methodologies are horrible.

Elegance just for the sake of elegance is never a great idea. Elegance because it makes your job easier and means that the software works better is a great idea. It is even better when it takes away a lot of the stress associated with messy programming habits. It becomes a means to an end.

We build and maintain tools to help our users solve their problems, which mostly involve playing with their ever-increasing piles of data. The most important thing is that the tools we build actually solve the problems for the users. Everybody getting their input into the design, programmers having fun while coding, and processes being left wide open and 'casual' may amuse some people while they are working, but they do not get the basic tasks completed any faster. Some of the things that make users come to appreciate (or not) the underlying software are:

- spending time to understand the users' real needs;
- keeping the code clean and consistent;
- using real graphic design for the interface;
- providing simple tools that are easy to understand and effective.

Computers can make people's lives easier or they can make them harder; the difference is up to the abilities of the programmers involved with the code.

In the end, a short, simple solution that works simply and consistently is as good as it gets. Bad, overly complicated, bloated software with every function imaginable isn't good, and it isn't much of an accomplishment. All the fancy graphics, dancing baloney and crammed-in information can't hide a badly written program. It's not the technology, it's what you do with it that matters.