Saturday, May 10, 2014


Writing a small program, a few thousand lines, is not particularly difficult once you've mastered logic and algorithms. You focus on a very specific part of the problem, work out the details and then pound it into shape. As the size of the program grows you might keep on with the same approach: decompose the problem into smaller ones, then dump each answer somewhere arbitrary in the code base. At some point, however, the amount of code that has built up becomes a problem in itself.

With a small program, if you make some sloppy mistakes they are fairly easy to find and fix. A couple of days at most. So the code can be disorganized. It can be highly redundant. Each instance of the same solution can have its own unique approach. You can get away with a lot of really bad practices. It can be a mess.

As the code base gets larger, the cost of changing it grows -- probably exponentially -- until it is a huge task. Somewhere around the medium range, say 40,000 to 60,000 lines of code, just dumping in code anywhere builds up enough disorganization that it hinders further development. It only gets worse as more code is added. A few hundred thousand disorganized lines are painful; millions of them are totally unmanageable.

Disorganized code can make it very hard to correct problems. Finding a bug in a small program isn't too hard once you've mastered debugging, but in a big code base organization is required to let you quickly jump to the right area. Without organization, it could take weeks, months or even years to find the problem, because you have to search everywhere.

When there is enough functionality and enough complexity, fixing the problems is a mess too. It starts to resemble stuffing straw into an old sack. You stuff it in one hole, only to watch it fall out from others. The changes cause unexpected side effects, which cause more changes, then more side effects, and so on. It becomes an endless and mostly hopeless task made worse by impending deadlines.

A big mess can also be hard to extend. You don't want to disturb the existing functionality, so you copy and paste bits of code from all over into a new location, but the new redundancies will quickly start to contradict each other, causing their own special bugs. Instead of making things better, you are probably just making them worse.
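As a small illustration of how pasted copies drift apart, here is a minimal Python sketch. The function names and the username rule are hypothetical, invented only for this example; the point is that two copies of one rule stop agreeing as soon as one of them is edited on its own.

```python
# Hypothetical example: one rule ("clean up a username") copied into two places.

def normalize_for_signup(name: str) -> str:
    # Original copy: trims whitespace and lowercases.
    return name.strip().lower()


def normalize_for_login(name: str) -> str:
    # Pasted copy that was later edited separately and lost the trim.
    return name.lower()


def normalize(name: str) -> str:
    # The organized alternative: a single shared definition that both
    # signup and login would call, so the rule cannot contradict itself.
    return name.strip().lower()


if __name__ == "__main__":
    typed = "  Alice "
    # The two copies now disagree about the same input.
    print(repr(normalize_for_signup(typed)))
    print(repr(normalize_for_login(typed)))
```

With the duplicated versions, a user who signs up with stray spaces is stored under one name and looked up under another, which is exactly the kind of special bug that redundancy creates.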

Big disorganized code bases are unstable. They suffer from all sorts of horrible operational issues and usually have annoying interface problems, not to mention endless security issues. In most cases they look as ugly from the outside as they do internally.

As the size of any system grows, the organization of the code quickly becomes the most significant problem in the development. Fortunately there is a well-known solution.

Architecture, for software, is a combination of the underlying technology choices and the higher level organization that encapsulates the many pieces of a big system from each other. Mostly, due to resource limitations, it is arrived at through a long-term vision that is then mediated via compromises and politics. It isn't necessary for small programs, but from medium to large and all the way through to huge programs, it becomes increasingly significant to the stability, quality and extensibility of the system.

A good architecture draws 'lines' through the code, providing the high level encapsulation between components. It makes it easy to attribute bugs to specific parts and it allows one to place a scope on the impact of any code changes. It also defines exactly where common extensions should be made to the code and lays out a direction for how to add brand new pieces.
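One way such a line might look in code is an explicit interface between two components. The sketch below is only an illustration under assumed names (OrderStore, InMemoryOrderStore and billing_summary are all hypothetical): the billing side sees nothing but the interface, so a wrong total can be attributed to the storage side and a change in storage cannot leak past the line.

```python
from typing import Protocol


class OrderStore(Protocol):
    """The 'line': everything billing is allowed to know about storage."""

    def total_for(self, customer: str) -> int:
        ...


class InMemoryOrderStore:
    """One storage component; it could be swapped without touching billing."""

    def __init__(self) -> None:
        self._orders = {}  # maps customer name -> list of amounts in cents

    def add(self, customer: str, cents: int) -> None:
        self._orders.setdefault(customer, []).append(cents)

    def total_for(self, customer: str) -> int:
        return sum(self._orders.get(customer, []))


def billing_summary(store: OrderStore, customer: str) -> str:
    # The billing component depends only on the interface above.
    return f"{customer}: {store.total_for(customer)} cents"


if __name__ == "__main__":
    store = InMemoryOrderStore()
    store.add("ann", 500)
    store.add("ann", 250)
    print(billing_summary(store, "ann"))
```

A new kind of store is a common extension with an obvious home: another class that satisfies the same interface, placed beside the existing one.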

An architecture can be documented, but if the lines are clear and consistent in the code base an experienced programmer can pick it up directly from reading the source. It might help to understand why things are organized in a particular manner, but not knowing that doesn't prevent someone from following the pattern of the existing implementation. 

One way to know there is a good architecture in place is to guess where some existing functionality should be, then verify that it is actually there and only there. If that is true for most of the functionality, then it is well-organized in a manner that can be understood.

Besides all its other qualities, a good architecture makes refactoring and code reuse easier. If you need to make changes there are a limited number of places to change, and they are all in a consistent location close together. This allows you to either enhance the behaviour across the system, or route new functionality through one single interface. This of course provides a strategy for building a massive system over years and years through a huge number of versions. Each new iteration usually involves first refactoring things, then extending them with consistent organization. Done well, the first part can even be non-destructive and quickly testable, saving time and increasing the confidence level for the new code.
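The refactor-then-extend rhythm can be sketched on a very small scale. The example below is hypothetical (the export functions and the separator option are invented for illustration): the first step routes existing behaviour through one function without changing any output, which is quick to verify, and only then is the new behaviour added at that single point.

```python
# Hypothetical starting point: formatting logic inlined in the export.
def export_v1(rows):
    return "\n".join(",".join(str(cell) for cell in row) for row in rows)


# Step 1, refactor: the same behaviour, now routed through one function.
# With the default separator the output is identical to export_v1, so the
# change is non-destructive and can be checked by comparing the two.
def format_row(row, sep=","):
    return sep.join(str(cell) for cell in row)


# Step 2, extend: the new feature (a configurable separator) is added in
# exactly one place, the shared format_row interface.
def export_v2(rows, sep=","):
    return "\n".join(format_row(row, sep) for row in rows)


if __name__ == "__main__":
    rows = [(1, "a"), (2, "b")]
    assert export_v1(rows) == export_v2(rows)  # the refactor changed nothing
    print(export_v2(rows, sep="\t"))           # the extension, via one interface
```

Comparing the old and new output before adding anything is what makes the first step cheap to trust.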

One frequent concern with extending an architecture in this manner is the possibility of creating too many intermediate layers, but again the depth of the layers is an organizational issue itself. An endless number is disorganized by definition, so this meta-problem needs similar treatment to the base organization. In fact, for millions of lines of code, this sort of upper level organizational issue repeats itself over and over again, getting a bit higher each time. Finding a reasonable multi-level organization for millions of anything is of course a non-trivial problem that lies at the heart of architecture.

Small programs don't need an architecture. It doesn't hurt, but it is not necessary. Big programs absolutely must have one that is clean and consistent. Otherwise they become unmanageable balls of mud. Architecture is a different set of skills than programming, but an architect should know both. They should have a lot of experience with programming first. They also need to be able to grasp the larger context and help focus the work of the programmers to avoid re-implementations. Efficiency for ongoing development is driven directly by the organization of the code base. Extending a well-architected program is considerably faster than just tacking on stuff to a ball of mud. The coding may take longer, but all of the other costs are minimized. The cost of not having an architecture is a steep decline in development speed over the lifetime of a project. That generally shows as either lower productivity or lower quality or both.

If you want to design and build big, complex systems then the only way to get them to that size and keep them moving forward is to have and enforce an architecture. Disorganization may seem faster to some people, but that's only a limited short-term perspective.