Saturday, December 19, 2015

Routine Software

As our knowledge and understanding of software grows, it is important to keep track of what counts as a ‘trivial’ or ‘routine’ software project. Trivial means that something can be constructed with only the bare minimum of effort, by someone who has done it before. Routine means that it has been done so many times in the past that the knowledge -- explicitly or implicitly -- is available within the industry; it doesn’t, however, mean that it is trivial, or that it isn’t a large amount of work.

Both of these categories are, of course, relative to the current state of the industry. Neither of them means that a novice will find the work ‘easy’, in that doing anything without the prerequisite knowledge is basically ‘hard’ by definition. If you don’t know what you are doing, then only massive amounts of sweat and luck will ever make it possible.

Definition

At this time, and it has remained rather consistent for at least the last decade, a routine system is medium-sized or smaller, with a statically defined data model. Given our technologies, medium means approximately 400 users or fewer, and probably about 40,000 to 60,000 lines of code (LOC). A statically defined data model means that the structure of what is collected now is not going to change unless the functionality of the system is explicitly extended. That is, the structure doesn’t change dynamically on its own; what is valid about a data entity today is also valid tomorrow.
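As a concrete illustration, a statically defined entity might look like the sketch below; the entity and its fields are entirely hypothetical, the point being only that the structure is fixed at design time:

    # A statically defined data entity: the fields are fixed at design time.
    # Changing this structure is an explicit extension of the system, not
    # something that happens on its own while the system runs.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Customer:              # hypothetical entity, for illustration only
        customer_id: int
        name: str
        signup_date: date
        active: bool = True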

Modern routine systems almost always have a graphical user interface. Some are thick clients (a single stand-alone process), while others are client-server based (which also covers the architectural constraints on a basic web or mobile app, even if it involves several backend components). None of this affects whether the development is routine, since it has all been around for decades, but it does involve separate knowledge bases that need to be learned.

All of these routine systems rely primarily on edit loops:

http://theprogrammersparadox.blogspot.ca/2010/03/edit-loop.html

They move the data back and forth between a series of widgets and a database of some form. Most import external data through some form of stable ETL mapping, and export data out via well-defined data formats. Some have deeper interactive presentations.
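A minimal sketch of such an edit loop is shown below, assuming a sqlite3 store; show_form and read_form are hypothetical stand-ins for whatever widget toolkit the system actually uses:

    # Minimal edit loop: load a record, let the user edit it in widgets,
    # validate the input, then write it back to the database.
    import sqlite3

    def edit_customer(conn: sqlite3.Connection, customer_id: int,
                      show_form, read_form) -> None:
        # 1. Fetch the current values for the entity.
        row = conn.execute(
            "SELECT name, active FROM customer WHERE customer_id = ?",
            (customer_id,)).fetchone()
        if row is None:
            raise KeyError(f"no customer {customer_id}")

        # 2. Push the data into the widgets and wait for the user's edits.
        show_form({"name": row[0], "active": row[1]})
        edited = read_form()     # returns a dict of edited values

        # 3. Validate and persist the changes.
        if not edited["name"]:
            raise ValueError("name is required")
        conn.execute(
            "UPDATE customer SET name = ?, active = ? WHERE customer_id = ?",
            (edited["name"], edited["active"], customer_id))
        conn.commit()

Everything else in a routine system tends to be a variation on this load, edit, validate, save cycle.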

Any system that does not get more complicated than this is routine, in that we have been building these now for nearly three decades, and the problems involved are well-defined and understood.

There are at least three sub-problems that will cause the development to no longer be routine, although in most cases these only affect a portion of the system, not the whole. They are:

  • Dynamic data-models
  • Scale larger than medium
  • Complex algorithmic requirements

Dynamic Data

A dynamic data model means that the underlying structure of the data can and will change all of the time. Users may enter one structure one day, then something substantially different the next, yet both are still the same data entity. This occurs because the domain is purposely shifting, often to its own advantage. Obviously, you can’t statically encode the entire space of possible changes, because that would involve knowing the future.

Dealing with dynamic data models means pushing the problem back to the users. That is, you give them some really convenient means of keeping up with the changes, like a DSL or a sophisticated GUI, so that they can adapt quickly. That may seem easy, but the problem leaks into both the interface and the persistence. That is, you need some form of dynamic interface that adapts to the changing collection and reporting needs, and you need this whole mess to be held dynamically in a persistence technology. The trick is to be able to write code that has almost no knowledge of the data that it is handling. The breadth and abstract nature of the problem are what make it tricky to implement correctly; it is very rare to see it done well.
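One rough sketch of that idea is below: the structure lives in a user-editable schema that drives collection and validation, so the code itself knows almost nothing about the fields. The field names, types and the flat key/value storage implied here are illustrative assumptions, not a prescription:

    # Schema-driven handling: the users maintain the field definitions,
    # and the code simply interprets whatever schema is current.
    from typing import Any

    CASTS = {"int": int, "float": float, "text": str}

    def validate(schema: list[dict], values: dict[str, str]) -> dict[str, Any]:
        # Cast and check each incoming value against the user-defined schema.
        record = {}
        for field in schema:
            name, kind = field["name"], field["type"]
            raw = values.get(name, "")
            if field.get("required") and raw == "":
                raise ValueError(f"{name} is required")
            record[name] = CASTS[kind](raw) if raw != "" else None
        return record

    # The schema itself is data, so it can change without any code changes.
    schema = [{"name": "weight", "type": "float", "required": True},
              {"name": "notes",  "type": "text"}]
    print(validate(schema, {"weight": "12.5", "notes": "sampled twice"}))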

Scaling

Once the required scale exceeds the hardware capabilities, the system needs to be decomposed into pieces that can execute independently instead of all together. This sizing threshold continually shifts because the hardware is evolving quickly, but there is always some point at which scale becomes the primary technical problem. In a simple system, if the decomposition leads to a set of independent pieces, the problem is only mildly painful: each piece is pushed out onto its own hardware. Sometimes the split is structural, such as separating the backend into a web server and a database server. Sometimes the work can be partitioned, such as putting a load balancer in front of replicated servers.

If the data and code aren’t independent then very complex synchronization algorithms are needed, many of which are cutting edge computer science right now.

Software solutions for scale also exist in the many forms of memoization, such as caching or paging. With caching, however, adding the ability to ‘reuse’ data or sub-calculations also means being able to precisely understand and scope the lifespan of that data; failing to do so makes it easy to accidentally rely on stale results.
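A tiny sketch of scoping that lifespan explicitly is below, using an assumed time-to-live policy; the load_report call in the usage comment is hypothetical:

    # Caching as a form of memoization: reuse earlier results, but only
    # within an explicitly scoped lifespan, so the system never silently
    # relies on stale data.
    import time

    class TTLCache:
        def __init__(self, ttl_seconds: float):
            self.ttl = ttl_seconds
            self.entries = {}            # key -> (expiry_time, value)

        def get(self, key, compute):
            now = time.monotonic()
            hit = self.entries.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]            # still fresh, reuse it
            value = compute()            # stale or missing: recompute
            self.entries[key] = (now + self.ttl, value)
            return value

    # Illustrative usage: report results are reused for at most 60 seconds.
    cache = TTLCache(ttl_seconds=60)
    # report = cache.get(("sales", "2015-12"), lambda: load_report("sales"))

The specific policy matters less than the fact that the lifespan is stated explicitly rather than left implicit.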

Most scaling solutions in and of themselves are not complex, but when several exist together their interactions can be extraordinarily complicated. As the necessary scale grows, the need to bypass more bottlenecks means significant jumps in this complexity and an increased risk of sending the performance backward. This complex interaction makes scaling one of the most difficult problems, and because we don’t have a commonly used toolkit for forecasting the behavior, much of the work is based on intuition or trial and error.

Algorithms

Most derived data is rather straightforward, but occasionally people are looking for subtle relationships within the structure of the data. In this way, there is a need for very complex algorithms, the worst of which is AI (since if we had that, it could find the others). Very difficult algorithms are always a challenge, but at least they are independent of the rest of the system. That is, in most applications they usually only account for a small percentage of the functionality, say 10% to 20%. The rest of the system is really just a routine wrapper that collects or aggregates the data needed to feed them. In that way, these algorithms can be encapsulated into an ‘engine’ that is usable by a routine system, and so the independence is preserved.
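A sketch of that kind of encapsulation is below; the engine interface and the trivial frequency-based placeholder are assumptions for illustration, not any particular algorithm:

    # The complex algorithm hides behind a small, stable interface; the rest
    # of the system stays a routine wrapper that feeds it data.
    from abc import ABC, abstractmethod

    class RecommendationEngine(ABC):     # hypothetical engine interface
        @abstractmethod
        def suggest(self, history: list[str], limit: int) -> list[str]:
            ...

    class FrequencyEngine(RecommendationEngine):
        # A placeholder strategy: the real engine could be arbitrarily
        # sophisticated without changing any of the calling code.
        def suggest(self, history, limit):
            counts = {}
            for item in history:
                counts[item] = counts.get(item, 0) + 1
            ranked = sorted(counts, key=counts.get, reverse=True)
            return ranked[:limit]

    # The routine wrapper only collects the data and displays the results.
    engine: RecommendationEngine = FrequencyEngine()
    print(engine.suggest(["a", "b", "a", "c", "a", "b"], limit=2))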

For some really deep ‘systems’ programming problems, like operating systems, compilers or databases, the state-of-the-art algorithms have advanced significantly and require serious research to understand. Most such systems still have some routine core that intermingles with the other two problems. What often separates systems programming from applications programming is that ignoring what is known, and choosing to crudely reinvent it, is far more likely to produce something defective. It’s best to either do the homework first or use code from someone who has already done the prerequisite research.

Sometimes semi-systems programming concerns blend into routine application development, such as locking, threading, real-time(ish) behavior, etc. These are really a mix of scaling and algorithmic issues that should only be used in a routine system if the default performance is unacceptable. If they are used, significant research is required to apply them properly, and some thought should be given to how to encapsulate them so that future changes don’t turn ugly. It is common to see poor implementations that actually degrade performance instead of helping it, or that cause strange, infrequent bugs that go for years without being found.
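As a small sketch of encapsulating one of these concerns, the illustrative counter below keeps all of its locking behind one interface, so a future change in concurrency strategy stays in one place:

    # Encapsulating a threading concern: the lock is an internal detail,
    # invisible to the rest of the code.
    import threading

    class SafeCounter:
        def __init__(self):
            self._lock = threading.Lock()
            self._value = 0

        def increment(self) -> int:
            with self._lock:             # all synchronization lives here
                self._value += 1
                return self._value

        @property
        def value(self) -> int:
            with self._lock:
                return self._value

    counter = SafeCounter()
    threads = [threading.Thread(target=counter.increment) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)                 # always 8, regardless of scheduling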

Finale

Now, of course, all three of these problems can be required in the same system at the same time, and it can be necessary to run this within a difficult environment, such as one requiring fault tolerance. There are plenty of examples of our constructing such beasts and trying to tame them. At the same time, there are far more examples of mostly routine systems that share few of these problems. In that latter category there are also examples of people approaching a routine system as if it were an extraordinarily complex one, and in those cases their attempted solutions are unnecessarily expensive, unstable or even doomed. Understanding which work is actually routine, then, means knowing what to do and how to go about doing it, so that it is most likely to succeed and be useful in the future.

What we should do as an industry is produce better recipes and training for building routine systems. In an organized development environment, this type of work should proceed in a smooth and highly estimable fashion. For more advanced systems, the three tricky areas can often be abstracted away from the routine base, since they are harder to estimate and considerably riskier. Given all of that, along with strong analysis and design, there is no reason why most modern software development projects should be so chaotic. They can, and should, be under much better control.
