Sunday, November 8, 2015

Software Engineering

The standard definition for engineering from Wikipedia is:

Engineering is the application of mathematics, empirical evidence and scientific, economic, social, and practical knowledge in order to invent, design, build, maintain, research, and improve structures, machines, tools, systems, components, materials, and processes.

But I like it to simplify it into just two rules:

  1. Understanding something really deeply.
  2. Utilizing that knowledge to build stuff with predictable behavior.

With the added caveat that by ‘predictable behavior’ I mean that it is predictable for ‘all’ possible circumstances in the real world. That’s not to say that it must withstand everything that could be thrown at it, but rather that given something unexpected, the behavior is not. There is no need to guess, it can be predicted and will do exactly as expected. Obviously, there is a strong tie between the actual depth of the knowledge and the accuracy of these predictions.

The tricky part about engineering good software is acquiring enough deep knowledge. Although the existing underlying software is deterministic and explicitly built by people, it has been expanding so rapidly over the last five decades that it has become exceptionally convoluted. Each individual technology has blurry conventions and lots of quirky behavior. It becomes difficult to both properly utilize the technology and mitigate it under adverse conditions. Getting something to ‘sort of’ work is not too difficult, but getting it to behave reliably is extraordinarily complex and time-consuming.

For example, if you wanted to really utilize a relational database, that would require a good understanding of the set-theoretical nature of SQL, normalization, query plans, implicit queries, triggers, stored procedures, foreign keys, constraints, transactions, vendor-specific event handling and how to combine all of these together effectively for models that often exceed the obvious expressiveness. When used appropriately, a relational database is a strong persistence foundation for most systems. Inappropriate usage, however, makes it awkward, time-consuming and prone to unsolvable bugs. The same technology that can nearly trivialize routine systems can also turn them into hopeless tangles of unmanageable complexity. The difference is all about the depth of understanding, not the virtues of the actual software.

A big obstacle in acquiring deep knowledge is the lack of authoritative references. Someone could write a book that would explain in precise detail how to effectively utilize a relational database for standard programming issues, but culturally we don’t get that specific because it would be discounted due to subjective objections. That is, anyone with even a small variation of opinion on any subset of the proposed approach would discount the entire approach as invalid, thus preventing it from becoming common. In addition, creativity is valued so highly that most programmers would strongly prefer to rediscover how to use a relational database, over decades, then just adopt the pre-existing knowledge. That is unfortunate because there are more interesting problems to solve if you get past these initial ones.

To get to actual engineering we would have to be able to recognize the routine parts of our problems, and then solve them with standardized components whose ‘full’ behavior is well documented. This would obviously be a lot easier if we had a reliable means of categorizing that behavior. Thus we would not need to consume massive resources experimentally determining what happens if we knew that a technology was certified as ‘type X’ for instance. In that sense, the details need to be encapsulated, but all behavioral changes, such as possible errors, need to be explicitly documented and to strictly follow some standard convention. If we can achieve this, then we have components which can be used and if a programmer sticks with a collection of them from a limited set of categories, they can actually have a full and deep understanding of how they will affect the system. That depth will give us the ability to combine our work on top in a manner that is also predictable. Without -- of course -- having to deeply understand all of the possible conventions currently out there or even the full depth of the underlying technology.

What prevents us from actual software engineering is our own cultural evolution. We pride ourselves on not achieving any significant depth of knowledge, but rather just jumping in and flailing at crude solutions. Not standardizing what we build works in favor of both the programmers and the vendors. The former is in love with the delusion of creativity, while the latter deem it as a means to lock in clients. There is also a persistent fear that any lack of perceived freedom will render the job of programming boring. This is rather odd, and clearly self-destructive, since continuously re-writing ‘similar’ code gradually loses its glamour, resulting in a significant shortening of one’s career. It’s fun and ego fulfilling the first couple of times, but it eventually gets frustrating. Solving the same simple problems over and over again is not the same as really solving challenging problems. We do the first while claiming we are really doing the second.

There are many little-isolated pockets of software engineering in existence right now, I’ve worked in a few of them in the past. What really stands out is that they are considerably less stressful, more productive and you feel really proud of the work. Slapping together crud in a hurry is the exact opposite; some crazy deadline gets met, but that’s it. Still, the bulk of the nearly 18M programmers on the planet right now are explicitly oriented towards just pounding stuff out. And of the probably trillions of lines of code that are implicitly relied on by any non-trivial system, more and more of it is utilizing less and less knowledge. It is entirely possible to create well-engineered software, and it is possible to achieve decent quality in a reasonable amount of time, but we are slipping ever farther away from that goal.

At some point, software engineering will become mandated. As software ‘eats the world’ the unpredictability of what we are currently creating will become increasingly hazardous. Eventually, this will no longer be tolerated. Given its inevitability, it would be far better if we voluntarily refactored our profession instead of having it forced on us by outsiders. A gentle realignment of our culture would be less of a setback than a Spanish-style inquisition. It’s pretty clear from recent events that we are running out of time, and it’s rather obvious that this needs to be a grassroots movement. We can actually engineer software, but it just isn’t happening often right now and it certainly isn’t a popular trend.