Thursday, February 7, 2019

Implementing Sophistication

Computers are intrinsically stupid.

To get around this problem, programmers have to take all of the knowledge they have acquired, visualize it in a way that makes it codable, and then implement it in software.

The easiest approach to this is to be as crude as possible. The coder does nothing other than sling around unknown data; all of the complexity is pushed either to the users or to the operating environment. This gets the baseline mechanics up quickly, but it does create a fragile system that isn’t extendable. It’s good for a demo, but rarely solves any real underlying problems.

Sophistication comes from the code taking on more and more of the problem in a way that is reliable. Instead of pushing back unintelligible character strings for users to track, a sophisticated system employs powerful navigation so that they don’t have to remember anything. Instead of crashing poorly and requiring a long recovery time, a sophisticated system bounces right back up, ensuring that all of its components are in working order. The effort pushed back onto the ‘users’ is minimized, while the functioning of the system is predictable.

It’s a huge amount of work to add sophistication to software. We can crank out flaky websites quickly, but actually building deep functionality takes time, skill and knowledge. Often people believe that it is not necessary. They figure that it’s only a little extra work pushed back onto the users, so it is okay. But given how much of our time modern software wastes, it should be more than obvious that we aren’t making good use of our hardware and all of the electricity that we pour into it. Crude software doesn’t really automate processes for us; rather, it just shifts the way we waste our time.

Sophistication starts with understanding the data. Since computers are confined to the digital realm, at best they can only ‘symbolically’ represent things in the real world. These representations are bound by real-world constraints that are often extraordinarily complicated. It takes time to map their informal behavior into a rigorous formal environment, but that time isn’t wasted. If the system properly encapsulates the data it manages, then it doesn’t need external help or hacks while it is operating. If the data is a mess, then any or all of the computations on it are suspect. Bad systems always have bad data models; the two are intertwined.
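To make the idea of encapsulating real-world constraints a little more concrete, here is a minimal Python sketch built around a hypothetical money value from a finance domain. The `Money` class and its rules (integer cents, a three-letter uppercase currency code, no mixing of currencies) are illustrative assumptions, not something from the original post. The point is that bad data is rejected once, when the value is created, so downstream code never has to re-check it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Hypothetical value type: an amount in minor units (e.g. cents) plus a currency code."""
    cents: int
    currency: str

    def __post_init__(self):
        # Enforce the constraints once, when the value is created,
        # so the rest of the system can trust every Money it receives.
        if not isinstance(self.cents, int) or isinstance(self.cents, bool):
            raise TypeError("cents must be an integer")
        if not (len(self.currency) == 3 and self.currency.isalpha() and self.currency.isupper()):
            raise ValueError(f"invalid currency code: {self.currency!r}")

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        # Adding two different currencies is meaningless, so the type forbids it.
        if other.currency != self.currency:
            raise ValueError(f"cannot add {self.currency} to {other.currency}")
        return Money(self.cents + other.cents, self.currency)


total = Money(1050, "CAD") + Money(250, "CAD")
print(total)  # Money(cents=1300, currency='CAD')
```

Storing cents as an integer rather than a float is one common choice for avoiding rounding drift; other domains would encode their own constraints in the same place.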

Really understanding data is complicated, and it usually means going beyond the normal branches of programming knowledge and directly into domain knowledge. For some people, this is the interesting part of programming, but most try very hard to avoid building up depth in any particular domain. That is unfortunate, since the same basic ‘structural’ domain issues span areas like finance, healthcare, etc. From an implementation standpoint, the usages are very similar, and digging into one domain can lead to insights in others. Uniqueness and time, for instance, are common to everything, even if the instances of those problems are steeped in domain-specific terminology.
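As a rough illustration of how uniqueness and time recur across domains, the Python sketch below uses one small structure for both a bank account’s interest rate and a patient’s address. The `Registry` and `History` classes, the identifiers and the dates are all hypothetical, and effective-dated history is just one common way to model time, assumed here for illustration.

```python
import bisect
from datetime import date


class History:
    """Values of one attribute over time, each with the date it took effect."""

    def __init__(self):
        self._dates = []   # kept sorted so lookups can use binary search
        self._values = []

    def set(self, effective, value):
        i = bisect.bisect_left(self._dates, effective)
        if i < len(self._dates) and self._dates[i] == effective:
            self._values[i] = value  # same effective date: replace the value
        else:
            self._dates.insert(i, effective)
            self._values.insert(i, value)

    def as_of(self, when):
        # Find the latest entry whose effective date is on or before `when`.
        i = bisect.bisect_right(self._dates, when)
        if i == 0:
            raise KeyError(f"no value in effect on {when}")
        return self._values[i - 1]


class Registry:
    """Enforces uniqueness: each identifier maps to exactly one History."""

    def __init__(self):
        self._items = {}

    def add(self, key):
        if key in self._items:
            raise KeyError(f"duplicate identifier: {key!r}")
        self._items[key] = History()
        return self._items[key]

    def get(self, key):
        return self._items[key]


# Finance: an account's interest rate changes over time.
accounts = Registry()
rate = accounts.add("ACCT-001")
rate.set(date(2019, 1, 1), 0.020)
rate.set(date(2019, 6, 1), 0.025)
print(rate.as_of(date(2019, 3, 15)))  # 0.02

# Healthcare: a patient's address changes over time, same structure.
patients = Registry()
address = patients.add("MRN-42")
address.set(date(2018, 5, 1), "12 Elm St")
print(address.as_of(date(2019, 2, 7)))  # 12 Elm St
```

The domain vocabulary differs (rates versus addresses, account numbers versus record numbers), but the underlying structure is identical, which is why depth in one domain tends to transfer.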

If the base of the system rests on an organized and complete data model, the construction of the system is quite easy. Most code, in most systems, is just moving the data from one part of the system to another. The trick is to maximize reuse.

Coding is still a slow, tedious operation, particularly when you include the work of testing and debugging. Reuse means that finalized, well-edited code is deployed repeatedly, which can eliminate huge amounts of work. That is, the only ‘good’ code is code that has been heavily edited and battle-tested. Anything else is fresh code, and it should be assumed to contain significant bugs. This is always a safe assumption, and it makes it easier to understand the amount of work involved in releasing a new version of a system.

In most systems, there are huge opportunities for reuse, but they often require deep abstraction skills. Unfortunately, this puts them out of reach for most development efforts. Leveraging them requires a significant up-front investment that few organizations are willing to gamble on. It’s not possible, for instance, to convince management that spending an extra six months up front will save years of work down the road. Our industry is too impatient for that. Still, one can identify reuse and slowly refactor the code in that general direction, without having to commit significant resources immediately. This spreads the effort over the full duration of the project, but it requires that this type of work is not abandoned halfway through. Thus modern programming should accept that reuse and refactoring are bound together, and that the latter is the means to achieve the former.
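A small, hypothetical Python sketch of refactoring toward reuse: an export function that carried its own copy of a CSV-writing loop is replaced by a shared helper, so fixes and tests land in one place. The function names and record shapes are invented for illustration; only the standard `csv` and `io` modules are real.

```python
import csv
import io


# Before: each export carried its own copy of the same loop.
def customers_to_csv_old(customers):
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "name"])
    for c in customers:
        writer.writerow([c["id"], c["name"]])
    return out.getvalue()


# After: one shared helper that every export reuses.
def rows_to_csv(rows, columns):
    """Write dict-like rows as CSV, using `columns` for both the header and field order."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[col] for col in columns])
    return out.getvalue()


def customers_to_csv(customers):
    return rows_to_csv(customers, ["id", "name"])


def orders_to_csv(orders):
    return rows_to_csv(orders, ["id", "total"])


customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
# The refactored version must produce exactly what the old one did.
assert customers_to_csv(customers) == customers_to_csv_old(customers)
```

Checking each step against the old behaviour, one duplicate at a time, is what lets this kind of refactoring be spread across the life of a project rather than funded up front.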

Big sophisticated systems take years, if not decades, to build. That is never how they are pitched; the time frame is usually ridiculously short and overly ambitious. Still, any developer who has been through a number of big projects is aware that the amount of work invested is massive. Because of this, systems in active development are continuously being extended to do more than their original designs intended. This is quite dangerous for the software, because the easiest way to extend code is to build some independent functionality on the side and barely integrate it. This type of decay is extremely common. It is a lot less work to slap something different onto the edge than it is to revisit the underlying data model and work out how to enhance it. But each time this shortcut is chosen, the system gets considerably less sophisticated, more fragile and more bug-prone. What seems to be the faster way of achieving our goals is actually extremely destructive in the long run. So sophistication isn’t just an initial design goal; it is an ongoing effort that continues as long as new capabilities and data are being added to the system. Sophistication can be watered down or essentially removed by poor development efforts.
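A hypothetical Python sketch of the difference, assuming a new ‘shipping address’ capability is being added to an existing customer record. The bolted-on version keeps a side table keyed by display name; the integrated version extends the core `Customer` type. All names here are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str
    # Integrated: the new capability is part of the core model. The default of
    # None keeps existing code that builds Customer(id, name) working.
    shipping_address: Optional[str] = None


# Bolted on: a side table the rest of the system knows nothing about,
# keyed by display name because that was the quickest thing to hand.
shipping_by_name = {}

ada = Customer("C-1", "Ada", shipping_address="1 Main St")
shipping_by_name[ada.name] = "1 Main St"

# Later, an unrelated feature renames the customer.
renamed = replace(ada, name="Ada L.")

print(renamed.shipping_address)            # 1 Main St
print(shipping_by_name.get(renamed.name))  # None: the side table silently drifted
```

The integrated field travels with the record through every change; the side table only stays correct if every other part of the system remembers it exists.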

Given that adding sophistication to a system is extremely time-consuming, coders have to learn how to be efficient in order to be able to meet most development constraints.

The first issue is that not all of the time spent building the system should be actual coding. In fact, coding should be the last, most routine part of the effort. Programmers need to learn how to acquire an understanding first, then visualize a solution, and only at the end sit down and start fiddling with the code. Diving head first into the code and getting lost there always wastes a huge amount of time. As well, new programmers are often hesitant to delete their code, so instead they build up unintelligible, disorganized messes that they flail at to fix a never-ending set of bugs. Code gets sticky, and that causes its own problems. Fear of changing code often leads to writing more bad code.

Sometimes, the best approach to fixing the code is to walk away from the computer. Research (textbooks, blogs, etc.) and bouncing ideas off other programmers are two really critical but underused approaches to being more efficient. If you are having trouble explaining what the code should do, then you probably don’t understand it well enough to get it to work properly. It’s a waste of time to fight with code.

Efficiency also comes from non-development activities. Micro-management is a very popular approach these days for software development projects, but it can be horrifically misapplied and lead to tonnes of make-work. Stakeholders need some level of accountability for the work they commission, but software development should never be driven by non-technical people. They don’t understand the priorities, so they focus on the shallow issues that are most visible to them, while the real problems in development come from the details. This always leads to a quick death as the technical debt overwhelms the ability to progress. A reasonable methodology can help, but it is tricky to apply properly. Methodology for small projects is trivial, but its complexity grows at least exponentially with the size of the project. Keeping large-scale development projects from imploding or exploding is a very difficult and concentrated skill, quite different from either coding or architecture. In this sense, software development is intrinsically unscalable. More resources often result in more make-work, not real progress.

Realistically, it isn’t difficult to type in a small set of instructions for a computer to follow. It is difficult, however, to type in a complete set of instructions that would help a user reliably solve one of their problems. We often confuse these two things, and a great deal of modern software development is about claiming to have done the second by only doing the first. As the software development industry matures, we should be able to do more of this enhanced type of development, and we get there by moving beyond our crude practices and adding in sophisticated code. This type of coding takes longer, is harder and requires more skill, but ultimately it will make computers significantly more usable for us. We shouldn’t have to accept so many software problems; we shouldn’t let our standards remain so low. Computers are amazing machines that still have a huge ability to improve our lives. Sophisticated software is what makes this possible.
