Sunday, March 13, 2022

Paradigms, Patterns and Idioms

To build large, sophisticated software, you have to learn how to avoid wasting time on make-work, particularly on creating redundant code.

If the codebase is small, duplicating a few things here or there really isn't noticeable. But when the code is large or complex, it becomes a crushing problem. Sloppy coding doesn’t scale.

Over the decades, lots of people have defined or discussed the issues. Some of their findings make it into practice, but often as new generations of programmers enter the field, that type of knowledge gets ignored and forgotten. So it’s always worth going back over the fundamentals.

A paradigm is an overall philosophy for arranging the code. This would include Procedural, Object-Oriented, Functional, ADT, Lambda Calculus, or Linear Algebra, but there are probably a lot of others.

Procedural is basically using nothing but functions; there is no other consistent structure. OO decomposes by objects, Functional tends to lean on Category Theory, while languages like Lisp aim for lambda calculus. APL and MATLAB are oriented around vectors and matrices from linear algebra.

Object-Oriented is the most dominant paradigm right now; you decompose your code into a lot of objects that interact with each other.

However, lots of programmers have trouble understanding how to do that successfully, so it’s not uncommon to see minimal objects at the surface that are filled with straight-up Procedural code. There are some functions, but the code tends to hang together as very long sequences of explicit high- and low-level instructions. The wrapping objects, then, are just pro forma.

A paradigm can be used on its own, or it can be embedded directly into the programming language. You could do Object-Oriented coding in straight-up C (it is messy but possible), but C++ makes it more natural. The first C++ compilers were just preprocessors that generated C code. That’s a consequence of all of these languages being Turing-complete: there is almost always a way to accomplish the same thing in any language; it’s just that the syntax might get really ugly.
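
As a rough illustration of doing it by hand, the classic trick for object-oriented style in plain C is a struct of function pointers, with the “object” passed explicitly. This is just a sketch with made-up names (Shape, Circle), not a recommendation of any particular layout.

```c
#include <stdio.h>

typedef struct Shape Shape;
struct Shape {
    double (*area)(const Shape *self);   /* a "virtual method" slot */
};

typedef struct {
    Shape base;      /* "inheritance" by embedding the base struct first */
    double radius;
} Circle;

static double circle_area(const Shape *self) {
    const Circle *c = (const Circle *)self;   /* the downcast a C++ compiler would do for you */
    return 3.14159265358979 * c->radius * c->radius;
}

int main(void) {
    Circle c = { { circle_area }, 2.0 };
    Shape *s = (Shape *)&c;                   /* polymorphic use through the base type */
    printf("area = %f\n", s->area(s));
    return 0;
}
```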

We even see that with strict typing; if there is a generic workaround like Object, you can implement loose typing. Going the other way is obvious. In that sense, we can see strict typing as a partial paradigm itself that may be implemented directly in the language or layered on top.
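
In a language with a top type like Object, any value can flow through the same variable; the closest analogue in C is a void pointer plus a tag. A rough sketch of that loose-typing layer, with hypothetical names:

```c
#include <stdio.h>

typedef enum { TAG_INT, TAG_STR } Tag;

typedef struct {
    Tag tag;
    void *data;    /* the generic escape hatch, playing the role of Object */
} Value;

static void print_value(Value v) {
    switch (v.tag) {
    case TAG_INT: printf("%d\n", *(int *)v.data); break;
    case TAG_STR: printf("%s\n", (const char *)v.data); break;
    }
}

int main(void) {
    int n = 42;
    Value a = { TAG_INT, &n };
    Value b = { TAG_STR, "hello" };
    print_value(a);   /* the same variable type carries either payload */
    print_value(b);
    return 0;
}
```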

Some languages allow you to mix and match paradigms. In C# you can easily do Procedural, Object-Oriented, Functional, and Declarative, all at the same time, intermixed. However, that is an incredibly bad idea.

Complexity exists at all of the borders between paradigms, so if the code is arbitrarily flipping back and forth, there is a huge amount of unnecessary artificial complexity that just makes it harder to infer that the code is doing what it is supposed to be doing.

So the key trick for any codebase is to commit to a specific paradigm, then stick to it.

We tend to allow new programmers to switch conventions on the fly, but that is always a mistake. It’s a classic tradeoff: save time today by not learning the existing codebase, but pay for it a few releases later. Software is only as good as the development team’s ability to enforce consistency. Big existing codebases have a long, slow ramp-up time for new coders; not accepting that is a common mistake.

Within a paradigm, large parts of the work may follow a routine pattern. Before the GoF coined the term ‘Design Patterns’, people used all sorts of other terms for these repeating patterns, such as ‘mechanisms’.

It might seem that data structures are similar, but a design pattern is a vague means of structuring some code to guarantee specific properties, whereas data structures can be used as literal units of decomposition.

One common mistake, called Design Pattern Hell, is to try to treat the patterns as if they were explicit building blocks; you see this most often when the pattern names become part of the naming conventions. Then the coders go through crazy gyrations to take fairly simple logic and shove it awkwardly into the largest possible number of independent patterns. Not only does it horrifically bloat the runtime, but the code is often extra convoluted on top. Poor performance and poor readability.

But patterns are good, and consistently applying them is better. Partly because you can document large chunks of functionality just by mentioning the pattern that influenced the code, but also because programmers often get tunnel vision on limited parts of the computations, leaving the other behaviors erratic. Weak error handling is the classic example, but poor distributed programming and questionable concurrency are also very popular. If you correctly apply a complex pattern like flyweights or model-view-controller, the pattern assures correctness even if the coder doesn’t understand why. There are far older patterns, like using finite state machines as simple config parsers or producer-consumer models for handling possible impedance mismatches. Applying the patterns saves time and cognitive effort, while still managing to not reinvent the wheel. It’s just that they aren’t literal. They are just patterns.
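
As one small illustration, here is a sketch of the finite-state-machine style of config parsing: the states make the handling of each character explicit instead of scattering it across ad hoc conditionals. The function and state names are made up, and bounds checks are omitted to keep the sketch short.

```c
#include <stdio.h>

typedef enum { IN_KEY, IN_VALUE, DONE } State;

/* Parse one "key=value" line; callers supply buffers large enough for the line. */
static void parse_line(const char *line, char *key, char *value) {
    State state = IN_KEY;
    size_t k = 0, v = 0;
    for (const char *p = line; *p && state != DONE; p++) {
        switch (state) {
        case IN_KEY:
            if (*p == '=') state = IN_VALUE;   /* transition, don't copy the '=' */
            else           key[k++] = *p;
            break;
        case IN_VALUE:
            if (*p == '\n') state = DONE;      /* end of the entry */
            else            value[v++] = *p;
            break;
        case DONE:
            break;
        }
    }
    key[k] = '\0';
    value[v] = '\0';
}

int main(void) {
    char key[64], value[64];
    parse_line("timeout=30\n", key, value);
    printf("%s -> %s\n", key, value);   /* prints: timeout -> 30 */
    return 0;
}
```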

At a lower level are idioms. They are usually scoped very tightly, around a specific block or two of code. Idioms are the least understood conventions. You see them everywhere, but few people recognize them as such.

Some sub-branches of coding, like systems programming, rely more heavily on applying them consistently, but most of the less rigorous programming, like application code, doesn’t bother. The consequence is that when application code dips into issues like catching or locking, it is usually very unstable. Kinda works, but not really. Getting it right, for locking say, means choosing the right idiom and rigorously making sure it is applied everywhere. Inconsistencies usually manifest as sporadic failures that are nearly impossible to debug.
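
One common locking idiom, for example, is to route all access to a piece of shared state through a couple of functions that own the lock, so the lock/unlock pairing cannot drift between call sites. A minimal pthreads sketch, with hypothetical names:

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

void counter_increment(void) {
    pthread_mutex_lock(&counter_lock);
    counter++;                          /* all mutation happens under the lock */
    pthread_mutex_unlock(&counter_lock);
}

long counter_read(void) {
    pthread_mutex_lock(&counter_lock);
    long value = counter;               /* reads take the same lock, no exceptions */
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```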

There are way too many idioms to try and list them. They cover everything from the way you declare your code, to guard checks like asserts, to the way loops are unrolled. Most of them either enforce strictness, help with performance, or assure other critical properties. They are forgotten as fast as they are invented, and often differ by language, tech stack, culture, or expected quality. But finding good strong idioms is one of the key reasons to read a lot of other people's code; literate programmers are always stronger than illiterate ones.
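
As a tiny example of the guard-check idiom, every entry point asserts its preconditions up front, so violations fail loudly and close to the cause instead of quietly corrupting state downstream. The function is hypothetical, just to show the shape:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a name into a fixed buffer, truncating if necessary. */
size_t copy_name(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL);            /* guard checks state the contract explicitly */
    assert(src != NULL);
    assert(dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size)            /* runtime guard for the recoverable case */
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```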

Enforcing idioms is tricky. It should be part of code reviews, but a lot of people mistake idioms for being subjective. That might be livable for hasty in-house applications programming, but idioms really are key to getting quality and stability. Consistency is a key pillar of quality. You might initially pick a weak idiom (it happens), but if you were consistent, it is nearly trivial to upgrade it to something better. If you weren’t consistent, it’s a rather nasty slog to fix it, so it will probably get stuck in its low-quality state.

The biggest problem with all of these concepts is that most programming cultures strongly value freedom. The programmers want the freedom to code things whichever way they feel like it. One day they might feel ‘objecty’, the next it is ‘functionish’.

But being that relaxed with coding always eats away at the overall quality. If the quality degrades far enough and the code is really being used by real people for real issues, the resulting instability will disrupt any and all attempts to further improve the code. It just turns into a death march, going around in circles, without adding any real value. Fixing one part of the system breaks the other parts, so the programmers get scared to make significant changes, which locks in the status quo or worse. Attempts to avoid that code by wrapping layers around it like an onion might contain the issue, but only by throwing in lots of redundant code. The same goes for just adding things on the side. All of these redundancies are make-work; they didn’t need to happen, and they are forced on people because the real issues are being avoided.

Ultimately, coding is the most boring part of software development. You’ve identified a tangible problem and designed a strong solution; now you just need to grind through the work of getting it done. The ever-frequent attempts at making code fun or super creative have instead just made it stressful and guaranteed to produce poor quality. It seems foolish to just keep grinding out the same weak code, only to throw it away and do it all over again later. If we’d just admit that it is mostly routine work, maybe we’d figure out how to minimize it so that we can build better stuff.
