Thursday, November 17, 2022

Software Development Power Tools

You can type out a million lines of code. It will take you quite a while and it will have an incredible number of bugs, but you can do it.

Along the way, you can make all of the code as concrete as possible. Code out each and every special case separately. Make everything isolated, so that the scope of any change is minimized.

This is the brute-force approach to programming. You just go through the long and hard exercise of pounding it all out separately. Then you go through the long and painful exercise of testing each and every possible permutation. Right at the end, you will get a workable codebase.

It works, but it is not particularly efficient, so it doesn’t scale at all. You can use it for small and maybe medium-sized systems, but that’s it.

Or, you can learn how to use power tools.

They are the digital equivalent of construction workers’ power tools, like nail guns, chainsaws, power sanders, or Sawzalls. And like their physical counterparts, while they can massively increase efficiency, they also require new skills and are quite dangerous, so you have to be very careful with them. The web is littered with stories of coders injuring themselves while recklessly playing around.

A software development power tool has a few basic attributes. The first is that when applied correctly it will save you a lot of time in coding. The second is that because it requires less code, it will also require less testing and cause fewer operational issues. The third is that it will take more cognitive effort while working. And the final attribute is that the code will actually be harder to debug, although there will be way, way fewer bugs.

When power tools are used properly they will massively cut down on code, testing, bugs, etc. When they are used poorly they can waste a lot of time, cause far more bugs, cause more damage to the code, and occasionally derail entire projects.

We can start with the most obvious power tool category: reuse. If you retype the same code over and over again, it obviously takes way longer, and as the code changes, the different variations fall out of sync with each other and cause bugs. But reusing code isn’t easy. You have to make it safe, quick, and convenient for everyone or they won’t do it. So, while it is easy to say, it is actually hard to enforce.
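
As a minimal sketch in Python (the function names and the name-length rule here are hypothetical), compare the same validation typed out twice with a single shared helper:

    # Duplicated: the same rule typed out twice, already free to drift apart.
    def create_user(name):
        if not name or len(name) > 64:   # copy #1
            raise ValueError("bad name")
        return {"name": name}

    def rename_user(user, name):
        if not name or len(name) > 64:   # copy #2
            raise ValueError("bad name")
        user["name"] = name
        return user

    # Reused: one definition, one place to fix or change the rule.
    def validate_name(name):
        if not name or len(name) > 64:
            raise ValueError("bad name")
        return name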

The granddaddy of all power tools: abstraction. Instead of pounding out each and every special case, you take a few steps back and come up with some generalization that covers all of them correctly. Of course, if you go too far back it becomes useless, and if you swing out in the wrong direction it becomes convoluted. Finding a perfect abstraction is a near art form, but if you do, huge programs can shrink down into manageable engines.
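
A small sketch of the idea, assuming some hypothetical per-customer discount rules: instead of one hand-coded function per case, a single generalized rule driven by a table covers them all, and a new case is just a new row:

    DISCOUNTS = {"regular": 0.00, "loyal": 0.05, "vip": 0.10}

    def price(base, customer_type):
        # One general rule replaces N special-cased functions; unknown
        # types safely fall back to no discount.
        return base * (1.0 - DISCOUNTS.get(customer_type, 0.0))

    print(price(100.0, "vip"))   # 90.0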

Complexity has a tendency towards exponential growth, so the next power tool tackles it directly: encapsulation. It’s often described as information hiding too, which is similar. Instead of having the guts of everything intermixed as one big exposed mess, or redoing it all as silos, you package the lower parts as well-defined, trustable, reusable opaque boxes. Internally they do something complex, but on the outside some of that complexity is taken away from the caller. All major technologies, such as operating systems and databases, do this for programmers. It is used everywhere, sometimes well, sometimes poorly. Its companion power tool is ‘layers’; layers are at full strength only if each one encapsulates enough complexity to make it worthwhile.
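
A minimal sketch, with a made-up event counter: callers depend only on the small interface, so the internals can be reworked at any time without touching them:

    class EventCounter:
        def __init__(self):
            self._counts = {}   # internal detail; free to become a DB, a file, etc.

        def increment(self, key):
            self._counts[key] = self._counts.get(key, 0) + 1

        def total(self):
            return sum(self._counts.values())

    c = EventCounter()
    c.increment("login")
    c.increment("login")
    print(c.total())   # 2, without the caller ever touching the internals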

Statically encoding anything is always fragile, so the next power tool gets around that: dynamic behavior. Most of the time, the computer itself does not need a lot of the knowledge that the programmer has carefully placed in the code. So, the practice is to write code that places only the absolute minimum constraints on itself. Instead of keeping a list of integers, for example, you keep a list of “anything”. The elements in the list are dynamic, so the underlying types can vary. Beyond data structures, you can apply these ideas all over the place. The backend might not know about the specifics of any of its data; it just blindly grabs structures in response to requests. The front end doesn’t have to know either. It gets structures, maps them to widgets and display elements, then interacts with the user. None of the code knows about or understands the domain; the data is just data. Of course, eventually the rubber has to hit the road, or all you have built is another domain-specific programming language, so somewhere in the system the configuration is actuated, and those details get passed around as more data. It is a bit tricky to set up, but immensely powerful afterward. Many of the huge, popular vendor products are variations on this; some are full dynamic workflow implementations.
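
A tiny sketch of the idea, with hypothetical store and layout data: none of the logic below knows the domain; it just moves generic structures around, driven by configuration:

    def fetch(store, request):
        # The "backend": blindly returns whatever structure matches the request.
        return store.get(request)

    def render(record, layout):
        # The "frontend": maps fields onto display slots, driven by the layout.
        return " | ".join(str(record.get(field, "")) for field in layout)

    store = {"cust/42": {"name": "Ada", "balance": 10.0}}   # just data
    layout = ["name", "balance"]                            # the configuration
    print(render(fetch(store, "cust/42"), layout))          # Ada | 10.0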

The expression ‘premature optimization is the root of all evil’ is just a warning that this next approach is also a power tool: optimization. The tendency is to think that optimizations are algorithmic ways to cut down on computation, but really they are far more than that. For example, you can remember and reuse data you have seen before; the full name for this is memoization, and a sub-variant of it, caching, is quite popular. Some approaches that people think of as optimizations, like space-time trade-offs, are actually not; the resources have shifted, not gotten smaller. Most categories of optimizations are fairly well documented, but they require significant research in order to fully understand and apply them. Attempting to reinvent an existing optimization can set you back 20 to 40 years of accumulated knowledge.
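
As a minimal sketch, Python’s built-in functools.lru_cache is a ready-made memoizer; the Fibonacci function is just a stand-in workload:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fib(n):
        # Each subproblem is computed once, then remembered and reused.
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(80))   # instant, instead of an exponential blowup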

A cousin of optimizations: concurrency. It’s not technically an optimization, in that you’re doing the same amount of work; you’re just distributing it to more computing engines all at the same time. To get it right you have to deeply understand atomicity and locking. When it’s wrong, it often manifests as Heisenbugs that are nearly impossible to find or correct. Multi-process work was difficult enough, but multi-threading removed the safeties. Dealing with it is mandatory to be able to scale, but small and many medium systems don’t need it at all, and the significant management overhead it causes can outweigh the benefits.
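
A small sketch of why atomicity matters: the read-modify-write below is not atomic, so without the lock, concurrent threads can interleave and silently lose updates:

    import threading

    counter = 0
    lock = threading.Lock()

    def work():
        global counter
        for _ in range(100_000):
            with lock:          # remove this and updates can silently be lost
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)              # 400000 every time, because of the lock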

At a more micro level: syntactic sugar. This occurs when there are two or more ways to encode the same thing in a programming language, but one has a drastically reduced footprint. Used properly, it is less code and better readability. Used badly, it is near-zero readability and possibly a landmine. It’s the easiest power tool to access, and it is often the first one that people start to play with, but it is also the one that is the most abused and has caused the most harm.
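
A minimal example: both snippets below build the same list, but the comprehension has a drastically smaller footprint:

    squares = []
    for n in range(10):
        if n % 2 == 0:
            squares.append(n * n)

    # Same result, one line; overdo the nesting and the readability gain inverts.
    squares = [n * n for n in range(10) if n % 2 == 0]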

You can’t discuss power tools without discussing the ubiquitous one: paradigms. Despite fancy language features, most code these days is really just Procedural; that is, there are lots of functions cobbled together for questionable reasons. Object Oriented, Functional, etc. are deep paradigms that are embedded directly into the underlying languages. To get value from them, you have to really learn how to use them correctly. It’s a lot of often conflicting knowledge, with many different eclectic flavors. Misusing a paradigm leads people down dark holes of wasted effort. Most often, you shouldn’t even mix and match them; it rarely works well. Probably the best approach is to pick one and use it consistently, and if it starts to not fit nicely, tweak it everywhere until it improves.
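
As a small sketch, here is the same toy computation written procedurally and then functionally; either style is fine on its own, but weaving them together line by line is where the trouble usually starts:

    data = [3, 1, 4, 1, 5]

    # Procedural: explicit state, mutated step by step.
    total = 0
    for n in data:
        if n > 1:
            total += n * 2

    # Functional: the same computation as one composed expression.
    total = sum(n * 2 for n in data if n > 1)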

An oldie but goodie: normalization. Usually associated with just data, it can be applied to code as well. Instead of tossing the data into any haphazard structure in the database, you apply a strict set of rules to rearrange it and make sure that it has useful properties, like no redundancy. It’s a bit obtuse, but even partially done it makes the code way smaller and cleaner, which cuts down on bugs, hacks, and very expensive data fixes. Data forms the foundation of every codebase, and building on a broken foundation never goes well.
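
A minimal sketch with plain structures and hypothetical order data: the flat version repeats a fact on every row, while the normalized version stores it exactly once and joins it back when needed:

    # Denormalized: the customer's city is repeated on every order, so one
    # move means many updates and a chance for the copies to disagree.
    flat_orders = [
        {"order": 1, "customer": "Ada", "city": "Paris"},
        {"order": 2, "customer": "Ada", "city": "Paris"},
    ]

    # Normalized: each fact is stored exactly once and referenced by key.
    customers = {"Ada": {"city": "Paris"}}
    orders = [{"order": 1, "customer": "Ada"},
              {"order": 2, "customer": "Ada"}]

    city = customers[orders[0]["customer"]]["city"]   # join back when needed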

There are probably a lot more power tools out there, and no doubt there are still some new ones yet to be discovered.

In the panicked rush to code, people often avoid power tools, but that is frequently a rather limiting direction. Even if it starts out small, almost any successful codebase will continue to grow. Once it passes into medium-sized territory, if you still want to go farther, you have to use the techniques and skills that will allow you to keep going. To build large systems, strong organization and power tools are absolute necessities, although lots of people end up learning that the hard way.
