Sunday, June 14, 2020

Naturally Encapsulated Coding

In a number of different systems, more so recently, I’ve seen a lot of code that I refer to as ‘brute force’.

It manifests itself as huge convoluted functions that do all sorts of disconnected things, as well as a lot of hardcoded data and often a crazy number of ‘if’ statements. Extreme examples are highly fragmented as well; it’s hard to follow the logic since it is distributed all over the code and the flow bounces around erratically.

Generally, code gets built that way because that’s how the programmers want to think about the problems they are given, and many of them feel that it is easier if the solution is explicitly coded into the program in exactly that same way.

The concern is that this type of code is highly redundant, fragile, and generally, once the program gets large enough, nearly impossible to keep extending. So, what starts out as an easier and cheaper way to get quick functionality into the code eventually becomes a significant blocker that degrades the project’s ability to move forward.

Underneath, the primary issue seems to be the way people think about implementing a solution.

Most programmers are pretty much left to their own devices to figure out larger code structuring issues. It’s not really talked about or taught, and even though it is somewhat implied by a paradigm like object-orientation, it was rarely ever explicitly stated that way. As such, unless you ended up working with code that was well-structured at some point in your career, you’re probably not going to figure it out on your own (unless you are a mathematician). You’ll just default to what you are most comfortable with, unaware that in doing so, you are gradually making your job way harder.

Instead of seeing the work to be done by a computer as a very long list of instructions that need to be executed, we need to flip our perspective around. Data is the dual of code. They are both necessary for code to execute, but in many ways seeing the computation from the data perspective is a lot easier than seeing it from the code perspective.

So, let’s say that there are 250 little execution steps that need to be triggered for a new feature.

The first set of instructions is about fetching some data, X, from persistence somewhere, and ensuring that it is good data.

So, we can consider the translation null -> X. Basically, we start with nothing, then we get an X. But, as is usually the case, we almost never really start with nothing; really, we are given some search data S, and that is what we need to find X.

So, rather obviously, and kinda cleanly, we just need something in the code that looks like getX(S) -> X. Now, that thing may go to the database, and as it travels there, it may find out that the connection to the database doesn’t exist yet, and the config info needed to initialize it may not be in memory; it might still be somewhere on disk. But we need this DB thing to be up.

So, we follow the same mechanics with getDB(C) -> DB, where C is some config information. If we don’t have that yet, again the pattern repeats as getC() -> C.
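That chain of lookups can be sketched directly. The minimal example below uses a JSON file and sqlite3 as stand-ins for real config and a real database; the function names mirror the text, but everything else is illustrative, not from any particular system.

```python
# Sketch of the getC() -> C, getDB(C) -> DB layering, assuming a JSON
# config file and sqlite3 as a stand-in for a real database driver.
import json
import os
import sqlite3
import tempfile

def get_c(path):
    """getC() -> C: read the config information from disk."""
    with open(path) as f:
        return json.load(f)

def get_db(c):
    """getDB(C) -> DB: open a connection described by the config."""
    return sqlite3.connect(c["database"])

# One-time setup standing in for a deployed config file.
cfg_file = os.path.join(tempfile.mkdtemp(), "config.json")
with open(cfg_file, "w") as f:
    json.dump({"database": ":memory:"}, f)

c = get_c(cfg_file)   # null -> C
db = get_db(c)        # C -> DB
```

Each function only knows about the data it consumes and the data it produces, which is what keeps the messy DB/C work decomposed away from everything else.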

This whole DB/C thing is messy, and we want it decomposed away from the other work we are doing. It’s probably a one-time initialization that may or may not happen lazily. There is also fault handling that should be wrapped around it to handle periods of unavailability or bad queries.

So, if we are being nice, and we want getX() to not be stateful, we pass the connection in from above, in some type of environmental context that understands the running issues in the system; let’s call this E, for environment.

So, getX(E,S) -> X is what we need to build. That gives us our base work.

Now X is probably raw data, and it needs to be prettied up according to some context that was given to us from above. We’ll call that the user context, or U. So to get to pretty data X’ we need something like:

     decorate(U,X) -> X’

At that point, we can distribute X’ to a set of widgets for a screen, or send it down some pipe and then to a bunch of widgets. Putting this all together we get:

once-in-a-while:
     getC() -> C
     getDB(C) -> DB
     DB -> E

per X-type request:
     -> U,S
     getX(E,S) -> X
     decorate(U,X) -> X’
     X’ -> widgets|pipe
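The steps above can be put into concrete form as a minimal sketch in Python, with sqlite3 standing in for the database; the table, fields, and user context here are illustrative assumptions, not from a real system.

```python
# Sketch of the per-request flow: E carries the connection, U carries
# user preferences, and each function is one data transformation.
import sqlite3

def get_x(e, s):
    """getX(E,S) -> X: find the raw data matching the search terms S."""
    row = e["db"].execute(
        "SELECT name, price FROM items WHERE name = ?", (s,)).fetchone()
    return {"name": row[0], "price": row[1]}

def decorate(u, x):
    """decorate(U,X) -> X': format the raw data for presentation."""
    return {"name": x["name"].title(),
            "price": f"{u['currency']}{x['price']:.2f}"}

# once-in-a-while: build the environment
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name TEXT, price REAL)")
db.execute("INSERT INTO items VALUES ('widget', 9.5)")
e = {"db": db}

# per X-type request
u, s = {"currency": "$"}, "widget"
x = get_x(e, s)
x_prime = decorate(u, x)
```

From here, x_prime is ready to hand off to widgets or down a pipe; neither get_x nor decorate knows or cares which.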

Now, what’s important here is that we’ve decomposed this problem, not as a single large set of instructions, but really by what amounts to ‘topic’ or ‘paragraph’ as it is oriented to the data. Basically, when the data changes, the topic changes along with it, so we can break that into a new paragraph (function, method, etc.). If we put in ‘layers’ on data transformations, then the code is really easy to write and use, and it is highly reusable.

If we had a second decorated type like X’’, then it is easy to create a new request type, or even put a switch in the code (depending on what minimizes the overall complexity). The same holds true if we need a Z that is composed from X and Y’.
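As a sketch of what that switch might look like, assuming hypothetical decorator names, a small table keyed on the request type is usually enough:

```python
# Two decorated forms of the same raw X, selected per request type.
# All names here are illustrative assumptions.
def decorate(u, x):
    """decorate(U,X) -> X': the original presentation format."""
    return {"name": x["name"].title()}

def decorate2(u, x):
    """A second format X'' for a different kind of request."""
    return {"name": x["name"].upper()}

DECORATORS = {"pretty": decorate, "loud": decorate2}

def handle(kind, u, x):
    """Dispatch one raw X to whichever decorated form the request wants."""
    return DECORATORS[kind](u, x)
```

Adding a third format is then a new entry in the table, not another pass through the fetching code.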

If there is a bug that is allowing bad X’s into the code, it’s rather obviously in getX. If X’ looks funny on the screen, but the data in the database is good, then ‘decorate’ is the culprit. What that means is that, as well as consistency and code reuse causing fewer bugs, we also get a really solid means of triaging the bugs we do see, one that lets us narrow down the code by the impact of the bug.

The core thing here is to build upwards. The incoming requirement came from the top-down, but taking that literally gets us back to big, ugly lists of code. Instead, we start by looking at the data that needs to be persisted, we get that into the code, move it to where it is needed, then we start applying algorithms to it, to get it into the correct format. From there we just take that ‘data’ and get it into the final widget structure of the screens, or the data structure of a file, or the data format for an exported protocol, or any of the other ways that the data may leave the system.

So, start with the data, and move it to where it is needed, making the minimum number of transformations along the way. The whole task might equate to a list of things to do, but the construction is handled by building up larger and larger components until the goal is accomplished.

While that’s a fairly simple example, a more common problem is that we might need some new Z, as a computation based on X’ and Y’. But for whatever reason, X and Y are somewhat of a mess as currently stored in persistence. Instead of trying to redo them somehow, the first two steps are extending the persisted data properly, so that we really do have X’ and Y’. In the code, that gives us some code that uses X and some new code that needs X’. That’s fine if X’ is a proper superset of X, but what if they contradict each other? If it’s mergeable, then we can combine both for an X’’ and modify the existing getX(E,S) -> X’’ to handle the new data and backfill any missing parameters. If for some reason the changes are not mergeable, then we might need polymorphism to treat both types as the same, even if they have different structures, so X’’ is either X or X’ depending on its origins. Once X’ is available, we do the same for Y’ and then follow the initial approach to get Z into the code. The tricky part is seeing the feature as Z, Z’, X’ and Y’, but once that is understood the coding is very straightforward and somewhat mechanical.
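For the mergeable case, a minimal sketch (with a hypothetical new field) is for getX to backfill the missing parameters, so that both old and new rows come out in the larger X’’ shape:

```python
# Sketch of backfilling: older persisted rows lack the new `category`
# field, so getX merges in a default. The field names and the in-memory
# "rows" stand-in for persistence are illustrative assumptions.
DEFAULTS = {"category": "uncategorized"}

def get_x(e, s):
    """getX(E,S) -> X'': return the merged shape, old rows backfilled."""
    row = e["rows"][s]           # stand-in for a database fetch
    return {**DEFAULTS, **row}   # row values override the defaults

e = {"rows": {
    "old": {"name": "widget"},                       # persisted before the change
    "new": {"name": "gadget", "category": "tools"},  # persisted after
}}
```

Everything downstream of getX only ever sees X’’, so the old/new distinction never leaks into the rest of the code.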

What’s key here is that we are not just trying to add in new features by setting them up independently and wrapping them around the existing mess, but rather looking at the underlying persistent data and extending its model to be larger to hold any new understandings that come with the feature.

What scares programmers away from doing this is both having to understand that X is already in the system, and having to change an older ‘getX’ or ‘decorate’ written by someone else. Shying away from doing that may be convenient, but it causes significant accidental complexity and disorganization that accelerates the technical debt. So, it’s a twofold degeneration: the way that they are thinking about the solution is incorrect, and their habits of updating the code will lead to a mess.

It’s unfortunate that we haven’t spent much time analyzing why code gets written in problematic ways. Likely it is because some aspects of coding are subjective, and we use it as an excuse to avoid trying to talk about any of it. Another candidate is that we still want the act of programming itself to be ‘creative’, even when it’s obvious that the ‘coding’ part of building big software shouldn’t be since it just leads to a chaotic mess. If we forget about the code itself and concentrate on making sure the data is of good quality and it is visible to the users when they need it, most programming tasks are easier, a few are more difficult and the users are happier.

If we get lost in trying to creatively guess at what we can do in the code based on the number of coding tricks that we are currently aware of, then that view of how we are coding influences the stuff we produce, which inevitably is oriented towards being easier for a specific programmer to write than it is for the user to get their work done. I’ve written about that in the past: https://theprogrammersparadox.blogspot.com/2012/05/bag-otricks.html, but it is not that easy to describe it in a way that people can pick it up and use directly.
