Saturday, August 15, 2020

Defensive Coding: Minimal Code

Sometimes you come across really beautiful code. It’s clear and concise. It’s obvious how it works. If you have to edit it, it is intuitive where the changes should go. It looks super-simple. It’s a great piece of work.

Most people don’t realize that getting code to look super-simple takes a lot of effort and is a huge challenge. Any initial version that is just splatted out will be ugly. It takes a lot of thought, refinement, and editing work to get it looking great.


All code degrades with time and changes. If it starts out good, it will get tarnished but should hold its value. If it is ugly on day one, it will be a pit of despair a year later.


One way of approaching the problem is to equate super-simple code with minimizing a set of variations in the code, until we come down to a version with reasonable tradeoffs. We can list out most of these variations.


Minimize:

  • The number of variables

  • The length of a ‘readable’ name

  • The number of external jumps needed in order to understand the code

  • The effort to understand a conditional

  • The number of flow constructs, such as if statements and for loops

  • The number of overlapping logic paths

  • The number of hardcoded constants

  • The number of disjoint topics

  • The number of layers

  • The number of reader’s questions

  • The number of possible different behaviors


We’ll go through each of them in turn.

Variables

We obviously don’t want to have the code littered with useless variables. But we also don’t want the ‘same data’ stored in multiple places. We don’t want to overload the meaning of a variable either. And a little less obvious, if there are several dependent variables, we want to bind them together as one thing, and move it all around as just one thing.
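As a rough sketch of that last point (the names here are made up for illustration), a handful of dependent variables can be bound together and moved around as one thing:

    # A hypothetical example: three values that always travel together.
    from dataclasses import dataclass

    @dataclass
    class DateRange:
        start: str     # inclusive, e.g. "2020-08-01"
        end: str       # exclusive, e.g. "2020-09-01"
        timezone: str

    def schedule_report(period: DateRange) -> None:
        # One argument moves all three dependent values around as a unit.
        print(f"report from {period.start} to {period.end} ({period.timezone})")

    schedule_report(DateRange("2020-08-01", "2020-09-01", "UTC"))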

Readable Names

We want the shortest, longest name possible. That is, for readability we want to spell everything out in its full detail, but when and where there are different options for that, we want to choose the shortest of them. We don’t want to make up acronyms, we don’t want to make up or misuse words, and we certainly don’t want to decorate the names with other attributes or just arbitrarily truncate them. The names should be correct; we don’t want to lie. If the names are good, we need less documentation.
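A small, hypothetical sketch of that tradeoff:

    # Made-up names, shown only to illustrate the choices.
    usrAcctMgrRec = {}                 # acronyms and truncations force the reader to guess
    user_account_manager_record = {}   # fully spelled out, but 'record' adds nothing here
    user_account_manager = {}          # the shortest of the correct, fully readable options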

External Jumps

If you can just read the code, without having to jump all over the code base, that is really good. It’s self-contained and entirely under control. If you have to bounce all over the place to figure out what is really happening then that is spaghetti code. It doesn’t matter why you have to bounce, just that you have to do it to get an understanding of how that block of code will work.

Conditionals

Sometimes people create negative conditionals that end up getting processed as double negatives. Sometimes the parts of the condition get spread across a number of different variables. This can be confusing. Conditionals should be easy to understand, so when they aren’t, they should be offloaded into a function that is. So, if you have to check 3 variables for 7 different values, then you certainly don’t want to do that directly in an ‘if’ statement. If the function you need to call requires all three variables, plus a couple of the values passed in, you probably have too many variables. The inputs to a conditional check function shouldn’t be that complex.
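A minimal sketch of that offloading, with a made-up domain:

    from dataclasses import dataclass

    @dataclass
    class Order:
        status: str
        retries: int
        suspended: bool

    # The intent lives in one well-named predicate, not inline in the 'if'.
    def should_abandon(order: Order) -> bool:
        return order.status == "closed" or order.retries >= 3 or order.suspended

    order = Order(status="open", retries=1, suspended=False)

    # Instead of a double-negative tangle...
    if not (order.status != "closed" and order.retries < 3 and not order.suspended):
        print("abandon")

    # ...the reader just sees the question being asked.
    if should_abandon(order):
        print("abandon")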

Flow of Control

There is some minimum structural logic that is necessary for a reasonable computation. This is different from performance optimization, in that code with unnecessary branches and loops is just wasted effort. So if you loop through an array, find one part of it, then loop through it again to find the other part, that is ‘deoptimized’. By fixing it, you are just getting rid of bad code, but still not optimizing what the code is doing. It’s not uncommon in ugly code to see that a more careful construction could have avoided at least half of all of the flow constructs, if not more. When those useless constructs go, what is left is way more understandable.
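A tiny sketch of that kind of fix:

    values = [3, -1, 4, -1, 5, -9]

    # Deoptimized: two full passes over the same list, just to split it in two.
    positives = [v for v in values if v >= 0]
    negatives = [v for v in values if v < 0]

    # One pass does both, halving the loop constructs.
    positives, negatives = [], []
    for v in values:
        (positives if v >= 0 else negatives).append(v)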

Overlapping Logic

A messy part of most programming languages is error handling. It can be easily abused to craft blocks of code that have a large number of different exit points. Some necessary error handling supports multiple different conditions that are handled differently, but most error handling is rather boolean. One can mix the main logic with boolean handling and still have it readable. For more sophisticated approaches, the base code and error handling usually need to be split apart in order to keep it simple.
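A rough sketch of both cases, under made-up requirements:

    # Boolean handling can stay mixed into the main logic and remain readable.
    def read_setting(table: dict, key: str, default: str) -> str:
        return table.get(key, default)   # it worked, or fall back: one exit point

    # For richer handling, keep the base logic and the error policy apart.
    def load_config(path: str) -> dict:
        with open(path) as f:            # failures just propagate upward from here
            return dict(line.strip().split("=", 1) for line in f if "=" in line)

    def load_config_or_default(path: str) -> dict:
        try:
            return load_config(path)     # the policy lives in one place, not scattered
        except FileNotFoundError:
            return {}
        except PermissionError:
            raise SystemExit(f"cannot read {path}: check permissions")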

Hardcoded Constants

Once people grew frustrated by continually hitting arbitrary limits caused by the programmers’ bad choices, we moved away from sticking constants right into the code. Modern code, however, has forgotten this and has returned to hardcoding all sorts of bad stuff. On rare occasions, it might be necessary, but it always needs to be justified. Most of the inputs to the code should come through the arguments to the function call whenever possible.
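As a small sketch (the function is hypothetical), the limit arrives through the arguments instead of being buried in the body:

    # The constant is named, visible, and overridable by the caller.
    def truncate(text: str, max_length: int = 80) -> str:
        return text if len(text) <= max_length else text[:max_length - 3] + "..."

    print(truncate("a" * 100))        # uses the documented default
    print(truncate("a" * 100, 20))    # the caller decides; no arbitrary baked-in limit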

Disjoint Topics

You can take two very specific functions and jam them into one bigger function declaration, but if the code for each addresses a different ‘topic’, they should be separated; they shouldn’t be together. Minimizing the number of functions in code is a very bad idea; functions are cheap, but they are also the basis of readability. Each function should fully address its topic, and the code at any given level should all be localized.
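A minimal, made-up sketch of that separation:

    # Two topics jammed together: parsing and reporting in one declaration.
    def summarize(csv_text: str) -> str:
        rows = [line.split(",") for line in csv_text.splitlines() if line]
        return f"{len(rows)} rows, {sum(len(r) for r in rows)} cells"

    # Separated: each function fully addresses its own topic.
    def parse_rows(csv_text: str) -> list:
        return [line.split(",") for line in csv_text.splitlines() if line]

    def report(rows: list) -> str:
        return f"{len(rows)} rows, {sum(len(r) for r in rows)} cells"

    print(report(parse_rows("a,b\nc,d")))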

Layers

The code should be layered. But taken to the extreme, some layers are useless and not adding any value. Get rid of them. Over-minimization is bad too. If there are no layers, then the code tends towards a super-large jumbled mess of stuff at cross-purposes. It may seem easier to read to some people, since all of the underlying details are smacked together, but really it is not, since there is too much noise included. Coding massive lists of instructions exactly as a massive list is the ‘brute force’ way of building stuff. It works for small programs but goes bad quickly, because of collisions, as the code base grows.

Reader’s Questions

When someone reads your code, they will have lots of questions. Some things will be obvious, some can be guessed at from the context, but some things are just a mystery. Code doesn’t want or need mysteries, so it is quite possible for the programmer to nicely answer these questions. Comments, comment blocks, naming, and packaging all help to resolve questions. If it’s not obvious, it should be.

Different Behaviors

In some systems, there are a lot of interdependent options that can be manipulated by the users. If that optionality scrambles the code, then it was handled badly. If it’s really an indecipherable mess, then it is fundamentally untestable, and as such is not production-worthy code. The options can be handled in advance by moving them to a smaller set of more reasonable parameters, or polymorphism can be used so that the major permutations fall down into specific blocks of code. Either way, giving the users lots of choices should not also give them lots of bugs.
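A small sketch of the polymorphic approach, with made-up options:

    import json

    # Each major permutation of the user's choice falls into its own block of
    # code, instead of flag checks scrambled throughout the system.
    class CsvExport:
        def render(self, rows):
            return "\n".join(",".join(map(str, r)) for r in rows)

    class JsonExport:
        def render(self, rows):
            return json.dumps(rows)

    EXPORTERS = {"csv": CsvExport(), "json": JsonExport()}

    def export(rows, fmt: str) -> str:
        return EXPORTERS[fmt].render(rows)   # each option is separately testable

    print(export([[1, 2], [3, 4]], "csv"))
    print(export([[1, 2], [3, 4]], "json"))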

Summary

There are probably a few more variations, but this is a good start. 


If you minimize everything in this list, the code will not only be beautiful, but also readable, with way fewer bugs, and people can keep extending it easily in the future.


It’s worth noting that no one on the planet can type in perfect code the first time, directly from their memory. It takes work to minimize this kinda stuff, and some of the constraints conflict with each other. So, everyone’s expectation should be that you type out a rough version of the code first, fix the obvious problems, and then gradually work that into a beautiful version. When asked for an estimate, you include the time necessary to write the initial code, but also the time necessary to clean it up and test it properly. If there is some unbreakable time panic, you make sure that people know that what is going into production is ugly, only a ‘partial’ answer, and still needs refining.

Thursday, August 13, 2020

Defensive Coding: KISS

Nothing gets programmers into hot water more than the ‘keep it simple, stupid’ (KISS) principle.


There are two important ways it causes grief.


The first is when programmers assume it applies directly to them. They want to keep their work simple, so they want to keep their code simple.


The failure here is that there is always some ‘intrinsic’ complexity that is totally unavoidable as a byproduct of our physical universe and its history. If a programmer ignores some of that complexity and makes their code super simple, the complexity hasn’t gone away, it has just moved elsewhere. More often than not, it has gone directly to the users. 


So, now the code is simple, but using it has become significantly more complicated than necessary. A super simple feature either doesn’t do much, or it makes it horrifically hard for users to get real things accomplished. Either way, its value is limited.


If you want to keep users interested in using a system, then the code in it has to really solve their problems and it has to keep it simple for them. KISS applies to the solution, not to the way it was built. That is not simple code to write since it means ensuring that the users don’t need to lean on any outside resources. The system needs to know anything and everything about the problem, to keep it all up-to-date, and to never forget stuff, which is, of course, large and very complicated. 


KISS can be applied to a specific sub-component of the mechanics, but it only works if you are avoiding some explicit overcomplication. That is, if you are thinking about adding stuff that will definitely never get used, then you can remove that. However, if it might get used someday, then by removing it, you are actually setting up a new problem. Maybe a small one, but it also could be a huge one in a short time from now.


The other way programmers get into trouble with KISS is that they assume that if they have to build a series of components, making each one of them simple will simplify the whole thing. The opposite is often true.


Simplifications, almost by definition, are things that are not there. Gaps in what could have been. Those things might have been necessary or they could have been completely extra, but they are still absent. Collecting a bunch of unrelated things together in the same place is one way to describe disorganization, and oddly, collecting together a bunch of missing things, related or not, is almost the same.


The growing absence of stuff is a form of complexity, and if not handled explicitly, it will get more complex. So, a small piece missing from one area can be worked around fairly easily, but when there are dozens of things missing all over, the workarounds are harder and more complex. Just remembering what isn’t there becomes a huge burden. So, if there are lots of things that are missing, their combined effect will be multiplicative, not additive, unless they are 100% independent (which they usually aren’t). Simple is not a cumulative property; complexity is.


“Everything should be made as simple as possible, but no simpler.” -- Albert Einstein


The key part of this quote is that while you can overcomplicate things, you can also do the opposite and oversimplify them, which is bad too. When KISS is misapplied or taken to an extreme, not only does it craft a system that users hate, but it also slowly degrades into a horrific ball of complexity that at some point becomes unfixable.


It’s always a shock to the programmers, particularly if they have been diligent about trying to simplify everything, when it suddenly kicks back and becomes the problem itself.


Stuff that is missing or not included is always going to be a big problem if you find out later that you need it. It’s that old ‘a stitch in time saves nine’ saying. Filling the code with a lot of extra stuff that isn’t used ain’t great, but it is still a little better than waking up one day and realizing that what you didn’t do earlier -- when you had the chance -- is now going to derail everything.


Don’t overcomplicate things, but don’t use that as an excuse to swing way out to the opposite extreme, either. 

Tuesday, August 11, 2020

Integrity and Professionalism

It is important to be able to rely on the people around you.


If everyone has a hidden agenda that’s not to your benefit, then just getting through the basics of life is complicated. You have to be constantly on your guard, and you have to be quickly reactive to whatever growing problem they set off. That feeds an instability which kinda sucks.


What underpins stability and the confidence to rely on the future comes from a solid foundation. Two critical components of that foundation are integrity and professionalism. 


Integrity is mostly an internal attribute. Some people mean well and try not to rock the boat or cause unnecessary problems for others. They try to be honest and work hard to not play games. Basically, it’s not being sneaky, but wanting to be clear and upfront with people, so that they don’t get the wrong impressions or have suspicions about your motives. They are just trying to be nice, decent people who mean well towards others.


Professionalism is when that extends to the work you are doing. If you have the knowledge and training, and you start some piece of work with full concentration and ensure that it is done correctly, that is a professional demeanor. If you are also courteous and explain any issues, then it is better. Either way, if someone tasks you with something that is reasonable, they can be assured that you’ll get the job done within an acceptable time frame. If what they are asking has issues, you’ll inform them in advance, and let them know what they can expect.


While these are often seen as just personal attributes, they extend to whole organizations as well. Basically, companies, countries, social groups, etc. all have a personality too, and they act as a super-organism. A person might be having an off day; in an organization you might encounter a rogue member, but the same attributes apply. You can judge an organization on the way it mostly behaves, the same way you might judge a person.


Bad behavior is contagious. The best-known example is littering, where it is believed that people are much more likely to litter if the ground is already full of litter. That tends toward being true for most other bad behaviors as well. Corruption encourages corruption, violence builds up, and if most people are being horrible and sneaky and selfish, other people get pulled into that behavior too. A society left out of control will quickly descend into chaos, with everyone just looking out for themselves. It’s so pervasive in human behavior that it might actually be deeply hardwired into the species. 


The opposite is true, but to a lesser extent. Doing a good deed of the day, for example, is a little contagious. It doesn’t affect as many people, and it doesn’t last as long. Still, it does carry through somewhat. And it’s that attribute that is so important. One of the key reasons to act with integrity and to be professional when working is that you would prefer that that is how others act around you. If you set an example, people do tend to follow.


If you work in a dog-eat-dog environment with everyone out for themselves, the environment itself is horrible and draining. Well, some tiny number of people find it fun and exciting, but the rest of humanity does not seem to feel that. So, if we don’t want that, one of the ways to help prevent it is to actively force yourself to rise above whatever is currently happening and to help lift everyone else too. 


Knowing that good behavior is somewhat contagious really helps. Your valiant efforts to make things better around you aren’t wasted, they just take a lot longer to kick in than the dark side. And knowing that the bulk of people would prefer it your way, it’s just a matter of time and of constantly reminding people to try and be a little bit better each day. Bad people rely on the rest of us turning away; good ones rely on us all making small, constant, positive contributions.


Aside from trying to make your own life and working environment better, we can actually use this knowledge to improve the overall world. For one, we can try to avoid dealing with or enabling any bad actors. If they don’t act with integrity or professionalism, then it is worth putting a bit of effort into figuring out how to avoid them. It doesn’t have to be a major cause or some moral battle, just the notion that if there are at least a few choices between different vendors, then it is far better to take the better one, even if there is a financial reason to take the other.


That is, a lack of integrity or a lack of professionalism should beat out price as a disincentive. The horrible company may be a bit cheaper, but they are a horrible company, so why give them power or enable them? Spend a little bit more to get a better world. 


The corollary to integrity and professionalism is that prolonged dealings with shady people tend to taint them. That is, if you spend all of your time in the company of horrible people, eventually you’ll become a horrible person yourself, little by little. So, we do have a choice and an effect on the world around us. You can spend some small effort, now and then, to make sure that you are not enabling the dark side, and in exchange, the world gets a tiny bit better. Or you can not worry about it, chase the best deals for yourself regardless of who is offering them, and watch the world grow worse.

Friday, August 7, 2020

Developer Management

It’s very difficult to bring a group of software developers together and get a reasonably well-built software system out of the process.

The problems stem from the underlying programming culture. Coders tend to be introverted, their thinking is often extremely black-and-white and they go rogue quite frequently. 


This leads to a number of unfortunate outcomes. 


First, they are not really good at communicating the problems and issues that they are having in getting the system built and running. They don’t like to admit to problems. Coupled with the modern impatience to get things done too quickly, this often reduces the timeframes to considerably less than what is necessary to do a good job. If the stakeholders aren’t aware of many of the ongoing problems, they aren’t going to have realistic expectations about the progress of the work.


Coding requires a rather rigorous logical mode of thinking. Stuff either works or it doesn’t. Things are either right or they are wrong. The computer does exactly what the programmer tells it to do, there is no room for ambiguities. If you spend your day writing code with these properties, it tends to leak out into the rest of your thinking, life, and interactions with people. The world, however, is grey and murky, and often requires a soft touch to work through the issues. A big team of people working together generates a lot of different agendas and politics. None of this is nicely black and white, so there is a lot of friction between how the programmers think the world ‘should’ operate and how it actually does operate. This leads to a lot of misunderstandings, anxiety, and confused goals. 


With a different perspective on the priorities, and a desire not to talk about it, programmers are infamous for just making a decision on their own and then going off with full confidence to get those things done. The problem is that those attempts are often not in sync with the rest of the project, so they basically go rogue and end up doing something that doesn’t provide value. A sub-group of coders incorrectly heading towards the wrong objectives will conflict with the important ones, so the result is poor or useless work.


It’s hard for management to distinguish between a rogue programmer and one not doing any work at all. In both cases, what gets accomplished is nothing. Given that management’s expectations are usually off too, this builds up a lot of tension between the developers and management.


In the past, people liked to blame the “waterfall methodology” for going off in the wrong direction and returning with stuff that was not useful. They insisted that it was the methodology that was at fault, that it was a ‘time’ problem, but there is a little more to it than that.


If it's a big, well-defined project that takes 1.5 years to accomplish, doing it in one long continuous unit of work is a whole lot more efficient than breaking it up into little iterations and trying to stitch them together somehow. Mostly, batching together similar work is more effective, and better for both consistency and focus.


The big failures before the turn of the century drove the stakeholders to seek out better ways of tightly controlling their software projects. The culture of programming itself helped. Both sides settled on an invasive form of micromanagement. The coders ceded control of their work. So, the prerequisites for deciding on the right work, like analysis and design, get ignored, while the whole effort is gamified with a rather childish bent on formality. You don’t have “excessive” daily status meetings, you have ‘standups’ instead. There isn’t a long list of pending work, it’s a ‘burndown’ chart. We don’t break the work down into tiny, little, verifiable chunks, it's called a ‘sprint’. Planning is a game, not a necessity, and the stick is called a ‘retro’, which is somehow supposed to be good for you.


Each time management was compelled to reach for the thumbscrews and lock down inappropriate behavior, consultants came up with cutesy little names and games for implementing it, pushing it as propaganda to the rest of the industry. It’s unfortunate.


For me though, the fundamental problem is not upper management controlling how the system is constructed. Rather, it is the role of leading a group of developers that is confused. It’s not a technical role and it's not a business role. 


Ultimately, there are some high-level goals that need to be accomplished. The people setting those goals do not have the ability to break them down and express them as a very, very long list of complicated programming tasks. That’s the communications impedance mismatch. If you hire a bunch of programmers and can’t tell them what to do, and they won’t tell you what problems they are having, then it is pretty obvious that the project is not going to function.


So, you need an intermediary. Someone who has spent a lot of time programming, but also someone who has been around the higher-level objectives enough to understand them. They have to have a foot in both worlds because they have to accurately translate between them. 


They might not be the best programmer, or be able to solve silly little fake coding issues. They just need to have spent time in projects of similar scale. They need to get their priorities straight. They might not fully understand all of the business objectives, or be a domain expert, but they need to have empathy for the users and to have some depth in their specific domain problems. They sit in the middle, and they ensure that the upper goals are progressing, while the lower work isn’t going rogue. 


Over the decades, I’ve heard many an entrepreneur reach the conclusion that all they need is a group of students that can code a little bit in order to get their product to market. That’s kind of the classic delusion. It sees coding as a commodity that just requires enough ‘energy’ to drive it forward. Oddly, that can work for a demo or a proof-of-concept or some other introductory software that just needs to kinda work in order to grab more interest, but it fails miserably once it becomes real, mostly because the necessary skills to keep it all organized and keep it growing are missing. So, it’s a start that can be used to evaluate ideas, but not a product that will work when needed. 


Moving up to that next level means getting serious about keeping the work under control. It’s not a small gap, but actually a rather huge one. It’s not intuitive, and the kids that threw together the prototype code won’t be able to magically pull it from the ether. This is where coding switches from being a game to getting serious. It can’t afford to be ineffective anymore, it can’t afford to be disorganized anymore. Everything changes.


For medium, large and massive projects, even the smallest issues have huge, wide-ranging consequences. You learn to deal with them via experience, and it is these issues that are far more important at this point, than the actual coding itself. Fixing the code is cheap, unrolling a ball of mud is expensive. An intermediary who knows it is important to restrict ongoing dependencies, for example, is a much better asset than a coder who can craft unique algorithms. The wrong algorithm is useless, while the wrong dependencies are often fatal. 


In an industry known for its ageism, and for still having a high rate of failure, you’d think it would be obvious by now that there is a missing critical component in the effort. But oddly, the stakeholders still think programmers are just cogs, and the coders still think that if they just had “more code” their problems would magically disappear. The technologies have changed, the methodologies have gotten crazier, but the underlying problems are still the same. Breaking months’ worth of work up into hundreds of artificial 2-week tasks doesn’t ensure that it will go any better or be more appropriate. Instead, it tends to build up a counter-culture of gaming that process. Since it’s all indecipherable from above, it does nothing to ensure that progress is actually moving as best as possible. It just provides a false sense of momentum. And the games that coders play may distract them for a while, but the necessary satisfaction from doing a good job is missing, so they aren’t getting what they want either.


Part of programming is really just boring, routine software production. It’s just work that needs to be done. Some small parts of getting a big product out to market are creative, but more often than not the creative portions fall into the business, design, and architectural efforts. If the ideas and issues are worked through in advance, and any difficult technological issues are prototyped up front, then the rest of getting out a new release is just the careful assembling of all of the pieces in an organized manner. It’s not a game, it’s not a contest. Having someone who knows what is really important during this phase of the work is going to prevent a lot of predictable quality issues from materializing. Like any other profession, programming isn’t “fun”, but when it is done well it can be quite satisfying.

Sunday, August 2, 2020

Duality

There are two very similar ways of looking at software systems. 


The most common one is to see it as a lot of code that moves data around. Code is the primary concept. Its dual, however, is that you see it as data, with the code just assisting in getting it from place to place. Code is a secondary issue.


They may seem similar, and it's easy to miss the difference between the two, but from a higher level the second perspective is a lot simpler and a lot more powerful.


When we first learn to program, we are taught to start assembling larger and larger code fragments. First, it is small examples of branches and loops, putting some code into functions, calling other chunks of code to get them to do stuff. That, rather directly, establishes that ‘the code’ is super important. We get caught up in syntax issues, different languages, and various IDEs. Early coding tasks are to ‘write some code that ...’, interviews are uber-focused on the code too. Everything is about ‘code’.


Data is usually introduced as a secondary issue, but most often it is somewhat trivial and primitive. If we take a data structures course, the actual data is even abstracted away, we’re just left with abstract relationships like lists and trees.


That carries on through most programmers' careers. Their work assignments are often crafted as producing some code to support a feature. In many shops, the code is started first, then later they realize that there were some details about the data that were missed. If there is a bug to be fixed, it is because some code is missing or calculating the wrong values.


So it’s code, code, code, all of the way down. Data is an afterthought.


The point of taking a data structures course is lost on most people. Sure, the underlying data is abstracted away, but it's not because it doesn’t matter. It's the exact opposite. Data structures are a pretty complete means of decomposition. That is, you can take most large and complex programs and rephrase them as a set of data structures. Most programs are just a bunch of primitive operations happening on a fairly small set of structures like lists, trees, stacks, queues, etc. If those structural operations are pulled out and reused, the resulting code is way smaller and intrinsically has fewer bugs. That’s why Donald Knuth collected them all together in The Art of Computer Programming, and that is why they keep getting taught in classes. They are the ‘power tools’ of programming, but to get them to work you have to flip your perspective on the system.
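As a trivial sketch of that reuse, one perfected structure serves any entity type:

    from collections import deque

    # The queue's operations don't care what the entities are; perfect the
    # structure once and reuse it for orders, emails, or anything else.
    pending_orders = deque()
    pending_emails = deque()

    pending_orders.append("order-1")
    pending_emails.append("welcome-mail")
    print(pending_orders.popleft(), pending_emails.popleft())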


Data structures aren’t the only opposite approach. The general ideas around them got formalized and explicitly wired into languages as Object-Oriented programming. In non-OO languages, the programmers had to set up the structures themselves and keep any primitives nearby. With objects, that becomes a part of the language syntax. Objects bind code directly to the data not because it is fun, but so that objects that are structurally similar can be encapsulated and reused, and so they can point, polymorphically, to other types. It is exactly the same as basic data structures, just formalized into the language.


It’s also why people who write a lot of OO code focused on doing super long lists of instructions end up getting into such a mess. Retrofitting a brute-force procedural style into objects is basically going against the grain of the language. Objects as mega-instructions clash with other such objects, which prevents reuse, is hard to understand collectively, and is prone to integration bugs. It makes the code awkward, which keeps getting worse as the system gets larger.


While data structures are very powerful, they are just the tip of the iceberg and came of age in an era when data was really rare and hard to get. That all changed when computers became ubiquitous and networked. Now data is plentiful, everywhere, and often of extremely poor quality. Data was always important, but it's getting more important as we collect a lot more of it and want to draw wisdom out of what we have.


So the alternative perspective is to see a system just by its data, and how that data flows around.


For data to be valuable, it has to stay around for a long time, which happens when we persist it into a database. In all ways, the schema defines the possible scope and capabilities of a system. If you haven’t persisted the data, it isn’t available for usage. If you saved some of it in an awkward format, that will percolate upwards through all of the other functionality that uses it. If the data is available and guaranteed to be correct, then most of the features that require it are fairly simple. If it’s hard to write some code, it’s often because the incoming data is a mess.


All data starts somewhere in the real world. That may seem like a controversial statement since data like log files originate in response only to changes in the digital realm, but if you accept that those come from the behavior of the underlying hardware then it makes more sense. Besides operational data, the rest of it is entered from end-users, administrators, programmers, or third-party organizations. It starts in the real world, and really only has value in the real world. On top of this raw information, we can derive other useful relationships but it all has to start somewhere.


What becomes important then is how the data flows from one location to another. For example, it may have started as an observation by a person. They used some interface to get it into persistence. Later it might be pulled from storage and used to augment some other data. That new data is shown in another interface or packaged together and sent to a remote location. Maybe in exchange, more data flows into the system from that remote site. 


If you just look at the data and ignore the code, threads, processes, etc. most systems are not particularly complex. They act as a hub to collect and distribute data to a whole bunch of different sources, people, or other machines. 


What’s needed then, to build a system that manages that data, is many fragments of code that can move, decorate, or translate the data as it circulates around. If you minimize the fiddling that the code does as the data travels around, you’ve optimized a large portion of the system without even realizing it. The closer those data formats are in the widgets, middleware, and persistence, the less work is needed when the data is in motion. Moving data is always expensive.


That perspective puts data first. It is more important than the code, and the idea is to minimize what the code is doing to it whenever possible. There still might be memoization or other macro optimizations that are possible, but those can be seen as refinements.


What becomes important then is keeping dependent pieces of data together and managing the structural relationships between these ‘entities’. This brings us right back to data structures. They deal with the relationships between these larger chunks and can be built to have indirect references to the underlying entities. Why? Because the structural relationships are smaller and more common. If you get them perfected for one type of entity, you can reuse them for all types of entities. That then just shifts the problem down to picking a good set of entities and relationships that best fit the data as it originated in the real world, or basically ‘modeling’.


Now the data perspective doesn’t magically eliminate performance problems. Systems aren’t intrinsically scalable, handling huge loads has to be explicitly engineered into the system in order for it to work correctly. But seeing the system as flowing data does make it a whole lot easier to lay out an industrial scale architecture. 


Take caching, for example. From a code perspective, programmers often just allocate a chunk of memory, set it up as a hash table, then do some type of lookup first to get the value. That seems to make caching easy, but eventually the real problems rear their ugly heads. If you see caching as a form of memoization, where we keep a smaller pool of data closer to the action, then what is obvious is how to decide what’s in that pool and what's not in it. Caching stale data is bad, and letting the cache grow to be the same size as the persistence is rather pointless. But on top of that, if the data may be in two places at the same time, what happens when you need to update it? The code perspective of caching forces programmers to think about memory, while the data perspective forces them to think about the quality of the data. It makes it easier to see the whole picture, which makes it easier to get the implementation correct. Once you realize that removing stuff from the cache and keeping it in sync during writes are the real problems, figuring out what code and architecture are needed to make this happen is a whole lot easier.
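A minimal sketch of that data perspective on caching, where eviction and write-sync are explicit rather than afterthoughts (this is illustrative, not a production cache):

    from collections import OrderedDict

    class Cache:
        def __init__(self, store: dict, capacity: int = 2):
            self.store = store            # the slower persistence
            self.recent = OrderedDict()   # the smaller pool, closer to the action
            self.capacity = capacity

        def _remember(self, key, value):
            self.recent[key] = value
            self.recent.move_to_end(key)
            if len(self.recent) > self.capacity:
                self.recent.popitem(last=False)   # evict; don't grow to match persistence

        def get(self, key):
            if key in self.recent:
                self.recent.move_to_end(key)      # keep hot data in the pool
                return self.recent[key]
            value = self.store[key]
            self._remember(key, value)
            return value

        def put(self, key, value):
            self.store[key] = value               # write through: the copies stay in sync
            self._remember(key, value)

    db = {"a": 1, "b": 2, "c": 3}
    cache = Cache(db)
    print(cache.get("a"), cache.get("b"))
    cache.put("a", 10)                            # both copies updated together
    print(cache.get("a"), db["a"])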


The same is true for putting queues between systems to resolve speed differences and with syncing external information for read-only internal copies. Pretty much all of the big enterprise issues get a whole lot less challenging. The difficult problems shift away from some tangled mess of code, to really keeping the whole thing organized and operating correctly. 


It also applies to the really big systems as well. It’s hard to figure out how to massively parallelize systems until you realize that it is just an issue of dependencies. Where data is independent, it can be safely split across threads, processes, or machines. If it's not independent, then there will be problems. So, instead of being a complex coding problem, it is really a complex data modeling one. What underlying model do we need that both approximates the real-world issues and also guarantees some usable independence that can be parallelized? If you structure the data correctly, the rest is just spending the time to write out the code.
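A small sketch of that independence, with made-up data:

    from concurrent.futures import ProcessPoolExecutor

    # Each row is modeled as fully independent, so the rows can be split safely
    # across processes; the parallelism falls out of the data model.
    def enrich(row: dict) -> dict:
        return {**row, "total": row["price"] * row["quantity"]}

    rows = [{"price": p, "quantity": q} for p, q in [(2, 3), (5, 4), (7, 1)]]

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            print(list(pool.map(enrich, rows)))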


As an industry, we usually start by teaching new programmers to code. That’s okay, but we often fail to teach them how to flip to this dual perspective. Instead, we leave them dangling in the wind, trying to crank out code from what is clearly a much harder perspective. Sure, some people get data structure courses or even computer theory, but then they go out into the industry and none of that sticks. To make it worse, people write it off as ‘abstraction’ and ‘overcomplexity’ and keep trying to insist that writing more overly simple code, faster, is somehow going to make it better. It has even percolated back into our bad interview styles.


It permeates everything, making most serious programming tasks a whole lot harder than they need to be. If you’ve ever suspected that there was an ‘easier’ way to build systems, then you were right. If you forget about the code and focus on the data, getting it set up correctly, then most of the coding is easy and straightforward.