Friday, April 26, 2024

The Origin of Data

In software development, we create a lot of variables and data structures and toss them around all over the place.

Into these we put data, lots of it.

Some of that data originates from far away. It is not our data.

Some of that data is a result of people using our interface. Some of that data we have derived from the data we already have. This is our data.

It is important to understand where the data originates, how often it gets created, and how it varies. It is crucial to understand this before coding.

Ultimately the quality of any system rests more on its data than on its code. If the code is great, but the data is garbage, the system is useless right now. If the data is great, but the code is flaky, it is at least partially usable and is fixable. If all you collected is garbage, you have collected nothing.

Common mistakes with data:
  • Allowing invalid data to percolate and persist
  • Altering someone else’s data
  • Excessive or incorrect transformations

GARBAGE IN, GARBAGE OUT

It is a mistake to let garbage data into the running code.

Data always comes in through an “entry point”, so block any incoming garbage as close to that point as you can. An entry point is a gateway from anywhere outside of the system, including the persistent database itself. All of these entry points should immediately reject invalid data, although there are sometimes variations on this that allow for staging data until it is corrected later.

All entry points should share the same validation code in order to save lots of time and ensure consistency. If validation lets in specific variations on the data, it is because those variations are valid in the real world or in the system itself.
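As a rough sketch of what sharing that validation looks like, consider one validator that every entry point calls. The record shape, field names, and rules here are made up for illustration, not taken from any particular system:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    email: str

class ValidationError(ValueError):
    pass

def validate_customer(raw: dict) -> Customer:
    """Reject garbage immediately; return a clean, typed record otherwise."""
    name = (raw.get("name") or "").strip()
    email = (raw.get("email") or "").strip()
    if not name:
        raise ValidationError("name is required")
    if "@" not in email:
        raise ValidationError("email looks invalid")
    return Customer(name=name, email=email)

# Every entry point calls the same function:
#   api_handler  -> validate_customer(request_body)
#   csv_importer -> validate_customer(row)
#   db_loader    -> validate_customer(record)
print(validate_customer({"name": " Jane ", "email": "jane@example.com"}))
```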

It is a lot of work to precisely ‘model’ all of the data in any system but that work anchors the quality of the system. Skipping that effort will always force the quality to be lower.

Data that doesn’t come directly from the users of the system comes in from the outside world. You have to respect that data.


RESPECT THE DATA

If you didn’t collect the data yourself, it is little more than rude to start changing it.

The problem comes from programmers being too opinionated about the data types, or taking questionable shortcuts. Either way, you are not saving that copy of someone’s data, you are saving a variation on it. Variations always break somewhere.

If the data is initially collected in a different system, it is up to that originating system to change it. You should just maintain a faithful copy of it, which you can use for whatever you need. But it is still the other system’s data, not yours.

Sometimes people seed their data from somewhere else and then allow their own interfaces to mess with it. That is fine if, and only if, it is a one-time migration. If you ignore that and try to mix ongoing migrations with keeping your own copy, the results will be disastrous. Your copy and the original will drift apart and are not mergeable; eventually one version or the other will end up wrong, and that will cause grief.

It’s worth noting that a great many of the constants that people put into their code are also other people's data. You didn’t collect the constant, which is why it sits in the code instead of in a variable.

You should never hardcode any data; it should come in from persistence or configuration. In that way, good code has almost no constants in it. Not strings, or numbers, or anything really, just pure code that takes inputs and returns outputs. Any sort of hardcoded value is always suspicious. If you must hardcode something, isolate it into its own function and treat it with great suspicion. It is probably a bad idea.
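A minimal sketch of what that looks like in practice, with made-up names: values flow in from configuration, and the rare hardcoded value sits alone in one conspicuously named function.

```python
import json

# Hypothetical example: values come from configuration, not the code.
def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)

# If something truly must be hardcoded, isolate it in one clearly named,
# clearly suspicious function rather than scattering the literal around.
def default_page_size() -> int:
    return 50  # hardcoded; revisit whether this belongs in configuration
```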


DON’T FIDGET

You'll see a lot of data that moves around in multiple different representations. It is one data item but can be parsed into subpieces which also have value on their own. You often see systems that will ‘split’ and ‘join’ the same data repeatedly in layer after layer. Obviously, that is a waste of CPU.

Most of the time, if you get some incoming data, the best choice is to parse it down right away and always pass it around in that state. You know you need at least one piece of it, so why wait until the last moment to get it? You ‘split’ coming in, and ‘join’ going out. Note though that ‘split’ is not suitable for any parsing that needs lookahead to tokenize properly.

There are plenty of exceptions to breaking the data down immediately. For example, the actual type may be the higher representation, while the piece is just an alias for it. So, parsing right away would disrespect that data. This is common when the data is effectively scoped in some way.

If you need to move around two or more pieces of data together all or most of the time, they should be in the same composite structure. You move that instead of the individual pieces. That keeps them from getting mixed up.
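Here is a small illustrative sketch of that shape; the record format, field names, and the EmployeeRecord type are invented for the example. The raw string is split once at the entry point, moved around as a composite, and joined once on the way out.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    last: str
    first: str
    dept: str

def parse_record(line: str) -> EmployeeRecord:
    # 'split' once, at the entry point
    last, first, dept = (p.strip() for p in line.split(",", 2))
    return EmployeeRecord(last, first, dept)

def format_record(rec: EmployeeRecord) -> str:
    # 'join' once, on the way out
    return ",".join((rec.last, rec.first, rec.dept))

rec = parse_record("Smith, Jane, Accounting")
# ... intermediate code passes the composite 'rec' around, never the raw string ...
print(format_record(rec))
```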

Another way to mess up data is to apply incorrect transformations to it. Common variations are changing data type, altering the character sets, or other representation issues. A very common weakness is using date & time variables to hold only dates, signaling the missing time in-band with some specific time value. Date, time, and date & time are three very different data types, used for three very different situations.
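A small sketch of the distinction, using Python's standard date, time, and datetime types with made-up values:

```python
from datetime import date, datetime, time

# Three different types for three different situations (illustrative values):
invoice_date = date(2024, 4, 26)                    # a calendar day, no time component
office_opens = time(9, 0)                           # a time of day, no particular date
payment_received = datetime(2024, 4, 26, 14, 37)    # a specific moment

# The weakness described above would be holding invoice_date as
# datetime(2024, 4, 26, 0, 0) and treating midnight as an in-band signal
# for "no time" -- the type no longer says what the data means.
```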

Ambiguous representations like combining integers and floating point values into the same type are really bad too. You always need extra information to make any sense of data so throwing some of the meta information away will hurt. Ambiguities are easy to create but pretty deadly to correct.


SUMMARY

One of the odder parts of programming culture is the perceived freedom programmers have to represent their data any way they choose. That freedom doesn’t exist when the data originated outside of the system.

There are sometimes a few different choices for implementations, but they never come for free. There are always trade-offs. You have to spend the time to understand the data you need in order to then decide how the code should deal with it. You need to understand it first. Getting that backward and just whacking out code tends to converge on rather awkward code full of icky patches trying to correct those bad initial assumptions. That type of code never gets better, only worse.

None of these points about handling data changes over time. They are not ancient or modern. People have been writing about this since the 70s, and it is as true now as it was then. It transcends all technical stacks and applies to every technology. Software that splatters garbage data on the screen is useless, always has been, and always will be.

Thursday, April 18, 2024

Optimizations

“Premature optimization is the root of all evil” -- Donald Knuth

Code generally implements a series of steps for the computer to follow. I am using a slightly broader definition than just an ‘algorithm’ or ‘heuristic’, which are usually defined as mappings between input and output. It is widened to include any sort of code that interacts with one or more endpoints.

We’ll talk about three general possible versions of this code. The first does the steps in an obvious way. The second adds unnecessary extra steps as well. And the third does the steps in a non-intuitive way that is faster. We can call these normal, excessive, and optimized.

Most times when you see people “optimize code” they are actually just taking excessive code and replacing it with normal code. That is, they are not optimizing it, really they just aren’t doing the useless work anymore.

If you take excessive code and fix it, you are not doing premature optimization, you’re just coding it properly. The excessive version was a mistake. It was wasting resources, which is unnecessary. Not doing that anymore is not really optimizing stuff.

If you have good coding habits, you will write normal code most of the time. But it takes a lot of practice to master. And it comes from changing how you see the code and how you construct it.

Sometimes normal code is not fast enough. You will need to optimize it. Most serious optimizations tend to be limited to a gain of one complexity class. That is, you start with O(n^2) and bring it down to O(n) or O(n log n). Sorting, for example, starts at O(n^2) with the obvious approaches and gets down to O(n log n). All of these types of optimizations involve visualizing the code from a very non-intuitive viewpoint and using that view to leverage some information that circumvents the normal, intuitive route. These are the hardcore optimizations. The ones that we are warned not to try right away.

It is easy while trying to optimize code to break it instead. It is also easy to make it a whole lot slower. Some optimizations, like adding in caching, seem deceptively easy, but doing them incorrectly causes all sorts of unwelcome bugs.

Making trade-offs like space versus time is a sort of optimization. They may appear to alter the performance but it can be misleading. My favorite example is matching the elements of two sets. The obvious way to code it is with two for loops: you take each member of the first set and compare it to every member of the second. But you can swap time for space. In that case, you pass through the first set and hash it, then pass through the second set and see if each element is in the hash. If the respective sizes are m and n, the obvious algorithm is O(n*m) while the hashed version is O(n+m). The extra hash table shifts the operation from being multiplicative to additive. But if you scaled that up to large enough data, the management of the hash table and memory could eliminate most of those gains. It’s also worth noting the size of the gain: set m to be another n and it drops from O(n^2) to O(n).
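A sketch of the two versions, with tiny made-up lists just to show the shape of the trade-off:

```python
def common_nested(first, second):
    # the obvious version: two loops, O(n * m) comparisons
    return [x for x in second if x in first]      # 'in' on a list rescans it every time

def common_hashed(first, second):
    # the space-for-time version: O(n + m), at the cost of an extra hash table
    seen = set(first)                             # one pass to hash the first set
    return [x for x in second if x in seen]       # one pass over the second to probe it

print(common_nested([1, 2, 3, 4], [3, 4, 5]))     # [3, 4]
print(common_hashed([1, 2, 3, 4], [3, 4, 5]))     # [3, 4]
```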

The real takeaway though is to learn to code just the instructions that are necessary to complete the work. You so often see code that is doing all sorts of unnecessary stuff, mostly because the author does not know how to structure it better or understand what happens underneath. You also see code that does and undoes various fiddling over and over again as the data moves through the system. Deciding on a canonical representation and diligently sticking to that can avoid a lot of that waste.

Debloating code is not optimizing it. Sure it makes it run faster and with fewer resources, but it is simply removing what should have not been there in the first place. We need to teach coding in a better way so that programmers learn how to write stuff correctly the first time. Premature optimizations, though, are still the root of all evil. You need to get your code working first before you start messing with logarithmic reductions. They can be a bit mind-bending at times.

Thursday, April 11, 2024

Scope

One of the keys to getting good quality out of software development is to control the scope of each line of code carefully.

This connection isn’t particularly intuitive, but it is strong and useful.

We can loosely define the scope of any piece of code as the percentage of other lines of code in the system that ‘might’ be affected by a change to it.

In the simplest case, if you comment out the initialization of the connection to a database, all other lines of code that do things with that database will no longer work correctly. They will error out. So, the scope of the initialization is that large chunk of code that relies on or messes with the data in the database and any code that depends on that code. For most systems this is a huge amount of code.

Way back, in the very early days, people realized that global variables were bad. Once you declare a variable as global, any other line of code can access it, so the scope is effectively 100%. If you are debugging, and the global variable changes unexpectedly, you have to go through every other line of code that could possibly have changed it at the wrong time to fully assess and understand the bug. In a sizable program that would be a crazy amount of time. So, we came to the conclusion long ago that globals, while convenient, were also really bad. And that is purely a scope issue. We also figured out that the same was true for flow-of-control, like goto statements. As it is true for function calls too, we can pretty safely assume it is true in one way or another for all code and data in the system.

Lots of paradigms center around reducing the scope in the code. You encapsulate variables in Object Oriented, you make them immutable in Functional Programming. These are both ways of tightening down the scope. All the modifiers like public and private do that too. Some mechanisms to include code from other files do that. Any sort of package name, or module name. Things like interfaces are also trying to put forth restrictions on what can be called when. The most significant scope reduction comes from strongly typed languages, as they will not let you do the wrong thing on the wrong data type at the wrong time.

So, we’ve known for a long time that reducing the scope of as much code as you can is very important, but why?

Oddly it has nothing to do with the initial coding. Reducing scope while coding makes coding more complicated. You have to think carefully about the reduction and remember a lot of other little related details. It will slow down the coding. It is a pain. It is friction. But doing it properly is always worth it.

The reason we want to do this is debugging and bug fixes.

If you have spent the time to tighten down the scope, and there is a bug in and around that line of code, then when you change it, you can figure out exactly what effect the change will have on the other lines of code.

Going back to the global example, if the variable is local and scoped tightly to a loop, then the only code that can be affected by a change is within the loop itself. It may change the final results of the loop computations, but if you are fixing it, that is probably desirable.

If inside of the loop you referenced a global, in a multi-threaded environment, you will never really know what your change did, what other side effects happened, and whether or not you have really fixed the bug or just got lost while trying to fix it. The bug could be what you see in the code or it could be elsewhere; the behavior is not deterministic. Unlimited scope is a bad thing.
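A tiny sketch of the difference, with invented names:

```python
# Illustrative sketch of the scope difference.

total_global = 0          # global: any line anywhere in the program could change it

def sum_squares(values):
    total = 0             # local: only the lines inside this function can touch it
    for v in values:
        total += v * v
    return total

# A bug in sum_squares can only involve the handful of lines inside it,
# so the full impact of any fix is easy to reason about. A bug involving
# total_global could involve every line in the program that references it.
print(sum_squares([1, 2, 3]))   # 14
```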

A well-scoped program means that you can be very sure of the impact that any code change you make is going to have. Certainty is a huge plus while coding, particularly in a high-stress environment.

When there is a bug that needs to be fixed correctly right away, making a bunch of failed attempts to fix it will only diminish the trust the people around you have in your ability to get it all working. Lack of trust tends to make the environment more stressful and also forces people to discount what you are saying. It is pretty awful.

There were various movements in the past that said if you did “X” you would no longer get any bugs. I won’t go into specifics, but any technique to help reduce bugs is good, but no technique will ever get rid of all bugs. It is impossible. They will always occur, we are human after all, and we will always have to deal with them.

Testing part of a big program is not the same as fully testing the entire program, and fully testing an entire program is always so much work that it is extremely rare that we even attempt to do it. In an ancient post, I said that testing was like playing a game of battleship with a limited set of pegs, if you use them wisely, more of the bugs will be gone, but some will always remain.

This means that for every system, with all its lines of code, there will come a day when there is at least one serious bug that escaped and is now causing big problems. Always.

When you tighten the scope, while you have spent longer in coding, you will get absolutely massive reductions in the impacts of these bugs coming to light. The bug will pop up, you will be able to look at your readable code and get an idea of why it occurred, then formulate a change to it for which you absolutely are certain of the total impact of that change. You make the change, push it out, and everything goes according to plan.

But that is if and only if you tightened the scope properly. If you didn’t then any sort of change you make is entirely relying on blind luck, which as you will find, tends to fail just when you need it the most.

Cutting down on the chaos of bug fixing has a longer-term effect. If some bugs made it to production, and the handling of them was a mess, then it eats away at any time needed to continue development. This forces the programmers to take shortcuts, and these shortcuts tend to go bad and cause more bugs.

Before you know it, the code is a huge scrambled mess, everybody is angry, and the bugs just keep coming, only faster now. It is getting caught in this cycle that will pull the quality down into the mud like hyper-gravity. Each slip-up in handling the issues eats more and more time and causes more stress, which fuels more shortcuts, and suddenly you are caught up in this with no easy way out.

It’s why coming out of the gate really fast with coding generally fails as a strategy for building stuff. You're trying to pound out as much code as quickly as you can, but you are ignoring issues like scope and readability to get faster. That seems to work initially, but once the code goes into QA or actual usage, the whole thing blows up rather badly in your face, and the hasty quality of the initial code leads to it degenerating further into an icky ball of mud.

The alternative is to come out really slowly. Put a lot of effort into readability and scope on the lowest, most fundamental parts of the system. Wire it really tightly. Everyone will be nervous that the project is not proceeding fast enough, but you need to ignore that. If the foundations are really good, and you’ve been careful with the coding, then as you get higher you can get a bit sloppier. Those upper-level bugs tend to have less intrinsic scope.

Having lots of code will never make a project better. Having really good code will. Getting to really good code is slow and boring, but it will mitigate a great deal of the ugliness that would have come later, so it is always worth it.

Learn to control the scope and spend time to make that a habit. Resist the panic, and just make sure that the things you coded do what they are supposed to do in any and all circumstances. If you want to save more time, do a lot of reuse, as much as you can get in. And don’t forget to keep the whole thing really readable, otherwise it is just an obfuscated mess.

Thursday, April 4, 2024

Expression

The idea is to express the instructions to the computer that you’ve crafted in a succinct but entirely verifiable way.

If the expression is huge, the size itself will cripple your ability to verify that the instructions are correct.

If the expression is shrunk with cryptic syntax, maybe when you write it you will remember how it works, but as time goes by that knowledge fades and it will cripple your ability to verify that it is correct.

If the expression is fragmented all over the place, the lack of locality will cripple your ability to verify that it is correct.

Spaghetti code or scrambled structure is the same. Same with globals, bad names, poor formatting, etc. You can’t just look at it and mostly know that it will do what it needs to do. Obviously, this type of visual verification saves a huge amount of time debugging but it also tends to prevent a lot of mistakes in the first place.

Hiding the way things work is usually not a problem with a small amount of code. It's the small size that makes it absorbable, so you can verify it. But as the size grows, little mistakes have much larger consequences. A badly written medium-sized system is tricky to debug, but for large and huge systems it verges on impossible. Small mistakes in code organization can eat through big chunks of time. Splatter coding techniques may seem fun, but they are a guaranteed recipe for poor quality.

Getting the right degree of readability in code takes a careful balancing of all aspects of expression. Naming, logic, structure, and a lot of other issues. If you see good code, it isn’t always obvious how much work went into rearranging it to make it simple and clear, but it certainly wasn’t just chucked out in a few minutes. The authors spent quite a bit of effort on clarity and readability. They paid close attention to the details; in that way, it is not dissimilar to writing big articles or books. Careful editing is very important.

The quality of code is closely related to the diligence and care applied by the author. You think clearly about how to really solve the problem, then you code as cleanly as you can for each of the moving parts, and then you relentlessly edit it over and over again until it is ready to go. That is the recipe for good code.

Thursday, March 28, 2024

Over Complicated

I’ve seen many, many variations of programmers reacting to what they believe is over-complexity.

A common one is when they are working on a massive system, with a tonne of rules and administration. They feel like a little cog. They can’t do what they want, the way they want to do it. If they went rogue, their work would harm the others.

Having a little place in a large project isn’t always fun. So people rail about complexity, but they mean the whole overall complexity of the project, not just specific parts of the code. That is, the standards, conventions, and processes are complex. Sometimes they single out little pieces, but usually it is really the whole thing that is bugging them.

The key problem here isn’t complexity. It is that a lot of people working together need serious coordination. If it's a single-person project or even a team of three, then sure the standards can be dynamic. And inconsistencies, while annoying, aren’t often fatal in small codebases. But when it’s hundreds of people who all have to be in sync, that takes effort. Complexity. It’s overhead, but absolutely necessary. Even a small deviation from the right path costs a lot of time and money. Coding for one-person throw-away projects is way different than coding for huge multi-team efforts. It’s a rather wide spectrum.

I’ve also seen programmers upset by layering. When some programmers read code, they really want to see everything all the way down to the lowest level. They find that reading code that has lots of underlying function calls annoys them, I guess because they feel they have to read all of those functions first. The irony is that most code interacts with frameworks or calls lots of libraries, so it is all heavily layered these days one way or the other.

Good layering picks primitives and self-descriptive names so that you don’t have to look underneath. That it is hiding code, i.e. encapsulating complexity, is actually its strength. When you read higher-level code, you can just trust that the functions do what they say they do. If they are used all over the system, then the reuse means they are even more reliable.

But still, you’ll have a pretty nicely layered piece of work and there will always be somebody that complains that it is too complicated. Too many functions; too many layers. They want to mix everything together into a giant, mostly unreadable, mega-function that is optimized for single-stepping with a debugger. Write once, read never. Then they might code super fast but only because they keep writing the same code over and over again. Not really mastery, just speed.

I’ve seen a lot of programmers choke on the enormous complexity of the problem domain itself. I guess they are intimidated enough by learning all of the technical parts that they really don’t want to understand how the system itself is being used as a solution in the domain. This leads to a noticeable lack of empathy for the users, and to features that are awkward: they are there, but essentially unusable.

Sometimes they ignore reality and completely drop it out of the underlying data model. Then they throw patches everywhere on top to fake it. Sometimes they ignore the state of the art and craft crude algorithms that don’t work very well. There are lots of variations on this.

The complexity that they are upset about is the problem domain itself. It is what it is, and often for any sort of domain if you look inside of it there are all sorts of crazy historical and counter-intuitive hiccups. It is messy. But it is also reality, and any solution that doesn’t accept that will likely create more problems than it fixes. Overly simple solutions are often worse than no solution.

You sometimes see application programmers reacting to systems programming like this too. They don’t want to refactor their code to put in an appropriate write-through cache, for example; instead, they just fill up a local hash table (map, dictionary) with a lot of junk and hope for the best. Coordination, locking, and any sort of synchronization are glossed over as just too slow or too hard to understand. The very worst case is when their stuff mostly works, except for the occasional Heisenbug that never, ever gets fixed. Integrity isn’t a well-understood concept either. Sometimes the system crashes nicely, but sometimes it gets corrupted. Oops.
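For illustration only, here is a minimal sketch of a write-through cache, assuming some 'store' object with get and put methods (a database client, say). The names are made up, and a production version would need eviction and more careful error handling:

```python
import threading

class WriteThroughCache:
    def __init__(self, store):
        self._store = store
        self._cache = {}
        self._lock = threading.Lock()   # the coordination that is easy to gloss over

    def get(self, key):
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._store.get(key)   # read through on a miss
            return self._cache[key]

    def put(self, key, value):
        with self._lock:
            self._store.put(key, value)   # write goes to the store first...
            self._cache[key] = value      # ...then the cache, so the two stay in sync
```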

Pretty much any time a programmer doesn’t want to investigate or dig deeper, the reason they give is over-complexity. It’s the one-size-fits-all answer for everything, including burnout.

Sometimes over-complexity is real. Horrifically scrambled spaghetti code written by someone who was completely lost, or crazy obfuscated names written by someone who just didn’t care. A scrambled heavy architecture that goes way too far. But sometimes, the problem is that the code is far too simple to solve stuff correctly and it is just spinning off grief all over the place; it needs to get replaced with something that is actually more complicated but that better matches the real complexity of the problems.

You can usually tell the difference. If a programmer says something is over-complicated, but cannot list out any specifics about why, then it is probably a feeling, not an observation. If they understand why it is too complex, then they also understand how to remove that complexity. You would see it tangled there caught between the other necessary stuff. So, they would be able to fix the issue and have a precise sense of the time difference between refactoring and rewriting. If they don’t have that clarity then it is just a feeling that things might be made simpler, which is often incorrect. On the outside everything seems simpler than on the inside. The complexity we have trouble wrangling is always that inside complexity.

Thursday, March 21, 2024

Mangled Complexity

There is something hard to do.

Some of the people involved are having trouble wrapping their heads around the problem.

They get some parts of their understanding wrong. In small, subtle ways, but still wrong.

Then they base the solution on their understanding.

Their misunderstanding causes a clump of complexity. It is not accidental; they deliberately chose to solve the problem in a specific way. It is not really artificial either, as the solution itself isn’t piling on complexity. Instead it comes from a misunderstanding of the problem space, and thus, in a way, from the problem itself.

This is mangled complexity. The misunderstanding causes a hiccup, and some of the complexity on top is mangled.

Mangled complexity is extraordinarily hard to get rid of. It is usually tied to a person, their agenda, and the way they are going about performing their role. Often one person gets it wrong, then ropes in a lot of others who share the same mistake, so it starts to become institutionalized. Everybody insists that the mistake is correct, and everybody is incentivized to continue to insist that the mistake is correct.

Sometimes even when you can finally dispel the mistake, people don’t want to fix the issue as they fear it is too much effort. So, it gets locked into the bottom of all sorts of other issues.

We are building a house of cards when we choose to ignore things we find are wrong. A delay caused by unmangling complexity is a massive amount of time saved.

Thursday, March 14, 2024

Software Development Decisions

A good decision in a software development project is one that moves you at least one step closer to getting the work completed with the necessary quality.

A bad decision is one where you don’t get a step forward or you trade off a half step forward for one or more steps backward.

Even a fairly small software development project includes millions and millions of decisions. Some decisions are technical, dealing with the creation or running of the work. Some are usability, impacting how the system will solve real problems for real people.

A well running software development project mostly makes good decisions. You would look at the output of the project and have few complaints about their choices.

A poor software development project has long strings of very poor choices, usually compounding into rather substandard output. The code is a mess, the config is fragmented, the interfaces are awkward, the data is broken, etc. It is a whole lot of choices that make you ask ‘Why?’

If you look at the project and cannot tell if the choices were good or bad then you are not qualified to rate the work. If you cannot rate it, you have no idea whether the project is going well or not. If you don't know, then any sort of decision you make about the work and inject into the project is more likely to be harmful than helpful.

Which is to say if you do not immediately know if a decision is right or wrong, then you should push that decision to someone who definitely does know and then live with their choices. They may not be good, depending on the person you choose, but your chances of doing any better are far less.

In a project where nobody knows enough to make good decisions, it is highly unlikely that it will end well. So, at bare minimum, you can't rush the project. People will have to be allowed to make bad decisions, then figure out the consequences of those mistakes and then undo the previous effort and replace it all with a better choice. It will slow down a project by 10x or worse. If you try to compress that, the bad decisions will become frozen into the effort, start to pile up, and then it will take even longer.

That is, if you do not have anybody to make good decisions and you are still in a rush, the circumstances will always get way worse. It’s like trying to run to the store, but you don’t know where the store is, so you keep erratically changing directions, hoping to get lucky. You probably won’t make it to the store and if you do it will certainly have taken way longer than necessary.

If there is a string of poor choices, you have to address why they happened. Insanity is doing the same things over and over again, expecting the results to change. They will not change on their own.

Thursday, March 7, 2024

Ratcheting

You know the final version will be very complicated. But you need to get going. It would take way too long to lay out a full and complete low- or medium-level design. You’ll just have to wing it.

The best idea is to rough-in the structure and layers first.

Take the simplest case that is reflective of the others. Not a “trivial” special case, but something fairly common, though not too ugly. Skip the messy bits.

Code the overall structure. End to end, but not fully fleshed out.

Then take something it is not yet doing and fill in more details. Not all of them, just more. If there are little bugs, fix them immediately but do it correctly. If it means refactoring stuff underneath, do it now. Do not cheat the game, as it will hurt later if you do.

Then just keep that up, moving around, making it all a little more detailed, a little more complicated. Keep making sure that what’s there always works really well. Build on that.

Ratchet up step by step. Small, focused changes; fix any bugs, large or small. Make sure the core is always strong. Sprinkle in more and more complexity.

This is not the fastest way to code. It causes a lot of refactoring. It requires consistency. You need to be diligent and picky. You might cycle dozens of times, depending on the final complexity, but that gives you lots of chances to edit it carefully. The code has to be neat and tidy. This is the opposite of throw away code.

Although it takes longer, I usually find that since the quality is far better, the testing and bugs get hugely reduced, usually saving more time than lost.

Thursday, February 29, 2024

Coding

Major points:
  1. Coding is always slow
  2. Coding produces both code & bugs
  3. The code always needs to be edited, the first version is just roughed in.
  4. Do not use disposable code in industrial strength projects.
The primary goal is to produce a minimal amount of readable code.

You want the code to be as small as possible; smaller code is easier to deal with. Larger codebases are worse, not better.

You want the code to be as readable as possible so it is easier to edit. If it is a choice between smallness and readability, readability wins. If it is a choice between readability and performance, readability wins.

You can always fix readable code later. But it must be readable first, and remain readable afterwards.

You don’t want the code to be redundant, because you’ll always forget to change all of the different manifestations of it for the same bug. Redundancies are bugs or potential bugs. Changes to redundant code cause the copies to drift apart.

You need the codebase to be tightly organized so that it is easier to find and fix the problems. You can easily waste more time fixing bugs than coding or refactoring, so you need to optimize for that.

There should be one and only one place to put each line of code. All of the code should be in its one place. If there are lots of different places where you can put the code, you are disorganized.

The author is not the only one who needs to reread the code. Others will have to read it as well. Good code will be read by a lot of people.

Because you didn’t just magically get it right the first time, you and other people will have to go over the code, again and again, in order to make it better. Code doesn’t get written, it evolves.

The notion that layers are bad is poor advice. Layers are the main way you keep the code organized. Without them, the code is just one huge flat mess. That is far worse; it is totally unreadable.

Layers can be abused, there can be too many of them. But not having any at all is worse. It is easier to remove a layer than add one.

A good function is specific to some part of the computation. It is general. It does the one thing that it says it does, nothing more. Sub-level detail processing is below it, in other functions. High-level flow is above it. Once you know what it does, and trust that it does exactly and only that, then you can ignore it, which makes life easier.

All data should come into the code from somewhere else. It should never be hardcoded in the code, nor hardcoded when passed down to the code. Thus all strings, integers, constants, etc. are suspect. The best code has zero hardcoded values.

If you need to do a bunch of steps each time for a common action, wrap the steps. The fewer things you call the better your code is. If you rely on remembering that all things have to always be done together, either you’ll forget or someone else who never knew will do it incorrectly. Either way, it is now a bug, and it may not be an obvious one, so it will waste time.
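A small illustrative sketch, with invented steps and names, of wrapping the steps so they cannot be forgotten:

```python
def normalize_name(name: str) -> str:
    return " ".join(name.split()).title()

def validate_name(name: str) -> None:
    if not name:
        raise ValueError("name is required")

def store_name(db: dict, key: str, name: str) -> None:
    db[key] = name

def save_customer_name(db: dict, key: str, raw_name: str) -> None:
    """The one call everyone uses; nobody has to remember the individual steps."""
    name = normalize_name(raw_name)
    validate_name(name)
    store_name(db, key, name)

db = {}
save_customer_name(db, "c42", "  jane   smith ")
print(db)   # {'c42': 'Jane Smith'}
```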

If you need to move around some data all together, wrap the data in a struct, object, or whatever the language supports. Composite variables are far better than lots of independent variables.

If you need to decompose the data (aka parse it) to use it somewhere, decompose it once and only once. Keep it decomposed in a struct, object, etc., and move it around as a composite.

If the call for some library/technology/etc. is messy, wrap it. Wrapping is a form of encapsulation, it helps to avoid bugs and reduce complexity.

If there are strange lines of code that are nonintuitive or don’t make sense, wrap them. At minimum, it gives you a chance to name it appropriately; at maximum, it leaves just one place to change it later.
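As a hedged example, suppose fetching JSON over HTTP is the messy call; the endpoint below is made up, and the point is only that the mechanics live in exactly one named place:

```python
import json
import urllib.request

def fetch_json(url: str, timeout_seconds: float = 5.0) -> dict:
    """Hide the request/decoding/parsing mechanics behind one well-named call."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))

# Callers just write fetch_json("https://example.com/api/cogs") and never
# repeat the mechanics, so changing them later is a one-place edit.
```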

Too many functions are way better than too few. If you have to get it wrong, create a billion functions. They force you to have to find reasonable names for the parts of work you are doing. If you don’t know how to name a function, then you don’t understand what you are doing. If you have too many functions it is easy to compact them. If you have too few, you are screwed.

Don’t use language features if you don’t understand them. The goal of coding for a system is not to learn new technology, it is to write industrial-strength code that withstands the test of time. If you want to play, good, but don’t do it in a real project, do it in little demos (which can be as messy as you want).

Do not pack lines. Saving yourself a few lines of code, but packing together a whole bunch of mechanics, just hides the mechanics and misguides you as to the amount of code you have. Separate out each and every line of code, it doesn’t take any real time and it lays out the mess in its full ugliness. If the mess is ugly fix that, don’t hide it.

Never do the same thing in a system in two or more different ways. You need to do something, do it one way and only one way, wrap it in a function, and reuse it in all other instances. This cuts down on complexity. By a huge amount. It cuts down on code, thus it cuts down on bugs.

Build up the mechanics to work at a higher level. That is, if you need an id to get to a user, and the user to get to their profile, then you should have a FindUser(id) whose result is supplied to the call FindProfile(user). Build up reusable pieces, don’t code down into stuff.
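A minimal sketch of that build-up, using in-memory tables and invented data; the function names mirror the FindUser/FindProfile example above:

```python
USERS = {7: {"name": "Jane"}}
PROFILES = {"Jane": {"theme": "dark"}}

def find_user(user_id: int) -> dict:
    return USERS[user_id]

def find_profile(user: dict) -> dict:
    return PROFILES[user["name"]]

# Higher-level code composes the pieces instead of reaching down into the tables:
profile = find_profile(find_user(7))
print(profile)   # {'theme': 'dark'}
```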

Thursday, February 22, 2024

Self-Inflicted Pain

The difference between regular programmers and 10x programmers is not typing speed. In some cases, it is not even knowledge.

It is that 10x programmers are aware of and strongly avoid self-inflicted injuries while coding.

That is, they tend to avoid shortcuts and work far smarter than harder. They don’t tolerate a mess, they don’t flail at their work.

They need some code, they think first before coding, they code what they need, and then they refine it rapidly until it works. Then they leverage that code, over and over again, to save crazy large amounts of time. This is why their output is so high.

If you watch other programmers, they jump in too fast. They don’t fully understand what they are doing. The code gets messier and messier. Debugging it soaks up massive effort. Then they abandon that work and do it all over again for the next part that is similar. They burn through time in all the wrong places.

All of these are self-inflicted injuries.

Writing code when you only half understand what it should do will go badly. It’s not that you should be able to predict the future, but rather that your knowledge today should span the code you write. If there is something you don’t understand, figure that out before starting to code. If you have to change the code later because things change, that is okay. But if you are coding beyond your current knowledge it will go badly and eat through time.

Trying to fix crappy code is a waste of time. Clean it up first, then fix it. If the code doesn’t clearly articulate what it was supposed to do, then any perceived bug may be predicated on top of a whole lot of other bugs. Foundations matter.

So, when debugging, unless it is some crazy emergency patch, you find the first bug you encounter and correct that first. Then the next one. Then the next one. You keep that up until you finally find and fix the bug you were looking for. Yes, it takes way longer to fix that bug, but not really, as you are saving yourself a lot of time down the road. Those other bugs were going to catch up with you eventually.

If you see bad names, you fix those. If you see disorganization, you fix it, or at a minimum write it down to be fixed later. If you see extra variables you get rid of them. If you see redundant functions, you switch to only using one instance. If you see poorly structured code or bad error handling, you fix that. If you see a schema or modeling problem, you either fix it now or write it down to fix it later. The things you wrote down to fix later, you actually fix them later.

The crap you ignore will always come back to haunt you. The time you saved by not dealing with it today is tiny compared to the time you will lose by allowing these problems to build up and get worse. You do not save time by wobbling through the code, fixing it at random. Those higher-level fixes get invalidated by lower-level changes, so they are a waste of time and energy.

And then, the biggest part. Once you have some good code that mostly does what you want it to do, you leverage that. That is, putting minimal effort into highly redundant code is as slow as molasses. Putting a lot of effort into a piece of code that you can use over and over again is exponentially faster. Why keep solving the same lower-level problems again and again, when instead you can lift yourself up and solve increasingly higher-level problems at faster speeds? That is the 10x secret.

If you have to solve the same basic problems again and again, it is self-inflicted. If you lose higher-level work because of lower-level fixes, it is self-inflicted. If you have to do spooky things on top of broken code, it is often self-inflicted. If you get lost or confused in your own code, it is self-inflicted. If you want to be better and faster at coding and to have less stress in your job, stop injuring yourself, it isn’t helping.

Thursday, February 15, 2024

A Rose by Any Other Name

Naming is hard. Very hard. Possibly the hardest part about building software.

And it only gets harder as the size of the codebase grows, since there are far more naming collisions. Code scales very, very badly. Do not make it worse than it has to be.

This is why naming things correctly is such a fundamental skill for all programmers.

Coding itself is oddly the second most important skill. If you write good code but bury it under a misleading name, then it doesn’t exist. You haven’t done your job. Eventually, you’ll forget where you put it. Other people can’t even find it. Tools like fancy IDEs do not save you from that fate.

There are no one-size-fits-all naming conventions that always work correctly. More pointedly there can never be such a convention. Naming is not mindless, you have to think long and hard about it. You cannot avoid thinking about it.

The good news is that the more time you spend trying to find good names, the easier it gets. It’s a skill that takes forever to master, but at least you can learn to not do it badly.

There are some basic naming rules of thumb:

First is that a name should never, ever be misleading. If the name is wrong, it is as bad a name as possible. If someone reads it and comes to the wrong conclusion, then it is the author's fault. When you name something you have to understand what that thing is and give it the best possible name.

Second is that the name should be self-describing. That is, when someone reads the name, they should arrive at the right conclusion. The variable should hold the data they expect. The function should do what it says. The repo that holds a given codebase should be obvious.

“Most people never see the names I use in my code …”

No, they do see them. All of them.

And if they see them and they are poor or even bad, they will recommend that your code gets rewritten. They will throw away your work. Nothing else you did matters. If the code is unreadable, it will not survive. If it doesn’t survive, you aren't particularly good at your job. It’s pretty simple.

Occasionally, some really awful code does get frozen way deep in a ball of mud. But that unfortunate situation is not justification for you being bad at your job. Really, it isn’t.

Third, don’t put litter into your names. Made-up acronyms, strange ‘pre’ or ‘post’ text. Long and stupid names are not helping. Stop typing in long crazy names; spend some time thinking about it. Find short reasonable names that are both descriptive and correct.

Fourth, don’t put irrelevant or temporary stuff in there either. If some unrelated thing in an organization changes and now the name is either wrong or needs to be changed, you did it wrong. Names should be nearly timeless. Only if the nature of the problem changes should they need changing, and you should do that right away. Names that used to be correct suck.

Names are important. They form the basis of readability, and unreadable code is just an irritant. If you were asked to really write some code, you need to really write it properly. If it takes longer, too bad. Good naming only slows you down until you get better at it. You need to be better at it.

Wednesday, February 7, 2024

Natural Decompositions

Given a large problem, we start by breaking it down into smaller, more manageable pieces. We can then solve all of the smaller problems and combine them back together to solve the original problem.

The hiccup is that not all decompositions are created equal. If you break a big problem down into subparts, when they have any sort of cross dependencies with each other you can’t work on them independently. The dependencies invalidate the decomposition.

So we call any decomposition where all of the subparts are fully independent a ‘natural’ decomposition. It is a natural, complete, hard ‘line’ that completely separates the different parts.

Do natural decompositions actually exist?

Any subpart that has no dependencies on other outside parts is fully encapsulated. It is a black box.

A black box can have an interface. You can put things into the box. It’s just that whatever happens in the box stays in the box. You don’t need to know anything about how the box works inside, just on the outside.

A car engine is a good example. You put in fuel, and you play with the pedals, then the car moves. If you are just driving around, you don’t need to know much more than that. Maybe if you are pushing it on the highway or a racetrack, you’d need to understand gearing, acceleration, or torque better, but to go to the grocery store with an automatic transmission it isn’t necessary.

Cars have fairly good natural decompositions. They are complex machines, but most people don’t really need to understand how they work. Mechanics and race car drivers do.

Software though is much harder to decompose because it isn’t visible. The lines between things can be messed up and awful, but very few people would know this. A five wheeled car/truck/motorbike monstrosity would be quickly discounted in reality, but likely survive as a software component.

Although we don’t see it the same way, we can detect when a decomposition is bad. The most obvious test is that if you have to add a line of code, how many places are there that it would fit reasonably? The answer should be one. If that is not the answer then the lines are blurred somewhere.

And that is the crux. A good decomposition eliminates the degrees of freedom. There is just one place for everything. Then your code is organized if everything is in its one place. It’s simple, yet not simple at all.

For example, if you break off part of the system as a printing subsystem, then any and all code that is specifically tied to printing must be in that subsystem.

Now it’s not to say that there isn’t an interface to the printing subsystem. There is. Handling user context and the specific GUI contexts is done elsewhere and must be passed in. But no heavy lifting is ever done outside. Only on the inside. You might have to pass in a print-it-this-way context that directs what is done, but it only directs it from the outside; the ‘doing it’ part is inside the box.

One of the hardest problems in software is getting a group of programmers to agree on defining one place for all of the different types of code and actually putting that code in the one place it belongs.

It fails for two reasons. The first is that it is a huge reduction in freedom. You aren’t free anymore to put the code anywhere. The culture of programming celebrates freedom, even when it makes our lives way harder or even tragic.

The other reason is the difficulty of making it quick and easy for newer programmers to know where to put stuff. If we fully documented all of those places it would be far too much to read, and if we don’t, most people won’t read the code to try to figure it out for themselves. Various standards and code reviews have tried to address it over the decades, but more often than not people just create a mess and pretend they didn’t. Occasionally you see large projects with good discipline; it happens.

This shows up in other places too. Architecture is the drawing of lines between things. An enterprise architect should draw enough lines in a company to keep it organized; a system architect should draw enough lines in a system for the same effect. Again, these lines need to be natural to be useful. If they are arbitrary they make the problems worse not better.

Decomposition is the workhorse of software development, but it's far too easy to get it wrong. Fortunately it’s not hard to figure out if it’s wrong and fix it. Things go a lot smoother when the decompositions are natural and the work is organized. Programming is hard enough sometimes; we don’t need to find ways to make it worse.

Thursday, February 1, 2024

Anti-patterns

“Calling something an anti-pattern is an anti-pattern.”

There are lots of ways to accomplish things with software. Some of them are better than others. But the connotation for the term ‘anti-pattern’ is that the thing you are doing is wrong, which is often not the case.

Realistically, the ‘pattern’ part of the phrase is abused. A design pattern is just a generalized abstraction of some work you are doing. It is less specific than an ‘idiom’. It is less specific than a ‘data structure’. It is just essentially a code structuring arrangement that is common. That is, it is effectively a micro-architecture, a way to structure some functionality so that it is easier to understand and will behave as expected.

So, mostly what people mean when they call something an anti-pattern is just that it is not the ‘best’ alternative. But even if it is not the best, that doesn’t make it a bad choice. Or basically, the set of alternatives for coding is not boolean. It’s not a case of right or wrong. It’s a large gradient, there are a huge number of ways to code stuff, some are better than others. And sometimes, for some contexts, a lesser approach is actually better.

We saw this in the 70s with sorting, but the understanding doesn’t seem to have crystallized.

There are lots of different ways to sort, with different performance. We can track ‘growth’, which is effectively how an algorithm performs relative to the size of the data. A cute algorithm like a pivot sort has nearly optimal growth; it is O(N log N). Bubble sort however is considerably worse at O(N^2).

So, you should always implement a pivot sort if you have to implement your own sort?

No. If you have a large amount of data to sort, then you probably want to spend the time to implement a pivot sort. But… if you usually only have a few things to sort, then just putting in a bubble sort is fine.

Why?

Because the code for a bubble sort is way, way easier to write and visually validate. And performance is not even close to an issue, the set of data is always too small. With that tiny size, it wouldn’t matter if one algorithm was a few instructions different from the other, since it doesn't loop long enough for that to become a meaningful time.
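For reference, this is roughly the kind of short, visually verifiable bubble sort the post has in mind (a sketch, not a recommendation for large data):

```python
def bubble_sort(items):
    """Easy to write and visually verify; fine for handfuls of items."""
    items = list(items)                      # work on a copy
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(bubble_sort([5, 1, 4, 2, 3]))   # [1, 2, 3, 4, 5]
```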

So, in that reduced context, the shorter, easier code that is less likely to have a bug is the better alternative. More significantly, for front-end devs, whipping together a bubble sort is fine; for back-end ones, learning to implement pivot sorts is better.

But in modern programming, since most stacks implement pretty darn good sorting, the issue is moot. It is presented in data structure courses as a means of learning how to correctly think about implementation details, rather than an explicit skill.

In modern terms, I’m sure that a lot of people would incorrectly call a bubble sort an anti-pattern, which it is not. Most ‘lesser’ patterns are not anti-patterns. An actual anti-pattern would be to have multiple copies of the same globals, when what you really just needed was one. Another less common anti-pattern would be using string splits as the way to parse LR(1) grammars, as it would never, ever work properly but that is a longer and far more difficult discussion.

In general though, the software industry has a real problem with using hand waving to summarily dismiss significant technical issues. Programmers resort to claiming that something is “right” or “wrong” far too quickly, when neither case applies. It is a form of boolean disease: spend too much of the day crafting booleans, and you start to see the rest of the world only in those terms.

Thursday, January 25, 2024

Context

When I discuss software issues, I often use the term ‘context’. I’ll see if I can define my usage a little more precisely.

In software programs we talk about state. The setting of a boolean variable is its state. There are only two states.

For variables with larger ranges, i.e. possible settings, there can be a huge number of possible states, they are all discrete. An integer may be set to 42.

We usually use state to refer to a group of variables. E.g. the state of a UI is its settings, navigation, and all of the preferences.

Context is similar, but somewhat expanded. The context is all of the variables, whether explicit or implicit; formal or informal. It is really anything at all that can vary, digitally or even in reality.

Sometimes people just restrict context to purely digital usages, but it is far more useful if you open it up to include any informal variability in the world around us. That way we can talk about the context of a UI, but we can also talk about the context of the user using that UI. The first is a proper subset of the second.

The reason we want it to be wider than, say, just a context in the backend code is that it affects our work. Software is a solution to one or more problems. Some of those problems are purely digital, such as computations, persistence, or communications, but most of our problems are actually anchored in reality.

For instance, consider a software system that inventories cogs created at a factory. The cogs themselves and the factory are physical. The software mirrors them in the computer in order to help keep track of them. So, some of the issues that affect the cogs, the factory, or the types of usage of the system, are really just ‘informal’ effects of reality. What people do with the software is heavily influenced by what happens in the real world. The point of an inventory system is to help make better real world decisions.

We may or may not map all of those physical influences onto digital proxies, but that does not mitigate their effect. They happen regardless. So if there are real events happening in the factory that affect the cogs but are not captured correctly, the digital proxies for those cogs can fall out of sync. We might have the wrong counts in the software for example because a bunch of cogs went missing.

As well, the mappings between reality and the software can be designed incorrectly. The factory might have twenty different types of cogs, but the software can only distinguish ten different types. The cogs themselves might relate to each other in some type of hierarchy, but the software only sees them as a flat inventory list.

In that sense the software developers are not free to model the factory and its cogs in any way they choose. The context in reality needs to properly bound the software context. So that whatever happens in the larger context can be correctly tracked in the software context.

The quality of the software is rooted in its ability to remain correct. Bad software will sometimes be wrong, so it is not trustworthy, thus not too useful.

Now if the factory was very complex, it might be a massive amount of work to write software that precisely models everything down to each and every little detail. So we frequently apply simplifications to the solution context. That works if and only if the solution context is still a proper generalized subset of the problem context.

From our earlier example, if all twenty physical cog types map uniquely onto the ten software cog types, the context may be okay. But if some cogs can be mapped in different ways, or some cogs cannot be mapped at all, then the software solution will drift away from reality and people will see this as bugs. If there are manual procedures and conventions to occasionally fix the map, then at some point they'll degrade and it will still fail.
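
To make the cog example a little more concrete, here is a minimal sketch of that kind of context check. All of the names and the mapping itself are hypothetical; the point is only that every physical type must land on exactly one software category, with nothing unmapped and nothing ambiguous.

```python
# Hypothetical example: twenty physical cog types, ten software categories.
PHYSICAL_COG_TYPES = [f"cog-{i:02d}" for i in range(1, 21)]

# Candidate software categories for each physical type. A well-defined
# mapping has exactly one candidate per physical type.
CANDIDATE_CATEGORIES = {
    cog: [f"category-{(i - 1) // 2 + 1}"]      # cog-01, cog-02 -> category-1, etc.
    for i, cog in enumerate(PHYSICAL_COG_TYPES, start=1)
}

def validate_context_mapping(physical_types, candidates):
    """Report physical types the software context cannot track faithfully."""
    problems = []
    for cog in physical_types:
        options = candidates.get(cog, [])
        if len(options) == 0:
            problems.append(f"{cog}: cannot be mapped at all")
        elif len(options) > 1:
            problems.append(f"{cog}: ambiguous, could be any of {options}")
    return problems

issues = validate_context_mapping(PHYSICAL_COG_TYPES, CANDIDATE_CATEGORIES)
if issues:
    print("software context does not properly cover the problem context:")
    for issue in issues:
        print("  -", issue)
else:
    print("every physical cog type maps to exactly one software category")
```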

Which is one of the most common fundamental problems with software. There often isn’t time to do the context mappings properly, and the shortcuts applied are invalid. The software context is shifted out from under the problem context, so it will gradually break. More software or even manual procedures will only delay the inevitable. The data, i.e. the proxies, in the computer will eventually drift away from reality.

So, if we see the context of the software as needing to be a proper subset of the context of the problem we intend to solve, it is easier to understand the consequences of simplifications.

This often plays out in interesting ways. If you build a system that keeps track of a large number of people, you obviously want to be able to uniquely identify them. Some people might incorrectly assume that a full name, as first, middle, last, is enough, but most names are not particularly unique. Age doesn’t help, and duplicate birthdays are far too common. You could use a home address as well, but in some parts of the world even that is not enough.

Correctly and uniquely identifying ‘all’ individuals is extraordinarily hard. Identifying a small subset for an organization is much easier. So we cheat. But any mapping only works correctly for the restricted domain context when you don’t have to fiddle with the data. If you have to have Bob and Bob1, for example, then the mapping is broken and should be fixed before it gets even worse.

So as a problem we want to track a tiny group of people and we don’t have to worry about the full context. Yet, if whatever we do forces fiddling with the data, that means our solution context is misfocused and should be shifted or expanded. Manual hacks are a bug. Seen this way, it ends any sort of subjective argument about modeling or conventions. It’s a context misfit; it needs to be fixed. It’s not ‘speculative generality’ or over-engineering, it is just obviously wrong.
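
A rough sketch of the identity point, with made-up names and data: the system assigns its own identifier rather than mangling the collected attributes, so two people with identical names and birthdays stay distinct and no ‘Bob1’ fiddling is ever needed.

```python
# Hypothetical example: identity is owned by this system, while the collected
# attributes are kept exactly as collected.
import uuid

people = {}   # system-assigned id -> faithful copy of the collected attributes

def register_person(first, last, birthdate):
    person_id = str(uuid.uuid4())   # identifier generated by the system itself
    people[person_id] = {"first": first, "last": last, "birthdate": birthdate}
    return person_id

# Two different people with the same name and birthday remain distinct,
# and neither record has to be altered to keep them apart.
a = register_person("Bob", "Smith", "1990-05-17")
b = register_person("Bob", "Smith", "1990-05-17")
assert a != b and people[a] == people[b]
```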

The same issues play out all over software development. We build solutions, but we build them to fit against one or more problem contexts, and those often get bounced around by larger organizational or industry contexts.

That is, people often narrow down the context to make an argument about why something is right or wrong, better or worse, but the argument is invalid because the context is just too narrow. The most obvious example I know is the ancient arguments about why Betamax tapes would beat out VHS, when in reality it went the other way. I think the best reference to explain it all is Geoffrey Moore in “Crossing the Chasm”, where he talks about the ‘whole product’, which is an expanded context.

All of that makes understanding the various contexts that bound the system very important. Ultimately we want to build the best fitting solutions given the problems we are trying to solve. Comparing the two contexts is how we figure out if we have done a good job or not.

Thursday, January 18, 2024

Buy vs Build

When I was young, at the end of the 80s, the buy vs. build question was straightforward.

In those days, for the emerging smaller hardware, there was not a lot of software available. It was slow and expensive to build anything. Non-software companies existed as brick-and-mortar businesses. Software could help automate isolated parts of the company, but that was it.

Mostly, even over some medium horizons, buying an existing software product was far cheaper than building it. And that software was usually run in an independent silo, with minimal integrations, so it wasn’t hard to get it up and running.

If there was already an available product, it didn’t make sense to build it. Buying it was the default choice.

But so much has changed since then...

The biggest change is that many companies now exist in the digital realm way more than the physical one. All of the sales, communications, and management for most lines of business happen digitally. Many of the products and services are still physical, but overall most lines of business are a mix now.

Running computers is also more complicated now. In my youth, you would set up a server room with suitable power, cooling, and network connections. When you didn't have the space, you would lease it. But the drop in hardware prices caused an explosion, so the number of machines involved these days is astronomical (and often unnecessary).

This made operations chaotic and undesirable, so most software products are a service now. Someone else sets it up and runs it for you. It’s easier to get going, but you don’t have as much control over it.

With the increase in digital presence came a huge need for integrations. There are way more silos now, often hundreds or even thousands of them. There are specialist silos for every subproblem. When the silos were independent, integrations were rare. But now they all need each other's data; dependencies are necessary. So everything needs to be integrated into almost everything else.

When you bought software before, you could get some new hardware to host it, spin it up, and test it out. If it mostly worked as expected it went live. But these days, just running a new system at a vendor's site isn’t that useful. Being trapped in a silo cripples it. It isn’t really live until all of the major integrations are done. Silos made sense before, but they are a hazard now.

It is not in any vendor's best interest to standardize their software. It is a simple calculation: if the software is standard, then it is nearly trivial for any customer to switch to someone else. If you run into any glitches and all of the customers leave, you are instantly done. So, from a vendor's perspective, don’t use standards.

Integrating various non-standard SaaS silos with each other is an epic nightmare. The easiest way to do it is to copy the data everywhere. If you have a dozen silos that need a particular type of data, you make a dozen copies. Then desperately try to keep them in sync somehow.

To make it even worse, each integration team and vendor will choose to model the data moving around differently. So, it is endlessly translated into different formats, and some of those translations will lose valuable parts of the information. That wastes a lot of time and causes all sorts of grief.
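
A small illustration of that loss, with a hypothetical record layout and field names: once the target silo’s format has no place for a field, the translation silently drops it, and the copy can never be fully reconciled with the original.

```python
# Hypothetical example: a source record translated into a billing silo's format.
source_record = {
    "customer_id": "C-1042",
    "name": "Acme Manufacturing",
    "region": "EMEA",
    "credit_limit": 250_000,          # exists in the source silo only
}

def translate_for_billing_silo(record):
    # The billing format has no notion of credit_limit, so it is silently dropped.
    return {"id": record["customer_id"], "name": record["name"], "region": record["region"]}

copy = translate_for_billing_silo(source_record)
# customer_id was renamed to id, not lost; credit_limit simply disappeared.
lost = [k for k in source_record if k not in copy and k != "customer_id"]
print("fields lost in translation:", lost)   # ['credit_limit']
```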

So modern integration projects have become huge, expensive, and tricky.

It's counterintuitive: you think you avoided programming by buying everything, but you end up having to do way more glue programming to connect it all together.

And so much of that ETL code is awful. It was rushed into existence by people with too little experience. You end up with masses of hopelessly intertwined spaghetti and endless operational alerts about warnings and errors, most of which are unfortunately ignored.

And that is the crux of the issue. If you buy everything now, then you’ll get lost trying to get it all to work together properly, and that is a lot more costly than just having built it properly in the first place.

Some things you don’t want to build though. Sometimes because they are huge, but more often because they are so complex that the devs require specific knowledge and experience to build them reasonably. You can't just hire a whole bunch of kids; they’ll conjure up a mess instead of what you need, and it won’t help.

For any group of programmers, there are absolute limits to what they can build. They are rarely aware of their own limits, but things won’t go well if you let them stray too far past their abilities.

You can assemble a strong group of developers to build exactly what you need, but if the work dries up they will dissipate and you will run into trouble keeping the software functioning later. To keep good developers, they always need good projects to work on.

Which is to say that if you need to build software, you need to set up a stable ‘dev shop’ with enough capacity to turn out and enhance the types of software you need. The dev shop is what you need to focus on. It should be able to attract new talent, and to always have enough interesting work to keep everyone motivated. Talent attracts talent, so if you get a couple of strong devs, you can grow the affair. You just have to make sure the environment stays reasonable.

If you do that, it fundamentally changes the original buy vs build dynamics.

You want to keep building enough stuff to ensure that the dev shop stays functional. Building, if you have the capacity and ability, would now be the first choice. It is a longer-term goal though, as you don’t want all of your good developers to leave.

Then you want to build up and out from a few different starting points. The guidance is to minimize integrations first. The integrations are ultimately more costly than the vendor products themselves, so you focus there.

You figure out which categories your shop can handle, then you consolidate all of the little fragmented silos into larger systems. Generalization is the key to keeping the costs in line. Software companies leverage their code for lots of companies; in-house projects need to leverage their code for lots of different problems.

The focus is not on speed though; rather it is on doing the best engineering that you can. Move slowly and carefully. Build up as much reusable kit as you can, model the data as properly as you can, and keep expanding out from the starting point, slowly swallowing dozens of other products, while always keeping a close eye on both the dev shop and the operational capacity.

Obviously, the digital parts of existing lines of business would come first. You’d want to do this anyway, since just using the same vendors as everyone else offers no competitive advantage. But to get those advantages back, the work you do has to be better and more relevant than the vendors’, which means that you have to have strong technologists who really understand the business too.

Then funding isn’t by project, line of business, or budget. It is by dev shop. You set a solid foundation and build up capacity to implement better stuff and keep the funding stable as you grow it. A large organization can have a few different dev shops.

Outside of those areas of software growth, the old buy vs. build choice would remain, but as the starting points get larger they would end up eating stuff around the fringes, so you’d need to factor that in as well.

The counterargument to all of this is that building software is seen as outside of the company’s vertical. But the modern reality is that as most businesses got deeper into the digital realm, they drifted ever closer to being applied software companies than to their original lines of business.

The classic examples are Google, a marketing company, and Amazon, a retailer. As applied software companies they thrived while their brick-and-mortar predecessors didn’t.

The general nature though is that software is not a vertical for any digital line of business; it is a core part of it. That, and the exploding integration costs, mean that reasonable software is necessary for scaling and stabilizing. Bad software makes everything unprofitable.

As software eats the world, this same fate will play out in all sorts of other industries, and the winning strategy is to accept that if you are heavily reliant on doing business in the digital realm, then you are already heavily reliant on building software.

Then it is far more effective to build some of the core stuff yourself, instead of just integrating generic vendor products. You’ll need to recruit stronger developers and make sure you can keep them, but if you do, your capacity will grow. Then software development capability itself becomes a competitive advantage.

Thursday, January 11, 2024

Lessons Learned

You learn a lot during thirty years. I tried to write about most of it in this blog, at least from a higher-level perspective that brings together lots of different things that have happened, but some lessons are smaller and just don’t fit. Each one of these is rooted in at least one epic failure.
  • Doing the screens first and persistence last is a common top-down development approach, but it is a very bad mistake. The screens are driven by the irrationality of the users; they don’t map cleanly to the demands of persistence, and they never will. If they could, then lots of the very early application generators would have worked, but they didn’t. Persist first, then gradually move it up until it gets into the screens.
  • If you have an RDBMS, use it to nearly its fullest ability to protect itself. You really don’t want to persist garbage data; that will cause all sorts of annoying bugs. You don’t want to double up the stuff you are persisting either; it wastes space and can cause stale or inconsistent data as well. People always try to cheat on the database work, and they always pay a high price for it. It isn’t particularly fun work, but it anchors everything else. (There is a small sketch of this after the list.)
  • Don’t try to break dependent things into subparts, like, say, putting the front and back ends for the same system into two different repos. People might decompose by language, for example, but if there is a dependency, like an API, that matters more. It’s hard to explain, but if things can’t stand on their own, then you shouldn’t try to force them to.
  • Disorganization will always be the biggest problem. Organization is a place for everything, everything in its place, and not too many similar things in the same place. That is, if you have something new and you don’t know where to put it, then it is disorganized. It must go somewhere; if that is obvious, then you are okay.
  • If the programmers don’t know what data the system holds, and they don’t know why people are doing things with that data, then it will have a huge number of bugs. Programmers are the front line for quality; if they can’t see problems as they work, there will be lots and lots more.
  • Only ever move code in one direction. It goes from Dev to Release, with a few QA stops along the way. Never, never break that chain. Breaking it will result in all sorts of problems, including things getting accidentally rolled back, which is avoidable.
  • Always clean up right after a release. Everyone is tired, and cleanup work is boring. If you do not clean up then, you will never clean up and the mess will get worse, far worse.
  • Tackle the hard parts first, not the easy ones. The hard ones are unpredictable in time; if they don’t go well, you can raise an early flag on the schedule. The other way around tends to mislead people into thinking that everything is going well when it isn’t.
  • Do the right thing when you start. Only take more shortcuts the closer you are to the deadline. If you take a shortcut, note it, and clean it up right away after the release.
  • Do not freeze code forever. If you freeze the code, you also freeze the bugs, which then become the foundation of everything else. Building on buggy foundations is problematic.
  • Do not let people add in onion architectures. If they are trying to avoid the main code, and just do “their thing” around the outside, that work is usually very harmful. Push them to do the work properly.
  • Don’t drink the Kool-Aid. There just isn’t an easy or right way to build stuff. The best you can do is make it readable and keep it organized. Most philosophies for coding are extreme and have worse side effects.
  • If what happens underneath matters in the system, it is not fully encapsulated. In that case, you need to learn some stuff about what happens and why it happens. You can’t just ignore it. Some components will never be fully encapsulated.
  • The ramp-up time for an experienced coder is proportional to the size of the codebase. The ramp-up time for an inexperienced coder is far longer.
  • A weird and ugly interface is a strong disincentive against usage. Useful code has a long life. The point of writing professional code is to maximize its lifespan.
  • Reduce friction, don’t tolerate it. Spending the time to mitigate or minimize it always pays off. Putting up with it always slides one downhill.
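
For the RDBMS point above, here is a minimal sketch, using SQLite and a hypothetical schema, of letting the database protect itself: constraints reject garbage at the persistence boundary instead of trusting every caller to validate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE warehouse (
        warehouse_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE
    );
    CREATE TABLE cog_inventory (
        cog_id       INTEGER PRIMARY KEY,
        warehouse_id INTEGER NOT NULL REFERENCES warehouse(warehouse_id),
        cog_type     TEXT    NOT NULL,
        quantity     INTEGER NOT NULL CHECK (quantity >= 0),
        UNIQUE (warehouse_id, cog_type)   -- no doubled-up rows for the same thing
    );
""")
conn.execute("INSERT INTO warehouse (name) VALUES ('north')")

try:
    # Negative quantities are garbage; the database refuses them outright.
    conn.execute(
        "INSERT INTO cog_inventory (warehouse_id, cog_type, quantity) VALUES (1, 'cog-01', -5)"
    )
except sqlite3.IntegrityError as err:
    print("rejected by the database:", err)
```
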
There is a lot more, but I’ll start with these. Writing code is fairly easy, but building huge reliable systems is exceptionally hard. The two are not the same.

Thursday, January 4, 2024

Time vs Risk

When I was young, software development was not in the spotlight. We had quite a bit of time to get our work done. We would carefully craft things, focusing on the key issues.

It was the dawn of the World Wide Web followed by the decadence of the Dot Com era that changed all of that. Suddenly “first mover advantage” outweighed quality, correctness, and readability.

Modern coding is a high-speed game of chicken. It starts with a request to do some work in usually less than a third of the time you need to do a good job. If you balk at the lack of time, they’ll take their work elsewhere. So, you might try to stretch it out a little, but then you agree.

When time is compressed, you inevitably end up taking a lot of shortcuts. Some programmers know to avoid many of these, but the industry tends to praise the shortcuts anyway.

A shortcut is a tradeoff. You do something faster now, in the hopes that it will not blow up in your face later.

Some shortcuts never blow up, you get lucky.

Some are just incremental aggravations that, if they haven’t built up too deeply, will only slow you down a bit later. Just friction.

Some shortcuts, however, will implode or even explode, throwing the whole affair into the trash bin or flattening it forever. It has been bad enough that I’ve actually seen code come out spectacularly fast and then spent half a decade slogging through near-hopeless bugs. The wrong series of really bad shortcuts can be devastating.

So every shortcut is a risk. But it is hard to quantify, as there are usually aggravating factors that multiply the damage.

Given that you are inevitably pushed into having to take some shortcuts, it’s best to take the least destructive ones. Those tend to be higher up.

If you build code in a rational manner, you would lay out the foundations first and then carefully stack a lot of reusable components on top. That is the minimum amount of work you need to do.

Bad low-level code propagates trouble upward; the stuff built on top needs to counteract the awful behavior below. That tells us that the lower the shortcut, the more risky it is, the more it affects, and the worse the consequences of losing by taking it.

We see that all of the time.

Consider, for example, those systems where they did crazy fast things with saving the data, then wrote far too much hacky code above to try to hide the mess. If they had just modeled the data cleanly, then the tragically nested conditional nightmare piled on top, which ate huge amounts of time and spread a lot of pain, would not have been necessary. It is a super common example of a small set of shortcuts going rather horribly wrong.

You see exceptionally bad persistence all over the place causing problems. It’s likely that at least half the code ever written is totally unnecessary.

What’s always true is that if you take too many risks and lose enough of them, the time saved by the shortcuts will be massively overwhelmed by the time lost dealing with them. Coming out of the gate far too fast will always cause a project to stumble and will often cause it to lose the race.

If you are forced to take risks then it is worth learning how to evaluate them correctly. If you pick the right ones, you’ll lose a few, but keep on going. It’s not how it should be, but it is pretty much how it is these days.