Sunday, September 27, 2020

Laws of Software Development

For non-technical people, it is very easy to get confused about software development. From the outside, creating software seems simple: set up some tools, bang out some source code, and *voila* you get a product. Everybody is doing it, how hard can it be?

However, over the last five decades, we’ve found that software development can be deceptive. What seems easy, ain’t, and what seems hard probably is. Here are some nearly unbreakable “laws” for development that apply:


  1. You get what you’ve paid for.


Software development is slow and expensive. If you want it fast and cheap, the resulting software will either be bad or valueless. It may not be obviously bad, but it will quickly become a time and money sink, possibly forever. If you need the software to be around for more than a week, you’ll have to spend some serious money to make that happen and have a lot of patience. If you want your product to compete against other entries, then just a couple of months’ worth of work is not going to cut it.


  2. Software development takes a ‘huge’ amount of knowledge and experience.


If you are hoping that some kids, right out of school, will produce the same quality of workmanship that a group of seasoned professionals will, it’s not going to happen. The kids might be fast to produce code, but they are clueless when it comes to all of the other necessary aspects of stability like error handling, build environment, packaging, and operational monitoring. A basic development shop these days needs dozens of different technologies, and each one takes years to learn. If you get the code but can’t keep it running, it isn’t really that much of an achievement. 


  3. If you don’t know what to build, don’t build it.


Despite whatever is written out there, throwing together code with little idea about what it’s going to do is rarely a productive means of getting to something that works. It’s far better to work through the difficulties on paper than it is to spend 100x that energy working through them in code. On top of that, code has a tendency to freeze itself into place, making any future work on a bad foundation way more difficult. If you did throw together the code, remember to throw it away afterward. That will save a lot of pain.


  4. If it were easy, it probably already exists.


A tremendous amount of code has been written, rewritten, and deployed all over the place. Most basic ideas have been explored, and people have tried the same workarounds for decades, but still failed to get traction. So, it’s not a matter of brainstorming some clever new idea out of nowhere that is trivial to implement and will be life-changing. If you are looking to build something that isn’t a copy of something else, then the core ideas need to be predicated on very deep knowledge. If they are not, it’s probably a waste of time.


  5. If it looks ugly then people won’t ‘try’ to use it.


There are lots of ugly systems out there that work really well and are great tools. But they already exist, so there is little friction in keeping them going. Ugly is a blocker to trying stuff, not to ongoing usage. If people aren’t forced to try something new and it is ugly, they will put up fierce resistance.


  6. If it is unstable then people won’t keep using it.


Modern expectations for software quality are fairly low, but even still, if the software is just too flaky, most people will actively look for alternatives. Any initial patience gets eroded at an exponential rate, so they might seem to be putting up with the bugs and problems right now, but as time goes by each new issue causes larger and larger amounts of damage. At some point, if the software is only ‘barely’ helpful, there will be enough incentive for them to switch over to an alternative.


  7. The pretty user interface is only a tiny part of the work that needs to be done.


Software systems are like icebergs: only a small part of them, the graphical user interface, is visible to people. GUIs do take a lot of work and design, and are usually the place where most bugs are noticed, but what really holds the system together is the invisible stuff in the backend. Ignoring those foundations, just like with a house, tends to beg for a catastrophe. That backend work is generally more than 80% of the effort (and the share gets higher as the project grows larger).


There are, no doubt, plenty of other ‘laws’ that seem unbreakable in development. These are just the basic ones that come up in conversations frequently and are unlikely to see many -- if any -- exceptions.

Thursday, September 3, 2020

Debugging

 So, there is a weird bug going on in your system. You’ve read the code, it all looks fine, yet the results are not what you expected. Something “strange” is happening. What do you do now?


The basic debugging technique is often called divide and conquer, but we can use a slight twist on that concept. It’s not about splitting the code in half, but rather about keeping 2 different positions in the code and then closing the gap between them until you have isolated the issue.


The first step in debugging, as always, is to replicate the bug in a test environment. If you can’t replicate the bug, tracking it down in a live system is similar, but it needs to play out in a slower and more complex fashion, which is covered at the end of this post.


Once you’ve managed to reproduce the bug, the next important step is making sure you can get adequate feedback.


There are 2 common ways of getting debugging feedback, and they are similar. You can start up a debugger running the code, walk through the execution, and inspect the current state of the data at various points. Alternatively, you can put a lot of ‘print’ statements directly in the code that output to a log file. In both cases, you need to know what functions were run, and what the contents of variables were at different points in the execution. Most programmers need to understand how to debug with either approach. Sometimes one works way better than the other, especially if you are chasing down an integration bug that spans a lot of code or time.
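
As a minimal sketch of the ‘print’ approach, using Python’s standard logging module; the function and its data are made up for illustration:

    # a sketch of log-based debugging feedback; 'apply_discount' and its
    # inputs are hypothetical
    import logging

    logging.basicConfig(level=logging.DEBUG, filename="debug.log")
    log = logging.getLogger(__name__)

    def apply_discount(order, rate):
        log.debug("apply_discount in: order=%r rate=%r", order, rate)  # inputs
        total = order["subtotal"] * (1.0 - rate)
        log.debug("apply_discount out: total=%r", total)  # state after the calculation
        return total

    apply_discount({"subtotal": 100.0}, 0.15)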


Now, we are ready to start, so we need to find 2 things. The first is the ‘last known’ place in the code where it was definitely working. Not “maybe” working, or “probably” working, but definitely working. You need to know this for sure. If you aren’t sure, you need to back up through the code until you get to a place where you are definitely sure. If you are wrong, you’ll end up wasting a lot of time.


The second thing you need is the ‘first’ place in the code where it was definitely broken. Again, like above, you want to be sure that it is a) really broken and b) where the breakage started, or at least close to it. If you know there is a line of code a couple of steps before that was also broken, that is a better choice, since it is earlier.


So, now, you have a way of replicating the behavior, the last working point, and the first broken point. In between those two points is a bunch of code, including function calls, loops, conditionals, etc. The core part of debugging is to bring that range down to the smallest chunk of code possible by moving the last working and first broken points closer together.


By way of example, you might know that the code executes a few instructions correctly, then calls a complex function with good data, but the results are bad. If you have checked the inputs and they are right, you can go into that function, moving the last working position up to the start of its code. The first broken position can then move to each and every return, handler block, or other terminal point in that function.
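
As a rough sketch of how the two positions close in on the problem, assuming a hypothetical three-stage pipeline:

    # the three stages are made-up stand-ins for real code
    def parse(raw):
        return [int(x) for x in raw.split(",")]

    def transform(records):
        return [r * 2 for r in records]

    def render(records):
        return ", ".join(str(r) for r in records)

    def pipeline(raw):
        records = parse(raw)
        print("CHECK 1 (last known good):", records)     # verified correct
        cleaned = transform(records)
        print("CHECK 2 (move a marker here):", cleaned)  # good or bad? halves the range
        page = render(cleaned)
        print("CHECK 3 (first known bad):", page)        # verified wrong
        return page

    pipeline("1,2,3")

If CHECK 2 turns out good, the last working point moves down to it; if bad, the first broken point moves up to it. Either way, the gap shrinks.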


Do keep in mind that most programming languages support many different types of exits from functions, including returns, throws, or handlers attached to exiting. So, just because the problem is in a function, doesn’t mean that it is returning bad data out of the final return statement at the bottom. Don’t assume the exit point, confirm it.


At some point, after you have done this correctly a bunch of times, you will have narrowed the problem down, probably to a small set of instructions or some type of library call. Now, you kinda have to read the code, figure out what it was meant to do, and watch for syntactic or semantic reasons why that didn’t happen. You’ve narrowed it down to just a few lines. Sometimes it helps to write a small example, with the same input and code, so you can fiddle with it.


If you’ve narrowed it down to a chunk of code that completely works or is completely broken, it is usually because you made a bad assumption somewhere. Knowing that the code actually works is different from just guessing that it does. If the code compiles and/or runs then the computer is fine with it, so any confusion is coming from the person debugging.


What if the code is in a framework and the bug spans multiple entry-points in my code?


It’s similar, in that you are looking for the last entry-point that works and the first one called that is wrong. It is possible that the bug is in the framework itself, but you should avoid thinking that until you have exhausted every other option. If the data is right coming out of one entry-point, you can check whether it is still right going into the later one but then gets invalidated there. Most bugs of these types are caused by the entry-points not syncing up correctly; corruption is highly unlikely.


What if I don’t know what code is actually executed by the framework?


This is an all too common problem in modern frameworks. You can attach a lot of code in different places, but it isn’t often clear when and why that code is executed. If you can’t find the last working place or the first failing place, then you might have to put in logging statements or breakpoints for ‘anything’ that could have been called in between. This type of scaffolding (it should be removed after debugging) is a bit annoying and can use up lots of time, but it is still faster than just blindly guessing at the problem. If, while rerunning the bug, you find that some of the calls are good going in and good coming out, you can drop them. You can also drop the ones that are entirely bad going in and bad coming out (but they may turn out to be useful later for assessing whether a code change is actually fixing the problem or just making it worse).


What if the underlying cause is asynchronous code?


The code seems to be fine, but then something else running in the background messes it up. In most debugging you can just print out the ‘change’ of state; in concurrent debugging, you always have to print out the before and after. This is one place where log files are really crucial to gaining a correct understanding. As well, you have to consider the possibility that while one thread of execution is making its way through the steps, another thread of execution bypasses it (starts later but finishes first). For any variables that ‘could’ be common, you either have to protect them or craft the code so that their values don’t matter between instructions.
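
A tiny sketch of before/after feedback, assuming a hypothetical shared ‘balance’ that several threads are racing on:

    import threading

    balance = 100

    def withdraw(amount):
        global balance
        before = balance            # state before the change
        balance = before - amount
        after = balance             # state after the change
        # if another thread slipped in between, this 'before' won't match
        # the previous entry's 'after' in the log
        print(f"withdraw({amount}): before={before} after={after}")

    threads = [threading.Thread(target=withdraw, args=(10,)) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()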


What if I can’t replicate the problem? 


There are some issues, often caused by configuration or race conditions, that occur so infrequently, and only in production systems, that you basically have to use log files to set the first and last positions, then wait. Each time the bug triggers, you should be able to decrease the distance between the two. While waiting, you can examine the code and think up scenarios that would explain what you have seen. Thinking up lots of scenarios is best; not getting too attached to any of them leaves you free to insert a few extra log entries into the output that will validate or eliminate some of them.


Configuration problems show up as the programmer assuming that X is set to ‘foo’ when it is actually set to ‘bar’. They are usually fairly easy to fix, but sometimes are just a small side-effect of a larger process or communications problem that needs fixing too.


Race conditions are notoriously hard to diagnose, particularly if they are very infrequent. Basically, at least 2 things are happening at once, and most of the time one finishes before the other, but on some rare occasions, it is the other way around. Most fixes for this type of problem involve adding a synchronization primitive that forces one or the other circumstance, basically not letting them happen randomly. If you suspect there is a problem, you can ‘fix’ it, even if it isn’t wrong, but keep in mind that serializing parallel work does come with a performance cost. Still, if people are agitated by the presence of the bug, and you find 6 potential race conditions, you can fix all 6 at once, then later maybe undo a few of them once you are sure which ones actually mattered.
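
A minimal sketch of that kind of fix, assuming the suspected race is a read-modify-write on a shared counter:

    import threading

    counter = 0
    counter_lock = threading.Lock()

    def increment():
        global counter
        # serializing the read-modify-write removes the randomness,
        # at some performance cost
        with counter_lock:
            counter += 1

    threads = [threading.Thread(target=increment) for _ in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # always 100 with the lock; occasionally less without it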


If the problem is neither configuration nor a race condition, then it is most likely just unexpected data. You can fix that in the code, but also use it as motivation to get that data into testing, so similar problems don’t keep recurring. It should also be noted that it is symptomatic of a larger analysis problem as well, given that the users needed to do something that the programmers were not told about.

Saturday, August 15, 2020

Defensive Coding: Minimal Code

Sometimes you come across really beautiful code. It’s clear and concise. It’s obvious how it works. If you have to edit it, it is intuitive where the changes should go. It looks super-simple. It’s a great piece of work.

Most people don’t realize that getting code to look super-simple takes a lot of effort and is a huge challenge. Just splatting out any initial version produces something ugly. It takes a lot of thought, refinement, and editing work to get it looking great.


All code degrades with time and changes. If it starts out good, it will get tarnished but should hold its value. If it is ugly on day one, it will be a pit of despair a year later.


One way of approaching the problem is to equate super-simple code with the act of minimizing some of the variations until we come down to one with reasonable tradeoffs. We can list out most of these variations.


Minimize:

  • The number of variables

  • The length of a ‘readable’ name

  • The number of external jumps needed in order to understand the code

  • The effort to understand a conditional

  • The number of flow constructs, such as if statements and for loops

  • The number of overlapping logic paths

  • The number of hardcoded constants

  • The number of disjoint topics

  • The number of layers

  • The number of reader’s questions

  • The number of possible different behaviors


We’ll go through each of them in turn.

Variables

We obviously don’t want to have the code littered with useless variables. But we also don’t want the ‘same data’ stored in multiple places. We don’t want to overload the meaning of a variable either. And, a little less obviously, if there are several dependent variables, we want to bind them together as one thing and move them around as just one thing.
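
For instance, a minimal sketch of binding two dependent variables into one thing, using a Python dataclass with made-up fields:

    from dataclasses import dataclass

    @dataclass
    class DateRange:
        start: str   # inclusive, e.g. "2020-01-01"
        end: str     # inclusive

    # instead of passing 'start' and 'end' around separately (and letting
    # them drift apart), the pair moves through the code as one value
    def overlaps(a: DateRange, b: DateRange) -> bool:
        return a.start <= b.end and b.start <= a.end

    print(overlaps(DateRange("2020-01-01", "2020-06-30"),
                   DateRange("2020-06-01", "2020-12-31")))  # True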

Readable Names

We want the shortest, longest name possible. That is, for readability we want to spell everything out in its full detail, but when and where there are different options for that, we want to choose the shortest of them. We don’t want to make up acronyms, we don’t want to make up or misuse words, and we certainly don’t want to decorate the names with other attributes or just arbitrarily truncate them. The names should be correct; we don’t want to lie. If the names are good, we need less documentation.

External Jumps

If you can just read the code, without having to jump all over the code base, that is really good. It’s self-contained and entirely under control. If you have to bounce all over the place to figure out what is really happening then that is spaghetti code. It doesn’t matter why you have to bounce, just that you have to do it to get an understanding of how that block of code will work.

Conditionals

Sometimes people create negative conditionals that end up getting processed as double negatives. Sometimes the parts of a condition get spread across a number of different variables. This can be confusing. Conditionals should be easy to understand, so when they aren’t, they should be offloaded into a function that is. So, if you have to check 3 variables for 7 different values, then you certainly don’t want to do that directly in an ‘if’ statement. If the function you need to call requires all three variables, plus a couple of the values passed in, you probably have too many variables. The inputs to a conditional check function shouldn’t be that complex.
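
A small sketch of that offloading, with a hypothetical order-refund check as the domain:

    from dataclasses import dataclass

    @dataclass
    class Order:
        status: str
        age_days: int
        final_sale: bool

    def is_refundable(order: Order) -> bool:
        # the multi-variable, multi-value check lives here, behind one name
        return (order.status in ("shipped", "delivered")
                and order.age_days <= 30
                and not order.final_sale)

    order = Order(status="shipped", age_days=12, final_sale=False)
    if is_refundable(order):   # the call site now reads as plain English
        print("refund approved")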

Flow of Control

There is some minimum structural logic that is necessary for a reasonable computation. This is different from performance optimization, in that code with unnecessary branches and loops is just wasting effort. So if you loop through an array to find one part of it, then loop through it again to find the other part, that is ‘deoptimized’. By fixing it, you are just getting rid of bad code, not yet optimizing what the code is doing. It’s not uncommon in ugly code to see that a more careful construction could have avoided at least half of all of the flow constructs, if not more. When those useless constructs go, what is left is way more understandable.
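
A tiny sketch of that kind of fix, using a made-up partitioning example:

    values = [3, -1, 4, -5, 9]

    # 'deoptimized': two passes over the same array
    positives = [v for v in values if v >= 0]
    negatives = [v for v in values if v < 0]

    # one pass does both, with half the flow constructs to read
    positives, negatives = [], []
    for v in values:
        (positives if v >= 0 else negatives).append(v)

    print(positives, negatives)  # [3, 4, 9] [-1, -5]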

Overlapping Logic

A messy part of most programming languages is error handling. It can be easily abused to craft blocks of code that have a large number of different exit points. Some necessary error handling supports multiple different conditions that are handled differently, but most error handling is rather boolean. One can mix the main logic with boolean handling and still have it readable. For more sophisticated approaches, the base code and error handling usually need to be split apart in order to keep it simple.
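
A minimal sketch of that split, assuming a hypothetical lookup where the error handling really is boolean (found or not):

    def fetch_row(table, key):
        # base code: assumes the key exists, stays simple
        if key not in table:
            raise KeyError(key)
        return table[key]

    def lookup(table, key, default=None):
        # the handling is kept in one wrapper, not scattered through
        # the main logic as extra exit points
        try:
            return fetch_row(table, key)
        except KeyError:
            return default

    print(lookup({"a": 1}, "a"))             # 1
    print(lookup({"a": 1}, "b", default=0))  # 0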

Hardcoded Constants

Once people grew frustrated by continually hitting arbitrary limits where the programmers made a bad choice, we moved away from sticking constants right into the code. Modern code, however, has forgotten this and has returned to hardcoding all sorts of bad stuff. On rare occasions, it might be necessary, but it always needs to be justified. Most of the inputs to the code should come through the arguments to the function call whenever possible.
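
For example, a sketch of lifting a made-up limit out of the code and into the function’s arguments:

    # before: an arbitrary limit frozen into the code
    def truncate_hard(text):
        return text[:80]

    # after: the limit arrives through the call, with a visible default
    def truncate(text, max_length=80):
        return text[:max_length]

    print(truncate("some very long line of output here...", max_length=20))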

Disjoint Topics

You can take two very specific functions and jam them into one bigger function declaration. But the code for each addresses a different ‘topic’; they should be separated, not together. Minimizing the number of functions in code is a very bad idea; functions are cheap, but they are also the basis of readability. Each function should fully address its topic, and the code at any given level should all be localized.

Layers

The code should be layered. But taken to the extreme, some layers are useless and not adding any value. Get rid of them. Over-minimization is bad too. If there are no layers, then the code tends towards a super-large jumbled mess of stuff at cross-purposes. It may seem easier to read to some people, since all of the underlying details are smacked together, but really it is not, since there is too much noise included. Coding massive lists of instructions exactly as a massive list is the ‘brute force’ way of building stuff. It works for small programs but goes bad quickly, because of collisions, as the code base grows.

Reader’s Questions

When someone reads your code, they will have lots of questions. Some things will be obvious, some can be guessed at given the context, but some things are just a mystery. Code doesn’t want or need mysteries, so it is quite possible for the programmer to nicely answer these questions. Comments, comment blocks, naming, and packaging all help to resolve questions. If it’s not obvious, it should be.

Different Behaviors

In some systems, there are a lot of interdependent options that can be manipulated by the users. If that optionality scrambles the code, then it was handled badly. If it’s really an indecipherable mess, then it is fundamentally untestable, and as such is not production-worthy code. The options can be handled in advance by mapping them down to a smaller set of more reasonable parameters, or polymorphism can be used so that the major permutations fall into specific blocks of code. Either way, giving the users lots of choices should not also give them lots of bugs.
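
A rough sketch of the polymorphic version, with hypothetical exporters standing in for the user-visible options:

    import json

    class CsvExporter:
        def export(self, rows):
            return "\n".join(",".join(str(c) for c in row) for row in rows)

    class JsonExporter:
        def export(self, rows):
            return json.dumps(rows)

    # the user's choice picks a class once, instead of scrambling the
    # main code with 'if format == ...' checks at every step
    exporters = {"csv": CsvExporter(), "json": JsonExporter()}
    print(exporters["csv"].export([[1, 2], [3, 4]]))

Each permutation falls into its own small, testable block of code.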

Summary

There are probably a few more variations, but this is a good start. 


If you minimize everything in this list, the code will not only be beautiful, but also readable, with way fewer bugs, and people will be able to keep extending it easily in the future.


It’s worth noting that no one on the planet can type in perfect code the first time, directly from their memory. It takes work to minimize this kinda stuff, and some of the constraints conflict with each other. So, everyone’s expectation should be that you type out a rough version of the code first, fix the obvious problems, and then start gradually working that into a beautiful version. When asked for an estimate, you include the time necessary to write the initial code but also the time necessary to clean it up and test it properly. If there is some unbreakable time panic, you make sure that people know that what is going into production is ugly and only a ‘partial’ answer and still needs refining.

Thursday, August 13, 2020

Defensive Coding: KISS

Nothing gets programmers into more hot water than the ‘keep it simple, stupid’ (KISS) principle.


There are 2 important ways it causes grief. 


The first is when programmers assume it applies directly to them. They want to keep their work simple, so they want to keep their code simple.


The failure here is that there is always some ‘intrinsic’ complexity that is totally unavoidable as a byproduct of our physical universe and its history. If a programmer ignores some of that complexity and makes their code super simple, the complexity hasn’t gone away, it has just moved elsewhere. More often than not, it has gone directly to the users. 


So, now the code is simple, but using it has become significantly more complicated than necessary. A super simple feature either doesn’t do much or it is horrifically hard for users to get real things accomplished. Either way, its value is limited.


If you want to keep users interested in using a system, then the code in it has to really solve their problems and it has to keep it simple for them. KISS applies to the solution, not to the way it was built. That is not simple code to write since it means ensuring that the users don’t need to lean on any outside resources. The system needs to know anything and everything about the problem, to keep it all up-to-date, and to never forget stuff, which is, of course, large and very complicated. 


KISS can be applied to a specific sub-component of the mechanics, but it only works if you are avoiding some explicit overcomplication. That is, if you are thinking about adding stuff that will definitely never get used, then you can remove that. However, if it might get used someday, then by removing it, you are actually setting up a new problem. Maybe a small one, but it also could be a huge one in a short time from now.


The other way programmers get into trouble with KISS is that they assume that if they have to build a series of components, making each one of them simple will help simplify the whole thing. The opposite is often true.


Simplifications, almost by definition, are things that are not there. Gaps in what could have been. Those things might have been necessary or they could have been completely extra, but they are still absent. Collecting together a bunch of non-related things in the same place is one way to describe disorganization, and oddly, collecting together a bunch of missing things, related or not, is almost the same.


The growing absence of stuff is a form of complexity, and if not handled explicitly it will get more complex. So, a small piece missing from one area can be worked around fairly easily, but when there are dozens of things missing all over, the workarounds are harder and more complex. Just remembering what isn’t there becomes a huge burden. So, if there are lots of things that are missing, their combined effect will be multiplicative, not additive, if they are not 100% independent (which they usually aren’t). Simple is not a cumulative property; complexity is.


“Everything should be made as simple as possible, but no simpler.” -- Albert Einstein


The key part of this quote is that while you can overcomplicate things, you can also do the opposite and oversimplify them, which is bad too. When KISS is misapplied or taken to an extreme not only does it craft a system that users hate, but it will also slowly degrade into a horrific ball of complexity that at some point becomes unfixable. 


It’s always a shock to the programmers, particularly if they have been diligent about trying to simplify everything, when it suddenly kicks back up and becomes the problem itself.


Stuff that is missing or not included is always going to be a big problem if you find out later that you need it. It’s that old ‘a stitch in time saves nine’ saying. Filling the code with a lot of extra stuff that isn’t used ain’t great, but then it is a little better than waking up one day and realizing that what you didn’t do earlier -- when you had the chance -- is now going to derail everything.


Don’t overcomplicate things, but don’t use that as an excuse to swing way out to the opposite extreme, either. 

Tuesday, August 11, 2020

Integrity and Professionalism

 It is important to be able to rely on the people around you. 


If everyone has a hidden agenda that’s not to your benefit, then just getting through the basics of life is complicated. You have to be constantly on your guard, and you have to be quickly reactive to whatever growing problem they set off. That feeds an instability which kinda sucks.


What underpins stability and the confidence to rely on the future comes from a solid foundation. Two critical components of that foundation are integrity and professionalism. 


Integrity is mostly an internal attribute. Some people mean well and try not to rock the boat or cause unnecessary problems for others. They are trying to be honest and working hard to not play games. Basically, it’s not being sneaky, but wanting to be clear and upfront with people, so that they don’t get the wrong impressions or have suspicions about your motives. They are just trying to be nice, decent people who mean well towards others.


Professionalism is when that extends to the work you are doing. If you have the knowledge and training, and you start some piece of work with full concentration and ensure that it is done correctly, that is a professional demeanor. If you are also courteous and explain any issues, then it is even better. Either way, if someone tasks you with something reasonable, they can be assured that you’ll get the job done within an acceptable time frame. If what they are asking has issues, you’ll inform them in advance, and let them know what they can expect.


While these are often seen as just personal attributes, they extend to organizations as well. Basically, companies, countries, social groups, etc. all have a personality too, and they act as a super-organism. A person might be having an off day; an organization might have a rogue member; but the same attributes apply. You can judge an organization on the way it mostly behaves, the same way you might judge a person.


Bad behavior is contagious. The best-known example is littering, where it is believed that people are much more likely to litter if the ground is already full of litter. That tends toward being true for most other bad behaviors as well. Corruption encourages corruption, violence builds up, and if most people are being horrible and sneaky and selfish, other people get pulled into that behavior too. A society left out of control will quickly descend into chaos, with everyone just looking out for themselves. It’s so pervasive in human behavior that it might actually be deeply hardwired into the species. 


The opposite is also true, but to a lesser degree. Doing a good deed of the day, for example, is a little contagious. It doesn’t affect as many people, and it doesn’t last as long. Still, it does carry through somewhat. And it’s that attribute that is so important. One of the key reasons to act with integrity and to be professional when working is that you would prefer that that is how others act around you. If you set an example, people do tend to follow.


If you work in a dog-eat-dog environment with everyone out for themselves, the environment itself is horrible and draining. Well, some tiny number of people find it fun and exciting, but the rest of humanity does not seem to feel that. So, if we don’t want that, one of the ways to help prevent it is to actively force yourself to rise above whatever is currently happening and to help lift everyone else too. 


Knowing that good behavior is somewhat contagious really helps. Your valiant efforts to make things better around you aren’t wasted; they just take a lot longer to kick in than the dark side’s. And knowing that the bulk of people would prefer it your way, it’s just a matter of time and of constantly reminding people to try and be a little bit better each day. Bad people rely on the rest of us turning away; good ones rely on us all making small, constant, positive contributions.


Aside from trying to make your own life and working environment better, we can actually use this knowledge to improve the overall world. For one, we can try to avoid dealing with or enabling any bad actors. If they don’t act with integrity or professionalism, then it is worth putting a bit of effort into figuring out how to avoid them. It doesn’t have to be a major cause or some moral battle, just the notion that if there are at least a few choices between different vendors, then it is far better to take the better one, even if there is a financial reason to take the other.


That is, a lack of integrity or a lack of professionalism should beat out price as a disincentive. The horrible company may be a bit cheaper, but they are a horrible company, so why give them power or enable them? Spend a little bit more to get a better world. 


The corollary to integrity and professionalism is that prolonged dealings with shady people tend to taint them. That is, if you spend all of your time in the company of horrible people, eventually you’ll become a horrible person yourself, little by little. So, we do have a choice and an effect on the world around us. You can spend some small effort, now and then, to make sure that you are not enabling the dark side, and in exchange, the world gets a tiny bit better. Or you can not worry about it, chase the best deals for yourself regardless of who is offering them, and watch the world grow worse.