Sunday, December 27, 2020

Bugs and Weaknesses

When code goes wrong, there is a wide range of different outcomes.

Some code is obviously wrong. It does something unexpected and so bad that the objective of using the code for its intended purpose is completely derailed. This is generally what most people mean by a ‘bug’. It’s followed by a need to change the code swiftly and get it back into the operational environment, so as to unblock the user.

Some code is annoying. Technically it works, it is usable, but it just makes getting something accomplished slow or painful. I tend to refer to this as a bug as well, in that the behavior actually bugs someone. These types of issues often go unfixed; there is little urgency in correcting them. Most people focus on adding net new features, not correcting historically rushed work.

The other category of wrong code is a weakness. This is code that appears to work, and most of the time works fine. It’s just that under limited scenarios, or at some point in the future, the code no longer does what is expected. Weaknesses manifest themselves and cause problems, but subsequent investigations are often not deep enough to find them. So, they can become infrequently repeating problems that burn through time, where nobody is positioned to find and correct them. The other scenario is that seemingly unrelated changes to underlying dependencies suddenly cause the behavior of the code to change dramatically. Some small change that should not have been a problem becomes one. Weaknesses are the most pernicious of all coding errors because you can be looking straight at them and still not notice that the code is wrong.
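A tiny, hypothetical sketch of such a weakness. The code below appears correct, passes casual testing, and only misbehaves in a limited scenario that may not arrive for years:

```python
def parse_version(tag):
    """Strip the leading 'v' from release tags like 'v1.9'."""
    return tag.lstrip("v")

def newest(tags):
    # The weakness: versions are compared as strings. This works for
    # every release up to v1.9, so it looks fine for a long time.
    return max(tags, key=parse_version)

print(newest(["v1.2", "v1.9"]))    # 'v1.9' -- looks correct
print(newest(["v1.9", "v1.10"]))   # 'v1.9' -- wrong: as strings, '1.9' > '1.10'
```

You can stare directly at `max(tags, key=parse_version)` and not see anything wrong, which is exactly what makes this category of error so pernicious.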

Weaknesses are far more common now, primarily because the underlying technologies and environments have gotten way more complicated. This helps disconnect the authors of the code from the full weight of its behavior. As programmers use more code and understand it less, weaknesses become more common. The code works sometimes, but sometimes it doesn’t. 

Really, these should be classified as bugs, but since there is a lot of very weak code out there, if we did that we’d have to admit that very little of what we build actually works correctly. That’s the opposite of our industry trends right now, where we over-emphasize that we’re super flexible and can throw together stuff fast enough to match the users constantly changing their minds. It’s this real-time reactivity that people seem to want; they’d rather have a lot of bugs than have to wait a while for stuff that lasts.

Tuesday, December 15, 2020


There are a bunch of fairly simple ideas that would make large-scale software development easier and less volatile.

The first one is ‘pre-fab’ systems. 

The idea is that you get a fully set up, architected system, ready to go on day one, right out of the box. It doesn’t do anything, you still have to add in screens and the core business logic, but everything else is there: users, security, configurations, database, domain tables, etc. 

Each one is for a specific tech stack, there are no configurable options, all that work is done in advance. It’s all set up, all of the decisions have already been made. 

There would be documentation and comments that tell you where to add in your code to alter the mechanics. It’s understood that the rest is ‘untouchable’ and will need to be upgraded to newer versions as time progresses, so the ‘porting’ process is built in from the start.

The primary reason this is super helpful is that we see a lot of early programmers essentially ‘cheat’ on the setup of their projects, and those deficiencies propagate into everything else. 

It’s not easy to understand how these systems need to be set up, built, deployed, upgraded, etc. Most projects don’t need some special type of clever custom solution, they just need the boring, correct, dull setup that ensures that future work will fit nicely and be able to move forward.

A ‘standards’ and ‘modeling’ reference web site. 

Not one that is for-profit, or where there are fees to join, but rather an open international standards site that is funded well enough that it can focus on its key objectives, not making or raising money.

The objective is to make sure that people can easily use the appropriate standards and data models in their systems to make them work correctly. It’s not uncommon in coding to find people who have manufactured their own bad substitutes for existing standards, like ISO country codes, to disastrous effect.

There is no single resource to point people at and say: if the standard is here, you ‘have’ to use it. So, instead, we get countless bad reinventions of the same stuff, over and over again, with a large variety of naive errors.

After decades, there are a huge number of ‘great’ standards, and most industries have a standard way of modeling their own entities. So, if we capture that, and provide a strong reference site to lookup stuff, and reuse it, a lot of the weirder code out there will gradually disappear. 
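As a hedged sketch of what reusing such a standard looks like in practice, the excerpt below validates input against ISO 3166-1 alpha-2 country codes (only four entries are shown here; the real standard covers every country):

```python
# Tiny excerpt of ISO 3166-1 alpha-2; a real system would load the full list.
ISO_3166_ALPHA2 = {
    "CA": "Canada",
    "DE": "Germany",
    "GB": "United Kingdom",   # a classic homemade mistake is inventing 'UK'
    "JP": "Japan",
}

def validate_country(code):
    """Normalize input and reject anything not in the standard."""
    code = code.strip().upper()
    if code not in ISO_3166_ALPHA2:
        raise ValueError(f"'{code}' is not a recognized ISO 3166-1 alpha-2 code")
    return code

print(validate_country("ca"))   # 'CA'
```

The point is not the few lines of code; it is that the table itself comes from the standard, not from someone’s memory of what the codes probably are.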

In most systems, less than 10% of the code is special; the rest is rather boring, and it should be, which is a good thing. Put your creative energies into the parts that matter, then just put up the rest of the stuff that is needed around it, as cleanly as possible.

A software development ‘building code’. 

A little book of non-controversial, unbreakable rules that are just not up for discussion. They’re not fads; they are timeless. A kind of best practices on steroids, except that you can’t argue with them.

There are some rather obvious candidates, like don’t ‘misname’ stuff, ‘handle all of the errors’, and you always need development, testing, and production environments.

They are timeless and absolutely identical for each and every tech stack, each and every project, but they seem to keep getting forgotten, rediscovered, and then forgotten again by each new generation of programmers. 

Getting this into a little, non-controversial book of simple rules that ‘must’ be followed would probably save billions of hours of people arguing over their validity.

It’s not just discussions between programmers that get affected by this; it is also the influence of strong stakeholders that pushes people into doing things they obviously shouldn’t.

A trustworthy, dynamic-schema persistence solution that preserves deterministic properties like ACID.

That is, the quality of a program is predicated on its ability to ‘remember stuff’. If it forgets or gets confused, then it is not a particularly useful program. 

Having to establish a strong, locked-in schema in advance (one that should be in 3rd or 4th normal form) is a problem, both for people learning about the technology and because it anchors any and all of the following code. If it is anchored to the wrong spot, that is usually ‘slowly’ fatal to the entire endeavor.

So, instead, we build something that persists but is also easily changeable within the system architecture. So, the system can save an entity, then later upgrade the attributes of that entity safely, allowing both variations to be accessible in the system (and an easy path to move an older variation forward). There are tools to track the differences and to safely clean up when possible. There is an assumption that the data as a whole is spread over many running instances.
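A minimal sketch of this idea, with hypothetical names: each stored entity carries a schema version, and small upgrader functions lazily move older variations forward on read, so old and new records can coexist safely while migration happens:

```python
CURRENT_VERSION = 2

def _v1_to_v2(record):
    """v1 stored 'first'/'last'; v2 merges them into a single 'name'."""
    record = dict(record)   # never mutate the stored copy in place
    record["name"] = f"{record.pop('first')} {record.pop('last')}"
    record["_v"] = 2
    return record

# One upgrader per schema transition; chained to cross multiple versions.
UPGRADERS = {1: _v1_to_v2}

def load(record):
    """Return the entity at the current schema version, upgrading as needed."""
    while record["_v"] < CURRENT_VERSION:
        record = UPGRADERS[record["_v"]](record)
    return record

old = {"_v": 1, "first": "Ada", "last": "Lovelace"}
print(load(old))   # {'_v': 2, 'name': 'Ada Lovelace'}
```

A real solution would also need the tooling the text mentions, for tracking differences and cleaning up old variations once nothing references them.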

If schema changes are trivial and easily codable, then the fear of making them in a big system will diminish and the focus can return to extending the work, not just trying to work around it.

In finality

All of these ideas are centered around keeping programmers from doing the ‘wrong’ work. In most cases, their job is to extend systems with new features that make life easier for users, but if we dig into a lot of wayward development projects we tend to find that the work and focus ended up in the wrong place. That in turn caused political issues and the whole thing went downhill from there. 

There are always interesting puzzles to solve and areas in development that need creativity in all projects, but that effort needs to get applied to the right areas, not just wantonly all over the place. Some parts of building systems are dull and boring, and that is as it should be. In the end, it is just work that needs to get done well, in order to get enough baseline quality into the system that the project can keep making progress. If we are continually ‘scrambling’ that work out of some strange desire to leave a ‘legacy’ or prove ourselves, then maybe we should find a way to prevent that as an ongoing problem.

Sunday, December 6, 2020


I was kind of a mess when I first started programming. My code was awful, poorly thought out, and extremely messy. My thinking was murky and I had no sense of organization. I’d just madly flail at coding with a lot of energy and enthusiasm, unfortunately, the results weren’t particularly useful.

That changed as I worked with really skilled people who had good habits. When you are struggling at doing something simple, and you see someone breeze through work that is way more complicated, you know that some of that is experience and knowledge, but there are other attributes as well. As I watched them work, I started to see the habits that kept them in check, that kept them from going astray.

It’s worth noting that for every habit, there are always times, particularly in crunch mode, where you have to not follow the habit and do something worse. That is fine, as the point of a habit is that it is the default when you don’t have a reason to choose some other behavior. It’s where you start, and then, with a little consideration, you may choose to do something else; but if there aren’t extremely good reasons for taking a shortcut, you should definitely fall back to habit.

Cleanup, Cleanup, Cleanup

It’s not just the code. Working in a messy office, or with a messy environment, where absolutely everything is a mess, is draining. You keep stumbling over things; it just slows you down. Keeping things nice and tidy takes time, but it is always less time than getting bogged down by a mess. It’s a really good habit to stop every once in a while and clean a bunch of small things up. Clean up your office, your desk, the file system, your machine, the documentation. Whatever it is, it all needs to be kept tidy. No other habit so obviously influences the productivity of programmers. Those who whack out a lot of code and leave a lot of mess at the same time tend to get overwhelmed by the inefficiencies they created. Those who clean up as they go tend to get faster and more efficient as time goes on.

Write it down

In an environment where there are millions of things flying around, it is easy to forget stuff. It is hard to work with people who constantly forget stuff; you can never trust them to get their part of the problem solved, and you have to constantly remind them. The best programmers seem to have memories like elephants, but really they don’t. They just spend the time to keep lists of the things that they need to know and the things that they need to do. If you develop the habit of writing stuff down, then it isn’t that much of a leap to writing it down in a way that you can share it, and sharing it in a way that everyone can find it. This set of habits makes sure that people trust you to get your work done, even when it is a crazy environment and everything is constantly in flux.

Fix it Now

Procrastination is the enemy of getting something big to work. If you keep ignoring little issues you see coming up and instead focus on getting something bigger down the road, a lot of the time you will come to discover that the big issue is not what you thought it was, because of the little ones. That is, if you pass over 5 minor issues to fix a bug, it’s not unlikely that one of those minor issues is part of the bug, or influences it, or even convinces you not to look at the bug anymore. But it’s not just bug fixing. It’s all of the code, the documentation, the analysis. Basically everything. If you can, fix it now. Fix it right away before moving on, even if you think it is just slowing you down (it isn’t).

Don’t use it Without Understanding it First

The temptation is to cut and paste some code from somewhere else, thinking that someone else understood enough of it to be usable. That’s a pretty ‘strong’ assumption; often when people throw together answers on the web, they kinda get it, but not fully. Their example isn’t meant to be used verbatim, but rather as a means for guiding you in your search for stronger answers. That is, if their example has 4 lines of code, you have to go through each and every line and figure out how it works first, before using it. If you skip that step, you may have completed the ‘task’, but you cheated by not understanding it fully. Later that will haunt you. So, an example on StackOverflow isn’t the final code, but rather the parts of the code that you should now go investigate further.

Take the Long Way Home

In a rush, you may have no option but to take short-cuts, but they are usually weak, with consequences that are not immediately obvious. When you are not in a rush, it is very important to force yourself to take the long route. The long, slow, ‘correct’ way of doing things. You have plenty of time, you have to switch out of crunch mode and back into ‘do it right’ mode. Doing it right may be more boring, it may be more tiring, there may be more work, and it may be harder, but unless and until you know how to do it right, doing it wrong is going to mislead you into false assumptions. Really you have to do it right at least once, or you’ll never understand the full context or the consequences you are missing. It is better to do it right a lot, then it will become muscle memory. It’s a habit that will save a lot of pain later. 

Leverage it

Whether it’s knowledge, tools, paradigms, or idioms, it is better to work with what you know and understand first, before trying something else. That is, if you have one tool that does 5 things pretty well, it’s a better choice than having 5 tools that each do a single thing really well. More tools, in itself, is more complexity, and having to use a different tool for every task is a lot more cognitively demanding, and more likely to go wrong. If you pick a tool, then you should spend the time to figure out ‘everything’ it can do, and use it for ‘all’ of those things. That’s the same for every other aspect of the work. Leverage what you’ve got, before heading out to find something new. Even if that new thing is slightly better, that’s not enough to justify adding it into the mix. If the things you have cannot accomplish what you need, or are extremely poor at it, then you have to add something new to the mix, but that should be the extreme case, not the default one. This isn’t a directive against learning, since one always needs to learn a lot of new things, but rather a point that learning something deeply is more useful than learning a lot of stuff at a shallow level. To get things done well, you need enough depth to understand them and make them work as expected.

Keep an Open Mind

There are always a lot of popular trends out there that are basically silver bullets. They claim that if you do something in a certain way, all of your problems will magically disappear. That’s fine (and this post is no better), but it is not a good idea to take these ideas as articles of faith. They might be right, they might be wrong, or they might be right sometimes and wrong for others. If you buy into them, completely, then you’ve closed your mind to learning and growing, which is not going to end well. There are plenty of conventions, idioms, styles, etc. that work well enough in the limited circumstances, but are really bad ideas outside of that range. It’s okay to try something and see if it really does fix or improve the problems, but it’s equally important to be objective about its success and to bail on it, if the side-effects are worse than the original problem.

All The Little Things

We’re told not to spend too much thought on the little things. They are ‘little’ after all. But in software development, when the millions of things we have to work on are all predicated on little things, it turns out that spending time thinking about them is vital. So, it’s another great habit to not just wave your hand and dismiss something as being ‘little’, but instead to spend at least a few minutes giving it consideration. Oddly, once you are in this habit, and you’ve considered lots of little things, it all a) gets easier b) gets faster and c) the quality of your work improves. Little things matter.

It’s Okay to be Pedantic

Unfortunately, computers are rigorous and strict environments. They are a ‘formal’ system, so they are 100% pedantic. A little one-character mistake can bring down the entire system. It’s just the way that they are. Lots of programmers try to rebel against this, and they love to say ‘it doesn’t matter’, but the really great programmers know that it does matter. The tiniest of things, every little bit, and byte. All of it. It matters, it’s worth thinking about and it is often worth discussing. There is nothing wrong with being pedantic, and at times in the development process, it is a necessity, not a defect. The quality of a system is the quality of the thinking and organization that went into the construction of that system. 

Expect it to Go Wrong

If you write some code, it will have bugs. If you spend a lot of time, you can find and remove most of those bugs. It takes a long time, but you could actually make it work nearly correctly. However, you probably don’t have anything close to the amount of time necessary to do that type of work properly. As such, when you put any code into a production environment, bad things ‘will’ happen. You should expect this, and anticipate it. The best you can do is correctly guess which of those bad things will be the most embarrassing, and do some extra work there to mitigate their appearance. But you’re never going to have enough time to catch them all early, so you also need to have strategies like being able to roll back everything to an earlier version. Rollback, fall back, turn off, etc. Protecting yourself in this way isn’t a waste of time; it’s just the necessary insurance to deal with those expected epic failures that will always occur.
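One of the mitigations above, a runtime ‘turn off’ switch with a fallback path, can be sketched in a few lines. This is a hypothetical illustration (the flag name and pricing functions are invented), not a production feature-flag system:

```python
# Operations can flip this off in an emergency, without a redeploy.
FLAGS = {"new_pricing_engine": True}

def price(order):
    if FLAGS.get("new_pricing_engine"):
        try:
            return new_price(order)
        except Exception:
            pass   # the new code failed in production; fall back, don't fail the user
    return old_price(order)

def new_price(order):
    # Stands in for the shiny new code that hits an unanticipated failure.
    raise RuntimeError("unanticipated production failure")

def old_price(order):
    # The boring, proven path that keeps the system running.
    return order["qty"] * order["unit_price"]

print(price({"qty": 3, "unit_price": 10}))   # 30 -- served by the fallback
```

The insurance here is cheap: a few extra lines written in advance, versus an outage while someone scrambles to redeploy an older version.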

Saturday, November 21, 2020

Keep Your Eye on the Ball

It’s so easy to be confused while developing software.

Each little piece is tiny and not particularly hard to understand, but the real complexity comes from the fact that there are often millions of them.

The end goal of writing software is to get code out there that does a good job of solving one or more user’s problems. If you figure out what they need, build it, and then get it into a stable operating environment, they can use the software successfully.

Time is a funny issue in software development. These days, there is always an intense impatience with getting the work done. It’s commonly believed that issuing low-quality work now, is better than waiting for a higher quality version. 

That might be true if each piece was really small and independent, you could get an initial version out there and then build on it as necessary, but code actually ‘stacks’. Newer code builds on the older stuff. If the older stuff is flaky, then no matter how nice the newer stuff is, it is also flaky. If you try to avoid stacking the code, you either make it massively siloed or totally redundant. The siloing hurts usability, while the redundancies waste huge amounts of time. There is no easy, side-effect-free, short-cut that will decrease the time and effort needed. We’ve known this for at least 50 years.

But it’s not just the code itself that can be the victim of impatience, or even the other side of the coin: bureaucracy. It’s the whole end-to-end process, which starts with the user talking about some way the computer could help them, and ends with their frequent interactions with the code as they apply it to their tasks. This is a very long series of little steps, where coding is just one small part of what needs to happen. If this ‘process’ is disorganized, or awkward, or deliberately slowed down, it applies a great deal of pressure onto the code itself forcing it to be of lower quality. 

Low-quality code is a multiplier. A little bit of it is okay and manageable, but as you get more and more of it, the negative effects are magnified. If you get a large enough pile of low-quality workmanship, it forms an operational black hole that sucks in all other efforts and is nearly impossible to escape from. A really bad system doesn’t have any viable short-term options left to fix it. 

It’s super important to be able to ‘assess’ the quality of organization and workmanship for any ongoing software development project, and it is equally important to be able to ‘correct’ any issues that are hurting the quality. However, it is extremely easy for people to miss the importance of this. In the rush, the negative quality issues are only getting a tiny bit worse, and they believe success will come from just getting something crude out there today. But success today comes at the expense of more pain tomorrow. If you keep that up for long enough, eventually the costs are inescapable. 

Sunday, November 1, 2020

Software Development Workflow

When someone needs a computer to help them, the very first thing that should happen is that all of the specifics of the problem are written down.

A vague idea about the problem with a potential way to solve it is a reasonable place to start, but in order to build something that fully works, eventually, all of the details have to be gathered and it is far more efficient to do that earlier, rather than later.

This analysis includes a definition of the problem, the data it involves, the structure and frequency of the data, any calculations that need to be made, expected user workflows, and all of the interconnections to other systems. There are always lots of moving parts even for simple solutions. These details are then put together in a document. This document is added to the library of previous analysis documents used to build up the whole system.

The analysis documents are the input for design, both technical and graphical.

A system spans a related series of problems. A design needs to list out a matching solution. The architecture needs to be laid out, the interfaces need to be arranged nicely. The interconnections with other systems or components need to be prototyped and tested.

There are a number of overall documents that put caveats, requirements, and restrictions on the final design of a system. Sometimes the pieces are brand new, sometimes they are just modifications to existing works, so there is some effort necessary to go back to the older analysis and designs as well, for many of the components. 

The actual technical designs include high, mid, and low-level specifications. The depth of the design is relative to the programmers who will be tasked with building it. Difficult issues or less experienced teams need lower-level detail. 

For interfaces, issues like colors, fonts, design grids, screen layouts, navigation, and widget handling all need to be specified. As well, an interface only comes together when its individual features are easily interoperable. If they are intentionally siloed, the program will be awkward or difficult to use.

All of this design work is added to the library of previous designs for the system.

At the coding stage, the programmers now have a good idea of what they are writing, how it fits into the existing system, and what data they need from where. The depth of the specifications should match their background experience. They combine all of these different references to produce the code that will be used for both the system and for any diagnostics/testing that occurs in development or operations.

As they work, they often find missing details or ambiguous specifications. They push these issues back to the originating sources, either design or analysis. 

For the base mechanics of their code, they perform very specific tests on issues like obvious data problems, fault handling, logic flow through the permutations, etc. They also perform operational testing on starting stuff up, basic failures, slow resources, big data, etc. Once they feel that the code is sufficiently stable, they submit it to integration testing via the source code repository. 

Development and integration testing are done relative to the technical issues within the code. It’s testing to see that the code is operational and that it will perform as the technical specifications required. It does not test the fitness of the solution in solving the user’s problems. That is handled by QA. 

After the code is built and integrated into the existing system, it needs to be run through a broader series of tests that relate it back to its usage. The input to these tests is the analysis and design that preceded the code. The things that need to be tested now are the ones that were specified. That is, if the analysis identified a specific workflow, then there should at least be one test that that workflow behaves as expected. So, the user acceptance and correctness testing is driven from the deliverables that existed long ‘before’ the code was written. They should be independent of the programmers, to ensure that any assumptions and bias were not baked into the tests.

Whenever testing is completed, a set of defects is identified. These are prioritized; some need to be corrected before the code can be used operationally, while some are just livable annoyances. This type of testing happens in ‘rounds’: problems are found and corrected, and a new round is started. Quite obviously, the list of defects should be as close to complete as possible, so that the work involved here is minimized. Testing is most efficient when handled in batches.

At some point, the software still has issues, but everyone can live with them, so it is moved into operations. For some commercial software it is published, for SaaS or in-house code it is put into production. This means that it is monitored by a special group of people to ensure that it is running, that it isn’t hitting resource issues, and that it is satisfying the user’s needs. 

The code will have common problems; these are documented, and the reaction/solution to them as they occur is also documented. When operational issues happen, they are tracked, and this tracking is fed back to the developers. If a new, unexpected problem occurs, then the programmers might have to get involved in diagnosing it. If the problem has occurred at least once already, then the programmers should not be involved in investigating it. Operations keeps this library of common problems and their mitigations.

It should be noted that a software developer is involved in all five stages, while a programmer is often just working during the coding portion. As the system grows, the library of analysis and designs should grow as well. The code isn’t the only resource that needs to be collected into a repository.  

For a large or complicated system, the process is very complex. Since the time it takes to do the work exceeds the time available for it all to be done at once, development happens in a very large series of iterations. It is often running in various parallel stages, which significantly amplifies the complexity (and threatens the quality). Deficiencies in any part of the work manifest as bugs in the system. This is expected; we cannot build complex systems perfectly, but we can at least ensure that they converge on getting more correct as the project matures.

The work involved in initially setting up a brand new project for the analysis, design, architecture, environment, configurations, etc. is a lot and is often rushed. This causes foundational problems that can really slow down the progress of the development over the years. As such, doing a better job of initially setting the project up has a significant impact on the success of the project over its lifetime. Correcting problems early is essential. Also, it is never safe to assume that the initial setup was fine, it’s better to just double check that it was set up reasonably since most often it was not.

Consistency, in any part of the project, cuts down on the total amount of work that is necessary. If the work is getting harder as the project progresses, it is usually because the process itself is disorganized. Keeping everything organized is a huge time saver. Avoiding redundancy is also a huge time saver. Many projects get caught into a vicious cycle of taking too many short-cuts to save time, only to have the earlier short-cuts waste more time than they saved. Once that is started, it can be very difficult to break the cycle.

Sometimes, for rapid prototyping, the analysis, design, and coding stages will all get intermingled. One coder or a small group of people will do all three at once, in order to expedite the development. That gets the code done really fast, but it obliterates any reasonable ability to extend that code. It should be thrown away or at least heavily refactored after it has served its immediate purpose. Weaknesses in organization, consistency, architecture, etc. can force later work to wrap around the outside. New layers effectively freeze or disable the original code, causing fragmentation and preservation of bugs. That can start another cycle, where the existence of the older code makes it increasingly harder to make any new code work properly, eventually halting the development.

If there is no documentation, or if the documentation was lost, you can reverse engineer the specifications from the code, but any of the driving context is lost. You know what the code is doing, but you don’t know why someone chose to implement it that way. That means the logic is either now locked in stone, or you will have to rediscover what someone else already went through. Neither outcome is good, so building up libraries of analysis, design, and problem mitigations is quite beneficial.

It is crucial to never put untested code into production. Since there is always a need for rapid bug fixing, the shortest time to write code, test it, and then deploy it, for most systems is less than a day. This is predicated on reproducing a bug in the dev or test environment so that the potential change can be shown to work as expected. For normal development work, the time it takes to go from analysis to production is weeks, months, or even years. The two workflows are very different, both are necessary.

When software development is proceeding well, it is a reasonably smooth occupation. When it is degenerating or trapped in a bad cycle, it is very frustrating and stressful. The amount of friction and Yak Shaving in the work is an indicator of the suitability of the process used for the effort. If everything is late, and everybody is angry and tired, then there are serious process problems with the development project, that can and should be fixed.

Sunday, October 11, 2020


The point of programming is not to issue a fixed set of static instructions to the computer. Typing in, or using the mouse, to execute a series of specific instructions is ‘using the computer’, not ‘programming’ it.

Programming is when you assemble a set of instructions that has at least one or more variables that can change for each execution. That is, it is at least one step ‘higher’ than just using the machine. 

If there is some ‘part’ of the instructions that can vary, then you can reuse those instructions in many different circumstances, each time specifying a new set of ‘values’ for any variables.

So, you can visualize a program as being a ‘forest’ of different possible instruction executions, one of which you might choose to set in motion. 

It’s worth noting that computers are ‘discrete’; that there is always a fixed boundary for each and every variable, even if the number of possible permutations is massive.

So, we can talk about the ‘size’ of the program’s forest as a ‘space’ of all possible executions. If you have one integer variable in your program, then there are [min-integer..max-integer] different outcomes. If you have a number of independent variables, then the size of the outcome space is multiplicative. If the variables are dependent on each other, the space is much smaller, collapsing toward additive.

Quite obviously, if you have a lot of variables, the number of possible permutations is unbelievably large, bigger than we can conceptualize. But we can reason abstractly about many aspects of these possible spaces.
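As a rough sketch, that size is easy to compute directly. Assuming independent variables, the counts multiply; the function and ranges below are invented purely for illustration:

```python
import math

# Each independent variable contributes its full range of values,
# so the execution space is the product of the ranges.
def execution_space(ranges):
    return math.prod(ranges)

# Three independent variables: a byte, a boolean flag, and a 5-value enum.
space = execution_space([256, 2, 5])
```

Even this toy program has 2,560 possible executions; add a few 32-bit integers and the space is already beyond anything we can enumerate, which is why we reason about it abstractly instead.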

A key understanding is to be able to see that a single user also has some variance. Their tasks vary each time. They might want to use the system for their own constrained forest of work. In that case, it would not make much sense to write 2 or more programs to satisfy their requirements, particularly since each execution is similar to the others. One program that varies in the same way that they vary is a good, tight fit.

We can see that this gets a little more complicated when we consider a larger group of users. If they slightly expand the forest, then we can still craft the same code to satisfy all of them with one program. If, however, they cluster into different independent sets themselves, then it is tempting to write one program per set. But limiting the variations like that is unlikely to be the fastest way to complete ‘all’ of the code for ‘all’ of the users. Depending on the forests, it’s a question of whether adding new variables is more or less expensive than crafting somewhat redundant programs. If the forests are nicely aligned, then observing that alignment can provide a higher-level means of encapsulating the variability. The obvious use of this is for generic tools like editors and spreadsheets. They cover billions of users, but they do so at the cost of segmenting a very specific subset of the work into a siloed program. 

It’s also worth noting that with a group of users, the most likely scenario is that their forests are somewhat disjointed. That comes intuitively from the observation that if it wasn’t somewhat disjoint, they would effectively all be doing the same job, but for most occupations, the work is usually partitioned, deliberately.

While we can use size as a measure of quantity, it is a little easier to think in terms of ‘scope’. They are basically similar, but it’s better to talk about the whole scope of one’s work, and then relate that back to a series of programs, each of specific size that collectively covers the entire workflow. Looking at it that way introduces the second problem of programs that are siloed. They pull off some chunk of the workflow and focus exclusively on executing just those instructions. It’s different from the forests caused by similar users, in that it is usually mutually exclusive, there are very few overlaps. 

In order for the users to achieve their higher-level objectives, they have to go from silo to silo, executing code with specific variables, then in between export the data out of one silo, and import it into another. It’s easy to view import/export as just being more code with variability, but that misses the distinction that it only exists because the workflow crosses multiple silos, so it is purely artificial complexity. If there were just one, larger, more complete program, there would be no need to export/import. Obviously, any effort the user spends to navigate between the silos is artificial as well.

If the scope of a given workflow crosses a lot of silos, it’s not hard to imagine that the artificial work can add up to more than the base work. Because of that, it often makes a lot more sense from a user efficiency standpoint to build a series of components that are easily interconnectable, and just use programs as entry-points to bind them together. Then the lighter programs can be tailored with different components to closely match the user sets. It is far better than building a number of big siloed programs. 
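One loose way to picture that approach (all of the component names here are invented) is a thin entry-point program that just wires reusable components together, instead of freezing the whole workflow into a silo:

```python
# Reusable components, each covering one piece of the workflow.
def clean(records):
    return [r.strip() for r in records if r.strip()]

def normalize(records):
    return [r.lower() for r in records]

def compose(*steps):
    # Bind the components together at the latest possible moment.
    def pipeline(data):
        for step in steps:
            data = step(data)
        return data
    return pipeline

# A lightweight "program" is just an entry point over shared components;
# data flows between them directly, with no export/import step.
ingest = compose(clean, normalize)
```

Different user sets get different tailored entry points over the same components, rather than each getting their own big siloed program.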

It’s also worth noting here, that user variation tends to shift with time. They start with a limited number of workflows and generally grow or move around, which is often the underlying problem with scope creep.

If we can construct sets of instructions with variability that need to match a changing landscape of users, requirements, and even silos, then it would be best to control that effort to produce code with the widest possible scope, given the time requirements. Code with just a few variables handles a small scope, with more variables it can handle a much larger one. If it is somewhat abstract, it can handle even more. Ultimately the most optimal development carefully matches the scope of the code to the scope of the user workflows but leaves it all composable and assembled at the latest possible moment. 

In summary, it’s not programming if it can’t vary. If it does vary, it should match what the users are doing, and if they vary a lot, the code should vary a lot as well. If there are a lot of users, moving around a lot, then there will be too many variables for just basic code, so it needs some other level of organization to break it down into manageable components. These are then seamlessly reassembled to match the users' behavior. If you step up from the variability -- deal with it in a more abstract perspective -- you can massively widen the scope, getting a much better fit for the users. Given a lot of users, moving around a lot, doing a wide variety of different tasks, you should be able to craft components that reduce the final amount of code by orders of magnitude, which is obviously a lot less time and resources to build, and the users will spend a lot less time with import/export and navigation as well.

Monday, October 5, 2020

The Good, the Bad, and the Ugly

Some people believe that the quality of code is subjective. That one person's awful code is another person’s beautiful code.

There are some sprinkles of truth buried in those beliefs, but not enough to actually validate them.

The key point is that the author of any piece of code is often biased. Their code is always good, no matter how bad it really is. So, if we want a better assessment of ‘quality’ it has to be from the perspective of the later coders who end up working on it, once the original author has left. 

If you can read code, then bad code is obvious. It takes a longer time than normal to work through its issues, to figure out what it is doing. So it’s easily a time problem. If the code should have taken a day to understand, but a week later you are still confused, then it is pretty safe to say that it is bad. Well, almost. Many programmers can’t ‘read’ code, they are functionally illiterate. They can write stuff, or copy and paste it from somewhere else, but it would take them a very long time to read and understand anyone else’s code, whether it was good or bad.

That’s what injects so much confusion into quantifying ‘quality’. If a programmer struggles for a week to understand some code, you can’t tell if it's the code or the programmer, or both, that is having the problem.

On top of this, there are stylistic issues, idioms, and abstractions. Encountering a new idiom in someone else’s code, for example, can really slow down the reader, particularly if they don’t recognize it as such. What might be weird and unnatural to one programmer might be a very common idiom to a different group of them. 

Even with all of these issues, we can really think in terms of expected base time, for a programmer with the correct knowledge, that would be necessary for understanding. So we can talk about bad code as being way slower to read, okay code as being more or less readable, and good as code that is easily extendable. 

There are a huge number of different ways to make code bad. It can be obfuscated, fragmented, or stupidly clever, or it can obscure its intent using all sorts of tricks. Bad formatting, rampant inconsistencies, and awful naming help a lot too. 

There are way fewer versions that are okay. Still lots of different permutations, but it is far easier and more creative to write bad code. Okay-code is readable, and it isn’t onerous to make a bug fix. 

There are a fairly small number of variations for good code. Primarily because the code has to be technically strong, but also map back to the business problems or implement a strong abstraction. If someone asks for an extension, and you find it really straightforward to make those changes, then you know that it is good. 

There is such a thing as great code, but it is exceedingly rare. Usually, it is abstract, but in a way that lets people leverage its power for all sorts of unexpected usage, and it too is pretty easy to extend (if you understand the abstraction).

It’s worth noting that just because code is believed to be working in production, doesn’t make it good code. Most systems have at least hundreds of bugs that exist but haven’t been triggered yet, and rust is always eating away at weak constructs. Working code contributes to the system's current stability, but it can also freeze the ability to keep developing it and suddenly become unstable when the usage suddenly changes. It might just be a bomb waiting to go off at an inconvenient time.

So, we can’t really point to just one version of the code and say that that is ‘perfect’, but we can get a sense of quality and also an understanding that as it increases, there are considerably fewer possible variations. A small number of actual implementations is good, a larger number are okay and the rest are just bad, but may not cause immediate grief. Experienced, literate, programmers can tell the difference, so it is far less subjective than most people realize.

Sunday, September 27, 2020

Laws of Software Development

 For non-technical people, it is very easy to get confused about software development. From the outside, creating software seems simple: set up some tools, bang out some source code, and *voila* you get a product. Everybody is doing it, how hard can it be?

However, over the last five decades, we’ve found that software development can be deceptive. What seems easy, ain’t, and what seems hard probably is. Here are some nearly unbreakable “laws” for development that apply:

  1. You get what you’ve paid for.

Software development is slow and expensive. If you want it fast and cheap, the resulting software will either be bad or valueless. It may not be obviously bad, but it will quickly become a time and money sink, possibly forever. If you need the software to be around for more than a week, you’ll have to spend some serious money to make that happen and have a lot of patience. If you want your product to compete against other entries, then just a couple of months’ worth of work is not going to cut it.

  2. Software development takes a ‘huge’ amount of knowledge and experience.

If you are hoping that some kids, right out of school, will produce the same quality of workmanship that a group of seasoned professionals will, it’s not going to happen. The kids might be fast to produce code, but they are clueless when it comes to all of the other necessary aspects of stability like error handling, build environment, packaging, and operational monitoring. A basic development shop these days needs dozens of different technologies, and each one takes years to learn. If you get the code but can’t keep it running, it isn’t really that much of an achievement. 

  3. If you don’t know what to build, don’t build it.

Despite whatever is written out there, throwing together code with little idea about what it’s going to do is rarely a productive means of getting to something that works. It’s far better to work through the difficulties on paper than it is to spend 100x that energy working through them in code. On top of that, code has a tendency to freeze itself into place, making any future work on a bad foundation way more difficult. If you did throw together the code, remember to throw it away afterward. That will save a lot of pain.

  4. If it were easy, it probably already exists.

A tremendous amount of code has been written, rewritten, and deployed all over the place. Most basic ideas have been explored, and people have tried the same workarounds for decades, but still failed to get traction. So, it’s not a matter of brainstorming some clever new idea out of nowhere that is trivial to implement and will be life-changing. If you are looking to build something that isn’t a copy of something else, then the core ideas need to be predicated on very deep knowledge. If they are not, it’s probably a waste of time.

  5. If it looks ugly then people won’t ‘try’ to use it.

There are lots of ugly systems out there that work really well and are great tools. But they already exist, so there is little friction in keeping them going. Ugly is a blocker to trying stuff, not to ongoing usage. If people aren’t forced to try something new and it is ugly, they will put up fierce resistance. 

  6. If it is unstable then people won’t keep using it.

Modern expectations for software quality are fairly low, but even still if the software is just too flaky, most people will actively look for alternatives. Any initial patience gets eroded at an exponential rate, so they might seem to be putting up with the bugs and problems right now, but as time goes by each new issue causes larger and larger amounts of damage. At some point, if the software is ‘barely’ helpful, there will be enough incentives for them to switch over to an alternative.

  7. The pretty user interface is only a tiny part of the work that needs to be done.

Software systems are like icebergs, only a small part of them, the graphical user interface, is visible to people. GUIs do take a lot of work and design and are usually the place where most bugs are noticed, but what really holds the system together is the invisible stuff in the backend. Ignoring those foundations, just like with a house, tends to beg for a catastrophe. That backend work is generally more than 80% of the effort (it gets higher as the project grows larger).

There are, no doubt, plenty of other ‘laws’ that seem unbreakable in development. These are just the basic ones that come up in conversations frequently and are unlikely to see many -- if any -- exceptions. 

Thursday, September 3, 2020


 So, there is a weird bug going on in your system. You’ve read the code, it all looks fine, yet the results are not what you expected. Something “strange” is happening. What do you do now?

The basic debugging technique is often called divide and conquer, but we can use a slight twist on that concept. It’s not about splitting the code in half, but rather about keeping 2 different positions in the code and then closing the gap between them until you have isolated the issue.

The first step, as always, in debugging is to replicate the bug in a test environment. If you can’t replicate the bug, tracking it down in a live system is similar, but it needs to play out in a slower and more complex fashion which is covered at the end of this post. 

Once you’ve managed to reproduce the bug, the next important step is making sure you can get adequate feedback. 

There are 2 common ways of getting debugging feedback, and they are similar. You can start up a debugger running the code, walk through the execution, and inspect the current state of the data at various points. The other way is to put a lot of ‘print’ statements directly in the code that output to the log file. In both cases, you need to know what functions were run, and what the contents of variables were at different points in the execution. Most programmers need to understand how to debug with either approach. Sometimes one works way better than the other, especially if you are chasing down an integration bug that spans a lot of code or time.

Now, we are ready to start, so we need to find 2 things. The first is the ‘last known’ place in the code where it was definitely working. Not “maybe” works, or “probably” works, but definitely working. You need to know this for sure. If you aren’t sure, you need to back up through the code, until you get to a place where you are definitely sure. If you are wrong, you’ll end up wasting a lot of time.

The second thing you need is the ‘first’ place in the code where it was definitely broken. Again, like above, you want to be sure that it is a) really broken and b) it either started there or possibly a little earlier. If you know there is a line of code a couple of steps before that was also broken, then since that is earlier, it is a better choice. 

So, now, you have a way of replicating the behavior, the last working point, and a first broken point. In between those two points is a bunch of code, including function calls, loops, conditionals, etc. The core part of debugging is to bring that range down to the smallest chunk of code possible by moving either the last or first points closer together. 
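That narrowing can be sketched mechanically. Assuming the execution can be replayed as a series of steps with a checkable invariant (every name below is illustrative, not from any real system), you tighten the gap by checking the state after each step between the two positions:

```python
def first_broken_step(steps, state, looks_ok):
    """Replay the steps from the last known-good point, checking the
    state after each one; return the index of the first broken step."""
    for i, step in enumerate(steps):
        state = step(state)
        if not looks_ok(state):
            return i
    return None  # the invariant never broke in this range

# Toy pipeline: the middle step has the bug (it drops the sign).
steps = [lambda x: x + 1, lambda x: abs(x), lambda x: x * 2]
bad_index = first_broken_step(steps, -5, looks_ok=lambda x: x < 0)
```

Manual debugging with print statements or breakpoints is doing exactly this loop by hand: confirm the state is good after one point, broken after another, and shrink the range between them.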

By way of example, you might know that the code executes a few instructions correctly, then calls a complex function with good data, but the results are bad. If you have checked the inputs and they are right, you can go into that function, moving the last position up to the start of its code. We can move the broken position up to each and every return, handler block, or other terminal points in that function. 

Do keep in mind that most programming languages support many different types of exits from functions, including returns, throws, or handlers attached to exiting. So, just because the problem is in a function, doesn’t mean that it is returning bad data out of the final return statement at the bottom. Don’t assume the exit point, confirm it.

At some point, after you have done this correctly, a bunch of times, you have narrowed the problem down probably to a small set of instructions or some type of library call. Now, you kinda have to read the code and figure out what it was meant to do and watch for syntactic or semantic reasons why that didn’t happen. You’ve narrowed it down to just a few lines. Sometimes it helps to write a small example, with the same input and code, so you can fiddle with it.

If you’ve narrowed it down to a chunk of code that completely works or is completely broken, it is usually because you made a bad assumption somewhere. Knowing that the code actually works is different from just guessing that it does. If the code compiles and/or runs then the computer is fine with it, so any confusion is coming from the person debugging.

What if the code is in a framework and the bug spans multiple entry-points in my code?

It’s similar in that you are looking for the last entry-point that works and the first one called that is wrong. It is possible that the bug is in the framework itself, but you should avoid thinking that until you have exhausted every other option. If the data is right, coming out of one entry point, you can check that it is still right going into the latter one but then gets invalidated there. Most bugs of these types are caused by the entry points not syncing up correctly, corruption is highly unlikely. 

What if I don’t know what code is actually executed by the framework?

This is an all too common problem in modern frameworks. You can attach a lot of code into different places, but it isn’t often clear when and why that code is executed. If you can’t find the last working place or the first failing place, then you might have to put in logging statements or breakpoints for ‘anything’ that could have been called in-between. This type of scaffolding (it should be removed after debugging) is a bit annoying and can use up lots of time, but it is actually faster than just blindly guessing at the problem. If while rerunning the bug, you find that some of the calls are good, going in and good coming out, you can drop them. You can also drop the ones that are entirely bad, going in and bad going out (but they may turn out to be useful later for assessing whether a code change is actually fixing the problem or just making it worse). 
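One way to build that scaffolding, sketched here with a hypothetical decorator (the handler name is made up), is to wrap everything that might be called so the log shows exactly which pieces ran, and with what data:

```python
import functools

def traced(fn):
    # Temporary debugging scaffolding; remove it after the bug is found.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"enter {fn.__name__} args={args} kwargs={kwargs}")
        result = fn(*args, **kwargs)
        print(f"exit  {fn.__name__} -> {result!r}")
        return result
    return wrapper

@traced
def on_submit(form_id):
    # A framework entry point; the trace shows whether it was called at all.
    return {"form": form_id, "status": "saved"}
```

Rerunning the bug with every candidate entry point wrapped this way quickly sorts them into good-in/good-out, bad-in/bad-out, and the interesting ones in between.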

What if the underlying cause is asynchronous code?

The code seems to be fine, but then something else running in the background messes it up. In most debugging you can just print out the ‘change’ of state; in concurrent debugging, you always have to print out the before and after. This is one place where log files are really crucial to gaining correct understanding. As well, you have to consider the possibility that while one thread of execution is making its way through the steps, another thread of execution bypasses it (starts later but finishes first). For any variables that ‘could’ be common, you either have to protect them or craft the code so that their values don’t matter between instructions.

What if I can’t replicate the problem? 

There are some issues, often caused by configuration or race conditions, that occur so infrequently and only in production systems, that you basically have to use log files to set the first and last positions, then wait. Each time it triggers, you should be able to decrease the distance between the two. While waiting, you can examine the code and think up scenarios that would explain what you have seen. Thinking up lots of scenarios is best, and not getting too attached to any of them opens up the ability to insert a few extra log entries into the output that will validate or eliminate some of them. 

Configuration problems show up as the programmer assuming that X is set to ‘foo’ when it is actually set to ‘bar’. They are usually fairly easy to fix, but sometimes are just a small side-effect of a larger process or communications problem that needs fixing too.

Race conditions are notoriously hard to diagnose, particularly if they are very infrequent. Basically, at least 2 things are happening at once, and most of the time one finishes before the other, but on some rare occasions, it is the other way around. Most fixes for this type of problem involve adding a synchronization primitive that forces one or the other circumstances, so basically not letting them happen randomly. If you suspect there is a problem, you can ‘fix’ it, even if it isn’t wrong, but keep in mind that serializing parallel work does come with a performance cost. Still, if people are agitated by the presence of the bug, and you find 6 potential race conditions, you can fix all 6 at once, then later maybe undo a few of them when you are sure they are valid. 
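A minimal sketch of that kind of fix, assuming the shared state is just a counter (the names are invented for illustration): a lock forces one ordering, so the interleaving can no longer happen randomly:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # the read-modify-write can no longer interleave
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, the `counter += 1` is a read, an increment, and a write that two threads can occasionally interleave, losing updates; with it, the result is deterministic, at the cost of serializing that tiny section.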

If the problem is neither configuration nor a race condition, then it is most likely just unexpected data. You can fix that in the code, but also use it as motivation to get that data into testing, so similar problems don’t keep reoccurring. It should also be noted that it is symptomatic of a larger analysis problem as well, given that the users needed to do something that the programmers were not told about. 

Saturday, August 15, 2020

Defensive Coding: Minimal Code

 Sometimes you come across really beautiful code. It’s clear and concise. It’s obvious how it works. If you have to edit it, it is intuitive where the changes should go. It looks super-simple. It’s a great piece of work.

Most people don’t realize that getting code to look super-simple is a lot of effort and a huge challenge. Just splatting out any initial version is ugly. It takes a lot of thought, refinement and editing work to get it looking great.

All code degrades with time and changes. If it starts out good, it will get tarnished but should hold its value. If it is ugly on day one, it will be a pit of despair a year later.

One way of approaching the problem is to equate super-simple code with the act of minimizing some of the variations until we come down to one with reasonable tradeoffs. We can list out most of these variations.


  • The number of variables

  • The length of a ‘readable’ name

  • The number of external jumps needed in order to understand the code

  • The effort to understand a conditional

  • The number of flow constructs, such as if statements and for loops

  • The number of overlapping logic paths

  • The number of hardcoded constants

  • The number of disjoint topics

  • The number of layers

  • The number of reader’s questions

  • The number of possible different behaviors

We’ll go through each of them accordingly.


Variables

We obviously don’t want to have the code littered with useless variables. But we also don’t want the ‘same data’ stored in multiple places. We don’t want to overload the meaning of a variable either. And, a little less obviously, if there are several dependent variables, we want to bind them together as one thing, and move it all around as just one thing.
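For the dependent-variable case, a small sketch of binding them together (the type and fields here are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class DateRange:
    # start and end only make sense together, so they travel as one thing
    start: str  # ISO date, e.g. "2020-01-01"
    end: str

def overlaps(a: DateRange, b: DateRange) -> bool:
    # One argument per range, instead of four loose string variables.
    return a.start <= b.end and b.start <= a.end
```

Every function that would have taken a `start` and an `end` separately now takes one value, and the pairing can never drift apart.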

Readable Names

We want the shortest, longest name possible. That is, for readability we want to spell everything out in its full detail, but when and where there are different options for that, we want to choose the shortest of them. We don’t want to make up acronyms, we don’t want to make up or misuse words, and we certainly don’t want to decorate the names with other attributes or just arbitrarily truncate them. The names should be correct; we don’t want to lie. If the names are good, we need less documentation.

External Jumps

If you can just read the code, without having to jump all over the code base, that is really good. It’s self-contained and entirely under control. If you have to bounce all over the place to figure out what is really happening then that is spaghetti code. It doesn’t matter why you have to bounce, just that you have to do it to get an understanding of how that block of code will work.


Conditionals

Sometimes people create negative conditionals that end up getting processed as double negatives. Sometimes the parts of a condition get spread across a number of different variables. This can be confusing. Conditionals should be easy to understand, so when they aren’t, they should be offloaded into a function that is. So, if you have to check 3 variables for 7 different values, then you certainly don’t want to do that directly in an ‘if’ statement. If the function you need to call requires all three variables, plus a couple of extra values passed in, you probably have too many variables. The inputs to a conditional check function shouldn’t be that complex.
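A small sketch of the offloading (the domain and names are invented): the compound condition moves into a function whose name answers the question the ‘if’ is asking:

```python
# Hard to read: a double-negative compound condition, inlined.
#   if not (order["status"] != "open" or not order["paid"]): ...

# Easier: offload it into a function whose name states the intent.
def is_shippable(order):
    return order["status"] == "open" and order["paid"]

orders = [{"status": "open", "paid": True}, {"status": "closed", "paid": True}]
shippable = [o for o in orders if is_shippable(o)]
```

The reader of the calling code no longer has to untangle the logic; the name carries the meaning, and the details are in exactly one place.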

Flow of Control

There is some minimum structural logic that is necessary for a reasonable computation. This is different than performance optimizations, in that code with unnecessary branches and loops is just wasting effort. So if you loop through an array, find one part of it, then loop through it again to find the other part, that is ‘deoptimized’. By fixing it, you are just getting rid of bad code, but still not optimizing what the code is doing. It’s not uncommon in ugly code to see that a more careful construction could have avoided at least half of all of the flow constructs, if not more. When those useless constructs go, what is left is way more understandable.
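The two-loop case above can be sketched directly; the single-pass version does the same work with half the flow constructs (the threshold and names are made up):

```python
# Deoptimized: two loops over the same array just to split it in two.
def split_twice(xs):
    small = [x for x in xs if x < 10]
    large = [x for x in xs if x >= 10]
    return small, large

# One careful pass produces both parts with a single loop.
def split_once(xs):
    small, large = [], []
    for x in xs:
        (small if x < 10 else large).append(x)
    return small, large
```

Nothing here is a performance optimization in the usual sense; the second version is simply not doing the wasted traversal, and there is less structure for the reader to track.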

Overlapping Logic

A messy part of most programming languages is error handling. It can be easily abused to craft blocks of code that have a large number of different exit points. Some necessary error handling supports multiple different conditions that are handled differently, but most error handling is rather boolean. One can mix the main logic with boolean handling and still have it readable. For more sophisticated approaches, the base code and error handling usually need to be split apart in order to keep it simple.
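A sketch of the boolean case (the parsing task is invented): the base logic reads straight through, and the error handling collapses to one alternate exit instead of many scattered ones:

```python
def parse_setting(line):
    # Base logic: split the line and convert the value, top to bottom.
    try:
        key, value = line.split("=", 1)
        return key.strip(), int(value)
    except ValueError:
        # Boolean handling: the line is either usable or it is not.
        return None
```

When the handling genuinely needs several distinct conditions treated differently, this shape stops working, and splitting the base code from the error paths is usually what keeps both readable.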

Hardcoded Constants

Once people grew frustrated by continually hitting arbitrary limits where the programmers made a bad choice, we moved away from sticking constants right into the code. Modern code however has forgotten this and has returned to hardcoding all sorts of bad stuff. On rare occasions, it might be necessary, but it always needs to be justified. Most of the inputs to the code should come through the arguments to the function call whenever possible. 
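A minimal sketch of moving a constant out to the arguments (the pagination example is invented): the old value survives only as an overridable default rather than a frozen limit:

```python
# Hardcoded: the limit is buried in the code, an arbitrary choice frozen in.
def first_page_hardcoded(items):
    return items[:25]

# Better: the limit arrives through the arguments, so callers can vary it.
def page(items, start=0, page_size=25):
    return items[start:start + page_size]
```

When someone eventually needs 50 rows per page, the second version is a call-site change; the first is a code change, a rebuild, and a redeploy.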

Disjoint Topics

You can take two very specific functions and jam them into one bigger function declaration. But the code for each addresses a different ‘topic’; they should be separated, not left together. Minimizing the number of functions in code is a very bad idea; functions are cheap, but they are also the basis of readability. Each function should fully address its topic, and the code at any given level should all be localized.


Layers

The code should be layered. But taken to the extreme, some layers are useless and not adding any value. Get rid of them. Over-minimization is bad too. If there are no layers, then the code tends towards a super-large jumbled mess of stuff at cross-purposes. It may seem easier to read to some people, since all of the underlying details are smacked together, but really it is not, since there is too much noise included. Coding massive lists of instructions exactly as a massive list is the ‘brute force’ way of building stuff. It works for small programs but goes bad quickly because of collisions as the code base grows.

Reader’s Questions

When someone reads your code, they will have lots of questions. Some things will be obvious, some can be guessed at from the context, but some things are just a mystery. Code doesn’t want or need mysteries, so it is quite possible for the programmer to nicely answer these questions. Comments, comment blocks, naming, and packaging all help to resolve questions. If it’s not obvious, it should be.

Different Behaviors

In some systems, there are a lot of interdependent options that can be manipulated by the users. If that optionality scrambles the code, then it was handled badly. If it’s really an indecipherable mess, then it is fundamentally untestable, and as such is not production worthy code. The options can be handled in advance by moving them to a smaller set of more reasonable parameters, or polymorphism can be used so that the major permutations fall down into specific blocks of code. Either way, giving the users lots of choices should also not give them lots of bugs. 
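One sketch of the polymorphic route (the export formats and names here are invented): each major permutation falls into its own small class, so the option selects a block of code instead of scrambling flags through every function:

```python
import json

class CsvWriter:
    def render(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)

class JsonWriter:
    def render(self, rows):
        return json.dumps(rows)

# The user's choice picks one self-contained behavior.
WRITERS = {"csv": CsvWriter, "json": JsonWriter}

def export(rows, fmt):
    return WRITERS[fmt]().render(rows)
```

Each writer can be tested on its own, and adding a new format is a new class plus one table entry, not another flag threaded through the whole code base.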


There are probably a few more variations, but this is a good start. 

If you minimize everything in this list, the code will not only be beautiful, but also readable, with way fewer bugs, and people can keep extending it easily in the future. 

It’s worth noting that no one on the planet can type in perfect code the first time, directly from their memory. It takes work to minimize this kinda stuff, and some of the constraints conflict with each other. So, everyone’s expectation should be that you type out a rough version of the code first, fix the obvious problems, and then start gradually working that into a beautiful version. When asked for an estimate, you include the time necessary to write the initial code but also the time necessary to clean it up and test it properly. If there is some unbreakable time panic, you make sure that people know that what is going into production is ugly and only a ‘partial’ answer and still needs refining.