Thursday, September 28, 2023

Trifecta

Right from the beginning of my career, I have been bothered by the way we handle software development. As an industry, we have a huge problem with figuring out ‘who’ is responsible for ‘what’.

For decades, we’ve had endless methodologies, large and small, but all of them just seem to make poor tradeoffs between make-work and chaos. Neither is appealing.

As well, there are all sorts of other crazy processes and plenty of misconceptions floating around. Because of this, most projects are dumpster fires, which only adds to the stress, wastes energy, and ensures poor quality.

For me, whenever development has worked smoothly, it has been because of strong personalities who subverted the enforced methodology. Strong, knowledgeable leadership works well.

Whenever projects have been excessively painful, it has usually been because confusion over roles and responsibilities led to poor outcomes. Politics blossoms when the roles or rules are convoluted or vague. Focus gets misplaced, time gets wasted, and the quality plummets. It gets ugly.

It’s not that I have an answer, but after 30 years of working and 17 years of writing about it, I feel like I should at least lay down some basic principles.

So, here goes...

There are three primary areas for software. They are: a) the problem domain, b) the operational environment, and c) the development environment. Software (c) is a set of solutions (b) for some problems (a).

A system is a collection of similar solutions for a common problem domain.

There are two primary motivators for creating software: a) vertical and b) horizontal.

A vertical motivator is effectively a business-driven need for some software. Either they use it, offer it as a service, or sell it.

A horizontal motivator is an infrastructural need for some software. Missing parts of the puzzle that are disrupting either the operational or development flow.

Desired quality follows a growing exponential curve: low-quality throw-away code sits at the left, then static, hardcoded in-house development, then decent commercial products, then likely healthcare, aerospace, and NASA. Each hop to the next category is maybe 2x - 10x more work.

The actual quality is the desired level plus the sum of all testing, which is also exponential: finding the next, diminishing set of less visible bugs takes 2x - 10x more effort, and there is an endless series of bug sets. Barely reasonable commercial quality probably requires a 1:1 ratio of testing to coding.
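As a toy illustration of that arithmetic, here is the cumulative cost of the hops, using a middle-of-the-road 5x multiplier per hop (my own assumption; the text only says 2x - 10x):

```python
# Toy arithmetic for the quality curve. The 5x per-hop multiplier and the
# tier names are illustrative assumptions, not figures from the text.
tiers = ["throw-away", "in-house", "commercial", "safety-critical"]
hop_cost = 5

# Cumulative cost of reaching each tier, relative to throw-away code.
effort = {tier: hop_cost ** i for i, tier in enumerate(tiers)}

# At a 1:1 testing-to-coding ratio, commercial quality doubles again.
commercial_total = effort["commercial"] * 2
```

With these assumptions, in-house code costs 5x throw-away code, commercial costs 25x, and the matching testing effort doubles the commercial figure to 50x. The exact multipliers matter less than the shape: each tier dwarfs the one before it.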

The quality of the code itself is dependent on the design and the enforcement of good style and conventions. Messy code is buggy code. The quality of the design is dependent on the depth of the analysis. The overall results are a reflection of the understanding of the designers and coders. The problem domain is often vague and irrational but it has to be mapped to code which is precise and logical. That is a very tricky mapping.

Ultimately, while software is just instructions for a computer to run, its genesis is from, and all about, people. It is a highly social occupation. Non-trivial software takes a team to build and a team to run.

So, for every system, we end up with three main players:
  • Domain Champion
  • Operations Manager
  • Lead Software Developer

The domain champion represents all of the users. They also represent some of the funding, they are effectively paying for work to get done. They have a short-term agenda of making sure the software out there runs as expected and they are the ones that commission new features for it. They drive any non-technical analysis. They have or can get all of the answers necessary for the problem domain, which they need to understand deeply.

The operational manager is effectively the day-to-day ‘driver’ of the software. They set it up and offer it for others to use. They need to get the software installed, upgraded, and carefully monitor it. They are the front line for dealing with any issues the users encounter. They offer access as a service.

The lead developer builds stuff. It is constructive. They should focus on figuring out the stuff that needs to be built and the best way to do it, given all of the domain and operational issues. The features are usually from the top down, but to get effective construction the code needs to be built from the bottom up. The persistence foundation should exist before the GUI, for example.

For most domain functionality, the champion is effectively responsible for making sure that the features meet the needs of the users. The lead makes sure the implementation of those features is reasonable. They do this by breaking those features down into lots of different functionality to get implemented. A champion may need the system to keep track of some critical data, the lead may implement this as a set of ETL feeds and some user screens.

If there are bugs in production the users should go directly to the operations manager. If the manager is unable to resolve the issues, then they would go to the lead developer, but it would only be for bugs that are brand new. If it's recurring, the manager already knows how to deal with it. The operations manager would know if the system is slow or overusing resources. Periodically they would provide feedback to the lead.

If a project is infrastructure, cleanup, or reuse, it would be commissioned directly by the lead developer. They should be able to fund maintenance, proof of concept, new technologies, and reuse work on their own since the other parties have no reason to do so. The project will decay if someone doesn't do it.

The lead needs to constantly make the development process better, smoother, and more effective. They need to make sure the technology used is keeping up with the industry. Their primary focus is engineering, but they also need to be concerned with solution fit, and user issues like look and feel. They set the baseline for quality. If the interface is ugly or weird, it is their fault.

As well as the champion, the operations manager would have their own system requirements. They set up and are responsible for the runtime, so they have a strong say in the technologies, configuration, security, performance, resource usage, monitoring, logging, etc. All of the behavior and functionality they need to do their job. If they have lots of different systems, obviously having it consistent or aligned would be highly important to them. They would pick the OS and persistence for example, but not the programming language. The dependencies used for integration would fall under their purview.

The process for completing the analysis needed to come up with a reasonable set of features is the responsibility of the champion. Any sort of business analyst would report to them. They would craft the high-level descriptions and required features. This would be used by the lead to get a design.

If the project is infrastructure, instead of the champion, it is the responsibility of the lead to do the analysis. Generally, the work is technical or about organization, although it could be reliant on generalities within the problem domain. The work might be combining a bunch of redundant software engines together, to get reuse, for example.

Any sort of technical design is the lead, and if the organization is large, they likely need to coordinate the scope and designs with the firm’s architects and security officers. As well, the operational requirements would need to be followed. A design for an integrated system is not an independent silo, it has to fit with all of the other existing systems.

Architects would also be responsible for keeping the higher level organized. So, they wouldn’t allow the lead or champion to poach work from other teams.

The process of building stuff is up to the lead. They need to do it any which way, and in any order, that best suits them and their teams. They should feel comfortable with the processes they are using to turn analysis into deployment.

They do need to give time estimates, and if they miss them, detailed explanations of why they missed them. Leads need to learn to control the expectations of the champion and the users. They can’t promise two years of work in six months, for example. If development goes poorly or the system is unusable they are on the hook for it.

There should be a separate quality assurance department that would take the requirements from the champions, leads, and operations managers, and ensure that the things being delivered meet those specifications. They would also do performance and automated testing. With the specs and the delivery items, they would return a report on all of the deficiencies to all three parties. The lead and champion would then decide which issues to fix. Time and expected quality would drive those decisions.

The items that were tested in QA are the items that are given directly to operations to install or upgrade. There are two release processes: the full one and the fast one. The operations manager schedules installations and patches at their own convenience and notifies the users when they are completed. The lead just queues up the almost-finished work for QA.

The lead has minimal interaction with operations. They might get pulled into net new bug issues, they get requirements for how the software should operate, and they may occasionally, with really tricky bugs, have to get direct special access to production in order to resolve problems. But they don’t monitor the system, and they aren’t the frontline for any issues. They need to focus on designing and building stuff.

The proportion of funding for the champion and for the lead defines the control of technical debt. If the system is unstable or development is slow, more funding needs to go into cleanup. The champion controls the priority of feature development, and the lead controls the priority of the underlying functionality. That may mean that a highly desired feature gets delayed until missing low-level functionality is ready. Building code out of order is expensive and hurts quality.

So that’s it. It’s a combination of the best of all of the processes, methodologies, writing, books, arguments, and discussions that I’ve seen over the decades and in the companies that I have worked for directly or indirectly. It offsets some of the growing chaos that I’ve seen and puts back some of the forgotten knowledge.

All you need, really, is three people leading the work in sync with each other in well-defined roles. There are plenty of variations. For example in pure software companies, there is a separate operations manager at each client. In some cases, the domain champion and lead are the same person, particularly when the domain is technical. So, as long as the basic structure is clear, the exact arrangement can be tweaked. Sometimes there are conflicting, overlapping champions pulling in different directions.

Thursday, September 21, 2023

Historic Artifacts with Data

Software development communities have a lot of weird historical noise when it comes to data. I guess we’ve been trying so hard to ignore it, that we’ve made a total mess of it.

So, let’s try again:

A datam is a fixed set of bits. The size is set; it does not change. We’ll call this a primitive datam.

We often need collections of datam. In the early days, the minimum and maximum sizes of a collection were fixed. Later we accepted that a collection could be huge, but keep in mind that it is never infinite. There is always a fixed limit, it is just that it might not be easily computable in advance.

Collections can have dimensions and they can have structure. A matrix is 2 dimensions, a tree is a structure.

All of the data we can collect in our physical universe fits into this. It can be a single value, a list or set of them, a tree, a directed acyclic graph, a full graph, or even possibly a hypergraph. That covers it all.
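Those shapes can all be sketched with plain containers; a minimal illustration (the names and values are my own, not from the text):

```python
# One example per shape, using ordinary Python containers.
value = 42                                  # a single primitive datam
sequence = [3, 1, 4, 1, 5]                  # a strictly ordered list
matrix = [[1, 2], [3, 4]]                   # a 2-dimensional collection
tree = {"root": {"left": {}, "right": {}}}  # structure by nesting
# A directed graph as an adjacency mapping; keeping it acyclic makes it a DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```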

I suppose some data out there could need a 14-dimension hypergraph to correctly represent it. I’m not sure what that would look like, and I’m probably not going to encounter that data while doing application programming.

Some of the confusion comes from things that we’ve faked. Strings are a great example of this. If you were paying attention, a character is a primitive datam. A string is an arbitrary list of characters. That list is strictly ordered. The size of a string is mostly variable, but there are plenty of locations and technologies where you need to set a max.

So, a string is just a container full of characters. Doing something like putting double quotes around it is a syntactic trick to use a special character to denote the start of the container, and the same special character to denote the end. Denoting the start and end is changing the state of the interpretation of those characters. That is, you need a way of knowing that a bunch of sequential datam should stay together. You could put in some sort of type identifier and a size, or you could use a separator and an implicit end which is usually something like EOF or EOS. Or you can just mark out the start and end, as we see commonly in strings.
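The three marker strategies above, a type/size prefix, a separator with an implicit end, and matched start/end marks, can be sketched as follows (the helper names are hypothetical):

```python
# Three ways of telling a reader that a bunch of sequential datam stay together.

def length_prefixed(s: str) -> str:
    return f"{len(s)}:{s}"        # netstring-like: declare the size up front

def separated(items: list[str]) -> str:
    return ",".join(items)        # separator between items; the end is implicit (EOS)

def quoted(s: str) -> str:
    return '"' + s + '"'          # the same special character marks start and end

print(length_prefixed("hello"))   # 5:hello
print(separated(["a", "b", "c"])) # a,b,c
print(quoted("hello"))            # "hello"
```

All three encode the same structure; they differ only in which meta-data they embed and how ambiguous it becomes when the payload can itself contain the marker characters.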

Any which way, you add structure on top of a sequence of characters, but people incorrectly think a string is itself a primitive datam. It is not. It is actually a secret collection.

The structural markers embedded in the data are data themselves. Given a data format, there can be a lot of them. They effectively are meta-data that tells one how to collect together and identify the intervening data. They can be ambiguous, noisy, unbalanced, and a whole lot of other issues. They sometimes look redundant, but you could figure out an exact minimum for them to properly encode the structure. But properly encoding one structure is not the same as properly encoding all structures. The more general you make things, the more permutations you have to distinguish in the meta-data, thus the noisier it will seem to get.

Given all that pedantry, you could figure out the minimum necessary size of the meta-data with respect to all of the contexts it will be used for. Then you can look at any format and see if it is excessive.

Then the only thing left to do is balance out the subjectiveness of the representation of the markers.

If you publish that, and you explicitly define the contexts, then you have a format whose expressibility is understood and is as nice as possible with respect to it, and then what’s left is just socializing the subjective syntax choices.

In that sense, you are just left with the primitive datam and containers. If you view data that way, it gets a whole lot simpler. You are collecting together all of these different primitives and containers, and you need to ensure that any structure you use to hold them matches closely to the reality of their existence. We call that a model, and the full set of permutations the model can hold is its expressiveness. If you want the best possible representation then you need to tighten down that model as much as possible, without constricting it so much that legitimate data can’t be represented.
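One way to see “tightening the model” in code is to restrict a type’s expressiveness so that illegitimate data cannot be represented. A minimal sketch, assuming a hypothetical Account record with a closed set of states:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):              # a closed set of legal states, not a free string
    ACTIVE = "active"
    CLOSED = "closed"

@dataclass(frozen=True)
class Account:
    number: str
    status: Status

    def __post_init__(self):
        # Reject anything outside the model's expressiveness at the door.
        if not isinstance(self.status, Status):
            raise ValueError(f"illegal status: {self.status!r}")

acct = Account("12345", Status.ACTIVE)   # legitimate data is representable
# Account("12345", "misc") raises ValueError; garbage can't get in,
# whereas a loose dict like {"status": "misc"} would silently admit it.
```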

From time to time, after you have collected it, you may want to move the data around. But be careful: only one instance of that data should be modifiable. Merging structure reliably is impossible. The effort to get a good model is wasted if the data is then just haphazardly spread everywhere.

Data is the foundation of every system. Modeling properly can be complex, but the data itself doesn’t have to be the problem. The code built on top is only as good as the data below it. Good code with bad data is useless. Bad code with good data is fixable.

Thursday, September 14, 2023

Modelling Complex Data

Usually, the point of building a software system is to collect data. The system is only as good as the data that it persists for a long time.

Machines go up and down, so data is really only persisted when it is stored in a long-term device like a hard disk.

What you have in memory is transitory. It may stay around for a while, it may not. A system that collects stuff, but accidentally forgets about it sometimes, is useless. You can not trust it, and trust is a fundamental requirement for every piece of software, large and small.

But even if you manage to store a massive amount of data, if it is a chaotic mess it is also useless. You store data so you can use it later; if that isn’t possible, then you haven’t really stored it.

So, it is extremely important that the data you store is organized. Organization is the means of retrieving it.

A long, long time ago everybody rolled their own persistence. It was a disaster. Then relational databases were discovered and they dominated. They work incredibly well, but they are somewhat awkward to use and you need to learn a lot of stuff in order to use them properly. Still, they gave us decades of reliability.

NoSQL came along as an alternative, but to get the most out of the tech people still had to understand concepts like relational algebra and normalization. They didn’t want to, so things returned to the bad old days where people effectively rolled their own messes.

The problem isn’t the technology, it is the fact that data needs to be organized to be useful. Some new shiny tech that promises to make that issue go away is lying to you. You can’t just toss the data somewhere and figure it out later. All of those promises over the decades ended in tears.

Realistically, you cannot avoid having at least one person on every team that understands how to model persistent data. More is obviously better. Like most things in IT, from the outside, it may seem simple but it is steeped in difficulties.

The first fundamental point is that any sort of redundant data is bad. Computers are stupid and merging is mostly unreliable, so it’s not about saving disk space, but rather the integrity, aka quality, of the data. The best systems store everything only once; then the code is simpler and the overall quality of the system is always higher.

The second fundamental issue is that you want to utilize the capabilities of the computer to keep you from storing garbage. That is, the tighter your model matches the real world, the less likely it is to be choked with garbage.
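Both fundamentals can be sketched with sqlite3 from the Python standard library; the schema is illustrative, not from the text. Each fact is stored exactly once, and the database itself refuses garbage:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # sqlite enforces FKs only when asked

# The customer's name lives in one place; invoices only reference it.
db.execute("""CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE)""")
db.execute("""CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      INTEGER NOT NULL CHECK (amount > 0))""")

db.execute("INSERT INTO customer (name) VALUES ('Acme')")
db.execute("INSERT INTO invoice (customer_id, amount) VALUES (1, 100)")

# Garbage is rejected by the model itself, not by application code:
try:
    db.execute("INSERT INTO invoice (customer_id, amount) VALUES (99, 100)")
except sqlite3.IntegrityError:
    pass  # no such customer; the inconsistency never gets stored
```

Note that duplicating the customer’s name onto each invoice is structurally impossible here, which is exactly the redundancy point: there is nothing to merge later because there is only one copy.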

The problem is that that means pedantically figuring out the breadth and depth of each and every thing you need to store. It is a huge part of analysis, a specialized skill all on its own. Most people are too impatient to do this, so they end up paying the price.

To figure out a model that fits correctly to the problem domain means actually having to understand a lot of the problem domain. Many programmers are already so overwhelmed by the technological issues that they don’t want to poke into the domain ones too. Unfortunately, you have no choice. Coders code what they know, and if they are clueless as to what the users are doing, their code will reflect that. But also, coders with domain expertise are far more valuable than generic coders, so there's a huge career upside to learning what the users are doing with their software.

If you avoid redundant data and you utilize the underlying technology to its best abilities to ensure that the data you need fits tightly then it’s a strong foundation to build on top of.

If you don’t have this, then all of those little modeling flaws will percolate through the code, which causes it to converge rapidly on spaghetti. That is, the best code in the world will still be awful if the underlying persisted data is awful. It will be awful because either it lets the bad data through, or it goes to insane lengths to not let the bad data through. You lose either way. A crumbly foundation is an immediate failure out of the gate.

Time spent modeling the data ends up saving all of the time that would otherwise be wasted hacking away at questionable fixes in the code. The code is better.

Saturday, September 9, 2023

The Groove

A good development project is smooth. Not fun, not exciting, but smooth.

You figure out what you need to build. Spending lots of time here helps.

Then you sit down to code it. There are always a whole lot of things, some obvious, and some not, that the code needs to do in order for it to work as expected.

If you are curious, you’ll find out lots of interesting things about the problem domain.

If you are rational, you start coding at the bottom. Get the data, add it to persistence, and make it available to the code above. You work your way slowly upwards.

If you are efficient, the code has a lot of reuse. So, instead of having to add lots of stuff again, you are mostly just extending it to do more.

Toward the end, you’ll deal with the interfaces. The API, CLI, GUI, and NUI ones. Wiring them up nicely is the last thing you do because if you do it too early, they will keep changing.

If there is a time crunch, then you’ll tighten the focus down to the things that absolutely must be there. It is not nice, but sometimes you have no choice.

Before release, you go through some form of extensive QA. When you find a problem, you fix it as low as possible. You iterate through as many cycles as you need to get the quality you desire.

The first thing you do after the release is go back and clean up the mess you made in order to get the release out. You do this before adding more features and functionality.

If you follow this pattern, you get into a rhythm. The code grows rapidly, it evolves to satisfy more and more of the user’s needs. If it's good, the users may even like using it. As time goes on, you build up larger and larger machinery, to make future coding tasks even easier. The work should get smoother over time. It’s a good sign.