Thursday, May 11, 2023

Engineering over Process

For half a century, people have been complaining about their large, expensive software projects exploding.

A couple of decades ago, a few people attributed these failures to a category of methodologies, known as Waterfall, which many of these failed development efforts were using. At least in my experience, that attribution is incorrect.

The root problem is often a misfocus.

The people funding the work want to make sure that the work they are paying for is getting done. That is understandable. But their means of making sure this is done is most frequently control and tracking. That is where the problems begin.

Obviously measuring something is the first step in being able to improve it. You need tangible information about what is happening. A bunch of metrics.

But not all measures are created equal. If you try to measure really easy things that are only indirectly related to the underlying work, those measurements are questionable. They may tell you what you want to know about the work, but they may not.

If for any person doing the actual work, meeting a measurement goal becomes more important than the work itself, they will shift focus. They will concentrate on improving the metric at the expense of the work they are doing. That is, the act of measuring itself is actually distorting the work.

People want to be successful, so if the work they do isn’t seen as important, then the quality of that work will degrade to its lowest viable level. That allows them more time to focus on the measure. After all, no one seems to care about the quality anyway.

The great classic example was counting lines of code for programmers, often known as LOC.

Way back, some clever people started tracking the LOC numbers for all of their programmers. As the programmers figured this out, they switched their coding habits to produce a lot more code, which is a rather obvious consequence.

The problem is that quantity is not quality. You don’t need millions of lines of code if a hundred thousand would work just as well. In fact, a small, tight codebase is nearly always better. It is less work to create, needs less testing, has fewer bugs, etc. Millions of lines of low-quality code are pretty much an epic disaster. They are hard to wrangle, brutal to extend, and you need a lot of people to keep going over them, line by line, in order to ensure that the code even works. Thus, quality matters far more than quantity in programming.
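To make the gaming concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two snippets, the `count_loc` metric, and the `load` helper are hypothetical, not taken from any real tracking tool. It shows two versions of the same function that behave identically, where a naive line count scores the padded one five times higher.

```python
# Two hypothetical snippets that compute the same thing.
PADDED = '''
def total(values):
    total = 0
    index = 0
    count = len(values)
    while index < count:
        current = values[index]
        total = total + current
        index = index + 1
    result = total
    return result
'''

TIGHT = '''
def total(values):
    return sum(values)
'''


def count_loc(source: str) -> int:
    """Naive LOC metric: count non-blank lines that are not pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def load(source: str):
    """Execute a snippet and return the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


padded, tight = load(PADDED), load(TIGHT)
data = [3, 1, 4, 1, 5]

# Identical behavior...
assert padded(data) == tight(data) == 14

# ...but the metric rewards the padded version.
print(count_loc(PADDED), count_loc(TIGHT))  # prints: 10 2
```

If the only thing being tracked is the number on the left, the padded version looks like five times the productivity, even though it is more code to read, test, and maintain for exactly the same result.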

Tracking LOC led to a lot of disasters, some well-known, most quietly buried. Eventually, people figured out that it was the worst possible metric you could use from a top-down level: easily gamed, and accidentally pushing the work in all the wrong directions.

What the LOC debacle does show us is that quality really does matter. If you have a small, well-written codebase, you can use it for a lot of things. You can fix it easily. You can extend it as your needs grow. If you have a large, messy, brute-force codebase, it is inherently unstable, hard to manage, and clogs up the solution space with a sunk cost; the code already exists and it is large, so it is difficult to make a rational decision to fix or redo it.

We can see that another way. If the choice is between doing a good job in engineering the code or keeping very close track of how much code is getting produced, then picking quality over quantity is obviously better.

But as that simple trade-off percolates upwards, it really does morph into an organizational choice between focusing on engineering or adhering to a process. That connection isn’t quite straightforward though.

If you let a group of programmers run free in their coding, you may get a wonderful system. Or you may not. It might just be a giant ball of mud that is totally undocumented. So, obviously, you don’t want that. You need some kind of process.

But there seems to be a misunderstanding that any sort of process enforces organization. That is, if you force programmers to document stuff, for example, it is believed that the act of doing that will ensure that the work is better. But that’s another false assumption that is similar to the LOC mistake. Either the documentation is actually good, but that energy is no longer available for the coding, or the documentation is basically thrown together and is useless. You lose either way.

The problem here is that organization isn’t actually a side-effect of the process. You can have a strong process and the underlying work can still be disorganized or neglected.

So, what does keeping everything organized mean?

As far as building goes, getting organized is the very first step in design. You can’t produce a comprehensive design unless you get all of the details organized first. And you collect all of the details from the analysis, where they may be nicely organized, but that doesn’t mean the design or the code will follow suit.

We know this because of the earlier scenario about letting the programmers run free. When that fails, it is frequently because they skipped design and just started coding. They whacked out lots of little pieces of code, but it all falls apart when they try to bring them together or extend them. So “ball of mud” and “spaghetti” are just colorful names for ‘disorganized’. Either it is a big disorganized clump of junk, or the logic wobbles so hard you can’t figure out what it is doing. This is natural: while coding, you really are too busy coding to worry about being organized, so organization always has to come first or it will never get done. When the work is small, these types of deficiencies are hardly noticed. But as it grows, the disorganization grows faster. Every new change to the system is more brutal than the last one.

Oddly, creating a good design is an aspect of engineering. Part of being well-engineered is that it is working really well, but the other part is that it is nicely laid out. If it’s ugly or a mess, it isn’t well-engineered. If it is slow or unstable, it also isn’t well-engineered.

And so we’ve come full circle. If the process is the most important thing, people will do a good job there and ignore the construction issues. That will go badly. If engineering is more important, they will labor over the design, make sure they have optimized the code, handled all of the errors, and used the computers as efficiently as possible. They will care if the solution they built really fits the problem the users are having. They will care because everyone else cares.

Or basically, if you want at least a minimal level of quality, you have to explicitly design it into the way you work.
