Thursday, May 25, 2023

Nothing beats Experience

If you’ve been doing something for a long, long time, you’ve built up a lot of experience over the decades.

Experience doesn’t always mean deep knowledge, so you may not know why you should do certain things, but you will know the consequences of doing them in different ways. You might not see underlying connections, or how to extend what you are doing to other disciplines. Practice is higher and lighter than theory. Theory is important, but more for enhancing the art than just getting things done.

Sometimes people believe that shifts in software industry trends invalidate experience, but more often those shifts are just shallow setbacks. Change is inevitable, but only real improvements matter, and they are rare. It’s too popular these days to make change for the sake of change, and most of it is meaningless. Poor changes are always shallow; that is usually why they aren’t improvements, just noise. If you see enough of them, you see the pattern.

There are people in other disciplines, often hired to improve any kind of work, who specialize in rearranging things. Sometimes they have good insights, but more often their naivety leads them astray. From the outside, everything looks simpler than it really is, so tweaking how it happens isn’t about creativity, but rather deep analysis. You can’t do this from an external perspective, or a theoretical one; it has to come from raw practical experience. You have to find the people who understand how to get it done and leverage their experience to make viable improvements. Otherwise, you are just taking wild guesses.

As for learning, doing it yourself, over and over again, is a good way to begin. But it is far better to spend some time working with someone who already has great experience. Their talent transfers somewhat. If people like that aren’t available, then formal courses can be deep and useful. Lighter discussions can highlight interesting points, since most similar things are connected. Contradictions are vital in unraveling misinformation. Pretty much it is always: learn a bit, test it out, then go back for more depth. Knowledge of complexity tends to evolve erratically. Again, that is why working with someone who is good at what they do is best; they’ve already been lost earlier and found their way out.

If you want something big to get done, it starts first with assembling enough experience to get it done. Without that assemblage of talent, it will take a ridiculous amount of time to reinvent and relearn. Time that probably isn’t available.

Thursday, May 18, 2023

Documentation as Blueprints

After decades of uncertainty, I think I’ve finally resolved my issues with creating software blueprints. 

Obviously, having a good blueprint would prevent a lot of large disasters, but unlike buildings, software is a little too multi-dimensional. At the same time, we know good documentation is invaluable, but most documentation out there is not good. So, maybe these are the same issue?

Unused and unusable documentation is a total waste of time. Very few software development projects have any excess time.

But having no documentation is far worse; knowledge is vaporized when people leave.

Software isn’t ever a one-time thing; it’s ongoing infrastructure built on shifting sands, and actively used systems always need more work. The job isn’t to build some code that runs for a few weeks; it’s to build a system that runs for years or even decades. The code never stops changing, even if the need for changes slows down.

We can build stuff as a process of exploration and discovery, but that isn’t efficient and it heavily impairs quality. Building any code out of order is messy. If you need a large system, you need a long-term plan. A roadmap. For that, you need a reliable way to draft and extend the blueprints that keep you out of trouble.

While there isn’t one magical blueprint format, we do know the things that would have to be in there to make it useful.

We can divide the whole context into 4 levels:
    1. Enterprise (all of the systems and silos)
    2. High-level (the system)
    3. Medium-level (the components)
    4. Low-level (the details)
Each has different constraints, focus, concerns, and usage.

There are 3 driving views:
    • Data (user, domain, organizational, standards, configuration, secrets)
    • Code
      • Code Construction (how the code will be arranged in the source)
      • Code Runtime (how the code behaves when it is executed)

Each view needs to be laid out separately, clearly, and correctly.
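The levels and views above can be pictured as a simple skeleton. This is only an illustrative sketch: the level and view names come from the lists in the text, but the class and field names are assumptions, not a fixed format.

```python
# A minimal sketch of the blueprint skeleton described above.
# The level and view names come from the text; everything else
# (class name, fields, example entries) is illustrative.
from dataclasses import dataclass, field

@dataclass
class Blueprint:
    level: str  # "enterprise", "high", "medium", or "low"
    data: list[str] = field(default_factory=list)               # data view
    code_construction: list[str] = field(default_factory=list)  # source layout view
    code_runtime: list[str] = field(default_factory=list)       # runtime behavior view

# A high-level (system) blueprint, populated from the bullets below.
system = Blueprint(
    level="high",
    data=["ER diagram <-> schema", "config", "interfaces"],
    code_construction=["component list (boxes and lines)"],
    code_runtime=["interface map: GUI, CLI, all endpoints"],
)
```

Keeping each view in its own slot is the point: they get laid out separately instead of being scrambled together in prose.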

So, the stuff we need to document at each level is:


Enterprise (organization)

  • Map out the different domain territories (necessary silos)
  • Descriptions (data, features, usage) of each major domain problem
  • Overall Systems Diagrams (boxes and lines), top-down recursive, each level is clean and simple.
  • Enterprise Standards (technology stacks, security, centralized components, monitoring)


High-level (system)

  • Coding Style & Conventions, plus external standards like UTF-8, ISO codes, etc.
  • Description of features, their usage, and why they help (analysis)
  • Interface Map (Users, Admin & Operations, GUI & CLI, all end-points)
  • Data Model: List out all of the data and its structures. (ER Diagram <-> Schema, or equiv for NoSQL and other data sources such as files) (data includes domain, derived, config, interfaces, etc.)
  • List out the components, aka boxes and lines


Medium-level (components)

  • List out the code as it is running (processes)
  • List out the protocols, communication formats, and authentication
  • List out all of the sub-components
  • List out the paradigms, major patterns, constraints, design choices


Low-level (computation details)

  • Algorithms, state machines, transformations, idioms, patterns
  • Ugly data hacks
  • Hardcoded configuration, secrets, and operational identifiers
  • Settings and options


Code

  • Location constraints (what code belongs in what files)
  • Comments on why
  • Readable code and self-describing names
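To make the last two bullets concrete, here is a small hypothetical snippet; the session-timeout domain and the 30-minute threshold are made-up examples, not from the text. The names describe what, and the comments explain why:

```python
# Self-describing names say *what*; comments record *why*.
# The domain and the threshold are illustrative assumptions.

STALE_SESSION_SECONDS = 30 * 60  # policy: sessions idle > 30 min must re-auth

def is_session_stale(idle_seconds: int) -> bool:
    # We compare idle time, not absolute session age, because
    # long-running background jobs legitimately keep a session
    # alive for hours.
    return idle_seconds > STALE_SESSION_SECONDS
```

The code itself answers "what is happening"; only the why needs commentary.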

The point of documentation is not to impress people with excessive complexity and detail. You always need to minimize it. Massive diagrams with far too much detail are for egos, not practice. A good diagram is simple and conveys something both important and useful. A diagram is ‘great’ if people keep referring to it while they are actually working. Big repetitive text documents that scramble the details into boring paragraphs are useless too. No fluff, just exactly what is needed and nothing more. Tables and lists are preferred.

Messy diagrams may also highlight really bad disorganization. Basically, it's a disorganized mess if you cannot produce a simple diagram of it. Spaghetti architecture, design, code, and data. Spaghetti internal company structure or domain. An ugly system or messy environment impedes the ability of everyone to move forward.

The degree of importance is relative to scale, size, and the order of the categories.

No Enterprise category in a large company means a phenomenal amount of wasted work, dysfunctions, excess silos, redundancy, bugs, faults, costs, etc. Do it to prevent bad overlaps.

A high-level design should always exist in some form, but medium and low levels are commonly skipped by very experienced, senior dev teams. They’ll still figure out the medium and low designs in their head before coding, but it just doesn’t get written down due to time constraints. Some of the design parts are mostly reconstructible from the code itself. The code always has the last word.

Low-level designs are very similar to the code, just far more readable and descriptive. They can explain tricky algorithms in better detail. They really only need to exist to help junior programmers understand the work they are doing, or for code that is extremely complex. They can be skipped for routine code that properly follows the system’s styles and conventions. If the code does not follow suit, it should be rejected.
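As a sketch of what that looks like, a low-level design for a small tricky routine can simply be the code with the reasoning spelled out in place; the routine below is a made-up example, not from the text.

```python
def round_up_to_power_of_two(x: int) -> int:
    # Low-level design, written out: subtract 1 so that exact powers
    # of two map to themselves, smear the highest set bit into every
    # lower bit position, then add 1 to land on the next power of two.
    if x <= 1:
        return 1
    x -= 1
    shift = 1
    while shift < x.bit_length():
        x |= x >> shift  # copy the high bits downwards
        shift *= 2
    return x + 1
```

Without the comments, the bit manipulation is exactly the kind of code a junior programmer would struggle to reconstruct the intent of from scratch.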

A common mistake is to do minimal or even no analysis of the problems. That usually results in horrific scope creep.

A common mistake is to start coding before the problem is even understood. That usually results in a brute-forced, procedural spaghetti swamp of unstable code clumps and crazy hacks.

A common mistake is to lay out only the code construction at a high level and let the data structures and runtime environments evolve erratically. That might keep the initial code cleanish, but bad data hacks caused by the increasing chaos will quickly degrade it, and the operational dysfunction will further damage the code.

To keep everything organized and make the work smooth, lots of stuff needs to be written down. But if you spend time writing out stuff that no one needs and it is never read, it is a total waste of time. First, you need to know the audience and second, you need to understand why they’ll find the information useful, then you’ll know what to write.

Summary: there is a set of problems, the software implements some features to solve parts of it. Those user features map to underlying functionality. Sometimes there is a GUI; sometimes there isn’t. Everything is anchored by necessary data, which may have a complex structure.

Thursday, May 11, 2023

Engineering over Process

For half a century people have been complaining about their large expensive software projects exploding.

A couple of decades ago, a few people attributed these failures to a category of methodologies, known as Waterfall, which was being used by many of these failed development efforts. At least from all my experiences, that is incorrect.

The root problem is often a misfocus.

The people funding the work want to make sure that the work they are paying for is getting done. That is understandable. But their means of making sure this is done is most frequently control and tracking. That is where the problems begin.

Obviously measuring something is the first step in being able to improve it. You need tangible information about what is happening. A bunch of metrics.

But not all measures are created equal. If you try to measure really easy things that are only indirectly related to the underlying work, those measurements are questionable. They may tell you what you want to know about the work, but they may not.

If, for anyone doing the actual work, meeting a measurement goal becomes more important than the work itself, they will shift focus. They will concentrate on improving the metric at the expense of the work they are doing. That is, the act of measuring is actually distorting the work.

People want to be successful, so if the work they do isn’t seen as important, then the quality of that work will degrade to its lowest viable level. That allows them more time to focus on the measure. After all, no one seems to care about the quality, anyways.

The great classic example was counting lines of code for programmers, often known as LOC.

Way back, some clever people started tracking the LOC numbers for all of their programmers. As the programmers figured this out, they switched their coding habits to produce a lot more code, which is a rather obvious consequence.

The problem is that quantity is not quality. You don’t need millions of lines of code if a hundred thousand would work just as well. In fact, having a small, tight codebase is always way better. It is less work to create, less testing, fewer bugs, etc. Millions of lines of low-quality code is pretty much an epic disaster. It is hard to wrangle, it’s brutal to extend, and you need a lot of people to keep going over it, line by line, in order to ensure that it even works. Thus, quality matters far more than quantity in programming.
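A tiny illustration of why the metric fails: both functions below do the same job, but the padded version inflates the line count several times over while adding nothing. The functions themselves are hypothetical examples.

```python
# Two equivalent implementations; only the line count differs.

def sum_of_squares(numbers):
    return sum(n * n for n in numbers)

def sum_of_squares_padded(numbers):
    # Same behavior, written to maximize a lines-of-code metric.
    total = 0
    for n in numbers:
        square = n * n
        total = total + square
    return total
```

Any metric that rewards the second version over the first is measuring effort that never reaches the user.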

Tracking LOC led to a lot of disasters, some well-known, most quietly buried, but eventually people figured out that it was the worst possible metric you could use from a top-down level. It was easily gamed and accidentally pushed the work in all the wrong directions.

What the LOC debacle does show us is that quality really does matter. If you have a small well-written code base, you can use it for a lot of things. You can fix it easily. You can extend it as your needs grow. If you have a large, messy, brute-force codebase, it is inherently unstable, hard to manage, and clogs up the solution space with a sunk cost; the code already exists and it is large, so it is difficult to make a rational decision to fix or redo it.

We can see that another way. If the choice is between doing a good job in engineering the code or keeping very close track of how much code is getting produced, then picking quality over quantity is obviously better.

But as that simple trade-off percolates upwards, it really does morph into an organizational choice between focusing on engineering or adhering to a process. That connection isn’t quite straightforward though.

If you let a group of programmers run free in their coding, you may get a wonderful system. Or you may not. It might just be a giant ball of mud that is totally undocumented. So, obviously, you don’t want that. You need some kind of process.

But there seems to be a misunderstanding that any sort of process enforces organization. That is, if you force programmers to document stuff, for example, it is believed that the act of doing that will ensure that the work is better. But that’s another false assumption that is similar to the LOC mistake. Either the documentation is actually good, but that energy is no longer available for the coding, or the documentation is basically thrown together and is useless. You lose either way.

The problem here is that organization isn’t actually a side-effect of the process. You can have a strong process and the underlying work can still be disorganized or neglected.

So, what does keeping everything organized mean?

As far as building goes, getting organized is the very first step in design. You can’t produce a comprehensive design unless you get all of the details organized first. And you collect all of the details from the analysis, where they may be nicely organized, but that doesn’t mean the design or the code will follow suit.

We know this because of the earlier scenario about letting the programmers run free. When that fails, it is frequently because they skipped design and just started coding. They whacked out lots of little code, but it all falls apart when they try to bring it together or extend it. So “ball of mud” and “spaghetti” are just euphemisms for ‘disorganized’. Either it is a big disorganized clump of junk, or the logic wobbles so hard you can’t figure out what it is doing. This is natural: when coding, you really are too busy coding to worry about being organized, so getting organized always has to come first or it will never get done. When the work is small, these types of deficiencies are hardly noticed. But as it grows, the disorganization grows faster. Every new change to the system is more brutal than the last one.

Oddly, creating a good design is an aspect of engineering. Part of being well-engineered is that it is working really well, but the other part is that it is nicely laid out. If it’s ugly or a mess, it isn’t well-engineered. If it is slow or unstable, it also isn’t well-engineered.

And so we’ve come full circle. If the process is the most important thing, people will do a good job there and ignore the construction issues. That will go badly. If engineering is more important, they will labor over the design, make sure they have optimized the code, handled all of the errors, and used the computers as efficiently as possible. They will care if the solution they built really fits the problem the users are having. They will care because everyone else cares.

Or basically, if you want even minimal quality, let alone better, you have to explicitly design it into the way you work.

Thursday, May 4, 2023

Complexity Blowout

Despite good intentions, sometimes the underlying complexity of a large problem, either domain or technical, drastically exceeds the abilities of the people involved. It’s just too complex for them to deal with.

What happens then is that everybody retreats into tunnel vision. They all clamp down on their own expected context, but in different ways. The odds of any of these sub-contexts being viable are extremely low; there will usually be some variability that was deliberately ignored that should not have been. The idea may work in the sub-context but will fail miserably in the real, larger context.

So, what you see in practice is that everybody starts to argue with each other. That is a primary symptom that may indicate that we’ve hit a complexity blowout.

People get quite angry and frustrated. Everyone can see that the other suggested contexts will not work as expected, but each feels that their own context is correct. Obviously, it is unlikely that any of the suggested sub-contexts is sufficient, so it is basically an impossible problem at this point.

Really the only solution is to find somebody who can “see it all” and take direction from them. But people always misestimate their own abilities, as well as others’, so it would first involve admitting that they have no idea how to properly solve the issue. That is a hard sell. Then they would have to find some way of verifying someone else’s understanding. We don’t normally see this type of introspection in software development circles. We’re taught to fake it until we make it.

What happens instead is that one or more bad paths eventually get chosen. And as they fail, the people responsible for the failures leave. Given the severity of this type of problem, this failure state can persist for decades. The various attempts to solve things fail, the arguments continue, and the people fracture. It’s politics as usual.

It’s a rather classic and common version of biting off more than you can chew.

If it has gone into this impossible state and you don’t have full and total authority over it, there is really nothing you can do about it. It is systemic at that point. Even if you think you know how to solve it, you are probably wrong too, so you’d just be another incorrect alternative in the endless debate.

If you did know how to solve it, you would also know enough to suspect that you didn’t know how to solve it. That is, the more confident you are in your solution, the less confident you should be in your solution, which is obviously a paradox.

The best you can do is go around to everyone else, see what they propose, understand their flaws, and see if they are also present in what you are proposing. But a lack of flaws does not imply full understanding or correctness. Just that the now obvious reasons for not proceeding are not there.

So, oddly, the best choices are the ones that are crystal clear but not pushed heavily, once we accept that overconfidence is often driven by insecurity.

As for actually solving the problem in a reliable way, that usually doesn’t happen. Rather, it is just an endless parade of broken solutions and turnover. At best, some of the solutions will kinda work, but even then their lifespans will be prematurely cut short.