The Programmer's Paradox: Documentation as Blueprints

After decades of uncertainty, I think I’ve finally resolved my issues with creating software blueprints.

Obviously, having a good blueprint would prevent a lot of large disasters, but unlike buildings, software is a little too multi-dimensional. At the same time, we know good documentation is invaluable, but most documentation out there is not good. So, maybe these are the same issue?

Unused and unusable documentation is a total waste of time. Very few software development projects have any excess time.

But having no documentation is far worse; knowledge is vaporized when people leave.

Software isn’t ever a one-time thing; it’s an ongoing infrastructure built on shifting sands, and actively used systems always need more work. The job isn’t to build some code that runs for a few weeks, it's to build a system that runs for years or even decades. The code never stops changing, even if the need for changes slows down.

We can build stuff as a process of exploration and discovery, but that isn’t efficient and it heavily impairs quality. Building any code out of order is messy. If you need a large system, you need a long-term plan. A roadmap. For that, you need a reliable way to draft and extend the blueprints that keep you out of trouble.

While there isn’t one magical blueprint format, we do know the things that would have to be in there to make it useful.

We can divide the whole context into 4 levels:

Enterprise (all of the systems and silos)
High-level (the system)
Medium-level (the components)
Low-level (the details)

Each has different constraints, focus, concerns, and usage.

There are 3 driving views:

Data (user, domain, organizational, standards, configuration, secrets)
Code, split into two views:
    Code Construction (how the code will be arranged in the source)
    Code Runtime (how the code behaves when it is executed)

Each view needs to be laid out separately, clearly, and correctly.

So, the stuff we need to document at each level is:

Enterprise (organization)

Map out the different domain territories (necessary silos)
Descriptions (data, features, usage) of each major domain problem
Overall Systems Diagrams (boxes and lines), top-down recursive, each level is clean and simple.
Enterprise Standards (technology stacks, security, centralized components, monitoring)

High-level (system)

Coding Style & Conventions, external standards like UTF-8, ISO codes, etc.
Description of features, their usage, and why they help (analysis)
Interface Map (Users, Admin & Operations, GUI & CLI, all end-points)
Data Model: List out all of the data and its structures (ER Diagram <-> Schema, or the equivalent for NoSQL and other data sources such as files). Data includes domain, derived, config, interfaces, etc.
List out the components, aka boxes and lines
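The "boxes and lines" item does not have to start as a drawing. A minimal sketch, assuming a handful of made-up component names (none of them come from a real system), is to keep the map as plain data that can be printed or checked:

```python
# Hypothetical components ("boxes") and the components each one calls ("lines").
# Every name here is invented for illustration.
COMPONENTS = {
    "web_gui": ["api_server"],
    "admin_cli": ["api_server"],
    "api_server": ["database", "message_queue"],
    "report_worker": ["message_queue", "database"],
    "database": [],
    "message_queue": [],
}


def undeclared_targets(components):
    """Return every (source, target) line that points at a box nobody listed."""
    missing = []
    for source, targets in components.items():
        for target in targets:
            if target not in components:
                missing.append((source, target))
    return missing


def as_lines(components):
    """Render the map as one 'source -> target' string per connection."""
    return [
        f"{source} -> {target}"
        for source, targets in sorted(components.items())
        for target in targets
    ]


if __name__ == "__main__":
    for line in as_lines(COMPONENTS):
        print(line)
```

If the list of lines gets too long to read comfortably, that is probably a hint that the level being described needs to be split.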

Medium-level (components)

List out the code as it is running (processes)
List out the protocols, communication formats, and authentication
List out all of the sub-components
List out the paradigms, major patterns, constraints, design choices

Low-level (computation details)

Algorithms, state machines, transformations, idioms, patterns
Ugly data hacks
Hardcoded configuration, secrets, and operational identifiers
Settings and options
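For the state machines in that list, a table is often easier to review than paragraphs of text. A small sketch, assuming hypothetical order states and events chosen only for illustration:

```python
# Hypothetical states and events for an order; not taken from any real system.
TRANSITIONS = {
    ("new", "pay"): "paid",
    ("new", "cancel"): "cancelled",
    ("paid", "ship"): "shipped",
    ("paid", "refund"): "cancelled",
    ("shipped", "deliver"): "delivered",
}


def next_state(state, event):
    """Look up the next state; reject anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state!r}"
        ) from None


def run(events, start="new"):
    """Apply a sequence of events in order and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

The table is the low-level design; the two functions are just enough code to show that it can be executed as written.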

Code

Location constraints (what code belongs in what files)
Comments on why
Readable code and self-describing names
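As a rough illustration of the last two items, here is a sketch built around an invented business rule (the grace period and every name in it are assumptions, not taken from any real system). The names say what the code does; the comment says why:

```python
from datetime import date, timedelta

# Hypothetical business rule, invented for this example.
INVOICE_GRACE_PERIOD_DAYS = 30


def is_invoice_overdue(due_date: date, today: date) -> bool:
    # Why: customers are not chased until the grace period has passed,
    # so "overdue" here deliberately means more than "past the due date".
    return today > due_date + timedelta(days=INVOICE_GRACE_PERIOD_DAYS)
```

The code alone would show the 30-day offset, but not that it is intentional, which is the part a later reader cannot reconstruct.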

The point of documentation is not to impress people with excessive complexity and detail. You always need to minimize it. Massive diagrams with far too much detail are for egos, not practice. A good diagram is simple and conveys something both important and useful. A diagram is ‘great’ if people keep referring to it while they are actually working. Big repetitive text documents that scramble the details in boring paragraphs are useless too. No fluff, just exactly what is needed and nothing more. Tables and lists are preferred.

Messy diagrams may also highlight really bad disorganization. Basically, it's a disorganized mess if you cannot produce a simple diagram of it. Spaghetti architecture, design, code, and data. Spaghetti internal company structure or domain. An ugly system or messy environment impedes the ability of everyone to move forward.

How important each category is depends on the scale and size of the work, and on the order of the categories above.

No Enterprise category in a large company means a phenomenal amount of wasted work, dysfunctions, excess silos, redundancy, bugs, faults, costs, etc. Do it to prevent bad overlaps.

A high-level design should always exist in some form, but medium and low levels are commonly skipped by very experienced, senior dev teams. They’ll still figure out the medium and low designs in their heads before coding, but it just doesn’t get written down due to time constraints. Some of the design parts are mostly reconstructible from the code itself. The code always has the last word.

Low-level designs are very similar to the code, just far more readable and descriptive. They can explain tricky algorithms in better detail. They really only need to exist to help junior programmers understand the work they are doing, or for code that is extremely complex. They can be skipped for routine code that properly follows the system’s styles and conventions. If the code does not follow suit, it should be rejected.

A common mistake is to have minimal or even no analysis of the problems. That usually results in horrific scope creep.

A common mistake is to start coding before the problem is even understood. That usually results in a brute-forced, procedural spaghetti swamp of unstable code clumps and crazy hacks.

A common mistake is to lay out only the code construction at a high level and let the data structures and runtime environments evolve erratically. That might keep the initial code cleanish, but bad data hacks caused by the increasing chaos will quickly degrade it, and the operational dysfunction will further damage the code.

To keep everything organized and make the work smooth, lots of stuff needs to be written down. But if you spend time writing out stuff that no one needs and it is never read, it is a total waste of time. First, you need to know the audience. Second, you need to understand why they’ll find the information useful. Then you’ll know what to write.

Summary: there is a set of problems, and the software implements some features to solve parts of them. Those user features map to underlying functionality. Sometimes there is a GUI; sometimes there isn’t. Everything is anchored by necessary data, which may have a complex structure.

Thursday, May 18, 2023
