Saturday, January 30, 2021

Newbie Mistakes

Mistakes:

  • Change the “least” amount of stuff.


A huge mistake. If you make a change to something, you have to make it consistently, everywhere. Otherwise, in the not-too-distant future, you will have to make that change again in one of the other locations. The problem is that until then, you have to remember that you still need to make it. Remembering one or two pending changes isn’t taxing, but if you keep doing this, and there are hundreds of things you were supposed to remember, then quite obviously a whole bunch of them will be forgotten. However, if you make the change everywhere, you can forget about it. It will not crop up in the future as a problem. Way, way fewer things to remember.


  • Only release a “part” of the system.


Another tragic mistake. Unless all of the parts of the system are ‘completely’ independent of each other, those dependencies will quickly build up into a big problem. Again, keeping track of which parts were changed and when is far too taxing, so it will eventually be forgotten, causing preventable problems. It’s far safer to release ‘one’ system than it is to release 20+ subparts.


  • The names of things don’t matter.


This is a rather deep problem. Obviously, you don’t want to be working on a system that has a variable called ‘foo’, only to find out that there are no ‘foos’ in that variable, but that it is instead full of ‘bars’. That is a great recipe for wasting time. Pretty much all ‘usable’ code will eventually be worked on by a lot of other programmers, so if the name is incorrect or weird and you waste their time, they have a right to be angry about it. We don’t just write code for ourselves; that would only ever be a ‘hobby’ system, not a real one. If it’s real code, it needs to be done correctly. Now it’s true that a lot of people feel that naming is ‘hard’, but that doesn’t mean you should do it badly or misname stuff. Rather, it just means that it is one of those parts of software development that needs more effort and concentration, not less.
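
As a tiny, hypothetical illustration (the variable and the load_trades call are invented for this example), consider the difference between a name that lies and one that doesn’t:

  # The name promises settled trades, but the data now includes unsettled ones
  # too; every future reader has to rediscover that mismatch the hard way.
  settled_trades = load_trades(include_unsettled=True)

  # The honest name costs nothing and removes the need for tribal knowledge.
  all_trades = load_trades(include_unsettled=True)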


  • Any code that works is good enough.


That is true for a throwaway demo. That is not true for getting a big system to be stable. At that scale, bad code is worse than no code. Thus, a giant pile of bad code is a negative contribution. If the code works most days then it is going to create a lot of operational headaches on its bad days. That drama eats away at all other aspects of the project and often ends up killing it. It diverts the focus away from doing the right work, often causing everything to spiral out of control. If you can’t code it correctly, you should not code it at all. You’re doing the project a favor.


  • Learn by doing.


It’s not a bad idea to go out and get experience. If what you are doing is fairly simple, it is fine to go back to first principles, start at the beginning, and work your way through the issues. That’s the way education works. A set of good university courses combines theory and practice, with each following course getting a little harder. The mistake, however, is to jump straight into the deep end without ever having bothered to learn how to swim. For something that is complicated -- like some parts of system development -- if you go all of the way back to first principles, it will take you at least 40 years to catch up to the current state of the art. Too much water has already passed under that bridge. What you need to do is seek out knowledge and mentors. They can help you learn way, way faster, and thus narrow the gap between doing the work and being competent at it. Once you have acquired that knowledge, you can start honing your skills, and again it is way faster to do that while working with a team of experienced developers than by flailing around on your own trying to reinvent wheels.

Sunday, January 24, 2021

Machinery

Over the years, I’ve had a lot of conversations about design, architecture, and the uniqueness of higher-level approaches to structuring large codebases. I’ve put together a synthesis of those discussions, but it is probably unreadable by most people.

One way of looking at code is to treat some part of it as a small, independent machine. You feed some stuff into the machine, a computation whirls along, and then some stuff comes out the other side.

That holds together in the sense that Turing machines are finite, discrete formal systems by definition, so we can take larger behaviors and partition them down into a set of finite pieces, each of which is discrete in its behavior. While the halting problem is about infinite behavior, that behavior is actually tied to an infinitely long tape. Place a fixed bound on the tape size and the resulting machine is itself finite, so the halting problem goes away.

In a practical sense, that doesn’t matter. We don’t want fixed limits on some types of computations, such as graphical user interfaces, certain types of searching, or command input, but those exceptions are rare enough that we can separate the infinite computations from the finite ones.

With that distinction, we can see a large computer system as mostly an interconnected network of these finite computations (sometimes wrapped by an infinite one), and with that viewpoint, we can treat the computations themselves as black boxes.

So, we have some code that, as a machine, produces a specific output. We can build on that by having more code that uses a bunch of these machines to compute a slightly more complex output. With this, we can see the whole system as a fairly large series of machines, some decomposable, some not. If we wanted to create a new system with lots of different high-level machines that calculate related, but different outputs, it’s entirely possible to break this down into a list of all of the underlying machinery. There would be multiple ways of decomposing the higher-level problems, but there would always be some near absolute minimum size for the overall collection and each machine within it.
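
A minimal sketch of this view, in Python, with invented names: each function below is one small ‘machine’, and the last one is a higher-level machine built purely by composing them.

  # Each function is a small, finite 'machine': fixed inputs in, fixed outputs out.
  def parse_record(line: str) -> dict:
      fields = line.strip().split(",")
      return {"id": fields[0], "amount": float(fields[1])}

  def total_amount(records: list) -> float:
      return sum(r["amount"] for r in records)

  # A higher-level machine, defined only in terms of the machines below it.
  def summarize(lines: list) -> float:
      return total_amount([parse_record(line) for line in lines])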

While that may at first appear to be pointless, what it tells us is that it is possible to reduce the work needed to build the system. More usefully, recreating the same machine underneath different higher-level machines is an obvious waste of effort. More to the point, there is some ‘natural’ decomposition of the system itself that minimizes the overall composite machinery.

But that’s not exactly true. We might have two machines that take the same input and return the same output, but differ internally in the way they make tradeoffs, such as time vs. space. That tradeoff space is multi-dimensional, so there are infinitely many ways of projecting it down to just one implementation.

So, if we stop just above the internals of each machine, and the computations are encapsulated away from infinite tapes, it does seem as if we can specify a rather exact minimum. From a practical perspective, it doesn’t really matter if this is correct or not, but what does matter is that we can identify the lower-level machines, and while working avoid duplicating them. And above that, we can identify the next level and avoid duplication there as well. We can do this at each level, for all compositions in order to ensure that the work we are doing is necessary and not wasted.

The really big question is how to find such a decomposition. It turns out that the inputs and outputs of each machine are a huge clue. It’s easy to see that in most systems, the different data types utilized are finite. There is a fixed number of primitive types and a fairly fixed number of structural arrangements, also known as data structures. So what distinguishes the machines from each other are the permutations of their inputs and outputs, and these belong to a fixed set. Then it is easy to see if a machine is currently missing, and also easy to see if one machine is a proper subset of another. The third case is an intersection between two, but we know that, at worst, we could treat that as three machines. That, of course, applies all the way up the hierarchy of machine composition, so if we utilize this understanding, we have a means of normalizing the machinery.
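
A toy sketch of that idea, with invented machine names and types: describe each machine by the set of input and output types it touches, and duplicates or proper subsets fall out of simple set comparisons.

  # Hypothetical signatures: machine name -> (input types, output types).
  machines = {
      "price_bond":  ({"Bond", "YieldCurve"},          {"Price"}),
      "price_trade": ({"Bond", "YieldCurve", "Trade"}, {"Price", "Accrual"}),
  }

  def is_subset(a: str, b: str) -> bool:
      # Machine a is a candidate for reuse inside machine b if both its
      # inputs and its outputs are contained within b's.
      (a_in, a_out), (b_in, b_out) = machines[a], machines[b]
      return a_in <= b_in and a_out <= b_out

  print(is_subset("price_bond", "price_trade"))  # True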

Now, that does open the door to having to renormalize the whole set again whenever we add new machines, but it seems more likely that the worst case is restricted to a specific branch or subtree of the hierarchy. In order to avoid that as a recurring problem, when we add new inputs or outputs, we make sure the machines that utilize them are complete. That is, additions at each level always include a bunch of new machines, even if they aren’t being used yet. That minimizes the growth at different points in time along with minimizing the total underlying work. Building in the expectation that the work will always continue to expand adds a bit more to each iteration, but that cost is eventually recouped through fewer renormalizations.

So, we have some idea of the minimum amount of work, one that sits above the internal tradeoffs, and some idea of how to minimize the work across the entire lifespan. We can even partition different parts of this hierarchy, bounded by a set of levels, from each other, so that we can overlay a structure with specific properties on top of it. That can be understood as an ‘architecture’, and those properties can be organizational, or specifically related to behavior, such as security or comprehension.

So, seeing the system as a finite set of machines leads to ways to optimize parts of its construction, and it also helps to place bounds on the work needed to assure that all of the parts work together as specified. What’s left undefined in this perspective are the tradeoffs made inside of the machines, and any higher-level performance properties assigned at different levels.

Thursday, January 21, 2021

Default Requirements

One common source of confusion in software development comes from ‘default requirements’. Since we don’t explicitly add them into the analysis/design, people mistakenly think they are ‘optional’. They are not.

If you are going to build and maintain a computer system, no matter what it does, there are some default requirements that must be met in order for the system to be stable.


User Requirements

  • The system should never lie to or mislead the users.

  • The system should do what it is told to do, when it is told to do it.

  • The output should always be deterministic.

  • Any work/effort should never be lost or forgotten.

  • The system should be up and available when the users need to use it.


Operational Requirements

  • Installations and upgrades should be simple and safe.

  • Untested code should never make it into production.

  • All code/configuration upgrades can be rolled back.

  • Any resource outages will be tolerated.

  • Operations can be monitored for errors.

  • Reboots will always work. 

  • 100% guarantee that the matching code is easily identifiable in the repo.

  • All data imports are idempotent.
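
As a minimal sketch of the last item, idempotent data imports, assuming a SQLite table with a unique client_id column (the table and helper are invented for illustration), re-running the same import leaves the data in the same state:

  import sqlite3

  def import_clients(conn: sqlite3.Connection, rows: list) -> None:
      # Existing rows are updated in place, new rows are inserted, and
      # nothing is ever duplicated, no matter how many times this runs.
      conn.executemany(
          """INSERT INTO clients (client_id, name) VALUES (?, ?)
             ON CONFLICT(client_id) DO UPDATE SET name = excluded.name""",
          rows,
      )
      conn.commit()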


Development Requirements

  • It should be easy to set up a dev environment; all devs should use the same setup.

  • The code in the main branches of the repo should always run, but never be dangerous.

  • Everything needed to build and run the code is either in the environment or in the repo.

  • The repo contains each and every ‘source’ file, even if that file is binary.

  • There can be more than one instance of the development on the same machine.

  • No work should ever be lost (code, config, documentation, or build).

  • Anything that is not obvious should be documented along with the system.

  • Most bugs should be reproducible. All code/functionality is fully testable in at least one test environment.

  • The code edit/build/run cycle should be as short as possible.

  • Dependencies should be pinned; updating them should be a deliberate, manual step (see the sketch after this list).

  • Everything should be named/labeled properly. Devs can’t lie to the users, and they can’t lie to other IT people either.
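
For the dependency requirement above, a small illustration of what pinning can look like, assuming a hypothetical Python project using pip (the package names and versions are illustrative only):

  # requirements.txt -- every dependency is locked to an exact, known-good version.
  requests==2.31.0
  sqlalchemy==2.0.25

Upgrading then means deliberately editing this file, testing, and committing the change, rather than silently picking up whatever happens to be newest.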


These requirements aren’t optional; they don’t depend on language, OS, or tech stack; they are not legacy; and they are not negotiable. For each one that is missed, there are problems that degrade the quality of the system and/or the ability to keep it running.


Given the rather frantic state of the software industry, it is getting harder and harder to freeze the development of a live system. Some very old technologies that are isolated can do this, but most of the newer ones, at minimum, will gradually become less secure.


It should also be noted that if you build a system that depends on a technology like a relational database for persistence, you have two systems, not one. If a dependency is self-standing, then it is external to the system, even if the system is inoperable without it. Everything that applies to code or configuration in the system also applies to schemas and default data in the persistence technology.


Thursday, January 14, 2021

Analysis

Software is a set of solutions for a bunch of related problems. 

The first step in building software is always to perform an ‘analysis’ of the problems. If you don’t know what the problem is, you certainly can’t build something to solve it.


The point of any analysis is to shed light on all parts of the ‘domain’ (business, industry, field, knowledge base, etc.) that one needs to understand in order to design a workable solution.


For software, any solutions exist purely in the digital realm but usually have various touchpoints back to the physical world. What this implies is that someone needs to look carefully at the physical world, in particular the parts of it that intersect with the domain, to find ways to encapsulate this digitally.


So, if the solution is to automate a specific process that is happening in a company, then ‘all’ aspects of that process need some form of digital equivalence. That equates to finding all of the terminology that intersects the process, essentially all of the verbs and nouns, and carefully defining them. 


The definition of the nouns is effectively a data model. The definition of the verbs is most often the workflows. These two concepts form the foundation of any analysis. 


If you were interested, for example, in keeping track of what bonds were being bought and sold by a brokerage company, then the obvious nouns, or ‘entities’, involved would be the ‘clients’, the subset of ‘financial instruments’ (bonds, money market, etc.), the ‘traders’, the ‘salespeople’, etc. As well, since this commerce doesn’t take place in an institution like a stock market, but rather is ‘over-the-counter’, there are various notions of the established norms and practices in the industry that are necessary too. Fundamentally, that all falls back down onto the way the traders ‘talk’ with the clients, the questions asked and the answers given, and the way that the transaction is ultimately processed.
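
As a hedged sketch of how those nouns could start turning into a data model (the fields below are invented for illustration and are nowhere near a complete model of fixed income):

  from dataclasses import dataclass
  from datetime import date

  # The 'nouns' from the analysis become entities; the 'verbs' (quoting,
  # booking, settling) become the workflows that act on them.
  @dataclass
  class Client:
      name: str

  @dataclass
  class Bond:
      issuer: str
      coupon: float
      maturity: date

  @dataclass
  class Trade:
      client: Client
      bond: Bond
      quantity: int
      price: float
      trade_date: date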


Oddly, although all of this might seem simple, an industry like bonds, also known as ‘fixed income’, is really old and very nuanced. It’s been ongoing for at least 250 years, so over that time a lot of reasonable and some irrational things have built up. That is pretty much true for all of the domains in our modern society. As an outsider, they might not seem complicated, but internally the complexities have built up over time and aspects of them are often ‘counter-intuitive’. You can’t build a viable solution in one of these domains without a full and complete understanding of the domain; avoiding this will result in a rather useless, over-simplified, partial solution that will inevitably just make the problems worse.


And it’s exactly this intrinsic, built-up complexity that needs to be captured, fully and correctly, in an analysis. If it were simple and obvious, then it would be easy to whip together a good solution, which, oddly, given the age of the computer industry, means that it would have already been built decades ago. That is, if it were easy, it would be done already.


It’s obviously a lot of work, then, to take a real problem, iterate out all the complexities, and then use that to design effective solutions. For an organization that has been in the same business for decades, although the technology has been changing fairly often, the core underlying parts of their domain have probably not seen too many shifts. A complex domain also has lots of different specializations, all of which need different solutions, so there tends to be a lot of different, but related, ‘systems’ built.


Given that this has been a huge, ongoing effort for a long time now, it makes the most sense for organizations to find ways of reducing this effort. It is quite costly. From this, it is pretty obvious that if someone did a good, strong analysis a decade ago, then while parts of it may need updating, going right back to the ‘drawing board’ would be an epic waste of time. This underpins the importance of making sure that any analysis is reusable. That it is in some format that is independent of the system, popular tech stacks, or tools. That it is assembled together so it isn’t forgotten. That it is referenceable, so the core parts don’t have to be picked out of long, rambling text.


That is, the main and most important deliverable of any analysis is to build up and contribute to a library that contains a full accounting of the domain’s real-world models and workflows, which evolve only slowly. If that resource exists, then later analyses can leverage the work done initially. It would still need to be checked and updated, but that is far less intensive than rediscovering it all from scratch for each new system.


So, it’s pretty clear that the deliverable for any analysis is something that has a life of its own. It’s not just a shallow document that looks at a few details in the domain, but rather it is that set of knowledge that is foundational to the next stage, which is design. When you know enough about a problem, the solutions you create will work better, be faster to build, and will stay relevant for a lot longer.

Saturday, January 9, 2021

Structuring Code

There are many different ways to ‘structure’ code. 

A common approach is to start with a set of instructions that you want to accomplish. Put them together, then put that into a function. Later, when someone needs some additional features, put those into their own function, and merge the two functions together at some higher level.


The result of this brute-force approach will be code that is difficult to understand, since you’d have to have been there when it was written. It only makes ‘sense’ if you understand the order in which it grew.


Another way to structure code is to build up more powerful underlying ‘primitives’. That is, the system library and other dependencies have a predefined set of functionality. You build up new primitives on top of these that solve larger portions of the problem. Eventually, this growing sequence of primitives will equate to the top-down directives. 


This second approach is ‘intrinsically’ readable. If you start at the top, the decomposition into lower-level calls makes sense. If everything really is an upper-level ‘primitive’, one of its attributes is that it doesn’t have gaps or overlaps with the other primitives at the same level.
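
A minimal sketch of that layering, with invented names; each level reads only in terms of the primitives one level below it:

  # Level 1: primitives built directly on the standard library.
  def read_lines(path: str) -> list:
      with open(path) as f:
          return f.read().splitlines()

  # Level 2: a larger primitive expressed purely in level-1 terms.
  def read_non_empty_lines(path: str) -> list:
      return [line for line in read_lines(path) if line.strip()]

  # Level 3: the top-down directive, readable without knowing the details below.
  def count_entries(path: str) -> int:
      return len(read_non_empty_lines(path))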


As well as being readable, it’s also easier to make extensions. Either the lower-level primitives exist, or they need to be written. If the extensions honor the same ‘level’ decomposition, then all of the code maintains its organization by default.


A modified version of the second approach binds the primitives directly to the data. That is, all of the primitive functions are ‘verbs’ that act on the primitive data which are ‘nouns’. That is super-readable, and even better if the code itself reflects back the language used for the specification of the work. There is no need to translate between ‘odd’ programmer artifacts and the actual problem space.
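
A small, hypothetical sketch of that binding, where the ‘verbs’ sit directly on the ‘noun’ they act on, so the code reads like the specification:

  class Invoice:                        # a 'noun' straight from the problem space
      def __init__(self, amounts: list):
          self.amounts = amounts

      def total(self) -> float:         # 'verbs' that act directly on that noun
          return sum(self.amounts)

      def apply_discount(self, rate: float) -> None:
          self.amounts = [a * (1 - rate) for a in self.amounts]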


With the modified version, it’s also easy to get reuse out of the code. If there are two similar problems, you lift the ‘naming’ of the variables for the first problem up one level of abstraction, and then you build in the variation between the different cases. Each time the underlying ‘domain’ expands, the code lifts up a little bit more.


That runs the same risk as the first approach, in that the order of lifting can be erratic, but if the underlying code really maintains its primitive decomposition, then it is still organized; it may just be incomplete.


It’s fairly easy when reading code to see how it was structured, in that you can see either that the programmer knew how to do a few ‘somethings’ and then struggled to bring those together, or that they built up increasingly sophisticated mechanics to work through the problem space.

Wednesday, January 6, 2021

Quality

There is a clear order of precedence for code when it comes to quality. It is a) great, b) good, c) none, and then d) weak.

Good code does exactly what people expect it to do, all of the time. It has enough readability and documentation that people can easily figure out what it is going to do. If it needs a simple extension, it is simple to accomplish.


Great code is ‘good’ code that can be used for a variety of different, but similar purposes. It meets the original requirements, but via some configuration can be reused over and over again. 


Obviously, great code would be preferable, but good code is fine too. 


Not having any code is an opportunity to build something new, and to get it done well. It’s also a chance to do some research and determine what types of attributes are necessary in order to meet all of the requirements. If the code won’t compile or is so obviously bad that it never works, then it’s pretty much ‘no code’.


Weak code is the worst. Since it exists, there is no incentive to redo the work, but it is often so bad that refactoring it into something better takes more time than rewriting it. This covers code that appears to work now but is mysterious, as well as code that mostly works but occasionally doesn’t, and code that will eventually stop working in the future. Spaghetti code is the most common form of weak code, but so are large code ‘piles’, balls of mud, unreadable code, and technically challenging code that ignores theory, replacing it with something so simple and wrong that it could never work properly.


The cause of a lot of anxiety in programming is a large, badly written system with mostly weak code that is functional enough to work but has a high frequency of failures, say daily, weekly, or even monthly. In these cases, the effort to build the system has been ongoing for years and the stakeholders are very invested in the time and money already spent, but the existence of the system itself is the core obstacle to getting everything to work. There are just too many weaknesses to be able to redirect the effort towards something viable, yet there is too much effort already sunk to be able to abandon it. It becomes a zombie project.


Good and great code can easily be degraded into weak code. That is far easier than transforming weak code into good code. It usually happens because the original programmers have left and the new programmers don’t want to follow the existing conventions. Weak code in a good system tends to overshadow all of the other efforts; enough of it will bring everything else down to its level.