Sunday, September 17, 2017

Decisions

We can model large endeavors as a series of decisions. Ultimately, their success relies on getting work completed, but that underlying work cannot even be started until all of the preceding decisions have been made. The work can be physical, intellectual, or even creative.

If there are decisions that can be postponed, then we will adopt the convention that they refer to separate but related pieces of work. These pieces can occur serially, with the later work relying on some of the earlier decisions as well as the new ones. Some decisions are based on what is known up-front, while others can’t be made until an earlier, dependent bit of work is completed.

For now, we’ll concentrate on a single piece of work that is dependent on a series of decisions. Later we can discuss parallel series and how they intertwine.

Decisions are rarely ever ‘right’ or ‘wrong’, so we need some other metric for them. In our case, we will use ‘quality’. We will take each decision relative to its current context and then talk about ‘better’ or ‘worse’ in terms of quality. A better decision will direct the work closer to the target of the endeavor, while a worse one will stray farther away. We’ll treat decisions as points on a continuous scale, bounded between 0.0 and 100.0 for convenience. This allows us to map them back to the grayness of the real world.

We can take the quality of any given decision in the series as being relative to the decision before it. That is, even if there are 3 ‘worse’ decisions in a row, the 4th one can still be ‘better’. It is tied to the others, but it is made in a sub-context and only has a limited range of outcomes.

We could model this in a demented way as a continuous tree whose leaves fall onto the final continuous range of quality for the endeavor itself. So, if the goal is to build a small piece of software, at the end it has a specific quality that is made up from the quality of its subparts, which are directly driven by the decisions made to get each of them constructed. Some subparts will weigh more heavily on the result, but all of them contribute to the quality. With software, there is also a timeliness dimension, which in many cases can outweigh the code itself. A very late project could be deemed a complete failure.
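
To make the model a bit more concrete, here is a minimal sketch in Python; the structure, names, weights and numbers are entirely hypothetical, just to show how leaf decisions could roll up into an overall quality score:

    # A rough sketch of the 'quality tree': leaves carry the quality of a single
    # decision (0.0 to 100.0), interior nodes aggregate their children by weight.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Subpart:
        name: str
        weight: float = 1.0                 # how heavily this subpart counts
        quality: float = 0.0                # used only for leaves
        children: List["Subpart"] = field(default_factory=list)

        def overall_quality(self) -> float:
            if not self.children:
                return self.quality
            total = sum(c.weight for c in self.children)
            return sum(c.weight * c.overall_quality() for c in self.children) / total

    # Hypothetical endeavor where timeliness outweighs the code itself.
    endeavor = Subpart("small system", children=[
        Subpart("code", weight=1.0, children=[
            Subpart("data model decision", weight=2.0, quality=70.0),
            Subpart("library choice", weight=1.0, quality=40.0),
        ]),
        Subpart("timeliness", weight=3.0, quality=55.0),
    ])

    print(endeavor.overall_quality())       # weighted roll-up of all the decisions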

To keep things clean, we can take each decision itself to be about one and only one degree of variability. If some underlying complex choice involves many non-independent variables, representing it as a set of decisions leaves room to understand that the one or more decision-makers may not have realized the interdependence. Thus the collection of decisions together may not have been rational on the whole, even if each individual decision was rational. In this sense, we need to define ‘rational’ as full consideration of everything necessary or known relative to the current series of decisions. That is, one may make a rational decision in the middle of a rather irrational endeavor.

Any decision at any moment is predicated on an understanding of both the past and the future. For the past, we accrue a great deal of knowledge about the world. All of it for a given subject would be its depth; an ‘outside’ oversimplification of it would be shallow. The overall understanding of the past is then about the depth that the individual or group has in any knowledge necessary to make the decision. Less depth means relying more on luck to get better quality. More depth obviously decreases the amount of luck necessary.

Looking towards the future is a bit trickier. We can’t know the future, but we can know the current trajectory. That is, if uninterrupted, what happened in the past will continue. When interrupted, we assume that the interruption comes from a specific event. That event may have occurred in the past, so we have some indication that it is, say, a once-in-a-year event. Or once-in-a-decade. Obviously, if the decision-makers have been around an area for a long time, then they will have experienced most of the lower-frequency events, so they can account for them. A person with only a year’s experience will get caught by surprise by a once-in-a-decade event. Most people will get caught by surprise by a once-in-a-century event. Getting caught by surprise is getting unlucky. In that sense, a decision that is low quality because of luck may indicate a lack of understanding of the past, the future or both. Someone lucky may look like they possess more knowledge of both than they actually do.

If we have two parallel series of decisions, the best case is that they are independent. Choices made on one side will have no effect on the work on the other side. But it is also possible that the decisions collide. One decision will interfere with another and thus cause some work to be done incorrectly or to become unnecessary. These types of collisions are often the by-product of disorganization: the decision-makers on both sides are unaware of their overlap because the choice to pursue parallel efforts was not deliberate.

This shows that some decisions are made implicitly, by deliberately or accidentally choosing to ignore specific aspects of the problem. So, if everyone focuses on a secondary issue instead of what is really important, that in itself is an implicit decision. If the choices are made by people without the prerequisite knowledge or experience, that too is an implicit decision.

Within this model, then, even for a small endeavor, we can see that there is a huge number of implicit and explicit decisions, and that in many cases far more implicit ones are made than explicit. If the people are inexperienced, then we expect the ratio of implicit to explicit decisions to be very high, and the quality of the outcome then relies more heavily on luck. If there is a great deal of experience, and the choices made are explicit, then the final quality is more reflective of the underlying abilities.

Now, all decisions are made relative to their given context, and we can categorize them across a range from ‘strategic’ to ‘tactical’. Strategic decisions are either environmental or directional. That is, at the higher level someone has to set up the ‘game’ and point it in a direction. Then, as we get nearer to the actual work, the choices become more tactical. They get grounded in the details and become quite fine-grained; although they can have severe long-term implications, they are about getting things accomplished right now. Given any work, there is an endless number of tiny decisions that must be made by the worker, based on their current situation, the tactics and hopefully the strategic direction. So, with this, we get some notion of decisions having a scope and, ultimately, an impact. Some very poor choices by a low-level core programmer on a huge project, for instance, can have ramifications that literally last for decades and that degrade any ability to make larger strategic decisions.

In that sense, for a large endeavor, the decisions made, both large and small, accumulate to contribute to the final quality. For long-running software projects, each recurring set of decisions for each release builds up not only a context, but also boundaries that intrinsically limit the best and worst possible outcomes. Thus a development project 10 years in the making is not going to radically shift direction, since it is weighed down by all of its past decisions. Turning a large endeavor gets slower as it accumulates more choices.

In terms of experience, it is important for everyone involved, at their various levels, to understand whether they have the knowledge and experience to make a particular decision, and also whether they are the right person at the right level to make it. An upper-level strategic choice to set some weird programming convention, for example, is probably not appropriate if the management does not understand the consequences. A lower-level choice to shove in some new technology that is not in line with the overall direction is equally dubious. As the work progresses, bad decisions at any level will reverberate throughout the endeavor, reducing not only the quality but also the possible effectiveness of any upcoming decisions. In this way, one strong quality of a ‘highly effective’ team is that the right decisions get made at the right level by the right people.

The converse is also true. In a project that has derailed, a careful study of the accumulated decisions can lead one back through the series of bad choices. Traced back far enough, this identifies the earlier decisions where better options should have been taken.

It’s extraordinarily hard to decide to reverse a choice made years ago, but if it is understood that the other outcomes will never be positive enough, it becomes easier to weigh all of the future options correctly. While this is understandable, in practice we rarely see people with the context, knowledge, tolerance and experience to safely make these types of radical decisions. More often, they just stick to the same gradually decaying trajectory or rely entirely on pure luck to reset the direction. Thus the status quo is preserved, or ‘change’ is pursued without any real sense of whether it might actually be worse. We tend to ping-pong between these extremes.

This model then helps us understand why complexity just grows and why it is so hard to tame. At some point, things can become so complex that the knowledge and experience needed to fix them is well beyond the capacity of any single human. If many of the prior decisions were increasingly poor, then any sort of radical change will likely just make things worse. Unless one can objectively analyze both the complexity and the decisions leading to it, one cannot narrow the problem down to the limited set of decisions that need to be properly reversed, and at some point the amount of luck necessary will exceed that of winning the lottery. In that sense, we have seen that arbitrary changes and/or oversimplifications generally make things worse.

Once we accept that the decisions are the banks of the river that the work flows through, we can orient ourselves toward trying to architect better outcomes. Given an endeavor proceeding really poorly, we might need to examine the faults and reset the processes, environment or decision-makers appropriately to redirect the flow of work in a more productive direction. This doesn’t mean just focusing on the workers or just focusing on the management. All of these levels intertwine over time, so unwinding them enough to improve them is quite challenging. Still, it is likely that if we start with the little problems and work them back upwards while tracking how the context grows, at some point we should be able to identify significant, reversible, poor decisions. If we revisit those, with enough knowledge and experience, we should be able to identify better choices. This gives us a means of taking feedback from the outcomes and reapplying it to the process, so we can have some confidence that the effects of the change will be positive. That is, we really shouldn’t be relying on luck to make changes, but avoiding that is far from trivial and requires deeper knowledge.

Sunday, September 10, 2017

Some Rules

Big projects can be confusing, and people rarely have the time or energy to think deeply about their full intertwined complexity. Not thinking enough is often the start of serious problems.


Programmers would prefer to concentrate on one area at a time, blindly following established rules for the other areas. Most of our existing rules for software are unfortunately detached from each other or are far too inflexible, so I created yet another set:


  1. The work needs to get used
  2. There is never enough time to do everything properly
  3. Never lie to any users
  4. The code needs to be as readable as possible
  5. The system needs to be fully organized and encapsulated
  6. Don’t be clever, be smart
  7. Avoid redundancy, it is a huge waste of time

The work needs to get used



At the end of the day, stuff needs to get done. No software project will survive unless it actually produces usable software. Software is expensive; someone is paying for it. It needs backers, and they need constant reassurance that the project is progressing. Fail at that, and nothing else matters.

There is never enough time to do everything properly



Time is the #1 enemy of software projects. It’s lack of time that forces shortcuts which build up and spiral the technical debt out of control. Thus, using the extremely limited time wisely is crucial to surviving. All obvious make-work is a waste of time, so each bit of work that gets done needs to have a well-understood use; it can’t just be “because”. Each bit of code needs to do what it needs to do. Each piece of documentation needs to be read. All analysis and design needs to flow into helping to create actual code. The tests need to have the potential to find bugs.


Still, even without enough time, some parts of the system require more intensive focus, particularly if they are deeply embedded. Corners can be cut in some places, but not everywhere. Time lost for non-essential issues is time that is not available for this type of fundamental work. Building on a bad foundation will fail.

Never lie to any users



Lying causes misinformation, misinformation causes confusion, confusion causes wasted time and resources, and the lack of resources causes panic and shortcuts, which result in disaster. It all starts with deviating from the truth. All of the code, data, interfaces, discussions, documentation, etc. should be honest with all users (including other programmers and operations people); get this wrong and a sea of chaos erupts. It is that simple. If someone is going to rely on the work, they need it to be truthful.

The code needs to be as readable as possible



The code in most systems is the only documentation that is actually trustworthy. In a huge system, most knowledge is long gone from people's memories and the documentation has gone stale, so if the code isn't readable, that knowledge is basically forgotten and has become dangerous. If you depend on the unknown and blind luck, expect severe problems. If, however, you can actually read the code quickly, life is a whole lot easier.
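
As a small, purely hypothetical illustration in Python, both of the functions below do the same thing, but only one of them can be skimmed quickly years later when the original author and the documentation are long gone:

    # Hard to read: the knowledge it encodes is effectively lost.
    def proc(d, t):
        return [x for x in d if x[1] > t and not x[0].startswith('_')]

    # Readable: the same logic, but the intent survives in the names and comments.
    def accounts_over_minimum(accounts, minimum_balance):
        """Return (name, balance) pairs above the minimum, skipping internal accounts."""
        results = []
        for name, balance in accounts:
            if name.startswith('_'):        # internal bookkeeping accounts
                continue
            if balance > minimum_balance:
                results.append((name, balance))
        return results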

The system needs to be fully organized and encapsulated



If the organization of the code, configuration, documentation or data is a mess, you can never find the right parts to read or change when you need to. So it all needs organization, and as the project grows, the organizational schemes need to grow with it; they need organization too, which means one must keep stacking higher and higher levels of organization on top. It is never-ending, and it is part of the ongoing work that cannot be ignored. In huge systems, there is a lot of stuff that needs organization.


Some of the intrinsic complexity can be mitigated by creating black boxes, that is, by encapsulating sub-parts of the system. If these boxes are clean and well thought out, they can be used easily at whatever level is necessary, and the underlying details can be temporarily ignored. That makes development faster and builds up ‘reusable’ pieces that leverage the previous work, which saves time, and of course, not having time is the big problem. It takes plenty of thought and hard work to encapsulate, but it isn’t optional once the system gets large enough. Ignoring or procrastinating on this issue only makes the fundamental problems worse.
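
A hedged sketch of what such a black box might look like in Python; the names and the file-based storage are made up purely for illustration. The rest of the system only ever calls lookup() and save(), so the messy details behind them can change without rippling outwards:

    import json
    import os

    class CustomerStore:
        """The only interface the rest of the system needs to know about storage."""

        def __init__(self, directory: str):
            self._directory = directory
            self._cache = {}                 # internal detail, free to change later

        def _file(self, customer_id: str) -> str:
            return os.path.join(self._directory, customer_id + ".json")

        def lookup(self, customer_id: str) -> dict:
            if customer_id not in self._cache:
                with open(self._file(customer_id)) as f:
                    self._cache[customer_id] = json.load(f)
            return self._cache[customer_id]

        def save(self, customer_id: str, record: dict) -> None:
            self._cache[customer_id] = record
            with open(self._file(customer_id), "w") as f:
                json.dump(record, f)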

Don’t be clever, be smart



Languages, technologies and tools often allow for cute or clever hacks. That is fine for a quick proof of concept, but it is never industrial strength. Clever is the death of readability and organization, and it is indirectly a subtle form of lying, so it causes lots of grief and eats lots of time. If something clever is completely and totally unavoidable, then it must be well-documented and totally encapsulated. But it should initially be viewed as stupendously dangerous and wrong. Every other option should be tried first.
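
A tiny, hypothetical Python example of the difference: both functions total up orders by product, but the ‘clever’ one shows off the language while the ‘smart’ one just says what it means:

    from functools import reduce

    # Clever: dense, impressive for a moment, miserable to debug later.
    def totals_clever(orders):
        return reduce(lambda acc, o: {**acc, o[0]: acc.get(o[0], 0) + o[1]}, orders, {})

    # Smart: plain, obvious, and does exactly the same thing.
    def totals_smart(orders):
        totals = {}
        for product, amount in orders:
            totals[product] = totals.get(product, 0) + amount
        return totals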

Avoid redundancy, it is a huge waste of time



It is easy to stop thinking and just type the same stuff in over and over again. Redundancy is convenient, but it is ultimately a massive waste of time. Most projects have far more time than they realize, but they waste it foolishly. Redundancy makes the system fragile and reduces its quality. It generates more testing. It may sometimes seem like the easiest approach, but it isn’t. Applying brute force to just pound out the ugliest code possible may be faster initially, but it is ultimately destructive.
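
As a quick, hypothetical sketch of the alternative in Python: rather than copying the same validation into every entry point (where the copies inevitably drift apart), the check lives in one helper that is written once and tested once:

    def _validate_email(email: str) -> None:
        # One place to fix, one place to test, one behavior everywhere.
        if not email or "@" not in email or email.startswith("@"):
            raise ValueError(f"bad email: {email!r}")

    def register_user(name: str, email: str) -> dict:
        _validate_email(email)
        return {"name": name, "email": email, "invited": False}

    def invite_user(name: str, email: str) -> dict:
        _validate_email(email)
        return {"name": name, "email": email, "invited": True}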

Overall

There really is no finite, well-ordered set of one-size-fits-all rules for software development, but this set, in this order, will likely bypass most of the common problems we keep seeing over the decades. Still, software projects cannot be run without thinking deeply about the interrelated issues and constantly correcting for the ever-changing context. To keep a big project moving forward efficiently requires a lot of special skills and a large knowledge base of why projects fail. The only real way to gain these skills is to get mentored on a really successful project. At depth, many aspects of developing big systems are counter-intuitive to outsiders armed only with oversimplifications. That is, with no strong prior industry experience, most people will misunderstand where they need to focus their attention. If you are working hard on the wrong area of the project, the neglected areas are likely to fail.

Oftentimes, given an unexpectedly crazy context, many of the rules have to be temporarily dropped, but they should never be forgotten, and all attempts should be made to return the project to a well-controlled, proactive state. Just accepting the madness and assuming that it should be that way is the road to failure.

Monday, September 4, 2017

Data Modeling

The core of most software systems is the collection of data. This data is rarely independent; there are many inter-relationships and outward associations with the real world. Some of these interconnections form the underlying structure of the data.

If the system doesn’t collect the data, or if the data collected is badly structured, it can be extraordinarily expensive to fix later. Most often, deficiencies in the data that are too expensive to fix lead the software developers to overlay convoluted hacks in an attempt to hide the problems. Long-term application of this strategy eventually degrades the system to the point of uselessness.

As such, data forms the foundation for all software systems. Code without data is useless, and unlike data, code can be easily changed later. This means that when building big, complex systems, more care should be spent on ensuring that the necessary data is collected and structured properly than on any other aspect of the system. Getting the data wrong is a fatal mistake that often takes years to fully manifest itself.

A common way to model data is with an ER (entity-relationship) diagram. This decomposes the data into a set of ‘entities’ and provides a means of specifying the ‘relationships’ between them. The relationships are characterized by cardinality, so they can be zero, one or many on either side. Thus one entity might be connected to many different related entities, forming a one-to-many relationship. All possible permutations are viable, although between multiple entities some combinations of relationships are unlikely.
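
As a minimal sketch, the same idea can be written down directly in Python; the entity names and cardinalities below are invented purely for illustration:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Relationship:
        left: str            # entity on one side
        right: str           # entity on the other side
        cardinality: str     # e.g. "one-to-many"

    entities = {"Customer", "Order", "Address"}
    relationships = [
        Relationship("Customer", "Order", "one-to-many"),     # a customer places many orders
        Relationship("Customer", "Address", "one-to-one"),
    ]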

This is a reasonable way to model some of the structure, but it is hampered because its expressiveness is directly tied to relational databases. Their underlying model stores the entities in tables, which are a two-dimensional means of encoding the data. This works for many common programming problems but cannot easily capture complex internal relationships like hierarchies or graphs.

The older form of decomposing into data structures is far more expressive. You can break down extremely complex programs into a set of data structures like lists, hash tables, trees, graphs, etc. With those, some associated atomic primitives, and the larger algorithms and glue code, we have the means to specify any known form of computation. That decomposition works, but it does not seem to be generally well known or understood in the software industry. It is also a lot harder to explain and teach. It was softened way back to form the underlying philosophy behind object-oriented programming, but it essentially got blended with other ideas about distributed computing, evolutionary design, and real-world objects. The modern remnants of this consolidation are quite confusing and frequently misunderstood.

So we have at least two means of modeling data, but neither of them really fit well or are easy to apply in practice. We can, however, fuse them together and normalize them so that the result is both consistent and complete. The rest of this post will try to outline how this might be achieved.

Terminology and definitions form the basis for understanding and communication, so they are well worth consideration. From ER diagrams, the word ‘entity’ is general enough for our purposes and isn’t horrifically overloaded in the software industry yet. For the subparts of an entity, the common term is ‘attributes’; in other contexts, we often use the term ‘properties’. Either seems suitable for describing the primitive data that forms the core of an entity, but it might be better to stick with ‘attributes’. It is worth noting that if some of this underlying associated data has its own inherent, non-trivial structure, it can be modeled as a separate entity bound by a one-to-one relationship. Because of that, we can constrain any of these attributes to just being single instances of primitive data. No need for internal recursion.

From an earlier post on null handling (Containers, Collections and Null), we can set two rules: no instance of an entity can exist unless it has all of its mandatory attributes, and if an attribute is optional for any possible instance of the entity, then it is optional for all possible instances. These types of constraints ensure that the handling and logic are simple and consistent. They need to be rigorous.
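
A small sketch of how those rules might be enforced in Python; the schema, entity and attribute names are hypothetical:

    SCHEMA = {
        "Customer": {
            "mandatory": {"customer_id", "name"},
            "optional": {"phone"},        # optional for every Customer, never just for some
        },
    }

    def make_instance(entity: str, **attributes) -> dict:
        spec = SCHEMA[entity]
        missing = spec["mandatory"] - attributes.keys()
        unknown = attributes.keys() - (spec["mandatory"] | spec["optional"])
        if missing:
            raise ValueError(f"{entity} is missing mandatory attributes: {missing}")
        if unknown:
            raise ValueError(f"{entity} has unknown attributes: {unknown}")
        return dict(attributes)

    customer = make_instance("Customer", customer_id="C42", name="Ada")   # ok
    # make_instance("Customer", name="Ada") would raise: customer_id is mandatory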

Every entity corresponds to something unique in the real world. This is another rigorous constraint that is necessary to ensure the data is useful. If under some circumstance that isn’t true, then the entity corresponds to a ‘collection’ of such instances, not an instance itself. In this way, we can always bind the data to something unique, even if that is just a bunch of arbitrarily categorized collections. The caveat, though, is that we should never categorize arbitrarily, because that creates problems (we are adding in fake information). Instead, we need a deeper analysis of the underlying domain to resolve any ambiguities.

If we get this uniqueness principle in place, then it is absolutely true that every entity has a unique key. In many circumstances this will be a composite key, but any such key refers to one and only one entity in the real world.

In developing systems, we often create a system-generated key as a level of indirection to replace some of the original data keys. This is a good practice in that it allows some of the composite attributes to be modified without having to create a new entity. In systems that retain history, this keeps the related data attached across changes. It is a necessary property for systems that deal with human-centered input, mostly because there is a relatively constant rate of errors and long periods of time before they are noticed. That needs to be built into the data model.
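
A hedged sketch of that level of indirection in Python; the employee example and field names are made up. Everything else in the system holds on to the surrogate key, so correcting a typo in the original data doesn’t orphan any related records:

    import uuid

    employees = {}            # surrogate key -> record
    by_natural_key = {}       # (last_name, hire_date) -> surrogate key

    def add_employee(last_name: str, hire_date: str) -> str:
        surrogate = str(uuid.uuid4())
        employees[surrogate] = {"last_name": last_name, "hire_date": hire_date}
        by_natural_key[(last_name, hire_date)] = surrogate
        return surrogate

    def correct_last_name(surrogate: str, fixed_name: str) -> None:
        record = employees[surrogate]
        del by_natural_key[(record["last_name"], record["hire_date"])]
        record["last_name"] = fixed_name
        by_natural_key[(fixed_name, record["hire_date"])] = surrogate
        # anything attached to the surrogate key (history, relationships) is untouched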

So, with some tighter constraints, we can decompose all of the data into entities and show how they relate to each other, but we still need a means of showing how the entities relate internally to their own set. As an example, the employees in a company mostly form a tree, or a directed acyclic graph, with respect to their managers. We could model this with some messy arrangement of multiple entity types that allows different people both to report upwards and to have employees of their own, but that would be quite awkward and pedantic. What is preferable is to just model the entity itself as a tree (ignoring the more complicated arrangements for the moment). This could be as easy as specifying the type of the entity as a ‘tree’ and adding in an attribute for ‘manager’. We simply overlay the data structure decomposition on top of the original entity. Thus we get an extension, where the internal entity relationship might be a data structure like a tree, and to satisfy that structure we add in the appropriate attributes.
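
A small sketch of that overlay in Python; the Employee entity, its ‘tree’ type and the ‘manager’ attribute are illustrative only:

    employee_model = {
        "entity": "Employee",
        "structure": "tree",                                  # internal relationship
        "attributes": ["employee_id", "name", "manager"],     # 'manager' realizes the tree
    }

    employees = [
        {"employee_id": 1, "name": "Pat",  "manager": None},  # root of the tree
        {"employee_id": 2, "name": "Sam",  "manager": 1},
        {"employee_id": 3, "name": "Alex", "manager": 1},
    ]

    def direct_reports(manager_id, records):
        return [e for e in records if e["manager"] == manager_id]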

That works cleanly, but we would like to know which set of data structures is necessary to express the full breadth of any possible data. An initial guess might be this collection: item, list, tree, dag, graph, hypergraph. That may seem fairly expressive, but we are also aware of at least some form of dimensionality with respect to entities. That is, an entity might be a list of lists, or even a hypergraph of trees of lists. Thus we have the initial collection on any given axis, with any N axes, in any combination.

As weird as that sounds, we could then specify a matrix as an entity of list^2, noting that the lists are strongly bounded by the M and N sizes of the matrix. In most underlying data collections the entities are just not that complicated, but we really don’t want to limit the models to covering only the easy 80% of cases.

As for representing these models, we can easily use the original ER diagrams with extra information for the entity type. We would also like a means of serializing these, and given that the relationships are not necessarily planar, we need a bit of extra mechanics here as well. Before we get there, it is worth noting that the external relationships form a graph. A reasonable extension would be to allow them to be hypergraphs as well, which would make them consistent with the internal relationships. That is, we could have a relationship such as one-many-many between three entities. That could further compact the model, but it would only be usable if it was there to highlight a relationship, not obscure it.

We can easily serialize an entity (and we know it isn’t intrinsically recursive). So all we need is a means to serialize a hypergraph. For graphs, that is well understood as an adjacency matrix, here with the cells containing the relationship itself instead of just a 1 or 0. Or we can just list out the vertices (entities) and the edges (relationships). Something like ‘entityA-entityB-entityC: one-many-many’ would be quite suitable. We could write out any list of unnamed variables with brackets [], and any named variables (like entities) with braces {}. Another suggestion is to list the keys first, separated internally by commas, then switch to a semicolon to separate them from the rest of the attributes or other keys (there may be multiple means of getting uniqueness). If there is some reason to distinguish entities from other hash tables, we could use parentheses () for this distinction. Thus we might then differentiate between objects and dictionaries, for clarity.
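
Purely as an illustration of how such a serialization might read, and not a fixed proposal, a tiny made-up model in this style could look something like the following, with the entity type noted in parentheses, the key attributes before the semicolon, and hyphenated entity names for the (hyper)relationships:

    Employee (tree) {employee_id; name, manager}
    Department {department_id; name, location}
    Project {project_id; name}

    Employee-Department: many-one
    Department-Project-Employee: one-many-many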

With a bit of white space, this would all provide a means to have a serialized text-based format for specifying a data model, which means that we could apply other tools such as source-code control and formatting to be able to manipulate these specifications. With the addition of positioning information, we could easily render this as a diagram when needed.

Given the importance of data for any software system, it seems that we should focus strongly on building up analytical tools for correctly laying the foundations of the system. It should be easy to model data, and organizations should have large repositories of all of their current models. It is often the case that some systems use eclectic subsets of the real underlying data models; what we need is a way both to catch this early and to gradually enhance our knowledge so that we can properly model the data for all of the domains. A subset is fine, but when it needs to be extended, it should be extended properly. It is reckless to leave this to chance, and we so often see the consequences of doing that. If the analysis stage of software development is properly focused on really modeling the data, the other stages will proceed smoothly. If not, the initial failures will percolate into the rest of the work.