Sunday, July 10, 2016

Analysis

As I said in my last post on Quality, the ‘real’ world is a very messy place.

So messy that we come equipped with filters that obscure the details we are not willing to accept. These filters are relative to how we internally model the world, which is unique to every individual. Software developers seek to define precise mappings between reality and the underlying formal machinery of a computer, but intrinsically we are unable to be objective, to accept the world as it is.


The foundations of great software are rooted in getting as close to reality as humanly possible. That is, if we capture a significant portion of high-quality data that is correctly modeled, then augmenting it with code to perform useful functionality for the users is considerably easier. Data forms the base of this construction, and the ability to structure it correctly is bound to our own perspectives. So if we want to build more sophisticated, and thus more useful, software, we need to understand how to ‘objectively’ analyze the world around us.


The first and most important point is to realize that all individuals have this limited perspective and, most likely, an agenda. They see what they want, and they bend that towards their own goals. With information streams like that, it is extremely difficult to consistently get objective facts, given that the bulk of the input for analysis is these streams. However, each stream likely contains shades of the underlying truth, so examining enough of them helps us converge on the underlying knowledge. This equates to a very simple rule of thumb: never take only one source of information as correct. It is best to assume that everything from any one source might be wrong, and that it needs to be verified against many sources even if it seems simple, complete or intuitive. Cross-reference all data, all knowledge. Always.
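
To make that rule of thumb concrete, here is a minimal sketch in Python, with invented feed and field names, of accepting a fact only when several independent sources agree on it; the mechanism matters far less than the posture that no single stream is authoritative.

```python
from collections import Counter

def cross_reference(sources, key, quorum=0.6):
    """Collect a value for 'key' from every source and accept it only
    when a clear majority of the sources agree on it."""
    values = [src[key] for src in sources if src.get(key) is not None]
    if not values:
        return None  # nothing knowable yet
    value, count = Counter(values).most_common(1)[0]
    # Accept only when enough independent streams converge on the same answer.
    return value if count / len(values) >= quorum else None

# Hypothetical example: three independent feeds disagree on a customer's country.
feeds = [
    {"customer_id": 42, "country": "CA"},
    {"customer_id": 42, "country": "CA"},
    {"customer_id": 42, "country": "US"},
]
print(cross_reference(feeds, "country"))  # -> "CA" (2 of 3 sources agree)
```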


Assumptions are the enemy of objectivity because they are just extrapolations of data consumed from untrustworthy underlying streams. That is, the analyst has been fed some incomplete or not entirely correct knowledge, which they extend in intuitive ways but which then bends away from external reality. Poor quality at the base propagates into the conclusions, usually as an assumption of some form. The situation is also recursive, since all of the prior streams that fed into this are somewhat bent as well. Somewhere, way down, someone made a weak assumption and it has percolated upwards to cause at least one issue. Given this, it is often amazing that we have achieved the degree of sophistication in some areas that we have. Our interactions are so often based on low-quality information.


Once an analyst has accepted that knowledge collection is intrinsically difficult, the next big challenge is to properly organize what is collected. Disorganized knowledge is not usable, in that it is artificially more complex. Chaos injects information.


Multi-dimensional information is also inherently difficult for humans to organize, but any organizational scheme is better than none. All that matters is that the scheme is applied to everything consistently. Inconsistent subparts are just another form of disorganization. They make it easy to overlook similarities and differences, which form the generalized structure necessary to eliminate special cases.
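
As a loose illustration, and assuming made-up record shapes, the sketch below applies one canonical scheme to every incoming record so that similarities and differences show up structurally instead of being hidden by formatting quirks.

```python
def canonicalize(record):
    """Apply one scheme to every record: lower-case, underscore-separated
    keys and trimmed string values, regardless of which feed it came from."""
    out = {}
    for key, value in record.items():
        key = key.strip().lower().replace(" ", "_")
        if isinstance(value, str):
            value = value.strip()
        out[key] = value
    return out

# Two feeds describing the same thing in slightly different shapes...
a = canonicalize({"First Name": " Ada ", "dept": "ENG"})
b = canonicalize({"first_name": "Ada", "Dept ": "ENG"})
print(a == b)  # -> True once the same scheme is applied to both
```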


Knowledge that is artificially fragmented is likely to be misinterpreted. It just opens the door to inconsistencies, which then lead to bad assumptions. All of these problems feed on each other.


At some point, a really good analysis will boil down to a concrete understanding of all that is currently knowable, but reality, being rather informal, still contains unexpected surprises. What this means is that no analysis, however perfect, is immune to time. As the clock progresses, things change. That ensures that any analysis can never be complete. It can never be done. It is a perpetual requirement that is always converging on, but never arriving at, a conclusion.


For software that means that any underlying data model is, and will always be, incomplete. As time progresses, the deltas might get smaller, but they will never get to zero. Accepting that, ongoing analysis is a fundamental necessity for dealing with software rusting, and it is absolutely necessary to continually expand the underlying model, not just amplify the chaos by appending new detached submodels.
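
A rough sketch of that distinction, with invented class and field names: extending the existing model keeps one coherent structure, while the shortcut of a detached submodel leaves two overlapping versions of the truth to reconcile by hand.

```python
from dataclasses import dataclass
from typing import Optional

# Extending the underlying model: new attributes live with the rest of what
# we know about a customer, so every consumer sees one structure.
@dataclass
class Customer:
    customer_id: int
    name: str
    country: Optional[str] = None      # added when the analysis caught up
    tax_region: Optional[str] = None   # added later, to the same model

# The detached-submodel shortcut: a parallel structure keyed by the same id.
# It "works", but now two structures must be kept consistent by hand, which
# is exactly the appended chaos described above.
customer_tax_info = {}  # customer_id -> {"tax_region": ...}
```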


Analysis is the creation of that mapping between reality and the underlying formal systems that are the core of software. Sorting out the information flows may seem mechanical, but the greyness of reality prevents it from being trivial. Given that its output is foundational to software, analysis should be one of the key areas of study for Computer Science, but oddly it seems to have been overlooked. One could guess that this happened because small aspects of it, such as database normalization, came quickly and easily, so on the surface they appeared obvious and dull. The failure in that assumption is that it is only correct from the formal side of the mapping. From the other side, it clearly isn’t true. If you normalize the wrong structure, you still have the wrong structure. Normalization was never really the problem, so it is only a small part of the answer. We can’t formally prove that any informality is correct, so we can’t ever prove the mappings, but we can still show that some are much closer to being objective than others.
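
A small, made-up example of normalizing the wrong structure: the first schema below is perfectly normalized, but it bakes in the assumption that an order holds exactly one product, so no amount of further normalization will make it match a reality where orders contain many line items.

```python
from dataclasses import dataclass

# Formally fine: no redundancy, clean keys, easy to argue it is normalized.
# Still wrong, because the modeled reality was wrong to begin with.
@dataclass
class Order:
    order_id: int
    customer_id: int
    product_id: int   # the bad assumption: one product per order
    quantity: int

# Closer to reality: the fix comes from better analysis of the domain
# (separate line items), not from applying more normalization rules.
@dataclass
class OrderHeader:
    order_id: int
    customer_id: int

@dataclass
class LineItem:
    order_id: int
    product_id: int
    quantity: int
```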

I personally think we need to revisit our understanding of analysis in a big way, particularly in an age where misinformation dominates truth. Not only is this affecting our software, but it also taints many other aspects of our societies. If we can’t objectively converge on the truth, we’ll just spend forever lost in unrealistic agendas.