Sunday, April 13, 2008

The Essence of Analysis

In any venture, if you do not start off on the right foot, your chances of success are dramatically diminished. It is easy to understand that the farther away we are from our initial target, the longer it will take to get there. If we have no idea where we are going, then we won't even know if we have ever arrived. Some things just are.

Many software developers understand that design is a highly critical part of getting the software right. You can't just magically iterate your way into the perfect system without first understanding what the system is supposed to do. Whether the design is formal or sketched on the back of a napkin, developers need a target to work towards. However, even the best of all designs cannot help if the earlier stages were fatally flawed. Design is not actually the first step.

The very first thing for any building is to have a reason to build it. A building without occupants, a book without readers, or a software product that does not solve any real problems are all examples of things that got lost in their very first stage: analysis.


A while back, I wrote down some ideas about what I thought were the essential development problems:

I identified four big issues: a) perception, b) discipline, c) generalization and d) analysis. Of these, analysis is the one I have talked about least, primarily because it is difficult to discuss without resorting to domain-specific examples, which in the software world are generally proprietary.

But analysis is clearly one of the keys to getting the right thing built at the right time. Turning a list of functions into executing code takes effort, but turning a vague understanding of someone's process into a list of functions takes genuine understanding. It is a very difficult, very overlooked and very uncommon skill set.


In analysis, only one group of people is really important: the users of the software. All of the other 'stakeholders' can help or derail the project, but the quality of your work is measured by how well you serve that one group. For this entry I'll focus only on them, leaving the others as background obstacles to be overcome.

All that software can do is help a user manipulate a growing pile of data. Whether it is visualization, tracking or editing, software is incredibly simple; each function just helps play with the information in some way. The key to building good software, then, is to tie this back to something useful in the user's life. Software that helps solve specific problems is software that is desirable. It extends the user's ability to manage their information efficiently and effectively. Software that violates that covenant just irritates its users. Realistically, most software is a mixture of useful and irritating features. Sometimes on purpose, often not.

Without a doubt, the most common approach to software development is a combination of hanging around in an ivory tower and collectively guessing at the sorts of problems that the user might want solved. If this sounds haughty and aloof, it generally is, which is why a great deal of software lacks empathy for its users. Missing empathy equates to sticky, hard-to-use features that beg to be rewritten. It doesn't need to be like that, but that's the industry norm.

If you stand back a bit, you can see that software solves two basic problems: one from the business domain, the other from the technical. The technical domain is the packaging that programmers need in order to get the business solution to the right people. The users, for example, may want to keep track of their financial information; the database, GUI, programming language, etc. are all just technical solutions that help the users access the financial tracking capabilities that they need. From a user's perspective, they are essentially noise that has to be put up with in order to accomplish their goals. Users don't, and shouldn't, care about the technical issues.

Programmers, on the other hand, love solving technical problems, which are easier and more straightforward. As such, they try to take the abilities learned from creating technical solutions and throw them at the business problems, generally with very painful results. Technical problems are consistent, simple and rational. Business problems, at least from the perspective of the programmers, are messy, irrational, crazy things that make no sense: the type of problems that scare most developers. Trying to wrap the irrational in a simple, consistent model is a hopeless task.

Because of this messiness, most programmers prefer working on technical issues. If you need any real proof of that, a quick examination of the Open Source movement shows that for the most part, the projects, particularly the successful ones, have been spawned around and dedicated entirely to technical problems. Volunteer programmers want to tackle easy, fun problems, so they are naturally drawn to the technical ones. Why tackle the painful stuff for free?

Not surprisingly, the bulk of the world's programming problems are not technical. They are domain-specific, which is why commercial development and consulting are still significant industries, even in the face of so many programmers seemingly giving away their craft. Given the difference between the two types of problems and the obvious lack of appeal of the ugly business code, it is more than likely that software development will continue to be a well-paying profession for some time.


A computer programmer is someone who can assemble a complex set of instructions intended to implement some functionality in a software program. A software developer, on the other hand, is someone who can analyse real world issues, isolate the problems and then conceive, design, implement, test and deploy a working solution to help with those problems. The difference is clearly the scope of the work being performed. There are lots of computer programmers, but very few real software developers. You need to master the technology first, and then grow your abilities to start tackling the really hard problems.

The users are the experts at understanding their own problems, but an experienced software developer is the expert at turning that understanding into a workable solution. Naturally, because domain problems are extremely messy, deep and irrational, years of experience with a specific domain are important. You can't solve problems if you don't really understand them.

While the developers need to understand the domain in which they are working, they are not by definition the experts. They don't have to know it all, nor do they need to be able to function in it themselves. They really only need to see it from the perspective of the computer software:

- the data, its structure, frequency, quality, etc.
- the processes, their timings, interactions, differences, etc.
- the players, their roles, responsibilities, expectations, etc.

A computer takes a deterministic, static view of the data, so all that the software developer really needs to understand is that viewpoint. But that viewpoint must be understood thoroughly, to an incredible level of detail. The success of software rests on its implementation of the details.


The first mistake most developers make is trying to solve the wrong problems. Computers can't change things; they can only act as an extension or tool for the users. In that sense, any software development is an automation project, one that is geared towards a specific pile of data. Thinking you're shifting their paradigm, rocking their world, or changing the way they work will ultimately fall flat. At the very most, someone farther upstream might be able to eliminate some lower-down position with help from the computer. In the end, though, it all needs to be anchored back to a person. That person, at least in their capacity as a user of the software, isn't going to be pure management (which is a reporting-only position).

So software problems are simple automation projects that involve building up piles of data. That leads to a fundamental property:

- data has to be captured or entered into the software.

It sounds silly, but in large projects people often fail to trace the data back to its source. This simple exercise easily reveals many of the worst detail problems in the foundations. If you trace the data back to its source, you generally gain a broader understanding of the domain. Even in a big warehousing project, all of the data starts somewhere, be it an interface for service people, some type of capture scanning, or some feed from another database. That other database had to get its data from somewhere too. So, in general, most data was originally entered by hand through forms on a screen. If you dig deep enough you'll find the real source, and thus the real problems. For each piece of data, it is vital to know its quality and frequency.
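The tracing exercise can be made a little more concrete. The following is a minimal Python sketch, not a method from any real project. The `DataSource` record, the `trace_to_origin` helper and all of the source names are hypothetical. Each source optionally points at the upstream source it copies from, and records the frequency and quality notes that analysis should capture for each piece of data. It assumes Python 3.9+ for the built-in generic annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSource:
    """One place where data lives, plus what the analysis learned about it (hypothetical)."""
    name: str
    upstream: Optional[str] = None  # the source this one copies from; None means it is the origin
    frequency: str = "unknown"      # how often new values arrive
    quality: str = "unknown"        # how far the values can be trusted


def trace_to_origin(name: str, sources: dict[str, DataSource]) -> list[DataSource]:
    """Follow upstream links from `name` back to the original source.

    Returns the chain in order, starting with `name` and ending at the origin.
    Raises ValueError if a source is unknown or the links form a cycle.
    """
    chain: list[DataSource] = []
    seen: set[str] = set()
    current: Optional[str] = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in data lineage at {current!r}")
        if current not in sources:
            raise ValueError(f"no record of where {current!r} comes from")
        seen.add(current)
        source = sources[current]
        chain.append(source)
        current = source.upstream
    return chain


# Hypothetical example: a warehouse field traced back to a data-entry screen.
sources = {
    "warehouse.customer_address": DataSource(
        "warehouse.customer_address", upstream="billing_db.address",
        frequency="nightly batch", quality="no better than billing_db"),
    "billing_db.address": DataSource(
        "billing_db.address", upstream="service_entry_screen",
        frequency="on each customer call", quality="copied as typed"),
    "service_entry_screen": DataSource(
        "service_entry_screen",
        frequency="on each customer call", quality="typed by service staff; typos likely"),
}

for step in trace_to_origin("warehouse.customer_address", sources):
    print(f"{step.name}: {step.frequency}, {step.quality}")
```

The unknown-source error is deliberate. A field with no recorded origin is exactly the kind of gap this exercise is meant to surface.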


Bad assumptions and misunderstood data are very common problems that propagate throughout development projects. In some of my blog entries (Age of Clarity, The Science of Information) I've speculated that there is essentially one great superstructure underneath the data. Whether or not you believe that, getting a proper working model of the underlying data is a hugely problematic issue. In many industries, even after thirty years, huge elements of the underlying data are frequently misunderstood. Guessing, which is the most common approach to understanding data, is dangerous. It is particularly dangerous when those poor initial pot-shots get permanently enshrined in the schema for reasons of backward compatibility.

So many software programs start off with a slightly-off underlying model and are never able to truly fix their original problems. Given that danger, it is important to simplify, but without missing any obvious chunks. Because of the limits of our current technology, it is extremely hard to 'uncomplicate' a schema once it is in production.

Clearly, the best, most reasonable approach is to seek out the users on their own ground and have them explain their problems to you in their own language. Their initial babble is only 'incomprehensible' when you don't understand their domain. As such, any communication impedance mismatches or user complaints most often point to significant domain issues that you need to understand and resolve. Programmers are quick to dismiss their users as 'stupid and ugly', while software developers know to look deeper. We build for them, and if we don't take the time to see their point of view, that is reflected in our work. With understanding and experience, the things that you initially took to be trivial turn out to have significant value.

There are, of course, times when the users are just letting off steam. Unless you know that for sure, rather than just assuming it, it is best to treat the complaint as having something tangible underneath, even if slight, that is worth considering.


While data has an underlying structure, understanding and interacting with that structure directly is rarely what the user is after. For example, we want to interact with our computers through a desktop metaphor, while underneath they need to deal with files, folders and links, and under that: file formats, data structures, bytes and bits.

An incredibly common mistake, particularly when it comes to code generators, is to assume that the user perspective and the data perspective are identical. The data contains inherent ugliness that the computer should shield the user from. In that sense, in a classic system, the database schema will contain a level of detail that is completely uninteresting to the user. Their application summarizes, stores context, and essentially hides away a great deal of that detail. And the user perspective is irrational; there is no formulaic way to tie it back to the data model. It will be similar, but it will always differ in irrational ways.

Harmonizing the translation between the user perspective and the true data perspective is another great problem. Over time people have understood that it was there, but all too often they have attributed it to the wrong causes, such as physical storage issues or alternative viewpoints.

I guess for simplification reasons, we would really like to have one single consistent model for all of our data. Having one model that we could magically use to generate all of the code is the next logical conclusion. But, as many programmers know, there are a huge number of transformations happening in any well-written software. Right now, we just do that work and stick it wherever it is convenient. The duality between the structure of the data and the user view, however, points to an area where we really should be able to separate out the different models in the architecture. We should also be able to formalize (to some degree) the transformations back and forth between these different perspectives.
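A minimal Python sketch of that separation, under deliberately simple assumptions. `PaymentRecord` stands in for the underlying data model, `PaymentView` for the user's perspective, and `to_view`/`from_view` for the formalized transformations between them. All of these names are hypothetical, and a real translation would be messier and more irrational than this one. The point is only that the mapping lives in named, testable functions instead of being scattered wherever it was convenient. It assumes Python 3.9+ (for `str.removeprefix`).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PaymentRecord:
    """Hypothetical underlying data model: what the schema stores."""
    account_id: int
    amount_cents: int   # stored as integer cents to avoid rounding drift
    currency: str       # ISO 4217 code, e.g. "CAD"
    posted_on: date


@dataclass(frozen=True)
class PaymentView:
    """Hypothetical user perspective: what the screen shows."""
    account_label: str  # e.g. "Acct #1042"
    amount: str         # e.g. "12.50 CAD"
    posted: str         # e.g. "2008-04-13"


def to_view(record: PaymentRecord) -> PaymentView:
    """Transform the data perspective into the user perspective."""
    amount = Decimal(record.amount_cents) / 100
    return PaymentView(
        account_label=f"Acct #{record.account_id}",
        amount=f"{amount:.2f} {record.currency}",
        posted=record.posted_on.isoformat(),
    )


def from_view(view: PaymentView) -> PaymentRecord:
    """Transform the user perspective back into the data perspective."""
    account_id = int(view.account_label.removeprefix("Acct #"))
    value, currency = view.amount.split()
    amount_cents = int(Decimal(value) * 100)
    return PaymentRecord(account_id, amount_cents, currency, date.fromisoformat(view.posted))


rec = PaymentRecord(account_id=1042, amount_cents=1250, currency="CAD", posted_on=date(2008, 4, 13))
view = to_view(rec)
print(view)
assert from_view(view) == rec  # the two transformations should round-trip
```

Because both directions are explicit, a round-trip check like the final assertion catches drift between the two models early.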

With further consideration, you can see this as a strong reason why many attempts at reuse fail so badly. The developers, in their quest for one overall model, mix the application-specific structure with the underlying real structure. That ties the resulting code absolutely to the one and only application that it mostly matches.

Beyond the technical problems of implementation, this has a big impact on analysis. The analyst needs to work with the users to see their perspective, but they also need to get down to the real underlying data perspective as well, something that the users may or may not be aware of. The user champions, then, are often experts in only half of the details. The analyst still needs to find and understand the other half. That is a huge problem, but it is manageable if you are aware of its existence.


So, great: you've successfully gone out and found a hundred or so really important 'functions' that would make your users' lives easier as they accumulate their growing pile of data. Randomly slapping these 'things' into a GUI is a bad idea. Sure, it is easy to just extend the system by attaching a 'bag' onto the side of it with some disconnected functionality. But that is the type of analysis and design that makes all of us hate those 'big company' programmers.

If you've really come to understand your users' needs, then you should understand how to integrate them back into the existing solution without just gluing on more disconnected features. It is this consolidation of the feature set that is often extremely difficult. But finding a way to accomplish it not only makes for a better tool, it also significantly cuts down on testing. For example, testing one slightly enhanced application is less work than testing two disjoint ones. Most companies fail to recognize this and increase their testing resources by only a fraction of what is really needed, so they get two apps tested to a lesser degree than the original one.

Mostly, we design our tools to have some consistent overall metaphor, or style of working. Extending that while maintaining its consistency is crucial. At times, the right answer may be to throw away the old perspective entirely and go up to the next level for a totally new one. Keeping everything simple and consistent is extremely hard, and extremely dependent on the underlying domain and tool set.

Backward compatibility is important to some degree, but it should not be used as an excuse for not properly integrating functionality into an application. The tools should get easier to use as the coverage of the functionality grows, something that is clearly not happening with a lot of modern software.

Consolidation is as hard as the initial analysis, and is considerably more important. We've reached the stage where we can bang out some pretty sophisticated software products, but we have not yet learned how to take a large amount of functionality and make it useful. This is clearly a significant area where Computer Science needs to grow up.


Bad, poor, or missing analysis is the start of a doomed project. You can't sit in a cubicle, miles away from your users, and expect to be able to write useful software for them. Where you do succeed with this approach, it is purely luck and nothing else. It is easy to see why so much of our software is so defective.

In five years of constant programming, most people who stay at it will, to some degree, master the ability to structure sets of instructions for the computer to execute. These people are computer programmers, and they should mostly get some help from others to determine what those instructions should accomplish.

The industry standard is to recognize someone with a five-year career as a senior. That might be true of someone attempting to master implementing specific functionality in a particular technology. It isn't even remotely true of someone trying to learn how to analyse a problem domain and use that information to build tools. Building things that are usable requires far more than just belting out sets of instructions.

The secret, then, if there is one to be known and understood, is to master the various techniques and technologies of software development, but never, ever, even for a moment, 'assume' that you have completely mastered the problem domain issues. Well, once you have rewritten the same system three times, you're probably on your way towards mastering that specific issue. But jump problems, domains or industries and wham! you're back to ground zero again. We have to be very careful to understand the difference between what we know and what we assume we know. Building a simple system to assemble a big pile of data does not make us really understand how that data works in the world around us.

If you can accept this, then you can prepare for a significant number of serious problems leaking into development from your lack of knowledge of the problem domain. Once you are past your own ego, you are a little more likely to accept the problems and their irrational nature for what they really are: the things that your software needs to solve to become better. Building software is easy; knowing what to build is the hard problem. All too often our industry wants to make the former sound harder and ignore the latter.

Whether starting from scratch or extending a system, an experienced software developer makes a huge difference. Luck is the only thing that replaces a lack of analysis. The software industry's high failure rate comes from its inability to recognize the importance of hiring developers with senior analytical skills to properly lead development projects. Analysis is a significant skill set. It is unique and not related to writing code. It is not just taking notes from user conversations and building up a list of requirements. It is not writing down stories, or guessing at how people might use things. It is a full, in-depth understanding of the user's needs and the underlying data, and of how to tie these two disparate things together into a tool that is usable.

Even after nearly twenty years and many different software development domains, I am still often surprised by the 'depth' of complexity that some domain-specific information is hiding. That is to say, each new problem I attempt to analyse turns out to be far deeper, far more complex and far more irrational than I'd ever really expect, and I've dealt with some pretty deep, complex and irrational problems already. But I have certainly seen this again and again: get into the user's perspective and build them a tool that really works, while capturing (but hiding) the true complexity of the data, and you've come very close to really mastering software development.

If you get the analysis correct, then all you have to do is build it :-)