Saturday, September 3, 2016

Meta-methodology

How we go about building complex software is often called ‘methodology’.

To be complete, a methodology should cover every step of the process, from gathering all necessary information about the problem to deploying and operating the solution in a production environment.

What has become obvious over the years is that, across the full range of software development, there is no one consistent methodology that will work effectively for every possible project. The resources, scale, and environments that drive software development impose enough external constraints that each methodology needs to be explicitly tailored in order to remain effective.

Sometimes this lack of a specific answer is used as a justification for having no formal process at all, but for any non-trivial project the resulting disorganization is detrimental to both the success and the quality of the effort. A methodology is so necessary that a bad one is often better than none.

At a higher level, however, there are certainly many attributes of software development that we have come to understand over the last half-century. What's needed to ensure success is to extract and apply these lessons in ways that contribute to the effort rather than harm it. Thus, if we can't specify the perfect methodology, we can certainly specify a meta-methodology that ensures that what happens in practice is as good as the circumstances allow.

The first and most important property of a methodology is that it is always changing. That is, it needs to keep up with the overall changes in the resources and environment. That doesn't mean changes are arbitrary; they need to be driven by feedback and by sensitivity to any side effects. Some parts of the methodology should actually be as static as possible; they need to be near constants throughout the chaos, or work will not progress. A constantly shifting landscape is not a suitable foundation to build on. Still, as the development ebbs and flows, the methodology needs to stay in sync.

To keep any methodology in focus, it needs to be the responsibility of a single individual. Not someone from the outside, since they would only have a shallow view of the details, but rather the main leading technologist: the lead software developer. They are the key person on the development side whose responsibility is to ensure that all the parts of the project get completed. That makes sense given that their role is really to push all of the work through to completion, so how that work gets done is a huge part of their responsibilities. Rather obviously, that implies that they have significant prior experience with the full breadth of the software process, not just coding. If their experience is limited to only part of the effort, then that part is where they will incorrectly focus their attention. If only part of the development process is working, then the whole process is not.

This does tie the success of the project to the lead developer, but that has usually been the case, whether or not people have been willing to admit it. Projects without strong technical leadership frequently go off the rails, mostly by just endlessly spinning in circles. Expecting good leadership from the domain side is risky because they most often have expertise in anything and everything but software development, so they too tend to focus on what they understand, not on the rapidly accumulating problems.

For very large scale development, a single leader will not suffice. In that case, though, there should be a hierarchy of sub-leaders, with clear delineations between their responsibilities. That's necessary to avoid competition and politics, both of which inject external complexity into the overall process. When leadership is spending too much effort on external issues, it has little time to correct or improve internal ones. At the top of this hierarchy, the overall picture still falls to a single individual.

Any usable methodology for software addresses all five distinct, but necessary, stages: analysis, design, programming, testing, and operations. Each of these stages has its own issues and challenges. To solve a non-trivial problem, we need to go out into the world and understand as much of it as possible in an organized manner. Then we need to bring that knowledge back, mix it with the underlying technologies, and set some overall encapsulating structure so that it can be built. All of that work needs to be coded in a relatively clean and readable manner, but it also requires significant editing passes to fit nicely into any existing or new efforts. Once it's all built, it is necessary to ensure that it is working as expected, both for the users and for its intended operating environment. If it is ready to go, then it needs to be deployed, and any subsequent problems need to be fed back into the earlier stages. All of this required work remains constant for any given software solution, but each stage has a very different perspective on what is being done.

Most problems in the quality or stability of the final running software come from process problems that occurred earlier. An all too frequent issue in modern development is for the programmers to be implicitly, but not directly, responsible for the other stages. Thus major bugs appear because the software wasn't tested properly: the programmers who set up the tests were too focused on the simple cases they understood and not on the full range of possibilities.

In some projects, analysis and design are sub-tasked to the programmers, in essence to make their jobs more interesting, but the result is significant gaps or overlaps in the final work, as well as a lack of coherent overall organization.

The all too common scope creep is either a failure to properly do analysis or a by-product of the project direction wobbling too frequently.

Overall stability issues are frequently failures of the design to properly encompass the reality of operations, skipping or mishandling issues like error handling. Ugly interfaces and obtuse functionality come directly from design failures, where the prerequisite skills to prevent them were either not available or not believed necessary. Ugliness is often compounded by inconsistencies caused by a lack of focus: too many people involved.

Following these examples, we can frame any and all deficiencies in the final product as breakdowns in the process of development. This is useful because it avoids just pinning the blame on individuals. Most often, if a person on the project is producing substandard work, it is because the process has not properly guided them onto a useful path. This property is one of the key reasons why any methodology will need to be continuously tweaked. As the staff changes, they will need more or less guidance to get their work correct. A battle-hardened team of programmers needs considerably less analysis and specification than a team of juniors; their experience tends to focus them on the right issues.

Still, there are always rogue employees who don't or can't work well with others, so it is crucial to be able to move them out of the project swiftly. Responsibility for evaluating and quickly fixing these types of personality issues falls directly on the technical lead. They need full authority over who is involved in the work at most stages (operations is usually the exception to this rule) and who is no longer part of the project.

All of this sets a rather heavy burden on the technical lead. That is really unavoidable, but the lead is still subordinate to the direction of the domain experts and the funding, so while they can modify the methodology to restructure the priorities, they can't necessarily alter the overall scope of the work. They can't run off and build something completely different, and if they end up not meeting at least the basic requirements, the project should be deemed a failure and they should be removed. Most times this is both what the different types of stakeholders want and what they need.

Sometimes, however, what the users need is not what the main stakeholders want. In those situations, tying the responsibility for the system entirely to the lead developer is actually a good thing. Their strength in doing the job comes from being able to navigate these political minefields in order to get the best possible result for the users. Without at least a chance of moving this dial, the project is ultimately bound for disaster. With the responsibilities defined properly, at least the odds are better. And if the project does fail, at least we know who to blame, and what skills they were missing.

There is currently a huge range of known static methodologies. The heavyweight ones follow the waterfall approach, while the lighter ones are loosely called agile. For any project, the static adoption of any one of these is likely as bad as any other, for the reasons previously mentioned. So the most reasonable approach is to pick and choose the best pieces or qualities. This may seem like a good way to get a mess, but it should really only be the starting point. As the project progresses, the understanding of its problems should be applied as fixes to the methodology, and this should continue throughout the whole life of the project.

In practice, however, most grizzled veterans of software have experienced at least one decent, mostly working methodology in the past, so naturally they start with that and then improve upon it. When qualifying someone to lead a big development project, then, a lot of interest should be shown in the methodology they intend to follow, and less in the specifics of their past coding, design, analysis, testing, and operations experience, though clearly these are all tied together.

As for the pieces, software development is too subject to trends. We forget the past too quickly and seem to keep having to relearn the same lessons over and over again. Good leadership rises above this, so the right qualities for a methodology are not what is popular, but rather what has been shown to really work. For example, it is quite popular to say bad things about waterfall, but the justifications for this are not soundly based. Not all waterfall projects failed, and those that did frequently failed because of a lack of leadership, not because of the methodology. It does take time for waterfall projects to complete, but they also have a much better long-term perspective on the work, and when run well they can be considerably more effective and often more consistent. It's not that we should return entirely to these sorts of methodologies, but rather that some of them did have some excellent properties, and these should be utilized where needed.

At the other end of the spectrum, many of the lighter approaches seem to embrace chaos by trying to become ultra-reactive. That might fix the time issue and prevent them from steering away from what the stakeholders want, but it comes at the cost of horrendous technical debt, which sets a very short shelf life on the results.

A good methodology would then obviously find some path between these extremes, weighted toward one side or the other depending on the available resources and environment. Thus, it would likely have variable-length iterations, and could even pipeline different parts of the work through the stages at different speeds. Some deep core work might be slow and closer to waterfall, while some functionality at the fringes might be as agile as possible. The methodology would encompass both of these efforts.

Because of the past, many people think that methodologies are vast tomes; that to be a methodology, everything has to be written down in the longest and most precise detail. For a huge development effort that might be true, but at smaller scales what needs to be written down is only what will be quickly forgotten or abused. That is, the documentation of any methodology is only necessary to prevent it from being ignored. If everyone involved remembers the rules, then the documents are redundant. And if, each time circumstances change, the rules can be changed along with them, then the documentation is also redundant. As such, a small tiger team of experts might have exactly none of their methodology on paper, and there isn't anything wrong with that if they are consistently following it.

There are occasions, however, where outsiders need to vet and approve the methodology for regulatory or contractual reasons. That's fine, but since parts of the methodology change, the dynamic parts need to be documented minimally in order to avoid them becoming static or out-of-date.

Another reason for documentation is to bring new resources up to speed faster. That is more often the case for a new project that is growing rapidly. At some point in the later stages of a project's life, however, that type of effort costs more than it is worth.

From this it is clear that methodologies should include the intercommunication between the different people and stages of the project. All of this interim work eventually influences how the final code is produced, and it also influences how that code is checked for correctness. Some of the older heavyweight methodologies focused too intensely on these issues, but they are important because software really is the sum of these efforts. Thus, for example, the structure and layout of any analysis makes a direct difference to the final quality of the work, but it can also help show that some areas are incomplete and need further analysis. The analysts in a large project, then, should be laying out their results for the convenience of the designers, testers, and the lead developer. They may need to confirm their work with domain users, but the work itself is targeted at the other stages.

Communicating the types of complex information needed to build non-trivial systems is a tricky issue. If every detail were precisely laid out in absolute terms, the work involved would be staggering, and ironically the programmers would just need to write a program to read the specifications and then generate the code. That is completely hopeless in practice. The programmers are the specialists in being pedantic enough to please their computers, so most other people involved are going to be somewhat vague and sometimes irrational. The job of programming is to find a way to map between these two worlds. Still, that sort of mapping involves deep knowledge, and that type of knowledge takes decades to acquire, so most of the time the programmers have some of the ability, but not all of it. A good methodology, then, ensures that each individual producing code has everything they need to augment their own knowledge in order to successfully complete the work. Obviously, that is very different for each and every programmer, so the most effective methodology gives everybody what they need, but doesn't waste resources by giving too much.

Then the higher-level specifications are resolved only to the depth required by specific individuals. That might seem impossible, but really it means that multiple people in the development have to extend the depth of any specification at different times. That is, the architects need to produce a high-level structuring for the system that goes to the senior developers. They either do the work, or they add some more depth and pass it down to the intermediates, who follow suit. By the time it arrives on a junior's desk, it is deeply specified. If a senior does the work, they'll just mentally fill in the blanks and get it done. This trickle-down approach prevents resources from being wasted on fully specifying everything, but does not leave less experienced people flailing at their workload. It also means that, from a design perspective, people can focus just on the big issues without getting bogged down in too many details. All of the different individuals get the input they need only when they really need it, and the upper-level expertise is more evenly distributed across the full effort.

There are many more issues to be discussed with respect to methodology, but I think pushing them upwards to be properties or qualities of a meta-methodology is a viable way to proceed. It avoids the obvious problem of one approach not fitting different circumstances, while still being able to extract the best of the knowledge acquired in practice. We still have a long way to go before most software development produces results that we can really rely upon, but our societies are moving too quickly into dependence now. At this early stage, 'software eating the world' might seem to be an improvement, but we have yet to see the full and exact costs of our choices. Rather than waiting, it would be wiser to advance our knowledge of development to the point where we can actually rely on the results. That might take some of the mystery out of the work, but hopefully it will remove a lot of stress and disappointment as well.