Friday, December 4, 2015

Requirements and Specifications

Programmers often complain about scope creep. The underlying cause is likely that their project has been caught up in an endless cycle of requirements and specifications that bounce all over the place, which is extraordinarily expensive and frustrating for everyone.

The inputs to programming are design specifications, which are created from requirements gathered during analysis. There are many other ways to describe these two stages and their outputs, but ultimately they all boil down to the same underlying information. Failures in one or both of these earlier stages are really obvious during the coding stage. If the details aren’t ever locked down, and everything is interconnected, then frequent erratic changes mean that a lot of work gets wasted. In that sense, scope creep isn’t really a programming problem, but rather a process one.

Quite obviously, scope creep wouldn’t happen if the specification for the system were 100% complete. The programmers would just code exactly what is needed -- once -- and then proceed to polish the work with testing. The irony is that the work of specifying a system to 100% is actually the work of writing the system itself. That is, if you make the effort to ensure that no detail is vague or left unspecified, then you could write another program to turn that specification directly into the required program.

A slight variation on this idea, that the source code itself is the real design, was floated a long time ago by Jack W. Reeves in “What is Software Design?” but never went mainstream.

Of course, time and a growing understanding of the solution generally mean that any original ideas for a piece of software will always require some fiddling. But it is obviously cheaper and far more efficient to work out these changes on a smaller scale first -- on paper -- before committing to the slow, detailed work of writing and testing the code. Thus, it is a very good practice to create short, high-level specifications to re-arrange the details, long before slogging through the real effort.

As mentioned, a good specification is the by-product of the two earlier stages, analysis and design. The first stage is the collection of all details that are necessary to solve the problem. The second is to mix that analysis with technology and operational requirements, in order to flesh out an architecture that organizes the details. Scope creep is most often caused by a failure of analysis. The details were never collected, or they weren’t vetted, or an aspect of the domain that is dynamic was treated statically. Any one of these three problems will result in significant changes, and any one of them in a disorganized environment will set off a change cyclone.

There is also a rather interesting fourth problem: the details were collected, but not in a way that was stable.

The traditional approach to requirements is to craft a sentence that explains some need for the user. By definition, this expresses a winding ‘path’ through the underlying data and computations while often being targeted at only a special case. People find it easier to isolate their needs this way, but it is actually problematic. If the specification for a large system is composed of a really large collection of these path-based requirements, then it resembles a sort of cross-hatched attempt to fill in the details of the system, not unlike the scratchings of a toddler in a coloring book. But the details are really a ‘territory’, in that they form a finite set of data, broken down by modelled entities, with higher level functionality coming from computations and derived data. The territory is also packed with some navigational aids and a visual aesthetic.

A good system is complete, in the sense that it manages all of the data correctly and provides a complete set of tools to do any sort of display or manipulation necessary. It is a bounded territory that needs to be filled in. Nicely. Describing this with an erratic set of cross-hatched paths is obviously confusing, and prone to error. If the programmers fixate on the wrong subset of paths, necessary parts of the system fall through the cracks. Then when they are noticed, things have to change to fill those gaps. Overlaps likewise cause problems by driving the creation of redundancies, which eventually lose synchronization with each other.

A very simple example of this confusion happened a while back when a user told an analyst that he needed to ‘add’ some new data. The analyst took that path ‘literally’ and set it down as a requirement, which he turned into a screen specification. The programmer took that screen literally and set it down as code. A little time passed and the user made a typo that he only noticed after he had already saved the data. He went to edit the data, but… The system could ‘add’ new data, but it lacked any ability to ‘edit’ it or ‘delete’ it, because these were not explicitly specified by the user. That’s pretty silly because what the user meant by ‘add’ was really ‘manage’, and that implies that three bits of functionality (add, edit and delete) are all available. They are a ‘unit’; they only make sense together.

If, instead of focusing on the literalness of the user, the analyst had understood that the system itself was going to be the master repository for this newly collected entity, then it would have been more than obvious what functionality was necessary. The work to create the requirements and the screen was superfluous, since both were already well-defined by the existing territorial boundaries (the screen didn’t even match the existing interface conventions). A single new requirement to properly manage a new data entity was all that should have been necessary. Nothing more. The specification would then be completely derived from this and the existing conventions, either explicitly by an interface designer or implicitly by the programmer who would need to look up the current screen conventions in the code (and hopefully reuse most of it).
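To make the ‘manage’ unit concrete, here is a minimal sketch in Python. The EntityManager class, its method names and the in-memory dictionary are all hypothetical, chosen only for illustration; a real system would follow its own existing conventions and storage. The point is simply that add, edit and delete arrive together from a single requirement.

```python
from typing import Dict, Generic, TypeVar

T = TypeVar("T")


class EntityManager(Generic[T]):
    """Hypothetical in-memory store that offers add, edit and delete as one unit."""

    def __init__(self) -> None:
        self._records: Dict[int, T] = {}
        self._next_id = 1

    def add(self, record: T) -> int:
        """Store a new record and return the id assigned to it."""
        record_id = self._next_id
        self._records[record_id] = record
        self._next_id += 1
        return record_id

    def edit(self, record_id: int, record: T) -> None:
        """Replace an existing record; raises KeyError if the id is unknown."""
        if record_id not in self._records:
            raise KeyError(record_id)
        self._records[record_id] = record

    def delete(self, record_id: int) -> None:
        """Remove an existing record; raises KeyError if the id is unknown."""
        del self._records[record_id]

    def get(self, record_id: int) -> T:
        """Return the record stored under the given id."""
        return self._records[record_id]


# The user saves a typo, then fixes it, because 'edit' exists alongside 'add'.
names: EntityManager[str] = EntityManager()
saved_id = names.add("Jonh Smith")
names.edit(saved_id, "John Smith")
```

Because the class is generic, any newly collected entity gets the same three operations without a new screen-by-screen requirement, which is roughly what reusing the existing conventions would look like in code.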

It is important to understand that territorial requirements are a lot less work, as well as being less vague. You need only list out the data, the computations, and for the interface: the navigation. In some cases you might also have to list out hard outputs like specific reports (because there is limited flexibility in how they appear or their digital formats). With this information and performance and operational requirements, the designers can go about finding efficient ways to organize and lay out the details for the programmers.
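As a rough illustration of how little a territorial requirement needs to contain, the sketch below records one as a small Python data structure. The TerritoryRequirement name, its fields and the example entries are assumptions made up for this sketch, not a standard format. It also shows that the add, edit and delete trio can be derived from the entity list rather than written out path by path.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TerritoryRequirement:
    """Hypothetical record of a territory: data, computations, navigation, hard outputs."""

    entities: List[str]
    computations: List[str] = field(default_factory=list)
    navigation: List[str] = field(default_factory=list)
    hard_outputs: List[str] = field(default_factory=list)

    def implied_functionality(self) -> List[str]:
        """Every managed entity implies add, edit and delete as a unit."""
        actions = ("add", "edit", "delete")
        return [f"{action} {entity}" for entity in self.entities for action in actions]


# Illustrative entries only; the real lists come out of analysis.
billing = TerritoryRequirement(
    entities=["customer", "invoice"],
    computations=["invoice total"],
    navigation=["customer list -> customer detail -> invoices"],
    hard_outputs=["monthly statement report"],
)

for item in billing.implied_functionality():
    print(item)
```

Two entities yield six pieces of functionality without anyone having to ask for each one separately.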

While the boundary described by the requirements needs to be reasonably close to 100% (although it can be abstract), the actual depth of the specifications is entirely dependent on the abilities of the programming teams. Younger, less experienced programmers need more depth in the specifications to prevent them from going rogue. Battle-scarred seniors might only need the requirements themselves. Keep in mind that unwanted ‘creativity’ makes for hideous interfaces, convoluted navigation and brutal operational problems, as well as being a huge resource drain. A programmer that creates a whole new unique sub-system within an existing one is really just pouring fuel on the fire. It will only annoy the users and waste resources, even if it initially looks like it will be faster to code. The resulting disorganization is deadly, so it's best to not let it take hold. A programmer that goes rogue when there is an existing specification is far easier to manage than if there is nothing. Thus specifications are often vital to keep a group of programmers working nicely together, to keep them all building one integrated system instead of just a pile of disconnected code.

The two initial stages can be described in many different ways, but they are best understood as decomposition and recomposition. That is, analysis is decomposing the problem into its underlying details. The most efficient way of doing this ensures that the parts of the territory are not overlapping, or just expressing the same things in different ways. Recomposition is the opposite. All of the pieces are put back together again, as a design that ensures that the minimal amount of effort is needed to organize and complete the work. Stated that way, it is obvious that effective designs will heavily leverage reuse because it takes the least amount of overall work. Massive redundancies introduced via brute force will prevent entanglement, but they do it by trading it for significant future problems. For any non-trivial system, that rapidly becomes the dominant roadblock.

An unfortunate cultural problem in programming is to continually push all decisions back to the users. Many programmers feel that it is not their responsibility to interpret or correct the incoming requirements and specifications. Somehow the idea that the user champions can correctly visualize the elements of a large system has become rooted. They certainly do know what they want the program to do, but they know this as a massive collection of different, independent path requirements, and often that set in their head isn’t fully resolved or complete and might even be contradictory. Solutions are indeed built for the users, but the work needs to progress reasonably. Building off a territory means the development teams can appropriately arrange the pieces to get constructed in the most efficient manner. Building off a stream of paths most often means that each is handled independently, at a huge work multiplier. And no organization can get applied.

In that sense, giving control of the development process directly to a user champion will not result in anything close to an efficient use of resources; rather, the incoming chaos percolates throughout the code base. There might be some rare champion that does have the abilities to visualize the territorial aspects of the requirements, but even then the specifications still need to be created.

Analysis and design are different, although related, skill sets that need to exist and can likely be independently measured. For example, if there is significant scope creep, the analysis is failing. If there are plenty of integration problems, it is the specification. The first is that the necessary details were never known, while the second is that they were never organized well enough that the independent tasks were synchronized. In fact, categorizing bugs and using them to identify and fix overall process problems is the best way to capitalize on testing and operational feedback. The code needs to be fixed, but the process is often weak as well.
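One way to picture this kind of categorization is the small Python sketch below. The category labels and the mapping from category to stage are assumptions made for illustration, following the pairing above of scope creep with analysis and integration problems with specification; a real project would pick its own labels.

```python
from collections import Counter
from typing import Iterable

# Hypothetical mapping from a bug's category to the stage it points back to.
STAGE_FOR_CATEGORY = {
    "scope_change": "analysis",      # necessary details were never known
    "integration": "specification",  # independent tasks were never synchronized
    "coding_error": "coding",        # the code itself was simply wrong
}


def tally_by_stage(bug_categories: Iterable[str]) -> Counter:
    """Count bugs per development stage; unknown categories land in 'unclassified'."""
    tally = Counter()
    for category in bug_categories:
        tally[STAGE_FOR_CATEGORY.get(category, "unclassified")] += 1
    return tally


# Illustrative input only, not real project data.
reported = ["scope_change", "integration", "scope_change", "coding_error", "scope_change"]
stage_counts = tally_by_stage(reported)
weakest_stage, bug_count = stage_counts.most_common(1)[0]
print(f"Most bugs trace back to {weakest_stage} ({bug_count} of {len(reported)})")
```

A tally like this says nothing about how to fix an individual bug, but a lopsided count is a hint about which stage of the process deserves attention first.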

In a very real sense, it is entirely possible to walk backwards from a bug, to the code, to the specifications and then to the requirements, to see if the flow of work has serious problems. There will always be minor hiccups, but in really badly run projects you see rather consistent patterns, such as massive redundancies. These can always be unwound by ensuring that the different stages of development fit together properly. Specifications, or the lack of them, sit in the middle, so they provide an early indicator of success.

It’s also worth noting that some projects are small enough or straightforward enough that they don’t really need to actually produce the specifications. The requirements should be recorded, but more as a means for knowing the direction that is driven by the users. If organization exists and the new requirements are just filling in familiar territory, then the code itself is enough to specify the next round of work. That’s why it is not uncommon on medium-sized programs to see senior developers jump straight from conversations with the users to actual working code. Every detail that is necessary is already well-known, so given the lack of resources, the documentation portion is never done. That does work well when the developer is an organized, methodical person, and when any eventual replacement is someone who can actually read code (the only existing specification), but it fails really badly if those two necessary conditions don’t exist. Some code handovers go smoothly, some are major disasters.

Sometimes people use shifting territories as a reason to avoid analysis and specification. That is, because the territory itself isn’t even locked down, everything should be ad hoc, or experimental. This is most common with startups that are likely to pivot at some point. The fallacy with this is that the pivots most often do not shift entirely away from the starting territory. That is, a significant aspect of the business changes, but not the technologies nor the required base infrastructure. And in most cases the shift itself doesn’t remove data entities, it just adds new ones that are higher in priority. So, a great deal of the technical base is still intact. If it wasn’t, then the only rational thing to do would be to drop 100% of the previous effort, but that necessity is actually quite rare. In that sense, territories expand and contract throughout the life of any development. Seeing and adjusting to that is important in effectively using the available resources, but it is an issue that is independent of analysis and design. No matter how the territories change, they still need to be decomposed, organized and then recomposed in order to move forward. The work is always constant, even if it is sporadically spread across the effort.

Building anything practical is always a byproduct of some form of analysis and design. As the scale of the attempt increases, the need for rigour and organization becomes increasingly bound to quality. If we set our sights on creating sophisticated software solutions that really make life easy for everyone, we need to understand how to properly set up these prerequisites to ensure that this happens as desired.