Monday, October 29, 2007

Repeated Expressions

I'll apologize in advance. This blog entry gets a little weird, but it is the type of conceptual weirdness that comes directly from the underlying problems.

You may have to squint your eyes a bit to get a really fuzzy abstract view of reality in order to understand the basis, but I think that you'll find it is worth the effort.

Jumping right in, we start with the notion that users create piles of information that they beat on with their software tools.

That's not too hard to handle, but if you dig a bit deeper into the idea of how to "express" the tools themselves things can get interesting.

In a very abstract sense, a software design specification for a system defines a bunch of functionality operating on a set of data. Generally it is very imprecise, but it is a way of actually "expressing" a solution in one or more problem domains. The specification, which consists of various design documents with specific syntax represents a "language" capable of defining the capabilities. While it doesn't run directly, and it is generally very sloppy, it does define the absolute boundaries for the problem.

The code itself, once it is created is another way to express the solution to the same problem. It is far more specific and needs to be more correct, although rarely does it actually need to be perfect. Different versions of the system, as they progress over time are different ways of expressing a similar solution. Generally, for each iteration the depth of solution improves, but all of the solutions implemented were acceptable.

Testing -- even though in a philosophical way it is the reverse -- bounds the same problem domain, so it to is yet another way of expressing the same solution. In pencil drawing courses, they often do exercises with "negative space" by drawing the shapes where the subject is not located. Specifying testing is similar; you bounding the problem by the things that it should not do. Testing occurs at many different levels often providing a great degree of overlap in the expressiveness of the solution. Generally layer upon layer of tests are applied, regardless of how effectively they work.

In an ideal world, for a perfectly defined system the specifications, code and tests are all just different, but complete ways of expressing the same solution.

Given that hard to fathom perspective, we essentially solve the same problem at least three times during development, in three different ways. This gives rise to various speculations about what work we really don't need.

For instance, Jack Reeves felt that the code was the design not the specification. That perspective could lead to spending less effort on the initial design; correcting the problems only at the coding level.

Newer practices such as test driven development (TDD) imply that the testing could be used as the design. You layout the functionality by matching it back towards the tests. These are interesting ideas indirectly arising from the repetitive nature of development.

One easily senses that if could we only expressed the design twice instead of three times, we would be saving ourselves labor. If we only expressed it once, that would be amazing. It would seem as if we might be able to save ourselves up to two thirds of the work.

It is an intriguing idea, but highly unlikely for any thing over a medium sized project.

Some individuals can get away with solidifying their designs initially entirely within their consciousness. We know this. There are many programmers out there that for medium to small systems don't need to write it down on paper. Interesting, but we need to remember that such instantiation is just another valid form of expressing the solution, even if the original programmer cannot necessarily share that internal knowledge and understanding.

Even the small systems need at least two distinct versions to exist.

For large projects, the work needs to be distributed over a big group of people so it needs to be documented in a way that is sharable. Without that, all of the developers will go their own way which will always create a mess. As well, all of the testers will make their own assumptions which again will leave huge gaps in the quality of the testing.

Without a common picture chaos will rule and the project will fail.

Even though it sounds like a waste, specifying the same design in three different ways actually helps to insure that the final output is more likely to be correct. In the same way that you could enhance the quality of data entry by having three typists input the same file, iterating out the problem as a design, code and tests gives three distinct answers which can all be used to find the most likely one.

Even though we probably can't reduce the number of times we express the solution, understanding that we are repeating ourselves can lead to more effective ways of handing it.

Ok, so it is not all that weird, but it does put a different perspective on the development tasks.

In the end, we are left with a pile of data, some functionality that needs to be applied to that data and a number of different ways in which we have to express this in order to insure that the final work meets its initial constraints. We put this all together, expressing it in various ways until in the end it becomes a concrete representation of our near idea solution to the specific set of given problems.

Call it what you will, it all comes down to the various ways in which we express our solutions to our problems.