Tuesday, January 20, 2015

Scale and Organization

Scale is a fundamental factor in organization. Let's start with a simple example.

Say you have 3 books. If you wanted to organize them, all you need to do is put them on a shelf. Any order is fine; it takes only a moment to glance over the spines and find the book you want.

Now if that set of books grows to 30, the shelf probably still works, but you may consider reordering the books by title, author, or perhaps some simple category system. 30 books is a lot to scan each time you need something, so to save time finding what you want, you might try ordering by title.

If the number of books grows again to 300, then a whole new set of problems is introduced. You have to abandon the shelf for a bookcase. It's a less convenient piece of furniture that essentially takes up a full space in a room, so its location is probably not as handy as the shelf's. Order also becomes more significant. You may have chosen a lexical order on the titles originally, but now with a full bookcase you find that you really want books by the same author to be together. With 300 entries, you might have several authors with multiple titles. Still, a sort by author makes it harder to find a specific title, so perhaps you prepare a little list that cross-references the titles with their appropriate shelf in the bookcase. It's a looser indexing scheme, but it saves time later and doesn't take long to prepare.

But alas, the number of books keeps growing and now we get to 3,000. There are now 10 bookcases, and because of that some of them no longer fit in the same office. There are several locations to check and far too much work to keep the secondary paper index up-to-date. Instead, you recategorize the books based on topics, and you assign a small number of topics to each bookcase. Within each case you can sort by author or return to the original title order. That suffices for the moment.

But still it grows and soon enough the number hits 30,000 books. With 100 bookcases the collection now requires its own 'library' and although the books are categorized, there are so many of them that you need multiple indexing methods. The bookshelves have a special code, and you have indices on title, author and a few major sub-categories. You employ a full-time librarian to keep the categorization up to date and to constantly roam the bookcases reordering and putting out-of-place books back where they belong.

The pace continues, and the number of books reaches 300,000. They've outgrown the original library; there are now three separate locations, each of which specializes in specific sub-categories. Within each, there are two full-time librarians, and the work to keep it all tidy and in order has grown considerably. Finding any specific book is tougher because there are three separate indexing schemes, so there is a global project to consolidate those into three copies of a single larger index. That work, plus the advances in categorization (there are now far more subcategories and sub-subcategories), keeps everyone busy.

Now for this example, one of the key features is that the growth of books is exponential. It keeps increasing tenfold. For each of those up-ticks, the previous organization basically failed and an entirely new scheme was necessary. At each tick the amount of work also changed radically: it started out as a personal collection and finished with a team of six people. Exponential growth does this quite quickly, but any sort of growth eventually approaches the next tick; it just takes time. The key point is that growth changes organization. The larger things get, the more effort is required to keep them organized, to keep them usable. Growth inherently increases disorganization.

This same pattern holds in software development as well. As the code grows, its organization needs to keep pace with its new size. A small project needs very little organization, while in a massive project the organization is a full-time effort in itself. For everything in between there are discrete ticks with very different organizational requirements. The best way to tell if things are organized is much like knowing whether the books are organized: you measure how long it takes to find something specific across a number of different scenarios. For books, you might look at how long it takes to find a specific book or to browse a subcategory. For code, it's how long it takes to identify the code that produces a specific behaviour, which coincidentally is the same as being able to find and fix a bug (the behaviour is just unwanted in the latter case). If that time has become excessive, then new means and/or layers of organization are necessary to ensure it doesn't get worse. In an organizational sense, if it takes a good programmer a month to identify the code behind a given bug, then either the system is massive, it's overly complex, or it's poorly organized. Most often it is the last case.

In software, the problem also applies to the data. It requires its own organization, although often that can be handled automatically by indexing it via code and building proper tools for data administration. As the data grows, new ways are found to explore it, so a never-ending stream of new indices is necessary. It is not enough to just lock it into some static structure, create one ordering and forget about it. Handling growth requires responding dynamically to each new level of scale. This ongoing problem, and the increasing ease with which we can collect data, remain great challenges for software development and operations.
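The idea of layering multiple indices over the same growing collection, rather than committing to one static ordering, can be sketched in a few lines. This is only an illustration; the `IndexedCollection` class, its field names and the sample records are all hypothetical:

```python
# Hypothetical sketch: several indices maintained over the same records,
# so lookups by different keys stay fast as the collection grows.
from collections import defaultdict

class IndexedCollection:
    def __init__(self, keys):
        self.keys = keys                    # fields to index, e.g. ("title", "author")
        self.records = []
        self.indices = {k: defaultdict(list) for k in keys}

    def add(self, record):
        self.records.append(record)
        for k in self.keys:                 # every index is updated on insert
            self.indices[k][record[k]].append(record)

    def find(self, key, value):
        return self.indices[key].get(value, [])

books = IndexedCollection(("title", "author"))
books.add({"title": "Dune", "author": "Herbert"})
books.add({"title": "Hyperion", "author": "Simmons"})
books.add({"title": "Endymion", "author": "Simmons"})

print([b["title"] for b in books.find("author", "Simmons")])  # → ['Hyperion', 'Endymion']
```

Adding a new way to explore the data is then a matter of adding another key, rather than restructuring the collection itself.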

Organization is the underlying key to making many things 'usable'. A large collection of books has less value if you struggle to find the book you need. The problem inherent in building up a collection is more than just acquiring new books; it is keeping them organized in a fashion where they retain their value. That this problem occurs twice in software should really be no surprise. Code and data form the twin axes on which software operates. Each has its own unique set of problems, so the organizational issues differ, but span both. In a rather abstract way, organizing things is the crucial science that underpins our intelligence. We are smart not because of what we might know, but because of our ability to apply that knowledge to the world around us.

Sunday, January 11, 2015

Reactive vs. Proactive

An increasingly common way to build software is in response to users bringing their current problems to the developers. This user-driven approach is believed by many to ensure that what is being built matches the users' needs and prevents it from heading off in potentially unsuitable directions. The system gets built step-by-step as a direct 'reaction' to the users. Since by definition most users need their current problems solved right away, time is usually the single most critical issue.

Reactive development approaches have been popular for decades, mostly as an alternative to the failings of big, slow, long-running projects. When the scope of a project bloats heavily, the various forces involved can accidentally send it off on an unreasonable course. Because the time scales were so long, any misdirection could take a long time to detect and then cost a lot to correct. The reactive idea was that a much larger number of smaller changes driven 'directly' by the users would ensure that the users get exactly what they specified. In a very real sense, that's what reactive development achieves.

The problem with reactive development is that most of the control over the development is now outside of the software development team. For a simple bit of code with few architectural needs, this user-driven approach has a good chance of succeeding. A fairly verbose user with a clear vision of how to really solve their own problems can articulate the interface and data required, leaving the programmers to fill in the blanks. This works so long as the bulk of the programming is primarily business related. It is, however, a slow process, and the resulting code is disorganized and redundant.

Generally, experts in a specific user domain frame their thinking relative to their own knowledge, so they are unlikely to see problem decomposition in the same manner that programmers have learned to as they developed their skills. Better decompositions in programming tend towards abstraction and generalization with fewer special cases, while most domain experts tend towards the opposite: they learn to be specific and focus on one case at a time. This alternative perspective does not fit well within software's mathematical foundations, so the inevitable result is a significant increase in the artificial complexity of the code, driven directly by the user's specifications.

This type of decomposition problem is hardly noticeable in small or even some medium-sized systems, but as scope increases it begins to dominate the technical debt.

On top of that, because of the scheduling, the fastest approach to adding new features is to tack them onto the outside of the existing code. This avoids the extra step of having to understand what is already there. Continued use of this approach means that the code base loses any and all upper-level organization, becoming an increasingly large ball of mud. Constantly reacting to user needs also kills any ability to re-organize -- refactor -- the code, so once this type of development approach gets set in motion, there is generally no turning back.

At some point, if the system keeps growing in this manner, it becomes large enough and complex enough to cross a threshold where the redundancy, lack of architecture, time pressures and inconsistent problem decompositions drive the quality down so far that more time is spent patching the mess than adding new features. This is the reactive version of a death march, where the development team just marches around in circles until somebody finally pulls the plug.

Writing code that doesn't solve user problems is by definition a waste of time, but assembling odd bits of code in a user-driven manner isn't actually better. Reacting to things is essentially the opposite of 'engineering'. The latter seeks to construct something that behaves precisely according to the builder's understanding. The former just randomly assembles stuff driven by an outside force. It lacks organization and thought, and often its full range of behaviour is undefined.

Users, by definition, are rarely engineers, so they won't choose to focus on solving the necessary engineering problems that come up constantly in large development projects. They just ignore them and focus on the problems they can solve. But the solutions they need also need to be encased in a properly engineered system. Both parts of the puzzle are absolutely necessary to avoid creating bad systems. Users are the most important source of domain-specific requirements, but that's where their expertise begins and ends. Software developers are the experts in the technical domain, which includes both the technical programming and the process used to develop the system. They should know how to solve technical problems, and they should also understand how to arrange large amounts of work to be completed effectively. Users can't help with either problem.

Reactive approaches aren't the only way to build software; there are plenty of others. One that is particularly effective is to actively seek out 'solvable' user problems. In this circumstance, the technology is well understood first, and the developers are just looking for ways to apply it to help the users in their roles. Since this sort of development is driven both by the capabilities of the technology and the needs of the users, it has an increased likelihood of better matching the technology to the issues.

Being proactive means that there is considerable work done first to establish a base for handling the user issues. The initial code doesn't solve problems; rather, it sets up the organization necessary to be able to do that in the future. It is not unlike having to lay a foundation first in a building, so that the apartment units can later be built on something reliable. That 'pay now' and 'receive a benefit later' quality scares a lot of people with lingering memories of defective waterfall projects, but in this case it is very different. The old waterfall projects aimed to complete the entire system in one massive development cycle. A proactive approach, on the other hand, aims to construct usable Lego-like bits of technology first, so that they can be employed quickly later. Its focus is on setting the stage for reuse, without committing to a final direction.

A way that I've handled this in the past was to build up a strong base platform that deals with the necessary system requirements, such as data persistence, locking, caching, users, etc. On top of this I've added a domain specific language (DSL) to allow the users to fine-tune their own domain or business logic. The DSL essentially runs in a sandbox, so that whatever the users do, good or bad, cannot interfere with how the bulk of the system operates. This separates out the purely technical problems from the domain ones and ensures that they don't co-mingle later in unpredictable ways. The downside is that for a very long time the system is under development but, from a user's perspective, does absolutely nothing. The upside is that as the project proceeds, instead of slowing down, it starts to speed up. Once the foundations get established, the user functionality flows quickly, and if the architecture is smart it becomes increasingly easier for the users to reconfigure their logic to meet unexpected changes.

This approach can be taken very far down that road. One goal I've had in the past is to minimize the total amount of code necessary for any interface screen. Most interface code is hugely redundant. Reusing it over and over again saves a massive amount of time, but it also helps to achieve consistency within the interface. Thus it would be extremely convenient to be able to specify only the differences between screens in as few as a couple of hundred lines of code. It takes a considerable amount of thinking and some inspiration to achieve this goal, but once it has been completed it makes any additions or changes to the screens trivial.
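Specifying only the differences between screens usually amounts to merging a per-screen specification over shared defaults. A minimal sketch, with entirely hypothetical screen fields and names:

```python
# Hypothetical sketch: each screen is defined only by its differences
# from a shared base specification; the framework fills in the rest.
BASE_SCREEN = {
    "layout": "two-column",
    "toolbar": ["save", "cancel"],
    "pagination": 50,
}

def make_screen(**overrides):
    screen = dict(BASE_SCREEN)    # shallow copy of the shared defaults
    screen.update(overrides)      # the screen's few lines of differences
    return screen

invoice_list = make_screen(title="Invoices", columns=["number", "date", "total"])
audit_log = make_screen(title="Audit Log", toolbar=[], pagination=200)

print(invoice_list["layout"])   # inherited default: two-column
print(audit_log["pagination"])  # overridden: 200
```

Because every screen routes through the same defaults, a change to `BASE_SCREEN` propagates everywhere at once, which is what makes large interface changes cheap and keeps the screens consistent.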

On one project, we decided the current interface was completely wrong, so we entirely rewrote it within a couple of weeks. That type of flexibility may seem excessive, but what usually happens with large interfaces is that changes are so expensive that the interface gradually bloats and gets convoluted as the work progresses. It becomes impossible for the users to navigate. Being able to avoid that fate, because it was proactively understood that it would occur, allows the design to be finessed properly as the work continues. If some major misunderstanding occurred in the way the screens were originally structured, it is no longer an expensive problem to correct.

In fact, it is exactly this type of flexibility that gets lost with a reactive approach. The code is built to solve very specific instances of a problem, but within most domains, problems recur repeatedly in slightly different forms. And many of these domain problems share linkages to underlying common technical ones, particularly as the scope increases. A proactive approach seeks to build up a large number of reusable pieces and then apply them to the solution, which opens up the flexibility to easily rearrange them later. The cost of making the parts reusable is paid for by the savings of not having the code statically welded into place.

Although a proactive approach requires more initial work, it is still a piecewise approach. That is, it can be done in a series of iterations, and these can be influenced by the user requirements. It's not quite the 1:1 responsiveness that defines reactive approaches, but the direction is still driven by the users. The difference is that there will be times in the development cycle where the technical or reuse requirements trump the user ones, and although a quick hack could be done immediately, the road travelled will be a bit longer. This of course is subject to the politics of software development, and maintaining that balance is a key factor. Even in a well-run proactive development project, sometimes reacting is required to maintain confidence, although it is cleaned up immediately afterwards.

From experience, the best analogy I've found for applying a proactive approach is with Lego blocks. The idea behind the development is to continuously assemble larger and larger Lego blocks, gradually building up a collection that can solve any and all of the user issues. The blocks should be general enough that they can be used all over the place, but specific enough that the underlying problems aren't just blindly transferred to the configuration. Each block fully encapsulates a set of problems. A big project has a large number of blocks of varying sizes, and these themselves need some higher level of organization. It takes a while to get the first set of blocks out, but once they exist, extending their functionality gets easier. As time progresses, if the work is organized, solving new problems gets faster because the existing blocks provide a vocabulary of expression at an increasingly higher level. The blocks quite literally converge on the nouns and verbs that exist in the users' own descriptions of their problems. That becomes a convenient check both that the development direction is correct and that the advanced business logic really decomposes properly on the users' side.

The project is then rooted in the low-level technical issues that build up a foundation, but gradually progresses to higher and higher level domain problems. As it grows, the capabilities of the system extend to handle more sophisticated issues. It's a top-down perspective that drives the bottom-up development.

Reacting to the users concedes all control to outside forces, forcing the developers to march through the work one case at a time. It is the least effective method of building systems and is unlikely to produce quality output. The developers are just constantly chasing the ball. Getting ahead of that ball means that the developers can choose to employ smarter and more reusable approaches to their work, in anticipation of the upcoming needs of the users. That forward perspective is what is ultimately necessary to have the time to properly engineer a system. Without it, the users may get what they've asked for, but they will definitely not get what they wanted, or even what they need.