Tuesday, January 20, 2015

Scale and Organization

Scale is a fundamental factor within organization. Let's start with a simple example.

Say you have 3 books. If you wanted to organize them, all you would need to do is put them on a shelf. Any order is fine; it doesn't take long for your eyes to glance over the spines, so you can easily find the book you want.

Now if that set of books grows to 30, the shelf probably still works, but you may consider reordering the books based on title or author, or perhaps even some simple category system. 30 books is a lot to scan each time you need something, so to save time finding what you want, you might try ordering by title.

If the number of books grows again to 300, then a whole new set of problems is introduced. You have to abandon the shelf for a bookcase. It's a less convenient piece of furniture that takes up its own space in a room, so its location is probably not as handy as the shelf's. Order also becomes more significant. You may have chosen a lexical order on the titles originally, but now with a full bookcase you find that you really want books by the same author to be together. With 300 entries, you might have several authors with multiple titles. Still, a sort by author makes it harder to find a specific title, so perhaps you prepare a little list that cross-references the titles with their appropriate shelf in the bookcase. It's a looser indexing scheme, but it saves time later and doesn't take that long to prepare.
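The little cross-reference list can be sketched in a few lines of Python. This is only an illustration of the idea, not a real cataloguing system: the sample books, the `per_shelf` capacity and the function names are all assumptions made up for the example.

```python
# Hypothetical sample data: (title, author) pairs chosen for illustration.
books = [
    ("Dune", "Frank Herbert"),
    ("Emma", "Jane Austen"),
    ("Persuasion", "Jane Austen"),
    ("Children of Dune", "Frank Herbert"),
]

def shelve_by_author(books, per_shelf=2):
    """Sort by author (then title) and assign shelf numbers starting at 1.

    per_shelf is an assumed shelf capacity, kept tiny so the example is readable.
    """
    ordered = sorted(books, key=lambda book: (book[1], book[0]))
    return [
        (title, author, position // per_shelf + 1)
        for position, (title, author) in enumerate(ordered)
    ]

def build_title_index(shelved):
    """The 'little list': maps each title to the shelf it sits on."""
    return {title: shelf for title, _author, shelf in shelved}

shelved = shelve_by_author(books)
title_index = build_title_index(shelved)
print(title_index["Emma"])  # look up a title without scanning every spine
```

The physical order serves one question (which books are by this author?) while the list serves another (where is this title?). The price is that the list has to be rebuilt or patched every time a book is added or moved.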

But alas, the number of books keeps growing and now we get to 3,000. There are now 10 bookcases, and because of that some of them no longer fit in the same office. There are several locations to check and way too much work to keep the secondary paper index up to date. Instead, you recategorize the books by topic and assign a small number of topics to each bookcase. Within each case you can sort by author or return to the original title order. That suffices for the moment.

But still it grows and soon enough the number hits 30,000 books. With 100 bookcases the collection now requires its own 'library' and although the books are categorized, there are so many of them that you need multiple indexing methods. The bookshelves have a special code, and you have indices on title, author and a few major sub-categories. You employ a full-time librarian to keep the categorization up to date and to constantly roam the bookcases reordering and putting out-of-place books back where they belong.

The pace continues, so the number of books reaches 300,000. They've outgrown the original library, so there are now three separate locations, each of which specializes in specific sub-categories. Within each, there are two full-time librarians, and the work to keep it all tidy and in order has grown considerably. Finding any specific book is tougher because there are three separate indexing schemes, so there is a global project to consolidate those into three copies of a single larger index. That work, plus the advancements in categorization needed for a much larger number of sub-categories and sub-sub-categories, is keeping everyone busy.

Now for this example, one of the key features is that the growth of books is exponential. It keeps increasing by a factor of 10. At each one of those upticks, the previous organization basically failed and an entirely new scheme was necessary. Also, at each tick the amount of work changed radically. It started out as a personal collection and finished with a team of 6 people. Exponential growth does this quite quickly, but any sort of growth will eventually reach the next tick; it just takes time. The key to organization is that growth changes it. The larger things get, the more effort is required to keep them organized, to keep them usable. Growth inherently increases disorganization.

This same pattern is true in software development as well. As the code grows, the organization of it needs to keep pace with its new size. A small project needs very little organization, while for a massive project the organization is a full-time effort in itself. For everything in between there are discrete ticks that have very different organizational requirements. The best way to tell if things are organized is similar to knowing if the books are organized. You examine how long it takes to find something specific given a number of different scenarios. For books you might look at how long it takes to find a specific book or to browse a sub-category. For code, it's how long it takes to identify the code that produces a specific behaviour, which coincidentally is the same as being able to find and fix a bug (the behaviour is just unwanted in the latter case). If the time has become excessive, then new means and/or layers of organization are necessary to ensure that it doesn't become worse. In an organizational sense, if it takes a good programmer a month to identify the code behind a given bug, then either the system is massive, it's overly complex or it's poorly organized. Most often it is the last case.

In software, the problem also applies to the data. It requires its own organization, although often it can be indexed automatically by the code and managed by creating proper tools for data administration. As the data grows, new ways are found to explore it, so a never-ending set of new indices is necessary. It is not enough to just lock it into some static structure, create one ordering and forget about it. Handling growth requires responding dynamically to each new level of scale. This ongoing problem, and the increasing ease with which we can collect data, remain great challenges for software development and operations.
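As a rough sketch of what responding dynamically might look like in code, here is a small in-memory catalog where a new index can be bolted on after the data already exists. The `Catalog` class, its method names and the sample records are all hypothetical, invented for this illustration; a real system would more likely lean on a database's own indexing.

```python
from collections import defaultdict

class Catalog:
    """Records plus any number of named indices that can be added later."""

    def __init__(self):
        self.records = []
        self.key_funcs = {}  # index name -> function that extracts the key
        self.indices = {}    # index name -> {key: [matching records]}

    def add_index(self, name, key_func):
        """Register a new index and build it over the records already stored."""
        self.key_funcs[name] = key_func
        index = defaultdict(list)
        for record in self.records:
            index[key_func(record)].append(record)
        self.indices[name] = index

    def add(self, record):
        """Store a record and keep every existing index up to date."""
        self.records.append(record)
        for name, key_func in self.key_funcs.items():
            self.indices[name][key_func(record)].append(record)

    def find(self, name, key):
        """Return the records filed under key in the named index."""
        return list(self.indices[name].get(key, []))

catalog = Catalog()
catalog.add_index("author", lambda r: r["author"])
catalog.add({"title": "Emma", "author": "Jane Austen", "year": 1815})
catalog.add({"title": "Dune", "author": "Frank Herbert", "year": 1965})

# A new question shows up later, so a new index is added after the fact.
catalog.add_index("decade", lambda r: r["year"] // 10 * 10)
titles_1960s = [r["title"] for r in catalog.find("decade", 1960)]
print(titles_1960s)
```

The point of the sketch is that the ordering is not baked into the storage. Each index is just another view over the same records, and every one of them adds a little work to each insert, which is the software version of hiring the librarian.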

Organization is the underlying key to making many things 'usable'. A large collection of books has less value if you struggle to find the book you need. The problem inherent in building up a collection is more than just getting new books; it is keeping them organized in a fashion where they retain their value. That this problem occurs twice in software should really be no surprise. Code and data form the twin axes on which software operates. Each has its own unique set of problems, so the organizational issues differ, but they span both. In a rather abstract way, organizing things is the crucial science that underpins our intelligence. We are smart not because of what we might know, but rather because of our ability to apply that knowledge in the world around us.