Scale is a fundamental factor within organization. Let's start with a simple example.

Say you have 3 books. If you wanted to organize them, all you would need
to do is put them on a shelf. Any order is fine: it doesn't take long
for your eyes to glance over the spines, so you can find the book you
want.

Now if that set of books grows to 30, the shelf probably still works,
but you may consider reordering the books by title or author, or perhaps
by some simple category system. 30 books is a lot to scan each time you
need something, so to save time finding what you want, you might try
ordering by title.

If the number of books grows again to 300, then a whole new set of
problems is introduced. You have to abandon the shelf for a bookcase.
It's a less convenient piece of furniture that essentially takes up its
own space in a room, so its location is probably not as handy as the
shelf. Order also becomes more significant. You may have chosen a
lexical order on the titles originally, but now with a full bookcase you
find that you really want books by the same author to be together. With
300 entries, you might have several authors with multiple titles.
Still, a sort by author makes it harder to find a specific title, so
perhaps you prepare a little list that cross-references the titles with
their appropriate shelf in the bookcase. It's a looser indexing scheme,
but it saves time later and doesn't take that long to prepare.
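
As a rough sketch of that little list, assume the bookcase is sorted by author and each shelf holds a fixed number of books. The sample titles, the `SHELF_CAPACITY` value and the `find_shelf` helper are all made up for illustration; the point is only that the physical order serves one lookup while a small secondary index serves the other.

```python
# Hypothetical sample data: (title, author). A real shelf holds far more
# than two books; the small capacity just keeps the example readable.
SHELF_CAPACITY = 2

books = [
    ("Dune", "Herbert, Frank"),
    ("Emma", "Austen, Jane"),
    ("Children of Dune", "Herbert, Frank"),
    ("Persuasion", "Austen, Jane"),
]

# Primary order: the physical arrangement, sorted by author, then title,
# so books by the same author sit together.
bookcase = sorted(books, key=lambda book: (book[1], book[0]))

# Secondary index: the "little list" that maps each title to its shelf.
# Shelves are numbered from 1, filled in the bookcase order above.
title_index = {
    title: position // SHELF_CAPACITY + 1
    for position, (title, _author) in enumerate(bookcase)
}


def find_shelf(title):
    """Return the shelf number for a title, or None if it is not listed."""
    return title_index.get(title)


print(find_shelf("Dune"))
```

The catch is the same one the paper list has: every time a book is added or moved, the index has to be rebuilt or patched, which is cheap at 300 books and not so cheap later.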

But alas, the number of books keeps growing and now we get to 3,000.
There are now 10 bookcases, and because of that some of them no longer
fit in the same office. There are several locations to check and way too
much work to keep the secondary paper index up to date. Instead, you
recategorize the books based on topics, and you assign a small number
of topics to each bookcase. Within the case you can sort by author or
return to the original title order. That suffices for the moment.

But still it grows and soon enough the number hits 30,000 books. With 100
bookcases the collection now requires its own 'library' and although the
books are categorized, there are so many of them that you need multiple
indexing methods. The bookshelves have a special code, and you have
indices on title, author and a few major sub-categories. You employ a
full-time librarian to keep the categorization up to date and to
constantly roam the bookcases reordering and putting out-of-place books
back where they belong.

The pace continues, and the number of books reaches 300,000. They've
outgrown the original library, so there are now three separate
locations, each of which specializes in specific sub-categories. Within
each, there are two full-time librarians, and the work to keep it all
tidy and in
order has grown considerably. Finding any specific book is tougher
because there are three separate indexing schemes, so there is a global
project to consolidate those into three copies of a single larger index.
That work, plus the advancements in categorization, since there are now
a much larger number of subcategories and subsubcategories, are keeping
everyone busy.

Now for this example, one of the key features is that the growth of
books is exponential: the collection keeps increasing by a factor of 10.
At each one of those upticks, the previous organization basically failed
and an entirely new scheme was necessary. Also, at each tick the amount
of work changed radically. It started out as a personal collection and
ended up needing a team of 6 people. Exponential growth does this quite
quickly, but any sort of growth can reach the next tick; it just takes
time. The key point about organization is that growth changes it. The
larger things get, the more effort is required to keep them organized
and usable. Growth inherently increases disorganization.
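
To put some rough numbers on the ticks, here is a small sketch. The starting size of 3 and the factor of 10 come from the example above; the steady rate of 300 books a year is purely an assumption for comparison, and both function names are made up.

```python
import math


def ticks_crossed(n_books, start=3, factor=10):
    """Count how many tenfold ticks separate `start` books from `n_books`."""
    ticks, size = 0, start
    while size * factor <= n_books:
        size *= factor
        ticks += 1
    return ticks


def years_of_linear_growth(target, start=3, per_year=300):
    """Years needed to reach `target` when adding a fixed `per_year` books.

    The default rate of 300 books a year is a hypothetical figure.
    """
    return math.ceil((target - start) / per_year)


# For each collection size in the example, show the tick it sits on and
# how long steady (non-exponential) growth would take to get there.
for target in (30, 300, 3_000, 30_000, 300_000):
    print(target, ticks_crossed(target), years_of_linear_growth(target))
```

Under those assumptions the steady collection still hits every tick; it just spends far longer between them, which leaves more time to notice that the old scheme has stopped working.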

This same pattern is true in software development as well. As the code
grows, the organization of it needs to keep pace with its new size. A
small project needs very little organization, while for a massive
project the organization is a full-time effort in itself. For everything
in between
there are discrete ticks that have very different organizational
requirements. The best way to tell if things are organized is similar to
knowing if the books are organized. You examine how long it takes to
find something specific given a number of different scenarios. For books
you might look at how long it takes to find a specific book or to be
able to browse a subcategory. For code, it's how long it takes to
identify the code that produces a specific behaviour, which coincidentally
is the same as being able to find and fix a bug (the behaviour is just
unwanted in the latter case). If the time has become excessive, then new
means and/or layers of organization are necessary to ensure that it
doesn't become worse. In an organizational sense, if it takes a good
programmer a month to identify the code behind a given bug, then either
the system is massive, it's overly complex or it's poorly organized.
Most often it is the last case.

In software, the problem also applies to the data. It requires its own
organization, although often the data can be indexed automatically by
the code and by creating proper tools for data administration. As the
data grows, new ways are found to explore it, so a never-ending series
of new indices is necessary. It is not enough to just lock it into some
static structure, create one ordering and forget about it. Handling
growth requires responding dynamically to each new level of scale. This
ongoing problem and the increasing ease with which we can collect data
remain great challenges for software development and operations.
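
One way to picture indices arriving over time is a helper that builds a new one whenever a new question comes up. This is only a sketch: the `orders` records, their fields and the `build_index` function are hypothetical, and a real system would normally lean on its database for this rather than in-memory dictionaries.

```python
from collections import defaultdict


def build_index(records, key):
    """Group records by key(record) so a lookup avoids scanning them all."""
    index = defaultdict(list)
    for record in records:
        index[key(record)].append(record)
    return index


# Hypothetical records; the field names are illustrative only.
orders = [
    {"id": 1, "customer": "acme", "year": 2020},
    {"id": 2, "customer": "zenith", "year": 2021},
    {"id": 3, "customer": "acme", "year": 2021},
]

# The first question people asked of the data: orders per customer.
by_customer = build_index(orders, key=lambda r: r["customer"])

# Later a new way of exploring the same data appears, so a new index
# is added next to the old one instead of replacing it.
by_year = build_index(orders, key=lambda r: r["year"])

print([r["id"] for r in by_customer["acme"]])
print([r["id"] for r in by_year[2021]])
```

Each extra index speeds up one kind of lookup but is one more thing to keep in step with the data, which is the librarian's job all over again.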

Organization is the underlying key to making many things 'usable'. A
large collection of books has less value if you struggle to find the
book you need. The problem inherent in building up a collection is more
than just getting new books; rather, it is keeping them organized in a
fashion where they retain their value. That this problem occurs twice in
software should really be of no surprise. Code and data form the twin
axes on which software operates. Each has its own unique set of
problems, so the organizational issues differ, but they span both. In a
rather abstract way, organizing things is the crucial science that
underpins our intelligence. We are smart not because of what we might
know, but rather because of our ability to apply that knowledge in the
world around us.