Sunday, February 21, 2010

Data, Data, Data

In one of my early jobs, almost twenty years back, I worked on a team that spent years building a very sophisticated cache. It was a UNIX-based, multi-process system built for speed and fault tolerance.

For each instance, it featured eight 'writer' processes combining their efforts to minimize the work of grabbing masses of data from a humongous but slow database. Their results were restructured and then stored in a huge shared memory segment that was accessible by eight more 'reader' processes responsible for responding to data requests.

The code was written in straight, simple ANSI C, and we had no access to any underlying libraries, so we had to write all of our own mechanics: memory management, resource management, locking, process monitoring, logging, etc. It took well over a year to get the initial design up and running, but we also built in a system-level automated testing component, so the code was extremely robust (it ran for years in production over several releases, and had only one known bug).

Once we got the system up and running, I did the obvious thing and went back over it to see where we could eke out some better performance.

We had started with a full design, carefully segmented the project into pieces and then built each piece consistently, to very rigid standards. We were a small but dedicated team. We researched all our algorithms, and used the best practices for building 'systems level' code within the architecture. For the most part, the code was clean, easy to read, well-structured and very consistent. Still, even the best works always have some aspect that could have been better.

As I started to go through the code, I concentrated on the way data was percolating throughout the system.

For instance, a request would come in, move from the initial parsing into a structure that was then placed in a queue in shared memory. When available, a number of writers would start up, each with a minimized fragment of the query, and they would hit the database (it was highly parallel).

The responses from the database would then be reformatted, broken up if necessary, and the various sections of shared memory updated. When complete, the last writer would signal any appropriate waiting readers. From there the readers would grab their data, package it for transport and then send it out to a waiting client. A basic straight-forward multi-process read-only caching architecture.

As I dug, I realized that the requests and responses moving through the system were passing through a large number of different components. We had broken the system down into well-defined pieces to make its development and our task allocation easier to manage, but this breakdown had come with consequences. Each component, being written with an appropriate level of caution, carefully checked and copied its incoming data to ensure that it was valid and correct. Each time the cost was small, and the resulting code was more robust, allowing it to detect problems quickly, report them, abort and then force us to fix them while still in development.

The approach was solid, and certainly for the amount of code we wrote, being that consistent and tight with error handling had been a big reason why we were on time and on budget. But, we were losing a considerable amount of time to this excessive copying and checking of data. In one case I remember seeing 6 or 7 memcpys on large buffers as the code moved from component to component. In each component, each memcpy seemed reasonable, but the sum of all of them was not optimal for performance.

Of course, we are taught not to worry about performance until later. So, from a development perspective, given our results the project was a raging success. Still, after my analysis, when I had finally cut down on all the excessive copies and had moved much of the checking into 'debug mode', we picked up a 5% to 10% boost, which given our equipment and operational specs was necessary. At the time I couldn't help wondering whether, with some small changes to our development approach, we might have been able to avoid this problem in the first place.
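
Looking back, the fix itself was mechanically simple. As a minimal sketch of the idea in Java terms (the original was ANSI C, and all of the names here are invented for illustration): the components share a reference to a single buffer instead of each taking a defensive copy, and the validation only runs when assertions are enabled, the equivalent of our 'debug mode'.

    // Hypothetical sketch: one shared buffer, checked only in 'debug mode'.
    // Run with 'java -ea' to enable the assertions; without it, the
    // validation cost vanishes, much like an #ifdef DEBUG block in C.
    public class RequestBuffer {
        private final byte[] data;   // a single copy, shared by every component

        public RequestBuffer(byte[] data) {
            this.data = data;        // keep the reference; no defensive copy
            assert isValid() : "malformed request buffer";
        }

        private boolean isValid() {
            return data != null && data.length > 0;   // stand-in for real checks
        }

        public byte[] contents() {
            assert isValid();        // re-checked at each hand-off, in debug only
            return data;             // handed over directly; no memcpy equivalent
        }
    }

The production path gives up some of its paranoia, but the checks are still there during development, which is where they caught nearly all of our problems anyway.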


DATA-STRUCTURES

From my university days, I was taught the importance of data structures and how we should use them as the basic building blocks on which to form the rest of the system. This approach, sometimes referred to as ADTs (abstract data types), was the father of Object Oriented programming. Before OO, it was up to the programmer's discretion how to structure their code in an organized manner. The data-structure approach proved so powerful that the next evolution was for it to become embedded directly into our computer languages, and thus OO was born.

Still, the point of many of these earlier approaches was to help programmers find the simplest and most effective means of encoding their systems to run on the computer.

If you think of a system as a large set of functionality, that overall perspective can be quite complex. It can also lead to programmers thinking that the best and fastest way to get new functionality is to copy existing functionality and edit it with the changes. That, it turns out, is the heart and soul of a very effective technique to create masses of really bad spaghetti code. It may seem simple and fast at first, but it accrues so much technical debt, so quickly, that it isn't long before it hopelessly swamps the project and ends in failure. Of course, the earlier programmers had lived through this, and figured out ways around it.

If instead of being hyper-focused on the code, the programmer considers the system as just the data moving around within it, the overall system drops astronomically in complexity. In this perspective, the code is secondary, and really only responsible for getting the data from point A to point B with a bit of reformatting along the way. Functionality isn't a way of manipulating data, rather it is what transforms it from some raw state into something more usable for the users.

The 'print' mode in a document editor, for instance, takes the data from the raw, internal cut-buffer data-structure and then pretties it up in a way that is convenient for a printer to output. There are lots of different ways of making it pretty, so there are lots of little choices for the user. If you consider all of the 'functionality' available in the printing sub-system it is huge, but if you see it all as only some conversions between these two types of data-structures, it is really not that complex at all.
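
To make that concrete, here is a minimal, hypothetical sketch in Java (the names and the trivial pagination rule are invented): the entire 'print' feature is just a conversion from one data-structure to another, with the user's options as small parameters of that conversion.

    // Hypothetical sketch: printing as a conversion between two data-structures.
    import java.util.ArrayList;
    import java.util.List;

    record DocumentText(List<String> lines) {}   // the raw, internal form
    record PrintLayout(List<String> pages) {}    // the printer-friendly form

    class PrintConverter {
        // All of the printing 'functionality' lives in one transformation;
        // linesPerPage stands in for the user's many little choices.
        static PrintLayout layout(DocumentText doc, int linesPerPage) {
            List<String> pages = new ArrayList<>();
            StringBuilder page = new StringBuilder();
            int count = 0;
            for (String line : doc.lines()) {
                page.append(line).append('\n');
                if (++count == linesPerPage) {
                    pages.add(page.toString());
                    page.setLength(0);
                    count = 0;
                }
            }
            if (count > 0) pages.add(page.toString());
            return new PrintLayout(pages);
        }
    }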

While Object Oriented was an attempt to enshrine these concepts directly into the languages, so that the programmers would have to utilize them, most development still fails to approach this correctly. Most programmers still seem steadfastly focused on the 'code' they are going to write, only thinking about the data later. Code, code, code. And that's why, quite quickly as the work progresses, they keep heading into bigger and bigger messes.


WHAT WORKS?

Oddly, one of the most successful technological implementations I've ever seen for getting the focus right was a COBOL derivative. I can't remember the name of the language/system, it was so long ago, but programming in it was simple.

You specified what you wanted from the database. You specified how that was changed as it went to the screen. You specified what each of the different keys on the screen did (if it wasn't default). And then you specified the validation and restructuring of the data as it went back into the database. Simple and straight-forward.

In truth, I hated it because it was so orderly, simple and easy to do. I felt more like a typist than a computer programmer when I was building systems. Still, it and so many of those other older tools are no doubt the reason why so much of the world's crucial data relies on mainframe technologies and not some of these newer, cooler, hipper (but undependable) technologies. Too much freedom may seem fun initially; right up until you start missing deadlines and everyone is screaming.

In many ways you can see the results of the 'functionality' based approach right away in most big systems. All you need to do is understand the breadth of the underlying data, and compare it back to the interface.

So many products are disconnected masses of randomly placed functionality that have way more to do with laziness and history than with making them easily accessible to the users, or logically consistent. You quickly get lost in these stupid, senseless mazes of drop-down menus, redundant screens and little clickable thingies that have gradually built up over time. Related functionality is split across many different screens. It is organized (badly) by barely related functionality and the composition of programming teams, when it would have been far more accessible if it was organized by the way the system outputs or manipulates its data.

For example, you'd like all of the related controls for manipulating the printed output to be grouped together by the final manipulations produced, not by how the programmers wanted to cut and paste the code. Not by which DLLs or libraries were installed. Not by which driver was active.

A trade-off must always be made between the convenience of the programmers and the convenience of the users. Most developers, these days, make the wrong choice. It is extremely rare to find well-thought-out interfaces. And in some cases, the first generation might have managed to do it well, but were quickly followed by a less gifted one.


A NEW PERSPECTIVE

There is a cure. Things are bad in programming these days, but it doesn't have to be this way. Certainly our elders learned their lessons and through ideas like Object Oriented they tried to pass them on to the newer generations. It failed, but that doesn't mean we can't redeem ourselves.

Mostly, the single largest and most successful change programmers can make is in their perspective. If they give in, and stop fighting against the underlying data-oriented concepts like objects, they can quickly learn to build faster, better code that isn't so hard to maintain. It comes simply from putting the data first.

The first thing software developers need to assemble is some kind of model of the data that will be in the system. Long before they start asking the users questions about what they are doing, they need to know what is in the underlying data, how frequently it occurs, its lifespan, its quality, and its structure. All the cool functionality in the world is no good unless it has the right data to work on. The first pillar of every system is whether or not it contains the necessary data.

And it's not only the user's data that is important. It is also the system admin's data and the operational data too. There is always more to a system than just the key functionality required, and so often all of these contributing parts get skipped over until later. So, it's the data in the persistent storage, but also the data in the config files and any other parameters, that all need to be addressed in a similar and consistent manner.

The thing I learned most from my earlier example with the cache is that it is important to minimize the way the data flows through the system. Having access to the data is just a first step.

It is also important to minimize the parsing and re-assembling of the data in various different subsystems. The data coming into the system should be broken down into its primitive pieces as early as possible. And it should not be reassembled until as late as possible. Both of these rules of thumb help keep the data in its most usable state, and cut down on unnecessary manipulations.

It should not be copied into different components. In fact, in an OO system, it should be wrapped initially with one and only one object. Different data objects might be related and grouped together at a higher level, but since each piece of data is immediately broken down into its most primitive components, it should exist in only one form in the system, in only one instance.

As the data flows through the code, similarities should be exploited. That is, if the system supports jpeg, png and gif images, while each may have its own object type if necessary, there should be an overall 'image' category object as well. The higher up you go, the more you should exploit polymorphism, and the less code is required.

But with less code, there is also less debugging. That is, if you have four functions that each use their own code, you need to test four times. If you have four functions that all share the same basic code, testing one essentially tests the others. When you first write the code you should probably test all four, but later on, for small fixes and updates the 'impact' of the changes can often be assessed by only testing one. The long-run savings in 'impact' testing can be enormous if the architecture is well-structured.
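
A small, hypothetical Java sketch of that structure (the formats and the brightness calculation are just placeholders): the shared code sits in the 'image' category object, so testing it once effectively covers jpeg, png and gif alike.

    // Hypothetical sketch: a category object above the individual formats.
    abstract class Image {
        abstract int[] pixels();     // each format supplies its own decoding

        // Shared code lives at the category level; one test of this method
        // covers every format that inherits it.
        final double averageBrightness() {
            int[] p = pixels();
            long sum = 0;
            for (int v : p) sum += v;
            return (double) sum / p.length;
        }
    }

    class JpegImage extends Image {
        int[] pixels() { return new int[] { 12, 40, 200 }; }  // stand-in decode
    }

    class PngImage extends Image {
        int[] pixels() { return new int[] { 5, 5, 5 }; }      // stand-in decode
    }

    class GifImage extends Image {
        int[] pixels() { return new int[] { 255, 0, 128 }; }  // stand-in decode
    }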

Once you have the data in the system, and it is managed by a minimal amount of common infrastructure you can start having fun by thinking about how to manipulate it.


FLEXIBILITY AND CHANGES

Because it helps in addressing their issues, users will generally require some redundant screens, and slightly inconsistent handling. It is the nature of people. We're messy and we need messy interfaces.

The trick is to confine those inconsistencies and duplications to the least amount of code possible. If you can contain it, minimize it and then make it easy to read and change, it becomes relatively simple to re-arrange the different aspects of the interface to suit the particularities of the users (it would be nicer if the users could do this themselves).

Interfaces that are locked in stone, or take massive amounts of similar work to create, tend to become broken very quickly. It was one of the few weaknesses with the COBOL systems I discussed above. There is way too much work going into too many inflexible screens. Changes -- which are constant and inevitable -- become problematic and costly. The accrued technical debt becomes an impossible weight, and development grinds to a halt.

Not surprisingly, Object Oriented concepts are well suited towards visual interfaces. This is where they really shine. Interfaces really are just ways of laying out and collecting various bits of data on a screen. The same ideas that apply to getting data from a database also apply to laying out data in an interface. They're just different angles on the same problem. If you map each and every thing you see on a screen to a unique object, then all screen definitions are just some methods for constructing larger structures.

Sure the frameworks and toolkits do this with panels and widgets, but most programmers stop there and then just start belting out long lines of ugly redundant construction code.

The idea is not to make one big bloated super-object for the entire application, but to continue the object composition upwards.

That is, there should be a type of object for everything that is visible on the screen, including the various sections containing underlying widgets and all of the different pieces. If someone points to it and refers to it as something, then that something should be an object of that type. Getting that visual-to-object map simple and consistent makes it far easier to construct or rearrange objects, and thus screens. It makes it easier to debug as well. It exploits the power of the paradigm.

This technique works best if you also map the visual containers (like lists, tables, sections, etc.) to generic objects, with minimal instance information. In this way you can build up screens from larger and larger consistent structures, and one change will fix all of the different instances on all of the different screens. Instead of fixing all of the screens with user information, you fix the user information object, and it is updated on all of the screens.
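
As a hedged sketch of what that looks like in Java (every class name here is invented): the user-information block is one object, the containers are generic, and every screen is just composition, so one fix to the user-information object repairs every screen that uses it.

    // Hypothetical sketch: screens built by composing visual objects upwards.
    import java.util.List;

    interface Visual {
        String render();                       // stand-in for real drawing
    }

    class UserInfoPanel implements Visual {    // one object, used on many screens
        private final String name;
        UserInfoPanel(String name) { this.name = name; }
        public String render() { return "[user: " + name + "]"; }
    }

    class Section implements Visual {          // generic container, minimal instance data
        private final String title;
        private final List<Visual> children;
        Section(String title, List<Visual> children) {
            this.title = title;
            this.children = children;
        }
        public String render() {
            StringBuilder out = new StringBuilder(title + "\n");
            for (Visual v : children) out.append("  ").append(v.render()).append('\n');
            return out.toString();
        }
    }

    class Screens {
        // Two different screens compose the same user-information object;
        // fixing UserInfoPanel fixes both screens at once.
        static Visual account(String user) {
            return new Section("Account", List.of(new UserInfoPanel(user)));
        }
        static Visual billing(String user) {
            return new Section("Billing", List.of(new UserInfoPanel(user)));
        }
    }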

As well as interfaces, most big systems will have some large, specialized set of functionality that is not presentation related. That is, most systems contain some type of processing engine. Generally, these are well-defined enough that they can be fully isolated from the rest of the system. Engines can and should stand on their own.

Here is one of the few places where checking the input, at least in some type of debug mode, is a really good and valuable idea. When engines go wrong, their depth makes them difficult to diagnose, so a fail loud and fast policy tends to pay big dividends in diagnosing escaped bugs.

One interesting thing about engines is that underneath they are driven by their algorithms. In stark contrast to the rest of the system, here is a place where the programmer should be far more concerned with the code than with the data.

The input to an engine is usually all of the necessary data, while the output is either a set of errors, or the finished data structures. There is no interface. Building an engine with a fine-grained object approach tends to lead towards distributing the behavior of the code all over the engine. But in this type of purely computational programming, a bigger object that is based around an algorithm, and contains a large number of well thought out, small primitive operations, is generally a more readable approach, precisely because it keeps all of the related code in the same proximity.
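
A minimal, hypothetical sketch of that shape (the pricing arithmetic is a trivial stand-in for a real algorithm): one engine object, the algorithm at the top, the small primitive operations kept in close proximity underneath, and a loud debug-mode check on the input.

    // Hypothetical sketch: an algorithm-centric engine object.
    class PricingEngine {
        // Input is all of the necessary data; output is the finished result.
        double run(double[] amounts, double taxRate) {
            assert amounts != null && taxRate >= 0.0 : "bad engine input";  // fail loud and fast
            return addTax(total(amounts), taxRate);
        }

        // The small, well thought out primitive operations, kept together:
        private double total(double[] amounts) {
            double sum = 0.0;
            for (double a : amounts) sum += a;
            return sum;
        }

        private double addTax(double subtotal, double rate) {
            return subtotal * (1.0 + rate);
        }
    }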

This is an important point in programming, because while it is nice to have hard and fast rules for the development, and consistency is vital, there are always times where an exception can be the more appropriate answer. Strong consistency in 80% of the code is generally what I am shooting for, but you can't sacrifice other useful attributes like readability just to get to 100%. It's all about making the right trade-offs.

On the other hand, the consistent approach should always be tried or investigated first. Most of the time it works, so most of the time it is a safe bet. Ignore hunches.


PATTERNS, LIBRARIES AND OTHER ISSUES

When design patterns first came along, I thought they were excellent and would be really useful. That was, until I started seeing programmers use them as de facto building blocks and naming their objects after them, such as *Factory or *Facade.

If your perspective is on the data, it is easy to see that patterns are code related. They're really the opposite of data-structures (code-structures?). Because of this, if you use them as building blocks, they become effective tools for obfuscating the real underlying flow of the code. They were meant to be a starting point for coding; just an initial template, not a building block.

Most sophisticated objects require multiple overlapping patterns. By separating them, and raising them to the status of building block, they just become more noise to hide what the underlying code is really doing. Does knowing that an object is a Singleton change how you'd use it? It shouldn't. So you don't need to know.

Patterns should be mixed and matched, and they should show up in the comments, with references, but they should never, never be in the object name space. With a data-oriented approach, patterns can be useful in helping to drive some consistency into the underlying implementations surrounding the data, but I guess because they are code-centric they are easy to abuse, and this has become standard practice.
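
A tiny, hypothetical example of the naming point: the class below is a Singleton in its implementation, but it is named for the data it manages, and the pattern is mentioned only in a comment.

    // Hypothetical sketch: the pattern stays in the comments, not the name.
    // Implementation note: this is a Singleton (see the GoF reference).
    final class ImageCache {
        private static final ImageCache INSTANCE = new ImageCache();

        private ImageCache() {}                       // no outside construction

        static ImageCache instance() { return INSTANCE; }

        // ...methods about cached images, which is all a caller needs to know
    }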

A data-oriented approach to libraries would be exceptionally useful, particularly if programmers could agree on consistent interfaces. After all, most misery in modern programming comes from interacting with all of these badly engineered, oddly interfaced libraries. Java is the worst language for that. The libraries are sporadic and messy, with horrible interfaces that seem bound to make them much harder to use.

For libraries, I only want two types. The first provides an encapsulation of some type of data: an image library, perhaps, that allows me to load and save images. The other type of library is for handling specific algorithms: say, something to run a Gaussian blur on my images.

I'd like them separated because sometimes I only want clean and simple access to the data, particularly if my intent is to add my own higher level algorithms.

If it were a matter of just matching libraries to specific categories of data, then choosing implementations and working with them would be a whole lot easier. These days, you often get these "half-baked" libraries that partially solve some limited aspect of the problem and handle only a fraction of the data. They are unusable on their own.

They often have large usage flexibility too, and you have to really wonder who the programmers thought would use all of this stuff. The coders were probably so concentrated on writing the code that they didn't give much thought to how people could or would use it. But even if you wanted to, it is not safe to depend on some minor functionality contained in a library. That's the type of thing that changes quickly between releases. Or it doesn't get updated. Either way, it is too much technical debt.

Often with many of the modern available libraries, it seems clear that the programmers were more concerned about the ease of their implementations, than about the ease of other coders using their works. It's a great recipe for an ugly interface.

This leads to a simple technique. Sometimes in programming, if I have some very complex interface to design, I'll start by writing the higher-level example code first. That is, I will write out the calling code in a simple and straight-forward manner as it 'should' be, then later I will back-fill the missing library code to make it match how it was called. It's a hugely valuable technique for getting clean and simple interfaces.
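
As a hedged illustration (everything here is invented): the calling code is written first, exactly the way it should read, and the library is back-filled afterwards, shaped entirely by that calling code.

    // Step 1, hypothetical: write the calling code as it 'should' be.
    class CallerFirst {
        void example() {
            Photo p = Photo.load("holiday.png");       // none of this existed yet
            Photo blurred = p.blur(2.5);
            blurred.save("holiday-blurred.png");
        }
    }

    // Step 2: back-fill the library to match how it was called.
    class Photo {
        static Photo load(String path) { return new Photo(); }  // stub for the sketch
        Photo blur(double radius) { return this; }               // stub
        void save(String path) { }                               // stub
    }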

A final issue always related to data is simplicity. Most programmers get into programming because they have a deep-down love for intricate complexity. They've always loved machinery, cogs, gears, that sort of thing. That is fine, but that is also their Achilles heel. That is, their love of 'intricate' leads them to build intricate code. A finely crafted watch is a thing of beauty. Code that is fine and intricate is also fragile, delicate and hard to explain. All attributes that are undesirable in programming.

The simplest, most straight forward code that does the job (so long as it is not COBOL :-) is the code that all programmers should be striving for.

That doesn't mean that you can't have large complex abstractions that generalize the mechanics and allow for lots of code to leverage off a little. That type of abstraction, particularly if it is within the data-level, is fine, and actually desirable. If you weigh it properly, 10,000 lines of an abstract engine for some large range of calculations easily beats 250,000 brute-forced lines, where each set of options is cut and pasted. Not only is the smaller, denser code more stable, it is also easier to test.

Simple is especially important when it comes down to user and admin behaviors. A good approach would probably be to write the documentation first, and then back-fill the code in later. Sophisticated code with complex algorithms is really hard to explain to users, and because of that, not particularly appreciated. If the users keep forgetting how the code works, they can't exploit it properly.

Simple code is really spartan code. That is, there is a minimum of variables, nothing is overloaded, and all of the code is broken into reasonably sized functions. Short, clean and with no extra effort. Fancy comments, or extra annotations, are just more things that need to be updated in the end, and should be avoided. What programmers need is the exact minimum amount of work that is needed to get the job done, and nothing else.

Spartan code is data-oriented code. It happens that way because the programmers are concentrating on the least number of variables, the least number of calls, and the least amount of duplication, and all of these things come easily if you follow how the data moves through the system, not how the code moves the data through it.


SUMMARY

Data, data, data. I can't say it enough, and even if I sound like a broken record it doesn't seem as if the message is getting out there.

This isn't really a new idea, it has been there all along. We just seem to have trouble as an industry in passing along our own domain knowledge to the next generations of programmers. We keep losing our understandings, and each new generation re-invents the same wheels with a small set of improvements, and some huge steps backwards.

It's tough because as the technologies become more complex, and until we get better abstractions on which to build, most development is sitting on the edge of a cliff, about to fall. You'd think that the industry would seek out methods of making coding more reliable, but instead we seem to want to do the opposite. More people write code, and more new code is coming into the markets, yet the overall quality has gone down.

Young programmers are too interested in writing code to care if what they are writing is reasonable. Many experienced programmers I know have just dropped out of any public discussions about their occupation. They see little value in many of the newer methodologies. And old programmers..., well, old programmers rarely exist. Long before most people have mastered coding, they have left it behind.

Still, with one simple change in perspective, even the largest, most daunting projects can become manageable. If you start out with the right foundations, and keep up with the technical debt, then most software development doesn't have to be a risky endeavor. Projects don't have to fail as often as they do. We don't have to put in as much effort as we have been.

Wednesday, February 17, 2010

Big and small experiences

For most of my career I have often ping-ponged between big and small companies. Large companies offer security, and they are generally slower, more relaxed environments, but I actually prefer working in smaller, even tiny organizations.

For one, you don't have to wait for people to do stuff. If it is going to get done, then you're probably the one to do it. Nothing is worse than having to wait, only to find out that the work done was shoddy. I'd rather just go at it myself, accepting that it may not be perfect, particularly if I've never done anything like it before, but at least it will get completed.

Process, or at least the simplicity of it, is another reason why small companies are great. Many people who have spent their whole lives buried in big institutions have some sort of strange mystical faith in their underlying process. The bigger and more overly complex it gets, the better they seem to like it. In a small company, you quickly learn to only do work if there is real value in it. If it is a waste of time, you drop it. In a big company, all sorts of people do all sorts of stupid make-work tasks, over and over again, as if somehow repetition will magically add value to their time-wasting. It's kinda creepy, in a sad way.

Communication. In a big company people can spend nearly forever getting around to saying what they are trying to say. Some might even pride themselves on an ability to take extra long to say as little as possible. In a small company, if you've got something to say, as Johnny Rotten screamed, "f*cking say it". You don't waste words, you don't beat around the bush.

Politics in big companies are horrid, stupid and insipid, with the various parties trying to outwit each other wherever and whenever possible. At best it's clever, but as an old friend loved to say, "monkeys are clever". And mostly it turns out that the ones who think they are best at these brain-dead sports are usually the ones most frequently tricked into thinking they've won more than just another bad deal. Politics in a small company is usually just friction between the personalities. Mostly, once it starts to get bad it resolves itself pretty quickly (usually by someone leaving).

Bullsh*t in all its shapes and sizes is a landscape feature in large companies. Big organizations obtain big mountains of it. This includes stuff like "mission statements", huggy-feely corporate messages, performance reviews, messages from the top and any other career or worldly advice that the robots in management are told that they should tell you in order to pump up your floundering morale and to help exploit more of your time. After all, if it comes down to a choice between you and some executroid's bonus, you don't stand a chance in hell, but otherwise they really "are" looking out for your best interests. At least in the little organizations, there is no time for dispensing this putrid stuff; it is a lot more honest.

Basically, while you are insulated from anything but your core job in a big company, it is also easy to become detached from the reality of what you are actually doing. You could spend your entire life hidden behind massive walls, only to realize that you've witnessed but a tiny fraction of the reality of whatever you are trying to accomplish. If you've ever wanted to try every related position, experience all the angles, and get down and dirty with a lot of different, but interesting jobs in your industry, then a small company is the place to be. Little companies can't afford specialists, so you are able to get your hands into more aspects of the operations. Little companies just don't have the same 'turf' problems that plague the big ones.

If you've stayed too long in corporate-land, it becomes nearly impossible to leave. Like an over-protected child getting out for the first time, many poor souls that grew too accustomed to their cushy lives have met crushing defeat when exposed to the real world. Most people are not cut out to survive a small company, it is just too raw for them.

Monday, February 15, 2010

Privacy Concerns

In this very public day and age, many of us have lost our sense of needing to protect our privacy. We voluntarily fill the net with way too many details of ourselves, possibly a byproduct of the craze for reality TV and the idea that everyone should get their fifteen minutes of fame.

Still, as harmless as it may seem, we have to remember the true importance of privacy.  It plays a crucial role in any society.

Everyone has an opinion. And for every possible opinion, at least one person holds it, no matter how unpopular it has become. That means that for every possible issue out there, there is a full range of possible thoughts, beliefs, phobias, myths and other base information, each getting wound into the public discourse.

In such a massive sea of differences, there is some popular or 'average' opinion, but a significant number of people deviate from this conformist view. In fact, to some degree we all 'deviate' from 'average' on at least one of our significant opinions; it is inevitable, and it is what makes us all unique.

And we are proud and happy to display our uniqueness. To not have to hide any non-conformist views from the world at large. We are who we are.

Like opinions, mankind always has the full range of personality types, some good, some not so good. One common personality type is the 'control freak', that is, a person who feels most comfortable when they are able to control the actions and behaviors of others. To lesser or greater degrees, it is a very common personality type, and in many circumstances these types of people lead good and useful lives. There are always situations where someone needs to take charge, and they need to be explicit about it. Control freaks do well in these circumstances.

Still, there are some that have not found such useful outlets for their impulses, and sometimes these people make it into positions of great or influential power. It is there, from these heights, that we generally run into problems. Sometimes it is just an individual, but often history lets a group align to do something dangerous.

Control freaks with too much power inevitably start some move towards consistency. That is, they choose some set of values, some opinions, some beliefs, and then they deem those to be superior and correct. From there, what they want most is to "help" people into seeing the world "correctly"; at least their version of correct. This is where the trouble starts, and this is where it gets dangerous.

My all-time favorite example of a period like this is "McCarthyism" in the fifties, in the United States. McCarthy was a senator, with a known drinking problem, but a pretty good orator. To keep getting elected, he became the mouthpiece for a group of rather nasty people, whose main concern at the time was communism. Now it is true that there was great tension between the US and Russia, and that there was great tension between Communists and Capitalists, as movements, but McCarthyism used that backdrop as a pretense to "weed out" undesirables. That is, for any man or woman who held, expressed, or even momentarily doubted, the wrong opinion of communism, the witch hunts had begun. And they were bent on, and quite successful at, ruining people's lives. It was a mean, ugly and entirely useless period.

So, it wasn't a period about facts or loyalties, but rather a witch hunt for deviant opinions. Exactly what a group of control freaks would devise in order to "cleanse" the world of dangerous or undesirable people.

Far from being an anomaly, these types of collective restructuring efforts are carried on regularly within all societies, at all levels. There are always control freaks, finding their way into positions where they think that for the common good they can impose their views on others. This happens constantly.

Privacy is the only shield, when these types of circumstances rear their ugly head. That is, if some section of your society wakes up one day and decides that they don't like your specific view on some issue, you'll need all the privacy you can get to get away from these people. You can't fight them, and you can't win. Time eventually turns on them, but until that happens they are immune to logic and reason.

And don't forget, each and every one of us has this vulnerability on at least one of our opinions. That is, control freaks from the "right" might take power, or control freaks from the "left", or even control freaks who think that people who don't side with either, or don't vote, are really the danger. There is no safe haven, there are no safe opinions, and even 'no' opinions are unsafe.

Someday, at some time, someone coming into power will have a beef with what you believe, and for some time you will need something safe to hide behind. It is the way it works, and the way it will always work. Privacy is a valuable resource that allows our society to function correctly.

Sunday, February 7, 2010

The Generation of Complexity

"Everything should be made as simple as possible, but not simpler."  -- Albert Einstein

When I was younger, I had a tendency to view history as being a smooth transition between the major events of mankind. Things progressed, we evolved and gradually we got to the point we are at today, with our knowledge and our technologies. All in a nice neat, smooth line.

However, as you delve deeper and deeper into the past, you start to realize that the chaos of modern times has always been the way things have worked with people. Far from being a clean and orderly transition, history is a boiling cauldron of great leaps forward, and horrific leaps back. Societies live and die, often they come crashing down from their great heights, or slowly sink into a stupor. Knowledge is found, bent and then forgotten along the way. It is a twisted path, one that is even deliberately erased or altered from time to time, making it that much harder to understand.

It makes sense, after all, since history is all about the interactions of people with each other and with the elements of nature around them. The disorder we see in the world today is the disorder that has always been there. We're no different from people two thousand years ago, other than we have an expanded set of knowledge, and we have access to far more technological gadgets. The same intellect and curiosity have existed in our species for a long time now.

Far from being 'old news', history is the driving force behind everything we know, and why and how we know it. Alternatives in people, places and times would have left us with similar, but significantly altered understandings.

All of our cities, states, and countries owe their names and definitions to the turmoil of history. Our sciences and engineering were all driven by personalities and events, great and small. Progress, or the lack of it, can be traced to the periods in time when great people swayed their populations, their societies or their contemporaries into achievement or silence.

And our fundamentals, those core pieces of knowledge on which all else rests, are most often named for the people or organizations that created, inspired or dominated them. Even our technological weaknesses bear strongly on the underlying processes and conventions in our world.

That is, everything we know, in any discipline, comes to us from an arbitrary series of events. Someone discovers something. Other people extend it. Successes or failures occur, and we change our underlying assumptions. All in all, it is driven first and foremost by the character and personalities of those initially involved, and shaped, time and time again, by those who stand on their shoulders.

A completely different group of humans might still reach the same conclusions, but ultimately the different names and personalities would leave an indelible mark on their works, and thus our knowledge.

Another factor that is equally important is that we can only build on what we understand. Great leaps happen, but they are few and far between. In the meantime, the rest of humanity absorbs the knowledge, gradually bending it to make it of practical value. Still, each and every fragment of understanding we have depends heavily on a multitude of preceding fragments, and the events and personalities along the way that helped shape them.

Had Albert Einstein been around two thousand years ago, he would not have been able to conceive of the Special Theory of Relativity precisely because all of the groundwork for his understanding had not yet been laid.

Most people know and accept that, but few seem to really understand how that places a layer of arbitrariness over our intellectual endeavors, our processes, our institutions, our knowledge, and our lives. Had the people, personalities or organizations been different, then the history that we know and are familiar with would have also been radically different.


THE SUM OF ITS HISTORY

One of the greatest pleasures of building software is that it allows the developers a rare chance to dig into other people's problems. That is, we build tools for people to use, and to do so well, we must not only understand the tool, we must also understand what they are trying to do with it. We must understand their domain.

To get this knowledge, we need to dig deeply into the roots of their problems, characterize their efforts and then find useful ways to map that back to some automated, or semi-automated computer software. Without this prerequisite knowledge, it is unlikely that the tools will help significantly, and often they can actually become impediments, making the problems worse.

Some developers make the mistake of looking down from their perches, staying far away from their users. Generally, the result of this is overly simplified, or even highly complex solutions that do not fit well to the actual problems. Tools that are awkward or painful to use.

We've known for a long time that the best, most useful tools have been driven by developers in close proximity to their users. To really help, one has to really understand and to really understand, one has to have a deep although not necessarily complete knowledge of what they are trying to automate.

It is in this relentless digging that software developers are often exposed to the real, ugly underbelly of their target domains. That is, we have to see the arbitrary messiness of it all, and then try to lay some structure on top in order to bring the problem down into something that is manageable, both by the computer and by the development team itself.

Software is notorious for frequently changing, but those changes are more often the result of mistakes in understanding by the original developers than they are shifts or changes in the underlying domain. That is, most of what is wrong about the software comes from a failure of the designers and programmers to really understand the domain. Or even sometimes, to understand which aspects of the domain are inherently flexible.

Most domains have been around and established for a long time; they have settled nicely into a set of practices and conventions. To the domain experts who specialize in these branches of knowledge, there is a certain consistency and structure to their work. To outside observers, such as developers with limited exposure, things may seem a little more haphazard. And that is where history comes crashing back into the software development process.

That is, the underlying complexity of a domain is built up gradually through history as a result of the personalities and organizations that have come and gone from the domain. History is the driving force behind the underlying complexity in the data, the process and all other aspects that define the domain.


TYPES OF COMPLEXITY

Complexity is a complicated beast. It is not so much a thing as it is the difference between two related things. That is, if you take two similar things, one simple, and one not, then the difference is pure complexity.

That is a relative definition of complexity; we could try for an absolute one, but for any metric we assign, numeric or otherwise, the meaning would essentially be arbitrary. If we choose some number X, then the difference between 15 and 245 in this complexity measure doesn't provide any useful understanding if we can't relate the numbers back to something tangible. And in that relationship, we might as well just stick with some relative difference, since 230 as a number is just as meaningless unless we understand the original two things we are comparing.

So when we think of complexity, we really need to think of how it can range from simple to massive in order to fully grasp what it means. It is how it changes that is important. Of course, 'simple' itself is not an easy concept either, as I pondered in:

http://theprogrammersparadox.blogspot.com/2007/12/nature-of-simple.html

Still, even with only a weak relative understanding of complexity, we can go forth and examine its two main underlying causes. They are:

- Strange Loops
- Volume

Complexity comes from either the inherent complexity embedded in some underlying idea, or from the sheer volume of simple stuff that is stacked together.

Strange loops, as explained by Douglas Hofstadter in GEB, are hard-to-understand, non-intuitive concepts like recursion, infinity, self-reference, etc. We see plenty of examples, ranging in their 'hardness'. For example, some people find mathematics generally confusing. Certainly, most people find the bending of time and space confusing, to say nothing of the way things work in quantum mechanics. Chaos theory, fractals, etc. are all complex concepts that take some effort to understand. However, once grokked, we can look back and see them as simple. But for the new and uninitiated they can be huge mountains to climb.

Volume speaks for itself. If you have some mass of information, not terribly hard, it still takes a significant effort to work through it all, and remember it. Complexity need not be difficult, it can just be related to size. The sum total of all of the civil law, as rules and regulations for a region, might, for example, be a massive volume of cases and histories all intertwining within themselves. Generally, the underlying cases are just about facts and events, finished with some sort of overall ruling, but still, the legal world can be a complex and painful labyrinth to navigate. What is mostly simple in pieces, can quickly build up to be overwhelming.

And of course, anyone familiar with working in a large bureaucracy would also know the pain of trying to get anything to change. A vast mountain of meaningless process and rules built up over an extended period can lead to a nearly impossibly rigid and static mess.

We see examples of people caught in David and Goliath battles with bureaucracies all of the time. Observers are often left wondering how they could have ever gotten built up so badly. But once you've become a cog in the machine, the tar pit makes considerably more sense, the longer you hang around.

And, mostly in these weighty organizations, it is how the different personalities and politics play out that keep things from progressing; that keep them from changing. Those that complain the loudest, are often the hardest to budge in their own little corners.

Volume-related complexity gets into the system for a few reasons:

- Intrinsic complexity.
- Accumulated through a long history.
- People making the problems more complex than necessary.

Some things have an inherent underlying complexity to them. Fractals are a great example in that at each level the patterns may appear simple, but the self-similarity that spans all of the levels is inherently complex. It took a long time for people to start to understand them, and it will take a long time before they have been fully integrated into our overall knowledge base.

I've already talked a lot about how all things are a sum of their histories, and how even small differences in the histories may account for large shifts in knowledge. Most complexity comes in from a long process of getting built up over time. Each piece may be simple in its own right, but the sheer scale and volume getting built up can quickly become unmanageable.

As well, people often have an inherent desire or need to over-complicate things. Somehow it seems to manifest as insecurity about appearing smart, and as a result, many people overdo the required effort to make themselves look or feel better.

Strangely, their results rarely fool other people, but the consequences usually last essentially forever. Once some complexity has become enshrined into a system, there is very little that can be done to remove, refactor, or replace it.


EXAMPLES OF COMPLEXITY

There are so many great examples of staggering complexity in our modern world that it is hard to know where to begin.

We find it easily in our modern lives. Our properties and possessions require effort. The first world owns more stuff per person, on average, than at any other time in history (I am guessing). Everything we own comes with some effort to learn how to utilize it, an expectation for time and some amount of on-going maintenance. The more we own, the more effort we accrue. A massive pile of stuff requires a massive amount of time. Either that, or it is just collecting dust in our basements.

For the middle class and above, our lives and our professions drive us to more and more interactions with various different types of professionals. From health-care specialists, financial help and advisers, property repair, contracting and career-related support, to purchasing both short and long term goods, we get a myriad of advice from an ongoing collection of professionals.

So much advice, so often, that it exceeds our abilities to follow it all correctly. It even exceeds our abilities to remember the bulk of it. It's just an ever-growing list of things that we should have done, from which we can only pick a small subset to complete, because that's how much time and effort we have left at the end of the day. We all fail to live up to our complete expectations; it has become the norm.

In the world around us, it is commonly understood that "ignorance of the law is no excuse", which is to say that our societies have a strong expectation that we will all know and obey all of our laws. However, most, if not all, modern societies have been adding new legalese to their books for so long, and in such volumes, that it is impossible for any single human to know all of the laws that apply to them.

That is, while we might have some general vague notion, it is unlikely that even the professionals can quote every major law in both the criminal and civil codes. There are simply too many rules. Still, we are held responsible for things we couldn't possibly know. Clearly a defect in our structure.

Earlier I talked about bureaucracies. They are mammoth organizations that have become so plagued by their own processes and rules that they are unable to break away from the status quo at anything other than a crawl. The length of their history only strengthens their problems. Bureaucracies are disasters by definition. In the past, when they were bad but smaller, ignoring them might have been an acceptable option, but now, as they threaten the sustainability of our societies, we are quickly reaching a point where action is necessary. Where we cannot allow them to continue on in their broken fashion.

Even our own languages, and the way we communicate with them, can be affected by complexity. Who hasn't read examples of writing that tries too hard to impress by littering the text with long sentences and complex terms? Truly convoluted writing is a masterpiece in complexity.

There are a huge number of examples of this type of banter, many coming from deep within the academic circles. It is not a surprise given that most academic settings are harsh and highly competitive environments. A lot of people want to be the smartest people around, and many of them are willing to do anything to get those honors.

A related example is how restaurant dishwashers often joke about being "ceramic maintenance engineers". A wonderfully obtuse term whose example is sadly followed in many serious branches of sciences.

Of course, it is not just our lives and the way we organize ourselves that are affected. All of our knowledge, from the soft sciences right down to mathematics itself, is plagued with overcomplexity.

Huge structures like bridges can be overly complex, built to withstand unrealistic challenges. Still, this is one of the few areas where overly cautious and overly complex works are not necessarily bad things, in that we don't know how long things should last. An over-engineered bridge, while costly, is far better than an under-engineered one. Perhaps that's why we have so many? Of course, when we build them but don't need them, they start to impact whether or not our societies are sustainable. Big projects require big maintenance, which requires big money.

Some of our softer sciences are more convoluted than they are real. It is easy to fake a "scientific" method, and then start writing up technical, complex mumbo jumbo that literally means nothing, or draws invalid conclusions. The media is awash with questionable "scientific results" constantly because their findings are usually shocking, which makes for a good news story (regardless of whether or not it is true). Readers want "exciting" and the media is always willing to provide that.

In some cases, such as economics, there is some real strength to the underlying theories, but in practice, a rosy prediction of the future is far more likely to impress, than a truthful one. The actual practice of science is bent towards more short-term objectives, like making a living.

We've seen this often as well in the financial sector, where most recently billions were made by selling complex financial instruments (CDSs) that were based on ridiculously bad mathematical foundations. Stuff that was obviously wrong, but people willingly put faith in the embedded complexity, assuming that it was correct because they didn't understand it and it made them money.

A couple of very surprising examples of complexity come from unexpected places.

My view of mathematics was as branches of pure, untouchable abstractions, theories, and formulas that can be applied to help us understand the world around us. Still, I've dipped into a couple of these branches where the resulting work is exceedingly obfuscated, to the point where it seems intended to obscure the underlying ideas. The work seems like exercises in excessive symbol manipulation.

As in every other field, there are enough mathematicians of average ability out there who really want to be seen as standing above their peers, so they resort to trying to spice up their works a bit.

Mathematics consists of many formal systems that are analogous to computer software programs in such a way that the authors of mathematical papers can produce spaghetti definitions, spaghetti theorems, and spaghetti formulas in the same way that a programmer can produce spaghetti code. That is, they can create things that are orders of magnitude more convoluted and complex than is truly necessary. Calling something spaghetti code doesn't mean that it won't work properly, but it does mean that understanding it is hard and changing it is likely fraught with difficulties.

Nature only recently (midway through last century) revealed the structure and shape of its complexities to us in the forms of fractals and chaos theory.  We can see the self-similarity of the underlying bits reflect themselves over and over again as we go in and out of the detail of many simple things like trees, mountains, clouds, and forests. In chaos theory, we've come to understand how small changes, even in simple formulas, can have profoundly large effects, and also how things can have seemingly random paths that still appear to orbit about certain regions.

Both our manufactured world and the natural one in which we live are rife with limitless examples of complexity. Even the natural progression of our universe, that is, 'entropy', is bound to gradually reduce any order into chaos. Complexity is the state to which all things return, the one we have to fight the hardest to prevent.


THE LIMITS OF COMPLEXITY

Some people are incredibly smart, but even the very smartest of them are not massively more intelligent than the average person. That is, although it is hard to quantify and measure, we won't find someone 10x more intelligent than average, and it is unlikely to even find someone "twice" as smart. Our brains have fundamental limitations beyond which we cannot go. There is only so much we can hold, understand and respond to at a given time. Some people may have moments of greatness, but that is balanced by the rest of their time. No doubt, the smartest man or woman on the planet right now has the occasional off day; days where their level of functioning is well below average.

And certainly, we can see from experience that even a group of smart people, collectively, can be working together at a fairly low level of intelligence.

Committees resist intellectual qualities because they essentially normalize their output down to the lowest common denominator. Groups of people do not easily raise the bar of intelligence. Intelligence just doesn't scale; we don't get 10x more intelligent behavior from a group of ten super-smart people. And depending on the state of their interaction, we might not even get 1x more intelligent behavior from a particularly discontent or dysfunctional group of smart people.

What that means is that for every person, there is some level of complexity in front of which they can operate, but beyond which they start to fall apart. Things start to happen, unexpected ones. They cannot cope.

We all have this threshold, over which we cross and our abilities become compromised.

And sadly, or strangely, for most people, their individual thresholds are not nearly as far apart from each other as most people want to believe. There are differences between people's abilities, of course, based on environment or personality, or intellectual capabilities or even just pure memory retention, but few people really transcend their own origins in the way that they've convinced themselves that they have.

We're the masters of making our own myths. Of believing that we've somehow risen above our nature and can now proceed, consistently, at some higher level of behavior and thought. Crashing down from those lofty heights is a common pain, felt by most.

Having well-defined intrinsic limits on the overall complexity we can handle has forced us to search out newer and stronger methods of mitigating our eventual problems with control.


CONTAINMENT

It helps us so little to understand the nature and form of complexity if we are just going to accept it as is. Complexity builds up and becomes increasingly dangerous as it does so. Things start out OK, but gradually over time, they get worse and worse. Because of that, the next really big leap in our modern age won't be new branches of science or even more effective engineering, but it will be developing an understanding of complexity and being able to systematically reduce it in whatever guise it is hiding.

A significant cause of failure in software comes from projects where the developers let the complexity run out of control. This is something that most veteran programmers have experienced at least once, if not many times.

It happens either because the project is changing too often, or because the scope is gradually getting larger and larger. Either way, the churn, and the effort of absorbing it, quickly negates any real progress. The project spins its wheels endlessly, at full speed, yet gets nowhere.

Often this is called a death march.

Software developers have a long and sordid history of losing their work to these types of organizational disasters. Once the downward trend starts, it can be difficult or impossible to reverse, and the project is essentially doomed.

But it is in the exploration of these types of issues that the software community leads many other domains.

We do, after all, acknowledge our short- and long-term choices in terms of 'technical debt'. We know to control complexity (even if we don't always manage it), and we know the importance of going back over our work and refactoring it, or just doing necessary cleanup.

Different movements in programming have been arguing about the right approaches for decades now, but at least the argument is occurring. You don't often see massive bureaucracies seriously try to control or bring down their internal complexity. There is essentially no refactoring happening in either the scientific community or in our massive organizational structures.

Since it is so much easier to add complexity, little gets done to relieve it.

Still, like an out-of-control software project, so many of our institutions are plagued with serious complexity. Each and every time I've dug into a specific domain, I've been surrounded by staggeringly bad examples of out-of-control complexity. And each and every time the domain experts have dismissed it as just being the "way it is"; that it will never change; that you have to live with it.

Admitting to complexity and controlling it has been key to getting many of the large projects in my career into the successful category. Ultimately, if you know the real source of the problems, dealing with them effectively isn't terribly hard. Still, possibly because the history of software is so short, and has been burdened by so many public failures, many software developers can at least acknowledge their problems and move on. Most other domains aren't so lucky.

Making even a small change to some fundamental issue in mathematics, for instance, even if it did help tremendously, is likely something that would take generations to accomplish. Mathematics, being perceived as more rigid than all other disciplines, is also the one that will easily take the longest to change, even when change is necessary.

Bureaucracies don't change, almost by definition, and in all ways, every aspect of our modern lifestyle just gets more and more complex. We only have ways to increase the complexity, not to study it, and definitely not to reduce it correctly.


THE STRUCTURE OF KNOWLEDGE

If you're being honest and objective about it, what we know, the sum total of our knowledge as a species, is a total mess. It is a ragtag collection of bits and pieces, sometimes stitched together neatly, but mostly just dumped out in batches and clumps over a long period of time. It includes real, universally true knowledge, myths, fallacies, relative truths and a huge collection of various unknown bits. Sections of what we know are tidier than average, but the sum total of all of it is a mess.

What we need is both a way to put a structure over what we understand, and ways to systematically reduce what we know into some simpler, cleaner, more accurate form.

We do have some quantitative ways of re-arranging knowledge, mostly akin to refactoring.

Simplification, normalization, and optimization are three ways of re-arranging the underlying information to change its properties. Abstraction is a way of taking out just the essence and ignoring the rest. Encapsulation is a way of hiding the underlying detail from a higher level, while still being able to make significant use of it. All of these can be applied to any structure of knowledge. In some cases it may be slow and tedious, but it is certainly possible.
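As a toy illustration of simplification (a hypothetical sketch, not a real methodology), consider a body of knowledge expressed as condition/conclusion rules. Any rule whose conditions strictly contain those of a more general rule with the same conclusion adds nothing, and can be dropped without losing any knowledge:

    # Each rule is (set_of_conditions, conclusion). A rule is redundant if a
    # strictly more general rule (fewer conditions) reaches the same conclusion.
    def simplify(rules):
        return [(cond, concl) for cond, concl in rules
                if not any(c < cond and k == concl for c, k in rules)]

    rules = [
        ({"bird"}, "can_fly"),
        ({"bird", "small"}, "can_fly"),   # subsumed by the first rule
        ({"penguin"}, "cannot_fly"),
    ]
    print(simplify(rules))                # the subsumed rule is gone

The same mechanical idea, finding the more general statement and dropping the special cases it already covers, is what a good editor does when boiling a discipline down.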

Knowledge comes in levels. That is, at a higher level we can see a 10,000-foot view of the problem, but to understand more, we need to get lower and lower into the details. At the detail level, it can be very difficult to understand the significance of things to the overall context. To control and understand knowledge, we need to make it consistent and workable on all of its different levels. That is, a great high-level abstraction of a branch of science is not all that useful if the details are still a huge mess. The structure and consistency at each level need to be harmonized.

Knowledge can be spaghetti. That is, it can be artificially complex to any number of degrees, so that it hides the real underlying core. Any text, discipline, idea or thought can be obfuscated to the point where it is extremely difficult, if not impossible, to discern the original. Of course, we know we can re-arrange the knowledge and drop the complexity. In that sense, knowledge is "code" of some type. Our natural languages form a type of programming language in which we describe the structure of what we know, and the process of using that knowledge in the real world. The sum total of our knowledge, as contained structurally, is just another system, less formal, but not unlike computers or branches of mathematics.

Most importantly, subsystems of knowledge have isomorphisms to other subsystems. That is, we can map one type of knowledge onto another, and then draw advanced properties and meta-knowledge from both. In this way, if we have two competing branches of mathematics, or two competing sciences, or even two competing definitions for a bureaucracy, we can apply metrics to them to conclusively show that one is simpler, with respect to some attributes, than the other. We can decide which one is better suited for our use and move to that version.
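As a crude sketch of such a metric (the measure here is invented purely for illustration), we can count the total tokens and the distinct terms in two equivalent formulations; the redundant one scores measurably worse on both:

    # A deliberately naive complexity metric: total tokens and distinct terms.
    def complexity(definition):
        tokens = definition.lower().split()
        return len(tokens), len(set(tokens))

    v1 = "a request is valid if it is signed and unexpired"
    v2 = "a request is valid if it is signed and if it is not expired and signed"
    print(complexity(v1))   # (10, 9)  smaller on both counts
    print(complexity(v2))   # (16, 10) same meaning, measurably more complex

A real metric would need to be semantic rather than lexical, but the principle holds: once two formulations are known to be equivalent, their relative complexity is something we can measure, not just argue about.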

In theory at least, we should be able to redraft a simpler legal system, for example, one that contains most, if not all, of the depth of the current one, yet whose definition is only a fraction of the size. Basically, we could boil down all our laws into a much smaller, more workable subset. The same is true in most organizations. With some investigative work, we should be able to simplify things enormously, with little consequence.


SOME FINAL THOUGHTS

Still, although most people have an inkling that this is possible, in each organization, in each discipline, and in each endeavor, history is filled with failed attempts that ultimately have only made matters worse. And it is these that the pessimists guarding the gates will use as examples of "why it is not possible". The gatekeepers generally believe that they are protecting things, but most often they enshrine the underlying madness. If it can't change, it can't get worse, they say. But it also can't get better.

To get past the natural tendency to keep things the same, anything we do with complexity needs to be rigorous and provable. That is, until you can show decisively that some analysis and refactoring will fix the organization, you'll never be allowed to change it. And it is entirely because of this that all things in our lives, over the last couple of centuries, have simply gotten more and more complex, and entirely out of control.

Either we find some new way of containing and reducing complexity, or we simply let history repeat itself, crushing our society and then eventually starting a new one from scratch in a few hundred years. Death, it seems, is our only current method of effective complexity control. And, not surprisingly, death marches are exactly what software developers work exceedingly hard to avoid in their projects. Our modern lives are well on their way toward imploding from rampant over-complexity.

It is easy to guess what will eventually happen (because it always does), but really hard to find a way out of our fate. Perhaps this time we've risen to a high enough level of collective intelligence that we can avoid the fate of all the other times. Perhaps this time.