Sunday, September 21, 2008

The Necessity of Determinism

Perspective is relative. You might, for instance, see a desktop computer as an item you purchase, and then update with various software to complete the package. A stand-alone machine acting as a tool to help you manage your data. This has been our traditional view of computers, and until recently it has been mostly correct.

If you take an abstract view, any given computer has exactly F ways to fail. By failure, I don't mean small annoying bugs, but rather a total and complete meltdown of the platform, to the point where it requires a significant fix: a reboot, new hardware, or new software. It is unusable without intervention.

F varies, based on the hardware, operating system and any installed software. A couple of decades ago, with a simple IBM box, DOS 6 and a normal upscale vendor for peripherals, there might have been hundreds of ways for the machine to fail. Any of the core hardware could burn out, software could go wild, or the OS could find a reason to crash. The actual number doesn't matter; we're far more interested in the way it's changing, and why it's changing.

In a sense, when using the machine to complete a task, there is one way to succeed and F ways to fail. The successful branch is overwhelmingly likely, but the size of F is not insignificant.


DETERMINISM AND ITS NON

There is a dictionary definition for the word determinism that describes it as a philosophical doctrine: a belief in cause and effect. In Computer Science, however, we tend to use the term more as an objective property of a system, such as a deterministic finite automaton (DFA). The transitions between states in the automaton are driven deterministically by the input. There is an observable causal relationship. If the system does exactly what you would expect it to do, no more, and no less, then it is deterministic.
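
To make that concrete, here is a minimal sketch in Python (the "even number of 1s" language and the state names are just an arbitrary illustration, not anything from the original text):

    # A minimal DFA: every (state, input) pair has exactly one next state,
    # so the same input string always drives the machine to the same result.
    # This one accepts binary strings containing an even number of 1s.
    TRANSITIONS = {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    }

    def accepts(string, start="even", accepting=("even",)):
        state = start
        for symbol in string:
            state = TRANSITIONS[(state, symbol)]  # one, and only one, choice
        return state in accepting

    print(accepts("1010"))  # True  -- two 1s
    print(accepts("111"))   # False -- three 1s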

Computers, by their very nature, are entirely deterministic. The CPU starts executing a fixed series of instructions, such that given the same initial preconditions the results will always be the same. That deterministic quality, as we'll see in more detail, is very important for making computers useful tools.

Interestingly enough, although their behavior is predictable, computers can be used to simulate non-deterministic behavior. In at least a limited sense, the regular expression operator * matches an arbitrary number of repetitions of the preceding element, continuing until the match finishes or the expression is proven not to match. a*b behaves differently depending on the input string, matching a string like aaaab, but not one like baaaaaa.
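
As a quick illustration (using Python's re module with whole-string matching; the specific strings are just examples):

    import re

    # a*b: zero or more 'a' characters followed by a single 'b', matched
    # against the whole string. The result is perfectly deterministic, but
    # how much input the * consumes depends on the string it is given.
    print(bool(re.fullmatch(r"a*b", "aaaab")))    # True
    print(bool(re.fullmatch(r"a*b", "b")))        # True  -- zero 'a's is fine
    print(bool(re.fullmatch(r"a*b", "baaaaaa")))  # False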

A common way of implementing this type of functionality in software is by using a non-deterministic finite automaton (NDFA), which is a rather long way of describing an abstract machine with a set of internal states where the transition from one state to another is not uniquely determined by the input: for a given state and input there may be several possible next states. You just don't know which way the machine will go.

You'd think that getting non-deterministic behavior out of a deterministic computer would be a complex problem, but it was solved rather early in our history by spawning off a DFA for each possible state the machine could currently be in. These possible automata may or may not survive the next input, and a single input may collapse all of them into one defined state. The NDFA may be any number of possible DFAs, but similar to quantum probabilities, they all collapse down to one (or none) at the very end.

With this knowledge, it became easy to simulate an NDFA using a dynamic list of DFAs. Non-determinism, it seems, can be easily simulated. Or at least aspects of it.
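
Here is a rough sketch of that idea in Python. The machine below is a hand-built NDFA (my own toy example, not from the original text) for strings over {a, b} that end in "ab"; instead of guessing which branch the machine "really" took, we carry the whole set of states it could possibly be in:

    # On an 'a' in the start state the machine can either stay put or guess
    # that the final "ab" has begun -- a genuinely non-deterministic choice.
    NFA = {
        ("q0", "a"): {"q0", "q1"},  # stay, or guess the ending has started
        ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"},        # q2 is the accepting state
    }

    def nfa_accepts(string):
        states = {"q0"}                # all the DFAs we might currently be
        for symbol in string:
            states = set().union(*(NFA.get((s, symbol), set()) for s in states))
            if not states:             # every possible branch has died
                return False
        return "q2" in states          # collapse down to accept or reject

    print(nfa_accepts("babab"))  # True
    print(nfa_accepts("abba"))   # False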

A neat trick, but it also carries a broader lesson: non-deterministic behavior is easily emulated by deterministic machines. Just because the essence of something is predictable doesn't mean that everything you do with it will be predictable as well.


INCREASING WOES

Over the years, as the software has gotten more complex, the number of possible failures, F, has risen significantly. I believe it is rising exponentially, but proving that is beyond my horizon.

These days, there are many more reasons why modern desktop computers fail. Hardware, while getting cheaper, has also dropped in quality. Software, while more abundant, has outpaced its environment in complexity. Software practices have decayed over the years. Although modern operating systems are far more protective of themselves, there are still a lot of ways left to render a machine useless. New holes open up faster than old ones are patched. And now, of course, there are people actively trying to subvert the system for profit.

If it were just an issue of an exponentially increasing F exceeding our operational thresholds for stability, technological advances would gradually reduce the issue. We'd eventually find stronger abstractions that help with reducing these problems.

However, indirectly, we've created an even worse problem for ourselves. One that is growing more rapidly than just the normal equipment or algorithmic failures.

As a stand-alone machine, we define the number of problems in a computer as F. If we link two machines by a network, this really doesn't change. The addition of networking hardware and software increases F, but linearly, relative to the new pieces. Failures on one machine, in very limited cases, may cascade to another, but that is rare and unlikely. Each machine reacts to its data independently.

If one machine becomes responsible for pushing code and data to another machine, then everything changes. The two machines are now intertwined to some degree. F might not double, but there are all sorts of reasons why the second machine may easily clobber the first. Just as we see all of the different chips and micro-controllers of any standard PC as a single machine, we must now see both machines together as one machine. Bind them tightly enough, and they are no longer independent entities. The two become one.

Once the interaction goes beyond a simple question-and-answer protocol, to the point where the exchanged data itself can contain failures, both devices are tied to each other.

With two machines and a rather large increase in F, the behavior of the whole new machine becomes rather less deterministic. Like the NDFAs, you can see the whole as the different permutations of its parts: a big machine containing two independent simulations of other machines. We can still follow the behavior of the boxes, but it is no longer as predictable. Small differences in execution order on either of the boxes may change the results dramatically. Subtle differences that amount to non-determinism.


THE NETWORK IS A COMPUTER

Almost all of our modern machines are now interconnected to each other in some way. Internal corporate intranets connect to the Internet, which connect to our home networks. We have constant, easy access to our resources from a huge number of locations.

These increases in networking have revolutionized our machines, giving us unprecedented access to a near infinite amount of data (at least relative to one's lifetime). They helped integrate these tools into every aspect of our daily existence.

Software companies have utilized this connectedness to make it easier than ever to keep our current software up-to-date. There is a mass of interaction, at both the operating system level and the application level. There are tools great and small for automatically patching, updating, reinstalling and coordinating the software on our machines. Windows XP, Java, Firefox, Acrobat Reader and dozens of other packages check on a frequent basis for the latest and greatest updates.

Even more interesting, if you account for all the viruses, spam and phishing, then there is a massive number of sources out there interacting with your machine, trying (if not always successfully) to give it not just new data, but also new code.

In essence, we've tied together our machines on so many different levels that we have effectively taken the millions and millions of previously independent computers and created one big massive giant one. One big massive giant, vulnerable one. With an F that is shooting well out of control.


BAD HAIR DAY

In its simplest form, all it takes to trigger a failure is for one of these updating mechanisms to install a change that renders the machine useless. Although there are checks and balances to prevent such things, history is full of great examples of nasty bugs getting loose in the final round. It is inevitable.

So now we're becoming increasingly dependent on some poor coder in Redmond not having a bad hair day and accidentally scorching our boxes in a serious flame-out.

We're also dependent on some poor coder in the outskirts of civilization hopefully not having a good hair day and finding some new great way to subvert our machines for evil purposes.

If we started by thinking that F was in the hundreds, once you start counting all of the little possibilities, and how they might intertwine with each other, F grows at a frightening rate. There is a staggering number of reasons why an average desktop computer can now fail. Millions?

Given my experiences both at work and at home, these are not just infinitesimally small probabilities either. Over the last few years I've seen more hardware and software failures, for an ever-changing set of reasons, than I've seen in the whole rest of my career. Yet my usage and purpose and intent with my machines hasn't changed all that significantly. In truth I do use more web applications, and spend more time on my home machine than I used to, but the increase in failures has far outpaced the increase in usage.

My machine at work fails every time there is a system update to a piece of software I de-installed years ago. The box locks up, hangs, and turns off my database. This occurs multiple times per year. Several of the other auto-updates have crippled my box on occasion as well.

My friend's machine rebooted on a USB key insertion. My wife's machine has seized up for a number of unknown reasons.

If it were just me, I could forget about it, but virtually everybody I know has been entangled with at least one good computer bug story in the last couple of years. I'd start listing them out, but I don't think it's necessary to prove the point: if you use your machine significantly, you're probably already aware of this.

In truth, it has been this way for a while.

Way back, I remember a system administrator telling me that Windows NT works fine, so long as you don't touch it. Sadly, he was serious, and defending the platform, which I had been verbally abusing because it kept crashing while running some test software. Nothing about the software should have affected the box, yet it did. Desktop computers have long had a history of being flakier than any of their earlier cousins.


PHYSICAL DETERMINISM

A world-spanning massive super machine is both an interesting and a scary idea. Our little desktop machine is just a tiny piece of this bigger cube, entirely subject to its will, not ours. We only get to use it to complete our tasks if we are lucky. The F for this machine is way too high.

Purchasing a time-share on a massive computer is an interesting prospect, but would the circumstances be helped by a massive bump in quality? If there were better security and better testing, would this change things?

These are very tough questions to answer simply. The short answer, often demanded by a younger and less patient generation, is that tools need to be deterministic, and that our current methods of interconnecting our machines defeat this entirely. In its own philosophy, we could say that the bugs are just the effects; the underlying cause is the lack of determinism.

To really understand that, we must start with a very simple observation. A bulldozer is a tool used by construction workers to move large volumes of earth from one place to another, and to flatten big areas. Bulldozers act as a tool to leverage the power of the machine to accommodate the driver's actions. The machine extends the driver's abilities.

It is entirely possible to build some type of Rube Goldberg contraption that, given a setting, puts the bulldozer through a precise set of instructions. Purely physical, like the machines that help manufacturing lines assemble complex objects: completely non-computerized, just physics. We could build it so that the bulldozer is dropped off at a location, turned on, and then performs a specific operation like clearing an area of land, or moving a big pile of dirt somewhere. Set the switch to 40x40 and you get a huge square of land precisely flattened.

This type of mechanical automation could be used to remove the necessity of the driver, who, after all, is just sitting there operating the machinery. The problem is that, in operating by itself, even with a very simple, fixed set of rules, there are always unexpected circumstances that will creep up. In order to prevent serious accidents, someone must monitor the progress of the machine, and it is far better for them to be working with it while this happens than for them to be sitting on the sidelines in a chair.

You can automate a factory because it is a limited, controlled environment, but you'd never be able to automate something for an uncontrolled one; common sense and safety keep us from doing so. Even the tiniest of accidents would draw a huge storm of protest.

The tool is best if it deterministically carries out the instructions of the driver, extending their abilities to do manual jobs. For both the factory and the bulldozer, determinism is a crucial aspect.


INTELLECTUAL PURSUITS

While that makes sense in the physical realm, people often differentiate between physical effort and mental effort. It is one of those class hold-overs where the physical is seen as less desirable. In reality, most physical jobs demand a huge degree of mental energy, and sometimes this shifts over time. Los Alamos relied on human calculators in the pre-computer days to do all of the "manual" calculation work, now seen as intellectual effort handled easily by a simple calculator. An Olympic-class athlete's brain is working overtime trying to control their reactions to a massively precise level, a huge feat in thinking.

Effort of all kinds is really just a mix of some percentage between physical and mental. The two are closer than most people care to admit; some analogies treat the brain as a muscle for thinking. Just another way to expend effort.

As anybody who has ever tried to automate a complex intellectual task knows, if you can't contain it there are as many possible obstacles to getting it to work correctly as there are for the bulldozer. We cannot predict all of the variability of the real world, be it a mountain of dirt in a field, or a mountain of information in a corporation. It might, in some way, be possible to account for all things, but realistically we need to assume that the possible failure conditions are infinite. Some are tiny, but the set is still infinite.


REACTIVE INTELLIGENCE

In a world of unpredictability, we'd need an extremely complex calculation engine to be able to cope with an infinite variety of errors. Artificial intelligence is the much heralded savior for our woes. The only problem is that some people strongly believe that it is not possible, and have constructed proofs to that effect. Many people believe that research's failure to achieve it so far is proof enough.

It is an interesting debate, and I'll digress a little into it. I have a tendency to think it is possible, but I think that most people studying it have grossly underestimated its inherent complexity. In a sense, they are looking for a simplified, pure abstraction that provides these heightened capabilities. An abstraction that fits the neat ordering of our thinking.

I tend to see thinking as being intrinsically messy, and in many cases such as creativity, quite possibly flawed. I've often suggested that creative sparks are failures to keep things properly separated in our minds. A flaw, but a useful one.

A good example is the Turing test, a simple way of determining whether something is "intelligent" or not. A person blindly interacts with a couple of entities with the intent of finding out which of them is a machine, and which is a human. If the person cannot distinguish between the two, then the computer's behavior is deemed intelligent.

The problem with this test comes from the original episodes of Star Trek. Given a Vulcan -- an idealized, overly logical race of beings -- as the machine entity in a test against a human, most people would assume the Vulcan to be a computer. The logic and lack of emotion would be the keys. A Turing test for a human and a Vulcan would fail, showing the Vulcan as not being intelligent.

However, our idealized alien, at least in a TV-show script sort of way, is in fact an intelligent being quite capable of building starships and traveling through the universe, a massively complex and clearly intelligent feat. Although it is only a TV show, it does act as a well-constructed thought experiment. There is at least one alien intelligence we can conceive of that would intrinsically fail the Turing test. It would appear as a computer without intelligence, when it clearly wasn't.

I see this as hugely important in that the researchers are out there looking for something idealized. Something pretty, something with structure. They are searching in a specific location in a rather large field; my hunch is that their prey is hiding in rather the opposite location. And, as always, we can't search there until our current path(s) take us closer to that area. I'd guess that we have to understand the structure of knowledge first, before we can learn to reliably extract out specific relationships.

It's likely to turn out that our own hubris-based definition of intelligence may actually be a big part of the problem.


GOING TO THE DOGS

What we consider intelligence often isn't. There are lots of examples, but fear and politics always provide the most interesting ones.

Over time, various media have discovered that shocking scary stories help sell the news. Because people like simple stories, there has been a tendency to report dog attacks, but only if the animal is described fully or partially as a Pit Bull. The term has come to invoke fear, and the common stereotype is of a vicious animal most often owned by drug dealers or other nefarious folk.

It's always a good simple story that gets a reaction. Often the dogs in question aren't even remotely Pit Bulls. The largest size of the breed is around 65-85 pounds, but it's not uncommon to see stories about 120 pound "Pit Bulls".

It also doesn't matter that all other dogs bite too, that there are millions of Pit Bulls, or that most of the supposed Pit Bulls are actually just mixes of many other breeds. Somehow this one domesticated animal has been granted superior dog capabilities. We've been living closely with the "child's nanny" for over a hundred years; they were a mascot of WWI, the RCA icon, appeared with children in films and have thousands of other cultural references. However, people wanted a villain, and the Pit Bulls were appointed.

Fatalities are where the anti-dog fanatics really like to focus, but they are slow to compare the dog-based numbers to those for other animals such as horses or even cows. Cars and guns kill millions every year, and somehow a tiny number of deaths becomes a growing epidemic. Yes, interaction with animals sometimes ends badly; that has always been true, no matter what animal is under consideration, domestic or wild. Anybody who has ever owned pets of any kind easily understands this.

Hype, misinformation, etc. aren't new, but we live in an age where it is becoming harder and harder to get away with them, and where, ironically, we are getting more and more of them. So, as a side effect of selling newspapers and nightly newscasts, one of North America's most distinctive dog breeds has been used as the scapegoat for all of our angst about nature. Ironic, particularly in a period of intense "green" frenzy. And what else would we expect from this?

Of course some regional government, desperate to look "proactive", seizes the day and passes a breed-specific law banning Pit Bulls from an entire province. Sad, given that the negative hype was initially profit driven. The underlying facts aren't there, or are irrelevant. "Surely anybody who reads the newspaper would know that Pit Bull bites have reached an epidemic status and something needs to be done!"

The twist in all of this is that we use our intelligence to build up and maintain our collective set of social rules. Because we live in an "intelligent" society we make rules and laws based on our understanding of the world around us. Smart people supposedly come together to lead us.

Ironically, the law didn't even really pass correctly. The dogs they were trying to address were mixes of Pit Bulls, but the law on the docket is now restricted to only pure Pit Bulls, dogs with papers. It's all but useless, other than having taken away the newspapers' ability to print good "Pit Bull" stories, and keeping breeders from selling pups. It's a sad example of irrational fear gone completely wrong.

A new set of laws exists that doesn't do what it was supposed to, and the only reason it was created was to keep the public believing that a political party was actively trying to solve a problem (even if it was a trivial, non-existent one). It's insane, and clearly not intelligent. Even if you buy into the anti-"Pit Bull" hype, the resulting law itself is completely ineffective, a failure on top of a failure. The pure Pit Bulls have all been replaced by mixed breeds. Nothing's changed. All we have is another useless set of laws drafted by supposedly intelligent beings.


EXCEEDING DILBERTIZATION

This is but one simple example of how we have been building up our rules, our political systems, our organizations and our knowledge over the decades. We're exponentially piling Dilbert-inspired thinking onto poorly solved, unrelated problems. We have absolutely no way of disproving bad ideas or compressing our increasing mass of mediocre ones. Any new legislation from any existing body is as likely to be bad as it is good. And oddly, it is not just subjective taste; there are massive examples of clearly stupid short- or long-term ideas getting passed as though they were somehow intelligent. We live in a depressing era. Good for cartoonists, bad for intelligence.

Given the arbitrary mess of our collective knowledge, it's easy to see how we appear as kids just barely beginning to qualitatively think about our surroundings. There was a time without zeros and negative numbers, pretty hard to conceive of now, but it wasn't that long ago. There will come a time when we can show an irrational bit of thought for what it really is. That, I guess, is probably where real intelligence lies; we're only partly there right now.

So what we now take for intelligence is barely such. We react to the world around us, but often at such a gut level that it should never be confused with intelligent behavior. We still rely heavily on our older emotional capabilities, our intuition and of course superstition as aides in making many of our daily choices. Hardly rational, and barely intelligent.

Computer Science sits on the cusp of this. It is the one discipline that confronts this messy disarray of our knowledge and behavior on a daily basis. We guess what data we want, guess its shape, and then hope it works. The rampant changes in specifications come from the poorness of our guesses. If we were right initially, it wouldn't keep changing.

And our most venerated solution is to go around the problem with artificial intelligence. If only we had that, some people think, then our problems would be solved.

The biggest problem with our actually finding artificial intelligence is that, like the Vulcan Spock from Star Trek, it will simply annoy us, because its rationale for doing things will be well beyond our own understanding. It will act in an intelligent manner, always.

It is very likely that we are in fact only half-intelligent. Simply a good step down a very long road to hopefully becoming a fully intelligent creature some day. Any good read of a daily newspaper will easily fuel that suspicion.

So, in all likelihood, even if artificial intelligence is discovered, it is likely something we don't want in our lives at this stage in our evolution. Some day perhaps, but not now. It just isn't going to help, and if Hollywood is to be believed, it will make things a lot worse.

If we can't have it, then our only option is to make all of our tools simple and deterministic. The tools have to work, no matter what environment they are placed in. We can't just keep adding rampant complexity and hoping that some magical solution will come along and fix all of the issues.


THE LITTLEST OF DETERMINISM

While I've gone on about the effects of failures, even at the smallest level our tools need to be deterministic as well. Certainly there was enough early theory for GUIs about making everything on the screen be a direct result of the user's actions. What happened to great ideas like avoiding modal behavior?

Good practice that we have been losing over time. Windows move on their own, focus shifts arbitrarily and dialogs pop up unexpectedly. It's an annoying type of sloppiness that degrades the usefulness of the tool.

We shouldn't have to stare at the machine to be able to ascertain its state; our actions alone should accomplish that. As such, a blind person should be able to control a simple program without ever having to look at the screen. The results on the screen should be absolutely, deterministically caused by the user's actions. Strange, erratic actions are non-deterministic.

Some early operating systems like Oberon did this exceedingly well, but as is often the case in software, with each new generation ignoring the knowledge of the past, much is lost while little is gained.

In general, the idea of building non-deterministic tools is easily proven crazy. We clearly wouldn't be happy if the controls on a bulldozer or car "sort of" worked. Forward mostly went forward, stop kinda stopped, etc.

And how useful would a calculator be if it randomly added or subtracted a few numbers from the result? If the result of 34 + 23 could vary by a couple of positions?

Tools are there to extend what we are doing, and they work when that extension is simple enough and predictable enough for us to have confidence in the results. When it has all become convoluted beyond a simple degree, we may have a tool, but using it is uncomfortable. A stupid semi-automated bulldozer is not an intelligent idea, it is an accident waiting to happen.


AND BACK AROUND AGAIN

Errors, networks, machines, intelligence, features, etc. are all tied together by the necessity of having that deterministic property for a useful tool. We must be able to predict the tool's behavior.

We want our intellectual tools to work in the same way as our physical ones do. They should leverage our abilities in a simple and deterministic way, so that we can accomplish so much more. Tools should leverage our efforts.

Our modern computers are increasingly failing on this front. The number of failures is rising rapidly; as we add new "features" we keep kicking F up to the next level. At some point, the instability exceeds our ability to contain it, and the usefulness of the machine plummets with each new increase in failures.

Our interfaces too have been increasingly failing. We have forgotten those simple attributes that should anchor our designs, things like determinism. We might have big fancy displays, spewing lots of colorful graphics, but if we can't trust what we are seeing, the enhanced presentation is all meaningless.

Artificial intelligence may some day grace us with its presence, but if anything it will complicate matters. We'll still need tools and they'll still need to be deterministic. Its usage will (and should) be limited and tightly controlled; possibly it is a similar dilemma to athletes taking steroids, something that clearly has an effect, but for obvious reasons is entirely non-desirable.

Determinism is a hugely important property of computers, one that we've been letting slip away from us in our haste to make prettier systems. Where we think that tight bindings between the systems are making them easier to use, the truth is exactly the opposite. The more instabilities in the machine, the less we trust it. It is said that a poor workman blames his tools, but I'd guess that a foolish one uses undependable tools and should probably be blaming them.

Given that we have the foundations in Computer Science to understand why we should be building deterministic systems, our increasing failure to do so is all the more disconcerting. We want simple, but not at the cost of stability, a point where we have already sacrificed way too much.

Saturday, September 13, 2008

7 Fabulous Ways to Great Programming

This post is for all of you coding surfers that have ever anonymously filled in "TL;DR".

So you wanna be a great programmer, do ya? All you have to do is follow these seven easy bullet points:

  1. Stop reading bullet points!
  2. You heard me, stop reading these stupid bullet points!
  3. They're not helping, you know.
  4. They are often just fluffy platitudes.
  5. Still reading? I thought you'd get wise by now?
  6. It just shows how useless bullet points actually are ...
"Crap, he tricked me", you're thinking? I did so on purpose, but only because I really do want you to be a better programmer, I'm not kidding.

"Get on with it", your inner voice is screaming, adding in "just give me the highlites, the bullet points, the summary, dude. I don't need the rest of your stupid rant."

The problem -- in bullet points -- is:
  • bullet points only convey "summary" information.
  • bullet points are forgettable.
  • bullet points are junk food, a kinda McIdea that neither teaches nor satisfies.
Honestly, you can't learn anything significant from bullet points. It's just not possible. If you're lucky, they'll remind you of something you learned earlier and bring it back to the surface, but you're not going to achieve knowledge from something like Cole's notes (google it). These things may help you if you've already had exposure, but they just ain't got the knowledge in them.

If you want the knowledge then you have to get the knowledge, otherwise you know nothing.

Ways to do that:
  • Spend hundreds of years writing every possible type of program.
  • Apprentice with an experienced programmer.
  • Read, read, read and read.
  • Take courses, then read.
  • More reading.
"Well, I don't want to waste my precious time reading someone else's long rambling crud!", your might be screaming by now. Hell, even spending the 30 secs to type "TL;DR" might be excruciatingly painful.

Well, too bad. So sad, sorry and all of that. You foolishly picked a rather incomplete occupation; programming and software development are barely out of their diapers. We just haven't had centuries to distill the knowledge out of the experience, yet.

Some day, perhaps, but until then we're all struggling with trying to find a voice to share our experiences. Sometimes that provides good solid reading, but sometimes it only comes off as a rant or a ramble, with hidden buried gems piled deep in the subtext.

But, and here's the BIG point:
  • There is real knowledge buried in the subtext.
  • You can't learn that knowledge unless you read the full text.
  • It can't be summarized.
  • It's not even necessarily the main point.
  • Skimming the text misses its real value!
Some stuff is subtle and easily missed. If the author knew how to really express it, it probably would have been written that way. But it's impossible to squish all of the knowledge into a summary. That's why it's called a summary.

A little knowledge is a dangerous thing. Always has been, always will be. If you sort of know how to drive a car, and you mostly follow some of those "rules" about silly things like lanes and stop signs and such, you're not going to last long. You're just a roving accident waiting to happen.

And oh, if that knowledge has been even further diluted into a list of platitudes, ack! Seven reasons for "anything" is probably useless to you. It is probably useless to most people, unless it is just acting as a reminder for known ideas. Really. For starters:
  • Platitudes say nothing, but sound good.
  • They are easily forgotten.
  • They fill you up on junk knowledge, when you should have been learning.
A bad food diet is an obvious fail, so is a bad intellectual one.

"Still, people could distill their crap into 3 easily readable paragraphs, dude." you're insisting.

Possibly, but most bloggers are amateurs; we barely have time in our lives to write this stuff. Half the time we don't even really know what it all means. Not, as you might guess, the explicit text of what is being said, but what it really means in the bigger sense, the holistic view. There is so much buried, hidden between the lines, just waiting to get a voice. Particularly in a field like Computer Science, where so much is still not fully understood, huge amounts of important knowledge get buried in people's direct experiences. It is hard to fully express that understanding.

In a strange sense, we can only communicate what we explicitly know, but often times the topics are implicit. Sometimes when I am writing, for example, I'll dig at something deeper, but other than just bouncing around it, I don't really have the vocabulary, yet, to express what I am trying to say.

Some of my later works are just continuations or follow-ups of my earlier writings; each time they get a little closer to the real underlying truth.

Pretty much if you read most of the serious essayist bloggers you'll find the same thing. Learning to express something unknown is a truly creative act, a spontaneous one. Of course just repeating well-known platitudes isn't, but then isn't that why they are platitudes in the first place?

Blogging as a medium is a direct way to access a mass amount of early information. It has not been comfortably packaged into neat theories and pretty textbooks. It might be the predecessor of some well-structured understanding, but only if you're willing to wait for it to drift into the mainstream. In an industry like software development, where so much of what we do is based on intuition and guessing, any additional comprehension is a major assistance.

"Fine, so long blogs are often interesting, but badly packaged information; I've got masses of dysfunctional code that I've written that needs urgent fixing. I just don't have time to dig for gems." you moan, adding it "and it has nothing to do with my bugs anyways!".

Initially I said:
  • I want people to be better programmers.
  • bullet points suck!
  • real knowledge is buried in the subtext.
Like many bloggers, I write because I want to share my knowledge and experiences, but it's not nearly as altruistic as it sounds. Programming mistakes are making my life miserable. The current state of our industry is embarrassing. I would have hoped that programming had progressed a little further as I got older, but so much of our current code base is just awful. Damned awful, really. And I keep getting stuck trying to add something useful on top of an ever increasing mess. Sadly, most of the problems stem from messy inconsistent behaviors in the code, a problem that is getting worse, not better.

We could wait for an understanding to trickle down upon us from academia. Some day, there will be cohesive theories and processes that support reliably building complex systems. Someone will eventually discover a better way of coding. But, that process is slow, and at its current rate things are not likely to improve until well after I have finished my career, possibly my lifetime.

Another alternative is to reach out to the industry, and hopefully explore the issues in a way that we all come to learn how to build better code. There are pockets of excellence in programming, but they certainly are not the state of the industry.

The single largest problem with software comes from its inconsistency. It is a dog's breakfast of incompatible ways of handling data, often half finished and poorly extended. It is a big mess that we are deliberately making bigger.

"Sure, but not my stuff. It just has a few bugs, that's all" you chime.

Here's the rub (as Shakespeare might have said, according to Cole's notes): The single greatest, most important, significant all-powerful encompassing, awesome, extreme, critical, supreme points about programming are:
  • Focus!
  • Self-discipline.
  • Consistency.
You'll never meet a great programmer that doesn't have all three. They are mandatory. They often have other qualities, but none of the highly skilled programmers can survive without these basic attributes.

If they're out there telling you otherwise, you know they're just blowing smoke. It's simple:
  • good programmers write good code.
  • good code is neat and simple.
  • good code is consistent.
A huge mess of code is just that, a huge mess, not good code. There might be some great ideas buried in the design, but unless it is well implemented it is not good. Programming is about the output, if it is messy or flaky that defines the quality of the work.

"So what the hell does this cranky nonsense of yours have to do with my code?" you start to ponder.

Here is the easy bit. Very simple. This entry, as the title indicated, was about fabulous ways to great programming. The most important of these is the ability to focus for long periods. If you are finding that reading large blog entries is far too taxing, then you are having problems focusing. If you can't get through several pages, then how can you expect to get through five years working on the same massive code base? Serious development is a huge amount of work. A month of HTML is fun, but light-weight coding isn't the same as having written something big and serious.

The problems, the real ones, take a long time to sort out and are complex to build. Inherent in this, is a tremendous amount of focus and consistency. To survive, year after year, you need to be self-disciplined. These attributes are all intertwined.

If you're flailing at the keyboard, or waiting for your boss to "make you" refactor that mess you spazzed into the machine last month, then focus or self-discipline could be the problem.

In point of fact, if every long blog article is automatically "TL;DR" by default, this ADD-driven approach is bound to spill over into the other parts of your career. If you can't focus for longer than a few minutes, there is no way your code is going to be "great". Just ain't happening. Programming isn't a multi-tasking opportunity; you sit down for long periods at a time, heavily focused on the work. If you cannot do this, your code may be a hell of a lot of other things, some of which may end up entertaining people on WTF, but it is unlikely to ever be great, and probably not even good.

Consistency is a mandatory property of good code. Focus and self-discipline are the ways to get it there. Bouncing around on the net, reading only short platitudes and bullet points, indicates a possible inability to focus. Leaving comments such as "TL;DR" says more about the reader than I think they would care to admit.

The way to better coding is to spend more time trying to learn. Experience is a way to refine knowledge, but you need to acquire it first, or else you will just wander around in the dark forever. Beyond the standard texts, which are far from complete, the new and often critical knowledge these days is buried in long blog discussions. An apprenticeship would be better, but experience leaves our industry fast, and many people misgauge their own abilities.

Certainly any increase in focus and concentration will filter back into your coding practices. If you keep it up, the things you write will be cleaner and better structured, giving you a fighting chance to remain a programmer for longer than just a few years. Good work comes from good habits. Great work comes from really understanding all of the details and nuances of what you are doing. It is several levels beyond just being able to get it to compile.

For all those that skipped most of the above text, just a quick summary:
  • bullet points rock!
  • great careers are opening up in marketing.
  • happiness is a bigger hard-drive.
  • the clothes make the man (or woman).

Saturday, September 6, 2008

A Dependency Too Far

History is littered with failed ideas. Bright sounding, sensible suggestions that utterly fail to deliver, or are just outright fallacies. Heck, we spent how long thinking the planet was flat? Math was only geometry? Alchemy and science were magic? Women were envious?

Every great intellectual pursuit has wandered down a blind alley and gotten stuck for periods in its history. It's expected. Once we get a bad idea entrenched, it takes some time to free it up again. It's a shame, but it is all part of growth. Progress, in terms of learning, spreads out in all directions, some of which are duds.

Computer Science, being young and all, has more than its fair share of bad ideas. Where some disciplines have received a bit of rain, we've always gotten a torrent. We're drowning under a steady flow of black and white rigid half-truths that only have limited applicability. Something about software causes people to try to jam down half-baked one-size-fits-all theories that ultimately cause more harm than good. Ideas that stick when they shouldn't.


REINVENT ME

My personal favorite, used so many times incorrectly, is "don't reinvent the wheel". You can't get a bunch of programmers together for too long before one or more of them falls back on this old saying.

When most people shout this one out they are using it as the de facto reason why all programmers, everywhere in the world, should use some specific existing library to perform some actions. It is usually used as a reason why you shouldn't be writing code. "It already exists, you clearly don't want to reinvent the wheel, do you?" goes the logic. Sometimes it's specific to a known library; often it's just a reference to writing anything that is already known to have been completed.

There is so much wrong with this statement, and it is so often badly applied.

The first point is that it isn't really a reference to making use of a specific library; it's a reference to making use of a specific idea. By "not reinventing the wheel", you're actually trying to avoid recreating the ideas behind it. The wheel is a concept, not a specific instance.

That point is more obvious when you realize that you wouldn't use a bike wheel on a car, or a wooden carriage wheel on a bike. Depending on what you are building, you often have to create your own adapted version of the wheel.

The advice is really to avoid rethinking through solved problems; not about sticking to only the instances that are available. Reinventing it is quite different than reimplementing it.

It is far closer to saying don't reinvent the design pattern than it is to saying don't reinvent the specific code in some library. If there is a known algorithm for handling that problem, why agonize over creating a new one? Build on the ideas that are there already.

But you can't confuse that with just building on specific known instances. Working through your own algorithm is totally different than working through your own code.

Had we not reimplemented the wheel occasionally we never would have evolved. Given that we invented wooden wheels first, had everyone stuck with them, most of the vehicles we have today would not have been possible. Our wooden wheels would blow apart on the highway. Progress would be nil.


EXISTING CODE

Even if you correct for the misuse of the term, a lot of programmers think that it is always a good idea to exploit any existing code. If the code exists, why rewrite it? This notion is predicated on the idea that if it was already written, for some reason it must be a reasonable and workable way to solve the problem. It's sort of the antithesis of progress.

It is one thing to reuse the code you wrote, over and over again in a project. That is not only perfectly reasonable, it is also considered good practice. Internally in any development, the programmers should leverage all of their work to the maximum effect, which definitely means not constantly rewriting the same subset of instructions again and again. Coding is slow and time consuming, finding ways to leverage the effort is good.

Where this goes wrong is that people assume that if it works internally within the project, it is just as reasonable externally. If the code exists in the industry, it should also always be used, no matter what. No exceptions.

If this were true, we'd get some horrible problems. Once it's written, it is written for all time. We can never fix the problems or find a better solution, because someone already wrote that. Also, implicitly, the people who got to the problem first must inherently know how to solve it best. If someone wrote a library for handling logging, for instance, some people seem to think that implies that they are experts on logging. Sometimes that is the case; often it is not. With more experience and hindsight the implementations are often much better.

Just because something exists, that in itself is not a good enough reason to use it. Yes, you should be aware of it, but sometimes it can be more trouble than it's worth.

The real problem is that programmers are always looking for hard and fast rules that keep them from thinking. Everyone just wants something fixed, so the issue doesn't have to go up for discussion for hours and hours, or days and days. That wish, of course, is the key part of the problem.


DEPENDENCIES

Ultimately, we must always look at our software from a higher perspective. When we write a solution these days, because of the intense complexity of our environments, infrastructure, etc., we are only writing a very small percentage of the actual final code. What we control is tiny in comparison to what we do not.

A huge problem then, one that often rears its head in development, is what happens with the code we do not control.

If we find a bug or strange behavior in something we wrote, obviously we can fix it. If it's not ours, but is fixable (avoidable), that problem propagates upwards into our work. If we can't fix it, we have even more problems.

The old (and reasonable) term for someone else's code was calling it a dependency. That term implies that the underlying library, framework, technology, protocol, etc. is something on which we are depending. And the effect of any dependency is to impact the success or failure of a project.

We can't release our code if it's based on features that don't work in a database, for example. All the work we've done, all of the effort, is wasted if we can't find a way around the problem.

Dependencies are risky things. They are inherently dangerous, and should be well tracked and well understood.

If you knew there was an inherent risk in performing some action, even if it was small, then you understand that the more often you perform that action, the higher the likelihood of having problems. That's basic probability. Do something five times with a 1 in 5 chance of some outcome, and the odds are pretty good (about two in three) that you'll trigger that outcome at least once.
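
As a back-of-the-envelope sketch in Python (the 1-in-5 and 1-in-100 figures are purely illustrative):

    # Chance of triggering at least one failure when an action with
    # independent failure probability p is repeated n times:
    #   P(at least one) = 1 - (1 - p) ** n
    def at_least_one_failure(p, n):
        return 1 - (1 - p) ** n

    print(at_least_one_failure(0.2, 5))    # ~0.67 -- roughly two chances in three
    print(at_least_one_failure(0.01, 50))  # ~0.39 -- even small risks add up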

If you're out there dumping every possible dependency you see into your code base, you're on your way to a major meltdown. Your risk is skyrocketing at an alarming rate.

The way it's used, the phrase "don't reinvent the wheel" is tantamount to saying we should try to gather as many dependencies as possible, without thinking at all about whether or not they are reasonable: implying that dependencies are always better than code. That's just madness.


REASONS

Sometimes a dependency is unavoidable, sometimes it isn't. The total inherent complexity of one's own code can be less than the inherent complexity of a library. That is, sometimes, in some cases, for some reasons, writing it yourself is a much much better idea.

But, as you may have guessed, I was being really really ambiguous about when and why that is true. Like all good and messy things, the choices aren't simple.

A couple of easy cases exist. If you're the expert, or more expert than the authors of an existing version, then you should likely write your own. If the code is so simple that everyone is an equal expert, then again, unless there are time constraints, you should probably write it. Another expression, "a bird in the hand is worth two in the bush", applies with code, specifically when the version you write can be orders of magnitude simpler than an established library.

That happens when, for example, the library code is heavily generalized and attempting to meet the requirements of a large diverse group. If your usage is a tiny fraction, and you understand it well, then it can be time well spent to make your own.

On the other hand, if you're building an accounting application, it would make almost no sense to try to rewrite something huge and complex like a jpeg library. When you are coming from a problem domain unrelated to some specific technological domain, you have to focus on the types of code and solutions that you want to support. You might have the ability to write an image library, or a specific subset, but do you want to lose focus on the real coding issues? What happens when your expert image programmer leaves; who will maintain or enhance the code?

Another thing to consider is who is building the code, and how long they are going to continue to support it. Some libraries need frequent updates, yet if the initiative fizzles out, you've invested time in integrating something that is rusted and useless. Particularly for Open Source, if a project doesn't reach a significant level of success it quickly becomes orphaned. That can render the code legacy; why maintain someone else's mistake, when you could have created your own better solution?

On occasion freely available code has even shifted, and been ground down in ugly legal issues. The license for one version is not necessarily the license for the next. Projects have gone from Open Source to commercial and back again. If people smell money, that can change their intent. If there never was any money, that can also influence them. Things change, and sometimes that creates serious problems. You can't always trust the developers.

In some cases, the functionality is basically simple, but the code is large anyway. If the initial size of the development work is small, for example, it doesn't make sense to add in a huge amount of code for something that simple. It just increases the code base, turning the project into a medium or large one with no real advantage. Assuming your own code would work well enough, the size of the original project vs. the size of the library matters: you don't want to push a smaller project into a larger implementation without sufficient payback for the increase in complexity.

A classic example is a small project taking advantage of a weak, but usable, logging library in exchange for keeping the project small. A large project could easily benefit from a total rewrite, since it would fix problems and provide enhanced capabilities, but also a lot more code to maintain. Crossing the size barrier for a project has all sorts of implications.

These issues get even more complex if you account for the different sectors of the programming industry. Programming is done in a wide variety of places with different constraints and different expectations. Where you are programming, and what you are programming, affect your choices.


IN-HOUSE

In-house refers to a group of programmers working on domain-specific applications in a company or organization whose focus is not software. That includes industries like finance, manufacturing or health care. If the central goal of the organization is not to code, then its priorities lie elsewhere. Programming is a means to an end. Although this is a diminishing area, for the longest time this is where most of the code was getting written.

All major companies need accounting, inventory and sales systems, and may have found that specific automations can be competitive advantages. Internally the programmers are driven to build custom systems, but often with a focus on trying to jam in as many off-the-shelf parts as possible. Historically in-house code has been quirky and developer-centric, so organizations have learned hard lessons about having mission-critical systems tightly bound to specific employees. Costs and faults shoot through the roof if key personnel are lost, so after decades, most big companies are actively trying to avoid more of this type of work.

The range and quality of in-house code varies widely. Some of it is high quality, but the internal standards are always significantly lower than in any commercial sector. Essentially, in-house work can be very crude and half-completed, but because there is only one real client, and a lot of hands-on operational exposure, this is accepted.

Mostly, for many shops, the "don't reinvent the wheel" philosophy is reasonable. This is the heartland of that expression. In-house developers stay forever or turn over really quickly, but either way the focus is on stitching together a final solution at a high level, not developing a full and complete solution to the problem. The depth of the code base is always very shallow. With the exception of mainframes, the average lifespan of in-house code is short, and often relative to the turnover in employees.


CONSULTING

Because of the obvious problems with companies expending too much effort on custom implementations, only to have their lifespans cut short, companies are eager to avoid in-house development. That would be fine, except that some customizations in code can represent significant competitive advantages in the market. An easy example is the financial sector offering better information access and highly customized reporting to help draw in clients with higher incomes. Tracking is a significant issue for people with a lot of financial assets. Providing this draws in clients.

Over the last few decades consulting companies have risen to take up the challenge of building custom applications for large companies. Their mandate is usually to gather a great mass of requirements and then build a big system. Financially, writing code is only a small part of the revenue, so consultants most often prefer to use libraries wherever possible. Interestingly enough, bugs in underlying libraries are not the annoyance to them that they are to most other programmers.

Most consulting companies work on a "get the foot in the door" philosophy, where their initial goal is just to get the base contract. From there, they want to "widen the crack" and introduce more billing. Scope increase is a great way to do this, but bugs in underlying dependencies also help. You can't blame your consultants if they have to do twice as much work to get around a database bug, can you?

Consulting code is generally neat, organized and well documented, mostly because making it that way increases revenue. Its biggest problem is that there is little incentive to do a great long-term job, so the code is usually cobbled together from a very short-term perspective. Classically poor architecture, for instance. It's far better to just wing it today, and then rewrite it in five years, than it is to get it right for the next twenty or even fifty. Alas, a constraint in consulting is to make it possible to continue consulting.

For consulting, new technologies and a huge number of dependencies fit well into this philosophy. Long design times and short development further reduce failure risks. Most of the systems built are one-offs, so they can be quirky, incomplete systems, but they need to map back to a large amount of paperwork somewhere; the paper, after all, pays better than the code. It's far more visible.

Even if they are expensive and heavily focused on the short term, companies still like consulting-built systems because the employee turnover risk is completely removed. Any of the companies will happily return to fix the system for you, for as long as you want. For a price, of course. It becomes a price vs. risk trade-off.


ASP COMMERCIAL

A huge up-and-coming market is commercial-quality systems hosted in a limited number of places. The market is called Application Service Provider (ASP), an unfortunate acronym clash with a Microsoft technology of the same letters. These are usually companies that write and host their own solutions, covering a full range of services.

Unlike in-house development, the code is often used by a huge number of paying or engaged customers. At the interface level, this demands a much higher degree of sophistication and polish. In-house interfaces can be quirky; few ASP ones can get away with that. Another huge issue is often performance. Some of these systems are clearly the largest systems in the world, hitting levels of complexity that are hard to imagine in many other development sectors. Millions of users have a huge impact: a high load and a lot of feedback.

Without digging into it, you might guess that dependencies are good here as well, but clearly the single largest player in the field, Google, has shown that rewriting everything from scratch is a better idea. The more they write, the more they own, and the more they control. If you control it, you can change and fix it; that's not possible if it's a dependency.

Fewer dependencies are better. They are less expensive to fix and less rigid. When you have more control, you can tackle bigger problems.

Reimplementing the wheel, particularly as a wheel 2.0, is a strong way to gain market share. If you are good with technology, the things you build can surpass most of what is available.


COMMERCIAL PRODUCTS

At an operations level, ASP code gets deployed in a smaller, more personalized environment. The quality of the interface may have to be excellent, but the overall packaging of the code can be sloppy.

The next level up happens when even small problems at an operational level can become huge financial burdens. The most complicated, hardest sector of programming easily belongs to the shipped commercial product arena.

Products have to maintain a higher degree of professionalism and a higher degree of packaging. They also have to have built-in means of distributing fixes for support issues, like patches and upgrades. While it is the hardest level of programming, it is all too easy to find companies, products and even industries that fall entirely short of living up to this expectation.

The market, over the years, has traded reduced expectations for faster implementations. A poor trade-off to be sure, but people get swayed by dancing baloney and forget about frequent crashes.

In a product, a dependency of any type is an unwelcome issue. If it were possible, writing the whole operating system would be best, because it would eliminate any external issues. Microsoft understands this, but Google seems to be cluing in as well.

While it's far too much work for most people, commercial products still have to be extremely vigilant about letting in dependencies. Not only do they cause problems in development, documentation and support, they can also create financial or ownership issues. Raising investment is far easier, for example, if you own a significant amount of the underlying intellectual property (IP) of the product. If it's just a nice set of finishing touches on somebody else's code, it is far easier for competition to enter the market, and thus far riskier to fund the effort.

Another big issue with libraries in the commercial arena is licensing. Some licenses, like the GPL, are totally inaccessible. You can't use them; the code is untouchable. You might be able to weasel around the issue with an ASP implementation, but not in a real product. The vast number of vague, confusing and often-changing licenses is a constant headache for any commercial developer. Reason enough to avoid specific groups of coders.

Strong products control their dependencies, eliminating them wherever possible. If you're running your code on an infinitely large permutation of different configurations, things are ugly enough without having to trust that some other group of developers did it right as well. In cases of extremely high complexity, eliminating any of it, even a small piece, is a priority.


A BETTER IDEA

Hopefully I've made my case that there are times and places where "reimplementing the wheel" is actually crucial to success. The real trick in software development is to not get too attached to any one way of doing things. There is never really a "right" way; there are only the blinders that we put on to convince ourselves that we don't have to think about something. In an industry where thinking and creativity are revered, it's odd that people keep tending in the opposite direction on these types of issues.

If you need a fixed rule to follow, start with the idea that a dependency is never good. It's not something you want, but occasionally it's something that you have to accept. Instead of a negative platitude that forces us to always accept them, we should be looking for strong reasons why we aren't rewriting the code.

Even in consideration of time and schedules, it often makes sense to exploit a library today with the intent of rewriting it tomorrow. We just may have to live with a few more dependencies than are helpful for a while.
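As a rough sketch of how you might keep that "rewrite it tomorrow" option open (this is purely my own illustration, not anything from the post; the TextMatcher and RegexTextMatcher names are made up), one tactic is to quarantine the borrowed library behind a thin interface you own. Here the JDK's built-in regex engine stands in for the dependency; since callers only see the interface, the engine could later be replaced or reimplemented without touching the rest of the system:

// HypotheticalExample: confine a dependency behind a small, owned interface.
import java.util.regex.Pattern;

// The only contract the rest of the code relies on.
interface TextMatcher {
    boolean matches(String input);
}

// Today: delegate to the library we want to exploit for speed of delivery.
final class RegexTextMatcher implements TextMatcher {
    private final Pattern pattern;

    RegexTextMatcher(String expression) {
        this.pattern = Pattern.compile(expression);
    }

    @Override
    public boolean matches(String input) {
        return pattern.matcher(input).matches();
    }
}

// Tomorrow: a hand-rolled matcher (or a different engine) can slot in
// behind the same interface, so removing the dependency is a local change.
public class MatcherDemo {
    public static void main(String[] args) {
        TextMatcher matcher = new RegexTextMatcher("a*b");
        System.out.println(matcher.matches("aaaab"));   // true
        System.out.println(matcher.matches("baaaa"));   // false
    }
}

The regex engine itself isn't the point; the point is that the dependency is confined to one small class, so the decision to keep it or rewrite it stays cheap to revisit.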

Like dead code, we should always be going back, trying to figure out how to remove these things from the project. Of course, some of them are there forever, usually because they encapsulate something super-complex like image handling, or because the sheer amount of added code to implement them would bump the size and complexity up to a new level. Either way, we need to understand why they shouldn't be removed, not vice versa.

In the grandest possible manner, it's not unusual for the complexity of the dependencies to exceed the complexity of the code itself. It's almost always true in a small project, particularly one that ties in a couple of different technologies. It is also sometimes true for large or huge projects.

It is a hugely significant statement to say that the un-encapsulated complexity of all of the dependencies is greater than that of the coding work in a project. It shows where you should be putting your design and testing resources; it also shows how little control you have over the full behavior of the system. Control, particularly in an industry prone to legacy problems or degenerating new versions, can make a big difference to the long-term success of the project. You can't change it if you don't control it.

Monday, September 1, 2008

Some Lessons from Experience

I started by thinking about what a good, strong definition for a software architect would be. That pushed me into reflecting on how I've changed over the years: the things I've learned, the differences in viewpoint, the way I approach problems.

Every so often, it's a great exercise to reflect on our progress; it helps in learning from our experiences. I've been at this software development thing for quite a while, in so many different contexts, that I ought to be able to distill some of that into a few discrete pieces of wisdom. Or at least show how I've changed.

My focus over the last few months has been on deconstructing the various roles that I've had while doing software development. I've been breaking it down into programming, analysis, architecture, vision and management. I do this in the hopes that if we better understand the parts, we'll draw stronger conclusions. When it is all a big giant blur, we tend to ascribe the wrong effects to the wrong causes.

Within the different roles, I want to find a simple sentence to identify my focus. Of course, since the basis for this post is how things have changed for me, my experiences as a junior, intermediate and senior are the central focus.

I thought about setting times on the entries as well, such as the number of years it took me to become an intermediate. But I would guess it's very different for different people, so I don't want anyone to draw the wrong conclusions. It takes as long as it takes to get to the next level, it's not about time, it's about knowledge and experience.

Currently, on a related track, a programmer in our industry is often promoted to a senior position within only five years. But that's somewhat misleading. Programming is just a small piece of software development. It's not being a senior programmer that is important, it is being a senior software developer, and that is a much harder challenge.

I really don't think that anybody, after only five years, really has a grasp on the whole picture. They may know how to code, but the real problems come from being able to see the problems and their solutions, all the way out to deploying them. Vision, analysis, design, deployment and support are all much broader than just programming.

It's a huge amount of knowledge and understanding, which includes grokking people and politics. You're certainly not a senior software developer until you've been doing it for at least a decade, and even after putting in the time, many people will never grow up enough to reach the senior designation. Immature people make for horrible software developers.


ROLES, RESPONSIBILITIES AND FOCUS

For this next section, I'll include a role and my definition of its responsibilities. Then, for each level of experience, a simple sentence that best describes my focus at the time. As a disclaimer, I reserve the right to change my senior viewpoint over time; I don't ever want to be stuck with a static definition for any role.


PROGRAMMER

Programming is the act of taking a design, often in requirements, and turning it into code for a system. As programmers get more experience, the scope of the code gets larger.

Junior: I was struggling with just getting the code to work.

Intermediate: I developed an obsession with the 'right' way and performance.

Senior: I can build it, if I can see it.


ANALYST

All solutions must tie back to making piles of data solve real world issues. Mapping the messy real world onto an abstract mathematical one is the job of an analyst. They need to understand the size, frequency and structure of all of the data, as well as the relative merits of any of the algorithms underlying the user's functionality.

Junior: I still thought that terminology, formulas and definitions mattered.

Intermediate: I saw vague patterns in the data and the functionality, but wasn't quite getting it.

Senior: I see everything (including process) in terms of the underlying structure of information, its size, quality and timeliness. I accept that some things are just irrational.


ARCHITECT

An architect lays out the design of the system to meet the business, technological and environmental constraints of a project. This involves weaving the analysis into the standards and conventions, while overlaying a high-level structure onto the data, the code and the programming teams.

Junior: I was doing bottom-up design, hopefully getting enough pieces for it to work correctly.

Intermediate: I started top-down, breaking the system into major components, then building up enough pieces.

Senior: I have a more holistic approach, drawing the various lines in the project to accommodate the entire process including analysis, development, deployment and support.


VISIONARY

There are a billion problems out there that can be solved with a computer. The person who finds one and suggests a well-rounded solution to meet that need is a visionary. This is not a skill, it's something more, like being able to see the future. It is extremely rare, and is always underestimated in value. Everyone thinks they are visionary, but almost none really are.

Junior: I figured that if I solved a problem, the users would come.

Intermediate: I tried to find a niche and solve the problem, figuring the users would then come.

Senior: I finally understand why the users aren't coming; I've learned to go to them.


MANAGER

If you get enough people together, they get cranky, so someone has to keep them working. Management is absolutely essential, even if it's subtle. Being a good manager of programmers is far more complicated than being a good manager of most other groups. Culture, intellect and a high degree of immaturity make it tough. Patience, understanding, some hand-holding and a firm but fair attitude are essential to keep things on track.

Junior: I figured that if I got a good-enough team together, they would build it.

Intermediate: I figured that if I treated the team well enough, they would build it and it would be good.

Senior: I learned to remove all of the reasons for the team not to build it.


INDIVIDUAL VISION

Just a little side rant that had nowhere else to go. It seems like a good fit right here:

In my above description of a visionary, I left a huge gap for someone to creatively find the problem to be solved. In many ways, this is the very same creativity that so many programmers want to exercise during their development work. Often, at the coding level, programmers want unrestricted ability to put their own heavily creative touch into their works.

We see this again and again, where the programmers in a big company, for example, find a little place in the code and do something interesting and creative. That's great for the programmer, as they may get an enhanced sense of pride in having contributed to the overall project, but honestly it is the very same reason that most of the code coming out of these big companies is hideous, ugly and hard to use.

Dozens of young programmers deviating from the design to put in their own erratic little personal touches does not make for a good system. It makes an ugly, inconsistent mess. It's ironic that all of that extra effort is, in fact, highly negative.

This phenomenon shows up in any large project, commercial or Open Source, when it is no longer being driven from one consistent perspective. For a tool to be good, it must hold together at a higher level to solve the user's problems. Little clever bits of functionality, badly placed, don't help the users. A nice feature that's buried in some non-obvious place is wasted effort. And a system full of too much wasted effort is one that has taken a sharp negative turn. So many systems have turned to mush under the yoke of too much micro-creativity. It's an epidemic.


AND FINALLY

I put my experience in terms of myself, but I'm sure there will be a few people who disagree, as usual. Anytime we try to distill things down to their essence, we slip into subjective territory. Still, the more we decompose things, the better we get. Some day, in time, when all of the bias is removed, this will all be relegated to an introductory textbook somewhere. Until then, comments are expected. Please be nice :-)