Wednesday, December 3, 2008

The Lines of Progress

Some posts come easily; this one, however, was a struggle. Life, it seems, conspires to keep us occupied, making it hard sometimes to find long periods for focusing. And it's an age thing too, I think. One friend of mine believes that thirty is the age cut-off for great inventors, but I tend to favor the idea that our responsibilities grow as we age, along with our expectations. Often the combination far exceeds our abilities. Either way, it sometimes makes it hard to get any writing done.

Without bemoaning fate, in this entry I want to show why Computer Science, as well as many of man's other endeavors, suffers so horribly from our fuzzy perceptions of the world around us. It's as if we are walking through a fog with blinders on. The things that underpin our knowledge are often much weaker than we know or really want to believe, yet until we can accept our knowledge as it is, we cannot fix our understanding. We'll just stay trapped in a world of both abundant truth and abundant falsity, which often balance each other out.

I'm sure that most people see our modern knowledge as extensive and nearly complete; we as a species often have a strange faith that somebody somewhere entirely gets it, even if we are personally no longer able to keep up. That might be the case for some limited branches of knowledge, but it has become increasingly clear that the sum of our understanding easily exceeds even the best and the brightest minds of our time. We collectively know way more than any individual can assimilate. And this overall amount of knowledge is increasing at a rapid pace.

It's in exactly this growth that the real dangers lie. Each new leap is increasingly dependent on what preceded it, and increasingly likely to be built on some falsehoods, even if they are subtle and unintentional. This isn't a problem that just affects a specific domain. It winds its way through all of the sciences, no matter how rigorous our efforts.

We build on a house of cards, which often comes to light when we try to organize our understanding, or if we try to automate it in some way. The details reveal our weaknesses.

It starts close to the very roots of our knowledge. Mathematics is our most abstract, theoretically pure system, far removed from the real world. It sits at the top of all of our knowledge. Below that, science is how we rationalize the world around us. It takes the real world and tries to make sense of it. Computer Science, which sits in the shadowy middle ground between these two, ties elements of the real world to some type of theoretical foundation. Running code is theoretically pure mathematics; what it is calculating is not.

It's because of this positioning that software development is one of the few occupations that demands that its practitioners be able to learn and understand other disciplines. Software developers are frequently interlopers in other bodies of knowledge. We are free to delve into the details of other knowledge bases; in fact we have no choice. We must learn their details in order to complete our own efforts.

Because we build on the other disciplines, we delve deep into their information. But so often, as we try to map this back towards our own purely logical systems, we confront all of the irrational inconsistencies that have been ignored and accepted as conventions. Knowledge in a textbook can lightly skip over a few dubious facts, but once in a computer these issues become glaring problems.


AN ANCIENT PERSPECTIVE

As we age, we trade our inexperience for a diminished focus. We may know more but we have less opportunity to change things, mostly because of our own priorities. It's not that we don't rise to positions of power, but rather that the limitless energy and enthusiasm of youth quietly disappears over time. We're higher in authority, but we choose to do less.

In experiencing this knowledge-versus-energy trade-off, I can't help but think about all of the things that I've learned over the years that are slowly fading away. It's the natural consequence of aging that we start forgetting twenty-year-old details. This makes me realize the fragility of everything I understand. The details fade, but I retain the patterns; the justifications disappear, but I remember the results. Learning fades first at the details, then spreads upwards.

Our knowledge is built on a foundation. In our modern age we have become increasingly rigorous in studying and proving our advances, but that rigor is tightly focused. What we learn from scientific methods is combined with what we understand from the world around us. The melding of our theoretical and empirical understanding, which is necessary because we have to allow for the messiness of the real world, opens up a gaping hole through which the underlying absoluteness of our understanding can become tainted. What might be perfect in a research paper can become suspect as it reaches practice. What we know, even if it is based around islands of proof, is not nearly as correct as we believe.

To really understand this perspective, you have to set yourself in the position of an academic scholar several hundred years ago. Pick a time when it was easily possible to have a nearly complete and full understanding of all things known to mankind. If you studied long and hard enough you would reach a point where you knew all that there was to know, yet from our modern perspective you knew just a proverbial drop in the bucket.

Yes, you understood geometry, philosophy and farming, and the patterns of the stars and the moon and whatnot, but electricity, engines, flight and computers would all be pure magic. Our modern chemically based lifestyle, with its vast array of foods and materials, would seem to be predicated on a mass of unknown information.

Information that would take more than a man's lifespan to learn. So much more detail than one could ever take in. It's not the differences in technology that are so stunning, or the increases in basic issues like human rights and fairness. No, rather it's that we have progressed from a time where a person might have understood most things, to a time where we can barely even keep up with all of the details of our own field, let alone the explosion of science and knowledge that is raining down on us.

As an ancient scholar we could probably cope with the concept of a car, and possibly the highway system. What might be more difficult are the actual details of a gas powered internal combustion engine. The things that are similar to what we observe, we have to accept, but we may choose to explain the underlying details with some other, more convenient explanation.

If you can set yourself back to that ancient time, then you might also be able to set yourself forward about the same distance.

Although to many it may feel like our modern society has opened all of the major doors to most branches of knowledge, there is still much to learn. As ancient scholars we were equally confident of the fullness of our knowledge, but look how vastly we were mistaken. There was a huge amount we didn't know, just waiting for us: cars, flight, and chemistry, for instance.

In a sense we are probably only halfway between that ancient knowledge and the future knowledge that we have yet to discover. There is as much waiting for us to learn as there would have been for the ancient scholar dropped into our time. We know a lot, but really we know so little.


HANGING LAUNDRY

I was recently buried deep in the functionality of a software program, trying to get a sense of how it worked. It was old, maybe twenty or thirty years -- ancient by software standards -- and its behavior was not as expected.

There was something wrong with the underlying algorithms, something pointing to code that was far cruder than anyone would have initially guessed. There are a lot of known algorithms, but this code wasn't matching any of them.

But then, to some degree that is a common problem in software. Dig into something deeply enough and you'll always find a programmer winging it with a crude version of some known algorithm. Worse, sometimes that coder isn't actually a Computer Scientist, but a domain expert who has moved into coding. There may be a multitude of better, faster, more accurate algorithms, but the one used forever in production is lame.

I can't even say anymore how many times I've been digging into the details, only to find a significant collection of systematic yet long-time accepted mistakes underpinning some well-known software. It's far too common.

The industry doesn't really matter either: from round-off errors in finance, to printing errors in marketing, to calculation errors in scientific code, to threading errors in development products, the software we produce has a significant number of problems. Some are known, some just ignored.
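To make the round-off point a bit more concrete, here is a minimal Python sketch. It is my own illustration, not code from any of the systems mentioned here, and it assumes money is being accumulated in binary floating point, which is common but by no means universal. It compares that against Python's standard `decimal` module. The helper names `float_total` and `decimal_total` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def float_total(amount, count):
    """Add the same amount repeatedly using binary floating point."""
    total = 0.0
    for _ in range(count):
        total += amount
    return total

def decimal_total(amount, count):
    """Add the same amount repeatedly using exact decimal arithmetic."""
    total = Decimal("0")
    step = Decimal(amount)  # built from a string, so no binary approximation
    for _ in range(count):
        total += step
    return total

# Ten dimes should make a dollar.
print(float_total(0.10, 10))      # slightly less than 1.0
print(decimal_total("0.10", 10))  # exactly 1.00

# Rounding a half cent: 2.675 is stored as a binary value just under 2.675,
# so the built-in round() goes down, while Decimal rounds the half up as written.
print(round(2.675, 2))
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

Neither error is large, a fraction of a cent here and there, which is exactly why this kind of mistake survives in production: small enough to be ignored, yet it can add up once it is repeated over enough transactions.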

Often it's not even the software's fault. It's not actually a bug per se. I can remember one financial product where the convention for a summary statistic was based around an entrenched hardware bug. The bug, extremely well known, became the basis for the industry convention. It's not an entirely odd occurrence for an industry to bend towards the irrationality of its history. It's always been that way, so why change it?

But it's exactly that digging over the years, in so many different industries, that has opened the door to my seeing the foundations of our real understanding. Or at least to accepting that most of what we're currently doing is guessing. I keep drilling down into the details, only to find that the details are wrong. Incorrect. Broken. Never by a huge amount, but almost always by something.

But whenever I've pushed this back up, the various industries are almost always aware of these errors. Small enough to be ignored, big enough to be significant.

Digging into any domain when building software always involves digging into some industry's dirty laundry.


THE CRACKS THAT BIND US

If you sit on the fence between purity and reality, you quickly find that the cleanness and elegance of abstract mathematics holds an allure: a fascinatingly smooth, clean, black-and-white philosophy for the world around us. Of course, it's completely untrue. The real world is a messy grey place with many more in-betweens than we want, but that region on the border easily highlights the differences.

Whenever my desire for perfection in the real world becomes too strong, I always fall back on my understanding of sidewalks.

In my neighborhood, at the top of the street, the sidewalk has a huge crack running through it. The ground probably shifted sometime after the concrete was laid. Although this bit of reality is in many ways an ugly blight, that particular sidewalk hosts a large number of people going back and forth on a daily basis.

Every day a large number of people walk by. And rarely, no more often than at any other place, does someone trip on the crack. For it seems that however imperfect the crack, it does not in most ways diminish the usefulness or working life of the sidewalk. It continues to serve its purpose, cracked or not.

When I look back at the software problems, although the errors get out there, the industries themselves just tend to route around them. That is, the errors become known, then accepted, then move into being the convention. Often people just take that knowledge as if it were somehow absolutely true and undeniable.


BUGS THAT DON'T BITE

My understanding of the sidewalk always reminds me that much of what we have or need in this world does not have to be perfect. That comes, of course, from having seen software run for decades on top of incorrect calculations or serious bugs that have simply been worked around. Computers crash, software generates bad numbers, hardware burns out, and yet all things continue to exist. We live in a world where reality tolerates a considerable amount of incorrect data, flaws and disasters.

But that is exactly the key to looking toward our future. The details in our knowledge are immense, but they are filled with old wives' tales, myths, misbeliefs, lies, spin, half-truths, and other assorted bits of incorrect or partially incorrect knowledge. Even when we are sure that the things underneath are solid, it is not uncommon to dig deep enough and find serious mistakes.

Consider our earlier perspective of an ancient scholar drifting through life with the full complement of man's knowledge. Apart from man-made items, we as this scholar knew of as many visible things in this world as people do now. Yet for all of these things, say lightning, while the sense was the same -- we can see and hear it -- the underlying explanation was completely different.

Since electricity was unknown, static electricity was too. Thus there was some other theory attached to lightning to explain it, and in our studies we were taught those ideas. It doesn't really matter what we were taught, but it does matter that as time wore on, these explanations grew more and more correct, probably by leaps and bounds, until they came to match the modern explanation for lightning. Lightning always existed, but much like a plant, the explanation for it has been morphing and growing all of these centuries, gradually working its way farther down into the depths of the details.

In a sense, if knowledge isn't getting broader, at least not at this moment, it certainly is getting deeper. Much deeper. But even with that trend, everyone can see lightning, and most people can explain simple elements of it, but how many really understand it?


OPENING THE FLOODGATES

My sense from exploring the other industries has always been that they each have their serious cracks: that the knowledge we have is vast, yet messy and incomplete, and more importantly filled with tonnes of misdirection.

Yes, we've learned how to be rigorous with some of the smaller details, some of the process, but we have no idea how to assemble this knowledge in a rigorous manner. We can collect the knowledge, interpret parts of it, but we cannot stitch it together.

Computers, while being great tools, are also great at showing us the obvious flaws in what we understand. Software isn't hard in concept, but the messiness of our real-world understanding makes it horribly difficult. Systems fail because people grossly misestimate the complexity: complexity that stems from inconsistencies and misunderstandings.

And it's in automating our efforts that we reveal the problems. Only a programmer, for example, really needs to understand exactly how the calculations for financial instruments work. They are so messy and often incorrect. The financial industry has lots of quick cheats and rules of thumb for partially accurate calculations. That's all that is needed to start trading. It's only at the deepest level of detail, where you have to agonize over the tiniest of points, that you really start to see the holes.
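The post doesn't name any particular cheat, so as an assumed stand-in, here is one that is widely used: the Rule of 72, which estimates how many years an investment takes to double by dividing 72 by the annual interest rate in percent. This minimal Python sketch compares it against the exact figure for annual compounding, ln(2) / ln(1 + r). The function names are hypothetical, for illustration only.

```python
import math

def doubling_years_exact(rate_percent):
    """Exact doubling time in years, assuming annual compounding."""
    r = rate_percent / 100.0
    return math.log(2) / math.log(1 + r)

def doubling_years_rule_of_72(rate_percent):
    """The rule of thumb: 72 divided by the rate in percent."""
    return 72 / rate_percent

# Compare the shortcut against the exact value at a few rates.
for rate in (2, 8, 20):
    exact = doubling_years_exact(rate)
    approx = doubling_years_rule_of_72(rate)
    print(f"{rate:>3}%  exact={exact:6.2f}  rule of 72={approx:6.2f}  error={approx - exact:+.2f}")
```

Near 8% the shortcut is almost exact; at low or high rates it drifts by a few percent. That is plenty for a quick estimate, but a program that used it in place of the real formula would quietly carry the error into every number built on top of it.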

Yes, we do know a lot, but we need a way to organize and contain that knowledge. It no longer fits in someone's head. We can't utilize what we know because we don't know how to connect the dots, to bring it all together into something coherent.

Really, if we did, we could build one big, massive, all-inclusive database containing all of man's known facts. Yet while that idea is easy to write down in a blog post, nothing we have in the way of technology can do this. The best we can do is a scheme inspired by the Great Pyramids: throw massive manpower at it. We can create something like Wikipedia, which may seem impressive, but it's not our knowledge that allowed this to happen, it's our vast numbers.


BACK TO THE FUTURE

In many ways it's easy to predict our future. It is, after all, something that Computer Science will have to achieve one day in order to progress into a real science, or perhaps a real engineering discipline. To make things work we must follow an obvious path.

We will have to learn the structure of data and we will have to learn how to combine it. We will have to learn how to sort out the truth, the real stuff, from the masses of misinformation that are swarming all over.

If in the past we saw a trend where our information was getting broader over time -- spanning out -- then in the future, it will be getting deeper. We know the categories, we just need to understand the details.

We won't just guess at what we know, we won't just optimistically collect it in little bits, hoping to find it useful later. We'll know we need it, and what to do with it.

I could always hope that we'll get there soon, but the truth about mankind is that it doesn't really like progress. Sure, we've become addicted to little fiddly electronics like iPods and other toys, but while the advancement of these beads has gone at breakneck speed throughout my lifetime, the really big leaps have been far slower. We shouldn't confuse the consequences of a few major jumps with the jumps themselves.

Of course we have more people putting in more effort, but those population advances have not been radically outpaced by our technological innovations. And we're easily led astray. It's not surprising, for example, to find that whole generations can succumb to easy ideas like Freud's. We, as a species, bend towards the static and easy. Perhaps when we are younger we have the strength and energy to change the world, but time wears us down.

Our next great leaps will come from our dropping the notion that thinking itself is somehow magical; that knowledge mysteriously appears out of nowhere. We've learned to organize physical labor, to control factories, yet we are careful never to apply these industrial ideas to our thinking in other areas. Why?

Oddly, software development, which interacts with so many other disciplines, is the one that has been the most active in trying to stay far away from any attempt to organize our approaches to mental effort. Programmers despise order, organization and methodology.

The very things that we know we'll need in the future are the things that scare us most in the present. And it's not like another discipline will find the answer first; software development is the platform on which all of the others now rely. As we guess, and hack, and flail at our keyboards, they follow our example in their own development. The limits of software now drive research and development.

That, it seems, may actually be an explanation for the high degree of systematic mistakes that pollute our technologies. Domain experts go at building their specific code in the same reckless manner that Computer Scientists have applied to their own discipline for decades.


STILL HAVEN'T FOUND WHAT WE'RE AVOIDING

The things that we need to get to the next step are simple. We have the puzzle pieces, but we need to know what to do with them. We have a little bit of depth, but we need to perfect it, rather than just wrap it in more complexity.

We need to be able to put structure to our data in a way that we know, not guess, is correct. Data forms the heart of our knowledge, and while it's still collected randomly with no provable basis for its structure, it is little more than useless. We have lots of it, but we can't combine it easily and we certainly can't mine it for anything other than trivial known relationships. We talk bravely about exploiting it, but you can't dig a mine in a tar pit.

We need to normalize our code and change our practices to stop wrapping the older crappy stuff. You can't build a technological base on mistakes and rampant complexity. You can't just endlessly whack out new code in the hope that someday it will accidentally be the right stuff. Elegance is not a dream and it's not unachievable. We need, and can have, beautiful code, not as an end in itself, but as the only way to build a foundation stable enough that we can actually leverage it to really increase our knowledge. We need to know what we know, and know how to know more of it.

We need to take the knowledge that we acquire in building today's systems and apply it to tomorrow's. We need to capture our understanding so that we can extend it, not just restart each time with a new technology. Just as industrialization transformed factories, we now need it in our technologies. Software, of all the known industries, is the worst here. We're polarized between ideas based around dysfunctional, decades-old, large-scale manufacturing, which are too static and too large to work properly, and ideas based around cutesy, fun, game-like processes that are oriented more on being popular (to increase book and training sales) and less on actually being functional or improving the process. Old-school dysfunction vs new-school irresponsibility.

Still, for all that we need these things, they are absolutely not what people want, or are looking for. We've continually chosen the opposite, the fast-food concept of knowledge, gorging steadily until we're ready to burst.

Judging from interest, programmers don't want blueprints, they don't want to be able to normalize their code, and they really don't want a methodology that works. They'd much rather go into work and wing it, making wild often incorrect guesses than spend the effort to figure out how to really make it work. Even when they have techniques like database normalization, they'd rather just whip it together on their own, basing their designs on instinct and hunches.

The state of the industry is that most projects fail, because most programmers would rather accept that outcome than change their ways. In a couple of decades we've barely improved from a 15% success rate to a 30% success rate. That says a lot.

The web, of course, documents this. There's far more gossip and fanboy posting than there is fair and honest discussion of a technical nature. And those in the online community are themselves only a small fraction of the currently practicing software industry. Most programmers stay far away from talking about programming. The industry wants to know what magical lines are required to fix a problem, but it doesn't want to know why they are required, or even whether or not the issue should be fixed.


FINAL THOUGHTS

We live in an age where every week you can read about a new scientific breakthrough study that supposedly changes the way we should think about something. But if you read a bit deeper you'll find that some of these studies, often it feels like more than half of them, are sitting on very dubious foundations. They make it to the news not based on their truth, but rather on their newsworthiness. That is, they are entertaining enough to make it to the papers.

That might be OK if these questionable efforts were to disappear, but we're steadily filling our knowledge banks with as much bad information as good. The world of infomercials has overflowed from our TVs right into our research and development. The information age has promoted a kind of madness where rigor appears everywhere and nowhere all at once. Who cares what we know, if it's so padded with crap that we don't know what's true anymore? Proper marketing techniques are replacing proper critical thinking; grant money, after all, has become more important than progress.

In our future -- distant or not, that's our choice -- we'll eventually find the intellectual tools to quickly sort out the underlying truthfulness of what we know. We have to, as we are quickly being swamped by masses of questionable information. This circumstance cannot continue forever. One day people will look back on these as the dark ages: an information explosion perhaps, but one where we were flooded with propaganda and lies.