A few posts ago, I did some wishful thinking about the possible value of having blueprints for software projects:
http://theprogrammersparadox.blogspot.com/2008/05/software-blueprints.html
One of my readers, John Siegrist, became enchanted by the idea and suggested that we start a project to look further into it. To this end, we've put together a wiki at Wetpaint for anyone who is interested to pop by and give us a hand:
http://softblue.wetpaint.com/
One NOTE of CAUTION: many discussions in software end up in polarized arguments based around intractable positions defending one form of technology over another:
http://people.lulu.com/blogs/view_post.php?post_id=33933
I'd like to avoid that for this new site. If you come to visit or to contribute to this project, please come with an "open mind".
An exec once told me: "the dirty little secret of the software industry is that NONE of this stuff works!" That's closer to the truth than I think most people realize. In the end, it all has its share of problems, and we aren't even a fraction of the way there yet, so why dig in and call it finished? If it is to be a real quest for answers, we first have to let go of our prejudices; only then can we objectively see the world.
A LITTLE DEFINING
To get our work going, I figured I would fall way back and try decomposing the essence of software.
We all know, or think we know, what it is, but one of the great aspects of our world is how it allows us to look again and again at the same thing, but from a slightly different perspective. Each time, from each view, we learn a little more about what we see. The details are infinite, and often even the smallest nuances are the cause of major misunderstandings.
My first observation is that the only thing that "software" does is to help us build up "piles" of data.
Hardware may interact with the real world, but software only exists in its own virtual domain. The only thing in that domain that is even remotely tangible is data. Oddly, people like to think that software doesn't physically exist, but in fact it does have a real physical "presence". At minimum it's on a hard disk, if not also active in the RAM of at least one computer. It is physical -- magnetically polarized bits of metal and currents flowing through silicon -- it is just very hard to observe.
The "pile" perspective is important because it gets back to what I think is at the heart of all software: the data.
Most programmers tend to think of software as a set of instructions to carry out some work, but that subtly misleading perspective lures one away from seeing how much more important the data is than the instructions that operate on it.
That is a key point which always helps to simplify one's understanding of how to build complex systems. As masses of "instructions", these systems are extremely complicated, but as "transitions" on various structures of data, they are in fact quite simple. You can convolute the instruction set all you want, but you cannot do that with the data structure snapshots; they are what they are.
That leads nicely into another major observation: all software packages are nothing more than collections of functions that help manage a pile of data.
It doesn't matter if the software consists only of command line utilities, a simple command shell, or a fancy GUI. All of these are just ways of tying some "functions", their "arguments" and a "context" together to display or manipulate some data. It doesn't matter if an actual "user" triggers the function or just another piece of code, it is all the same underneath.
The functionality "gets" some clump of data, applies some type of "manipulation" to it, and then "saves" it.
If it's printing something to the screen, the 'get' probably comes from some data source, the 'manipulation' is to make it pretty, and the 'save' is null. If it is adding new data, the 'get' is null, the 'manipulation' is to fill it out, and the 'save' writes it back to a data source. Updates and deletes are similar.
There is nothing more to a computer program; the rest of it is how this is repeated over and over again within the code.
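To make that concrete, here's a tiny sketch of my own (the file format and function names are purely illustrative) showing the get/manipulate/save pattern, with some of the steps being null:

```python
import json

# A hypothetical sketch of the get -> manipulate -> save pattern.
# Displaying data has a null "save"; adding new data has a null "get".
def show_report(path):
    with open(path) as f:                     # get: read from a data source
        records = json.load(f)
    pretty = "\n".join(f"{r['name']}: {r['total']}" for r in records)
    print(pretty)                             # manipulate: make it presentable; save: null

def add_record(path, name, total):
    record = {"name": name, "total": total}   # get: null; manipulate: fill it out
    try:
        with open(path) as f:
            records = json.load(f)
    except FileNotFoundError:
        records = []
    records.append(record)
    with open(path, "w") as f:                # save: write it back to the data source
        json.dump(records, f)
```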
The source of the data can change. It can be an internal model, a library of some type, a file, or even a direct connection to an external data source like an RDBMS. All of these things are just different representations of the data as they are stored in different mediums. Although the data may be similar, the mediums often impose their own bias on how the data is expressed.
DATA IN THE ABSTRACT
If you freeze time, you'll find that at any given moment all of the data in a computer system is just a set of static data structures.
This includes, of course, the classic structures like linked lists, hash tables, trees, etc. But it also includes simple primitive types, any explicit language structures, and any paradigm-based structures such as objects.
Pretty much any variable in the system is the start of, or is contained in, some internal data structure. Some of these structures can be quite complex, as localized references touch into larger global structures, but at our fixed point in time all of the variables in the system belong to at least one (possibly trivial) data structure.
That is important because we want to decompose all of the "code" in the system as simply being a mapping from one data structure to another. For any two static data structures A and B, a function F in the system simply maps A -> B. In its simplest form, a computer is just a massive swirl of different constantly morphing structures that are shifting in and out of various formats over time. And hey, you thought you were shooting aliens, didn't you?
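A trivial sketch of my own to illustrate: freeze the program at any moment and all you have are static structures; each function is just a mapping from one of them to another.

```python
# Each function below is just a mapping F: A -> B between static structures.
raw_rows = [("homer", 42), ("mary", 17)]            # structure A: a list of tuples

def to_lookup(rows):                                # F: A -> B
    return {name: count for name, count in rows}    # structure B: a dictionary

def to_sorted_names(lookup):                        # G: B -> C
    return sorted(lookup)                           # structure C: a sorted list

print(to_sorted_names(to_lookup(raw_rows)))         # ['homer', 'mary']
```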
Because data is the key, we need to look deeper into it.
Data is always "about" something. It is an instance of something, be that a noun or a verb. If it is a noun, then more than likely the data is about a three-dimensional object in our world. A placeholder for something we know about.
If it is a verb, then it is more than likely related to "time" in some way. The nouns are the objects in our world, while the verbs are the actions.
That makes the verbs interesting because their definition allows for data to exist about some functional transition within the software. I.e. we can store the list of last-run commands from an interactive shell; these become verbs which relate back to the most recent actions of the shell.
That "relationship" is also interesting, but it should not be confused with running code; frozen in time, the data stored -- verb or noun -- is still fixed static data that is set in a structure. The "meaning" of the data may be relative (and very close) to the functioning of the code, but it does not alter what it is underneath, which is static data in a structure. You keep a description of the commands, not the actual commands.
Going backwards and selecting an old command to run is simply a way to fill in the "context" for the run functionality with something that occurred earlier; the computer is not, in any way, shape or form, reconnecting with its earlier results.
HARD LINKING
At times there can seem to be an almost magical tie between pieces of data. However, computers are simple deterministic machines, so there is absolutely nothing special about the way some data is linked to other data.
The only thing unexpected is how, under the covers, the computer may be holding a lot more "contextual" information than most people realize.
To make this massive sea of data structures more functional, users often need to bind one set of data to another. There are pieces of data that, when understood, "link" one piece of data to another. These links come in two basic flavors: explicit and implicit.
The easiest one to understand is an explicit link. It is a connection between two data structures. It can be direct, such as a "reference" or a "pointer", or it can be indirect, bouncing a number of times between intermediate values that bind the two structures together in a given direction. As there is no limit to the number of "hops" in an indirect link, in many systems these can be very complex relationships.
Explicit links are always one way, so two are needed if you intend to travel from either structure to the other. For example, "doubly" linked lists allow you to traverse the list equally well from either direction, while singly linked lists can only be traversed in one direction.
Explicit links form the bread and butter of classical data structures. All of the different instances, such as lists or trees, are just different ways of arranging links to allow for various behaviors during a traversal.
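As a toy sketch of my own (not any particular library), a doubly linked list is nothing but explicit links: each 'next' and 'prev' reference can be followed deterministically in either direction.

```python
# Explicit links: each node holds direct references to its neighbours,
# so traversal is a deterministic walk along the structure.
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None   # explicit link backwards
        self.next = None   # explicit link forwards

def append(tail, value):
    """Attach a new node after 'tail', wiring the links in both directions."""
    node = Node(value)
    node.prev = tail
    if tail is not None:
        tail.next = node
    return node

# Build a tiny list: a <-> b <-> c, then walk it forwards via the 'next' links.
a = Node("a")
b = append(a, "b")
c = append(b, "c")

node = a
while node is not None:
    print(node.value)      # a, b, c
    node = node.next
```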
BEHIND THE CURTAIN
A well-maintained set of explicit links between a series of data structures forms itself into one great big giant structure. All the transformations need to do is traverse the structure based on rules, and then update part of it or build a new structure. It is a very controlled, deterministic behavior. If that were all there was to software, it would be easy, but unfortunately it is not so simple.
An "implicit" link is one that "can be made" between two data structures, allowing some piece of code to traverse from one to the other. The "can be made" part of this sentence hides the real complexity, but first I'll have to detour a little bit.
An algorithm is a deterministic, finite sequence of steps that accomplishes some goal. Well, better than that: it works 100% of the time. A heuristic, on the other hand, in Computer Science is a sequence of steps that "mostly" works. I.e. it works "less than 100%" of the time. It is hugely critical to see the difference between these two: algorithms ALWAYS work, heuristics MOSTLY work.
It is a common misconception to think a particular heuristic is in fact an algorithm, IT IS NOT, and they always need to be treated differently.
If you stick an algorithm in your system, and you debug it, you know that it will always do the right thing. With a heuristic, even after some massive amount of testing and time in production, there is a chance, even if it is minute, that it could still fail. A chance that you must always be aware of, and account for, in your development. There is a huge difference between knowing it is going to work, and thinking that it is working. Even a 0.00000001% chance of failure opens the door as a possibility. Failure doesn't automatically mean that there is a bug.
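A small, made-up illustration of the gap: trial division for primality is an algorithm that always gives the right answer, while the Fermat test is a classic heuristic: fast, but occasionally fooled.

```python
import random

# Algorithm: trial division. Slow, but it ALWAYS gives the right answer.
def is_prime_exact(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# Heuristic: the Fermat test. Fast, and right MOST of the time, but a
# composite like 561 (a Carmichael number) can pass a round and be
# reported as "probably prime"; no guarantee, just good odds.
def is_prime_fermat(n, trials=5):
    if n < 4:
        return n in (2, 3)
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return False       # definitely composite
    return True                # probably prime

print(is_prime_exact(561))               # False: 561 = 3 * 11 * 17
print(is_prime_fermat(561, trials=1))    # often True, even though 561 is composite
```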
For an implicit link, there is some piece of code, somewhere, that can bind together the two ends of the link. Usually in both directions, but not always. Where an explicit link traversal is always an algorithm, an implicit one may not be. It depends on the code, and it depends on the data. Thus it gets very complicated very quickly, especially if you're not careful.
All implicit links are code-based. Common ones include full-text searching, grep, and even some SQL queries: anywhere the 'connection' is dependent on some piece of running code.
So for example, if you have a user Id in your system, going to the database to get the 'row' for that Id is explicit. Going to the database to match a user-name like '%mer%' is implicit. The Id in the system is data that is bound to that specific row, and if it exists, it will absolutely be returned. In the implicit case the wild-cards may or may not match the expected 'set' of entries. You may be expecting only one, and multiple ones could be found. There are lots of different possibilities and the results are not always consistent.
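A rough sketch of that difference, using an in-memory sqlite table with made-up columns: the Id lookup is an explicit link, while the wildcard match is an implicit one made by running code.

```python
import sqlite3

# A hypothetical table, purely for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(1, "homer"), (2, "mary"), (3, "kramer")])

# Explicit link: the Id is bound to one specific row; if it exists, it is returned.
print(db.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone())
# ('homer',)

# Implicit link: the match is made by running code (the LIKE engine), and may
# return zero, one, or many rows; not necessarily the 'set' you expected.
print(db.execute("SELECT name FROM users WHERE name LIKE ?", ("%mer%",)).fetchall())
# [('homer',), ('kramer',)]
```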
There is some variability with implicit links based on text and wild-card searches in a relational database. There is a huge amount of it when using a full-text Internet search tool like Google. In the latter case, the results can change on a frequent basis, so that even two concurrent searches can return different results.
THE TROUBLE WITH TRIBBLES
Explicit links help maintain the deterministic quality that is inherent in a computer. The problem is that they are fragile, and if the data is linked to a huge amount of related data, they can be horrible to keep updated.
Not only that, but they require forethought on the part of the computer programmers to put them in place when the data is collected. Because they are just guessing, software developers rarely know how the data will get used years later; it is generally a surprise. Forgetting to collect data relationships is a common problem.
On the other hand, implicit data is imprecise, messy and can also require massive amounts of CPU. Not only that, but connecting any two non-obvious pieces of data heuristically is an "intelligence" problem, requiring human intervention.
In most modern systems the quality of the data ranges from poor to extremely crappy, which becomes a significant impediment toward implicit links. With explicit links, at least, the quality issues are dealt with at the time the initial data is constructed. Low-quality data can provide an astoundingly strong defence against making implicit links; under some threshold, the linking might as well be random.
WHAT?!
So why is this important in terms of software blueprints? One of the key points I think is important for a blueprint is to be able to specify the behavior of the system in a way that isn't technology dependent. Even more important, it cannot 'ape' the underlying programming paradigm.
Going way back, we were taught to write out pseudo-code before we settled into writing actual code. It was an early form of trying to find a more relaxed way of describing the behavior without having to go into the rigorous details.
The problem with that is that pseudo code is only a mild generalization of the common procedural programming languages of the day. I.e. it was just sloppy C code, and if you were developing in C it was only a half-step away from just writing it.
That means that the effort to pseudo code an algorithm wasn't actually all that far away from the effort of just coding it, so why not just code it?
In some circumstances, alternative perspectives, such as ER diagrams and SQL schemas are useful enough that you might choose to switch back and forth between them. The textual view vs. the graphical view is useful in distinguishing problems, particularly consistency ones.
With C and pseudo code, it is a one way mapping: C -> pseudo code, and the effort is close enough, and the perspective is close enough. There is little value in that.
BEYOND ESSENCE
Construction blueprints are simple 2D representations of 3D objects that contain just enough information that the building can be constructed reliably, but not so much that they take the same time to generate as the building itself. They are a tiny (in comparison) simplified representation of the core details, enough to allow someone to visualize the final results, so that it can be approved, and so it isn't a shock when completed. One person can lay out the essence of the work later done by hundreds.
If a computer system is just a collection of functions that operate on a specific pile of data, then we are looking for a similar higher-level simplified view that can provide just enough information to know the design is good, without requiring enough effort to take years to complete.
I see this, abstractly, as being able to get hold of the main 'chunks' of data in the pile, and listing out only some of the major functionality around them. In a sense, I don't care if the user's name is stored as full-name, first-name/last-name or first-name/middle-name/last-name; these are just 'stylistic' conventions in the schema if they don't have a significant impact on the overall functionality of the system.
If the system holds address information, the specifics of the decomposition are equally unimportant for the blueprints. One big field or a variety of small ones: often the difference is trivial or nearly trivial.
At times, the decomposition in one system may help or impede implicit links with other systems. In these cases, then for that interoperability the specific 'attribute' may be important, but only if one thinks about it ahead of time, or it is some corporate or common standard that needs to be followed.
Not only are the smaller sub-attributes of the specific data not important for the blueprints, but also a significant amount of the overall functionality is unimportant as well.
Who hasn't seen a specification for a system that goes into over-the-top extreme detail about every little administration function, trivial or not? Most often it was a waste of someone's time. Frequently it doesn't even get implemented in the same way it was specified.
In most cases, if you have the system handle a specific type of data, such as users, for instance, there are 'obvious' functions that need to be included. If you store user information, you need to be able to add new users, delete them, and change their information. If you miss any of this functionality initially, you will be forced at some time to add it back in. The only 'real' issue is "who" has access to these functions.
In fact, it is pretty safe to say that for each and every type of data that you allow to be manipulated, in one form or another, you'll need to put in add, delete and modify capabilities. To not do so, means the system will not be 'well-rounded'; it will be only a partial implementation.
While that's a trivial case, the underlying general case is that if you walk a specific 'path' with some set of functionality, you will have to walk the entire path at some point. I.e. if you quickly toss in something to add some new type of data, later the delete and mods will become highly-critical changes that need to be implemented.
If you provide some type of linkage to another system, then a whole host of data synchronization, administration and navigation issues will all show up really fast. More importantly, once you take even one baby step on that path, everyone will be screaming about how you have to take more, and more. Users never let you get away with just a partial implementation, no matter how clever you are in thinking that you can simplify it.
If you open the door and take a couple of steps, you're committed to rounding out the implementation.
From a higher viewpoint, that means that data comes in 'lumps' and functionality comes in 'collections'. So much so, that we could go a long way in a software program with just a simple statement:
"The system will require user accounts and allow them to be managed."
That specifies a lump of data, and a collection of functionality to manage that data.
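As a rough sketch of my own showing how far that one statement goes (the fields here are just illustrative conventions, not part of the blueprint), the lump of data and its collection of management functions fall out almost mechanically:

```python
from dataclasses import dataclass

# Hypothetical expansion of: "The system will require user accounts
# and allow them to be managed." The specific fields are conventions.
@dataclass
class UserAccount:                  # the 'lump' of data
    user_id: str
    name: str
    email: str

class UserAccounts:                 # the 'collection' of functionality
    def __init__(self):
        self._accounts = {}

    def add(self, account: UserAccount):
        self._accounts[account.user_id] = account

    def modify(self, user_id: str, **changes):
        account = self._accounts[user_id]
        for key, value in changes.items():
            setattr(account, key, value)

    def delete(self, user_id: str):
        del self._accounts[user_id]

    def get(self, user_id: str) -> UserAccount:
        return self._accounts[user_id]
```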
If the system is a Web 2.0 social site, the "unique id" for the users may be an email account, if it is more of an older 'main-frame' style, it is probably a centrally administered mangling of the user's name.
The specification gives a general idea for the software; "convention" sets the specific data and functions. The blueprints are some way of having all relevant parties understand that the software requires the users to log in with some type of account information. What that really means is up to the underlying technology and the driving conventions. The blueprints are higher than both.
JUST ABOUT DONE
There is a significant benefit in being able to decompose software into its essence. If we look beyond technologies, techniques and paradigms into the heart of what we are doing, we can find the types of common patterns that allow us to frame our work in a more meaningful way. And, as in the case of our search for blueprints, we can use that higher-level perspective to help us decide what insignificant 'details' we can drop from the blueprints that won't "radically" change the implementation.
Blueprints help going both forward and back.
In front, they help us leverage an experienced architect's ability to design and implement large complex systems; behind, they allow architects following in the same footsteps to take something they know works and extend it. We can, and should, learn from our past, and apply that to our future. A reasonable thing that most other disciplines already do as part of their normal process.
Speculating, I also tend to think that if we find some way of creating good software blueprints, a side effect of the format will be that the perspective becomes data-centric, not function-centric.
While data is not the intrinsic way to view systems, it is far, far easier, and has been at the heart of movements like object-orientation. It is just that it never seems to really catch on, or it becomes perverted back to being functionally-oriented.
If blueprints help to teach programmers to "see" their systems by the data first, then they will inevitably help to simplify them. In an industry where companies routinely build tens of millions of lines of code to solve problems that only require a fraction of that size, we need significant help in learning how to create elegant solutions, not brute force ones.
The explicit complexity of a system should not significantly exceed what a programmer can fully understand in a lifetime; once it does, we are clearly past some reasonable threshold. Intellectual tools like abstraction should be applied to compact our knowledge into some form that is more manageable. Especially if our lives and finances depend on it.
We are at that fork in the road where we can decide (again) whether we want to continue down the brute force path, or whether we would like to shift to a higher less painful route. Personally, I'm hoping to avoid more pain.
Saturday, May 31, 2008
Tuesday, May 20, 2008
Jurassic Office Park
As I carefully peer around the giant palm leaf -- a great fan spreading out before me -- I keep my eyes facing forward. I am desperately in search of any sudden movement. The lush green vegetation closes in, providing some feeling of safety if not also hiding my bulk from any unwanted observers. In front, the watering hole opens like a chunk torn from the forest canopy.
I am -- in this daydream -- a herbivore of some unknown type. My size, shape and girth are perhaps issues that I cannot fully grasp yet. I cannot know, and it is not really helpful to ask for assistance in that type of self-analysis anyways. Ultimately I suppose it doesn't matter; big or small, fast or slow, I am what I am, and in this case I am just another dinosaur plodding around an ancient, thickly overgrown jungle. One of many.
My problem, it would seem, is that endless sense of being hunted. One that I cannot get away from. You have to drink to live, and I always feel so vulnerable when succumbing to my needs. Around the watering hole lurks trouble, big trouble. I'd avoid it, but ultimately I have no choice in the matter.
You see, in this current perspective of mine, built around a wild analogy, I am a producer. Well, to be fair, more of a thinker, a sort of lowly worker bee. You know, one of those poor souls that just builds things, day after day.
This particular analogy -- as any good black and white idealist view of the world portrays -- sets up the creatures in its realm as either thinkers or doers, producers or consumers. You know, those people that do things for a living, and those that make lists of things to do. It's a fairly common breakdown. For some it's about getting it done, and for others it is about saying it is done. The split is about how you spend your day: doing things, or telling people to do things.
Those dinosaurs that live entirely on plant matter are clearly herbivores. They take from the world around them and that sustains them. They work hard to extract the nutrients from the vegetation. They focus on something, going at it day after day, working their way towards getting it done. In that way, they 'build' up their presence. They do things. They form the bulk of the food chain, and maintain a balance with the vegetation, which on its own would erupt into chaos. Herbivores are the staple on which everything runs.
The other guys, you ask? Why, carnivores of course. They live by virtue of hunting and consuming herbivores, living off our backs, so to speak. They come in a wide range of sizes and appetites, but one thing is certain: for most herbivores the presence of a carnivore is a bad thing. Well, mostly; in my analogy, I guess, as a herbivore there are endless instances of yourself, enough to feed an army of carnivores. Thus they can come back again and again to refresh their needs, all at your expense. Lucky us.
Encounters are draining, but ultimately herbivores keep on munching until they retire. They have to, it is an endless cycle. An unlucky herbivore will face many encounters with carnivores of all shapes and sizes. That's just how it goes.
Herbivores are big and slow, while carnivores are culturally often called A-types, you know those fast moving persons that are always on the prowl. Shifting around the office, looking for their next encounter, their next big thing to 'get done'. The entrepreneurs, the executives, the politicians, all of them 'doers'. "Get it done at any cost", goes the refrain. "Just do it" is the commercial version. We live in a period where the end justifies everything, just so long as you don't get caught. "Carnivore" captures that mentality perfectly. It's all about consumption, isn't it?
Money makes the world go round, and carnivores are the ones living off the spinning. Being at the top of the food chain, one good kill is enough to rest on for quite a while. Herbivores face an endless destiny grazing the vegetation and chewing it down into energy; carnivores come along and reap the benefits. It's a brutal world in this analogy, not unlike the one out there.
I like this perspective, crazy as it is, because it explains why it is that carnivores always believe that they are above everyone else. It is not, as they would like us to believe, that they are working harder than the rest of us in processing the vegetation. Nope. No crappy lettuce for them. Raw meat has way more nutrients. But you have to meet enough of them, trying to justify why they're better and more important than the herbivores, to get the full appreciation of their ego, hunger and falsehoods. Whether you are in the room with a T-Rex, or just a newbie raptor, you are never far away from the platitudes and the drooling. Their famous last words, for you at least, always seem to be "trust me". Delivered with a hint of steak-sauce-breath; that's when you "know" you are in trouble. Chomp!
Most of the really clever carnivores look to stake territory and corral a clump of herbivores to sustain themselves. The realistic herbivores see this as a bargain with the devil; the foolish ones think it's some form of partnership. Carnivores, it seems, never miss a chance to feed, unless of course the herbivore is far too puny to be worth the effort. Not, as you might have guessed, the best foundation for a partnership, is it?
Carnivores can hunt individually or in packs, and they can be surprisingly pleasant when they are trying to lure you in. You are, after all, their meal-ticket, generally prompting a great burst of effort on which they can sit back and relax for a while. Herbivore resumes talk a lot about hanging around the same watering hole year after year, while carnivores roam over vast distances. Often carnivores overfeed on an area, forcing themselves to move on to the next feeding ground. Sometimes it's just competition from others that sets them forth. They rarely like to 'stick it out'. If you wait long enough, they tend to go away.
I make a dash for the water, the hairs on the back of my neck standing high. Whoosh, another brief encounter with a carnivore, but this time I am lucky. I manage to get back into the canopy just in time to avoid losing too much of myself.
Getting sick of being stuck down on the food chain, I once tried the omnivore thing. If you are a herbivore, you can never really leave your nature. You're always drawn back to compassion, to doing things for yourself, to not wanting to bully and yell at other people, and to trying to find the time to get it right, just for the sake of getting it right. But that constant life on the run, feeding a dizzying array of mad chomping machines, with all of their cars, their mansions and their great big piles of treasure, makes one at least peer enviously up the chain towards the next level. So I crossed the line, jumped in, and did a brief omnivore phase. I tried. But in the end, all it did was give me indigestion. I should have guessed. Some things were not meant to be.
After that I've come to accept my grazing and constant chomping. I am -- I admit -- a herbivore, and I don't really see that about to change anytime soon. My fate is a lifetime of grass and a whole lot of quick dashes in hopes of saving my ass from being consumed. It's not great, I would like a mansion and a Ferrari, but if I manage to pay the bills, eat reasonably well and avoid starving in my old age, I'll have to consider my life to have gone pretty well.
Since I'm not about to grow fangs and claws, I guess the only question is: how large a herbivore am I? Something huge like a stegosaurus or even a massive brontosaurus? Or am I just one of those little ones, you know, the ones with the names you always quickly forget at the museum, like piddly-little-saurus or squished-a-saurus or something tiny like that? If I figure it out, I'll let you know someday, just as soon as I've dashed for more water ...
Wednesday, May 14, 2008
Hard Code'n
I think that at this stage in our industry, it is important to differentiate between several key, yet very different parts of the software development process. Specifically, I see a huge difference between "software development", which includes design, development and deployment of software, and "programming", which is focused on completing a set of instructions in a computer language to implement some functionality. One is the all encompassing act of creating software including every aspect from beginning to end, while the other is a very specific subset of the process that focuses on writing code to implement some set of algorithms.
In many ways I see this division as similar to accounting vs. bookkeeping. Bookkeeping is an important part of accounting, but it doesn't necessarily have to be handled by a fully-trained accountant, in fact most bookkeepers I know are not accredited accountants. Accounting includes far more than bookkeeping, but bookkeeping is an essential part of it. There is even a "higher" side of management accounting, which still deals with the science, yet only at a very high management level.
CLEVER DEFINITIONS
If I am going to split one out from the other, I need to carefully define them or risk the wrath of the net (or even worse, silence). I see programmers as taking descriptions of functionality and making them into code. Software developers, on the other hand, analyse specific user domain problems and then design and implement solutions to aid the users in building up their ever increasing piles of data. In that way, programming is just a tiny part of the overall software development. It happens somewhere in the middle. It is the process of "encoding" some functionality into a language as a long set of instructions, and doing some testing/fixing to make sure it works. Everything else is software development.
I'm well aware of how our industry and programming culture love to mix together analysis, design, requirements, coding and testing all into one giant lump; for most people these are one and the same operation. Just another day at the office.
I think that mixing these together is a huge mistake because the skill sets are very different from each other. Not to trivialize it, but programming -- such as implementing a function to calculate the Fibonacci sequence -- is reasonably well-understood and well-established. Depending on the functionality, there either exists an algorithm or there doesn't. At worst, implementing the functionality may require the use of several different algorithms, all combined together or modified slightly. Generally, for most types of code, examples already exist and can be modified to fit. The problem of function -> code can have its challenging moments, but ultimately it is not a hard problem if you know what you are building. The key is in knowing.
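For instance, a Fibonacci function is exactly the kind of already-settled problem I mean; the 'coding' part is just writing it down (a minimal sketch):

```python
# An iterative Fibonacci -- well-understood, well-established, nothing clever.
def fibonacci(n: int) -> int:
    """Return the n-th Fibonacci number (0, 1, 1, 2, 3, ...)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    current, following = 0, 1
    for _ in range(n):
        current, following = following, current + following
    return current

print([fibonacci(i) for i in range(8)])   # [0, 1, 1, 2, 3, 5, 8, 13]
```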
That is why I really like this differentiation. You see, for all of the software developers out there when they are discussing whether or not it is an art or a craft, or intrinsically hard, etc. what they tend to do is blur the line between analysis and programming. Analysis is hard because we don't know what the users need or what will actually work, but once having settled on a specific algorithm, "coding" it is not all that challenging. Sometimes it involves a bit of research (or should), but after that it's just work.
WHERE THE TROUBLE BEGINS
Well, sometimes. If you sit on enough development teams you quickly come to realize that many programmers have serious weaknesses. We, as techies, love the intricacies of tiny machinery like watches. All those little dials and gears and little things appeal to most programmers at a low level. Not surprisingly, our single greatest problem while programming is the tendency to "over-complicate" our solutions. We drift towards pedantic, complex solutions that come from over-thinking the problem. We like the fiddly bits, so we add them wherever possible. We are also "option" happy, adding in tonnes of them that never get used.
You'll see it so often in most code: tonnes of unnecessary variables, conditions and loops. Redundant copies, extra layers of handling, and buckets of "glue" code. Fiddly little bits on diagrams, excessive casting, big ugly useless comments, 2-inch-thick designs or manuals, etc. Even programmers who love to call themselves lazy will frequently implement 5,000 lines of code when a mere 200 might do.
It is an epidemic problem with programmers, and I've never met one that wasn't guilty in some form or another. If you think you don't do it, then you've probably not been coding long or hard enough; the simplest, most elegant answer is far simpler and more elegant than most programmers have even begun to realize.
Not only do we constantly over-shoot the code, we also build intricate and complex solutions that drive our users nuts. They're often looking for a quick, simple solution, and instead we've built some monolithic, all-encompassing, power-hungry solution where even the simplest bit requires the memorization of masses of new terminology and a three-week course on how to apply it. Manuals, they like to say, are only there to document the design flaws. A reasonable viewpoint, I think.
TRANSFORMERS ARE MORE THAN JUST ROBOTS
Getting back to programming. If indeed you understand the steps necessary to implement your specific functionality, then it is not a particularly hard endeavour. In the end, for most languages, it's some number of variables, a clump of conditions and a few loops, the fewer the better. Programmers "love" to dive into writing some complex code, but most often it's either really simple and straight-forward, or there is a well-known algorithm to handle it. Most code is just tying things together and converting between one physical structure of the data and another. These days, the really complex stuff is buried in libraries, far away from most programmers' hands or eyes.
Even more simply, you can see any type of functionality as a transformation on some data. That makes it almost trivial: the data exists in the system or it needs to be loaded, then some algorithm is applied to transform it into some other structure. Then it is saved and/or written out. Programming, from that perspective, is not particularly complex; unless we choose to make it so.
When it is complicated, we tend to find really simple reasons why that is true. The most common is that the programmer is making it too complicated; either they've misunderstood the problem or they've misunderstood how the tools work. I've seen enough programmers "flailing" at their keyboards over the years. There is some abstract aspect to programming that some people just never grasp, while others have to work hard to get better at it. Mostly, I think it's some type of anxiety, where people "think" that the problem is hard, so they skip right past the simple solution and start making it really complicated. A kind of programming fear-of-failure delusion. "It just can't be 'that' simple" we like to tell ourselves.
There are many people afflicted with this type of problem, but fear not if you are one: for most, coding gets simpler and easier with practice. The real trick is to keep going back and "simplifying" the code, not "adding" to it. E.g. if it doesn't work, don't try to "add in" more logic; instead start stripping it away until it is smaller and simpler. Removing code is the best tool for debugging. It may seem like a slower approach, but it is way, way faster than flailing at it. I had a boss once who taught me by leaning over my shoulder and hitting the delete key over and over again. He'd nuke it and make me type it in again. It was the best programming lesson I ever learned (by the third time, you really get it).
FAR TOO CLEVER
Beyond intricate, some programmers gravitate to "clever". They get pulled into really clever ideas that seem like they are going to work really well. Well, at first they seem great. The problem with clever is that it is an extremely "low" level of working. Clever is not simple, in fact it is nearly the opposite. It's a little bit of concentrated complexity all nicely bundled up into a neat programming package. That might work for writing, but it's the type of thing that you come back to "months" later and instantly regret.
Clever, you see, is just a waste of time at some point in the future. The problem is that to get to something clever, you probably had some cool inspiration. A light went off in your head, or a neat idea popped up in your mind. That's great, but it's not the normal way of thinking. Generally that produces a type of compressed complexity, a neatly packaged clever idea. That makes it a land-mine waiting to get stepped on.
Someone can easily mistake the point or functioning of the code, and in all likelihood, unless you're lucky enough to get fired, someday, at some point, when you least expect it, you'll have to go back in a rush and try to fix some stupid problem. That, by the way, is always the case with clever. You are essentially just setting yourself up, aren't you?
Given that, however, "abstraction" is not clever. It is a generalization of the purpose of the code, not some cute little syntax trick or something else tricky. When I say clever is bad, sometimes people take that to mean that "brute force" is good, but that's hardly what I mean either. Pounding out each and every instruction is a huge waste of time, and it's hard to maintain. Brute force is too specific and too large. Clever is too compressed; it took longer to write it, and it's a land-mine.
Good simple short code -- the definition of elegance -- that works at a reasonable level of abstraction so that it can be leveraged, is what does the best for the long term goals of a software development project. A great programmer is someone who can take a hard problem and make the resulting code look simple. It should be so obvious that it doesn't look like a lot of work.
FUNCTIONALLY FLAWED
Another really common problem draws its strength from our unfortunate desire to see programming as an 'art form'. You meet enough programmers who don't want to be engineers, so they don't want any process of any kind. Worse still, they want the creative "right" to pick a new and unique way of solving each problem, each time. Even if it's the same problem over and over again.
And so, by their inconsistency and lack of structure, they create around themselves an ever increasing vortex of complexity. Mostly you see this with the cowboys, and their fast, yet dangerous band-aid approaches. Cut-and-pasters are another entertaining variety.
It's quick, it's fluid, it works for a while, but like any continual short-term strategy it builds up to the point where it becomes an uncontrollable nightmare.
Fundamentally software development is engineering. We are building something, and we do need to balance out the long-term work with the short-term pressures. Software is saved by the fact that its total ugliness is not visible (if it were there would be a lot of "fired" programmers), but that doesn't mean the effects won't be visible. When you are building "anything" you can only cheat for so long before it becomes unworkable. Sure, if it is a short "assembly" job of combining some pieces together to whack out a simple application for a couple of months, you can get away with a huge number of short-cuts, but once it becomes a multi-year, multi-developer project, each and every short-cut (even the ones that you don't think are actually short-cuts) builds up.
If and when they build up enough, they account for a significant number of project failures. Sadly, "sloppy process" failures are entirely preventable, but only by people who understand them.
BIG BALL OF MUD
If it is not the programmer, or the chaos then it is the functionality itself. It's either poorly specified, or perhaps even just a really "bad" idea. The real trouble in programming doesn't come from feeding in lists of instructions into an abstract machine for execution. Nope. It comes from tying that back to the "real world".
People are irrational, messy and the source of huge problems. If the functionality is not well-defined or it is not "workable", the core reasons behind it almost always come down to people; whether it be limited thinking, politics or egos, it doesn't really matter, it's all the same.
All software ultimately is for people to use, so it is actually easy to get the functionality back onto the right track: "pick something simple". Then specify it, in some format that makes it easy to see if it's complete or not. From there, it's just back to programming.
Once in a while, in order to get the system running, the core contains something extremely complex. Generally this is some type of engine or parser or processor, or something extremely hard. The really heavy-duty programming can be tough, particularly if it is breaking new ground, but it rarely accounts for even a significant percentage of the overall system. Writing good heavyweight code generally involves a strong understanding of some complex discipline or the actual problem domain. Ultimately though, even the most complex "engine" breaks down into a large number of simple functions. The trick is not writing the pieces, it is getting them all to work together in some intricate, yet simple and elegant solution, a problem which is clearly "architectural" in nature and not really programming.
What hooks a lot of people is that they tackle complex functionality without considering architecture, so the result is a lot of hit-and-miss attempts to get it all working together properly. If you build the mechanics into the architecture at the general level, then the lower levels are just specific algorithms to transform data from one stage in the process into another. The code doesn't really fail; it's the architecture that convolutes the process and makes it messy.
No architecture? No wonder you're having problems. You wouldn't build a house without first designing the internal frames, so why wouldn't you do the same for your code?
THE LAST FEW HANDS
Programming, then, by itself, is relatively simple. That's hardly surprising as you find that in a lot of specific problem domains, many of the programmers are actually domain experts, not Computer Scientists. You don't need a Computer Science degree to write code. In a very real sense, that is why it is closely aligned with bookkeeping, even though I realize that a lot of people might take offense at that comparison. But, like it or not, great reams of domain-specific code are easily written by other disciplines. And, even more horrifying to admit, for your basic bread-and-butter medium-weight programming work, a degree in Computer Science is overkill. You don't have to know about Turing machines to create a screen in a GUI to accept human resources data. You don't need to understand the halting problem to write a social web-app. The expressibility of SQL; does it really matter?
These things have their place in software development, but not necessarily in most programming; they usually only come into play in the core of the technical aspect of a solution, something that is generally wrapped in a framework or infrastructure.
Programming still has its moments when time is tight and you are having trouble focusing, but for most people, after about five years of steady coding, it mostly becomes instinctual. I know, there are still readers out there that have been at it a longer time and are still struggling, but if they are fair about why they are struggling, the reasons come down to not really knowing what they are building, as they are building it. It's personal, architectural or analysis, not programming. Really it's a bigger problem.
Software development, on the other hand is extremely young, completely unfinished, and extremely complex. It's the type of thing that people just don't get, and is really hard, even at its simplest level.
You learn this, intrinsically, when you end up in meetings with users who are insisting that the software work in a specific way, while you are quite aware that it is impossible. Not just difficult, but completely and utterly impossible. Yet, it becomes very difficult to explain why it won't work. The certainty is there from experience, yet the ability to simplify it and pass that knowledge on to someone else is lacking.
In that overlap between people and mathematics, the grey area is a largely unexplored, unknown world of fantastically complex problems that we haven't even begun to enunciate yet, let alone tackle. We are missing at least one, if not many, different sciences that make up the knowledge needed to build "reliable" complex systems. We're pretty much guessing at it right now, when we should be far more knowledgeable about what works and what doesn't.
Still, while there are many great problems left to solve in Computer Science, and there is still a whole 'process' left to create to solve the ongoing "software crisis", the act of programming is not among the key problems. Our biggest issue with programming is constantly confusing the issues, and trying to fit a one-size-fits-all approach to unify programming and software development. Getting back to my initial point, if you see them as different, then it becomes easier to see and deal with their own unique issues. A bit of structure can be a grand thing.
In many ways I see this division as similar to accounting vs. bookkeeping. Bookkeeping is an important part of accounting, but it doesn't necessarily have to be handled by a fully-trained accountant, in fact most bookkeepers I know are not accredited accountants. Accounting includes far more than bookkeeping, but bookkeeping is an essential part of it. There is even a "higher" side of management accounting, which still deals with the science, yet only at a very high management level.
CLEVER DEFINITIONS
If I am going to split one out from the other I need to carefully define them or risk the wrath of the net (or even worse, silence). I see programmers as taking descriptions of functionality and making them into code. Software developers on the other hand, analyse specific user domain problems and then design and implement solutions to aide the users in building up their ever increasing piles of data. In that way, programming is just a tiny part of the overall software development. It happens somewhere in the middle. It is the process of "encoding" some functionality into a language as a long set of instructions and doing some testing/fixing to make sure it works. Everything else is software development.
I'm well aware of how our industry and programming culture love to mix together analysis, design, requirements, coding and testing all into one giant lump; for most people these are one in the same operation. Just another day at the office.
I think that mixing these together is a huge mistake because the skill sets are very different from each other. Not to trivialized it, but programming -- such as implementing a function to calculate the Fibonacci sequence -- is reasonably well-understood and well-established. Depending on the functionality, there exists an algorithm or not. At worst, implementing the functionality may require the use of several different algorithms all combined together or modified slightly. Generally, for most types of code, examples already exist and can be modified to fit. The problem of function -> code can have it challenging moments, but ultimately it is not a hard problem if you know what you are building. The key is in knowing.
That is why I really like this differentiation. You see, for all of the software developers out there when they are discussing whether or not it is an art or a craft, or intrinsically hard, etc. what they tend to do is blur the line between analysis and programming. Analysis is hard because we don't know what the users need or what will actually work, but once having settled on a specific algorithm, "coding" it is not all that challenging. Sometimes it involves a bit of research (or should), but after that it's just work.
WHERE THE TROUBLE BEGINS
Well, sometimes. If you sit on enough development teams you quickly come to realize that many programmers have serious weaknesses. We, as techies, love the intricacies of tiny machinery like watches. All those little dials and gears appeal to most programmers at a low level. Not surprisingly, our single greatest problem while programming is the tendency to "over-complicate" our solutions. We drift towards pedantic, complex solutions that come from over-thinking the problem. We like the fiddly bits, so we add them wherever possible. We are also "option" happy, adding in tonnes of them that never get used.
You'll see it constantly in most code: tonnes of unnecessary variables, conditions and loops. Redundant copies, extra layers of handling, and buckets of "glue" code. Fiddly little bits on diagrams, excessive casting, big ugly useless comments, 2-inch-thick designs or manuals, etc. Even programmers who love to call themselves lazy will frequently implement 5,000 lines of code when a mere 200 might do.
It is an epidemic problem with programmers, and I've never met one that wasn't guilty in some form or another. If you think you don't do it, then you've probably not been coding long or hard enough; the simplest, most elegant answer is far simpler and more elegant than most programmers have even begun to realize.
Not only do we constantly over-shoot the code, we also build intricate and complex solutions that drive our users nuts. They're often looking for a quick, simple solution, and instead we've built some monolithic, all-encompassing, power-hungry system where even the simplest bit requires the memorization of masses of new terminology and a three-week course on how to apply it. Manuals, they like to say, are only there to document the design flaws. A reasonable viewpoint, I think.
TRANSFORMERS ARE MORE THAN JUST ROBOTS
Getting back to programming: if you truly understand the steps necessary to implement your specific functionality, then it is not a particularly hard endeavour. In the end, for most languages, it's some number of variables, a clump of conditions and a few loops, the fewer the better. Programmers "love" to dive into writing some complex code, but most often it's either really simple and straightforward, or there is a well-known algorithm to handle it. Most code is just tying things together and converting between one physical structure of the data and another. These days, the really complex stuff is buried in libraries, far away from most programmers' hands or eyes.
Even more simply, you can see any type of functionality as a transformation on some data. That makes it almost trivial: the data exists in the system or it needs to be loaded, then some algorithm is applied to transform it into some other structure. Then it is saved and/or written out. Programming, from that perspective, is not particularly complex, unless we choose to make it so.
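As a minimal sketch of that load-transform-save shape (a hypothetical example in Python, with made-up file names and a made-up record format), nearly every piece of functionality is some variation of these three steps:

    import csv
    import json

    def load(path):
        # the data either already exists in the system or gets loaded from somewhere
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # some algorithm re-shapes it into another structure
        return {row["id"]: float(row["amount"]) for row in rows}

    def save(totals, path):
        # then it is saved and/or written out
        with open(path, "w") as f:
            json.dump(totals, f, indent=2)

    save(transform(load("input.csv")), "totals.json")

The interesting decisions are in what the transformation should be, not in the mechanics of writing it.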
When it is complicated, we tend to find really simple reasons why that is true. The most common is that the programmer is making it too complicated: either they've misunderstood the problem or they've misunderstood how the tools work. I've seen enough programmers "flailing" at their keyboards over the years. There is some abstract aspect to programming that some people just never grasp, while others have to work hard to get better at it. Mostly, I think it's some type of anxiety, where people "think" that the problem is hard, so they skip right past the simple solution and start making it really complicated. A kind of programming fear-of-failure delusion. "It just can't be 'that' simple," we like to tell ourselves.
There are many people afflicted with this type of problem, but fear not if you are one of them: for most people coding gets simpler and easier with practice. The real trick is to keep going back and "simplifying" the code, not "adding" to it. E.g. if it doesn't work, don't try to "add in" more logic; instead start stripping it away until it is smaller and simpler. Removing code is the best tool for debugging. It may seem like a slower approach, but it is way, way faster than flailing at it. I had a boss once who taught me by leaning over my shoulder and hitting the delete key over and over again. He'd nuke it and make me type it in again. It was the best programming lesson I ever learned (by the third time, you really get it).
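As a contrived, hypothetical before-and-after in Python, here is the same check written with layers of "just in case" logic, and then again with everything unnecessary stripped away:

    # over-built: extra flags, nested conditions and redundant branches
    def is_adult_overbuilt(person):
        result = False
        if person is not None:
            age = person.get("age", None)
            if age is not None:
                if isinstance(age, (int, float)):
                    if age >= 18:
                        result = True
                    else:
                        result = False
        return result

    # stripped down: the same answer for well-formed input, with the noise removed
    def is_adult(person):
        return person.get("age", 0) >= 18

    print(is_adult({"age": 21}))   # True

When the over-built version misbehaves, the temptation is to add yet another condition; deleting until it looks like the second version is usually the faster path.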
FAR TOO CLEVER
Beyond intricate, some programmers gravitate to "clever". They get pulled into really clever ideas that seem like they are going to work really well. Well, at first they seem great. The problem with clever is that it is an extremely "low" level of working. Clever is not simple, in fact it is nearly the opposite. It's a little bit of concentrated complexity all nicely bundled up into a neat programming package. That might work for writing, but it's the type of thing that you come back to "months" later and instantly regret.
Clever, you see, is just a waste of time at some point in the future. The problem is that to get to something clever, you probably had some cool inspiration. A light went on in your head, or a neat idea popped up in your mind. That's great, but it's not the normal way of thinking. Generally it produces a type of compressed complexity, a neatly packaged clever idea. That makes it a land-mine waiting to get stepped on.
Someone can easily mistake the point or functioning of the code, and in all likelihood, unless you're lucky enough to get fired, someday, at some point, when you least expect it, you'll have to go back in a rush and try to fix some stupid problem. That, by the way, is always the case with clever. You are essentially just setting yourself up, aren't you?
Given that, however, "abstraction" is not clever. It is a generalization of the purpose of the code, not some cute little syntax trick or something else tricky. When I say clever is bad, sometimes people take that to mean that "brute force" is good, but that's hardly what I mean either. Pounding out each and every instruction is a huge waste of time, and it's hard to maintain. Brute force is too specific and too large. Clever is too compressed: it took longer to write, and it's a land-mine.
Good, simple, short code -- the definition of elegance -- that works at a reasonable level of abstraction so that it can be leveraged, is what best serves the long-term goals of a software development project. A great programmer is someone who can take a hard problem and make the resulting code look simple. It should be so obvious that it doesn't look like a lot of work.
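A tiny, hypothetical illustration of the three flavours in Python -- clever, brute force, and a reasonable abstraction -- all computing the same thing, the sum of the even numbers in a list:

    from functools import reduce

    data = [3, 8, 2, 5, 10]

    # "clever": compressed complexity, a land-mine for the next reader
    total = reduce(lambda t, x: t + x * (1 - x % 2), data, 0)

    # "brute force": each and every step pounded out, too specific and too large
    total = 0
    i = 0
    while i < len(data):
        value = data[i]
        if value % 2 == 0:
            total = total + value
        i = i + 1

    # a reasonable abstraction: short, obvious, and reusable for other conditions
    def sum_matching(values, keep):
        return sum(v for v in values if keep(v))

    total = sum_matching(data, lambda v: v % 2 == 0)

All three produce 20; only the last one reads as though it wasn't a lot of work.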
FUNCTIONALLY FLAWED
Another really common problem draws its strength from our unfortunate desire to see programming as an 'art form'. You meet enough programmers who don't want to be engineers, so they don't want any process of any kind. Worse still, they want the creative "right" to pick a new and unique way of solving each problem, each time. Even if it's the same problem over and over again.
And so, by their inconsistency and lack of structure, they create around themselves an ever-increasing vortex of complexity. Mostly you see this with the cowboys, and their fast yet dangerous band-aid approaches. Cut-and-pasters are another entertaining variety.
It's quick, it's fluid, it works for a while, but like any continual short-term strategy it builds up to the point where it becomes an uncontrollable nightmare.
Fundamentally software development is engineering. We are building something, and we do need to balance out the long-term work with the short-term pressures. Software is saved by the fact that its total ugliness is not visible (if it were there would be a lot of "fired" programmers), but that doesn't mean the effects won't be visible. When you are building "anything" you can only cheat for so long before it becomes unworkable. Sure, if it is a short "assembly" job of combining some pieces together to whack out a simple application for a couple of months, you can get away with a huge number of short-cuts, but once it becomes a multi-year, multi-developer project, each and every short-cut (even the ones that you don't think are actually short-cuts) builds up.
If and when they build up enough, they account for a significant number of project failures. Sadly, "sloppy process" failures are entirely preventable, but only by people who understand them.
BIG BALL OF MUD
If it is not the programmer or the chaos, then it is the functionality itself. It's either poorly specified, or perhaps even just a really "bad" idea. The real trouble in programming doesn't come from feeding lists of instructions into an abstract machine for execution. Nope. It comes from tying that back to the "real world".
People are irrational, messy and the source of huge problems. If the functionality is not well-defined or not "workable", the core reasons almost always come down to people; whether it be limited thinking, politics or egos doesn't really matter, it's all the same.
All software ultimately is for people to use, so it is actually easy to get the functionality back onto the right track: "pick something simple". Then specify it, in some format that makes it easy to see if it's complete or not. From there, it's just back to programming.
Once in a while, in order to get the system running, the core contains something extremely complex. Generally this is some type of engine or parser or processor, something genuinely hard. The really heavy-duty programming can be tough, particularly if it is breaking new ground, but it rarely accounts for even a significant percentage of the overall system. Writing good heavyweight code generally involves a strong understanding of some complex discipline or of the actual problem domain. Ultimately though, even the most complex "engine" breaks down into a large number of simple functions. The trick is not writing the pieces, it is getting them all to work together in some intricate, yet simple and elegant solution, a problem which is clearly "architectural" in nature and not really programming.
What hooks a lot of people is that they tackle complex functionality without considering architecture, so the result is a lot of hit-and-miss attempts to get it all working together properly. If you build the mechanics into the architecture at the general level, then the lower levels are just specific algorithms to transform data from one stage in the process into another. The code doesn't really fail; it's the architecture that convolutes the process and makes it messy.
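To sketch that shape (hypothetically, in Python): the architecture fixes the stages and how the data flows between them, and each lower-level piece is just one simple transformation that can be written and replaced on its own:

    # the architecture: a fixed sequence of stages that the data flows through
    def run_pipeline(data, stages):
        for stage in stages:
            data = stage(data)
        return data

    # the lower levels: each stage is just a simple transformation
    def parse(lines):
        return [line.split(",") for line in lines]

    def validate(rows):
        return [row for row in rows if len(row) == 2]

    def summarize(rows):
        return {name: int(value) for name, value in rows}

    result = run_pipeline(["a,1", "b,2", "broken"], [parse, validate, summarize])
    print(result)   # {'a': 1, 'b': 2}

If the stages and their boundaries are settled up front, none of the individual pieces is hard to write.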
No architecture? No wonder you're having problems. You wouldn't build a house without first designing the internal frames, so why wouldn't you do the same for your code?
THE LAST FEW HANDS
Programming, then, by itself is relatively simple. That's hardly surprising when you notice that in a lot of specific problem domains, many of the programmers are actually domain experts, not Computer Scientists. You don't need a Computer Science degree to write code. In a very real sense, that is why it is closely aligned with bookkeeping, even though I realize that a lot of people might take offense at that comparison. But, like it or not, great reams of domain-specific code are easily written by people from other disciplines. And, even more horrifying to admit, for your basic bread-and-butter medium-weight programming work, a degree in Computer Science is overkill. You don't have to know about Turing machines to create a screen in a GUI to accept human resources data. You don't need to understand the halting problem to write a social web-app. The expressive power of SQL: does it really matter?
These things have their place in software development, but not necessarily in most programming; they usually only come into play in the technical core of a solution, something that is generally wrapped in a framework or infrastructure.
Programming still has its moments, when time is tight and you are having trouble focusing, but for most people, after about five years of steady coding it mostly becomes instinctual. I know there are still readers out there who have been at it a lot longer and are still struggling, but if they are fair about why they are struggling, the reasons come down to not really knowing what they are building, as they are building it. The problem is personal, architectural or analytical, not programming. Really, it's a bigger problem.
Software development, on the other hand, is extremely young, completely unfinished, and extremely complex. It's the type of thing that people just don't get, and it is really hard, even at its simplest level.
You learn this, intrinsically, when you end up in meetings with users who are insisting that the software work in a specific way, while you are quite aware that it is impossible. Not just difficult, but completely and utterly impossible. Yet it becomes very difficult to explain why it won't work. The certainty is there from experience, yet the ability to simplify it and pass that knowledge on to someone else is lacking.
In that overlap between people and mathematics lies a grey area: a largely unexplored, unknown world of fantastically complex problems that we haven't even begun to enunciate yet, let alone tackle. We are missing at least one, if not many, of the sciences that make up the knowledge needed to build "reliable" complex systems. We're pretty much guessing at it right now, when we should be far more knowledgeable about what works and what doesn't.
Still, while there are many great problems left to solve in Computer Science, and there is still a whole 'process' left to create to solve the on-going "software crisis", the act of programming is not among the key problems. Our biggest issue with programming is that we constantly confuse the two, trying to force a one-size-fits-all approach that unifies programming and software development. Getting back to my initial point, if you see them as different, then it becomes easier to see and deal with their own unique issues. A bit of structure can be a grand thing.
Friday, May 2, 2008
Software Blueprints
Over at www.hans-eric.com -- Hans-Eric Grönlund's most excellent blog -- an interesting discussion occurred in the comments for the blog entry "is agile only for the elites?":
http://www.hans-eric.com/2008/03/28/is-agile-only-for-elites/
I got into a conversation with Jack Repenning; we were talking about the ideas of Alan Cooper. The focus arrived at the point where Jack questioned whether or not blueprints for software can even exist. It is a 'big' topic, one that does not easily fit into a comment box, but one that I have given a great deal of consideration to lately. I thought, given the timing, that I could elaborate on my thoughts in a full blown blog entry.
The crux of the matter is whether or not there exists some way to create a blueprint from which a complex piece of software can be developed. To an outsider, that may seem like an easy question: "of course" would be their likely answer, but if that were the case one would expect that 'blueprints' in some fashion or other would already exist in the standard software development environment.
We do have lots of documentation, and big projects often have detailed designs, but most commonly the ground-floor programmers ignore the designs in their quest to actually get the code up and running. The belief for why this occurs is that there are too many details in software, which are constantly changing as the code is being built. This swirling vortex of chaos invalidates the design, long before the programmers ever get close enough to starting, rendering the bulk of the design effort moot.
Instead, the more modern idea in software development has been to work with small, tight iterations and some upfront design, but essentially you set the project free and let it go where it wants. The small cycles mitigate the risks, and they provide quick feedback to the team, so that they can change with the prevailing winds. Thus the chaos that destroys the usefulness of the grand design / master plan is harnessed as the driving force behind the short iterations. A nice idea, but limited in some ways.
DEEP, DARK AND DAMP QUESTIONS
There obviously are a lot of complex ideas and questions floating around in our development processes. To get at the root, we need to find a good solid place to start, so the first and most obvious question is: "whether or not we can actually create a blueprint for software". By this I mean two things: a) a representation of the system that is 'small' enough that its flaws can be detected by observation and b) the validity of the blueprints stays intact throughout the entire development; the chaos does not significantly damage it. Without these two things, the blueprint is just a management exercise in wasting time. If it isn't small enough to provide value or it is out-of-date even before it is finished, then it is not worth considering.
So our ideal blueprint is just a summary of the non-volatile details at a level that can be used to deterministically build the lower levels. Not all things need to be specified, but those things that are not will not derail the project. I.e. if the programmer picks 'i' for the name of their iterator variable instead of 'index', the net effect will be the same. If the programmer picks a bubble sort algorithm instead of a quick sort one, the net effect will also be the same. If the programmer chooses a drop-down list instead of a table, again the net effect will be the same. If a programmer changes the way a major specified formula is calculated, there will be trouble. If they change the way the components are structured, again the results will be bad. The details in the blueprint are the necessary ones.
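A toy, hypothetical illustration of that distinction in Python: swapping one sort implementation for another leaves the result unchanged, so it does not belong in the blueprint, while changing a specified formula changes the answer, so it does:

    # non-pivotal detail: which sorting algorithm gets used
    def bubble_sort(values):
        values = list(values)
        for i in range(len(values)):
            for j in range(len(values) - 1 - i):
                if values[j] > values[j + 1]:
                    values[j], values[j + 1] = values[j + 1], values[j]
        return values

    data = [5, 1, 4, 2]
    assert bubble_sort(data) == sorted(data)   # same net effect either way

    # pivotal detail: the specified formula itself
    def total_with_tax(amount, rate=0.10):
        return amount * (1 + rate)             # as specified

    def total_with_tax_changed(amount, rate=0.10):
        return amount + rate                   # a programmer's "improvement"

    assert total_with_tax(100) != total_with_tax_changed(100)   # trouble

The blueprint only has to pin down the second kind of detail.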
We can skip past the general question of whether or not a blueprint can actually exist. Jack Reeves put forward an interesting and unique idea twenty years ago when he suggested that the 'code' was the blueprint:
http://www.developerdotstar.com/mag/articles/reeves_design_main.html
It is a fascinating idea: the programmer is the designer and the computer is actually the worker. In that sense, even with modern interpreted languages, the code in source code control could be considered the blueprint, while the packaged, distributed and deployed systems are the final products. Packaging is manufacturing. Interpreted or not, the idea doesn't change.
The great flaw, I think, in this perspective is that a huge advantage of doing a blueprint is to check that the major bits are all correct and in place long before the work starts. That implies that a blueprint, to be useful, must be a small, compact representation of the 'final' design. Going back to my comments with Jack Repenning, I think he was surprised that I said one should be able to create a blueprint for an operating system in six months to a year. The key point was that anything longer would probably not be short enough to get the maximum benefit out of doing the work to create a separate blueprint in the first place. The work needs value to be effective. No value, no need for a blueprint. As such, I fully expect that if a format for creating a useful blueprint really exists for software -- specifically, in this case, for a new operating system -- it absolutely should not require much more than a year to get the details down. The longer the time, the more useless the work, but I will get back to that later (the time doesn't have to be from scratch).
A PILE OF UNIX
UNIX keeps popping up in this discussion not necessarily for what it is -- although operating systems are very complex and getting more so all of the time -- but because it is the root of all programs, is extremely well understood on a technical level, contains minimal messy business issues and is, for the most part, extremely well-documented. Yes, well documented. If you hit the book store (and track down a few out-of-prints), you can gather a collection of books on UNIX that includes the Stevens book, the list of the internal data structures, man pages, a reference book for every major tool in UNIX, and a few for non-major ones. Throw in excellent books like the AWK one, and another couple on shell programming, and you have a huge number of books on UNIX, in general and in specific.
So imagine one day, that absolutely every 'bit' in UNIX is erased. Poof. Gone. Destroyed. Dumped into the bit bucket. Lost forever, with absolutely no way to get it back. If we ran around the various libraries (and a few houses) and gathered together all the books on UNIX that we have, would that be enough information for us to re-create it?
The answer is emphatically: yes. Linux is a near-perfect example, as it was based on second-hand knowledge of Minix and other UNIXes. We could assemble a team of programmers, give them the UNIX books and set them to work recreating the software. Of course, depending on their focus on the details in the books, and their interpretation, the results would be a little different, but UNIX would once again exist, and it could be similar to the original. They might even fix a few small systemic problems.
From Jack Reeves we know that, depending on definition, there is a design, and from the pile of UNIX books we know that it doesn't have to be in the form of 'code' as we know it. Moreover, from the books we know it can be full of ambiguities and human-based inaccuracies. It just has to be enough to get the programmers 'there'; it doesn't have to be perfect, that is not its goal or function.
The obvious problem with the pile of books is that it is too big by far to be useful. With something that size it is easy for human nature to take over and let the programmers start creatively interpreting what they read and 'fixing' it, spiraling the project into the never-ending effort that essentially fails to get done. There is, it seems, a correlation between the size of the design and the desire of the programmers to ignore it and go their own way. Anyone on a huge mega-project has probably felt that at some point or another. But that is really an issue of discipline and organization, not content.
Still, if we cannot find something small enough, we do not really have solid 'working' blueprints.
THE REALLY BIG GUYS
This seems like a good time for a tangent to me. Why flow linearly when we have the freedom to jump around a bit? It is far more exciting.
I wish I knew more about the evolution of our modern construction techniques, starting with huts and ending in skyscrapers. Construction and design are incremental; skyscraper designs have been evolving for over a hundred years, probably starting long before the Eiffel Tower. They built the tower for the World's Fair as proof that a steel structure was viable and would allow buildings to exceed a simple three-floor limit. This was a momentous leap in construction, one step of the many that have led to our modern behemoths. Skyscrapers, then, didn't just spring into existence; they evolved, design after design, each time getting larger and more complex.
What makes them so fascinating is that they are phenomenally complicated structures to build, yet even with that, once started they usually get done, and they rarely fall down. It is absolutely that type of 'record' that makes any old-time programmer drool. What we wanted twenty years ago, what we hoped for, was the ability to build complex systems that actually worked. And to build them in a way where we weren't just taking wild guesses at how to get it done. Should any piece of software achieve the elegance and beauty of even the ugliest skyscraper, with a comparable amount of complexity, that system would just blow us away. They don't guess about minimum concrete thickness and hope to get it correct. We do.
But then again, I am guessing (sorry, it is habitual, part of the profession). Skyscrapers are designed, but they don't really start from scratch with each design either. There is an evolutionary process where the buildings grow on the principles and standards of other buildings. There is something in that that we need to understand.
I know it takes years to validate the designs, and usually longer than a year to actually build one, but if you put all of the man-years of effort to get the 'core' of the skyscraper built up against any one of our longer-running commercial software packages, I'm tempted to guess that there was actually less time spent on the skyscraper. We pour a tremendous amount of effort and testing into many of our commercial products; a staggering, mind-boggling amount of effort if you trace some of the more notorious ones right back to their origins. They are huge sinkholes for effort.
I've never seen the blueprints for a skyscraper, but I'd guess that they are a lot smaller than my earlier pile of UNIX books. We should marvel at how they build a building that large and complex, with way less documentation, and big distributed teams of multi-disciplinary specialists, while ensuring that the quality of work is good to excellent. Complexity for complexity, pit that against an equally sized software project, and consider that the initial odds of the code even getting partially finished are way less than 50/50. What have they got that we don't?
BRING IT HOME TO US
From this viewpoint, I find it hard to believe that there isn't some obvious form of blueprint. After all we definitely know it can exist, it's just that in that case it is too large to be useful.
One of the favorite arguments against the existence of blueprints is the circular one that if it were possible it would exist already. That ignores two key issues: a) computer science is still young, so we've barely started building things, and b) the biggest works of computer science have been behind closed doors. In the second case, someone may have already come up with the perfect format, and we just aren't aware of it yet. However, given the nature of the industry, this type of thing has little IP and big bragging rights, so it's likely that unless there was fear of Microsoft getting their hands on it and wreaking havoc, it would have made it out into the general public pretty swiftly.
To me a more likely explanation is culture. Right from the beginning, programmers have been trying to distance themselves from engineers. It's that inherent desire to not be highly constrained during development that is probably the most likely explanation for not having blueprints. We don't have them because nobody is looking for them. Blueprints run opposite to the hacker culture. The freewheeling chaos of the agile extremist movement is the type of dynamic environment that most programmers really want to work in. Work should be fast, fun and flexible.
While it's hard to argue with that, I do find that fast, fun and flexible often leads to 'fucked', which is a downer. I guess as I've gotten older I am less strongly motivated to whack out code and more motivated to build complex, sophisticated machinery, that I know -- not guessing -- will do the proper job correctly. What good is a dynamic environment if none of the stuff actually works? Even if you enjoy work, are you really proud of 'that' mess you made in the code? You knew the right way to build it, so why didn't you? Is it always somebody else's fault?
So, if we can get past our biases and imagine a world where blueprints really do exist -- but not at the cost of making coding some 'dreaded cog'-like position -- it is easier to realize that there aren't any easy, concrete reasons why we don't have or use blueprints. It works for some really big and complex things; it could work for us too.
More interestingly, at least for myself, if not for a large variety of other programmers, I always visualize the code before I sit down at the keyboard. Internally, I can see what I am writing, but I don't see it as a massive sequence of steps to execute. I see it as some sort of 'abstract machine' that exists in my head without real physical manifestation; indescribable, but always enough to allow me to run through its operation to make sure it will work correctly. So I know that 'it' can be smaller, and 'it' can fit in my head, but unfortunately for me, I have no idea how to pass 'it' onwards to other people.
Also, human as I am, sometimes my internal design conveniently skips a point or two in physics, so that making it work in the real world is a similar but not exact translation. Still, it is exactly that type of internal model that has ensured that the systems I have worked on over the years start their development with a better-than-fighting chance to survive. When they didn't make it to a release, it wasn't coding problems that brought them down.
Guessing at what works and knowing it are two different things altogether. The importance of removing the 'guessing' from computer science cannot be overstated. It is the big, evil, fire-breathing, six-tonne dragon sitting in the room with us each time we get passionately talking to management about how we are going to solve the next problem. We 'think' we know how to do it, but ultimately, as we all cry during the estimation stage, we're not sure, because we've never done 'this' before.
Oddly, having some established place to start wouldn't be all that bad. If you could take something you know worked, and then use it to push the envelope, the stresses would probably be way less. All of that FUD that comes midway through a big development, as the various parties start to lose faith, could be avoided. The last plan, at least, should have been a working system, so there is always a fallback if things are too ambitious. It is these types of attributes, plus the ability to set loose the design on a large team and 'know' that it will be built correctly, that should make all software architects extremely envious of their construction peers. That type of certainty belongs only to programmers who successfully delude themselves, or to those who have actually finished their third system -- end-to-end -- without fail. The latter being an extremely rare breed in software.
THE NATURE OF A BLUEPRINT
A short, simple model to prove that a specific design will work as predicted is a small thing in most industries, but a huge one in programming. Sometimes when it is hard to visualize, I like to go at a problem by addition and subtraction. Vaguely, what is it, what is it not? In this, there are attributes that a blueprint must absolutely not have:
- too much detail, wasted effort
- too many pretty charts, seriously wasted effort
- things that don't make large or architectural differences
but there are some attributes that have to be there:
- all details that really make a difference (pivotal ones)
- how to handle errors, every program fails, must deal with it
- all vertical and horizontal architectural lines, (and why they exist)
I'm sure there is more, but one shouldn't take all of the fun out of it. Whatever is there, it must be based around a minimalist view. Too much stuff doesn't just waste time, it doubles up its effect by obscuring the details that really matter.
To this I should add a quick note: with any complex business problem, there is an inherent amount of chaos built directly into it, and it is constantly changing. We know this; it is the Achilles heel that brings down a lot of projects. The agile approach is to embrace this and change with it. My alternative is to take the changes themselves and use them as the lines along which to draw the architecture. So, instead of getting aggravated with the users as they ping-pong between a series of options, you simply accept that allowing the ping-ponging itself is one of the requirements. In that way, instead of running from the chaos, you turn and embrace it as part of the design. Yes, to some degree it is more expensive, but if you consider that the alternative could be failing, then it is far, far cheaper.
A WILD GUESS
With the above set of attributes, I can make a completely wild guess as to what should be in a blueprint.
In a sense, you can see a function in a computer as a set of instructions tied to a context. If the code is brute-forced, then for each and every function you have an explicit listing of all of the steps. The more modular the code, the more of it is shared between functions. The more generalized, the higher and more abstract the code. Either way, there is some low level of detail within the steps and their arrangement that is needed to actually make the code work, but there is at least one higher level of detail that imparts an understanding of how the code works, without actually laying out all of the details explicitly.
In a sense we could split it into micro-coding, the actual list of instructions in a specific language, and macro-coding, the essence of creating that list in a higher representation. A function written in macro-coding is a simplified summary of the 'main' details, but is not specific enough to work. It needs, added to it, all of the micro-coded details. Pseudo code is one common form of macro-coding, but its general practice is still fairly low-level. The value in this is to find the higher-level expression that still defines the solution to the problem, without going too far and disconnecting the programmer.
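As a small, hypothetical sketch of the two levels, take a word-count report as the function. The macro-coding pins down what it does; the micro-coding (here in Python, with made-up file paths) supplies the details needed to actually run:

    # macro-coding: the 'main' details only -- enough to understand, not enough to run
    #   read the text file
    #   count how many times each word appears
    #   write the counts out, most frequent first

    # micro-coding: the same function with the low-level details filled in
    from collections import Counter

    def word_count_report(in_path, out_path):
        with open(in_path) as f:
            words = f.read().lower().split()
        counts = Counter(words)
        with open(out_path, "w") as f:
            for word, count in counts.most_common():
                f.write(f"{word} {count}\n")

A blueprint would live at something closer to the first level, spread across the whole system.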
A useful blueprint, then, is the 'shortest' high-level representation of the code that is able to let one or more humans explicitly 'understand' what the code is going to do, without being detailed enough or rigorous enough to actually be the code.
FINAL THOUGHTS
The definition of insane -- they like to tell us jokingly -- is to continue to do the same things over and over again, but expect a different result. Given that, by any standard, the better part of the whole software industry is totally 'whacked'. What started as a 'software crisis' well over fifty years ago is a full-blown software calamity. We depend so heavily on this stuff, yet we have no idea how to actually build it, and every new generation of programmers goes back to nearly ground-zero to just remake the same crappy mistakes over and over again. We are in this endless bad loop of churning out near-misses. Things that 'almost' work. Stuff that kinda does what it is supposed to, so long as you don't 'push' it too hard. Bleck!
Mostly, skyscrapers don't have even a fraction of the types of problems that our best examples of software are plagued with. Yes, time is a key difference, but also the evolutionary cycle of just enhancing the last design a bit is clearly a big part of it. Each step is validated, but the next step can make real leaps. Slowly, but surely the buildings have improved.
Another reason is that if they built skyscrapers using the same type of twenty year process of just 'slapping' on new parts, in much the same way we try to 'iterate' from a starting point into something greater, the building would be so damned ugly and scary that the local government would go after the owners and make them take it down, either because it was an eye-sore, or because it was just plain dangerous. Most software is fortunate that it is not visible to the naked eye or it would suffer a similar fate.
The big problem with software is that we are not learning from our past mistakes, yet we are frequently exceeding our own thresholds of complexity. We build too big, too fast, and we have no way of going back and learning from it. A single programmer might grow over a long career, and help move a team into building more complex things, but we really are an industry that puts our older programmers out to pasture way too quickly, so there is no history, nothing to build on.
Blueprints, then, wouldn't just leverage a small number of people's expertise; they would also allow retrospective mining of the strengths and weaknesses of the various systems built over the years. That's the type of knowledge that we are sorely missing right now, yet could easily use. It leaks out of the industry faster than we can restore it. We aren't even leveraging 10% of a computer's ability, and we are running around like chickens with our heads cut off just to keep our existing poor band-aid solutions from falling over. We really need to find a way through this.
Nothing is more important than using our time wisely. And that comes from not just winging it, or guessing and being wrong. Luck is a harsh mistress. The difference between being able to hack out a quick solution that hopefully works, and being able to carefully assemble a solid, working solution to a problem is absolutely huge, but a completely misunderstood distinction in the industry. Setting down one person's vision, then evolving it to the next level, in a way that is transparent and documented, is such a huge, fundamental change to software that if we were to find working blueprints, the consequences for not only our industry but also our societies would be immense. An evolutionary leap worth being a part of.
In the end though, this isn't only about programmers wanting to leverage their design abilities so that they can build bigger systems. Instead this is about freeing our industry from an ongoing crisis / dark-age and allowing us to finally utilize the hardware correctly, to build truly wonderful software that really does help its users, allowing them to reach higher levels of productivity and understanding. You know, all of those platitudes that vendors have been tossing into their commercials for years and years, that just haven't materialized in our current technologies. A promise we made, but are unable to keep.
http://www.hans-eric.com/2008/03/28/is-agile-only-for-elites/
I got into a conversation with Jack Repenning; we were talking about the ideas of Alan Cooper. The focus arrived at the point where Jack questioned whether or not blueprints for software can even exist. It is a 'big' topic, one that does not easily fit into a comment box, but one that I have given a great deal of consideration to lately. I thought, given the timing, that I could elaborate on my thoughts in a full blown blog entry.
The crux of the matter is whether or not there exists some way to create a blueprint from which a complex piece of software can be developed. To an outsider, that may seem like an easy question: "of course" would be their likely answer, but if that were the case one would expect that 'blueprints' in some fashion or other would already exist in the standard software development environment.
We do have lots of documentation, and big projects often have detailed designs, but most commonly the ground-floor programmers ignore the designs in their quest to actually get the code up and running. The belief for why this occurs is that there are too many details in software, which are constantly changing as the code is being built. This swirling vortex of chaos invalidates the design, long before the programmers ever get close enough to starting, rendering the bulk of the design effort moot.
Instead, the more modern idea in software development has been to work with small tight iterations, some upfront design, but essentially you set the project free and let it go where it wants. The small cycles mitigate the risks, and the provide quick feedback to the team, so that they can change with the prevailing winds. Thus the chaos that destroys the usefulness of the grand design / master plan is harnessed as the driving force behind the short iterations. A nice idea, but limited in some ways.
DEEP, DARK AND DAMP QUESTIONS
There obviously are a lot of complex ideas and questions floating around in our development processes. To get at the root, we need to find a good solid place to start, so the first and most obvious question is: "whether or not we can actually create a blueprint for software". By this I mean two things: a) a representation of the system that is 'small' enough that its flaws can be detected by observation and b) the validity of the blueprints stays intact throughout the entire development; the chaos does not significantly damage it. Without these two things, the blueprint is just a management exercise in wasting time. If it isn't small enough to provide value or it is out-of-date even before it is finished, then it is not worth considering.
So our idea blueprint is just a summary of the non-volatile details at a level that can be used to deterministically build the lower levels. Not all things need to be specified, but those things that are not, will not derail the project. I.e. if the programmer picks 'i' for the name of their iterator variable, instead of 'index' the net effect will be the same. If the programmer picks a bubble sort algorithm instead of a quick sort one, the net effect will also be the same. If the programmer chooses a drop-down list, instead of a table, again the net effect will be the same. If a programmer changes the way a major specified formula is calculated, there will be trouble. If they change the way the components are structured, again the results will be bad. The details in the blueprint are the necessary ones.
We can skip past the general question of whether or not a blueprint can actually exist. Jack Reeves put forward an interestingly unique idea twenty years ago when he suggested that the 'code' was the blueprint:
http://www.developerdotstar.com/mag/articles/reeves_design_main.html
it is a fascinating idea: the programmer is the designer and the computer is actually the worker. In that sense, even with modern interpreted languages, the code in the source code control could be considered the blueprint while the packed, distributed and deployed systems are the final products. Packaging is manufacturing. Interpreted or not, the idea doesn't change.
The great flaw, I think, in this perspective is that a huge advantage of doing a blueprint is to check to insure the major bits are all correct and in place long before the work starts. That implies that a blueprint, to be useful, must be a small compact representation of the 'final' design. Going back to my comments with Jack Repenning, I think he was surprised that I said one should be able to create a blueprint for an operating system in six months to a year. The key point in that, was that anything longer was probably not short enough to be able to get the maximum benefit out of doing the work to create a separate blueprint in the first place. The work needs value to be effective. No value, no need for a blueprint. As such, than I easily expect that if a format for creating a useful blueprint really exists for software, specifically in this case for a new operating system, that it is absolutely should not require much more than a year to get down the details. The longer the time, the more useless the work, but I will get back into that later (the time doesn't have to be from scratch).
A PILE OF UNIX
UNIX keeps popping up in this discussion not necessarily for what it is -- although operating systems are very complex and getting more so all of the time -- but because it is the root of all programs, is extremely well understood on a technical level, contains minimal messy business issues and for the most part is extremely well-documented. Yes, well documented. If you hit the book store (and find a few out-of-prints), you can gather a collection of books on UNIX that include the Steven's book, the list of the internal data structures, man pages, a reference book for every major tool in UNIX, and a few for non-major ones. Throw in excellent books like the AWK one, and another couple on shell programming, and you have a huge number of books on UNIX, in general, and in specific.
So imagine one day, that absolutely every 'bit' in UNIX is erased. Poof. Gone. Destroyed. Dumped into the bit bucket. Lost forever, with absolutely no way to get it back. If we ran around the various libraries (and a few houses) and gathered together all the books on UNIX that we have, would that be enough information for us to re-create it?
The answer is emphatically: yes. Linux is a near-perfect example, as it was based on second-hand knowledge of Minux and other UNIXes. We could assemble a team of programmers, give them the UNIX books and set them to work recreating the software. Of course, depending on their focus on the details in the books, and their interpretation, the results would be a little different, but UNIX would once again exist, and it could be similar to the original. They might even fix a few small systemic problems.
From Jack Reeves we know that depending on definition, there is a design, and from the pile of UNIX books we know that it doesn't have to be in the form of 'code' as we know it. More over, from the books we know it can be full of ambiguities, and human-based inaccuracies. It just has to be enough to get the programmers 'there' but it doesn't have to be perfect, that is not its goal or function.
The obvious problem with the pile of books is that it is too big by far to be useful. With something that size it is easy for human nature to take over and let the programmers start creatively interpreting what they read and 'fixing' it, spiraling the project into the never-ending effort that essential fails to get done. There is, it seems a correlation between the size of the design and the desire for the programmers to ignore it and go their own way. Anyone on a huge mega-project has probably felt that at some point or another. But, that is really an issue of discipline and organization, not content.
Still, if we cannot find something small enough, we do not really have solid 'working' blueprints.
THE REALLY BIG GUYS
This seems like a good time for a tangent to me. Why flow linearly when we have the freedom to jump around a bit, it is far more exciting.
I wish I knew more about the evolution of our modern construction techniques, starting with huts and ending in skyscrapers. Construction and design are incremental, skyscraper designs have been evolving over the last 100 years, probably starting long before the Eiffel tower. They built the tower in the world's fair as proof that a steel structure was viable and would allow buildings to exceed a simple three floor minimum. This was a momentous leap in construction. One step in the many that have lead to our modern behemoths. Skyscrapers, then didn't just spring into existence, they evolved, design after design. Each time getting larger, and more complex.
What makes them so fascinating is that they are phenomenally complicated structures to build, yet even with that, once started they usually get done, and they rarely fall down. It is absolutely that type of 'record' that makes any old-time programmer drool. What we wanted twenty years ago, what we hoped for, was the ability to build complex systems that actually worked. And to build them in a way where we weren't just taking wild guesses at how to get it done. Should any piece of software achieve the elegance and beauty of even the ugliest skyscraper, with a comparable amount of complexity, that system would just blow us away. They don't guess about minimum concrete thickness and hope to get it correct. We do.
But then again, I am guessing (sorry it is habitual, as part of the profession). Skyscrapers are designed but they don't really start from scratch with each design either. There is an evolutionary process where the buildings grow on the principles and standards of other buildings. There is, something to that, that we need to understand.
I know it takes years to validate the designs, and usually longer than a year to actually build it, but if you put all of the man-years of effort to get the 'core' of the skyscraper built up against any one of our longer running commercial software packages, I'm tempted to guess that there was actually less time spent on the skyscraper. We pour a tremendous amount of effort and testing into many of our commercial products; a staggeringly mod-bogglingly amount of effort if you trace some of the more notorious ones right back to their origins. They are huge sinkholes for effort.
I've never seen the blueprints for a skyscraper, but I'd guess that they are a lot smaller than my earlier pile of UNIX books. We should marvel at how they build a building that large and complex, with ways less documentation, and big distributed teams of multi-disciplinary specialists, while insuring that the quality of work is good to excellent. Complexity for complexity, pit that against an equally sized software project, and consider that the initials odds of the code even getting partially finished are way less than 50/50. What have they got that we don't?
BRING IT HOME TO US
From this viewpoint, I find it hard to believe that there isn't some obvious form of blueprint. After all we definitely know it can exist, it's just that in that case it is too large to be useful.
One of the favorite arguments against the existence of blueprints is the circular one that if it were possible it would exist already. That ignores two keys issues a) computer science is still young, so we've barely started building things and b) the biggest works of computer science have been behind closed doors. In the second case, someone may have already come up with the perfect format, we just aren't aware of it yet. However, given the nature of the industry, this type of thing has little IP and big bragging rights, so its likely that unless their was fear of Microsoft getting their hands on it and wreaking havoc, it would have made it out into the general public pretty swiftly.
To me a more likely explanation is culture. Right from the beginning, programmers have been trying to distance themselves from engineers. It's that inherent desire to not be highly constrained during development that is probably the most likely explanation for not having blueprints. We don't have them, because nobody is looking. It's opposite to the hacker culture. The freewheeling chaos of the agile extremist movement is the type of dynamic environment that most programmers really want to work in. Work should be fast, fun and flexible.
While it's hard to argue with that, I do find that fast, fun and flexible often leads to 'fucked', which is a downer. I guess as I've gotten older I am less strongly motivated to whack out code and more motivated to build complex, sophisticated machinery, that I know -- not guessing -- will do the proper job correctly. What good is a dynamic environment if none of the stuff actually works? Even if you enjoy work, are you really proud of 'that' mess you made in the code? You knew the right way to build it, so why didn't you? Is it always somebody else's fault?
So, if we can get past our biases and imagine both a world were blueprints really do exist -- but not at the cost of making coding some 'dreaded cog' like position -- it is easier to realize that there aren't any easy concrete reasons why we don't have or use blueprints. It works for some really big and complex things, it could work for us too.
More interestingly, at least for myself, if not for a large variety of other programmers, I always visualize the code before I set down at the keyboard. Internally, I can see what I am writing, but I don't see it as a massive sequence of steps to execute. I see it as some sort of 'abstract machine', but it is in my head without real physical manifestation; indescribable, but always enough to allow me to run through its operation to make sure it will work correctly. So I know that 'it' can be smaller, and 'it' can fit in my head, but unfortunately for me, I have no idea how to pass 'it' onwards to other people.
Also, human as I am, sometime my internal design conveniently skips a point or two in physics, so that making it work in the real world is similar but not an exact translation. Still, it is exactly that type of internal model that has ensured that the systems I have worked on over the years start their development with a better than fighting chance to survive. When they didn't make it to a release, it wasn't coding problems that ever brought them down.
Guessing at what works, and knowing it are two different things altogether. The importance of removing the 'guessing' from computer science cannot be understated. It is the big evil fire-breathing six tonne dragon sitting in the room with us, each time we get passionately talking to management about how we are going to solve the next problem. We 'think' we know how to do it, but ultimately as we all cry during the estimation stage, were not sure because we've never done 'this' before.
Oddly having some established place to start wouldn't be all that bad. if you could take something you know worked, and then use it to push the envelope, the stresses would probably be way less. All of that FUD that comes midway through a big development, as the various parties are starting to lose faith, that could be avoided. The last plan at least should have been a working system, so there is always a fallback if things are too ambitious. It is these types of attributes, plus the ability to set lose the design on a large team and 'know' that it will be built correctly that should make all software architects extremely envious of their construction peers. That type of certainly only belongs to programmers that successfully delude themselves, or those who have actually finished their third system -- end-to-end -- without fail. The latter being an extremely rare breed in software.
THE NATURE OF A BLUEPRINT
A short, simple model to prove that a specific design will work as predicted is a small thing in most industries, but a huge one in programming. Sometimes when it is hard to visualize, I like to go at a problem by addition and subtraction. Vaguely, what is it, what is it not? In this, there are attributes that a blueprint must absolutely not have:
- too much detail, wasted effort
- too many pretty charts, seriously wasted effort
- things that don't make large or architectural differences
but there are some attributes that have to be there:
- all details that really make a difference (pivotal ones)
- how to handle errors, every program fails, must deal with it
- all vertical and horizontal architectural lines, (and why they exist)
I'm sure there is more, but one shouldn't take all of the fun out of it. Whatever is there, it must be based around a minimalist view. Too much stuff doesn't just waste time, it doubles up its effect by obscuring the details that really matter.
To this I should add a quick note: with any complex business problem, there is an inherent amount of chaos built directly into it, that is constantly changing. We know this, it is the Achilles heel that brings down a lot of projects. The agile approach is to embrace this and change with it. My alternative is to take the changes themselves and use them as the lines in which to draw the architecture. So, instead of getting aggravated with the users as they ping pong between a series of options, you simply except that allowing the ping-ponging itself is one of the requirements. In that way, instead of running from the chaos, you turn and embrace it as part of the design. Yes, to some degree it is more expensive, but if you consider that the alternative could be failing, then it is far far cheaper.
A WILD GUESS
With the above set of attributes, I can make a completely wild guess as to what should be in a blueprint.
In a sense, you can see a function in a computer as a set of instructions tied with a context. If the code is brute forced, then for each and every function you have an explicit listing of all of the steps. The more modular, the more code that is shared between functions. The more generalized, the higher and more abstract the code. In any way, there is some low level of detail within the steps and their arrangement that is needed to actually make the code work, but there is at least one higher level of detail that imparts an understanding of how the code works, without actually explicitly laying out all of the details.
In a sense we could split it into micro-coding, the actual list of instructions in a specific language, and macro-coding, the essence of creating the list in a higher representation. A function written in macro-coding is a simplified summary of the 'main' details, but is not specific enough to work. It needs, added to it, all of the micro-coded details. Pseudo code is one common form of macro-coding, but its general practice is still fairly low-level. The value, in this is to find that higher level expression that still defines the solution to the problem, without going to far and disconnecting the programmer.
A useful blueprint, then is the 'shortest' high-level representation of the code that is able to let one of more humans explicitly 'understand' what the code is going to do, without being detailed enough or rigorous enough to actually be the code.
FINAL THOUGHTS
The definition of insanity -- they like to tell us jokingly -- is to continue to do the same things over and over again while expecting a different result. By that standard, the better part of the whole software industry is totally 'whacked'. What started decades ago as a 'software crisis' has become a full-blown software calamity. We depend so heavily on this stuff, yet we have no idea how to actually build it, and every new generation of programmers goes back to nearly ground zero just to remake the same crappy mistakes over and over again. We are in this endless bad loop of churning out near-misses: things that 'almost' work, stuff that kinda does what it is supposed to, so long as you don't 'push' it too hard. Bleck!
Mostly, skyscrapers don't have even a fraction of the types of problems that our best examples of software are plagued with. Yes, time is a key difference, but the evolutionary cycle of just enhancing the last design a bit is clearly a big part of it too. Each step is validated, yet the next step can still make real leaps. Slowly but surely, the buildings have improved.
Another reason: if a skyscraper were built using the same type of twenty-year process of just 'slapping' on new parts, in much the same way we try to 'iterate' from a starting point into something greater, the building would be so damned ugly and scary that the local government would go after the owners and make them take it down, either because it was an eyesore or because it was just plain dangerous. Most software is fortunate that it is not visible to the naked eye, or it would suffer a similar fate.
The big problem with software is that we are not learning from our past mistakes, yet we are frequently exceeding our own thresholds of complexity. We build too big, too fast, and we have no way of going back and learning from it. A single programmer might grow over a long career and help move a team toward building more complex things, but we really are an industry that puts its older programmers out to pasture way too quickly, so there is no history, nothing to build on.
Blueprints, then, wouldn't just leverage a small number of people's expertise; they would also allow retrospective mining of the strengths and weaknesses of the various systems built over the years. That's the type of knowledge we are sorely missing right now, yet could easily use, and it leaks out of the industry faster than we can replace it. We aren't even leveraging 10% of a computer's abilities, and we are running around like chickens with our heads cut off just to keep our existing poor band-aid solutions from falling over. We really need to find a way through this.
Nothing is more important than using our time wisely, and that comes from not just winging it, or guessing and being wrong; luck is a harsh mistress. The difference between hacking out a quick solution that hopefully works and carefully assembling a solid, working solution to a problem is absolutely huge, yet it is a completely misunderstood distinction in this industry. Setting down one person's vision, then evolving it to the next level in a way that is transparent and documented, is such a fundamental change for software that if we were to find working blueprints, the consequences for not only our industry but also our societies would be immense. An evolutionary leap worth being a part of.
In the end though, this isn't only about programmers wanting to leverage their design abilities so that they can build bigger systems. It is about freeing our industry from an ongoing crisis / dark age and allowing us to finally utilize the hardware correctly, to build truly wonderful software that really does help its users reach higher levels of productivity and understanding. You know, all of those platitudes that vendors have been tossing into their commercials for years and years, but that just haven't materialized in our current technologies. A promise we made, but have been unable to keep.