Sunday, February 10, 2008

The Power of Expression

My writing archives are littered with half-completed, mostly dead posts on the nature of expression. More so than any other topic, this one has defeated any attempt to roll up my ideas into a coherent, finished piece of work.

Writing needs to come together in a way that leads the reader on a journey, while leaving them satisfied at the end. Half-thoughts, while interesting, leave the reader longing for more. Sort of like an appetizer with no main course. You won't starve to death, but you're still very hungry afterwards.

To get around that, this post is -- I guess -- a series of appetizers; hopefully enough to fulfil. If you don't like one, perhaps some of the following might be more satisfying. If you keep reading long enough, hopefully you'll be satiated. If you're still hungry at the end, stay tuned, there will always be more.


THE NEVER CHANGING ELEMENTS OF DEVELOPMENT

Software development is all about building 'tools' for users to play with their piles of data. The most important attribute for any tool is that it is usable. The second most important attribute is that it is extensible, or at the very least it can be kept updated.

A tool that only works for a short time is a pain. Any investment in learning how to utilize it is squandered when it is no longer available. Even the best tools take some effort to master, while most of the software ones take huge effort because of their poor designs.

There is an implicit covenant between the users and the programmers. The users commit to learning how to utilize the tools only if we commit to building and maintaining them over the long run. A 'release' is only a brief instant in the life of any software, and you're only as good as your last release.

Software itself is just a large set of instructions. These days, it is often a huge set of instructions, in fact trillions and trillions of them if you are looking at assembler. Most modern computers are happily working their way through millions of lines of instructions every second. The sheer size of our existing code bases and the amount of code that is being executed is daunting.

Collecting together these instructions is hard, messy and prone to failure. There is much we can do to improve our accuracy, but I'll leave those thoughts for another day.

Once we know what to build, if we didn't know better, we might try to manually type each and every one of the instructions explicitly into the computer. We did it that way initially, but we've learned a lot since then. Or have we?

Even though we are no longer manually flipping switches or pounding out endless lines of assembler, we still commonly employ higher level brute force approaches as our primary means in building software. Generally, most programmers build the system by belting out, or copying, each and every line of code that needs to be executed. They'll use some 'theory' to reduce the redundant code, but it usually is not applied to great effect. Few code-bases are not endlessly-repeating chunks of nearly identical code, despite how we as an industry proclaim we are following principles like DRY (Don't Repeat Yourself -- The Pragmatic Programmers).

Even more disconcerting, on an industry level, is that we are madly adding in as much code as possible to handle all of our perceived problems. There is some type of implicit assumption that having more code will actually help. That if we just had 'enough' code, we could solve all of our users' problems.

The funny thing is that it is a waste of time. You can never win; the amount of brute force code you need is infinite and infinitely growing. For example, the more crappy code you have, the more crappy code you need to monitor it. That's an exponentially escalating mess. It defines the building culture for 'many' of our current operating systems and major tools. Although we've found ways to add in instructions at a faster rate, we have not fundamentally changed our approach to what we are doing. We're just pounding each and every instruction explicitly into the computer.

It might be OK, if it wasn't for the fact that code rusts. If we have code, we have to maintain it, and if it is rapidly growing out of control, that means that sooner or later lots of that code is going to rust. We cannot support it all. Our approach is flawed.

At least there will always be work in programming. Unless there is a shift, companies will endlessly pound out partial tools. And we will endlessly refactor those tools into arbitrarily inconsistent pieces. And people like virus writers will help, in producing counter-productive code that needs to be monitored and controlled. It really is quite endless with this approach. The more code you have, the more you need, the more work you have to do to keep it going. We are not evolving, we are just barely keeping up with the demand.


LANGUAGE EXPRESSION

There are huge debates over which programming language is best. This rather silly subjective argument has been going on for the entire life of software development. It is not that I don't think the choice of language is important. It can be a critical component in getting the tool successfully built. But inherently, all of the languages that we currently have -- suck. They suck to varying degrees, but they still suck.

We haven't found the right level of abstraction yet, that allows us to build our systems reliably and consistently. We're not even in the right corner of the solution space, so we are quibbling over an endless array of broken and incomplete languages. Do you care what the mnemonic is for incrementing a register in assembler? No, of course not, that discussion is long gone. Is a Pascal pointer better than a C one? That too is ancient history. So too will many of today's issues disappear.

The language we want is the one that makes its representation the closest possible to the way we think about the problem. The 'farther' we have to translate the answers, the more likely there will be errors. The 4am test is critical. Will I be able to sort this out at 4am or do I have to do some huge amount of mental gymnastics? Clear, straight-forward syntax and semantics that match the problem space are inherently necessary in minimizing the undesirable translations from the real world into the computer.

None of our current language paradigms really match the problem space for which we are building. Not object-oriented, nor functional programming, nor any of the older models. Users don't come to us asking for 'objects', nor do they come to us asking for functional closures. There is little in the technical language that maps back to most problem domains.

They come to us to build very specific tools to solve their data pile problems. They talk to us about data and they talk to us about their problems. These other 'things' are abstract technology concepts. We spend a massive amount of time and effort mapping the user based problems onto abstract 'technology' issues. That is a critical amount of our effort.

Not that 'abstraction' itself is bad. The foundation on which we leverage our work across many problem domains is abstraction, and it is here that we need to put more effort, not less. Abstractions are the answer to the brute force problem. Abstraction as a concept is great and truly important. It is just that most of our 'current' abstractions are not nearly as strong as we need; they could be more effective. So this leads to problems with 'expressing' our solutions in these underlying languages, technologies and abstractions.

Instead of foolishly defending our own favorite languages, we should really try to come together and see what works and what doesn't. But honestly, not with a bias in trying to show one language is way better than any other. That really doesn't matter. These days the deciding factor isn't even the language itself, it's the libraries and communities that matter most.

The problem we are trying to solve is relatively simple: we want to be able to build tools quickly and correctly. Never lose sight of that underlying problem. Being able to cut and paste a million lines of 'for' and 'if' statements into a barely-stable GUI that tortures its users is not the makings of a great programmer.

At some point we will find better, stronger abstractions that are closer to the way our users express their problems. When we have reduced that impedance mismatch, building a system will become trivial. Deciding what to build, however, will never change. Understanding the structure of the 'data' will never change. It is only the way we instantiate our solutions that can be affected. That doesn't mean we won't necessarily find larger super-tools that will be leverage-able to provide the underlying mechanics for huge swaths of problems. Given the current redundancy of most of our systems it isn't hard to guess that yes, there are still more than a few elegant solutions just waiting to be found. We've only covered a tiny segment of the capabilities of our machines.

That growth that we still need to accomplish is the underlying essence behind my prediction for the future:

http://theprogrammersparadox.blogspot.com/2008/02/age-of-clarity.html

One day we will understand the data we are collecting, and what it really means. Then we will be able to use this understanding to 'deterministically' improve ourselves and our societies. The problem that keeps us from getting to this point is not with our technologies, we just don't know how to use them properly yet. The key to solving this problem will absolutely be the computer; it was the single most significant invention of the 20th century, and it will drive huge social changes in the 21st. We are still vastly underrating the significance of these machines.


COMPLEXITY REVISITED

You can't get very far in software development without having to learn to deal with complexity. Software development management is complexity management. While there are lots of definitions for complexity, people seem to understand the essence of the concept, but they still have trouble with the mechanics.

If we pick a convenient way to break it down into underlying effects, it becomes easier to see where the problems arise. A simple clean definition is always the strongest starting point.

We can start with a few definitions: all 'business' domains have an inherent complexity. The business domain is the specific industry, problem, etc. for which you are writing the tool. Some generalized tools cover huge domains, but all tools must always cover some domain. Unless of course the code is entirely random and pointless. Even a simple demo is aimed at a specific set of users.

Most developers understand this, and go about performing or acquiring a significant amount of analysis of the business domain on which to build their solutions for their common user problems.

What often seems to get missed however, is that the development, testing and deployment of the software itself is significant. The problem domain for any piece of software isn't just the business domain, it is all of the development domains as well. For example, if you write the perfect tool, but it is flaky, then it is not usable. Everything about the tool, including itself, is part of the tool's 'problem' domain.

In addition, for every technology used in providing the solution, there is an inherent underlying amount of complexity that comes from the technologies themselves. To write 'to' a specific operating system platform for example, you have to understand the strengths and weaknesses of it, or your solution will be volatile. Depending on any specific aspect of any technology is a mandatory risk for a project, but a manageable one.

So, for our development project, we have a very huge problem domain with its inherent complexity and a significant amount of technical complexity for each and every piece in the system. If you were brilliant and your underlying technologies were clean, and this and only this was the sum total of the complexities in your solution, it would essentially be perfect. However, for all of these complexities, the culture of software development has an extreme tendency to 'add' in way more complexity on top of all of this.

Beyond the inherent complexity in a system, everything else is 'artificial'. It need not be there, but there is -- in practice -- generally a huge amount of it. It is entirely possible with enough effort to refactor any solution to completely remove, forever, all artificial complexity. That is true by definition. It serves no purpose other than to 'bulk' up the solution.

Frederick P. Brooks uses the term 'accidental' complexity, but I believe that includes both what I call artificial complexity and some of the technical complexity. This makes it a less than desirable term because there is nothing you can do about technical complexity; it is as much a part of the solution as the problem domain. Artificial complexity, on the other hand, is removable.

Also, accidental is a horrible word, although his intended definition is the centuries old version favoured by Aristotle. Oddly, I think that using archaic terms for modern things is in itself a form of artificial complexity, we like to make things sound special so we can be exclusive. Simple is better.

Artificial complexity for most development projects equals or exceeds the other inherent complexities. If not directly, the underlying technologies contribute huge amounts of fancy dancing that need not be necessary in an ideal world to properly complete the solution. The actual amount of artificial complexity in the software industry is astoundingly vast these days, and growing at an exponential rate. It is so large and so pervasive that most developers don't even realize how much of the underlying infrastructure could actually be simplified to make their lives easier. Getting it is a mind-blowing experience.

The funniest part about Computer Science is that one's instinctive guess about software might lead to the assumption that if a large group of people worked on the same code year after year, it would gradually approach a system with less artificial complexity. As work progresses, the problem domain would grow larger, and the solution, overall, should get simpler.

In reality, the longer these big teams work on their systems, the worse the artificial complexity gets. In some cases, the entire terminology and development practices of some of these massive groups are so choked with artificial complexity that it probably represents up to 95% of their effort to discuss, push, rehash, extend or mess with their code on a regular basis. Artificial complexity breeds artificial complexity. Stay at it long enough, and most of what you have is just artificial complexity. Very little real stuff gets done underneath.

My favorite example of artificial complexity is a very visible one. There is but one operating system on the whole planet that distinguishes between binary and text files, and it is rumored that the cause of that distinction was a quick fix for a demo, many decades ago. The reason, so I was told, was that a specific hard drive needed to have the newline characters translated between the operating system and the disk. This, then became the reason for differentiating between text and binary files. Translate in one case, ignore in the other.

So this was some simple little problem that briefly reared its head on some early DOS system. Of course, the ripples from this are visible all over. Windows still differentiates for no apparent reason. Protocols like FTP require specification of this parameter. No doubt it has worked its way into a countless number of interfaces, particularly any that want portability to DOS or Windows. Millions and millions of lines of code have had to deal with keeping track of this for one file or another. Millions and millions of hours of time have been spent debugging problems related to this.
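To make the distinction concrete, here is a minimal sketch in Python of the kind of translation being described. The helper names are made up for illustration, and the DOS-style line ending is forced explicitly with the `newline` argument so the behaviour is the same on any platform; it is only a demonstration of the text-versus-binary idea, not of how any particular operating system implements it.

```python
import os
import tempfile


def write_text_dos_style(path, text):
    # Text mode with an explicit newline: every "\n" written becomes "\r\n" on disk.
    with open(path, "w", newline="\r\n") as f:
        f.write(text)


def read_raw(path):
    # Binary mode: the bytes come back exactly as stored, no translation.
    with open(path, "rb") as f:
        return f.read()


def read_text(path):
    # Text mode with universal newlines: "\r\n" is translated back to "\n".
    with open(path, "r", newline=None) as f:
        return f.read()


def demo():
    # Hypothetical round trip through a temporary file.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_text_dos_style(path, "one\ntwo\n")
        return read_raw(path), read_text(path)
    finally:
        os.remove(path)


raw, text = demo()
print(raw)         # b'one\r\ntwo\r\n'
print(repr(text))  # 'one\ntwo\n'
```

The same eight characters go in, ten bytes land on disk, and which version you get back depends entirely on a mode flag. Every piece of code that touches the file has to know which mode applies.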

This little 'artificial' distinction -- completely unnecessary -- has had a significant impact on the world. In fact, I'm willing to bet that if we took all of the effort involved in this silly little problem in one way or another and converted it into some other form of constructive effort, we'd have quite the funky-cool skyscraper by now. Possibly the largest one on the planet. If you think of how many soon-to-be programmers will eventually trip up on this issue one day, it is very depressing.

Even more disconcerting is when you consider that along the various decades there were multiple periods where this issue could have, and should have, been put to bed. Removed, refactored or cleaned up. But still it remains. And more importantly, its brothers and sisters and cousins -- wanton little bits of artificial complexity -- all amount to more effort than what we needed to solve the actual underlying problems for our users. I'll go out on a limb here, but it is a very thick one: the amount of artificial complexity in the software industry is larger than the amount of inherent complexity, but I'll leave that proposition for someone more knowledgeable and wiser to prove.


WHY EVEN SLICE AND DICE?

Modern day software developers are easily lost. While we may know what we want to accomplish, there is a dizzying array of technological and technique choices to be made before even sitting down to consider a design. With so many subjective arguments, and contradictory opinions, it is easy to get lost amongst the voices screaming at each other about the right way to build software.

That is why it is so critical, time and again, to go back to the basics and reexamine them. When the noise gets too loud, you have to ground yourself in what you know to be universally true.

We chop programs up into little pieces to make them easier to build. That is the only reason why we should be doing it. If it isn't easier, then it is just artificial complexity. That being said, there are so many 'theories' for programming, many of which are great, but all of which are dangerous if you take them too seriously. Again, "we chop programs up into bits to make them easier to build."

That means, they are easier to read, easier to fix, easier to understand, etc. The attributes 'gained' by chopping up the code are all good things that help in the long run. It is not about typing, nor about the 'right' way to do it, nor anything else. Elegance comes solely from the ease with which we can manipulate the system. Clever, convoluted 'tricky' code is never elegant.

In Java, for example, the whole idea of dumping things back into mindless sub-objects called 'beans' only to stitch them back to other fuller objects later, seems like an exercise in futility. For things to be readable and elegant, we want to bring all of the relevant code together, and we don't want to repeat it over and over. What are beans then, other than some semantic mess for non-object structures that aren't even convenient to use in the system?

Now given the above definition of elegance, you might be thinking that it is far easier to pound out a set of instructions, over and over again, than it is to apply some fancy abstraction to it, in an attempt to generalize the solution. That assertion might be true, if the time spent pounding out the instructions wasn't significant. However, the more 'brutal' the brute force, the more work is required to get the job done. And not just a little more work, we are talking about massive amounts of work. Pounding out the code is hugely time consuming, and an infinitely losing proposition.

A good abstraction that opens up the problem domain and really solves a series of related problems is absolutely more work. But, in comparison, it is only marginally more effort, relative to the alternative. If, for example, you find a way to create twenty GUI screens with one block of code and changing the incoming parameters, it may have taken you twice as much time as writing one of the screens, but 1/10 as much time as writing all twenty. The more powerful the abstraction, the more leverage. The more leverage, the more time 'saved'.
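As a rough sketch of the twenty-screens idea, here is one block of Python driven by incoming parameters. Everything in it is hypothetical: the `ScreenSpec` type, the `render_screen` function and the example screens are invented for illustration, and the 'screen' is rendered as plain text rather than through a real GUI toolkit so that the example stays self-contained.

```python
from dataclasses import dataclass

# Placeholder for an empty input box; purely cosmetic.
INPUT_BOX = "[" + " " * 10 + "]"


@dataclass(frozen=True)
class ScreenSpec:
    """The parameters that distinguish one data-entry screen from another."""
    title: str
    fields: tuple


def render_screen(spec):
    # One block of code that lays out any screen described by a ScreenSpec.
    lines = [f"== {spec.title} =="]
    for name in spec.fields:
        lines.append(f"{name}: {INPUT_BOX}")
    lines.append("[ Save ]  [ Cancel ]")
    return "\n".join(lines)


# Adding a twenty-first screen is one more line of data, not another copy of the code.
SCREENS = [
    ScreenSpec("Customer", ("Name", "Email", "Phone")),
    ScreenSpec("Invoice", ("Number", "Date", "Amount")),
    ScreenSpec("Product", ("SKU", "Description", "Price")),
]

for spec in SCREENS:
    print(render_screen(spec))
    print()
```

Writing `render_screen` takes longer than hard-coding the Customer screen alone, but every screen after the first costs a single line, which is where the leverage comes from.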

When we generalize programs, we still need to chop them up to be easier to build. All of the same reasons for slicing and dicing some explicit set of instructions also apply to slicing and dicing some generalized set of instructions. We build with an abstraction in mind to make it easier to understand the code. We build with a pattern in mind for the same reason. The abstraction and the pattern are irrelevant, except with regard to making the underlying code easier to understand. A pattern helps to slice and dice, but unless it is also an abstraction, it should not leave remnants of artificial complexity in the code. Naming objects after design patterns, for example, is misleading. The data in a system is 'that' data, its structure is the pattern. It should be named for what it is, not how it is structured. It is the same as calling your intermediate counter variable in a 'for' loop 'integer'.

Again and again, movements grow from programmers that seek easier ways of building software. That is to be expected, but it is also to be expected that many of these movements are not improvements. With that in mind, we should always fall back to first principles when examining a new movement. If it does not jibe with the base problem we are trying to solve, then it is a poor solution. Also, you should never buy the counter-argument that you have to try something to know if it is good or bad. Our ability to think through a problem is tremendous, and our ability to ignore the truth is also tremendous. Just because you find a technique fun, doesn't make it a good idea.
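A small illustration of the naming point, sketched in Python. Both classes below have exactly the same tree shape; the names `CompositeNode`, `Folder`, `accumulate` and `total_bytes` are invented for this example and do not come from any real library.

```python
# Named for the pattern: tells the reader how it is structured, not what it holds.
class CompositeNode:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def accumulate(self):
        # Sum this node's value with the values of all of its descendants.
        return self.value + sum(c.accumulate() for c in self.children)


# Named for the data: identical shape, but the code now reads like the problem.
class Folder:
    def __init__(self, name, own_bytes, subfolders=()):
        self.name = name
        self.own_bytes = own_bytes
        self.subfolders = list(subfolders)

    def total_bytes(self):
        # Size of this folder plus everything nested beneath it.
        return self.own_bytes + sum(f.total_bytes() for f in self.subfolders)


photos = Folder("photos", 10, [Folder("2007", 250), Folder("2008", 40)])
print(photos.total_bytes())  # 300
```

At 4am, `photos.total_bytes()` explains itself. `node.accumulate()` sends you off to find out what the node was supposed to be in the first place.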


I'VE HAD TOO MUCH JAVA TODAY

Like all programmers, I often have a bias towards whatever technology I am currently working with. The more you dig in and understand something, the easier it becomes. Oddly, even though I've been working heavily in Java recently, I don't find the language very appealing.

Languages range from being very flexible and accommodating, to being stiff and fragile. COBOL was the original stiff board. While it did an excellent job of making screens to capture data, it was just painfully boring to work with. The various incarnations of Visual Basic were another example of stiff. The language forces its practitioners to pound out brute force code because of the weak semantics of the language. I've never seen elegant VB, and I would be surprised if I ever did.

Java has some of that stiffness. The original plan was to not provide too much rope to allow the programmers to hang themselves. Languages like C are incredibly flexible, but for most programmers that flexibility is dangerous. They don't use it wisely, so their code gets unstable. Language designers don't want to stifle expression, but they can stop their users from creating some types of programs. Stiffness is good and bad.

The biggest problem with Java isn't the language itself, it is the culture that grew out of the language after it matured. The underlying libraries in Java are just damned awkward, and that translated into the coding practices being damned awkward. Things like beans and Struts and just about every library I've ever seen are so over-engineered, inconsistent, and messy. Stuffing in fifty million unrelated Design Patterns became vogue, which fills the system with a tremendous amount of artificial complexity. So much so that, when working the whole environment on a messy operating system like Windows with one of the more modern icky IDEs, the collection of stuff is as arbitrary and convoluted as working on a mainframe or AS/400s (iSeries). It is one giant arbitrary inconsistent mess. When I was younger, I remember not wanting to work on mainframes because the technology was arbitrary and ugly. Now it has found me on the PC. Java has become the new COBOL and Windows has become the new mainframe. The cycle of life perhaps?

There is much interest in adding new features like closures to the language. I'll dispense a little advice: Hey guys, the problem isn't that the language is missing stuff. The problem is that the libraries are damned awkward. Fix the problem. Re-release a Java2 with a decent set of clean and normalized libraries that don't suck. Cut back on the stupid functionality and focus on getting really really simple abstractions. You know you are close, when the examples for simple things aren't hideously large. Make simple things simple. Look to the primitive libraries for other languages like Perl and C. Get a philosophy other than 'over-complicate-the-hell-out-of-it', and then clean up the implementation. And, OK, make a couple of language changes, like making strings easy to use (remove StringBuffer), get rid of arrays and primitive types, and find a nicer syntax for callbacks. But please, whatever you do, don't adopt the C# strategy of just dumping more and more crap into the pot until you can't see the bottom; that's a guaranteed recipe for 'unstable'.

Despite my misgivings about Java, I'll probably be using it for a while still. At least until we can convince someone to bankroll a serious effort into finding new and better ways of really building systems (if you've got big money, I've got big ideas :-) But it doesn't seem as if the focus right now is on moving forward. We're just too busy drowning in our own artificial complexity to even consider that we shouldn't be. Besides, bugs are big business.


COMING BACK TO THE IMPORTANCE OF LANGUAGES

The various sections in this post fit together like a meal of appetizers, because of how they relate to the way we express our solutions for our users. Our primary computer language and its libraries may be the heart of our implementations, but what we really do is translate the perceived needs of our users into long and complex sequences of instructions. By the time you include development, testing, packaging and distribution, a commercial software project will often involve the coordination of many full and partial computer languages. A web-based application may involve over a dozen. These different forms of expressing the solution fit some problems easily, but most require more effort. It is in this expression that we so easily go astray.

I see each and every language as a series of good and bad attributes, for which I would really like to collect the good together and discard the bad. If you've read my earlier posting on primitives, you probably understand why that is not a great idea, but as an approach towards enhancing development it is a good direction to start. We need a new language that more closely matches the way we express our problems to each other. We don't necessarily need a fancy 'natural' language system, but we should focus on reducing the amount of translation that is happening between analysis and implementation.

Our latest technologies are extremely complex. Much of it is accumulated artificial complexity that could be removed if we had the nerve to refactor our solutions. When we find the right underlying abstractions, they will create a consistent layer on which we can easily express super-complicated problems. This, combined with a more natural representation, should give us a huge leap in technical sophistication. It is always worth noting that a computer is an incredibly powerful mind-machine, and that our current level of software development is an extremely disappointingly crude attempt at utilizing it.

It should be easier to express our "understanding from the users" into specific tools. There is no real reason why I need to write, over and over again, the same basic solutions to the same basic technical problems. My real problem is the nature and structure of the data, not what type of list structure is returned from some bizarre internal call. We get so caught up in the fantasy of our brilliance in pounding out little solutions to the same common problems, that we forget about the big picture, the real problem we are trying to solve. The users want to manipulate a pile of data. We need to build specific tools to accomplish this. That underlying consistency threads all types of programming for all types of industries together into one giant related effort. We are building ever increasing piles of data in the same way that ancient Egyptians were building ever increasingly large pyramids. It is just that it is hard to physically see our efforts, although the web does allow tourists to visit our piles.

A key source of our problems is our underlying technologies, particularly our languages. We need to fix or refactor these, if we want to make building things easier. However, if things are too easy, programmers may actually refuse to use the technologies, because they take away too much of the fun. Oddly, one can easily suspect that Frederick P. Brooks is correct about there not being a silver bullet, not because it is impossible, but because people wouldn't use it, even if it existed. Humanity -- in that regard -- is a strange crowd.