Sunday, February 16, 2014

Principles

This post http://tekkie.wordpress.com/2014/02/06/identifying-what-im-doing/ by Mark Miller really got me thinking about principles. I love the video he inserted by Bret Victor at http://vimeo.com/36579366 and while the coding examples in it were great, the broader theme of finding a set of principles really resonated with me. I've always been driven to keep building larger, more sophisticated systems, but I wasn't really trying to distill my many objectives into concrete terms. Each new system just needed to be better than the last one (which becomes increasingly hard very quickly).

Framing my objectives as a set of principles, however, sets an overall theme for my past products and makes it easier to be honest about their true successes and failures.

As for principles, I no doubt have many, but two in particular drive me the hardest. One for the front end and another for what lies behind the curtains. I'll start with the latter since to me it really lays the foundations for the development as a whole.

Software is slow to write; it is expensive and it is incredibly time consuming. You can obviously take a lot of short-cuts to get around this, but the usefulness of software degrades rapidly when you do, often to the point of negating the benefits of the work itself. As such, if you are going to spend any time building software you ought to do it well enough that it eventually pays for itself. In most instances this payoff doesn't come from just deploying some code to solve a single problem. There are too many development, operational and support costs to make this an effective strategy. It's for exactly this reason that we have common code like operating systems, libraries, frameworks, etc. But these pieces are only applied to the technical aspects of the development; what about the domain elements? They are often way more complex and more expensive. What about the configuration and integration?

My backend principle then is really simple: any and all work done should be leveraged as much as possible. If you do the work for one instance of a problem then you should be able to leverage that effort for a whole lot of similar problems. As many as possible. For code this means 'abstraction', 'generalization' and eventually 'reuse'. At an organizational level this means some architectural structure that constrains the disorganization. At the documentation level this means that you minimize writing time and maximize readership.
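As a rough illustration of what that leverage looks like in code, here is a minimal sketch. The file format, the field names and the load_delimited helper are all hypothetical, invented only to show one-off effort being generalized into something reusable; it is not a prescription for any particular system.

    # One-off: solves exactly one problem, with no leverage beyond it.
    def load_customer_report(path):
        rows = []
        with open(path) as f:
            for line in f:
                name, balance = line.rstrip("\n").split(",")
                rows.append({"name": name, "balance": float(balance)})
        return rows

    # Generalized: roughly the same effort, leveraged across every delimited file we handle.
    def load_delimited(path, fields, delimiter=",", converters=None):
        converters = converters or {}
        rows = []
        with open(path) as f:
            for line in f:
                values = line.rstrip("\n").split(delimiter)
                row = dict(zip(fields, values))
                for key, convert in converters.items():
                    row[key] = convert(row[key])
                rows.append(row)
        return rows

    # The original problem collapses into one call; every similar problem reuses the work.
    customers = load_delimited("customers.csv", ["name", "balance"],
                               converters={"balance": float})

The point is not the helper itself but the habit: the second version costs a little more thought up front, and then every later variation of the problem becomes a call instead of a rewrite.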

Everything, at every level, should be designed and constructed to get the utmost leverage out of the initial effort. Every problem solved needs to be viewed in a much larger context to allow people to spot similar problems elsewhere.

Naysayers will invoke the specter of over-engineering as their excuse to narrow the context down to the absolute smallest possible, but keep in mind that it is only over-engineering if you never actually apply the leverage. If you manage to reuse the effort, the payoff is immediate, and if you reuse it multiple times the payoff is huge. This does mean that someone must grok the big picture and see the future direction, but there are people out there with this skill. It's always hard for people who can't see big pictures to know whether someone else really does or not, but that 'directional' problem is more about putting the wrong people in charge than it is about the validity of this principle. If the 'visionary' lacks vision, then nothing will save the effort; it is just doomed.

When a project has followed this principle it is often slower out of the gate than a pure hackfest. The idea is to keep building up sets of larger and larger Lego blocks. Each iteration creates bigger pieces out of the smaller ones, which allows for tackling larger and larger problems. Time is no longer the enemy, as there are more tools available to tackle a shrinking set of issues. At some point the payoff kicks in and the project's capabilities actually grow faster, not slower. Leverage, when applied correctly, can create tools well beyond what brute force can imagine. Applied at all levels, it frees up the resources to push the boundaries rather than to be stuck in a tar pit of self-constructed complexity.

My principle for the front end is equally effective. Crafting software interactions for people, whether via a command line, a GUI or a NUI, is always slow and messy work. It is easily the most time-consuming and bug-prone part of any system. It is expensive to test, and any mistakes can cost significant resources in debugging, support, training and documentation. A GUI gone bad can suck a massive hole into a development project.

But an interface is just a way of hanging lots of entry points to functionality where the users can access them. There is a relative context to save the users from having to respecify stuff, and there is often some navigational component to help them get quickly from one piece of functionality to another, but that's it. The rest is literally just window dressing to make it all look pretty.

So if you are going to build a GUI, why would you decompose everything into a billion little pieces and then start designing the screens from a bottom-up perspective? That would only ensure extra effort making endless screens with nearly the same bits displayed in a redundant manner. You can't design from the bottom up; it must be from the top down. You need to look at what the users are really doing, how it varies, and then distill that into the smallest, tightest set of entry points that they need. An interface built this way is small. It is compact. It contains fewer screens, less work and less code. It takes the users quickly to what they need and then gets them back to a common point again with as little effort as possible. It's less work and they like it better.

A system with hundreds of scattered screens and menus is almost by definition a bad system, since its sheer size keeps it from being cohesive, and thus usable. Functionality is useless if you can't find it. Sure, it is easier to write, since you don't have to agonize over the design, but that lack of thought comes with a heavy price tag.

Programmers build GUIs from the bottom up because they've been told to build the rest of the code from the bottom up. But for an interface this is backwards. To be effective, the interface has to be optimized for the user, and of course this will make the programmer's job far more difficult, but so what? Good coding is never easy, and forcing it to be easy simply dumps the problems back onto the people we are trying to help. The system should be easy to use even if that means the code is harder to write. And if the work is hard but relatively redundant, then that is precisely what the first principle is for. The difficult bits should be collected together and encapsulated so that they can be leveraged across the entire system. So, for example, if the coders spent extra time generalizing a consistent paging mechanism for a screen, then that same code should be applied to all screens that need paging. Ten quick but flaky paging implementations are ultimately more expensive and very annoying for the users.
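To make the paging example a little more concrete, here is a minimal sketch of what one shared mechanism might look like. The Pager class, its page size and the order data are hypothetical, chosen only to show a single tested behaviour being reused rather than ten flaky per-screen copies.

    # One shared paging helper, reused by every screen that needs paging.
    class Pager:
        def __init__(self, items, page_size=25):
            self.items = items
            self.page_size = page_size

        def page_count(self):
            # Ceiling division; an empty list still yields one (empty) page.
            return max(1, -(-len(self.items) // self.page_size))

        def page(self, number):
            # Clamp out-of-range requests so every screen behaves the same at the edges.
            number = min(max(number, 1), self.page_count())
            start = (number - 1) * self.page_size
            return self.items[start:start + self.page_size]

    # Any screen gets the same edge-case handling for free.
    orders_pager = Pager(["order-%d" % i for i in range(1, 101)], page_size=10)
    print(orders_pager.page(3))   # items 21..30
    print(orders_pager.page(99))  # clamped to the last page

Once the behaviour lives in one place, a bug fix or a usability tweak to paging lands everywhere at once, which is exactly the leverage the first principle asks for.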

It's hard to put a simple name to this second principle, but it could be characterized by stating that any people/machine interfaces need to be designed in a top-down manner to ensure that they are optimized for the convenience of people rather than for the convenience of the construction. If people are going to benefit from software then they have to be the highest priority for its design. If money or time is a problem, less stuff should be delivered, but the priority must be people first.

Both principles echo strongly in my past works. Neither is really popular within the software development communities right now, although both frequently get lip service. People say they'll do both, but it is rare in actuality. Early agile, for instance, strongly focused on the end users, but that gradually devolved into the stakeholders (management) and generally got pushed aside for the general gamification of the development process. These days it is considered far better to sprint through a million little puzzles, tossing out the results erratically, than it is to ensure that the work as a whole is consistent and cohesive. Understanding the larger context is chucked outside the development process onto people who are probably unaware of what that really means or why it is vital. This is all part of a larger trend where people have lost touch with what's important. We build software to help people; making it cheap or fun or any other tasty goal is just not important if the end product sucks.

Saturday, January 25, 2014

Rights of the Modern Age

Our world just keeps on changing and with all of these changes we must keep on updating and asserting our basic human rights. To this end I suggest a few new rights that I think we all possess:

We own the intellectual property rights to all our interactions with the world. That is, if we buy something from a store, we own and control any data generated from that interaction which includes us as individuals. The store can own any data that shows they sold a bunch of stuff to a collection of anonymous people, but if that data singles anyone out, in any way or shape, then that person owns it. That right applies not only to stores but to governments and healthcare as well, in fact to any and all interactions that we have as we creatively engage with the world around us. If it is specifically about a person then clearly they should own the rights to it and it can't be used or even collected without their explicit consent.

Given that scientists seek to enlighten us with their research, we own the right to not accept anything they say unless they also present the raw, unedited data that they gathered to back up their analysis. I don't want to see or hear about any work unless the process to compile it was completely transparent. If the data really shows what their analysis claims it shows, then they will have no problem releasing two things: the paper explaining the analysis and any data (including unused data) that was gathered to investigate the claim. Given the increasing sophistication of mathematical approaches for extracting conclusions from data, any claim presented without data should be considered untrustworthy and quite possibly propaganda designed to obscure rather than clarify the underlying truth. Papers without data should not be considered 'scientific works'. Science is about discovering the truth, not about making one's career.

We all own the right to be different. We are all unique and should value this. Diversification is a key strength of our species, so we shouldn't be alike, think alike or follow blindly. Any person, organization or process that is attempting to 'clean up our differences' is not acting in our best interests. They are violating our fundamental rights to be different and to remain that way forever. A homogeneous world is just one sad shade of grey; we know this and all need to incorporate it into our philosophies of getting along together. Different is good, even if it can be annoying at times.

That's it for now but I'm sure as the next wave of madness hits us I'll figure out some other basic tenets of existence.

Monday, January 13, 2014

Controlling Complexity

"Make everything as simple as possible, but not simpler."

Albert Einstein

Within a context, every object or process has a given amount of complexity. As Einstein said, there is a base level of complexity that cannot be circumvented, but there are at least two types of complexity: inherent and artificial. There are many other names for these and many other ways to decompose complexity into subparts, but this simple breakdown clarifies a simple property of complexity: that under specific circumstances many complex things can be made simpler.

Simplification can occur for many reasons, but most commonly it comes from removing artificial complexity. That is, the complexity that is piled on top for reasons like misunderstandings, short-cuts, disorganization, self-interest and lack of understanding. Note that all of these are directly attributable to human intelligence, and with that we can quite easily define 'inherent' complexity as the lower limit that is bounded by our physical world, in the sense that Einstein really meant in his quote. Also note that I started the first sentence referring to context. By this I actually mean a combination of spatial and temporal context. Thus, things can get simpler because we have learned more about them over time, or because we are choosing to tighten the boundaries of the problem down to avoid issues within the larger context. The latter, however, can be problematic if done under the wrong conditions.

For reducing complexity there is also the possibility of simplification by encapsulation, that is, some part of the whole is hidden within a black box. The context within the box is obviously simpler, but the box itself adds something to the larger complexity. This works to some degree, but boxes can only be piled so high before the pile itself becomes too complex.
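A small sketch of that trade-off, using a hypothetical DocumentStore as the black box; the class and its directory layout are invented for illustration, not taken from any real system.

    import os

    # The messy details (paths, directory creation, encoding) live inside the box.
    class DocumentStore:
        def __init__(self, directory):
            self._directory = directory

        def save(self, name, text):
            os.makedirs(self._directory, exist_ok=True)
            with open(os.path.join(self._directory, name), "w", encoding="utf-8") as f:
                f.write(text)

        def load(self, name):
            with open(os.path.join(self._directory, name), encoding="utf-8") as f:
                return f.read()

    # The callers' context stays simple, but the box itself is now one more
    # concept added to the overall system.
    store = DocumentStore("notes")
    store.save("todo.txt", "write the report")
    print(store.load("todo.txt"))

Each box makes its callers simpler while adding one more thing to the whole, which is why stacking boxes on boxes eventually stops paying off.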

Often people attempt to simplify by reducing context, essentially "wearing blinders", but they don't follow through with the encapsulation. In that case, it is extremely unlikely that any underlying changes will actually simplify things; instead they spawn off unexpected side effects, which themselves are just added artificial complexity. This often goes by the name 'over-simplifying', but that's a misnomer: while the change within the narrowed context may be describable as a 'simplification', it isn't really one.

Within this description we can also add abstraction as a means of simplifying stuff. In general an abstraction is really just a larger pattern or relationship manifested over a larger space of objects or processes, but its ability to help comes from the fact that it organizes the things underneath. Organization, and sometimes categorization, relates similar things together by their properties, so exploiting these relations reduces the complexity of dealing with the individual parts. Abstraction, though, has its limits, in that it acts much like a bell curve. Some abstraction reduces complexity, increasing to a maximum point, then falling off again because the abstractions become too general to be applied for organization. Still, a powerful abstraction at that maximal point can cut complexity by orders of magnitude, which is far more powerful than any other technique for controlling complexity. It's not free, however, in that considerably fewer people can deal with or understand strong abstractions. That leaves them subject to being misunderstood and thus becoming a generator of artificial complexity.

There are many ways to reduce or control complexity, but there are many more ways for people to introduce artificial complexity. It's this imbalance that is driving our modern age to the brink of serious trouble. So often people cry "simplification" while actually making things worse, and it isn't helped by living in an age where the ability to spin the facts is valued far more than the ability to get things done well. Quantity and hype constantly trump quality and achievement.

Thursday, January 2, 2014

The Quality of Code

One of the trickier issues in programming is whether or not a program is well-written. Personally I believe that the overall quality of the software is heavily affected by its internal quality. This is because most actively used software is in continuous development, so there is always more that can be done to improve it; the project never really stops. To keep this momentum on a reasonable track the underlying code needs to be both readable and extendable. These form the two foundations for code quality.

Readability is simple in its essence, but notoriously difficult to achieve in practice. Mostly this is because programming languages support a huge variance in style. Two different programmers can use the same language in very different ways and still get good results. This capacity opens the door to each programmer coding in their own unique style, an indirect way of signing their own work. Unfortunately, a code base made up of four different styles is by definition four times harder to read. You have to keep adjusting your understanding when switching between the sections written in different styles. Getting multiple programmers to align on nearly identical styles is incredibly hard because they don't like having any constraints, there are deadlines, and most programmers won't read other programmers' code. Style issues should really be settled before any development begins, and any new programmers should learn and follow the stylistic rules already laid down. When that happens well enough, the code quality increases.
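A tiny, contrived example of the problem: both of these functions do the same thing, but they read as if they came from two different code bases. The names and styles are hypothetical, picked only to show how mixing them forces the reader to keep switching gears.

    # Style A: terse names, index-driven loops.
    def total(xs):
        t = 0
        for i in range(len(xs)):
            t += xs[i]
        return t

    # Style B: descriptive names, iterator-driven loops.
    def sum_of_amounts(amounts):
        running_total = 0
        for amount in amounts:
            running_total += amount
        return running_total

Neither style is wrong on its own; the cost only appears when both are scattered through the same system and every reader has to translate between them.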

To get around reading others' code, many programmers will attempt to extend existing code by doing what Tracy Kidder described in "The Soul of a New Machine" as just attaching a bag on the side. Essentially, instead of extending, refactoring or integrating, they just write some external clump of code and try to glue it to the side of the existing system. This results in there effectively being two different ways of handling the same underlying mechanics, again doubling any new work to extend the system. Done enough, this degenerates the architecture into a hopeless 'ball of mud', eventually killing any ability to extend the system further. Many programmers justify this by stating that it is faster, but that speed comes at the cost of gradually stopping any further extensions.
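A minimal sketch of what a "bag on the side" looks like versus a real extension; the pricing functions and numbers are hypothetical, invented only to show the mechanics being duplicated rather than reused.

    # Existing code: the one place that knows how to price an order.
    def price_order(items, tax_rate=0.13):
        subtotal = sum(qty * unit for qty, unit in items)
        return round(subtotal * (1 + tax_rate), 2)

    # The bag on the side: a second, slightly different copy glued on for discounts.
    def price_order_with_discount(items, discount, tax_rate=0.13):
        subtotal = sum(qty * unit for qty, unit in items)
        subtotal -= discount
        return round(subtotal * (1 + tax_rate), 2)

    # The extension instead: one mechanism, one place to fix bugs, one behaviour to learn.
    def price_order_extended(items, tax_rate=0.13, discount=0.0):
        subtotal = sum(qty * unit for qty, unit in items) - discount
        return round(subtotal * (1 + tax_rate), 2)

    items = [(2, 19.99), (1, 5.00)]
    print(price_order(items))                          # original behaviour
    print(price_order_with_discount(items, 5.00))      # duplicated mechanics
    print(price_order_extended(items, discount=5.00))  # the same work, leveraged

The duplicated version looks faster in the moment, but now every future change to pricing has to be made, and tested, twice.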

Both multiple styles and bad extensions are very obvious if you read through the code. In this way, if you read a lot of code, it is fairly obvious whether the system is well-written or not. If it's fairly consistent and the mechanics of the system are all encapsulated together, it's probably not going to be hard to read it and then extend its functionality. If on the other hand it looks like it was tossed together by a bunch of competing programmers with little structure or organization, then making any changes is probably long, painful and will require boatloads of testing to validate them. Given lots of experience with different systems, experienced programmers can often just loosely rank a code base on a scale of 1 to 10, with the obvious caveat that any ranking from a programmer who hates reading others' code will be erratic.

An important side effect of achieving good quality is that although the project starts slower, it maintains a consistent pace of development throughout its lifetime, instead of slowing down over time. This opens a door to keeping a metric on long-term development that mirrors the underlying quality. If the amount of code getting into the final production state is rapidly decreasing, one of the causes is declining quality (there are several other causes to consider as well).

Monday, October 14, 2013

Out of Control

The complexity of modern software systems has gotten out of control. Decades back, computers served very well-defined roles. They computed specific values, or they kept track of manually entered data. It was all very constrained. Nice and simple.

These days, technical ability has embedded itself deeply into many industries; they need their computers to remain competitive. And these machines need mass amounts of data just to keep up. Everything is now interconnected, churning around in an information haze.

But the silos that enabled the earlier systems are now the impediments to utilizing all of this collected data. Above this lie layers of spaghetti so intertwined by history that there is no hope of sorting through the whole hideous clump of knots. All this is serviced by increasingly stressed operations departments just trying to stay on top of the shifting technologies, security issues, devices, weird processes and out-of-control user expectations. Most of these groups are just one small step away from catastrophe.

Modern systems are intrinsically complex, but their rough evolution has hugely amplified the problems. Underneath, software is fairly simple. It rigorously models attributes of the world, grinding through the data to provide some insight into behaviour. We have enough knowledge now to really understand how to organize the data and construct the code to achieve our goals. However, history, politics and a general lack of understanding inflate the issues, forcing what could have been a straightforward engineering effort into a swirling cloud of chaos whose results are most often disappointing. People rush to build these systems, skipping over what is known in order to bend to the pressure of getting things out too early. These are, of course, self-inflicted injuries. Weak code, badly organized data, lack of standards, and no empathy for users or operations result in piles of unstable code fragments that behave badly when exposed to the real world. Compound this with an obsessive need to continually restart from scratch or to just blindly build on something else, and what results are sprints that prematurely burn out and then come crashing down.

There are no short-cuts in software development. There are trade-offs, but there is no easy way to bypass the consequences of a long series of poor decisions. To get things working, people have to work through all of the problems diligently and in great detail, which is often a slow and painful process. You can't skip it, defer it until later or rely on luck. For code to not suck, it has to be well thought out; there is no way around this, since code is essentially a manifestation of the underlying knowledge of the programmers involved. If they don't understand what they are writing, then the code reflects that. And if above them the environment is not organized, then the system and the data reflect that as well. In that sense a system is just a mirror of where it was created and where it is running. It is only as stable and reliable as its environment.

The irony of software development is that lots of experience teaches you how easy it could be, yet exposes you to the full ugliness of how it is usually done. For programmers, once you can see above the code, the silos, the technologies, etc., you can imagine the possibilities of building far more sophisticated and usable systems, things that would radically change the value to the users, but you can now also see the environmental, organizational and chronological roadblocks that will often prevent you from achieving this. Software could be better, but it rarely is.