The Programmer's Paradox: June 2010

Sunday, June 27, 2010

Paragraphs and Functions

You already know the list of software instructions; it is breaking them up into manageable pieces that is the problem. What is the best way to accomplish this?

Chopping up code into bite-sized, tasty morsels remains one of the most difficult tasks in programming. Too small, and the intent is dispersed too widely, hidden in the nooks and crannies of the system. Too large, and the bulk is intimidating, difficult to understand or verify.

A useful trick is to see the base unit of code -- a function, procedure or method -- as being equivalent to the usage of paragraphs in writing.

Writing is broken up into paragraphs to make it easier to read. Each paragraph contains the sentences that relate together. Sentences that share some common idea. Short paragraphs are OK, but too many are distracting. Long paragraphs are less desirable because the reader loses focus. A good writer balances their ideas, using the paragraphs as transitioning points to move from one idea to the next. To keep the work flowing.

In that regard, writing does share some commonality with the programmer's desire to produce 'readable' work. That is, the point of breaking things up into smaller functions is nearly identical to the point of using paragraphs. They both clump together the related ideas.

One notable difference is that writing remains linear, while functions are also hierarchical. That's actually a good thing because it gives the programmer more flexibility when grouping their code. Some sets of instructions are obvious subsets of a larger overall plan. A hierarchy can maintain that relationship.
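As a rough illustration of the paragraph analogy, here is a minimal Python sketch. The report format and every function name are invented for this example; the point is only that each small function holds one idea, and the top-level function reads like an outline of the ones beneath it.

```python
def parse_line(line):
    """One idea: turn 'name,qty,price' text into a record."""
    name, qty, price = line.strip().split(",")
    return {"name": name, "qty": int(qty), "price": float(price)}


def line_total(record):
    """One idea: what a single record costs."""
    return record["qty"] * record["price"]


def format_row(record):
    """One idea: how a single record is displayed."""
    return f"{record['name']:<10}{line_total(record):>8.2f}"


def build_report(lines):
    """The 'parent' idea: it sits above the others in the hierarchy
    and reads as a summary of the steps."""
    records = [parse_line(l) for l in lines if l.strip()]
    rows = [format_row(r) for r in records]
    total = sum(line_total(r) for r in records)
    return "\n".join(rows + [f"{'TOTAL':<10}{total:>8.2f}"])


if __name__ == "__main__":
    print(build_report(["apple,2,1.50", "pear,1,0.25"]))
```

The small functions can also be reused elsewhere, which a linear list of instructions does not allow.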

Getting the pieces right makes the code more readable. It makes it easy for other programmers to follow along and enhance the code later. Getting it right also makes it easier to reuse the code in many other places. Leveraging existing work is the most effective way to avoid a time-crunch.

A readable program is a platform for further work, while a non-readable one is generally tossed and re-written from scratch. Getting the code to work is nice, but having it last as the working solution is far superior. Why build things over and over again? If you do it right the first time, it can be built upon. It can provide a solid base for future effort.

Tuesday, June 15, 2010

Feedback

The best part about blogging is in how it forces you to reflect on what you know.

It is easy to have an opinion, but until you write it down, release it and see what happens, it is only wishful thinking. In trying to communicate your ideas, you quickly realize how fragile they are, how often they sound better inside of your head than they do outside.

In that sense, feedback is the key. I have no real idea whether the stuff I am writing is reasonable, madness, rambling or idiotic. I do my best to weed out the weaker stuff, but what I am often talking about is in a very raw state. Unless it's something that I've been harping on for a while, the ideas only get shaped as they turn into sentences. Although I write a lot from personal experience, trying to generalize that or examine it in a global context can be difficult. Just because I can easily do something, doesn't mean that I understand it, or can teach it to others.

In periods like now, when there has been little feedback, I always get worried that I've somehow disconnected from my readers. That I'm no longer writing relevant or interesting things. I realize that I'm not a particularly great writer (or editor), but I do believe that communication is the only way we can collectively grow. It's better, I think, to write poorly about something than to not write at all. If we're not talking about it, then we're probably making way too many assumptions about it, and we certainly aren't growing our knowledge.

If you're out there, I'd love to hear what you think. What you like and/or dislike. Feedback would be awesome. Also, if you have any questions, I'd love to answer them. I could write up my thoughts as a full post, or for short answers I could post them in the comments, or perhaps combine several together as a post.

Thanks for reading this blog.

Sunday, June 13, 2010

Truth and its Consequences

History is a beast. In our ancient past, what people believed to be true often turned out to be incorrect or only partially right. Over the last three thousand years or so, our view of our world has been radically altered. The shape of our world, the solar system, zero, negative numbers, chemistry, biology, genes, DNA, determinism, computers, etc.

We've undergone a massive change in our knowledge, one that has been gradually shaping how we see the world around us. Although one doubts that we've really 'evolved' in such a short time, we've definitely civilized ourselves. Truth, it seems, gradually has an effect on us, as it sinks in generation after generation. It takes a while -- knowledge is sticky, and people are resistant -- but as we know more, and more people come and go, our thoughts and actions gradually get more sophisticated.

Still, we've reached that point where we are surrounded by so much knowledge, and so much of it is of dubious quality, that we seem to be lost in what we know. Unable to assimilate or deal with it. Once we made it so easily available in large quantities, the side-effect appears to be a form of snow-blindness.

What the world really needs now is a truth-telling machine. One that when given a statement will determine whether or not the statement is true. And not, as one might expect, true in the present context, but instead 'universally true'. That is, true for all time, and for all places. This type of invention would allow us to once again size up what we know, show the falsehoods for what they are, and then collectively move on to a better place. It's a great idea. Crazy. But still great.

In a universal sense, while there can be an infinite number of incorrect answers, there can only be one correct one. That perspective, however, is far too black and white for our universe, and how we live in it. It makes more sense to see some of the false answers as being tied to a specific context. That is, "the sun always rises in the east" is a valid statement for the contextual period in which both the planet and the sun still exist. In a universal sense, some day that statement won't be true: either the east won't exist, the sun won't exist, or the relationship between the two bodies will have been altered. The statement, then, is not a universal truth; it is just contextually true today, as far as we know.

Most people, however, don't want to deal with the world around them and their knowledge on a universal/geological scale. Most people, it seems, would prefer that their own 'universal' truths be limited to their lifetimes, or even shorter periods. That, however, is most likely the type of tunnel vision that has continued to plague our understandings. People dig in, and don't want to know the truth, or at least more of it, so the old 'truths' linger like bad odors, until the last of a generation die off. Given all of the generational overlaps, collectively what we know, and what we believe, is a mishmash of all of these overlapping incorrect versions of our world. To fix things in a way that allows us to move farther ahead, faster, we have to fix these stagnation problems first.

The problem with universal truth is that we can't possibly know what it is. We start out in a context, and are trapped by it. Only in hindsight can we see that something isn't as we believed, and that's only if we are willing to accept that it might be wrong.

Given all of the major discoveries and turmoil in history that have gone on, we can see that we generally grow our understandings in leaps and bounds. There are clearly times where generations stagnate or go backwards, and then there are times full of great discovery and change. Our societies ebb and flow, as the different organizational or cultural issues push or suppress us into examining our world.

And within our growth, everything we know is dependent on a huge underlying foundation. Each fact sits on a sea of other facts. We can only build new upper layers when we've managed to discover enough of the lower ones. Knowledge, it seems, is the ultimate house of cards. That relationship, and the fact that there are always missing supporting cards, defines the frailness of what we know. There will always be assumptions, so there will rarely be universal truth.

Even if we can't have universal truth, one of our more recent discoveries, computers, does give us an unprecedented opportunity. Computers are stupid, but they beat elephants for having a long memory. That, and they are never bored, so you can just set them up to do something, seemingly forever. Memory and patience open up the ability for us to do something fascinating. While we can't determine the universality of what we know, we can however use computers to chart out the degrees of freedom in what we say. That is, we can take all of our statements, one by one, and examine how and why they might vary.

A really sophisticated program would be nice, but in truth something far cruder might still reach the same basic level of understanding. We'd just like to know, for any very large collection of statements, where the degrees of freedom lie. So, for my example "the sun always rises in the east", it would be nice to know that the sun is a star. That we are assumed to live on a planet orbiting it, that east is a relative pointer based on the relationship between the two bodies, and that stars and planets have a finite lifespan. That is, for the very simple knowledge in that statement, what are all of the related things on which that statement depends?

We can't know if something is universally true, but we can figure out absolutely all of the places on which the statement is dependent. The truthfulness of a statement isn't a black and white variable, but rather a complex graph of underlying relationships on which the overall quality of a statement is formed. It's the structure of those relationships that ultimately sets the data quality. If most of the underlying dependencies are true, then the statement can be seen as mostly correct (and still not be the truth, for example "political spin").

In that sense, if we had a huge database of a massive number of relationships, we could add in a new statement and quickly see how likely it is that the quality of that statement might vary. We could test any new knowledge statement against the sum total of what's already there. We could easily see where the weaknesses and assumptions lie.
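To make the idea a little more concrete, here is a very crude Python sketch of such a structure. Everything in it is an assumption made up for illustration: the tiny hand-written dependency table, the function names, and the scoring rule (the fraction of underlying dependencies currently believed to hold). A real system would need something far richer than this.

```python
# Hypothetical, hand-written dependency table: statement -> statements it rests on.
DEPENDS_ON = {
    "the sun always rises in the east": [
        "the sun exists",
        "the earth exists",
        "the earth rotates eastward",
    ],
    "the earth rotates eastward": ["the earth exists"],
    "the sun exists": ["the sun has not burned out"],
}


def all_dependencies(statement, graph):
    """Collect every statement reachable from `statement`, directly or indirectly."""
    seen = set()
    stack = [statement]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:  # also guards against cycles
                seen.add(dep)
                stack.append(dep)
    return seen


def quality(statement, graph, believed_true):
    """Crude score: the share of underlying dependencies we currently believe."""
    deps = all_dependencies(statement, graph)
    if not deps:
        return 1.0 if statement in believed_true else 0.0
    return len(deps & believed_true) / len(deps)


if __name__ == "__main__":
    believed = {"the sun exists", "the earth exists", "the earth rotates eastward"}
    claim = "the sun always rises in the east"
    print(sorted(all_dependencies(claim, DEPENDS_ON)))
    print(quality(claim, DEPENDS_ON, believed))
```

The dependency set is the interesting output here; it lists the places where the statement could vary. The single score is only a convenience and hides the structure that matters.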

As miraculous as this sounds, domain experts do this all of the time. They can hear a statement and, based on their internal understandings, come first to an intuitive belief that the statement is unlikely to be true. It may require further digging to confirm it, but human beings are quite capable of seeing how something doesn't quite fit an overall pattern. We can match what we hear up against what we know. We're effective pattern matchers. And it's not just the experts.

For whatever reason we've been inundated the last few decades with a huge amount of scientific research. The newspapers and TV are forever announcing a vast array of new discoveries. Most of these discoveries are probably fine, but the media has a tendency to want to announce the more exotic or dramatic sounding results. Because of this, it is inevitable that some less scrupulous researchers will bend their results to say things entirely for the point of achieving notoriety. We expect this, and we see it often.

Sometimes scientific discoveries are counter-intuitive; that does happen. However, we see far too many 'surprising' results, many of which get huge media coverage but then just disappear into the shadows. Mostly these examples of 'bad science' are obvious if you spend some time contemplating them. We're able to see how unlikely they are, even if we're not experts. They just don't fit. There may be a few false negatives, but mostly if it's covered in the media, and it sounds off, it probably is.

Understanding our current knowledge is good, but our biggest problems come from where we are headed. Even if we know what we know, and we know that it is 'true enough', we still can't correctly analyse statements about what might possibly happen in the future. But it is these questions which cause us the most anxiety.

People are forever trying to change the world, and some of these changes are downright stupid and dangerous. To see this in advance, we'd have to build and examine sophisticated models of our circumstances, then apply the changes. Intrinsically these models would have some likelihood of being incorrect when probed or extended. Nothing short of a full simulation of our existing universe would ever be complex enough to be entirely synchronised with our world.

Still, given these limitations, the models only need to be accurate enough to give us an indication of the success or failure of the changes. And it is far easier to build such a reasonable model if we fully understand the variability underneath. The mechanics of the model match the underlying degrees of freedom. That is, we can only model what we understand, and we can only simulate it along the lines where we think it might change. But if we are working from a full structural view of the model's statement dependencies, then at least we know the fullest extent to which the model may vary. We know its domain.

Another thing we would get from these structures is a sense of the missing cards. There is always some intrinsic symmetry and pattern occurring because of the nature of our physical world. It may be ultra-complex like fractals, but we're still able to pick up on the patterns in nature where and when they occur. In examining the relationships, there will no doubt be places where the violation of a pattern or some lack of symmetry is noticeably obvious. Places that have been overlooked or not fully examined. Bringing these to the forefront will provide direction for researchers to investigate more thoroughly. We pretty much rely on inspiration for this now, but this would ensure that what we know gets more well-rounded, that the small ignored gaps get noticed and filled. These holes may seem insignificant but generally they propagate upwards and distort the upper layers. Ultimately this slows down progression until we're lucky enough to have some genius come along and re-align the basis correctly. And that just takes too much time.

We have a fantastical amount of knowledge, but we really don't understand it well. It is openly available and has inundated our lives, but hasn't really improved them. Given that we really can't trust a lot of what we hear, or what we read, something, anything, that would give us a sense of the underlying truthfulness would be a huge boon to our society. It would allow us to contain the explosion of information, and turn it into something useful. Dependable. It would allow us to examine the frailness of our understandings. With so much information, and so much of it wrong, all we are going to do is make our population more resistant to change. We're foolishly forcing ourselves to have to ignore what is going on around us. To ignore what we know. We're not utilizing our own collective intelligence.

Sunday, June 6, 2010

The Different Phases of Development

Software construction is always a long running project. Most software stays in active development for years, some for decades. As the underlying infrastructure is always changing, software must at the very minimum make changes to avoid rusting. Static software quickly becomes unsupportable as all of the underlying dependencies -- like the OS, the database and any libraries -- progress and drop support for the old versions. Generally, along with this dependency maintenance, the users are pushing for some new features or fixes to old or broken ones. As well, the technologies and user expectations shift over time. Thin clients, fancier GUIs, and richer interactions.

Thus, the normal life-span for software is as a long-running development project. In order to show progress and give the users something to use, the development usually takes place in a number of iterations, some lasting only a short time, some much longer. Older development methodologies preferred long running iterations, such as a year or more, while the newer ones can be as short as a month. Most experienced developers would agree that the length of the iteration is directly proportional to the risk of project failure. That is, longer running iterations have a greater probability of failure. They have more ways and opportunities to get off track, and they are much harder to schedule accurately.

But, like any trade-off, shorter iterations are more costly. Done correctly, they include both rounds of design and testing, which have intrinsic setup/take-down costs. Splitting the programming work in half, for instance, means the two halves together take more time than the whole would have. That relationship is always true for any type of effort where there is a built-in context switch. Multi-tasking always requires more individual effort. Iteration length is a risk vs. effort trade-off.
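One simple way to picture the overhead argument is to treat the setup/take-down cost as a fixed charge paid once per iteration. The sketch below assumes exactly that; the function name and the numbers (24 weeks of programming, 2 weeks of overhead per iteration) are invented for illustration, not measurements.

```python
def total_effort(work_weeks, iterations, overhead_per_iteration):
    """Total weeks spent when the same work is split across `iterations`,
    assuming each iteration pays a fixed design/testing/release overhead."""
    return work_weeks + iterations * overhead_per_iteration


if __name__ == "__main__":
    for n in (1, 2, 6):
        print(n, "iteration(s):", total_effort(24, n, 2), "weeks")
```

Under these assumptions one long iteration costs 26 weeks, two cost 28, and six cost 36. The extra weeks are the price paid for the lower risk of each shorter iteration.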

Within an iteration, we can break up the development into a number of different phases:

design/experimentation
initial development
mid-term blues
final integration
testing/release

Essentially these phases exist for any project, although sometimes they are skipped, trivialized, or combined together; most often in projects that are destined for failure.

DESIGN/EXPERIMENTATION

Anything non-trivial needs a design. Some large projects need it to be explicit and clearly defined or they will end up building unnecessary or unusable parts of the system. Smaller projects may get away with an informal verbal direction, particularly if the software is small, very simple or a well-known design.

A lack of design always results in a 'big ball of mud' architecture. These types of non-architectures have a very low threshold of complexity, which when exceeded, usually drives the project into a dangerous death march. Architecture, particularly a well-defined structure, is responsible for encapsulating the overall complexity of the system into different components. Without this type of structure, the complexity of the system will rapidly exceed the developer's ability to understand it. When that happens, any fixes or enhancements are equally likely to break something as they are to add in new features. The code base becomes unstable.

With a decent architecture, the system is decomposed into layers of nearly independent parts, in a way that allows the developers to only focus on one part or one layer at a time. While the architecture adds some complexity, it removes the necessity to understand the whole system all at once, in order to fix or modify it. It makes the code manageable.

To get to a coherent design, developers need to understand the strengths and weaknesses of their underlying technologies. A common problem is to believe the available marketing material description of the underlying functionality, only to find out in the middle of development that not all of the functions work as expected. Real experience with utilizing the technology for a specific purpose is the only way to avoid this problem, but as the technologies change rapidly most developers are using them in new ways. Small prototypes for core behaviors always save time and prevent embarrassing slip-ups.

Experiments, prototyping and design are the foundations for smooth development projects, but the most common industry practice is to do these poorly or bypass them altogether. Short cutting this phase frequently leads to disaster, although many developers prefer to place the blame on tight management schedules. However, a panic to get started is not the same as scope creep. Development projects ending in death marches are usually caused by poor planning, and that is often dominated by poor design.

While too little design is bad, too much can be a big waste of effort. Software is complex to assemble, so it makes sense to only fully assemble it once, in its most appropriate final form. Explicitly re-creating it fully in some design specification format is just a 'second way' of specifying the system. That level of detail is unnecessary. The design need only formulate the lines of encapsulation, the key behaviors (if not obvious) and the division of responsibilities between teams/developers. Beyond that, any extra effort is wasted effort. Projects frequently explode because they've run way over the time expectations, so efficient utilization of time is critical.

INITIAL DEVELOPMENT

The best part of programming is the initial part. Generally progress is fast, the code is light and the work is fun. It can be very exhilarating to show off the new behaviors of the code. Praise is constant, everyone is excited and happy.

However, this is a delusion. One that doesn't last, as the development gets into the other phases. Still many developers think this is the way the entire project should go, so they often run into morale problems in the later, harder parts of development.

Besides the unrealistic expectations of the developers, another problem comes from hacking the code too fast. Often, in this initial phase, the excitement leads the coders to cut little corners here and there. A few small cuts can lead to being able to keep the momentum of the development effort. Too many small cuts lead to nasty technical debt problems. Work that is 'mostly' correct, but just needs a little refactoring, is really just an excuse for building up debt. Too much debt, especially in the later stages, impacts the time and morale, and is frequently another way into starting a death march. Even the best architecture cannot save a project with mediocre code.

MID-TERM BLUES

At some point, in every project, you are no longer at the beginning, but you're still not seeing the end. With increasing pressure, mounting technical debt, and the usual scope creep, most developers go into a sort of depression state. Morale falls, the code quality degenerates, and many developers consider abandoning ship.

This is the point where tempers flare, and rebellions spring out of every corner. Even in the best run, best organized projects, there is always some doubt as to the direction or possible success of the effort. The design is criticised, the standards are abandoned, and many programmers head off in their own unique direction, thinking that only they can put the work back on track.

Left unchecked, this is another common place where failure sets in. Depression, bad morale and low quality code all risk derailing the effort.

The best thing to do is focus the effort on cleaning up the small bugs, refactoring the code to 'standards' and working on other necessary, but 'trivial' cleanup tasks. Calling a 'code freeze' and forcing everyone to close off all of their open development also works. This forces the project into the next stage early (and thus loses some functionality), but it keeps it from becoming a full death march.

Too many open development tasks lead to too many potential bugs, which as they mount become increasingly costly to fix. The work generated by inter-dependent bugs increases exponentially. These types of exponential work explosions are impossible to accurately predict with scheduling. If there are too many changes getting made at the same time, a clash between them eventually becomes inevitable. A project with an unknown or non-trivial bug list has a large potential debt pending. Sorting through this problem is more important than adding 'other' features, particularly if the key ones were added first.
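A back-of-the-envelope way to see why clashes become nearly certain is to count pairs of open changes. The sketch below is a simplification made up for this illustration: it assumes every pair of concurrent changes clashes independently with the same small probability, which real projects certainly do not guarantee.

```python
from math import comb


def clash_probability(open_changes, pair_clash_chance):
    """Chance that at least one pair of open changes clashes, assuming each
    pair clashes independently with probability `pair_clash_chance`."""
    pairs = comb(open_changes, 2)  # number of distinct pairs of changes
    return 1 - (1 - pair_clash_chance) ** pairs


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, "open changes ->", round(clash_probability(n, 0.05), 3))
```

The number of pairs grows much faster than the number of changes (20 open changes already give 190 pairs), so even a small per-pair chance adds up quickly. Closing off open work shrinks the number of pairs, which is one argument for the code freeze described above.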

Generally, for any iteration, everyone's expectation of work that will be completed is over-blown. That always means that some tough choices have to be made towards the end of this phase about which features are in, and which ones have to wait for the next iteration. It's always a hard choice, but not making it, or making it too late usually has very serious ramifications. Real experience in handling these types of trade-offs is invaluable in preventing failure. The choices need to be made by someone who really understands all of the consequences.

FINAL INTEGRATION

A great design, and good quality code will get you far, but it all has to come together, get packaged and work properly to be considered a success. There are always a number of small problems that creep up in the end, generally caused by design issues, communication problems or rebellious coders. Inevitably they have to be worked through.

Integration is a complete freeze on the addition of any new functionality. The only changes allowed are fixes to the existing code. Any significant changes need to be discussed first. Cascading bugs should be avoided, or carefully tracked.

In this ending stage, some developers strongly believe that each piece of code should be as independent from the others as possible. This line of thought leads to using brute force to pound out explicit code for each new feature or function in the system. While this does reduce the likelihood that a change to one part of the code will cause a problem in some other part, this is more than offset by the inconsistencies of having redundant code. Good architecture and encapsulation are the correct solutions for containing the impact from changes, not spending unnecessary effort on duplicated logic. Redundancies also mean more testing is necessary, and extending the code is way harder. We've long established that redundant code is bad, but it is still one of the most common industry practices.
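A small Python sketch of the alternative to duplication follows. The user-registration scenario and all of the names are hypothetical, chosen only to show one rule living in one place. In the brute-force style, the strip-and-lowercase logic would be pasted into both callers, and a late fix to one copy would leave the other inconsistent.

```python
def normalize_email(raw):
    """The single home for the rule; a fix here reaches every caller."""
    return raw.strip().lower()


def register_user(users, raw_email):
    """Add a user, relying on the shared rule rather than a private copy."""
    email = normalize_email(raw_email)
    if email in users:
        raise ValueError(f"already registered: {email}")
    users.add(email)
    return email


def find_user(users, raw_email):
    """Look a user up using exactly the same rule as registration."""
    return normalize_email(raw_email) in users


if __name__ == "__main__":
    users = set()
    register_user(users, "  Bob@Example.com ")
    print(find_user(users, "BOB@example.com"))
```

There is also only one function to test for the rule, rather than one per copy.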

Issues such as documentation, tutorials, packaging, and automation are often ignored until too late. Most developers are so focused on the core code that they forget about all of the other efforts that are required to get the project released. In really complex multi-lingual, commercial software, the non-code development work, such as installation scripts, database upgrade scripts, language translations, graphic design, documentation updates and feature tutorials can require an army of trained specialists. It takes a serious amount of work.

At the very least, commercial grade work needs to get packaged appropriately, a task which always requires a considerable amount of time, generally months. Even in-house projects can significantly benefit from being well-packaged or mostly automated. Any manual tasks or config fiddling opens up the possibility of problems and thus unexpected support costs. A bad or sloppy release can seriously cut into the next iteration, setting the stage for a future failure. The project could be a success, while the release is a total (and expensive) failure.

If a project has accrued a significant technical debt (known or unknown) and is headed down the path of a death march, the march usually starts here as all of the developers are integrating their work. A significant death march is like stuffing straw into a burlap bag full of holes; as you stuff it into one side, it falls out the others. Maybe with some time, and luck, you'll get all of the straw into the bag, but it certainly is unstable unless you've taken the time to repair the bag. Most death marches are too far gone to bother with trying to repair the root causes. They've reached this point through a series of bad decisions, and nothing but enough time will get them past this point.

TESTING/RELEASE

Contrary to popular techie belief, the modern expectation for software quality is that it 'mostly works'. Decades of bad or disappointing releases have really lowered the bar for quality. Users skip over more bugs than they realize; they've become efficient at routing around the failures. Most developers believe that nothing short of perfect is acceptable, which generally sets them up for failure when they are unprepared to handle the inevitable problems.

Bugs are not just algorithmic coding problems, or junk on the screen, they are any behavior that is unexpected by a normal user. Any problem that requires significant support. And there are always a few with any release, you just can't avoid it.

In some cases the code might be technically correct, but the interface is convoluted and confusing to users. Or the functionality just doesn't make sense. Or some obviously expected part is missing, such as the ability to delete newly added data. In whatever case, it can be hard to find these issues in testing (the testers are not average users), and even if they are noticed there may not be time to rectify the code before release.

Choosing to release a system is not as simple as just waiting until everything is fixed and in working order. Known issues need to be evaluated for their true costs, and set into priority. Once there is nothing 'serious enough', the software gets shipped. That's a far uglier reality than most developers want, but success in software is really about getting tools out to the users, not about crafting an elegant loop, or the perfect data structure. Sometimes mostly working is 'good enough'; there are always later iterations to fix the issues.
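The release decision described above can be sketched as a simple triage. This is only one possible reading: the cost formula (users affected times support hours per user), the threshold, and the sample issues are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    users_affected: int
    support_hours_each: float

    @property
    def cost(self):
        """Hypothetical 'true cost': expected support hours if shipped as-is."""
        return self.users_affected * self.support_hours_each


def triage(issues, serious_cost):
    """Rank known issues by cost and pick out the ones that block a release."""
    ranked = sorted(issues, key=lambda issue: issue.cost, reverse=True)
    blockers = [issue for issue in ranked if issue.cost >= serious_cost]
    return ranked, blockers


if __name__ == "__main__":
    known = [
        Issue("typo on settings page", 500, 0.0),
        Issue("cannot delete newly added data", 200, 0.5),
        Issue("crash on rare import format", 3, 2.0),
    ]
    ranked, blockers = triage(known, serious_cost=50)
    print([issue.title for issue in ranked])
    print("ship" if not blockers else "fix first: " + blockers[0].title)
```

With these made-up numbers the missing delete feature blocks the release, while the rare crash, although uglier to a developer, does not.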

Choosing not to release a system, and instead, opening up some emergency development work is mostly a sign of a death march. A failure to initially get out of final integration properly. If a system has been punted at least once before, it is probably best to go back and identify the really serious problems, and choose to fix them (at whatever time cost). Getting back to the straw bag analogy, focusing on the straw is the main problem. Until you get deeper and decide to fix the bag, the likelihood of making a lasting solution is low.

AND FINALLY

Go have a beer. Celebrate! Particularly if the system is slick and easy to use; it is a rare event. Of course, it is best to be honest about what worked, and what didn't. Software developers have a horrible ability to completely delude themselves as to the final quality of their code, and the real success of their project. Or to place the blame on anything other than their personal effort (and rebellion). Software development appears simple, but it is really complex. It can take decades to fully understand all of the trade-offs, choices and right decisions that are necessary to really produce good stuff. Lack of understanding and experience are, no doubt, significant causes of our industry's poor quality and excessively high failure rate. It is way too easy to write code, but extremely hard to actually develop usable software. It's the difference between being able to build a shed in your backyard, and being able to build a skyscraper.