The Programmer's Paradox: The Different Phases of Development

Software construction is always a long-running project. Most software stays in active development for years, some for decades. Because the underlying infrastructure is always changing, software must at the very minimum keep changing to avoid rusting. Static software quickly becomes unsupportable as all of the underlying dependencies -- like the OS, the database and any libraries -- progress and drop support for the old versions. Generally, along with this dependency maintenance, the users are pushing for some new features or fixes to old or broken ones. As well, the technologies and user expectations shift over time: thin clients, fancier GUIs, and more interactive interfaces.

Thus, the normal life-span for software is as a long-running development project. In order to show progress and give the users something to use, the development usually takes place in a number of iterations, some lasting only a short time, some much longer. Older development methodologies preferred long-running iterations, such as a year or more, while the newer ones can be as short as a month. Most experienced developers would agree that the risk of project failure rises with the length of the iteration. That is, longer-running iterations have a greater probability of failure. They have more ways and opportunities to get off track, and they are much harder to schedule accurately.

But, like any trade-off, shorter iterations are more costly. Done correctly, each one includes its own round of design and testing, which have intrinsic setup/take-down costs. Splitting the programming work in half, for instance, means each half takes more than half the time, so the total grows. That relationship holds for any type of effort where there is a built-in context switch. Multi-tasking always requires more individual effort. Iteration length is a risk vs. effort trade-off.
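
As a rough illustration of that trade-off, the sketch below models total effort as the programming work plus a fixed design/testing overhead that is paid again for every iteration. The function name and all of the numbers are hypothetical, picked only to show the shape of the relationship; real overheads are rarely this tidy.

```python
def total_effort(work_days: float, iterations: int, overhead_days: float) -> float:
    """Total effort when `work_days` of programming is split across
    `iterations`, each paying a fixed setup/take-down cost.

    Every input here is an illustrative assumption, not a measurement.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # The programming work is constant; only the per-iteration overhead repeats.
    return work_days + iterations * overhead_days


if __name__ == "__main__":
    # Hypothetical project: 120 days of programming, 10 days of overhead per iteration.
    for count in (1, 2, 4, 12):
        print(f"{count:2d} iteration(s): {total_effort(120, count, 10)} days")
```

The model deliberately ignores the other side of the trade-off: the risk that a long iteration drifts off track. It only shows why shorter iterations are never free.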

Within an iteration, we can break up the development into a number of different phases:

design/experimentation
initial development
mid-term blues
final integration
testing/release

Essentially these phases exist for any project, although sometimes they are skipped, trivialized, or combined together, most often in projects that are destined for failure.

DESIGN/EXPERIMENTATION

Anything non-trivial needs a design. Some large projects need it to be explicit and clearly defined or they will end up building unnecessary or unusable parts of the system. Smaller projects may get away with an informal verbal direction, particularly if the software is small, very simple or a well-known design.

A lack of design always results in a 'big ball of mud' architecture. These types of non-architectures have a very low threshold of complexity, which, when exceeded, usually drives the project into a dangerous death march. Architecture, particularly a well-defined structure, is responsible for encapsulating the overall complexity of the system into different components. Without this type of structure, the complexity of the system will rapidly exceed the developers' ability to understand it. When that happens, any fixes or enhancements are as likely to break something as they are to add new features. The code base becomes unstable.

With a decent architecture, the system is decomposed into layers of nearly independent parts, in a way that allows the developers to focus on only one part or one layer at a time. While the architecture adds some complexity, it removes the necessity to understand the whole system all at once in order to fix or modify it. It makes the code manageable.

To get to a coherent design, developers need to understand the strengths and weaknesses of their underlying technologies. A common problem is to believe the marketing material's description of the underlying functionality, only to find out in the middle of development that not all of the functions work as expected. Real experience with utilizing the technology for a specific purpose is the only way to avoid this problem, but as the technologies change rapidly, most developers are using them in new ways. Small prototypes for core behaviors always save time and prevent embarrassing slip-ups.

Experiments, prototyping and design are the foundations for smooth development projects, but the most common industry practice is to do these poorly or bypass them altogether. Shortcutting this phase frequently leads to disaster, although many developers prefer to place the blame on tight management schedules. However, panic to get started is not the same as scope creep. Development projects that end in death marches are usually caused by poor planning, and that is often dominated by poor design.

While too little design is bad, too much can be a big waste of effort. Software is complex to assemble, so it makes sense to only fully assemble it once, in its most appropriate final form. Explicitly re-creating it fully in some design specification format is just a 'second way' of specifying the system. That level of detail is unnecessary. The design need only formulate the lines of encapsulation, the key behaviors (if not obvious) and the division of responsibilities between teams/developers. Beyond that, any extra effort is wasted effort. Projects frequently explode because they've run way over the time expectations, so efficient utilization of time is critical.

INITIAL DEVELOPMENT

The best part of programming is the initial part. Generally progress is fast, the code is light and the work is fun. It can be very exhilarating to show off the new behaviors of the code. Praise is constant, everyone is excited and happy.

However, this is a delusion, one that doesn't last as the development gets into the other phases. Still, many developers think this is the way the entire project should go, so they often run into morale problems in the later, harder parts of development.

Besides the unrealistic expectations of the developers, another problem comes from hacking the code too fast. Often, in this initial phase, the excitement leads the coders to cut little corners here and there. A few small cuts can help keep the momentum of the development effort. Too many small cuts lead to nasty technical debt problems. Work that is 'mostly' correct, but just needs a little refactoring, is really just an excuse for building up debt. Too much debt, especially in the later stages, impacts the time and morale, and is frequently another way into a death march. Even the best architecture cannot save a project with mediocre code.

MID-TERM BLUES

At some point, in every project, you are no longer at the beginning, but you're still not seeing the end. With increasing pressure, mounting technical debt, and the usual scope creep, most developers go into a sort of depression state. Morale falls, the code quality degenerates, and many developers consider abandoning ship.

This is the point where tempers flare, and rebellions spring out of every corner. Even in the best run, best organized projects, there is always some doubt as to the direction or possible success of the effort. The design is criticised, the standards are abandoned, and many programmers head off in their own unique direction, thinking that only they can put the work back on track.

Left unchecked, this is another common place where failure sets in. Depression, bad morale and low quality code all risk derailing the effort.

The best thing to do is focus the effort on cleaning up the small bugs, refactoring the code to 'standards' and working on other necessary, but 'trivial' cleanup tasks. Calling a 'code freeze' and forcing everyone to close off all of their open development also works. This forces the project into the next stage early (and thus loses some functionality), but it keeps it from becoming a full death march.

Too many open development tasks lead to too many potential bugs, which, as they mount, become increasingly costly to fix. The work generated by inter-dependent bugs increases exponentially. These types of exponential work explosions are impossible to accurately predict with scheduling. If there are too many changes getting made at the same time, a clash between them eventually becomes inevitable. A project with an unknown or non-trivial bug list has a large potential debt pending. Sorting through this problem is more important than adding 'other' features, particularly if the key ones were added first.
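
To make the claim about concurrent changes concrete, here is a minimal sketch. It assumes, purely for illustration, that every pair of open tasks has the same small and independent chance of clashing; the function name and the 5% figure are hypothetical. It only covers how quickly a clash becomes near-certain, and does not attempt to model how much work each clash generates.

```python
def clash_probability(open_tasks: int, pair_clash_chance: float) -> float:
    """Chance that at least one pair of concurrently open tasks clashes.

    Simplifying assumption: each pair clashes independently with the
    same probability `pair_clash_chance`.
    """
    if open_tasks < 0:
        raise ValueError("open_tasks cannot be negative")
    if not 0.0 <= pair_clash_chance <= 1.0:
        raise ValueError("pair_clash_chance must be between 0 and 1")
    # Number of distinct pairs among the open tasks.
    pairs = open_tasks * (open_tasks - 1) // 2
    # Complement of "no pair clashes at all".
    return 1.0 - (1.0 - pair_clash_chance) ** pairs


if __name__ == "__main__":
    # Hypothetical 5% chance that any two open tasks interfere.
    for tasks in (2, 5, 10, 20):
        print(f"{tasks:2d} open tasks: {clash_probability(tasks, 0.05):.3f}")
```

Even under these generous assumptions, the number of pairs grows much faster than the number of tasks, which is one way to see why closing off open work brings the risk down so quickly.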

Generally, for any iteration, everyone's expectation of the work that will be completed is overblown. That always means that some tough choices have to be made towards the end of this phase about which features are in, and which ones have to wait for the next iteration. It's always a hard choice, but not making it, or making it too late, usually has very serious ramifications. Real experience in handling these types of trade-offs is invaluable in preventing failure. The choices need to be made by someone who really understands all of the consequences.

FINAL INTEGRATION

A great design and good quality code will get you far, but it all has to come together, get packaged and work properly to be considered a success. There are always a number of small problems that creep up in the end, generally caused by design issues, communication problems or rebellious coders. Inevitably they have to be worked through.

Integration is a complete freeze on the addition of any new functionality. The only changes allowed are fixes to the existing code. Any significant changes need to be discussed first. Cascading bugs should be avoided, or carefully tracked.

In this ending stage, some developers strongly believe that each line of code should be as independent of every other line as possible. This line of thought leads to using brute force to pound out explicit code for each new feature or function in the system. While this does reduce the likelihood that a change to one part of the code will cause a problem in some other part, this is more than offset by the inconsistencies of having redundant code. Good architecture and encapsulation are the correct solutions for containing the impact from changes, not spending unnecessary effort on duplicated logic. Redundancies also mean more testing is necessary, and extending the code is far harder. We've long established that redundant code is bad, but it is still one of the most common industry practices.
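
A small, hypothetical example of the difference: the first two functions each carry their own copy of an email rule, and the copies have already drifted apart; the second pair shares one encapsulated helper. All names here are invented for illustration only.

```python
# Redundant style: each feature pounds out its own copy of the same rule.
def register_user_redundant(email: str) -> str:
    if "@" not in email or email.startswith("@"):
        raise ValueError("invalid email")
    return email.strip().lower()


def invite_user_redundant(email: str) -> str:
    # A second copy of the rule that has drifted: no strip(), weaker check.
    if "@" not in email:
        raise ValueError("invalid email")
    return email.lower()


# Encapsulated style: one definition of the rule, reused by every feature.
def normalize_email(email: str) -> str:
    cleaned = email.strip().lower()
    local, separator, domain = cleaned.partition("@")
    if not separator or not local or not domain:
        raise ValueError(f"invalid email: {email!r}")
    return cleaned


def register_user(email: str) -> str:
    return normalize_email(email)


def invite_user(email: str) -> str:
    return normalize_email(email)
```

With the shared helper, a fix to the rule lands everywhere at once and needs to be tested once. With the copies, the same input already produces different results depending on which feature handles it.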

Issues such as documentation, tutorials, packaging, and automation are often ignored until too late. Most developers are so focused on the core code that they forget about all of the other efforts that are required to get the project released. In really complex multi-lingual, commercial software, the non-code development work, such as installation scripts, database upgrade scripts, language translations, graphic design, documentation updates and feature tutorials, can require an army of trained specialists. It takes a serious amount of work.

At the very least, commercial grade work needs to get packaged appropriately, a task which always requires a considerable amount of time, generally months. Even in-house projects can significantly benefit from being well-packaged or mostly automated. Any manual tasks or config fiddling opens up the possibility of problems and thus unexpected support costs. A bad or sloppy release can seriously cut into the next iteration, setting the stage for a future failure. The project could be a success while the release is a total (and expensive) failure.

If a project has accrued significant technical debt (known or unknown) and is headed down the path of a death march, the march usually starts here, as all of the developers are integrating their work. A significant death march is like stuffing straw into a burlap bag full of holes; as you stuff it into one side, it falls out the others. Maybe with some time, and luck, you'll get all of the straw into the bag, but it certainly is unstable unless you've taken the time to repair the bag. Most death marches are too far gone to bother with trying to repair the root causes. They've reached this point through a series of bad decisions, and nothing but enough time will get them past this point.

TESTING/RELEASE

Contrary to popular techie belief, the modern expectation for software quality is that it 'mostly works'. Decades of bad or disappointing releases have really lowered the bar for quality. Users skip over more bugs than they realize; they've become efficient at routing around the failures. Most developers believe that nothing short of perfect is acceptable, which generally sets them up for failure when they are unprepared to handle the inevitable problems.

Bugs are not just algorithmic coding problems, or junk on the screen; they are any behavior that is unexpected by a normal user, any problem that requires significant support. And there are always a few with any release. You just can't avoid it.

In some cases the code might be technically correct, but the interface is convoluted and confusing to users. Or the functionality just doesn't make sense. Or some obviously expected part is missing, such as the ability to delete newly added data. In whatever case, it can be hard to find these issues in testing (the testers are not average users), and even if they are noticed there may not be time to rectify the code before release.

Choosing to release a system is not as simple as just waiting until everything is fixed and in working order. Known issues need to be evaluated for their true costs, and set into priority. Once there is nothing 'serious enough', the software gets shipped. That's a far uglier reality than most developers want, but success in software is really about getting tools out to the users, not about crafting an elegant loop, or the perfect data structure. Sometimes mostly working is 'good enough'; there are always later iterations to fix the issues.
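
The release decision described above can be sketched as a simple triage. Everything in this example is a hypothetical assumption: the `Issue` fields, the cost formula (affected users times support hours each), and the threshold for 'serious enough'. A real team would weigh costs with far more judgment than one multiplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    title: str
    users_affected: int            # estimated users who will hit the issue
    support_hours_per_user: float  # estimated support time for each of them

    @property
    def cost(self) -> float:
        # Illustrative 'true cost': total expected support hours.
        return self.users_affected * self.support_hours_per_user


def prioritize(issues: list[Issue]) -> list[Issue]:
    """Order known issues from most to least costly."""
    return sorted(issues, key=lambda issue: issue.cost, reverse=True)


def ready_to_ship(issues: list[Issue], serious_cost: float) -> bool:
    """Ship once no known issue reaches the 'serious enough' threshold."""
    return all(issue.cost < serious_cost for issue in issues)


if __name__ == "__main__":
    known = [
        Issue("typo in help text", users_affected=50, support_hours_per_user=0.1),
        Issue("cannot delete new records", users_affected=400, support_hours_per_user=0.5),
    ]
    for issue in prioritize(known):
        print(f"{issue.cost:7.1f}  {issue.title}")
    print("ship:", ready_to_ship(known, serious_cost=100.0))
```

The point of the sketch is the shape of the decision: the list of known issues never has to be empty, it only has to contain nothing above the bar.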

Choosing not to release a system, and instead opening up some emergency development work, is mostly a sign of a death march: a failure to get out of final integration properly in the first place. If a system has been punted at least once before, it is probably best to go back and identify the really serious problems, and choose to fix them (at whatever time cost). Getting back to the straw bag analogy, focusing on the straw is the main problem. Until you get deeper and decide to fix the bag, the likelihood of making a lasting solution is low.

AND FINALLY

Go have a beer. Celebrate! Particularly if the system is slick and easy to use; it is a rare event. Of course, it is best to be honest about what worked, and what didn't. Software developers have a horrible ability to completely delude themselves as to the final quality of their code, and the real success of their project. Or to place the blame on anything other than their personal effort (and rebellion). Software development appears simple, but it is really complex. It can take decades to fully understand all of the trade-offs, choices and right decisions that are necessary to really produce good stuff. Lack of understanding and experience are, no doubt, significant causes of our industry's poor quality and excessively high failure rate. It is way too easy to write code, but extremely hard to actually develop usable software. It's the difference between being able to build a shed in your backyard, and being able to build a skyscraper.

Sunday, June 6, 2010
