Monday, March 29, 2010

The Edit Loop

A very common programming and architectural problem comes from what I like to call the 'edit loop'.

It is a software construct that shows up whenever the user's data has to be pulled out of some form of long-term persistent storage and then placed into a user interface for display or editing. The second half of the loop comes from taking any changes made by the user and getting them back into storage.

Mostly these days the interfaces are GUIs, and the storage is a relational database, but all of the same problems apply even if the interface is a command line or text based, and if the storage is some other type of persistent storage such as a key/value data-store. For this discussion we'll just assume the interface is a standard GUI with panels and widgets, and the database is relational.

These loops are generally triggered by some event mechanism, which is started by a user navigating to a specific screen-full of data in their application. The loops happen often, and are the main form of processing for the application. They usually constitute the bulk of the functionality and of the code.

Millions of programmers have been coding these types of loops for decades. Virtually every system has some type of interface, and in all of these there are often many types of edit loop.

The number of times this code has been written, rewritten, hacked or refactored is staggering. It generally accounts for at least 80% of the work of most application development. It is where most programmers spend most of their time.

This post will start with the standard practice for building such loops in large systems, and then examine some of the common problems with this approach. It will then get into ideas about how to reduce the work and redundancies. Finally it will deal with more complex architectures.


STANDARD DESIGN

The easiest approach to building an edit loop is to start with the database.

Since the major strength of a relational database is to allow several different applications (such as reporting and mining) to share the same underlying data, the first thing that should be done is to create a fourth normal form schema in a 'universal' representation usable by all of the applications.

Some applications have more stringent performance requirements, so a few of the tables may have to be de-normalized as required.

From the database, the programmers need to get the data into their running code. Since most applications have slightly different requirements than the universal schema, there is usually some finessing of the data to put it into an application-specific model.

The current popular programming paradigm is object-oriented, which generally relates the tables in the database to specific objects in the application.

There could be a one-to-one correspondence between the tables and the objects, but it is more likely that the programmers will source the same underlying tables within many different objects in the system. There is usually a lot of repetition. This collection of objects in the application is often loosely referred to as the "model", although it is rarely so precise.

Once the programmers have created a large number of different models of the data, most system architectures are layered into distinct pieces, either as a client/server architecture or just as some lower data level with an interface level sitting on top of it.

In either case, the data constructed in the modeling part of the code needs to get transported to the interface part of the code.

The predominant convention for this type of 'transportation' code is to make it strongly typed. Each variable loaded from the database has a consistent data type that stays with it as it travels throughout the system. If the field is an integer in the database, it is read as an integer, passed through the system as an integer and edited as an integer. The type stays consistent.

Beyond the transportation code lies the user interface. In most systems this is the largest chunk of code. Although the convention is to not decorate the data with 'presentation', normally the different models in the back-end have been specifically created based on different presentation needs, so some of that presentation information is already implicitly encoded into the data.

This mostly occurs because the panel/widget code is often thick, ugly and confusing, so the programmers push the different views of the data back down to the database layer where it won't get mixed into the GUI code. What starts as good intentions quickly gets obfuscated.


THE REVERSE TRIP

So far, the data has worked its way up from the database, through to the interface. It has been strongly typed right out of the database and throughout its journey.

Once into the interface, if it is only a half loop, the data is just further spruced up for presentation, annotated (with things like links) and then dumped to the screen.

If the data is involved in a full loop, it is usually displayed on the screen in a series of widgets that allow for it to be modified. Once the user's editing work is deemed complete, the data is then heavily validated.

Usually this means some set of checks that is more stringent than just its simple data-type. For example, if the field is an integer, perhaps the only valid values for it are between 1 and 4. It is highly restrictive.

Many fields are also cross-checked with each other, to make sure that the whole set of data fields is consistent with some set of external rules.

If this process fails, the problem is pushed back to the user, to try again. And again if required. Until the data is finally validated enough to start the journey back to the database.

On the reverse trip the data is also commonly strongly typed. It goes from the interface code through some transportation back to the database code. Many systems use entirely different sections of code and different transportation mechanisms for the second half of the loop. It is frequently seen as a different problem from fetching the data.

Now, most schemas have some serious validation checks encoded into the database as well. It is an essential part of a well-designed schema. If the application requires a relational database, it should use that database properly and to its fullest extent.

The incoming data must be consistent in order to get added or updated in the database. However, most database code assumes that the incoming data is fine, and then blindly tries to save it.

A common problem is basing user edits on stale data -- data that has changed since it was retrieved -- but this one is rarely solved in most systems, usually just ignored.
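
One common remedy is optimistic locking. The sketch below, in Java with JDBC, assumes a hypothetical 'customer' table that has been given an extra integer 'version' column; the update only succeeds if nobody else has bumped the version since the data was read:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical optimistic-locking update; the table and column names are
    // illustrative only, not from any particular schema.
    final class StaleDataGuard {
        // The caller passes back the version it originally read for display.
        static void saveCustomerName(Connection conn, long id, String name,
                                     int versionRead) throws SQLException {
            String sql = "UPDATE customer SET name = ?, version = version + 1 "
                       + "WHERE id = ? AND version = ?";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, name);
                stmt.setLong(2, id);
                stmt.setInt(3, versionRead);
                if (stmt.executeUpdate() == 0) {
                    // No row matched: someone else saved first, so this edit is stale.
                    throw new SQLException("record changed since it was read; reload and retry");
                }
            }
        }
    }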

In the database code, the data is jammed into the database API, and then added or updated as needed. If this fails, the code will generally just chuck an error straight up towards the user. If it succeeds then a success message is sent instead.

Sometimes you'll find a second layer of validation checks, just ahead of the database. Usually this is to allow for a slightly different but more specific message than would have been issued by the database in the error case.


INTRINSIC WEAKNESSES

While this type of code construction is very common, it makes a trade-off between just getting the code written quickly and working smartly as a software programmer.

Simply put, it is more often a result of the programmers needing to get something working now for a deadline, than it is a result of the programmers sitting around spending lots of time thinking about the ultimate way to build their systems. It is a by-product of the pressures to get something built fast.

In these types of systems, there are essentially three main sections: a) the database code, b) the transportation code, and c) the interface code.

Starting backwards, the interface code is the mechanism to allow the users to use some functionality within a specific context. That is, the user really wants to do something to the data in the system; their choices are to edit it, or to view it in some fashion (that may take significant computing power). Because of this, we can view the interface code as just remembering the user's context, and as an entry-point to launch some functionality for the user.

In most big systems, there are literally thousands of different individual 'user-functions' that can be accessed. These can be large like editing data in a form, or small like re-sorting some data in a table. Each and every action by the user executes some user-function of some type.

This is usually the most code in the system, and the worst code in the system.

Generally, most programmers short-cut the object-orientation of their user interfaces so that the code becomes very big and bulky. There is a tendency to create larger objects that contain more underlying screen element construction code, like setting up panels or widgets. Full object-orientation is very rare, particularly in the more modern programming languages like Java and C#.

Most of this interface code is very specific to individual screens in the system, and each screen's code is used in only that one place. There are a huge number of redundancies.

The transportation code moves the data back and forth between the interface and the database. Although the transportation code is simple, because of an insistence on strong typing it is not uncommon to find systems out there with lots and lots of very specific transportation code.

Usually it comes across as just glue code, binding one interface to another, but with a large number of specific variable copies. Sometimes you'll find a number of different data transformations as well, where the programmers have gone to extra work to fiddle with the data in some way, in the middle of transport.

At the other end of the spectrum, sometimes the data is just spread right out into a full, but highly redundant, representation that could easily have been compressed, the programmers thinking it was too much effort to pack the data a little tighter. Simple tricks, like pulling out any common absolute partial strings and leaving only relative ones, can help cut down on data duplication, but are rarely used.

Mostly, the bulk of the transportation work goes into setting it up on one side, and then re-sorting it into a different shape on the other. Different schools of philosophy and coding style have very different preferences in how to build this part. Many prefer brute force.

The final section of code is the database code. In a language like Java, much of it is direct SQL run through APIs like JDBC. In C#, the programmers can save a bit of the ugly syntax with LINQ. Most software development environments allow for some type of direct SQL access to the database.
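
A typical fragment of that kind of database code, sketched here in JDBC against a hypothetical 'customer' table and a hypothetical screen-specific model class, looks roughly like the following; most systems end up with hundreds of near-copies of it, one per screen or query:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // A typical per-screen fetch; the table, the columns and the little
    // model class are all hypothetical. Each screen tends to grow its own
    // variant of this code.
    final class CustomerListQuery {
        static final class Row {            // strongly typed, screen-specific model
            long id;
            String name;
            int creditLimit;
        }

        static List<Row> loadForEditScreen(Connection conn) throws SQLException {
            String sql = "SELECT id, name, credit_limit FROM customer ORDER BY name";
            List<Row> results = new ArrayList<Row>();
            try (PreparedStatement stmt = conn.prepareStatement(sql);
                 ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    Row r = new Row();
                    r.id = rs.getLong("id");
                    r.name = rs.getString("name");
                    r.creditLimit = rs.getInt("credit_limit");
                    results.add(r);
                }
            }
            return results;
        }
    }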

Later implementations of alternative paradigms like object-relational-mapping (OR/M) try to put an abstracted layer over this code to make it less redundant and more consistent.

Still, one key problem is that the data representation in a relational database is both universal and less expressive than the application representation. As such, there are a lot of things that the code can do, that are not easy, convenient or even possible with the database.

The difference in expressibility leads to a very common software development mistake, where some developers try to build the system backwards, starting from the interface, and doing the database layer last.

This top-down approach is easier to code, but it is more likely that there will be issues when trying to force the application perspective into a database than if they had started with a bottom-up approach.

Also, the top-down approach leaves the data in a highly de-normalized state which is application specific. If the data is only usable in its persistent form by one application then it is a technological waste to store it into a complex container like a relational database, when there are far simpler and more expressive ways to store the data. Still the convention is to not think that way.


CODE BLOAT

These three sections, in most systems, get quickly filled with lots and lots of redundant code. Most programming efforts expect and allow this. That is, they set up the architecture, add in some common libraries, and they start letting these three sections grow large and fat.

There is usually more re-use initially, but as the work progresses and the teams change slightly, more and more programmers tend to avoid the old code, and just write new stuff. Bloat is normal.

What inevitably happens is that the code gets more and more redundant. Each new user-function gets its own specific interface, transport and database code. Most times, the teams are conscious enough to not duplicate data in the actual schema, but even then you can often find the same data stored in multiple different tables.

Redundancies become commonplace. One clear way to see this is by breaking the code up into two. Given two distinct sets of screens, how easily can you split the system into two separate programs? How much code will they share as common libraries?

If this decomposition is easy and includes the bulk of the code, then the screens are 'independent' from each other.

In a non-independent system, most of the code in the system is shared between the different screens. Each screen has very little unique code.

In an independent system, you could keep breaking the system into two new pieces, recursively, until all of the screens exist in their own programs. If it wasn't for context, you could also break down all of the user-functions into their own programs. A thousand little programs would work just the same as one big one (although navigation from program to program would be tricky).

Independence means that there is more and more specific code that has a 1-to-1 correspondence with some user-function. The code is only executed by a very specific entry-point into the system. As that code is replicated throughout the system, it becomes worse and worse.

We can define a metric to count how well the code is being re-used. We'll call it 'leverage'. The leverage value L, for a line of code in the system, is the number of user-functions which require it. If the code is used only by one specific piece of functionality, then L=1. If it is shared in two places, then L=2. If it is never used then L=0. If there are N user-functions that need it, then L=N. It is a straight-forward count.

We can combine the different L values for the various code blocks to get one overall value for the system. Independent systems have a lower average L value. The more independent the code, the more it is redundantly doing the same work over and over again. The more likely that changes will cause new bugs.
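
As a made-up illustration: suppose a system has 50 user-functions, a 2,000 line transport library used by all of them, and 50 screens of 1,000 specific lines each. The transport lines each have L=50, the 50,000 screen lines each have L=1, so the average is (2,000 x 50 + 50,000 x 1) / 52,000, or roughly 2.9. If a shared form abstraction of, say, 3,000 lines (also at L=50) could replace all but 5,000 of those screen-specific lines, the same functionality fits in 10,000 lines with an average leverage above 25.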

Dense systems have a higher L value. If the value closely matches the amount of functionality in the system, then the bulk of the code is being greatly re-used, and even minimal testing has large coverage. Systems that have higher L values are more likely to have higher overall quality. There is way less code in the system, and more bugs are found with less testing. All good attributes to have in a code base.

We clearly want to try to maximize the leverage in our systems. Less work, means we can do a better job. Nothing is worse than flailing away at poorly structured code.


FIXES AND SOLUTIONS

If you have a big system with thousands of user-functions, then even if the functions are all relatively simple and only require a thousand lines of code each, a completely independent system is easily over a million lines.

And it is that multiplier effect that is so poorly understood by many programmers.

There is a limit to the speed at which we can write, and as the application becomes bigger and more redundant, our coding speed gets slower and slower. Inconsistencies get larger and more dangerous, causing the bug fixing to become another huge problem.

Mostly the lower the leverage value in the system, the harder future development will become, often further lowering it. It is a downward spiral.

The only real solution is to try to increase the leverage to its highest possible value. Only in a highly leveraged system will the developers be able to keep up with the workload, and most development will get easier with time, not harder.

The first, most obvious change is to the transportation code. Wherever it is, in any system, it really doesn't need to know the type of data it is handling. It is crucial to minimize parsing, and conversions, but we also want to leverage any block of code as highly as possible.

Even though code is handling the data, the code should always have a minimal understanding of its data. That is, a container like a tree, doesn't need to know about its underlying objects, it just needs to hold them. And the same is true in the transportation section.

The data has to come from the model, and the model has to alter the incoming data, but from there all data is the same as any other data. It should go into one large generic container.

In that way, each and every user-function that uses an edit loop should go through the same transportation code. If there are fifty user-functions that need a full or partial loop, then L should equal 50 for the transportation code.

Modern languages allow for introspection, which allows for further encapsulation of the packaging and unpacking code. The method of transportation for the system should be identical for all parts of the system. Writing duplicate, but slightly different code sections is neither necessary nor reasonable.

From a programming standpoint, it should be trivial to package all or parts of an underlying model into a transport container. It should be a one-line statement. It should also be trivial to get the data from a screen and send it back to the back-end. These are hard, but good qualities to achieve.
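
A minimal sketch of what that one-liner could sit on top of; the Transport class below is an assumption for illustration, using Java reflection to pack any model object's fields into one generic name-to-value map:

    import java.lang.reflect.Field;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // A hypothetical generic transport container: one map of name -> value,
    // filled by introspection so no per-model packing code is needed.
    final class Transport {
        private final Map<String, Object> values = new LinkedHashMap<String, Object>();

        static Transport pack(Object model) throws IllegalAccessException {
            Transport t = new Transport();
            for (Field f : model.getClass().getDeclaredFields()) {
                f.setAccessible(true);
                t.values.put(f.getName(), f.get(model));  // the value stays an untyped Object
            }
            return t;
        }

        Object get(String name) {
            return values.get(name);
        }
    }

With something like this in place, every edit loop shares the same one-line packing call, Transport.pack(model), regardless of what the model actually contains.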


STRONG TYPING

Strongly typed code is often viewed as the "correct" way to build systems these days, because it pushes some of the validation work onto the compiler. While that can be useful in detecting some types of compile-time bugs, the compiler cannot check the running content of the data, so there are an awful lot of bugs that can't get caught this way.

On the other hand, any programmers with experience in loosely typed languages know that there is always less code, and that the underlying code is always intrinsically more reusable. Loose typing makes for flexible code. Strong typing makes it more rigid, which is sometimes useful, but only in specific circumstances.

If we are really interested in significantly reducing the size of our systems, then we are interested in any type of approach that reduces code.

By keeping the data as loosely typed as possible, we can move it through the system without having to create a lot of specific code to manage it. For instance, in Java, since everything can be of type Object, then everything can be loosely typed. Even the containers like lists and trees can be packaged as Objects.

When we finally need to use the data, we can convert it to a more convenient type. This process is natural in a loosely typed language -- usually being handled quietly and automatically -- but can easily be emulated in a strongly typed one. If the interface needs a date from the database we can grab it, cast it to an Object, and move it through the system and into position. Only at the last moment do we need to re-convert it back into an actual date string for display.
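
As a small sketch of that last-moment conversion (the display format here is just an illustrative choice), the middle layers only ever see 'Object' and the widget code decides what the value really is:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Only the presentation code knows, or cares, what the Object really is.
    final class DisplayConversion {
        static String render(Object value) {
            if (value instanceof Date) {
                return new SimpleDateFormat("yyyy-MM-dd").format((Date) value);
            }
            return value == null ? "" : value.toString();
        }
    }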

At the end, this may seem like an unnecessary conversion, but truthfully it is a small cost compared to managing all of the strongly typed code in between. Software development is always about trade-offs, and this is one of the better ones.

The presentation layer generally needs to 'stringify' the data anyways, so most often it isn't even extra, and it might just be an optimization in some cases. If your underlying transportation code is forcing all of the data into strings, then the final conversion might just be moot.

But it shouldn't matter to the programmer, because the transportation code should abstract away all of the underlying details. String, integer or other data-type, the programmer shouldn't know or have to know about the in-transport data representation, it should be encapsulated.

All they need to know is what their final data-type for the user should be, and if they are using some interface abstraction, they may not even need to know that much.

The less a programmer needs to know about what is underneath, the less they should know. Many programmers tend towards being too specific in their handling of data, often writing large amounts of excessive, unnecessary code. It's a common form of over-complicating the code. Stopping this behavior, or at least containing it, helps in keeping the code from getting under-leveraged.


DUPLICATE DATA VALIDATIONS

For most systems, the interface has very strict validation requirements and so does the database. As previously mentioned, the database validates and returns either an error or success during an update or save.

It is not possible to keep the database from ever returning an error, as there will always be unforeseen events like being out of disk space, or communication problems.

Because of this, there must be a passageway through the system for database errors, which one might as well utilize for other things as well. The more we reuse the existing mechanisms, the less specific testing we'll need.

Code re-use always cuts down on testing.

So ultimately, we really don't want to do much duplicate validation at the database level, and we certainly don't want to do any of it at the transport level. Display data is validated by the fact that it came directly from the database.

Since editing always requires strong validation, if we stack our validation code into one place, then it is not duplicated or spread all over the system. Editing is the best candidate.

Implicitly it does also exist in the database schema, but that is OK. The database schema encodes a universal view of the data, which can have its own different validation, and differences should be accommodated with the reading and transformation of the data. The differences may be small, but they need to be encapsulated together.

In that way, although we want strict validations in the interface, the rest of the system shouldn't know, and shouldn't care about its data. That is, it should just be some loosely typed 'thing' that is going someplace.

Code shouldn't know any more about the data than what is absolutely necessary.

Now, once at the front-end, both the presentation and editing do require the data to be strongly typed. In fact, they require 'strict' typing, where the type is far more limited than just a simple data-type: it is restricted by its domain and by being cross-referenced with one or more other variables.
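
As a small illustration of 'strict' typing at the edge, assume a hypothetical screen with a priority field limited to the values 1 through 4 and a pair of dates that must be ordered; both the domain check and the cross-field check live together, next to the interface:

    import java.time.LocalDate;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical front-end validation: the domain restriction and the
    // cross-field rule are checked in one place, beside the screen code.
    final class EditChecks {
        static List<String> validate(int priority, LocalDate start, LocalDate end) {
            List<String> errors = new ArrayList<String>();
            if (priority < 1 || priority > 4) {
                errors.add("priority must be between 1 and 4");
            }
            if (start != null && end != null && start.isAfter(end)) {
                errors.add("start date must not be after the end date");
            }
            return errors;
        }
    }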


FRONT-END ABSTRACTIONS

One place where programmers spend way too much time is on the front end. The code is highly repetitive, and none of the current framework paradigms attempt to abstract that down to something smaller.

Common practice is to create a small set of unique objects per screen, or action, and then fill these with highly redundant low-level GUI code, such as allocating widgets or handling events.

Model-view-controller (MVC) frameworks bring this down marginally to dealing with actions, but since they usually have a one-to-one correspondence with screens, there is still a large amount of nearly identical code.

Mostly, all of the interface code is about displaying data or editing it. The displays can be textual or image based. They include embedded user-functionality to help drive the appearance of interactivity. They are all very similar in nature, and always need to have some overriding consistency.

But if the code is spread out into a large collection of redundant sections, keeping it neat and tidy quickly becomes impossible. What's needed is an overlaying abstraction to minimize the code and enforce consistency.

A good abstraction provides a higher-level structure, in an attempt to reduce the amount of code necessary to work with an encapsulated set of behavior. That is, the programmer should have to do way less work, when working with a good abstraction. It not only provides structure, but also reduces effort.

An abstraction that works nicely is to fit a 'form' over all of the interfaces. Read-only code fits into a read-only form, while editing works mostly within a standard forms model.

In a reasonable implementation, the programmer would need some minimal way of defining the form, and then populating it with initial data. A good abstraction would hide most of the normal interaction with the user, allowing only the special cases to flow through. This is necessary because the programmers shouldn't have to reinvent the mechanics for handling simple technical things like paging lists of data from the database.
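
One possible shape for the per-screen code under such an abstraction; the Form, Field, transport and backend names below are purely hypothetical, sketched only to show how little remains once the mechanics are hidden:

    // Hypothetical usage of a form abstraction; none of these classes are
    // real APIs, they stand in for whatever library the team builds or adopts.
    Form form = new Form("Customer Details");
    form.add(Field.text("name").required());
    form.add(Field.integer("priority").range(1, 4));
    form.add(Field.date("startDate"));
    form.populate(transport);                               // initial data from the generic container
    form.onSubmit(data -> backend.save("customer", data));  // everything else is handled underneath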

Keeping the programmers away from re-inventing the smaller technical solutions, also helps to enforce consistency in the behavior of the interface.

Some care needs to be taken, because frameworks can occasionally make things too implicit. That is, APIs often present the wrong types of options to the programmers. Simple, obvious values should be set as the defaults, and the only options that need explicit overriding should be those that really represent a degree of freedom within the method call (although multiple calls are sometimes a better choice).

Restricted validation and some cross-conditional handling is always necessary. When ready, a good abstraction will present the final and completed data to the programmer, ready for transportation.

Of course, many applications will have special corner-cases, so the abstraction will need to allow the programmer to 'hook' in code at specific points. This can be complex because these hooks are intrinsically disconnected from the rest of the system, making them less obvious and harder to understand. A good interface should read very simply, and so the nature and purpose of the hooks should be obvious and easy to grasp.


BACK-END COMPRESSION

One of the hardest places to reduce overly redundant code is in the back-end data models. Still, there are some key data elements contained in the database which are of interest to the application. If you're strict about minimizing these, and keeping out presentation information, the code can be reduced.

Most schemas have inherent redundancies in their tables, such as date stamps and user auditing information. These can be encapsulated, and reused over and over again. Small convenience libraries can be used to make any of these common fields share one implementation.
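
A small sketch of that kind of reuse on the code side, assuming a hypothetical base class so the audit fields are implemented exactly once:

    import java.time.Instant;

    // Hypothetical shared base for every persisted record: the date-stamp and
    // audit fields exist in one place instead of in every model class.
    abstract class AuditedRecord {
        private Instant createdAt;
        private Instant updatedAt;
        private String updatedBy;

        void stamp(String user) {
            Instant now = Instant.now();
            if (createdAt == null) {
                createdAt = now;        // first save
            }
            updatedAt = now;
            updatedBy = user;
        }
    }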

As well, even if the schema is fourth normal form, some of the tables themselves can be brought together at higher levels. Although this type of generalization can cut down on code, you have to be very careful not to over-do it and make the schema itself impossible to share across applications. It becomes a set of very hard trade-offs.

If the data is only ever used in one application, and has some alternative import/export format for compatibility with external systems, then using some other non-relational format may be a better choice. Ideally, the solution that requires the least work to load and finesse the data is the best one.

Thus in an object-oriented language, a real object-oriented persistence mechanism that is super-simple is the best choice, although consideration has to be given to correctly handling system data upgrades at some later point.

The best solution is to just boil it down to its absolute minimum, and take advantage of techniques like polymorphism wherever possible. Less code, means less duplication, less redundancy, less testing and less bugs.


SUMMARY

Inexperienced programmers are generally more concerned with getting their code to run than they are with keeping it so. Because of this, they rely on the simpler and more obvious brute force approaches towards development, which significantly increase the amount of code, reduce the leveraging, and increase the work in testing it.

A system with a few hundred user-functions might start out OK, but if most of the code is poorly leveraged, that quickly changes as the project matures.

Poorly leveraged, highly redundant code is the most common critical problem with most software development. You see it everywhere. An initial release might be successful, but the accrued technical debt grows more rapidly than the resources to offset it. Stagnation or implosion are the expected consequences.

Ultimately, if the system is so highly leveraged that just logging into it tests a large swath of the transportation and back-end code, then the overall quality of the system will be high. It will be high because it was intrinsically built into the architecture. A few simple tests will cover large code sections, which makes for a trivially well-tested system.

Ultimately, it's not how many bugs you have in the system, it is how easy they are to detect that really matters.

As well, writing the initial code is not particularly hard. Given enough concentration most people can put together a sequence of instructions to tell the computer to do something. The real trick to programming is to be able to keep this code sane, version after version.

Without structure, and with a low leverage value, the code will degrade rapidly as it gets pushed and expanded.

On the other hand, strong abstractions in highly leveraged code are the gold standard of programming. They make extending the code -- any part of it -- easy. A sign of good code is that it should be easier to grow the system than it should be to re-write it.

We don't want huge, inconsistent systems. We don't want independent code. Programmers never start out with this as their goal, but it is the inevitable consequence of many of our standard programming best practices.

Given that, with a little more effort and thought, the work involved in most systems can easily be reduced by orders of magnitude, it is surprising to see how rare this is in practice. Programmers are usually their own worst enemies, even though few stay in the profession long enough to correctly understand this.

Monday, March 15, 2010

The Value Proposition

Suppose for a moment that you had a great idea for a new but non-trivial piece of software. Somehow, through a bit of luck, you managed to secure just enough money to get it built and released, but only enough for you to do all of the work by yourself.

That is, you've come into possession of an empty company with just enough money to pay your salary for a couple of years, but nothing else. The money is guaranteed, but until you find some other source of income that's all you are going to get.

It's a simple scenario, but a strong one in being able to evaluate software development methodologies. It works well, because your time is limited, and for every choice you make there is an opportunity cost in terms of what you could have been doing instead. It forces you to make better choices.

If you spend your days writing long-winded comments, for instance, you've severely cut down on your development time. Given that you can't afford another programmer, comments aren't all that useful are they? Who's going to read them?

Of course, in this scenario you can't win until you get the software finished, packaged, and shipped to people. A whole bunch of people. But initially you don't even have a market, or a way to generate sales, just a great idea.

For the sake of simplicity, we'll say the idea is so great that in the hands of a reasonable company it will definitely generate revenue with some type of reasonable profit. That is, it is a winner, even if it's not a big one.

To get it going, you'll have to get several versions built and released to a number of different clients before the revenue becomes consistent enough to hire new people to replace yourself.

Still, just knowing that the idea can win, doesn't lead one to being able to make it so. Initially you'll need demos, marketing materials, and enough time set aside to practice a song and dance about why people should commit to this product. You are not only the software developer, you're also the salesperson, or at least the pre-sales engineer.


PRIORITIES

You realize that this is a chance of a lifetime, to be able to create your own unique product from scratch, so you'd be crazy to turn it down. Some things are just worth doing.

The first priority would be to set up an environment in which to work. A development machine is obvious, but some type of source code control would definitely help in tracing problems, keeping track of changes, and tracking down bugs. A separate source server would be easier to backup.

You'll have to set up the whole environment, including the development and test machines, so you'll probably want something really simple and easy to maintain. You don't want to lose a lot of effort to system admin tasks, and you don't have enough to out-source this issue.

Of course for tools, you don't want to waste a lot of time learning some fancy environment; the simplest, but most straight-forward tools are always the best. You need to edit, and search the code, but you'll also need some powerful debugging from time-to-time. You really don't want to lose a week or two in trying to track down a stupid bug with only print statements.

Assembling a few lines of working code is not a hard problem, but as that code grows and grows, you'll quickly find that "structure" at multiple levels is important. It is there to separate the code, encapsulating it so it doesn't come back to haunt you. It also makes it way easier to triage bugs, and relate them to specific sections of code, something that will come in really handy later, when the first support calls start to eat into development time.

Structure is fine, but given that you'll be working on this system for years, and given that you'll need to do the sales yourself, there will be plenty of interruptions, so you'll need to write down your plans.

Of course, spending six months to produce the perfect documentation, with arrows and charts and what not, is a killer waste of time. You only need enough documentation to remind you of exactly what you planned to do. Just enough to keep you honest. But enough to remind you what is important on extremely busy and disruptive days.

Early on, demos will be necessary, so version control plays an increasingly important role. You must be able to code full steam ahead, but fall back to a consistent version for a demo. Buggy demos hurt, and getting sales is your only chance at getting help. Of course a separate demo machine is important, and so is being able to update it quickly to the latest and greatest version. Another side issue is being able to quickly fill it with demo-related configuration and persistent data. An empty system doesn't demo well.

Now, in thinking about your design, you realize that if you just belt out the code you'll need way over half a million lines. Given that you've only got a couple of years, and you can't pound out that much code that quickly, you have to resort to re-using as much code as possible to reduce the code size into something manageable. Abstraction is the only way to do this. You find an intelligent way to implement some internal generic structure that allows you to hang all of the required functionality onto the same infrastructure, but without continuously duplicating each piece separately. Done well, the code comes down into the thousands of lines range.

Still, there are way more features than you initially need to get a sale, and you really want to get sales early to build up both experience and capital. As such, the 1.0 version of the system has got to be ready early, and you'll keep extending it, adding in new features until it gets closer and closer to the product you imagined. While structure was really great for encapsulation and bug fixing, it becomes crucial in handling how you will extend the system. The structure delineates what you think are the permanent lines, not the places where you'll end up replacing or refactoring code. In this way, the modules, libraries, components and other pieces make it far easier to do the extensions, particularly the small improvements. The pieces also make it easier to test just a small subset of the overall system. Testing, too, is another time-consuming problem that you have to watch out for.

Given that the environment will be chaos once the product is out there, but you'll still need to be making a large number of improvements, any effort put into structure or abstraction will pay huge dividends. In fact, you won't be able to win unless you invest enough effort in both.


VALUE

The common denominator, again and again, is in just doing the most minimal amount of work possible to get to the results. If you get caught up on a side-track or a make-work project, it's a painful waste of your already limited resources.

The easy way to really value work, then, is by tracing it backwards from some necessary requirement. Something you just can't live without. You need software to sell, so you have to write the code. People want it to work, so you'll have to test it. Other people will need to understand it, so there will be documentation.

You need documentation for the users, so you'll have to both make the interface simple, and provide some trivial documentation. Of course, if you make interface changes frequently you certainly don't want to create a huge amount of extra work by having to keep the user documentation up-to-date. It should be simple, and as resistant to change as possible.

Since you have to sell, you'll need marketing documentation. People always want to know about the features, or how to use the product, or a tutorial, or any number of other necessary 'sales aids'. These documents are usually summaries, but they too need to be kept in sync. In fact, they often need to be updated ahead of the code, given that you are trying to generate interest in upcoming features.

Nothing looks worse than poorly packaged software. While it's common with some shops, vendors do not look professional until they've wrapped their works in nice installers. It may not seem like much, but generally even a simple installer can take weeks if not months, and as the systems progress they need continuous updating as well. Many modern technologies are utterly pathetic to install, so it makes the problems even worse. Suddenly you have to be concerned about the dependencies, like databases, containers and libraries, not just for their technical abilities, but also for their capacity to interfere with the installer.

Of course, while you're on dependencies, you can't forget the massive number of sticky, icky licenses that are floating about. You need revenue, and you don't want to make your competitors' lives easier by giving them the system, so you'd like to retain control of the source code. That means you have to be careful in choosing which libraries you use underneath. Some of them are legal landmines, just waiting to bite.

One easy thing to forget is support. In the initial days, for your first sales, you still have to add in new features, but you'll also be both customer service relations and support. You'll be on-site post-sales engineering, and will definitely be there for the first few installs; it is both necessary and prudent. System administrators will call with technical problems, but also with other weird and wonderful issues, like which standards you support. Users will call, and it won't matter how simple the interface is, or how good the documentation is, they will still call. They do this mostly because they know you are small, and they can be lazy. And of course there will be bugs.

The worst part about many of the bugs is that they come out of nowhere, when you are busiest, and just eat up huge segments of time. Just trying to understand and replicate a problem takes effort, but then if it requires documentation and/or patching, a few days or a week can just disappear. No matter how good the coding and testing is, there will always be bugs, and some will be very difficult to find. Support is always underestimated, and can be a huge drain on resources.

The initial panic to set up the environment, then create a design and get coding as fast as possible, gives way to a much more disruptive environment once the first sales get going. Of course, you have to sell to get more people, and you really don't want to just hire someone immediately, because the initial cashflow is unsteady. Hiring costs time, and laying off your first employees is both time-consuming, because you procrastinate, and demoralizing, even if they take it well.


SUMMARY

Winning in a scenario like this takes an ability to really value work. In an environment with a lot of resources, it is very easy to place a value on useless work because it is deemed to fit some idea or process. But when there is just enough between you and success, you quickly find that a lot of what people think helps them is really just excess baggage.

Of course some things that you might not think have value, come to play really strongly. Two examples are simple code, and simple interfaces.

Simple code doesn't mean brute force, but it does mean that the code does no more than exactly what it should. Abstractions massively cut down on the code size, so they are absolutely necessary, but intricate, fragile and complex code is not. Code needs to be neat, consistent and easily readable, particularly late at night or when you are really busy.

Simple interfaces are another important feature, in that they cut down on both user documentation and support calls. If your system contains some uber-complex algorithm for handling stuff, it will take significant effort to educate the users, and that could cost you the win. A bad interface can eat a lot of time in its creation, but also a lot more in trying to support it. These are not resources you can afford.

What's interesting about a lot of our best practices in the software development industry is that they fail very quickly in this type of scenario. The same bloat and excess we put into the code also goes into the making of the code. At the same time, you'll also find developers who have swung too far the other way. They panic, becoming cowboy coders, avoiding all process and organization. Either too much or too little causes problems. In both cases, as the projects grow, the complexity increases and the resources get squandered on poor value propositions. If you have plenty of resources to waste, this might not be a problem, but even in large companies few developers have any real spare cycles.

Tuesday, March 9, 2010

97 Things Every Programmer Should Know

It is probably a good time for me to plug the latest in the 97 Things series. It came out last month and I managed to get a story of mine included:

97 Things Every Programmer Should Know, edited by Kevlin Henney.

This is another great work containing a wide range of writers interested in sharing their experiences and understandings. Something our industry desperately needs.

For this addition I threw in one of my favorite stories from around the age when I really learned programming. I was lucky enough to have a mentor, which made a huge difference to my abilities. I had no real idea what I was doing when I started.

You can work as hard as you want, but you can't get beyond just pounding out the obvious until you gain access to a larger knowledge base. That is, until you climb up on other's shoulders, you're just getting the same view as everyone else. You're just learning to do what they did already.

We may produce tonnes of reference material, and have pretty good online Q&A forums, but they are really only band-aids to help programmers fix things when they go wrong. Our biggest problems come from the way coders are building their systems. From the way they are just belting out code.

A mass of ugly poorly structured code may not be immediately visible, but you can always tell the poor construction from the disorderly interfaces, annoying over-complexity and quirky behavior. The true nature of code always shines through!

Things won't change until we find ways to teach the "higher" principles to new coders; there is far more to programming than just assembling massive lists of instructions. Things won't change until people understand that flailing away at their code is unnecessary, unproductive and just adds to the problem; brute force always produces messy systems. Things won't change until we learn how to pass on our skill sets; left on their own, most new programmers will just pound out the same messes over and over again.

That is why the 97 Things series is such a great thing for an industry that is clearly having trouble trying to grow. A huge number of our software development problems stem from the fact that each new generation of programmers poorly reinvents the same wheels, while strategically avoiding the real problems frozen into our mammoth backward-compatible tar pits.

In an age where we have access to massive data, powerful machines and can connect to computers from anywhere, it is only our software that continues to be a let-down.