Tuesday, May 29, 2012

How do you design software?

Simon Brown started an interesting thread of discussion:


In one reply, Gene Hughson added:


I figured I’d also take a shot at explaining how I usually design systems.  

Please keep in mind that design is a highly creative process, so there are no right or wrong answers. What works in one case may not work in others. It is also highly variable, depending on the scale of the team, the project and the system. In smaller systems you can get away with cutting corners that are absolutely fatal in big ones.

I’ll answer this with respect to greenfield (new) software projects. Extending an existing system is similar, but considerably more constrained since you want all of the new pieces to fit in well with the existing ones.

For context, I’ve been building systems for over twenty years. In the last 14 years they have all been web apps and have all been destined to be sold as commercial products. The domains have changed significantly, but even though they are directed at different problems, the underlying architectures and design goals have been very similar. I like to build systems that are dynamic. By that I mean that they do not know in advance what data they will be holding, nor how it will be structured. There is always some static core, but below that, the exact nature of the data depends on what is added to the system as it runs. At the interface level I usually build in some form of templates or scripting (DSL), so that users can highly customize their workflows. For data stores, I’ve used OODBs, NoSQL and generic schemas. I prefer NoSQL-style solutions, but RDBMSes are useful for smaller systems. I consider a system successful when a user can essentially create their own personal sub-system, with its own unique schema, quickly and easily using only the GUI. If their work requires long delays, or involvement from programmers or operations, then I’ve missed the mark.
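
To make the idea of a "dynamic" system a little more concrete, here is a minimal sketch, assuming a small static core that only knows about "types" and "records", with users defining the actual types at runtime. The names `EntityType`, `Registry`, `define_type` and `add_record` are hypothetical and purely illustrative; a real system would persist these definitions rather than keep them in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical static core: the code only knows about "types" and "records",
# never about the specific kinds of data users will define at runtime.
@dataclass
class EntityType:
    name: str
    fields: Dict[str, type]  # field name -> expected Python type


@dataclass
class Registry:
    types: Dict[str, EntityType] = field(default_factory=dict)
    records: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def define_type(self, name: str, fields: Dict[str, type]) -> None:
        # Would be called from the GUI when a user creates a new type.
        if name in self.types:
            raise ValueError(f"type {name!r} already exists")
        self.types[name] = EntityType(name, dict(fields))
        self.records[name] = []

    def add_record(self, type_name: str, values: Dict[str, Any]) -> None:
        etype = self.types[type_name]  # KeyError if the type is unknown
        for fname, value in values.items():
            if fname not in etype.fields:
                raise ValueError(f"unknown field {fname!r} for {type_name}")
            if not isinstance(value, etype.fields[fname]):
                raise TypeError(f"{fname} expects {etype.fields[fname].__name__}")
        self.records[type_name].append(dict(values))


reg = Registry()
reg.define_type("Invoice", {"customer": str, "amount": float})
reg.add_record("Invoice", {"customer": "ACME", "amount": 120.0})
```

The point of the design is that adding a new kind of data is a data operation (a call to `define_type`), not a code change.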

The teams I usually work with are small. I’ve been on big teams, but I find that smaller ones are generally more effective. Within these teams, each programmer has their strengths, but I try to get everyone to be as general as possible. Also, for any specific section of code, I try very hard to ensure that at least two programmers know it and can work on it. In the past, specialist teams with no overlap have tended to collapse with morale or staffing changes. I prefer not to be subject to that sort of problem.

For any design, the very first thing I do is identify a problem to solve. You can’t create an effective solution if you don’t understand the problem.

Once I ‘get’ the problem, I go about deciding on the technologies. As the materials that make up the solution, the technology choices play a significant role in determining the system’s architecture. Each one has a ‘grain’, and things go considerably smoother if you don’t go against it. They also stack, so you should pick a collection of parts that work well together. Beyond affecting development, the choice often plays a significant role in sales as well, which is a key aspect when the work is commercial. In most markets, systems built in specific technologies sell more easily.

For technologies that I have little or no experience with, I always go off and write a few prototypes, both to gain experience and to understand their limits. Software capability is usually oversold, so it’s worth confirming that it really works as required before it’s too late.

With the technologies in hand, I then move on to the data. For all systems, the data is the foundation. You can’t build on what isn’t there, and an underlying schema works best if it isn’t patchy and inconsistent. The key thing to know about the data is its structure (schema/model/etc.). But how it gets into the system, along with its quality, volume and frequency, is important too. Even if the system is brand spanking new and cutting edge, chances are that earlier developers have modeled at least the major aspects of the data, so I find it crucial to do both research on what is known and analysis of what is out there. Mistakes in understanding the data are always very painful and expensive to correct, so I like to spend a little extra effort making sure that I’ve answered every question that has come up. One of the worst things you can do in software is to ignore stuff in the hope that it will get sorted out later. Later is usually too late.

I work with two basic models for the data. The first is based on the idea that there is some ‘universal’ schema out there that correctly corresponds to the specifics of the data, no matter where it is found or what it is used for. The second is the subset of this model that is specific to the application I am building. If I’m utilizing an RDBMS to hold static elements, I generally try to make the schema in it as universal as possible. I may skip some entities or attributes, but the model for any core entity is certainly as close as I can afford to get it. Keep in mind that I am also constantly generalizing, based on my understanding, to reach a higher-level, more abstract perspective for as much of the data as I can. Generalizations cost extra work and system performance, and they slow down the initial stage of development; however, if they are well chosen, they reduce the overall amount of work and provide a significant boost in development speed as the project matures.
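
One way to picture the two models is as a universal model plus an application model that must be a strict subset of it. The sketch below assumes a toy representation (entity name mapped to a set of attribute names); the `UNIVERSAL` contents and the `application_subset` helper are hypothetical examples, not a real schema.

```python
from typing import Dict, Iterable, Set

Model = Dict[str, Set[str]]  # entity name -> attribute names

# Hypothetical 'universal' model: as complete as we can afford to make it.
UNIVERSAL: Model = {
    "Person": {"name", "birth_date", "address"},
    "Organization": {"name", "address", "tax_id"},
}


def application_subset(universal: Model, wanted: Dict[str, Iterable[str]]) -> Model:
    """Build the application's model, refusing anything outside the universal one."""
    subset: Model = {}
    for entity, attrs in wanted.items():
        if entity not in universal:
            raise KeyError(f"{entity} is not in the universal model")
        missing = set(attrs) - universal[entity]
        if missing:
            raise KeyError(f"{entity} lacks {sorted(missing)}")
        subset[entity] = set(attrs)
    return subset


app_model = application_subset(UNIVERSAL, {"Person": ["name", "address"]})
```

The application may skip entities or attributes, but it never invents its own variant of a core entity, which keeps it aligned with the universal model as it grows.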

Now, it is always the case that over time whatever I build will grow, getting larger and more complex. This usually means that the model of the data will grow as well. I can’t predict the future, but I can ensure that the bulk of these future changes will be additions, not modifications or deletes. Doing so smooths out the upgrade path and avoids having future development hampered by technical debt that’s grown too large to be able to pay down.
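
A simple way to keep schema changes additive is to check each proposed revision against the current one and flag anything that removes or alters what already exists. This is a minimal sketch, assuming a schema is represented as a mapping from entity names to attribute types; the `non_additive_changes` function is hypothetical, not part of any migration tool.

```python
from typing import Dict, List

Schema = Dict[str, Dict[str, str]]  # entity name -> {attribute name: type name}


def non_additive_changes(old: Schema, new: Schema) -> List[str]:
    """Describe every change from old to new that is not a pure addition."""
    problems: List[str] = []
    for entity, attrs in old.items():
        if entity not in new:
            problems.append(f"entity removed: {entity}")
            continue
        for attr, attr_type in attrs.items():
            if attr not in new[entity]:
                problems.append(f"attribute removed: {entity}.{attr}")
            elif new[entity][attr] != attr_type:
                problems.append(
                    f"type changed: {entity}.{attr} {attr_type} -> {new[entity][attr]}"
                )
    # New entities and new attributes are never reported: they are additions.
    return problems


old = {"Customer": {"name": "text"}}
ok = {"Customer": {"name": "text", "email": "text"}, "Order": {"total": "decimal"}}
```

Running a check like this before each upgrade makes any modification or delete a deliberate, visible decision rather than an accident.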

With the base data understood, and since I am usually designing large systems, I generally move on to the architecture. The individual puzzles that are solved by the code always need to come together in a coherent fashion at the system level. Get this wrong and the system descends into ‘meta-spaghetti’, which is usually fatal (it can be fixed, but since it is unpleasant work, it is usually avoided until it’s too late).

I usually visualize the architecture as a very large set of lines and boxes. The lines separate different chunks of code, forming the basis of encapsulation and APIs. The boxes are just independent ‘components’ that handle related functionality; sometimes they are off the shelf, sometimes they need to be specifically written for the project. There are all sorts of ways of diagramming systems, but I find that laying it out in this fashion makes it easy to distribute the workload and to minimize overlap between programmers.

I always start by drawing the most natural lines first. For example, if it’s a web app, there is a client and a server (you have no choice). Given that I’m keen on getting as much reuse as possible, I try to avoid partitioning the system vertically based on attributes like ‘screens’. Most screens share a massive amount of common code, so avoiding duplicating it is usually a significant factor in many of my designs. I want one big chunk of underlying code that is called by some minimal code for each screen. Getting that in place usually results in several other sets of lines for the architecture.
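
As an illustration of "one big chunk of underlying code called by some minimal code for each screen", here is a minimal sketch, assuming screens are simple forms. The `render_form` function and the `CUSTOMER_SCREEN` declaration are hypothetical; a real renderer would handle many more widget types, validation and layout.

```python
from html import escape
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field name, visible label)


# Shared code: one generic renderer used by every screen in the system.
def render_form(title: str, fields: List[Field], record: Dict[str, str]) -> str:
    rows = []
    for name, label in fields:
        value = escape(str(record.get(name, "")))
        rows.append(
            f'<label>{escape(label)} <input name="{escape(name)}" value="{value}"></label>'
        )
    return f"<h1>{escape(title)}</h1>\n<form>\n" + "\n".join(rows) + "\n</form>"


# Per-screen code: only a small declaration handed to the shared renderer.
CUSTOMER_SCREEN: Tuple[str, List[Field]] = ("Customer", [("name", "Name"), ("email", "Email")])


def customer_screen(record: Dict[str, str]) -> str:
    return render_form(*CUSTOMER_SCREEN, record)


page = customer_screen({"name": "Ada <Lovelace>"})
```

A fix to escaping or layout in `render_form` then lands on every screen at once, instead of being repeated in each one.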

With web apps, depending on the specific technologies, there can be some communication between the client and the server. If there is, again I want just one piece of code to handle it. It’s all communications code, so it doesn’t need to know about the data it is passing, just that it is passing it and handling errors. That’s easy to say, but with this type of piece there can sometimes be several non-intuitive lines required to make sure that it really works deterministically.
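
A single communications module might look something like the sketch below, which assumes JSON as the wire format and takes the actual transport as a plain function so it can be swapped or stubbed. The `send` function, the `CommError` exception and the retry policy are hypothetical choices for illustration. Note that blindly retrying is only safe for idempotent requests, which is exactly the kind of non-intuitive detail that makes this code harder than it looks.

```python
import json
import time
from typing import Any, Callable


class CommError(Exception):
    """Raised when a message could not be delivered after all retries."""


def send(transport: Callable[[str], str], payload: Any,
         retries: int = 3, delay: float = 0.0) -> Any:
    """Encode any payload, push it through the transport and decode the reply.

    The function never inspects what the payload means; it only handles
    encoding, retries and error reporting, in one place for the whole system.
    """
    body = json.dumps(payload)
    last_error: Exception = CommError("no attempts made")
    for attempt in range(1, retries + 1):
        try:
            return json.loads(transport(body))
        except (ConnectionError, TimeoutError) as err:
            last_error = err
            time.sleep(delay * attempt)  # simple linear back-off between attempts
    raise CommError(f"failed after {retries} attempts") from last_error


calls = {"n": 0}


def flaky_echo(body: str) -> str:
    # Stub transport: fails twice, then echoes the request back.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server unavailable")
    return body
```

With the stub above, `send(flaky_echo, {"a": [1, 2]})` succeeds on the third attempt and returns `{"a": [1, 2]}`.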

Another place where redundancies play havoc is in making the data persistent. Again, I focus on avoiding redundancies, since they are nearly as costly here as in the screens, but unlike with the screens, I am far more tolerant of dividing unrelated data into verticals. So long as there is a layer above that brings it all together in a consistent manner, I don’t mind that the specifics are handled differently based on the underlying type of data. That’s usually a choice based on security or the distribution of work.
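
The "layer above the verticals" can be sketched as a thin dispatcher that routes each type of data to its own store. In this minimal sketch, assuming an in-memory store stands in for each vertical (a table, a document store, a file), the `Store` protocol, `MemoryStore` and `Persistence` classes are all hypothetical names.

```python
from typing import Any, Dict, Protocol


class Store(Protocol):
    """What each vertical must provide; how it does so is its own business."""
    def save(self, key: str, value: Dict[str, Any]) -> None: ...
    def load(self, key: str) -> Dict[str, Any]: ...


class MemoryStore:
    """Stand-in for one vertical (e.g. a table, a document store or a file)."""
    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def save(self, key: str, value: Dict[str, Any]) -> None:
        self._data[key] = dict(value)

    def load(self, key: str) -> Dict[str, Any]:
        return dict(self._data[key])


class Persistence:
    """The single layer above: callers never see which vertical holds a type."""
    def __init__(self) -> None:
        self._stores: Dict[str, Store] = {}

    def register(self, data_type: str, store: Store) -> None:
        self._stores[data_type] = store

    def save(self, data_type: str, key: str, value: Dict[str, Any]) -> None:
        self._stores[data_type].save(key, value)

    def load(self, data_type: str, key: str) -> Dict[str, Any]:
        return self._stores[data_type].load(key)


users, audit = MemoryStore(), MemoryStore()
db = Persistence()
db.register("user", users)
db.register("audit", audit)
db.save("user", "u1", {"name": "Ada"})
```

Each vertical can then be secured or assigned to a different programmer independently, while the rest of the code only ever talks to `Persistence`.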

In all systems there are ‘big’ computations that grind through specific data. I generally label these as ‘engines’. I set them into the design as black boxes. What’s inside is less important than how it fits into the system. The technologies required usually dictate how these pieces are integrated, but encapsulating them means that they can be built independently of the rest of the system. That again is generally a big help when it comes to distributing the work or scheduling the releases.
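
Treating engines as black boxes mostly comes down to agreeing on a narrow contract up front. This is a minimal sketch, assuming every engine takes rows of numeric data and returns a summary; the `Engine` base class, the `TotalsEngine` example and `run_all` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Row = Dict[str, float]


class Engine(ABC):
    """Black-box contract: the rest of the system only ever calls run()."""

    @abstractmethod
    def run(self, rows: Iterable[Row]) -> Dict[str, float]:
        ...


class TotalsEngine(Engine):
    """One example engine: sums every numeric column it sees."""

    def run(self, rows: Iterable[Row]) -> Dict[str, float]:
        totals: Dict[str, float] = {}
        for row in rows:
            for column, value in row.items():
                totals[column] = totals.get(column, 0.0) + value
        return totals


def run_all(engines: List[Engine], rows: List[Row]) -> Dict[str, Dict[str, float]]:
    # The caller schedules engines without knowing how any of them work inside.
    return {type(engine).__name__: engine.run(rows) for engine in engines}


results = run_all([TotalsEngine()], [{"a": 1.0, "b": 2.0}, {"a": 3.0}])
```

Because only `run()` is shared, an engine can be rewritten, or even replaced by one in a different technology behind a wrapper, without touching its callers.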

Documentation generally depends on the environment and the size of the team. For a small, close-knit group of very experienced developers, the major lines sketched on the back of a napkin can often be enough. Well, that and an ER diagram of the schema and some type of mock-up of the screens done by a real graphic designer. Basically, all of the parts that have to fit together nicely with each other. For some systems, I’ve whacked out the major pieces first, then staffed up to flesh out the full system. That works well if you know where you are going and have enough time to articulate the lines in the code.

These types of techniques limit the overall size of the system or stretch out the development time, but they tend to set a tighter path towards reuse.

Sometimes I’ve done full specs. In those cases I’ll go down to the depth that I think is safe, but that is dependent on how the work is organized and who is doing it. So far I’ve always known who was doing the work and what their skill level was, but if I didn’t, I’d likely go all the way down to the class level, as in “here are the classes you are going to build”. It’s better to be too explicit initially than to be unpleasantly surprised later.

In most specs I prefer bullet points, tables and the occasional diagram. Anything but flowing text. Whatever gets the point across with the least amount of work and is easy to skim. The point of a spec is generally to get one or two people to accomplish a specific task, so I shy away from making it large, pretty or all inclusive. It just needs enough relevant details to create the code and nothing more. Flowing descriptions and justifications belong in high level documentation and have a very different audience.

Most of the architectures I have designed have been significantly abstracted away from the user requirements. The requirements affect the data and drive the number and types of engines, but up to this point they haven’t really entered into the picture. Most of the initial work has been about the technology and the data. It’s usually around this point that I try to sit down with a few real users and see what they are doing on a day-to-day basis. I may have found the problem right away, but now it’s time to actually dig into its ugly side. This manifests itself most often as the generation of lots of screens and some fairly serious extensions to the data model. If the architecture is holding water, neither of these is a significant problem. I knew they were coming and now I want them.

From a user interface perspective, I am usually aiming to simplify the interface as much as possible. It makes the user experience better, but it also reduces the coding work and testing. Well, not always. Some simplifications come from building more sophistication into the backend. The system holds a deeper, more complex perspective on what the user is actually doing, so the user doesn’t have to hold it themselves. Those types of design issues generally fall into the category of just being more data or engines, so there is usually a place for them to roost long before they’ve been articulated.

Pretty much, if the architecture is doing its job, the growth and extensions to both the code and data are landing in previously defined parts of the system. To reduce coding collisions, I generally push building the code from the back to the front. The schema gets extended first, then the server, then the front-end. That also avoids creating fancy features that don’t map to the existing data.

Of course, there are always issues and problems. Things never really work according to plan, they take longer than expected and designs are rarely comprehensive enough to cover all of the extensions. My only rules are that if the design is wrong, we have to admit it as early as possible and then fix it properly as soon as possible. The longer you wait, the worse it gets. But it should be noted that any software development effort is always part of a much larger context (organizational/motivational), so often this larger context gets priority over the design and development issues. It’s these outside influences that make success so tricky to achieve.

That just about covers it from a high level. If all is working correctly, the analysis takes a bit of time up front and the development starts off slowly (from an interface perspective), but then the project generally falls into a comfortable steady state where each new extension gets easier to do than the last one. There are sometimes speed bumps caused by a jump in scale or some ugly technical debt. Those have to be dealt with as early as possible. It’s worth noting too that in the beginning there is usually a lot of moaning about time and progress, so it becomes important to veer away from the ‘right’ way to grab low-hanging fruit (usually demos or throw-away features). But it’s always important afterwards to redirect the project back onto a more stable, long-term trajectory. Knowing when to bend, and figuring out how to undo that later before it becomes a huge problem, is extremely difficult and takes considerable past experience to make viable choices.