Tuesday, May 29, 2012

How do you design software?

Simon Brown started an interesting thread of discussion:


In one reply, Gene Hughson added:


I figured I’d also take a shot at explaining how I usually design systems.  

Please keep in mind that design is a highly creative process, so there are no right or wrong answers. What works in one case may not work in others. It is also highly variable based on the scale of the team, the project and the system. In smaller systems you can get away with cutting corners that are absolutely fatal in big ones.

I’ll answer this with respect to greenfield (new) software projects. Extending an existing system is similar, but considerably more constrained since you want all of the new pieces to fit in well with the existing ones.

For context, I’ve been building systems for over twenty years. In the last 14 years they have all been web apps and have all been destined to be sold as commercial products. The domains have changed significantly, but even though they are directed at different problems the underlying architectures and design goals have been very similar. I like to build systems that are dynamic. By that I mean that they do not know in advance what data they will be holding, nor how it will be structured. There is always some static core, but below that, the exact nature of the data depends on what is added to the system as it runs. At the interface level I usually build in some form of templates or scripting (DSL), so that users can heavily customize their workflows. For data stores, I’ve done OODB, NoSQL and generic schemas. I prefer NoSQL style solutions, but RDBMSes are useful for smaller systems. I consider a system successful when a user can essentially create their own personal sub-system with its own unique schema, quickly and easily using only the GUI. If their work requires long delays, programmers or operations involvement then I’ve missed the mark.
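To make the idea of a ‘dynamic’ system a little more concrete, here is a minimal Python sketch of a generic store where entity types are defined at runtime rather than baked into the code. It is only an illustration under stated assumptions: the class name DynamicStore, its methods and the ‘sample’ type are all hypothetical, and a real system would sit on a proper data store rather than in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DynamicStore:
    """Hypothetical in-memory store whose schema is supplied at runtime."""

    schemas: Dict[str, Dict[str, type]] = field(default_factory=dict)
    records: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    def define_type(self, name: str, attributes: Dict[str, type]) -> None:
        # A user (via the GUI, say) adds a new entity type; no code change needed.
        self.schemas[name] = dict(attributes)
        self.records.setdefault(name, [])

    def add(self, name: str, values: Dict[str, Any]) -> None:
        # Validate the record against whatever schema the user defined.
        schema = self.schemas[name]
        for key, value in values.items():
            if key not in schema:
                raise KeyError(f"{name} has no attribute {key!r}")
            if not isinstance(value, schema[key]):
                raise TypeError(f"{key!r} should be {schema[key].__name__}")
        self.records[name].append(dict(values))


store = DynamicStore()
store.define_type("sample", {"label": str, "weight_kg": float})
store.add("sample", {"label": "A-1", "weight_kg": 2.5})
```

The static core here is just the store itself; everything about ‘sample’ arrived as data.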

The teams I usually work with are small. I’ve been on big teams, but I find that smaller ones are generally more effective. Within these teams, each programmer has their strengths, but I try to get everyone to be as general as possible. Also, for any specific section of code, I try very hard to ensure that at least two programmers know it and can work on it. In the past, specialist teams with no overlap have had a tendency to collapse with morale or staffing changes. I prefer not to be subject to that sort of problem.

For any design the very first thing I do is identify a problem to solve. You can’t create an effective solution if you don’t understand the problem.

Once I ‘get’ the problem, I go about deciding on the technologies. As the materials that make up the solution, the technology choices play a significant role in determining the system’s architecture. Each one has a ‘grain’ and things go considerably smoother if you don’t go against it. They also stack, so you should pick a collection of parts that work well together. Beyond affecting development, the choice often plays a significant role in sales as well, which is a key aspect when the work is commercial. Systems built in specific technologies sell more easily in most markets.

For technologies that I have little or no experience with, I always go off and write a few prototypes, both to gain experience and to understand the limits. Software capability is usually over-sold, so it’s worth confirming that it really works as required, before it’s too late.

With the technologies in hand, I then move onto the data. For all systems, the data is the foundation. You can’t build on what isn’t there, and an underlying schema works best if it isn’t patchy and inconsistent. The key thing to know about the data is its structure (schema/model/etc). But knowing how it gets into the system, its quality, volume and frequency are all important too. Even if the system is brand spanking new and cutting edge, chances are that earlier developers have modeled at least the major aspects of the data, so I find it crucial to do both research on what is known and analysis on what is out there. Mistakes in understanding the data are always very painful and expensive to correct, so I like to spend a little extra effort making sure that I’ve answered every question that has come up. One of the worst things you can do in software is to ignore stuff in the hopes that it will get sorted out later. Later is usually too late.

I work with two basic models for the data. The first is based on the idea that there is some ‘universal’ schema out there that correctly corresponds to the specifics of the data, no matter where it is found or what it is used for. The second is the subset of this model that is specific to the application I am building. If I’m utilizing an RDBMS to hold static elements, I generally try to make the schema in it as universal as possible. I may skip some entities or attributes, but certainly the model for any core entity is as close as I can afford to get it. Keep in mind that I am also constantly generalizing based on my understanding to get up to a higher level abstract perspective for as much of the data as I can. Generalizations cost in terms of the amount of work and the system performance, and they slow down the initial stage of the development. However, if they are well chosen they reduce the overall amount of work and provide a significant boost in development speed as the project matures.

Now, it is always the case that over time whatever I build will grow, getting larger and more complex. This usually means that the model of the data will grow as well. I can’t predict the future, but I can ensure that the bulk of these future changes will be additions, not modifications or deletes. Doing so smooths out the upgrade path and avoids having future development hampered by technical debt that’s grown too large to be able to pay down.
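One way to picture ‘additions, not modifications or deletes’ is as a simple check between two versions of a schema. The Python below is a rough sketch, assuming a schema is represented as a dictionary of entities mapping attribute names to type names; the function is_additive and the v1/v2/v3 schemas are hypothetical examples, not taken from any real project.

```python
def is_additive(old: dict, new: dict) -> bool:
    """Return True if every entity and attribute in `old` survives unchanged in `new`."""
    for entity, attributes in old.items():
        if entity not in new:
            return False  # an entity was deleted or renamed
        for attribute, kind in attributes.items():
            if new[entity].get(attribute) != kind:
                return False  # an attribute was deleted, renamed or retyped
    return True


# Hypothetical schema versions, entity -> {attribute: type name}.
v1 = {"customer": {"id": "int", "name": "text"}}
v2 = {
    "customer": {"id": "int", "name": "text", "email": "text"},
    "order": {"id": "int"},
}
v3 = {"customer": {"id": "int", "full_name": "text"}}

print(is_additive(v1, v2))  # v2 only adds things
print(is_additive(v1, v3))  # v3 renames an attribute
```

A change like v2 leaves everything built on v1 working, while a change like v3 forces existing code and data to be reworked.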

With the base data understood, since I am usually designing large systems I generally move onto the architecture. The individual puzzles that are solved by the code always need to come together in a coherent fashion at the system level. Get this wrong and the system descends into ‘meta-spaghetti’, which is usually fatal (it can be fixed, but given that it is unpleasant work it is usually avoided until it’s too late).

I usually visualize the architecture as a very large set of lines and boxes. The lines separate different chunks of code, forming the basis of encapsulation and APIs. The boxes are just independent ‘components’ that handle related functionality; sometimes they are off the shelf, sometimes they need to be specifically written for the project. There are all sorts of ways of diagramming systems, but I find that laying it out in this fashion makes it easy to distribute the workload and to minimize overlap between programmers.

I always start by drawing the most natural lines first. For example, if it’s a web-app, there is a client and a server (you have no choice). Given that I’m keen on getting as much reuse as possible I try to avoid partitioning the system vertically based on attributes like ‘screens’. Most screens share a massive amount of common code, so avoiding duplicating it is usually a significant factor in many of my designs. I want one big chunk of underlying code that is called by some minimal code for each screen. Getting that in place usually results in several other sets of lines for the architecture.
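As a small illustration of ‘one big chunk of underlying code that is called by some minimal code for each screen’, here is a Python sketch in which each screen is reduced to a few lines of description and a single shared function does the work. The names render_screen, SCREENS and show are hypothetical, and plain text stands in for whatever the real front-end technology would produce.

```python
def render_screen(title, fields, record):
    """Shared code: lay out any screen from a title and (label, key) pairs."""
    lines = [title, "=" * len(title)]
    for label, key in fields:
        lines.append(f"{label}: {record.get(key, '')}")
    return "\n".join(lines)


# Per-screen code is just data: a title and the fields to show (hypothetical).
SCREENS = {
    "customer": ("Customer", [("Name", "name"), ("Email", "email")]),
    "order": ("Order", [("Number", "id"), ("Total", "total")]),
}


def show(screen, record):
    title, fields = SCREENS[screen]
    return render_screen(title, fields, record)


print(show("order", {"id": 7, "total": 19.5}))
```

Adding a new screen then means adding one entry to SCREENS rather than another copy of the layout code.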

With web apps, depending on the specific technologies, there can be some communication between the client and the server. If there is, again I want just one piece of code to handle it. It’s all communications code, so it doesn’t need to know about the data it is passing, just that it is passing it and handling errors. That’s easy to say, but sometimes with this type of piece, there can be several non-intuitive lines required to make sure that it really works deterministically.
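A single communications piece of this kind might look something like the following Python sketch. It assumes a JSON envelope with ‘ok’, ‘payload’ and ‘error’ keys and an injected transport function; those conventions, along with the names call_server, CommsError and echo_transport, are made up for illustration. The point is only that the payload passes through untouched while the errors and retries are handled in one place.

```python
import json
from typing import Any, Callable


class CommsError(Exception):
    """Raised when the server reports an error or cannot be reached."""


def call_server(transport: Callable[[str], str], action: str, payload: Any,
                retries: int = 2) -> Any:
    """Send `payload` as-is and return the reply's payload, retrying on failure."""
    request = json.dumps({"action": action, "payload": payload})
    last_error = None
    for _ in range(retries + 1):
        try:
            reply = json.loads(transport(request))
        except (OSError, ValueError) as exc:  # dropped connection or garbled reply
            last_error = exc
            continue
        if reply.get("ok"):
            return reply.get("payload")
        raise CommsError(reply.get("error", "unknown server error"))
    raise CommsError(f"transport failed after {retries + 1} attempts: {last_error}")


def echo_transport(request: str) -> str:
    """Stand-in for a real network call: echoes the payload back."""
    message = json.loads(request)
    return json.dumps({"ok": True, "payload": message["payload"]})


result = call_server(echo_transport, "save", {"x": 1})
```

Every screen and every engine would go through call_server, so a fix to the retry or error handling lands everywhere at once.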

Another place where redundancies play havoc is in getting the data to be persistent. Again, I focus on avoiding redundancies, since they are nearly as costly as in the screens, but unlike the screens I am way more tolerant in dividing any unrelated data into verticals. So long as there is a layer above that brings it all together in a consistent manner I don’t mind that the specifics are handled differently based on the underlying type of data. That’s usually a choice based on security or the distribution of work.

In all systems there are ‘big’ computations that grind through specific data. I generally label these as ‘engines’. I set them into the design as black boxes. What’s inside is less important than how it fits into the system. The technologies required usually dictate how these pieces are integrated, but encapsulating them means that they can be built independently of the rest of the system. That is generally a big help again, when it comes to distributing the work or scheduling the releases.
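Treating an engine as a black box mostly comes down to fixing its boundary first. Here is a minimal Python sketch, assuming an engine simply takes rows in and hands a result back; the Engine protocol, TotalsEngine and nightly_job are hypothetical names chosen for the example.

```python
from typing import Dict, Iterable, Protocol


class Engine(Protocol):
    """The only thing the rest of the system knows about an engine."""

    def run(self, rows: Iterable[dict]) -> dict:
        ...


class TotalsEngine:
    """One hypothetical engine: sums amounts per category."""

    def run(self, rows: Iterable[dict]) -> Dict[str, float]:
        totals: Dict[str, float] = {}
        for row in rows:
            totals[row["category"]] = totals.get(row["category"], 0) + row["amount"]
        return totals


def nightly_job(engine: Engine, rows: Iterable[dict]) -> dict:
    # The caller depends on the boundary, not on what happens inside the box.
    return engine.run(rows)


rows = [
    {"category": "a", "amount": 2},
    {"category": "b", "amount": 1},
    {"category": "a", "amount": 3},
]
report = nightly_job(TotalsEngine(), rows)
```

With the boundary agreed, the inside of the engine can be written, rewritten or handed to someone else without touching the callers.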

Documentation generally depends on the environment and the size of the team. For a small, close knit group of very experienced developers, the major lines sketched on the back of a napkin can often be enough. Well, that and an ER diagram of the schema and some type of mock up of the screens done by a real graphic designer. Basically all of the parts that have to fit together nicely with each other. For some systems, I’ve whacked out the major pieces first then staffed up to flesh out the full system. That works well if you know where you are going and have enough time to articulate the lines in the code.

These types of techniques limit the overall size of the system or stretch out the development time, but they tend to set a tighter path towards reuse.

Sometimes I’ve done full specs. In those cases I’ll go down to the depth that I think is safe, but that is dependent on how the work is organized and who is doing it. So far I’ve always known who was doing the work and what their skill level was, but if I didn’t I’d likely go all the way down to the Class level, as in “here are the classes you are going to build”. It’s better to be too explicit initially than to be unpleasantly surprised later.

In most specs I prefer bullet points, tables and the occasional diagram. Anything but flowing text. Whatever gets the point across with the least amount of work and is easy to skim. The point of a spec is generally to get one or two people to accomplish a specific task, so I shy away from making it large, pretty or all inclusive. It just needs enough relevant details to create the code and nothing more. Flowing descriptions and justifications belong in high level documentation and have a very different audience.

Most of the architectures I have designed have been significantly abstracted away from the user requirements. The requirements affect the data and drive the number and types of engines, but up to this point they haven’t really entered into the picture. Most of the initial work has been about the technology and the data. It’s usually around this point that I try to sit down with a few real users and see what they are doing on a day-to-day basis. I may have found the problem right away, but now it’s time to actually dig into its ugly side. This manifests itself most often as the generation of lots of screens and some fairly serious extensions to the data model. If the architecture is holding water, neither of these is a significant problem. I knew they were coming and now I want them.

From a user interface perspective I am usually aiming to simplify the interface as much as possible. It makes the user experience better, but it also reduces the coding work and testing. Well, not always. Some simplifications come from building more sophistication into the backend. The system holds a deeper more complex perspective on what the user is actually doing, so the user doesn’t have to hold it themselves. Those types of design issues generally land into the category of just being more data or engines, so there is usually a place for them to roost, long before they’ve been articulated.

Pretty much, if the architecture is doing its job, the growth and extensions to both the code and data are landing in previously defined parts of the system. To ease coding collisions I generally push building the code from the back to the front. The schema gets extended first, then the server, then the front-end. That also avoids creating fancy features that don’t map to the existing data.

Of course, there are always issues and problems. Things never really work according to plan, they take longer than expected and designs are rarely comprehensive enough to cover all of the extensions. My only rules are that if the design is wrong, we have to admit it as early as possible and then fix it properly as soon as possible. The longer you wait, the worse it gets. But it should be noted that any software development effort is always part of a much larger context (organizational/motivation), so often this larger context gets priority over the design and development issues. It’s these outside influences that make success so tricky to achieve.

That just about covers it from a high level. If all is working correctly, the analysis takes a bit of time up front and the development starts off slowly (from an interface perspective), but then generally the project falls into a comfortable steady state where each new extension gets easier to do than the last one. There are sometimes speed-bumps caused by a jump in scale or some ugly technical debt. Those have to be dealt with as early as possible. It’s worth noting too that in the beginning, there is usually a lot of moaning about time and progress, so it becomes important to veer away from the ‘right’ way to grab low hanging fruit (usually demos or throw-away features). But it’s always important afterwards to redirect the project back onto a more stable, long-term trajectory. Knowing when to bend, and figuring out how to undo that later before it becomes a huge problem, is extremely difficult and takes considerable past experience to make viable choices.

Tuesday, May 22, 2012

New Layout

Just to keep life interesting, I've changed the template on my blog to Blogger's dynamic template.

One consequence is that I now need to send the full posts in the feed (I didn't before because I wanted people to visit the site so I could track them).

Another consequence is that DISCUS isn't supported, so for a short time I've turned off my comments. Once I figure out how to get DISCUS working again, I'll restore them.

Given these issues I may change the blog back to a simpler template, but I figure I'll leave it this way for a few days until I decide. Enjoy (it's quite an entertaining template :-)

UPDATE: Seems like DISCUS isn't going to support this template for a while, so I think I'll just open the blog up to comments in Blogger and then import them over later, when DISCUS is ready. 

Tuesday, May 8, 2012

Bag O'Tricks

There are at least two different approaches to computer programming.

The first approach comes from slowly building up an understanding of coding ‘tricks’. These are simple ways to solve simple problems. Initially, people start with the basic language features: assignments, conditionals, loops and functions. As they figure them out these go into their Bag O’Tricks. Then they start adding in language library functions, like string handling, files, data-structures etc. Gradually as they learn more, their Bag O’Tricks gets larger and larger. Many people move on to adding higher-level paradigms like design patterns. Most add in specific tricks for different technologies, like databases, networks or frameworks. Over time programmers end up with a fairly large collection of ways to solve lots of sub-problems within different languages and technologies.

When confronted with a new problem, they quickly break it down into sub-problems, continuing until the pieces are small enough to be solved with their existing Bag O’Tricks. If they are curious folk, they generally try to learn more tricks from examples or their fellow programmers. They collect these up and apply them as necessary.

This is a very valid way of programming, but it does have one significant weakness. At any time during their career, a programmer’s Bag O’Tricks contains only a finite number of different solutions. They can arrange them in different ways, but their capabilities are limited by their tricks. That works wonderfully when the problems are the same or substantially similar to ones they have dealt with in the past.

The trouble comes when they encounter a problem that is of a new or different caliber. What happens -- you can see this quite clearly in a lot of code -- is that they start applying their tricks to the sub-problems, but the tricks don’t pack together well enough. These solutions become very Tetris-like, basically odd fitting blocks with many gaps between. Of course, past success clouds present judgement and since the programmers have no reasonable alternatives -- given the ever present time constraints -- they keep heading down their chosen path. It’s the only path they know. When this goes wrong, the result is a bad mess that is unstable. A problem outside of the scope of a programmer’s tricks is one that they aren’t going to be able to solve satisfactorily. The industry is littered with examples, too numerous to count.

The second approach to programming is to drop the notion that ‘code’ is “the thing”. That is the key, to let go of the idea that creating software is all about assembling lists of instructions for a computer. Yes, there is always ‘code’, but the code itself is only a secondary aspect of a larger issue. The ‘thing’ is what is happening ‘underneath’ the code. The root of everything in the system. The foundation.

Right down at the bottom is data. It is what the users are collecting, what the database is storing and what people are seeing on their screens, reports, everything. All coding problems can be seen in the light that they are just instructions to take data -- stored in one place -- and move it to somewhere else. Along the way, the structure or shape of the data may have to change as part of the move. And on top of the data, there may be a requirement for ‘dynamic data’ that is calculated each time it is used, but this is only to avoid storing that data redundantly. Ultimately it is all about the data.

So the second approach is to forget about the code. It’s just a vehicle for getting data from somewhere, transforming it and then passing it on. The list of instructions is meaningless, the system is all about how data flows from different locations, is manipulated and then flows elsewhere. You can visualize the entire system as just data moving about, going from disks to memory, heading in from the keyboard and heading out to the network, getting dumped to the printers, being entered and endlessly modified by users. You don’t really need to understand the specifics of the algorithms that tweak it as it moves, but rather just its starting and final structure. The system is the data, and that data is like a car, where the code is simply the highway that the car follows to get to specific locations.

This second approach has considerable advantages. The best one is that a programmer seeing their work as just taking data D and getting it to D’ is no longer restricted by their finite Bag O’Tricks. Although they can permute their tricks endlessly, they are still heavily restricted from solving particular problems correctly. But a transformation from what is essentially one data-structure to another is a well-defined problem. There may be some sub-algorithmic issues involved in the transformation, but once broken down into discrete pieces, figuring out the code or researching how to do it properly are very tangible activities. So the programmers are in a good place to solve the system problems correctly, rather than just trying to endlessly combine tricks in the hopes that most issues go away.
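To give a feel for what ‘taking data D and getting it to D’’ means in practice, here is a tiny Python sketch. It assumes D is a flat list of (order_id, item, quantity) rows and D’ is a nested dictionary of orders; both shapes and the function name rows_to_orders are invented for the example. Once the two structures are pinned down, the code between them is almost mechanical.

```python
from collections import defaultdict


def rows_to_orders(rows):
    """D: flat (order_id, item, qty) rows  ->  D': {order_id: {item: total_qty}}."""
    orders = defaultdict(dict)
    for order_id, item, qty in rows:
        # Repeated items within an order are merged by summing their quantities.
        orders[order_id][item] = orders[order_id].get(item, 0) + qty
    return dict(orders)


rows = [(1, "bolt", 2), (1, "nut", 5), (2, "bolt", 1), (1, "bolt", 3)]
orders = rows_to_orders(rows)
# orders == {1: {"bolt": 5, "nut": 5}, 2: {"bolt": 1}}
```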

Another major advantage is that a data perspective on the code allows for easy and natural optimizations. The programmer is no longer combining pieces, which often throw the data through unwanted gyrations. Instead the data goes directly from point A to point B. It’s a straight line from one format to another. As well, the programmer can widen their scope from just a localized problem all the way up to ‘every use’ of a particular type of data, anywhere in the system. This opens up huge possibilities for macro-optimizations that generally provide huge boosts to the overall performance.

One common difficulty in software development is system upgrades. The code upgrades really easily, you just replace a block you don’t like with a block that you do. Data however, is a major pain. Upgrades force merges, backfilling and endless restructuring. If you are initially focused on the code then the upgrade problem gets ignored, where it quietly grows and becomes dangerous. Focusing on the data however brings it front and center. It becomes just another sub-problem of moving the data from one place to another, but this time across versions rather than just inside of the system. It makes tackling a big upgrade problem no worse than any other aspect of the system.
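Seen this way, an upgrade is just another transformation, this time from the old version’s structure to the new one’s. The Python below is a rough sketch under some assumptions: each record carries a ‘version’ number, and the two migration steps (splitting a name field, backfilling an email field) are hypothetical changes picked purely to show the shape of the idea.

```python
def v1_to_v2(record):
    # Hypothetical restructuring: version 2 splits "name" into two fields.
    first, _, last = record["name"].partition(" ")
    upgraded = {key: value for key, value in record.items() if key != "name"}
    upgraded.update({"first_name": first, "last_name": last, "version": 2})
    return upgraded


def v2_to_v3(record):
    # Hypothetical backfill: version 3 adds an "email" field, empty by default.
    upgraded = dict(record)
    upgraded.setdefault("email", None)
    upgraded["version"] = 3
    return upgraded


# One step per version, so any old record can be walked forward to the latest.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(record, target=3):
    while record["version"] < target:
        record = MIGRATIONS[record["version"]](record)
    return record


old = {"version": 1, "name": "Ada Lovelace"}
new = upgrade(old)
```

Because each step only describes how one structure becomes the next, the upgrade path can be tested on its own like any other piece of the system.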

Added to all of this, it is far easier to visualize the data moving about in a system instead of seeing a mountain of poorly organized code. This makes architecture, debugging and testing far simpler as well. For example, a large inventory system with lots of eclectic functionality becomes conceptually simple when viewed as just a way to collect and display very specific items. This twist then leads to ways to combine and organize the existing functionality so that it is easier for the user to wield. Generalizations flow naturally.

Over the years, I’ve seen many a programmer hit the wall with their current Bag O’Tricks approach. Their ability to correctly solve problems is limited, so it is easy for them to get into a position where their code becomes convoluted. However, seeing the data first breezes right through these issues. It becomes very quick and convenient to either manipulate the data into the correct form, or to determine if such manipulations are even possible (there are many unsolvable problems). Getting back to the earlier analogy, if you don’t have a viable car, you don’t really need to consider which off-ramp would be best.

Often I like to refer to programmers who rely solely on their Bag O’Tricks as having ‘one eye open’. The programmer may be very good at coding, but they’re too constrained by the limits of their existing tricks. If they spend their career staying within those boundaries, there are no problems. But if they want to get out there and build spectacular things that people will love, then they’ve got to get that second eye open as well. Once they’ve done that, they are no longer limited by what they know, just by the available time and their ability to correctly analyze the problem space.  A whole new world of possibilities opens up. They just have to learn to change their perspective.