Simon Brown started an interesting thread of discussion:
http://www.codingthearchitecture.com/2012/04/27/how_do_you_design_software.html
In one reply, Gene Hughson added:
http://genehughson.wordpress.com/2012/05/17/getting-there-from-here/?goback=.gde_1835657_member_116375634
I figured I’d also take a shot at explaining how I usually design systems.  
Please
 keep in mind that design is a highly creative process, so there are no 
right or wrong answers. What works in one case may not in others. It is
 also highly variable based on the scale of the team, the project and 
the system. In smaller systems you can get away with cutting corners 
that are absolutely fatal in big ones.
I’ll
 answer this with respect to greenfield (new) software projects. 
Extending an existing system is similar, but considerably more 
constrained since you want all of the new pieces to fit in well with the
 existing ones. 
For
 context, I’ve been building systems for over twenty years. In the last 
14 years they have all been web apps and have all been destined to be 
sold as commercial products. The domains have changed significantly, but
 even though they are directed at different problems the underlying 
architectures and design goals have been very similar. I like to build 
systems that are dynamic. By that I mean that they do not know in advance what data they will be holding, nor how it will be structured. There is always some static core, but below that, the exact nature of the data depends on what is added to the system as it runs. At the interface level I usually build in some form of templates or scripting (DSL), so that users can heavily customize their workflows. For data stores, I’ve done OODB, NoSQL and generic schemas. I
 prefer NoSQL style solutions, but RDBMSes are useful for smaller 
systems. I consider a system successful when a user can essentially 
create their own personal sub-system with its own unique schema, quickly
 and easily using only the GUI. If their work requires long delays, 
programmers or operations involvement then I’ve missed the mark.
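To make that ‘dynamic’ idea a bit more concrete, here is a minimal sketch (in Python; the names and in-memory storage are purely illustrative, not lifted from any real system) of a small static core that lets users define their own record types at runtime and then add data against them:

```python
# A minimal sketch of a 'dynamic' data core: the system does not know the
# schemas in advance; users define them at runtime (e.g. via a GUI).

class DynamicSchema:
    """A user-defined record type: just a name and a set of typed fields."""
    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)   # field name -> python type

    def validate(self, record):
        for field, ftype in self.fields.items():
            if field not in record:
                raise ValueError(f"missing field: {field}")
            if not isinstance(record[field], ftype):
                raise TypeError(f"{field} should be {ftype.__name__}")

class DynamicStore:
    """Static core: holds whatever schemas and records get added as it runs."""
    def __init__(self):
        self.schemas = {}            # schema name -> DynamicSchema
        self.records = {}            # schema name -> list of records

    def define(self, schema):
        self.schemas[schema.name] = schema
        self.records.setdefault(schema.name, [])

    def add(self, schema_name, record):
        self.schemas[schema_name].validate(record)
        self.records[schema_name].append(record)

# A user builds their own personal sub-system, no programmer involved:
store = DynamicStore()
store.define(DynamicSchema("Specimen", {"label": str, "weight_kg": float}))
store.add("Specimen", {"label": "S-001", "weight_kg": 1.25})
```

In a real product the GUI would drive `define` and the store would sit on a NoSQL or generic schema, but the shape of the idea is the same.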
The
 teams I usually work with are small. I’ve been on big teams, but I find
 that smaller ones are generally more effective. Within these teams, 
each programmer has their strengths, but I try to get everyone to be as 
general as possible. Also, for any specific section of code, I try very 
hard to ensure that at least two programmers know and can work on it. In the past, specialist teams with no overlap have had a tendency to collapse with morale or staffing changes. I prefer not to be subject to
that sort of problem.
For
 any design the very first thing I do is identify a problem to solve. 
You can’t create an effective solution if you don’t understand the
problem.
Once
 I ‘get’ the problem, I go about deciding on the technologies. As the 
materials that make up the solution, the technology choices play a 
significant role in determining the system’s architecture. Each one has a
 ‘grain’ and things go considerably smoother if you don’t go against it.
 They also stack, so you should pick a collection of parts that work 
well together. Beyond affecting development, the choice often plays a significant role in sales as well, which is a key aspect when the work is commercial. Systems built with certain technologies sell more easily in most markets.
For
 technologies that I have little or no experience with, I always go off 
and write a few prototypes to gain both experience and to understand the
limits. Software capability is usually over-sold, so it’s worth confirming that it really works as required before it’s too late.
With
 the technologies in hand, I then move onto the data. For all systems, 
the data is the foundation. You can’t build on what isn’t there, and an 
underlying schema works best if it isn’t patchy and inconsistent. The 
key thing to know about the data is its structure (schema/model/etc). 
But knowing how it gets into the system, its quality, volume and 
frequency are all important too. Even if the system is brand spanking 
new and cutting edge, chances are that earlier developers have modeled 
at least the major aspects of the data, so I find it crucial to do both 
research on what is known and analysis on what is out there. Mistakes in
 understanding the data are always very painful and expensive to 
correct, so I like to spend a little extra effort making sure that I’ve 
answered every question that has come up. One of the worst things you 
can do in software is to ignore stuff in the hopes that it will get 
sorted out later. Later is usually too late.
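A small, throwaway profiling pass is often enough to confront those questions with the actual data. A sketch of the kind of thing I mean (illustrative only, assuming the data arrives as simple records):

```python
from collections import Counter

def profile(records):
    """Rough data profiling: structure, quality (missing values) and volume,
    enough to check assumptions about the data against what is actually there."""
    field_types, missing = {}, Counter()
    for rec in records:
        for field, value in rec.items():
            field_types.setdefault(field, Counter())[type(value).__name__] += 1
            if value in (None, ""):
                missing[field] += 1
    print(f"volume: {len(records)} records")
    for field, types in field_types.items():
        print(f"  {field}: types={dict(types)}, missing={missing[field]}")

profile([
    {"name": "ACME", "founded": 1952},
    {"name": "Initech", "founded": None},   # a quality problem surfaces early
])
```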
I
 work with two basic models for the data. The first is based on the idea
 that there is some ‘universal’ schema out there that correctly 
corresponds to the specifics of the data, no matter where it is found or
 what it is used for. The second is the subset of this model that is 
specific to the application I am building. If I’m utilizing an RDBMS to
hold static elements, I generally try to make the schema in it as 
universal as possible. I may skip some entities or attributes, but 
certainly the model for any core entity is as close as I can afford to 
get it. Keep in mind that I am also constantly generalizing based on my 
understanding to get up to a higher level abstract perspective for as 
much of the data as I can. Generalizations cost in terms of up-front work and system performance, and they slow down the initial stage of development; however, if they are well chosen they reduce the overall amount of work and provide a significant boost in development speed as the project matures.
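As an example of the kind of generalization I mean (a textbook one, not a model from any specific project of mine): instead of hard-coding separate Customer and Supplier entities, the more universal ‘party with roles’ model makes the application-specific view just a subset:

```python
# Instead of separate Customer and Supplier entities (the application-specific
# view), model the more universal 'Party' and attach roles -- a classic
# generalization.  It costs a little up front, but future roles become
# additions rather than schema rewrites.

from dataclasses import dataclass, field

@dataclass
class Party:
    name: str
    roles: set = field(default_factory=set)        # e.g. {"customer", "supplier"}
    attributes: dict = field(default_factory=dict)

parties = [
    Party("ACME Corp", {"supplier"}, {"rating": "A"}),
    Party("Jane Doe", {"customer"}, {"loyalty_tier": 2}),
]

# The application's narrower view is just a subset of the universal model:
customers = [p for p in parties if "customer" in p.roles]
```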
Now,
 it is always the case that over time whatever I build will grow, 
getting larger and more complex. This usually means that the model of 
the data will grow as well. I can’t predict the future, but I can ensure that the bulk of these future changes will be additions, not
modifications or deletes. Doing so smooths out the upgrade path and 
avoids having future development hampered by technical debt that’s grown
 too large to be able to pay down.
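A minimal sketch of what ‘additions, not modifications’ buys you at upgrade time (the versions and fields here are made up for illustration):

```python
import copy

# Additions, not modifications: old records stay valid, new fields get
# defaults, and no existing field is ever reinterpreted in place.

SCHEMA_DEFAULTS = {
    1: {"name": None},
    2: {"name": None, "email": None},               # v2 added a field
    3: {"name": None, "email": None, "tags": []},   # v3 added another
}

def upgrade(record, to_version=3):
    """Upgrading is trivial when every change is an addition: just fill in
    the defaults that the older record never knew about."""
    upgraded = copy.deepcopy(SCHEMA_DEFAULTS[to_version])
    upgraded.update(record)
    return upgraded

old = {"name": "ACME"}      # written by version 1 of the system
print(upgrade(old))         # {'name': 'ACME', 'email': None, 'tags': []}
```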
With
 the base data understood, since I am usually designing large systems I 
generally move onto the architecture. The individual puzzles that are 
solved by the code always need to come together in a coherent fashion at
 the system level. Get this wrong and the system descends into 
‘meta-spaghetti’, which is usually fatal (it can be fixed, but given 
that it is unpleasant work it is usually avoided until it’s too late). 
I
 usually visualize the architecture as a very large set of lines and 
boxes. The lines separate different chunks of code, forming the basis of
 encapsulation and APIs. The boxes are just independent ‘components’ 
that handle related functionality; sometimes they are off the shelf, 
sometimes they need to be specifically written for the project. There 
are all sorts of ways of diagramming systems, but I find that laying it 
out in this fashion makes it easy to distribute the workload and to 
minimize overlap between programmers.
I
 always start by drawing the most natural lines first. For example, if 
it’s a web-app, there is a client and a server (you have no choice). 
Given that I’m keen on getting as much reuse as possible I try to avoid 
partitioning the system vertically based on attributes like ‘screens’. 
Most screens share a large amount of common code, so avoiding duplicating
 it is usually a significant factor in many of my designs. I want one 
big chunk of underlying code that is called by some minimal code for 
each screen. Getting that in place usually results in several other sets
 of lines for the architecture. 
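A tiny sketch of that shape, with made-up names: one shared rendering engine, and each screen reduced to a small declaration that feeds it:

```python
# One big chunk of shared code, driven by minimal per-screen declarations.
# (Illustrative only; a real web stack would render HTML or templates.)

def render_screen(spec, data):
    """Shared rendering engine: every screen goes through the same code."""
    lines = [f"== {spec['title']} =="]
    for field in spec["fields"]:
        lines.append(f"{field['label']}: {data.get(field['name'], '')}")
    return "\n".join(lines)

# Each screen is a small declaration, not a new pile of code.
CUSTOMER_SCREEN = {
    "title": "Customer",
    "fields": [
        {"name": "name",  "label": "Name"},
        {"name": "phone", "label": "Phone"},
    ],
}

print(render_screen(CUSTOMER_SCREEN, {"name": "Jane Doe", "phone": "555-0100"}))
```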
With
 web apps, depending on the specific technologies, there can be some 
communication between the client and the server. If there is, again I 
want just one piece of code to handle it. It’s all communications code, 
so it doesn’t need to know about the data it is passing, just that it is
 passing it and handling errors. That’s easy to say, but sometimes with 
this type of piece, there can be several non-intuitive lines required to
 make sure that it really works deterministically.
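As a sketch of what that one communications piece might look like (illustrative Python using only the standard library; a real client would match whatever the chosen technologies dictate):

```python
import json, urllib.request, urllib.error

def call_server(url, payload, retries=2):
    """One piece of communications code for the whole client: it handles
    transport and errors, and treats the payload as opaque data."""
    body = json.dumps(payload).encode("utf-8")
    last_error = None
    for _ in range(retries + 1):
        try:
            req = urllib.request.Request(
                url, data=body, headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req, timeout=10) as resp:
                return json.loads(resp.read().decode("utf-8"))
        except urllib.error.URLError as err:
            last_error = err          # retry, then give up deterministically
    raise RuntimeError(f"server unreachable: {last_error}")

# Every screen calls the same function; none of them re-implement transport:
# result = call_server("https://example.invalid/api", {"action": "list", "entity": "Specimen"})
```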
Another
 place where redundancies play havoc is in getting the data to be 
persistent. Again, I focus on avoiding redundancies, since they are 
nearly as costly as in the screens, but unlike the screens I am way more
 tolerant in dividing any unrelated data into verticals. So long as 
there is a layer above that brings it all together in a consistent 
manner I don’t mind that the specifics are handled differently based on 
the underlying type of data. That’s usually a choice based on security 
or the distribution of work.
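A minimal sketch of that layering, with invented names: one consistent persistence layer on top, and type-specific backends registered underneath it:

```python
# A single layer brings persistence together; underneath, unrelated kinds of
# data may be handled by different backends (relational, document, flat files,
# ...), each hidden behind the same small interface.

class DocumentBackend:
    def __init__(self):
        self._docs = {}
    def save(self, key, value):
        self._docs[key] = value
    def load(self, key):
        return self._docs[key]

class Persistence:
    """The one consistent layer the rest of the system talks to."""
    def __init__(self):
        self._backends = {}           # kind of data -> backend
    def register(self, kind, backend):
        self._backends[kind] = backend
    def save(self, kind, key, value):
        self._backends[kind].save(key, value)
    def load(self, kind, key):
        return self._backends[kind].load(key)

store = Persistence()
store.register("notes", DocumentBackend())
store.save("notes", "n1", {"text": "call the lab"})
print(store.load("notes", "n1"))
```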
In
 all systems there are ‘big’ computations that grind through specific 
data. I generally label these as ‘engines’. I set them into the design 
as black boxes. What’s inside is less important than how it fits into the system. The technologies required usually dictate how these pieces are integrated, but encapsulating them means that they can be built
independently of the rest of the system. That is generally a big help 
again, when it comes to distributing the work or scheduling the 
releases.
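In code, the black box is usually nothing more than a narrow interface that the rest of the system calls; here is an illustrative sketch (the billing example is made up):

```python
# An 'engine' as a black box: the system only knows the interface, so the
# heavy computation inside can be built (or swapped) independently.

from abc import ABC, abstractmethod

class Engine(ABC):
    @abstractmethod
    def run(self, data):
        """Grind through the data and return a result."""

class BillingEngine(Engine):
    def run(self, data):
        # Whatever is inside matters less than how it fits into the system.
        return sum(item["amount"] for item in data)

def month_end(engine: Engine, invoices):
    return engine.run(invoices)       # the caller never looks inside

print(month_end(BillingEngine(), [{"amount": 120.0}, {"amount": 75.5}]))
```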
Documentation
 generally depends on the environment and the size of the team. For a 
small, close knit group of very experienced developers, the major lines 
sketched on the back of a napkin can often be enough. Well, that and an 
ER diagram of the schema and some type of mock up of the screens done by
 a real graphic designer. Basically all of the parts that have to fit 
together nicely with each other. For some systems, I’ve whacked out the 
major pieces first then staffed up to flesh out the full system. That 
works well if you know where you are going and have enough time to 
articulate the lines in the code. 
These
 types of techniques limit the overall size of the system or stretch out
 the development time, but they tend to set a tighter path towards 
reuse. 
Sometimes
 I’ve done full specs. In those cases I’ll go down to the depth that I 
think is safe, but that is dependent on how the work is organized and 
who is doing it. So far I’ve always known who was doing the work and 
what their skill level was, but if I didn’t I’d likely go all the way 
down to the Class level, as in “here are the classes you are going to 
build”. It’s better to be too explicit initially than to be unpleasantly surprised later.
In
 most specs I prefer bullet points, tables and the occasional diagram. 
Anything but flowing text. Whatever gets the point across with the least
 amount of work and is easy to skim. The point of a spec is generally to
 get one or two people to accomplish a specific task, so I shy away from
 making it large, pretty or all inclusive. It just needs enough relevant
 details to create the code and nothing more. Flowing descriptions and 
justifications belong in high level documentation and have a very 
different audience.
Most
 of the architectures I have designed have been significantly abstracted
 away from the user requirements. The requirements affect the data and 
drive the number and types of engines, but up to this point they haven’t
 really entered into the picture. Most of the initial work has been 
about the technology and the data. It’s usually around this point that I
 try to sit down with a few real users and see what they are doing on a 
day-to-day basis. I may have found the problem right away, but now it’s
time to actually dig into its ugly side. This manifests itself most 
often as the generation of lots of screens and some fairly serious 
extensions to the data model. If the architecture is holding water, 
neither of these are significant problems. I knew they were coming and 
now I want them.
From
 a user interface perspective I am usually aiming to simplify the 
interface as much as possible. It makes the user experience better, but 
it also reduces the coding work and testing. Well, not always. Some 
simplifications come from building more sophistication into the backend.
 The system holds a deeper more complex perspective on what the user is 
actually doing, so the user doesn’t have to hold it themselves. Those 
types of design issues generally fall into the category of just being 
more data or engines, so there is usually a place for them to roost, 
long before they’ve been articulated. 
Pretty
much, if the architecture is doing its job, the growth and extensions 
to both the code and data are landing in previously defined parts of the
 system. To ease coding collisions I generally push building the code 
from the back to the front. The schema gets extended first, then the 
server, then the front-end. That also avoids creating fancy features 
that don’t map to the existing data.
Of
 course, there are always issues and problems. Things never really work 
according to plan, they take longer than expected and designs are rarely
 comprehensive enough to cover all of the extensions. My only rules are 
that if the design is wrong, we have to admit it as early as possible 
and then fix it properly as soon as possible. The longer you wait, the 
worse it gets. But it should be noted that any software development 
effort is always part of a much larger context 
(organizational/motivational), so often this larger context gets priority 
over the design and development issues. It’s these outside influences 
that make success so tricky to achieve. 
That
 just about covers it from a high level. If all is working correctly, 
the analysis takes a bit of time up front and the development starts off
 slowly (from an interface perspective), but then generally the project 
falls into a comfortable steady state where each new extension gets 
easier to do than the last one. There are sometimes speed-bumps caused by a
 jump in scale or some ugly technical debt. Those have to be dealt with 
as early as possible. It’s worth noting too that in the beginning, there
 is usually a lot of moaning about time and progress, so it becomes 
important to veer away from the ‘right’ way to grab low hanging fruit 
(usually demos or throw-away features). But it’s always important 
afterwards to redirect the project back onto a more stable, long-term 
trajectory. Knowing when to bend, and figuring out how to undo that 
later before it becomes a huge problem, is extremely difficult; it takes considerable past experience to make viable choices.
Tuesday, May 22, 2012
New Layout
Just to keep life interesting, I've changed the template on my blog to Blogger's dynamic template.
One consequence is that I now need to send the full posts in the feed (I didn't before because I wanted people to visit the site so I could track them).
Another consequence is that Disqus isn't supported, so for a short time I've turned off my comments. Once I figure out how to get Disqus working again, I'll restore them.
Given these issues I may change the blog back to a simpler template, but I figure I'll leave it this way for a few days until I decide. Enjoy (it's quite an entertaining template :-)
UPDATE: Seems like Disqus isn't going to support this template for a while, so I think I'll just open the blog up to comments in Blogger and then import them over later, when Disqus is ready.
Tuesday, May 8, 2012
Bag O'Tricks
There are at least two different approaches to computer programming.
The first approach comes from slowly building up an understanding of coding ‘tricks’. These are simple ways to solve simple problems. Initially, people start with the basic language features: assignments, conditionals, loops, and functions. As they figure them out these go into their Bag O’Tricks. Then they start adding in language library functions, like string handling, files, data-structures etc. Gradually as they learn more, their Bag O’Tricks gets larger and larger. Many people move on to adding higher-level paradigms like design patterns. Most add in specific tricks for different technologies, like databases, networks or frameworks. Over time programmers end up with a fairly large collection of ways to solve lots of sub-problems within different languages and technologies.
When confronted with a new problem, they quickly break it down into sub-problems, continuing until the pieces are small enough to be solved with their existing Bag O’Tricks. If they are curious folk, they generally try to learn more tricks from examples or their fellow programmers. They collect these up and apply them as necessary.
This is a very valid way of programming, but it does have one significant weakness. At any time during their career, a programmer’s Bag O’Tricks contains only a finite number of different solutions. They can arrange them in different ways, but their capabilities are limited by their tricks. That works wonderfully when the problems are the same or substantially similar to ones they have dealt with in the past.
The trouble comes when they encounter a problem that is of a new or different caliber. What happens -- you can see this quite clearly in a lot of code -- is that they start applying their tricks to the sub-problems, but the tricks don’t pack together well enough. These solutions become very Tetris-like, basically odd fitting blocks with many gaps between. Of course, past success clouds present judgment and since the programmers have no reasonable alternatives -- given the ever-present time constraints -- they keep heading down their chosen path. It’s the only path they know. When this goes wrong, the result is a bad mess that is unstable. A problem outside of the scope of a programmer’s tricks is one that they aren’t going to be able to solve satisfactorily. The industry is littered with examples, too numerous to count.
The second approach to programming is to drop the notion that ‘code’ is “the thing”. That is the key, to let go of the idea that creating software is all about assembling lists of instructions for a computer. Yes, there is always ‘code’, but the code itself is only a secondary aspect of a larger issue. The ‘thing’ is what is happening ‘underneath’ the code. The root of everything in the system. The foundation.
Right down at the bottom is data. It is what the users are collecting, what the database is storing and what people are seeing on their screens, reports, everything. All coding problems can be seen in the light that they are just instructions to take data -- stored in one place -- and move it to somewhere else. Along the way, the structure or shape of the data may have to change as part of the move. And on top of the data, there may be a requirement for ‘dynamic data’ that is calculated each time it is used, but this is only to avoid storing that data redundantly. Ultimately it is all about the data.
So the second approach is to forget about the code. It’s just a vehicle for getting data from somewhere, transforming it and then passing it on. The list of instructions is meaningless, the system is all about how data flows from different locations, is manipulated and then flows elsewhere. You can visualize the entire system as just data moving about, going from disks to memory, heading in from the keyboard and heading out to the network, getting dumped to the printers, being entered and endlessly modified by users. You don’t really need to understand the specifics of the algorithms that tweak it as it moves, but rather just its starting and final structure. The system is the data, and that data is like a car, where the code is simply the highway that the car follows to get to specific locations.
This second approach has considerable advantages. The best one is that a programmer seeing their work as just taking data D and getting it to D’ is no longer restricted by their finite Bag O’Tricks. Although they can permute their tricks endlessly, they are still heavily restricted from solving particular problems correctly. But a transformation from what is essentially one data-structure to another is a well-defined problem. There may be some sub-algorithmic issues involved in the transformation, but once broken down into discrete pieces, figuring out the code or researching how to do it properly are very tangible activities. So the programmers are in a good place to solve the system problems correctly, rather than just trying to endlessly combine tricks in the hopes that most issues go away.
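A trivial illustration of the D to D’ view (the data is made up): the input and output structures are stated up front, and the code is only the path between them:

```python
# Seeing the problem as data D -> D': the input and output structures are
# stated up front, and the code is just the path between them.

# D: flat rows as they arrive from some source
rows = [
    {"customer": "ACME",    "item": "bolt", "qty": 4},
    {"customer": "ACME",    "item": "nut",  "qty": 9},
    {"customer": "Initech", "item": "bolt", "qty": 1},
]

# D': the structure the screen actually needs -- items grouped per customer
def to_orders(rows):
    orders = {}
    for row in rows:
        orders.setdefault(row["customer"], {})[row["item"]] = row["qty"]
    return orders

print(to_orders(rows))
# {'ACME': {'bolt': 4, 'nut': 9}, 'Initech': {'bolt': 1}}
```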
Another major advantage is that a data perspective on the code allows for easy and natural optimizations. The programmer is no longer combining pieces, which often throw the data through unwanted gyrations. Instead, the data goes directly from point A to point B. It’s a straight line from one format to another. As well, the programmer can widen their scope from just a localized problem all the way up to ‘every use’ of a particular type of data, anywhere in the system. This opens up huge possibilities for macro-optimizations that generally provide huge boosts to the overall performance.
One common difficulty in software development is system upgrades. The code upgrades really easily: you just replace a block you don’t like with a block that you do. Data, however, is a major pain. Upgrades force merges, backfilling and endless restructuring. If you are initially focused on the code then the upgrade problem gets ignored, where it quietly grows and becomes dangerous. Focusing on the data, however, brings it front and center. It becomes just another sub-problem of moving the data from one place to another, but this time across versions rather than just inside of the system. It makes tackling a big upgrade problem no worse than any other aspect of the system.
Added to all of this, it is far easier to visualize the data moving about in a system instead of seeing a mountain of poorly organized code. This makes architecture, debugging and testing far simpler as well. For example, a large inventory system with lots of eclectic functionality becomes conceptually simple when viewed as just a way to collect and display very specific items. This twist then leads to ways to combine and organize the existing functionality so that it is easier for the user to wield. Generalizations flow naturally.
Over the years, I’ve seen many a programmer hit the wall with their current Bag O’Tricks approach. Their ability to correctly solve problems is limited, so it is easy for them to get into a position where their code becomes convoluted. However, seeing the data first breezes right through these issues. It becomes very quick and convenient to either manipulate the data into the correct form or to determine if such manipulations are even possible (there are many unsolvable problems). Getting back to the earlier analogy, if you don’t have a viable car, you don’t really need to consider which off-ramp would be best.
Often I like to refer to programmers who rely solely on their Bag O’Tricks as having ‘one eye open’. The programmer may be very good at coding, but they’re too constrained by the limits of their existing tricks. If they spend their career staying within those boundaries, there are no problems. But if they want to get out there and build spectacular things that people will love, then they’ve got to get that second eye open as well. Once they’ve done that, they are no longer limited by what they know, just by the available time and their ability to correctly analyze the problem space. A whole new world of possibilities opens up. They just have to learn to change their perspective.