Simon Brown started an interesting thread of discussion:
http://www.codingthearchitecture.com/2012/04/27/how_do_you_design_software.html
In one reply, Gene Hughson added:
http://genehughson.wordpress.com/2012/05/17/getting-there-from-here/?goback=.gde_1835657_member_116375634
I figured I’d also take a shot at explaining how I usually design systems.
Please keep in mind that design is a highly creative process, so there are no right or wrong answers. What works in one case may not work in others. It is also highly variable based on the scale of the team, the project and the system. In smaller systems you can get away with cutting corners that are absolutely fatal in big ones.
I’ll answer this with respect to greenfield (new) software projects. Extending an existing system is similar, but considerably more constrained, since you want all of the new pieces to fit in well with the existing ones.
For context, I’ve been building systems for over twenty years. In the last 14 years they have all been web apps, and have all been destined to be sold as commercial products. The domains have changed significantly, but even though they are directed at different problems, the underlying architectures and design goals have been very similar. I like to build systems that are dynamic. By that I mean that they do not know in advance what data they will be holding, nor how it will be structured. There is always some static core, but below that, the exact nature of the data depends on what is added to the system as it runs. At the interface level I usually build in some form of templates or scripting (DSL), so that users can heavily customize their workflows. For data stores, I’ve done OODB, NoSQL and generic schemas. I prefer NoSQL-style solutions, but RDBMSes are useful for smaller systems. I consider a system successful when a user can essentially create their own personal sub-system, with its own unique schema, quickly and easily using only the GUI. If their work requires long delays, programmers or operations involvement, then I’ve missed the mark.
The teams I usually work with are small. I’ve been on big teams, but I find that smaller ones are generally more effective. Within these teams, each programmer has their strengths, but I try to get everyone to be as general as possible. Also, for any specific section of code, I try very hard to ensure that at least two programmers know it and can work on it. In the past, specialist teams with no overlap have had a tendency to collapse with morale or staffing changes. I prefer not to be subject to that sort of problem.
For any design the very first thing I do is identify a problem to solve. You can’t create an effective solution if you don’t understand the problem.
Once I ‘get’ the problem, I go about deciding on the technologies. As the materials that make up the solution, the technology choices play a significant role in determining the system’s architecture. Each one has a ‘grain’, and things go considerably smoother if you don’t go against it. They also stack, so you should pick a collection of parts that work well together. Beyond affecting development, the choice often plays a significant role in sales as well, which is a key aspect when the work is commercial. Systems built in specific technologies sell more easily in most markets.
For technologies that I have little or no experience with, I always go off and write a few prototypes, both to gain experience and to understand the limits. Software capability is usually over-sold, so it’s worth confirming that it really works as required before it’s too late.
With the technologies in hand, I then move on to the data. For all systems, the data is the foundation. You can’t build on what isn’t there, and an underlying schema works best if it isn’t patchy and inconsistent. The key thing to know about the data is its structure (schema/model/etc.). But knowing how it gets into the system, its quality, volume and frequency are all important too. Even if the system is brand spanking new and cutting edge, chances are that earlier developers have modeled at least the major aspects of the data, so I find it crucial to do both research on what is known and analysis on what is out there. Mistakes in understanding the data are always very painful and expensive to correct, so I like to spend a little extra effort making sure that I’ve answered every question that has come up. One of the worst things you can do in software is to ignore stuff in the hopes that it will get sorted out later. Later is usually too late.
I work with two basic models for the data. The first is based on the idea that there is some ‘universal’ schema out there that correctly corresponds to the specifics of the data, no matter where it is found or what it is used for. The second is the subset of this model that is specific to the application I am building. If I’m utilizing an RDBMS to hold static elements, I generally try to make the schema in it as universal as possible. I may skip some entities or attributes, but certainly the model for any core entity is as close as I can afford to get it. Keep in mind that I am also constantly generalizing based on my understanding, to get up to a higher-level, abstract perspective for as much of the data as I can. Generalizations cost in terms of work and system performance, and they slow down the initial stage of development; however, if they are well chosen they reduce the overall amount of work and provide a significant boost in development speed as the project matures.
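To make the ‘generic schema’ idea a bit more concrete, here is a minimal sketch in Java of a record whose structure isn’t fixed at compile time: only the entity type and identity are static, and everything below that is added as the system runs. The names (GenericRecord and so on) are my own illustration, not from any particular system:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a 'generic' record: the only static structure is the
// entity type and its id; the attributes themselves are decided at runtime.
// All names here are illustrative, not from any particular system.
public class GenericRecord {
    private final String entityType;  // e.g. "customer" or "site-inspection", chosen by users
    private final String id;
    private final Map<String, Object> attributes = new LinkedHashMap<>();

    public GenericRecord(String entityType, String id) {
        this.entityType = entityType;
        this.id = id;
    }

    public GenericRecord set(String name, Object value) {
        attributes.put(name, value);  // the schema grows by addition, not by modification
        return this;
    }

    public Object get(String name) {
        return attributes.get(name);
    }

    public static void main(String[] args) {
        // A user-defined sub-system with its own unique shape, created without code changes.
        GenericRecord inspection = new GenericRecord("site-inspection", "42")
                .set("inspector", "J. Smith")
                .set("riskScore", 7);
        System.out.println(inspection.entityType + "/" + inspection.id
                + ": risk " + inspection.get("riskScore"));
    }
}
```

In a real system the attributes would be backed by a NoSQL store or a handful of generic tables, but the shape of the idea is the same.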
Now, it is always the case that over time whatever I build will grow, getting larger and more complex. This usually means that the model of the data will grow as well. I can’t predict the future, but I can ensure that the bulk of these future changes will be additions, not modifications or deletes. Doing so smooths out the upgrade path and avoids having future development hampered by technical debt that’s grown too large to be able to pay down.
With the base data understood, and since I am usually designing large systems, I generally move on to the architecture. The individual puzzles that are solved by the code always need to come together in a coherent fashion at the system level. Get this wrong and the system descends into ‘meta-spaghetti’, which is usually fatal (it can be fixed, but given that it is unpleasant work it is usually avoided until it’s too late).
I usually visualize the architecture as a very large set of lines and boxes. The lines separate different chunks of code, forming the basis of encapsulation and APIs. The boxes are just independent ‘components’ that handle related functionality; sometimes they are off the shelf, sometimes they need to be specifically written for the project. There are all sorts of ways of diagramming systems, but I find that laying it out in this fashion makes it easy to distribute the workload and to minimize overlap between programmers.
I always start by drawing the most natural lines first. For example, if it’s a web app, there is a client and a server (you have no choice). Given that I’m keen on getting as much reuse as possible, I try to avoid partitioning the system vertically based on attributes like ‘screens’. Most screens share a massive amount of common code, so avoiding duplicating it is usually a significant factor in many of my designs. I want one big chunk of underlying code that is called by some minimal code for each screen. Getting that in place usually results in several other sets of lines for the architecture.
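As a rough sketch of that ‘one big chunk of underlying code’, here is what the split can look like in Java: a single shared renderer does the heavy lifting, while each screen contributes only a small declarative description. The names and the plain-text rendering are purely illustrative assumptions:

```java
import java.util.List;
import java.util.Map;

// Sketch of the reuse idea: one shared layer renders any screen from a small
// declarative description, so each screen adds only a few lines of its own.
// All names here are hypothetical, for illustration only.
public class ScreenExample {

    // The thin per-screen part: just data describing what to show.
    record ScreenDef(String title, List<String> fields) {}

    // The shared part: one renderer used by every screen in the system.
    static String render(ScreenDef def, Map<String, Object> data) {
        StringBuilder out = new StringBuilder("== " + def.title() + " ==\n");
        for (String field : def.fields()) {
            out.append(field).append(": ").append(data.getOrDefault(field, "")).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ScreenDef customer = new ScreenDef("Customer", List.of("name", "email"));
        System.out.print(render(customer, Map.of("name", "Ada", "email", "ada@example.com")));
    }
}
```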
With web apps, depending on the specific technologies, there can be some communication between the client and the server. If there is, again I want just one piece of code to handle it. It’s all communications code, so it doesn’t need to know about the data it is passing, just that it is passing it and handling errors. That’s easy to say, but sometimes with this type of piece, there can be several non-intuitive lines required to make sure that it really works deterministically.
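A minimal sketch of that single piece of communications code might look like the following Java, using the standard HttpURLConnection: it moves an opaque payload and surfaces errors in one place, and knows nothing about what the payload means. The class name and the error policy are assumptions for illustration:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch of 'one piece of communications code': it knows how to move bytes
// between client and server and how to surface errors, but nothing about
// what the payload means.
public class Transport {

    // Post an opaque payload and return the opaque reply; all callers share this path.
    public static String send(String endpoint, String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(10_000);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        if (status < 200 || status >= 300) {
            // Error handling lives here, once, instead of being scattered across screens.
            throw new IOException("Server returned status " + status + " for " + endpoint);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```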
Another place where redundancies play havoc is in getting the data to be persistent. Again, I focus on avoiding redundancies, since they are nearly as costly as in the screens, but unlike the screens I am way more tolerant of dividing any unrelated data into verticals. So long as there is a layer above that brings it all together in a consistent manner, I don’t mind that the specifics are handled differently based on the underlying type of data. That’s usually a choice based on security or the distribution of work.
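Here is a small sketch of that arrangement in Java: each vertical gets its own store, but a single layer above presents them consistently to the rest of the system. All of the names are hypothetical, and the map-backed store just stands in for whatever RDBMS or NoSQL storage a real vertical would use:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the persistence idea: unrelated data can live in separate
// 'verticals', as long as one layer above presents them consistently.
public class Persistence {

    interface DataStore {
        void put(String id, String value);
        Optional<String> get(String id);
    }

    // One vertical might sit in an RDBMS, another in a NoSQL store; here both
    // are stubbed with maps so the sketch stays self-contained.
    static class InMemoryStore implements DataStore {
        private final Map<String, String> rows = new HashMap<>();
        public void put(String id, String value) { rows.put(id, value); }
        public Optional<String> get(String id) { return Optional.ofNullable(rows.get(id)); }
    }

    // The layer above: callers ask for data by type and id, never caring
    // which vertical actually holds it.
    private final Map<String, DataStore> verticals = new HashMap<>();

    public void register(String entityType, DataStore store) { verticals.put(entityType, store); }

    public Optional<String> fetch(String entityType, String id) {
        DataStore store = verticals.get(entityType);
        return store == null ? Optional.empty() : store.get(id);
    }

    public static void main(String[] args) {
        Persistence persistence = new Persistence();
        persistence.register("customer", new InMemoryStore());
        persistence.fetch("customer", "42").ifPresentOrElse(
                System.out::println, () -> System.out.println("no such customer"));
    }
}
```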
In all systems there are ‘big’ computations that grind through specific data. I generally label these as ‘engines’. I set them into the design as black boxes. What’s inside is less important than how it fits into the system. The technologies required usually dictate how these pieces are integrated, but encapsulating them means that they can be built independently of the rest of the system. That is generally a big help again when it comes to distributing the work or scheduling the releases.
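In code, the black-box treatment can be as simple as a narrow interface that the rest of the system depends on, with the heavy lifting hidden behind it. This Java sketch uses a trivial word-count ‘engine’ purely as a stand-in; the names are illustrative:

```java
// Sketch of an 'engine' as a black box: the architecture only fixes the seam
// (input in, output out); what happens inside can be written, tested and
// scheduled independently of the rest of the system.
public class EngineExample {

    interface Engine<I, O> {
        O run(I input);
    }

    // One concrete engine; it could equally be an off-the-shelf library
    // wrapped behind the same interface.
    static class WordCountEngine implements Engine<String, Integer> {
        public Integer run(String text) {
            return text.isBlank() ? 0 : text.trim().split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        Engine<String, Integer> engine = new WordCountEngine();
        System.out.println(engine.run("grind through specific data"));  // prints 4
    }
}
```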
Documentation generally depends on the environment and the size of the team. For a small, close-knit group of very experienced developers, the major lines sketched on the back of a napkin can often be enough. Well, that and an ER diagram of the schema and some type of mock-up of the screens done by a real graphic designer. Basically, all of the parts that have to fit together nicely with each other. For some systems, I’ve whacked out the major pieces first and then staffed up to flesh out the full system. That works well if you know where you are going and have enough time to articulate the lines in the code.
These types of techniques limit the overall size of the system or stretch out the development time, but they tend to set a tighter path towards reuse.
Sometimes I’ve done full specs. In those cases I’ll go down to the depth that I think is safe, but that is dependent on how the work is organized and who is doing it. So far I’ve always known who was doing the work and what their skill level was, but if I didn’t, I’d likely go all the way down to the class level, as in “here are the classes you are going to build”. It’s better to be too explicit initially than to be unpleasantly surprised later.
In most specs I prefer bullet points, tables and the occasional diagram. Anything but flowing text. Whatever gets the point across with the least amount of work and is easy to skim. The point of a spec is generally to get one or two people to accomplish a specific task, so I shy away from making it large, pretty or all-inclusive. It just needs enough relevant details to create the code and nothing more. Flowing descriptions and justifications belong in high-level documentation and have a very different audience.
Most of the architectures I have designed have been significantly abstracted away from the user requirements. The requirements affect the data and drive the number and types of engines, but up to this point they haven’t really entered into the picture. Most of the initial work has been about the technology and the data. It’s usually around this point that I try to sit down with a few real users and see what they are doing on a day-to-day basis. I may have found the problem right away, but now it’s time to actually dig into its ugly side. This manifests itself most often as the generation of lots of screens and some fairly serious extensions to the data model. If the architecture is holding water, neither of these is a significant problem. I knew they were coming and now I want them.
From a user interface perspective I am usually aiming to simplify the interface as much as possible. It makes the user experience better, but it also reduces the coding work and testing. Well, not always. Some simplifications come from building more sophistication into the backend. The system holds a deeper, more complex perspective on what the user is actually doing, so the user doesn’t have to hold it themselves. Those types of design issues generally land in the category of just being more data or engines, so there is usually a place for them to roost, long before they’ve been articulated.
Pretty much, if the architecture is doing its job, the growth and extensions to both the code and data are landing in previously defined parts of the system. To ease coding collisions I generally push building the code from the back to the front: the schema gets extended first, then the server, then the front-end. That also avoids creating fancy features that don’t map to the existing data.
Of course, there are always issues and problems. Things never really work according to plan, they take longer than expected, and designs are rarely comprehensive enough to cover all of the extensions. My only rules are that if the design is wrong, we have to admit it as early as possible and then fix it properly as soon as possible. The longer you wait, the worse it gets. But it should be noted that any software development effort is always part of a much larger context (organizational, motivational), so often this larger context gets priority over the design and development issues. It’s these outside influences that make success so tricky to achieve.
That just about covers it from a high level. If all is working correctly, the analysis takes a bit of time up front and the development starts off slowly (from an interface perspective), but then generally the project falls into a comfortable steady state where each new extension gets easier to do than the last one. There are sometimes speed-bumps caused by a jump in scale or some ugly technical debt. Those have to be dealt with as early as possible. It’s worth noting too that in the beginning there is usually a lot of moaning about time and progress, so it becomes important to veer away from the ‘right’ way to grab low-hanging fruit (usually demos or throw-away features). But it’s always important afterwards to redirect the project back onto a more stable, long-term trajectory. Knowing when to bend, and figuring out how to undo that later before it becomes a huge problem, is extremely difficult, and it takes considerable past experience to make viable choices.
Tuesday, May 22, 2012
New Layout
Just to keep life interesting, I've changed the template on my blog to Blogger's dynamic template.
One consequence is that I now need to send the full posts in the feed (I didn't before because I wanted people to visit the site so I could track them).
Another consequence is that Disqus isn't supported, so for a short time I've turned off my comments. Once I figure out how to get Disqus working again, I'll restore them.
Given these issues I may change the blog back to a simpler template, but I figure I'll leave it this way for a few days until I decide. Enjoy (it's quite an entertaining template :-)
UPDATE: It seems like Disqus isn't going to support this template for a while, so I think I'll just open the blog up to comments in Blogger and then import them over later, when Disqus is ready.
Tuesday, May 8, 2012
Bag O'Tricks
There are at least two different approaches to computer programming.
The first approach comes from slowly building up an understanding of coding ‘tricks’. These are simple ways to solve simple problems. Initially, people start with the basic language features: assignments, conditionals, loops, and functions. As they figure them out these go into their Bag O’Tricks. Then they start adding in language library functions, like string handling, files, data-structures etc. Gradually as they learn more, their Bag O’Tricks gets larger and larger. Many people move on to adding higher-level paradigms like design patterns. Most add in specific tricks for different technologies, like databases, networks or frameworks. Over time programmers end up with a fairly large collection of ways to solve lots of sub-problems within different languages and technologies.
When confronted with a new problem, they quickly break it down into sub-problems, continuing until the pieces are small enough to be solved with their existing Bag O’Tricks. If they are curious folk, they generally try to learn more tricks from examples or their fellow programmers. They collect these up and apply them as necessary.
This is a very valid way of programming, but it does have one significant weakness. At any time during their career, a programmer’s Bag O’Tricks contains only a finite number of different solutions. They can arrange them in different ways, but their capabilities are limited by their tricks. That works wonderfully when the problems are the same or substantially similar to ones they have dealt with in the past.
The trouble comes when they encounter a problem that is of a new or different caliber. What happens -- you can see this quite clearly in a lot of code -- is that they start applying their tricks to the sub-problems, but the tricks don’t pack together well enough. These solutions become very Tetris-like, basically odd fitting blocks with many gaps between. Of course, past success clouds present judgment and since the programmers have no reasonable alternatives -- given the ever-present time constraints -- they keep heading down their chosen path. It’s the only path they know. When this goes wrong, the result is a bad mess that is unstable. A problem outside of the scope of a programmer’s tricks is one that they aren’t going to be able to solve satisfactorily. The industry is littered with examples, too numerous to count.
The second approach to programming is to drop the notion that ‘code’ is “the thing”. That is the key, to let go of the idea that creating software is all about assembling lists of instructions for a computer. Yes, there is always ‘code’, but the code itself is only a secondary aspect of a larger issue. The ‘thing’ is what is happening ‘underneath’ the code. The root of everything in the system. The foundation.
Right down at the bottom is data. It is what the users are collecting, what the database is storing and what people are seeing on their screens, reports, everything. All coding problems can be seen in the light that they are just instructions to take data -- stored in one place -- and move it to somewhere else. Along the way, the structure or shape of the data may have to change as part of the move. And on top of the data, there may be a requirement for ‘dynamic data’ that is calculated each time it is used, but this is only to avoid storing that data redundantly. Ultimately it is all about the data.
So the second approach is to forget about the code. It’s just a vehicle for getting data from somewhere, transforming it and then passing it on. The list of instructions is meaningless, the system is all about how data flows from different locations, is manipulated and then flows elsewhere. You can visualize the entire system as just data moving about, going from disks to memory, heading in from the keyboard and heading out to the network, getting dumped to the printers, being entered and endlessly modified by users. You don’t really need to understand the specifics of the algorithms that tweak it as it moves, but rather just its starting and final structure. The system is the data, and that data is like a car, where the code is simply the highway that the car follows to get to specific locations.
This second approach has considerable advantages. The best one is that a programmer seeing their work as just taking data D and getting it to D’ is no longer restricted by their finite Bag O’Tricks. Although they can permute their tricks endlessly, they are still heavily restricted from solving particular problems correctly. But a transformation from what is essentially one data-structure to another is a well-defined problem. There may be some sub-algorithmic issues involved in the transformation, but once broken down into discrete pieces, figuring out the code or researching how to do it properly are very tangible activities. So the programmers are in a good place to solve the system problems correctly, rather than just trying to endlessly combine tricks in the hopes that most issues go away.
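As a small illustration of the D-to-D’ view, here is a Java sketch where the whole job is stated as one transformation from an input shape to an output shape; the record types are invented for the example:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of the 'data D to data D-prime' view: the job is one well-defined
// transformation between two shapes, rather than an assembly of unrelated tricks.
public class Transform {

    record RawOrder(String customer, double amount, String currency) {}
    record Summary(String customer, double total) {}

    // One pure step from the input shape to the output shape.
    static List<Summary> summarize(List<RawOrder> orders) {
        Map<String, Double> totals = orders.stream()
                .collect(Collectors.groupingBy(RawOrder::customer,
                        Collectors.summingDouble(RawOrder::amount)));
        return totals.entrySet().stream()
                .map(e -> new Summary(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RawOrder> d = List.of(
                new RawOrder("acme", 10.0, "CAD"),
                new RawOrder("acme", 5.0, "CAD"),
                new RawOrder("zenith", 7.5, "CAD"));
        System.out.println(summarize(d));  // D' : one total per customer
    }
}
```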
Another major advantage is that a data perspective on the code allows for easy and natural optimizations. The programmer is no longer combining pieces, which often throw the data through unwanted gyrations. Instead, the data goes directly from point A to point B. It’s a straight line from one format to another. As well, the programmer can widen their scope from just a localized problem all the way up to ‘every use’ of a particular type of data, anywhere in the system. This opens up huge possibilities for macro-optimizations that generally provide huge boosts to the overall performance.
One common difficulty in software development is system upgrades. The code upgrades really easily, you just replace a block you don’t like with a block that you do. Data, however, is a major pain. Upgrades force merges, backfilling and endless restructuring. If you are initially focused on the code then the upgrade problem gets ignored, where it quietly grows and becomes dangerous. Focusing on the data however, brings it front and center. It becomes just another sub-problem of moving the data from one place to another, but this time across versions rather than just inside of the system. It makes tackling a big upgrade problem no worse than any other aspect of the system.
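Treating an upgrade as just another data move can be sketched the same way: each version bump is one small transformation, applied in order until a record reaches the current schema version. The steps and field names below are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Sketch of upgrades as data moves across versions: an ordered chain of
// transformations, each taking a record one schema version forward.
public class Migrations {

    // Ordered map: schema version -> step that upgrades a record to that version.
    private static final TreeMap<Integer, UnaryOperator<Map<String, Object>>> STEPS = new TreeMap<>();
    static {
        STEPS.put(2, rec -> { rec.put("country", "unknown"); return rec; });            // v1 -> v2: add a field
        STEPS.put(3, rec -> { rec.put("name", rec.remove("fullName")); return rec; });  // v2 -> v3: rename a field
    }

    static Map<String, Object> upgrade(Map<String, Object> record, int fromVersion) {
        for (var step : STEPS.tailMap(fromVersion + 1).entrySet()) {
            record = step.getValue().apply(record);
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> old = new HashMap<>(Map.of("fullName", "Ada Lovelace"));
        System.out.println(upgrade(old, 1));  // record now has 'name' and 'country' (print order may vary)
    }
}
```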
Added to all of this, it is far easier to visualize the data moving about in a system instead of seeing a mountain of poorly organized code. This makes architecture, debugging and testing far simpler as well. For example, a large inventory system with lots of eclectic functionality becomes conceptually simple when viewed as just a way to collect and display very specific items. This twist then leads to ways to combine and organize the existing functionality so that it is easier for the user to wield. Generalizations flow naturally.
Over the years, I’ve seen many a programmer hit the wall with their current Bag O’Tricks approach. Their ability to correctly solve problems is limited, so it is easy for them to get into a position where their code becomes convoluted. However, seeing the data first breezes right through these issues. It becomes very quick and convenient to either manipulate the data into the correct form or to determine if such manipulations are even possible (there are many unsolvable problems). Getting back to the earlier analogy, if you don’t have a viable car, you don’t really need to consider which off-ramp would be best.
Often I like to refer to programmers who rely solely on their Bag O’Tricks as having ‘one eye open’. The programmer may be very good at coding, but they’re too constrained by the limits of their existing tricks. If they spend their career staying within those boundaries, there are no problems. But if they want to get out there and build spectacular things that people will love, then they’ve got to get that second eye open as well. Once they’ve done that, they are no longer limited by what they know, just by the available time and their ability to correctly analyze the problem space. A whole new world of possibilities opens up. They just have to learn to change their perspective.