One of the most interesting aspects of software development is how reasonable-sounding ideas can go so horribly wrong in practice.
One of my favorite examples is the model-view-controller (MVC) pattern. At its heart, it is a simple idea: separate the data model from its presentation aspects. Redundancy in coding often comes from taking the same underlying data and dressing it up in many different ways. The brute force approach introduces a never-ending series of inconsistencies when the model code and the presentation code are done differently, again and again, for each slightly different viewpoint at the higher level. By separating them, programmers need only build the model once and then leverage that work for each slightly different presentation, achieving built-in consistency in the process.
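In code, the idea really is that small. Here’s a minimal sketch (the names are mine, purely for illustration): the model is built once, and each view is just a thin re-dressing of the same data, so the presentations can never disagree about what the data actually is.

```java
// One model, many presentations. Illustrative names, not from any framework.
class Account {                          // the model: the underlying data
    private final String owner;
    private final long cents;
    Account(String owner, long cents) { this.owner = owner; this.cents = cents; }
    String owner() { return owner; }
    long cents() { return cents; }
}

class TextView {                         // one presentation of the data
    String render(Account a) {
        return String.format("%s: $%d.%02d", a.owner(), a.cents() / 100, a.cents() % 100);
    }
}

class HtmlView {                         // a second presentation; zero model changes
    String render(Account a) {
        return "<tr><td>" + a.owner() + "</td><td>" + a.cents() + "</td></tr>";
    }
}
```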
But in practice, the MVC pattern morphed into the sub-structure of an incomplete framework. Although it only solves a tiny fraction of the necessary architecture of a system, people began using it as if it covered the entire architecture. These days its main use is to bind static system navigation to a large series of static screens through a rather convoluted declarative approach: static configuration files written in a different (and highly restricted) computer language. That’s a long, long way from its origins.
To make this clearer, I’ll give an example using the dreaded construction analogy. It helps to give this a more concrete perspective. We’ll consider a simple prototypical project to build a multi-floor apartment building for tenants ...
The project starts as the desire of an organization to build a new apartment building. The land is purchased and cleared, the builders are hired, and a number of possible tenants have signed on to varying degrees.
The first choice made is to settle on some global design for the building. The lead builders decide to use the Acme Elevator Shaft framework. It’s easy to set up, and it is highly versatile in allowing the elevator to stop at any height for the floors: you just cut a new door into the shaft and set some markers. It also allows builders to literally hang their floors off the shaft. Everyone agrees that this feature will be very helpful.
As usual, feedback from possible tenants has been encouraging, but highly contradictory, as each of them has their own unique wish-list and dreams. Some people want a grand establishment, while others are more focused on the nitpicky details. Management sets up a list of priorities and insists that some tenants will get to move in before the entire building is completed. They hire some outside consultants who insist that not only is this possible, but that the building can be constructed in a piecewise manner, which they say will ensure a better design.
There are five builders, and amongst themselves they decide that each builder should work on a specific floor. Many feel that the upper floors are the most interesting, so they partition out the work randomly. Each builder will take a couple of floors (but not in sequence) and start working on them right away. One builder insists on doing the basement first, but management has already decided that the 5th floor needs to be done right away, because the tenant on that floor is louder and richer than the others.
The work starts well. A big hole is dug, a small pad of concrete laid, and the Acme Elevator Shaft is installed so that the other builders can immediately get to work on their parts without having to wait for the basement to be completed.
Because no one has decided how many floors will exist, the elevator shaft is set to allow for a possible 31 floors. Management has specified that the building should have somewhere between 15 and 20 floors. Initially they wanted to fix the number, but the main builder and the consultants successfully argued for some flexibility here. They feel that the creativity of the building team should not be restricted by selecting an arbitrary number of floors, and that the project should be allowed some room to grow. Management also tried to fix the number of apartments (they need at least 45 of them to make a profit), but again they were beaten back in the name of reason.
The lead builder doing the 5th floor decides that the proportions of his floor should be 441.2x867 ft, containing the standard 5 units with the elevator located in the middle. He’s worked on several buildings in the past, and that was always the size they used. Just to spice it up, though, he decides to have 9 ft ceilings. It’s a nice touch; he is proud of his design. He cuts a hole in the elevator shaft and gets to work.
The junior-most builder is assigned the 8th floor. In school they taught 200x300 ft with only 3 units, so he sets out in that direction. His professor stated that nobody would ever need more than 6.5 ft ceilings. He also assumes that he doesn’t need to add a ceiling, since that will come from the floor above him. He is proud of his cost-saving measures.
The basement builder gets to work early and lays an intricate series of parking-garage floors and basement storage. He considers carefully how building maintenance will need space in the winter to store a snow-blower, and in the summer to store a tractor for cutting the lawn. There is even a nice little industrial-strength mini-elevator shaft to move things around his floors. He starts on the left side, being very diligent as he goes.
Several other floors get started at the same time: a female builder goes after the third floor, and an older builder starts work up on 13. Work progresses.
Management is ecstatic, since there are clear signs of progress and the building team has divvied up the work successfully. The tenants hold a celebratory party, complaining about the sins of their last apartments and heaping praise on the builders for their intelligence and skill level. Morale is high.
The work continues, but one astute tenant notices that there is no work being done on the first floor, even though they are scheduled to move in soon, in just 3 months. Management quickly hires two new builders. One is assigned to the 6th floor, and the other starts work on floors 10, 18, and 21. The second new builder comes up with the bright idea that the elevator should be offset to the left (from an entrance perspective), with all 6 units in a half-circle around it. He read it in a magazine somewhere and always wanted to try that design.
Meanwhile the junior builder has become concerned about the fact that the 8th floor has no ceiling. He tries to bring it up, but the older builders just tell him that this is how it works. It will all come together in the end, they say.
As the move-in date is fast approaching, the lead builder quickly fashions a 1st floor lobby, and cuts enough of a hole that people can start accessing the elevator to get to his almost completed 5th floor. Management is elated, since even more progress is made. Everyone volunteers for overtime, and more floors are started.
It isn’t long after this milestone that things take a turn for the worse. Since no one considered plumbing and electricity, several different builders have gone off on their own and tried to stick in some quick infrastructure. One builder chooses to run her pipes and wires up the left side, while another one is trying to connect things from a top-down approach on the right. A few tenants have moved in already; most were desperate for a place to live, or their investment was causing problems with their cash flow.
Besides the noise of construction, and the fact that electricity and water are intermittent, tenants on the 7th floor have noticed that there is an ever-present nasty smell. The builders deny it, but then someone notices that the trash chutes (all 8 of them) for the upper floors seem to be terminating on the 8th floor.
A quick meeting is held, and all of the builders decide that the problem isn’t immediate: there is enough space on the 8th floor to accommodate the growing mounds of trash, nobody is living there right now, and it would be too much work to remove the garbage and connect all of the trash chutes together. The problem is ignored for the time being. Management hails this choice as a cost-savings measure.
More trouble erupts as some of the newer tenants want to repaint the garish colors chosen at random by various builders, only to discover that the paint used was specially designed to not allow primers, paint or even wallpaper to stick over it. The builders argue that they have excellent visual sense and that everybody should love their choices of industrial-strength day-glow reds, greens and pinks. They just don’t see the need to repaint; it is a wasted effort. Little progress is made on this issue.
This causes a mutiny, as some of the less tied-down tenants opt out of the project and head for greener pastures (literally: they ultimately choose camping in tents as their new residences). The remaining tenants, less than 20% of the expected occupancy, are either too desperate for a place to live or too financially bound to the project to be able to flee with dignity. Most live on the lower floors, since there is a rumor that the upper ones are going to collapse soon, and they still don’t have water or electricity. Several of the builders feel very proud that people are living in their works.
Time wears on, and everyone notices that the floors of one of the newer builders, the guy who wanted everything off-center, have started to shift the entire building to one side. This is causing a problem with the Acme Elevator Shaft: it is slowly bending to the right, which is making the elevators stick when going from floor to floor. Another quick meeting is held, and everyone decides that if they pile up enough scaffolding, they can offset the unbalanced weight. A few steel I-beams are added to help shore up the weight. A neighboring building complains because one of the new I-beams is stuck right through their lobby. Management dismisses their complaints, noting that their building is hideous and should be rebuilt.
At this point several builders, the consultants and a few managers leave the project, including the basement builder. His work is finished on the left side, but not on the right. The left has a nice set of storage and parking garages, but unfortunately the ramps to get cars up and down were going to be on the right side. So instead of real cars, the basement is filled with intricate mock-ups of cars that, if the ramps ever did get completed, would prove the validity of the design.
The tenants are rightly pissed, but the remaining builders insist that it is actually the tenants’ own fault for allowing the scope to creep and for constantly changing the requirements. The tenants are to blame (and management too, for not reining them in). After all, if they had wanted all the floors to be the exact same size and height, they should have just said so. A few little gaps between the walls and ceilings aren’t a big deal: they just help to ensure good airflow, and it is far easier to know what the weather is outside if some of it is inside as well.
The consultants that fled spread the word about how successful the project was. They hold it up as an iconic example of how their techniques help to ensure the success of any project. They point to the one floor that two builders did together, even though the left and right halves were different dimensions and the whole thing slopes by 22 degrees. They point to the tape measures that they handed out to every builder and made them spend lots of time using. They say that this constant measuring ensured the high quality of each floor; each one was exactly the unique size its builder wanted.
The first round of builders that fled also talk about their successes with great enthusiasm, stating that 441.2x867 ft floors with 7.2 ft ceilings should be the accepted standard for all future floors in all future buildings. They admit to a few failings, but blame them squarely on the tenants, who, they say, didn’t really help in building anything. They also point out that management should have done a much better job of making sure that everybody followed the 441.2x867 ft standard, since it is so well-known in the industry.
The surviving management and builders slump into depression, choosing to do almost nothing as the growing list of problems overwhelms the tenants. They know that if they wait long enough, if enough problems develop, sooner or later the momentum will build up again and they can start from scratch. Why fix it, if it should be torn down before it is even completed?
The builders blame management and the users. They feel that they did an exemplary job, given the circumstances, so it wasn’t their fault at all. Most of them would have left in the earlier wave, but somewhere along the way they lost their confidence and their desire to continue building things. It is just easier to sit and wait it out.
The tenants wonder what sort of horrible acts they committed in the past to earn this much bad karma. All they wanted was a nice, cosy, well-built place to live, and now everybody is saying it is their fault for doing “something”, but that “something” is always too vague to really make sense of. They feel cheated, so they walk around talking about how much they hate buildings, apartments and, most of all, construction. They’re surprised and disappointed when the builders show them so little empathy; after all, who else knows better how much misery they’ve been put through?
Eventually management musters up the courage to do it all over again. Although this particular project was a complete and utter failure -- yes, it was built, but it never even came close to making a profit due to the lack of tenants -- they somehow see the positive side and decide to learn from their mistakes. “This time …” they tell themselves, “this time we won’t use the Acme Elevator Shaft. That was the problem. This time we’ll do it right and use the Mega-co Elevator Shaft; that will fix everything.” And the cycle starts all over again.
Sadly, if you replace the construction terms with technical ones, this little example is what a typical software development project normally looks like. Our epic failure rate and the commonness of this type of project speak for themselves. Our industry's landscape is littered with countless fugly disasters that are barely standing, and many more that are just piles of rubble. In spite of this, you constantly see the insistence that Acme Elevator Shafts (a.k.a. the MVC pattern) make great frameworks, and that we shouldn’t re-invent the wheel. There is nothing wrong with an elevator shaft, but when it is used this badly it just magnifies the problems. In building an apartment, the elevator shaft, while crucial, is just a small part of a much larger design. Trying to avoid that design work, while also failing to work consistently as a team -- not just a bunch of cranky individuals -- results in the systematic breakdown of the project. The freedoms that the shaft allows should never have been used at an individual level; they were meant for the project as a whole. In truth, any group of people working collectively that can’t all get on the same page at the same time is bound for failure. Software development just has the bad attribute that this disorganization and disagreement can be hidden for a long time, making things look far better than they are in reality.
Friday, November 19, 2010
The Myth of Code Reuse
Perhaps the most maligned of all programming techniques is code reuse. The idea is simple: if you’ve written some code, tested it, and then managed to get it battle-tested in real-life scenarios, it is far more efficient to spend a little time generalizing that work and then re-applying it for other uses. It leverages your initial effort. It would be even more effective if you did this little bit extra in advance, knowing that it would get re-used. Why do the work over and over again, multiple times, if you’ve already done it once?
There must be at least a trillion lines of code already out there. We’re writing it faster than ever, and there are more people whacking away at their keyboards than ever before (excluding the notorious dot-com bomb blip). Huge effort goes into software development, and yet most of the systems being worked on are either small enhancements of older systems, or just shallow re-writes. Most of the code being written now has been written many times before. Most of the work has already been done once.
Still, with all of that effort being applied, and with so many accepted principles floating around like DRY and “don’t re-invent the wheel”, most of the code in most of the systems I’ve seen -- again and again -- is highly redundant. It’s not uncommon to see a medium-sized system with fifty to one hundred screens where each screen, although visually similar to the others, has its own unique, nearly independent code base. And all of the database access code is equally redundant. In fact, very little is actually shared at all, even though most of it does basically the same thing, in the same way, as an existing piece. It is normal to see any sizable system with massive amounts of redundant code.
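To make that database redundancy concrete, a single shared helper like the sketch below can replace every per-screen copy of the same query plumbing. The Db class and its Map-based rows are my own invented example, not code from any particular system:

```java
import java.sql.*;
import java.util.*;

// Hypothetical shared data-access helper: one battle-tested query path,
// instead of a slightly different copy pasted into every screen.
class Db {
    static List<Map<String, Object>> query(Connection c, String sql, Object... args)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(sql)) {
            for (int i = 0; i < args.length; i++)
                ps.setObject(i + 1, args[i]);               // bind parameters in order
            try (ResultSet rs = ps.executeQuery()) {
                ResultSetMetaData md = rs.getMetaData();
                List<Map<String, Object>> rows = new ArrayList<>();
                while (rs.next()) {
                    Map<String, Object> row = new LinkedHashMap<>();
                    for (int col = 1; col <= md.getColumnCount(); col++)
                        row.put(md.getColumnLabel(col), rs.getObject(col));
                    rows.add(row);
                }
                return rows;
            }
        }
    }
}
```

Fix a bug or tune the resource handling in that one method, and every screen in the system benefits at once.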
So what is going on? Technically, there is no reason why there should be so much repetition. The computers don’t need it, or even care. The tools exist to avoid it. There is also a lot of good advice suggesting that we not repeat our code over and over again. In fact, the whole premise behind the Object Oriented movement was to encapsulate good coding techniques right into the language, to better facilitate good code. Objects weren’t just some arbitrary abstraction; they were intended to push programmers towards encapsulating the related bits of their code with their data.
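That intent is easy to see even in a toy example. A tiny, purely illustrative sketch of what keeping the code with its data buys you:

```java
// Data and the code that understands it, kept together: callers can't
// re-implement the arithmetic or the formatting slightly differently elsewhere.
class Money {
    private final long cents;                       // the representation stays hidden
    Money(long cents) { this.cents = cents; }
    Money plus(Money other) { return new Money(cents + other.cents); }
    @Override public String toString() {
        return String.format("$%d.%02d", cents / 100, Math.abs(cents % 100));
    }
}
```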
A key reason programmers give for massive duplication is that they don’t want to waste time over-generalizing the problem. Their fear is that it will become a time sink, crippling the effort while they try to knock the bugs out of very complicated pieces.
But instead, they choose to code as fast as possible with little intelligence. This inherent frailty forces them to constantly revisit each old section, as any new code has quickly made that part of the system inconsistent. With each change, this generates a new slew of bugs. Fast redundant growth manifests itself in bugs and bad interfaces.
I’ve heard of big development efforts with thousands upon thousands of bugs currently being tracked. This type of technical-debt explosion doesn’t happen with a small, tightly wound code base. It is caused by not properly encapsulating the solutions, so that instead of getting a piece done and setting it aside, the developers are constantly forced to re-visit everything. The inevitable inconsistencies driven by rampant duplication force the problems to pop back, over and over again, onto the bug list. Bug fixing grows exponentially, killing off development.
If you put some effort into getting a piece of code right, then it should stay that way if it was properly encapsulated. It only changes if you’re extending that part of the system, otherwise you should be able to trust that it is there and it works. Big bug lists are symptomatic of massive duplication and poor encapsulation.
Another reason that programmers give for avoiding reuse is that they don’t want the code to be too complicated. If it is stupidly simple, it is easier to work on. That’s kind of a funny issue, because most coders are fascinated by complexity, and there is a strong culture of trying not to make the work of programming too rote. Many feel it should be unique and creative. Ironically, they struggle to get the freedom to think deeply about problems, only then, in practice, to just pound out the most thoughtless thing that barely works.
A brute force coding style is a rather mindless activity once you get past the initial learning stage. You don’t have to think about it, since you’re just splatting out the same basic patterns with the same basic tricks, again and again. Good for schedules, bad for technical debt.
But if you factor in the growing mountain of debt, you quickly realize that any rapid application development that happens will eventually be swallowed by a sea of misery in trying to harmonize all of the little inconsistencies in the reams of redundancy. And, since the pressure to grow will only get worse as the work progresses, the project is destined to crash into a wall, or get canned, or just get re-written again for the third or fourth time. Rapid progress is only good if it isn’t also generating rapid technical debt. And usually it is.
There are lots of other excuses programmers give, but I actually think the real underlying reason reuse fails in practice is simple: programmers don’t like reading other programmers’ code. Few that I know will just drop into some other code base and try to make sense of it. Most would rather struggle on with incomplete documentation, or by random experimentation, than just pull up the source and read how it works. Code is viewed as mysterious, impenetrable.
It’s pretty funny, because the ability to read the code was the foundation of the Open Source movement. Richard Stallman wasn’t trying to devalue software; rather, he was just frustrated by not knowing what was going on below. The “open” in the name references the ability of any dependent programmers to get access to the source code, not the ability to use the stuff for free. But like all great ideas in software that started out OK, once it hit the masses its simple concept became badly diluted and derailed.
It is made worse because most programmers are essentially self-taught. Schools don’t know how to teach programmers and very few coders have ever had a mentor that they respected long enough to be able to learn from. Because of that, most programmers tend towards an eclectic programming style. They get some of the basics down and then they close their minds, thinking that what little they know is all they should know.
Spending all day in black-and-white logic has a bad tendency to make people try to apply that rigour to the rest of the world around them, even when they know it doesn’t fit. It’s an ever-present programmer’s disease. You can really see it if you get a programmer talking about something very gray, like politics. You’ll often find they have a very closed, hard, over-simplified view that they insist is the absolute truth, and they can’t understand why everyone else just can’t see it as such. A closed mind with limited but highly sticky information, and far too much faith that it is both correct and complete, is a common trait in the industry.
And it seems that it is exactly this eclectic set of beliefs that not only fuels the silly arguments on the web, but also keeps many programmers from actually learning from others’ work. Programmers have a startling lack of desire to learn more programming techniques. There are, what, over a million programmers out there working professionally, but the markets for books and blogs are tiny. There are all sorts of blogs that show how to use a few library calls, or gossip about the big companies, but there are few that dare to try to teach good coding style. There is an entire massive industry of consultants dedicated to finding “fun” things to do while programming, without any concern at all for the “right” things to do. It has become an industry of poorly trained programmers who are happy to be that way.
If you go back a few decades, there were far more people interested in what makes programming tick and how to build reliable systems. Donald Knuth’s landmark epic about all of the data-structure techniques wouldn’t even raise an eyebrow today. We share little fragments, perhaps clever tricks, but we’re no farther along these days in understanding what really helps in building complex things. And with complexity rising fast, more and more young programmers know less and less about what really happens underneath their instructions. There is a fundamental disconnect. The software crisis that started fifty years ago has turned into a full-blown pandemic.
At the root of this, the real problem remains: programmers should read code. Lots of it. They should be actively seeking to improve their code-reading skills by examining what others have managed to do. They should scan existing code bases, and spend some time thinking about what works and what doesn’t, what is readable and what is horrible. They shouldn’t need a tour guide to examine an algorithm, nor should they rely on some hastily written summary of the internals. Instead, the only real way to learn, the only real documentation, is the code. It speaks for itself, for all it has to offer and for all of its flaws.
The only way to become a better programmer is to learn from the experience of others, and the only real way to do that is to be able to read their code. Once a programmer gets over this hurdle, it is just a small sprint to being able to share code with others. There is no need for documentation or PowerPoint slides. There is no need for huge essays or out-of-date reference material. The more code you read, the more styles you see, the more abstractions you encounter, the easier it gets. You don’t have to spend months re-inventing the wheel, or countless hours just hacking at something. You have the ability, if you have access to the source, to work through the issues yourself. It is a hugely powerful skill-set. And if you work on a team of programmers who can all read each other’s code, then they can share their knowledge, their experience and, most of all, the code itself.
Reuse is possible. I have seen it many times, always employed successfully. It not only cuts down on testing, but it also enforces a necessary consistency at the higher levels that makes the tools more usable. Programmers find all sorts of reasons why they can’t do it, even though it is clearly one of the most important skills that they can learn. Not only that, but it also appears to be done far less often by each new generation of programmers. Most programmers are hoping that the rising tide of external code, in the form of libraries and frameworks, takes care of the issue for them. However, most of these bits of code only focus on the smaller, naturally tighter, and usually more-fun-to-code technical issues. Reusing technical code is good, but the real value comes from doing it at the domain and application level. After all, there is far more duplication in the application screens and the database access than there is in the small part of the program that does something specialized like reading JPEG images. The best place to apply reuse is the one that sees it the least often.
Tuesday, November 9, 2010
Reducing Test Effort
Last week a reader, Criador Profundo, asked a great question on my last post Code Validation:
“Could you comment on how tests and similar practices fit in with the points you have raised?”
I meant to reply earlier, but every time I started to organize my thoughts on this issue, I found myself headed down some closely related side issue. It's an easy area to get side-tracked in. Testing is the point in software development where all of the development effort comes together (or not), so pretty much everything you do, from design, to coding, to packaging, affects it in some way.
Testing is an endless time sink; you can never do enough of it, never get it perfected. I talked about tactics for utilizing limited testing resources in an older post, Testing for Battleships (and probably in a lot of the other posts as well). I’ll skip over those issues in this one.
The real question here is what can be done at the coding level to mitigate as much of the work of testing as possible. An ideal system always requires some testing before a release, but hopefully not a full battery of exhaustive tests that takes weeks or months. When testing eats up too many resources, progress rapidly slows down. It’s a vicious cycle that has derailed many projects originally headed out in the right direction.
Ultimately, we’d like for previously tested sections of the system to be skipped if a release isn’t going to affect them. A high-level ‘architecture’ is the only way to accomplish this. If the system is broken up into properly encapsulated, self-contained pieces, then changes to one piece won’t have an effect on the other pieces. This is a desirable quality.
Software architecture is often misunderstood or maligned these days, but it is an absolute necessity for efficient iterative development. Architectures don’t happen by accident. They come from experience and a deliberate long-term design. They often require care when extending the system beyond its initial specifications. You can build quickly without them, but you’ll rapidly hit a wall.
Encapsulation at the high level is the key behind an architecture, but it also extends all the way down, to every level in the code. Each part of the system shouldn’t expose its internal details. That’s not just a good coding practice; it’s also essential in being able to know the scope of any changes or odd behaviors. Lack of encapsulation is spaghetti.
Redundancies -- of any kind: code, properties, overloaded variables, scripts, etc. -- are also huge problems. Maybe not right away, but as the code grows they quickly start to rust. Catching this type of rusting pushes up the need for more testing. Growth slows to a crawl, as it gets quickly replaced by testing and patching. Not being redundant is both less work and less technical debt. But it's easier said than done.
To avoid redundant code: once you have something that is battle-tested, it makes little sense to start from scratch again. That’s why leveraging any and all code as much as possible is important. Not only that, but a generalized piece of code used for fifty screens is way less work to test than fifty independently coded screens. Every time something is shared it may become a little more complex, but any work invested in testing is multiplied. Finding a bug in one screen is actually finding it in all fifty. That is a significant contribution.
The key to generalizing some code and avoiding redundancies in a usable manner is abstraction. I’ve talked about this often, primarily because it is the strongest technique I know to keep large-scale development moving forward. Done well, not only does development not slow down as the project progresses, it actually speeds up. It’s far easier to re-use some existing code, if it is well-written, than it is to start again from scratch.
Sadly, abstraction and code re-use are controversial issues in software development, because programmers don’t like having to look through existing code bases to find the right pieces to use, they fear sinking too much time into building up the mechanics, and it is just easier to splat out the same redundant code over and over again without thinking. Coding style clashes are another common problem. Still, for any large-scale, industrial-strength project it isn’t optional. It’s the only way to avoid an exponential growth in the amount of work required as the development progresses. It may require more work initially, but the payoff is huge, and it is necessary to keep the momentum of the development going. People often misuse the phrase “keep it simple, stupid” to mean writing very specific, but also very redundant, code; however, a single generalized solution used repeatedly is far less complex than a multitude of inconsistent implementations. It’s the overall complexity that matters, not the unit complexity.
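To pin down what a single generalized solution can look like, here is a hedged sketch with invented names (FieldSpec, Screen): fifty screens collapse into one shared rendering path plus fifty small descriptions, so testing or fixing the shared path covers all fifty screens at once.

```java
import java.util.*;

// A per-screen description: just data, no logic.
final class FieldSpec {
    final String label, column;
    FieldSpec(String label, String column) { this.label = label; this.column = column; }
}

final class Screen {
    private final String title;
    private final List<FieldSpec> fields;
    Screen(String title, List<FieldSpec> fields) { this.title = title; this.fields = fields; }

    // Every screen in the system renders through here; test it once,
    // and the testing effort is multiplied across all of them.
    String render(Map<String, Object> row) {
        StringBuilder out = new StringBuilder(title).append('\n');
        for (FieldSpec f : fields)
            out.append(f.label).append(": ").append(row.get(f.column)).append('\n');
        return out.toString();
    }
}
```

A customer screen then stops being code and becomes data, something like new Screen("Customer", List.of(new FieldSpec("Name", "name"))), with all the complexity invested once in the shared path.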
Along with avoiding redundancies comes tightening down the scope of everything (variables, methods, config params, etc.) wherever possible. Technically this is part of encapsulation, but many programmers allow the scope of their code or data to be visible at a higher level, even when they’ve tried to encapsulate it. Some misdirected Object Oriented practices manage to just replace the dreaded global variable with a bunch of global Objects. They’re both global, so they are both the same problem. Global anything means that you can’t gauge the impact of any changes, which means re-testing everything, whether it is necessary or not. If you don’t know the impact of a change, you’ve got a lot more work to do.
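The difference is easy to show in a few lines. In this invented sketch, the first version is a global Object -- the dreaded global variable all over again -- while the second tightens the scope so the impact of a change can actually be traced:

```java
class Mailer {
    void send(String to, String body) { /* ... */ }
}

class BadReporter {
    static Mailer MAILER = new Mailer();      // global: anything, anywhere, can touch it
    void report() { MAILER.send("ops@example.com", "daily report"); }
}

class GoodReporter {
    private final Mailer mailer;              // scoped: visible impact, easy to swap in tests
    GoodReporter(Mailer mailer) { this.mailer = mailer; }
    void report() { mailer.send("ops@example.com", "daily report"); }
}
```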
Another issue that causes testing nightmares is state. Ultimately, the best code is stateless. That is, it doesn’t reference anything that is not directly passed into it, and its behavior cannot change if the input doesn’t change. This is important because bugs can be found accidentally as well as on purpose, but if the scenario that triggers a bug is too complex to reproduce, it will likely be ignored (or assumed to have magically disappeared) by the testers. It’s not uncommon, for instance, to see well-tested Java programs still exhibiting rare but strange threading problems. If they don’t occur consistently, they either don’t get reported or they are summarily dismissed.
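A small invented sketch of the contrast: a bug in the stateless version can be reproduced from its arguments alone, while the stateful one needs the whole history of prior calls.

```java
// Stateful: the answer depends on hidden history, so a bug report
// needs the exact sequence of earlier calls to reproduce.
class RunningTotal {
    private long cents;
    long add(long amount) { return cents += amount; }
}

// Stateless: same inputs, same output, every time, on any thread.
class Pricing {
    static long discountedCents(long priceCents, int percentOff) {
        return priceCents - (priceCents * percentOff) / 100;
    }
}
```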
There are plenty of small coding techniques as well. Consistency and self-discipline are great for reducing the impact of bugs and for extending the system. Proper use of functions (not too big, not too small, and not too pedantic) makes the code easier to test, extend and refactor. Making errors obvious in development, but hiding them in the final system, helps. Limiting comments to “why” and trying to avoid syntactic noise are important. Trying to be smart, instead of clever, helps as well. If it’s not obvious after a few months of not looking at it, then it’s not readable, and thus a potential problem.
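Java’s assert mechanism is one concrete form of “obvious in development, hidden in the final system”: the checks only run when the JVM is started with -ea, as it would be in development and testing, and stay silent in production, where assertions are disabled by default.

```java
// Loud in development (run with: java -ea App), silent for users,
// since the JVM leaves assertions disabled unless -ea is given.
class Transfer {
    void move(long cents) {
        assert cents > 0 : "transfers must be positive, got " + cents;
        // ... perform the transfer ...
    }
}
```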
Ultimately, once the work has been completed and tested a bit, it should be set aside and ignored until some extension is needed later. If you’re forced to constantly revisit the code, then you’re not building anymore. It’s also worth noting that if the code is any good, there will be many, many people that look at it over the decades that it remains in service. The real indicator of elegance and quality is how long people continue to use the code. If it’s re-written every year, that says a lot about it (and the development shop). (Of course, it can also be so bad that nobody has the nerve to look at it, and it just gets kept by default.)
There is a big difference between application development and systems programming. The latter involves many complex technical algorithms, usually based on non-intuitive abstractions. It doesn’t take a lot of deep thinking to shift data back and forth between the GUI and the database, but it does to deal with resource management, caching, locking, multiple processes, threading, protocols, optimizations, parsing, large-scale sorting, etc. Mostly, these are all well-explored issues, and there is a huge volume of available knowledge about how to do them well. Still, it is not uncommon to see programmers (of all levels of skill) go in blindly and attempt to wing it themselves. A good implementation is not that hard, but a bad one is an endless series of bugs that are unlikely to ever be resolved, and thus an endless series of testing that never stops. Programmers love to explore new territory, but getting stuck in one of these traps is usually fatal. I can’t even guess at the number of software disasters I’ve seen that came from people blindly diving in without first doing some basic research. A good textbook on the right subject can save you from a major death march.
Unit testing is hugely popular these days, but the only testing that really counts in the end is done at the system level. Specifically, testing difficult components across a wide range of inputs can be faster at the unit level, but it doesn’t remove the need to verify that the integration with other pieces is also functioning. In that way, some unit testing for difficult pieces may be effective, but unit testing rather simple and obvious pieces at both the unit level and the system level is wasted effort, and it creates make-work when extending the code. Automated system testing is hugely effective, but strangely not very popular. I guess it is just easier to splat it out at the unit level, or visually inspect the results.
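For what an automated system-level test can look like, here is a hedged sketch using JUnit; the InvoiceApp entry point and all of its methods are invented for illustration. The value is that it drives the whole stack, from the public interface down to the database, in a single pass:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Drives the full system through its public entry point, so it verifies
// the integration of all the pieces, not just one unit in isolation.
public class InvoiceSystemTest {
    @Test
    public void createThenFetchRoundTrips() {
        InvoiceApp app = InvoiceApp.startWithTestDatabase();  // hypothetical bootstrap
        String id = app.createInvoice("ACME", 12500);         // hypothetical API
        assertEquals(12500, app.fetchInvoice(id).totalCents());
        app.shutdown();                                       // hypothetical teardown
    }
}
```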
From a user perspective, simple functionality that is easily explained is important for a usable tool, but it also makes the testing simpler. If the test cases are hugely complicated and hard to complete properly, chances are the software isn’t too pleasant to use. The two are related. Code should always encapsulate the inherent difficulties, even if that means the code is somewhat more complicated. An overly simple internal algorithm that transfers the problems up to the users may seem elegant, but if it isn’t really solving the problem at hand, it isn’t really useful (and the users are definitely not going to be grateful).
There are probably a lot more issues that I’ve forgotten. Everything about software development comes down to what you are actually building, and since we’re inherently less than perfect, testing is the only real form of quality control that we can apply. Software development is really more about controlling complexity and difficult people (users, managers AND programmers) than it is about assembling instructions for a computer to follow (that’s usually the easy part). Testing is that point in the project where all of the theories, plans, ideas and wishful thinking come crashing into reality. With practice this collision can be dampened, but it’s never going to be easy, and you can’t avoid it. It’s best to spend as much initial effort as possible to keep it from becoming the main source of failure.