Friday, November 6, 2009

Programming Structure

I've been a little sluggish with my writing, lately. Probably because I've been very busy rewriting a big piece of code at work and I'm not much of a multi-tasker (if it involves thinking).

Still, I need to get back to exploring my proposed laws, starting with the first set:

Fundamental Laws of Programming Structure:

1. All programming code is hard to write and has bugs.
2. The lower the code is in an architecture, the more you can re-use it.
3. The fastest way to develop code is by re-using it.
4. The best way to develop code is by deleting it.

Before I dive in, I thought I would start with xlr8's comment:

"the first thing I can think is: there is some difference between math laws and these."

Computers straddle mathematical abstractions and the real world. As such, software runs in an abstract system that references back to the real one. What we are really doing when we design and build systems is finding useful ways to bind the two together.

On the purely abstract side, a term like "theorem" works well for describing something that has been rigorously proven to be true. Something that is abstract in nature.

That's nice, but most of the interesting or complex problems from computers come not from their roots in the pure mathematical space, but rather from how we tie that back to reality.

Mathematics can have an intellectual purity to it that is impossible in the real world. The real world, on the other hand, effectively resists any type of intellectual containment. In that way, even if we are rigorous with our real world observations, there will always be exceptions.

Economics sets a fine precedent for dealing with the mixture and uses the term "law" for handling things that are mostly proven to be true:

http://en.wikipedia.org/wiki/Law_of_demand

so I think that it also works well when we are discussing any Computer Science problems that fall on the "real world" side of the fence.

Software development in that sense will always be a "soft" science similar to Economics (which calls its 'hard' math side "econometrics"). We already have names for our "hard" math side; it's the references to the softer side of our jobs that get most confusing.

Given that, for any proposed "law", there will always be exceptions:

http://thedifferencesblog.blogspot.com/2009/07/human-oriented-dynamics.html


xlr8 also said:

"2 This leads to second point, definitions. In fact in math everything is well defined, here, it can not be. See the words i quoted in preceding point ("useful", "you can't", etc)."

As I go through the different sets, I will try and be more precise about what I mean for the various terms I am using. I usually have a very precise definition in mind, even if I haven't managed to document it well.

The issues that play out in the real world are usually based around scheduling or development costs, both of which serve as very sharp and painful obstacles for software development projects. Software development is almost always primarily bounded by time and resources; it is, after all, real world stuff that needs to function in order to be useful.


HARD CODING

I really agonized over my first law for this set, mostly because I don't actually think programming is "hard", at least it is not hard all of the time. Parts of it are, but it shouldn't be large parts.

In fact, I know that the road to success for any project is to get the "hard" part out of the way as early as possible. If you're limping into the deadline with nothing but mindless effort between you and getting it done, you're pretty sure that you're going to make it on time.

However, if a few weeks before you're supposed to deliver, you find that there is still some major chunk of the program that is unknown, and you haven't even worked out a battle plan for getting it written yet, then you're pretty much sure that you ain't making that deadline.

If there are "hard" things to do between you and the finish line, then you're not managing the project well (even if it is not your fault).

Hard, as it comes about in programming, means that something is either hugely time-consuming, hugely resource-consuming, or just requires a massive amount of thought. If it's beyond hard, then likely it's impossible (or near enough so).

No sane construction company would start building a structure that they knew was impossible, but oddly that's not all that uncommon in software development. We're often too eager to rush off and start coding.

Still, I've found that with experience comes an increasing ability to recognize "hard", and to learn to worry about it as early as possible.

In that way, if all is going well in a development project, then the second half should not be hard, it should just be work. And to some degree, it should be predictable (even to the point of always expecting to get one or two of those really nasty week-consuming bugs that grind everything to a standstill).

"Hard" is not something we want, need or should put up with for long periods of time. "Hard" is the first big obstacle in many cases, but there are ways to manage it successfully.

What I really should have said was:

1. All programming code is time-consuming to write.


which I think is far closer to the underlying reality of what I am trying to say.

Writing code, any code, takes a very long time. And, to some degree, if it doesn't, it is because the project is saving time by accruing technical debt, but I'll get deeper into that point when I talk about enhancing software and some of the rules-of-thumb metrics I added under the commercial software laws.

Also, my second point about bugs deserves its own specific law:

2. All programming code has some unexpected behavior (bugs).


Bugs have always been a bit of a funny concept particularly in programming. We often blend several very different things together. Astrobe writes:

"- Programs have bugs because undecidability prevents automated checks
- Programs have bugs because they are written by humans, and errarum humanum est."

In the purest sense, a bug in a programming language really only exists when a compiler of some type refuses to continue building the executable code. In any other circumstance, including run-time, the computer is just doing exactly, and only exactly, what it was told to do (including a segmentation violation or core dump).

Ignoring undetected hardware failures (for now), a computer that does exactly what it is told, but still does something unexpected is more of a problem with our expectations and our point-of-view of what we think is right, rather than with its ability to perform a given task. The computer's not at fault, we are.

Al responded to Astrobe with:

"That said, it is very true that we cannot be sure that all bugs are detected automatically, because of undecidability (thanks to Mr Gödel and Mr Turing for the theories behind it). But, even without any theory about programming, the number of combinations for the inputs of any non-trivial program is far too large to do exhaustive testing (could the number of combinations be a beginning for a characterization of "non-trivial" ?).

And when the bug lies in the spec, well, we cannot write a jUnit that checks that the spec is what the customer meant ;)"

In fact I'd say that the vast majority of the bugs I've had to deal with over the years have been rooted in domain issues; coming directly from communication problems with the customers or the users.

Like the coding itself, technical problems are easier to detect and fix. Often they are fairly obvious in their solution. Domain problems, on the other hand, are very hard to detect, and frequently very expensive to fix.

Nothing automated could ever detect that some information wasn't right if it doesn't have some context in which to compare it. Since the code is the first attempt to work with that domain information (correctly), it is unlikely to ever have a secondary code body that can validate the primary one (or we'd be using the secondary body as the primary one).

There is zero intelligence in computer behavior, above and beyond exactly what we explicitly encode there. Computers are not wrong, nor are they right, they just are ....

About the only thing we can do is maximize our use of the specific programming language features that force the compiler to be stricter in detecting that there is a problem with the code.

Of course, that opens up a severe trade-off where we are getting better value from our compiler's automated checking, at the expense of having significantly more code that is explicitly more rigid. For some domain-critical algorithms, that is an appropriate, if not excellent, trade-off, but for more general code like a re-usable framework it is an exceptionally poor choice.
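
As a small illustration of that trade-off, here is a sketch in Python. Python has no compile step in this sense, so I'm assuming a static type checker (something like mypy) stands in for the compiler, and the `Dollars` and `Euros` types are hypothetical names invented for the example.

```python
from typing import NewType

# Hypothetical domain types. Both are plain floats at runtime, but a static
# checker treats them as distinct, incompatible types.
Dollars = NewType("Dollars", float)
Euros = NewType("Euros", float)


def add_dollars(a: Dollars, b: Dollars) -> Dollars:
    """Add two dollar amounts; the signature refuses any other currency."""
    return Dollars(a + b)


total = add_dollars(Dollars(10.0), Dollars(2.5))

# A static checker would reject the call below, because Euros is not Dollars.
# At runtime Python would happily execute it, which is exactly the gap the
# stricter declarations are meant to close.
# add_dollars(Dollars(10.0), Euros(2.5))
```

The cost is visible even here: every value has to be wrapped, and a general-purpose helper that just wants "a number" can no longer accept these without extra ceremony.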


SPEED AND PROGRAMMING

Projects die for all sorts of reasons, but often they die because the programmers squandered their resources and were never able to get past the mountain of technical debt they created.

The idea that every messy piece of code, every inconsistent file, and every quick hack eventually builds up to drive the forward momentum of a project to nearly zero is not a new one. Still, it is ignored so frequently that it is staggering. Making too many short term vs. long term decisions nicely encapsulates itself under the term "debt".

The term "debt" fits well because software development is really bounded by the real world problems from the domain, not by the mathematical ones or the technical ones.

No-one (that I know) has ever failed because they violated or crashed into the halting problem (although some clearly should). The theory sets some pretty broad limits, but the fatalities in software come from more mundane things.

Technical debt, like financial debt, can build up to the point where its effect is tremendous. Ignored, it is frequently fatal.

In that way, my first goal on all of my development projects has always been to "make sure all aspects of the development are tangible". Due to my first point about programming being hugely time consuming, most projects are seriously short on resources before they've even reached the preliminary design stage. Knowing this early on allows for one to make better trade-offs.

Limits are intrinsic in the creation of projects. If you decided to renovate your kitchen, chances are you're not wealthy enough to allow the contractors to spend "whatever it takes" to get the job done. There is some amount of money in your savings account that is bounding the work, before it's even been planned.

In all but a few cases, the same point is true in software. The value of the work, in either dollars, time or man-power is known long in advance of the design. When I was young, I can remember how much that irritated me, how I sensed it was an injustice. However, when I had to write the cheques my perspective changed, and fast. Controlling the resources is tantamount to controlling the risk. Controlling the risk is necessary for success.

One does not want to lose their house because a renovation project went 200X over budget; that would be madness, and the same applies in software.

My point of this little ramble is to justify, as much as possible, the importance for all software projects to leverage their code base to its full extent. Just pounding out masses of code at a high rate is the same as just generating masses of technical debt.

Brute force coding may seem effective in the short run, but it eventually amounts to painting the floor while backing away from the door, i.e., painting oneself into a very tight corner.

It's one of those stupid things that seems on reflection to be obvious, but is still done far too frequently by people who should know better. With that in mind, points 2-4 in the original laws might be summed up with:

3. Every line of code, comments and document counts as work.


And added to by the nearly obvious:

4. It is less work to update a smaller amount of code, comments and documentation.


And here I should point out that by "code" I not only mean all of the code that is written to deal with the technical and domain issues, but also any "code" that is written for packaging, automation and scaffolding (like testing). Code is code is code, it doesn't really matter what it is doing. Comments and documentation are just sloppy code (or code is just strict documentation). They are all the same thing, differentiated only by precision.

In the end, these are all things that people have to expend effort to update, examine or remove. When there are more of them, there is more work.

Ignoring things (like the last version of the user manual) doesn't make them go away, it just builds up more technical debt.

By 'work', I mean any effort expended by any persons for the purpose of managing, containing, checking, meeting, talking about, updating, changing, cleaning up or re-writing or any other effort that somehow involves some amount of dedicated time.

It's easy to forget that even if some chunk of complex code is not receiving constant updates, if it is coming up in discussions, lectures, meetings, etc. it is still "costing" some amount of effort on the people involved. If the coders all whine once a week about how bad a particular library is to use, that's still expended effort, even if it's not constructive effort (and bad morale heavily eats at development resources, whether or not the pin-headed management consultants realize it).


GOALS

My original goal for this set of laws was to lay down something that stressed the importance of leveraging code to its highest capabilities. Why waste it, if you have it?

As a profession, we constantly refuse to do this.

Too many programmers have fallen into the bad practice of just accepting that the bulk of their system won't get re-used. Some programmers have even mastered the black art of explicitly not reusing stuff, so that more or less it is their intrinsic style of programming (each and every line of code has exactly one and only one usage within the system).

Many programmers seem to fear that any extra effort in thinking won't get paid off by the savings in work. Which, in many regards is true initially.

Doing it right, generalizing and making sure that the code is re-usable takes more time, right away. Creating an architecture, doing design, fixing inconsistencies, and not just taking the quick way around all of the problems always takes longer. But not so much longer that it negates the payoffs later in the project.

Unrealistic schedules force us to always have to accept some technical debt, and all of it is absolutely more resource expensive than doing it right the first time. Minimizing it, and carefully picking and choosing, are key choices that can affect the success or failure.

Some programmers fear that highly leveraged code is complex, and changes will propagate to unexpected places. In some ways that is true, but primarily since it's the re-use itself that is helping to enforce consistency across the system. If the code is re-used, but the usage is an inconsistent mess then, of course, changes can have negative consequences, but realistically it's not because of the re-use, it's because the upper body of code is a mess.

From this we get:

5. Lower level code is easier to re-use.
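
A minimal sketch of what that looks like in practice, using hypothetical function names that don't come from any real system: one low-level helper decides how money is displayed, and two higher-level functions lean on it.

```python
def format_money(amount: float) -> str:
    """Low-level helper: the single place that decides how money is shown."""
    return f"${amount:,.2f}"


def invoice_line(item: str, price: float) -> str:
    """Higher-level code re-using the helper."""
    return f"{item}: {format_money(price)}"


def account_summary(owner: str, balance: float) -> str:
    """A second, unrelated caller re-using the same helper."""
    return f"{owner} has a balance of {format_money(balance)}"


print(invoice_line("Widget", 1234.5))
print(account_summary("Alice", 99.0))
```

If the display format ever has to change, it changes in one place and both callers stay consistent. The helper is easy to re-use precisely because it sits low and knows nothing about invoices or accounts.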


A small body of well-formatted, consistent code is worth so much more than some untold millions of lines of crap (well, at least to technical people, the market still occasionally disagrees, but only in the short run).

Thus if we have one and only one main programming principle it should be to always solve the problem with the "absolute minimum" of resources, including code, documentation and testing.

If it isn't necessary, it isn't necessary and shouldn't be done. If there is little or no practical value, then it is just wasting time.

As a side-note, I'm a huge fan of design and architecture. I'll never commit to a project without at least a preliminary architecture, but I've often sensed that people get the wrong ideas about what is necessary.

Creating a big fancy architecture, with hundreds of pages is a waste of time. Creating one with any level of detail above and beyond exactly what is going to be useful is a waste of time. Detail is not the issue. Systems always need an architecture, but only in the context of not wasting too much time in experimental programming, and in not wasting too much time in overall system testing, and in not wasting too much time in operational support triage. For these goals, the lines need to be clearly drawn, but beyond that the specifics are less crucial. So long as you are clear on where the line should be, enforcing that is not hard (and is mostly boring refactoring work).

The architecture -- to have any value at all -- needs to pay for itself and the time it took in producing it. Otherwise it's just another -1 to a project in desperate need of more resources.

Getting back to re-use, it is important that we maximize any and all effort in all levels of the project. While that might appear to be an obvious statement, most people too easily forget about side-effects.

If you built some testing mechanism that required a couple of hours to create, but saves 4 hours of debugging, you've managed to pull ahead by two hours. If you've built some automation for weeks that might have been done manually by a low level employee in a couple of days, it might require years, decades or even centuries to pay off. Patting yourself on the back for "saving time" is a little premature.
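
The arithmetic is simple enough to sketch. The first call uses the numbers from the testing example above; for the automation case I'm assuming three 40-hour weeks to build (120 hours) and 16 hours for each manual run, which are made-up figures just to show the shape of the calculation.

```python
import math


def net_hours_saved(build_hours: float, hours_saved_per_use: float, uses: int) -> float:
    """Hours gained (or lost, if negative) after a given number of uses."""
    return hours_saved_per_use * uses - build_hours


def uses_to_break_even(build_hours: float, hours_saved_per_use: float) -> int:
    """Smallest number of uses at which the effort has paid for itself."""
    return math.ceil(build_hours / hours_saved_per_use)


# Testing mechanism: 2 hours to build, saves 4 hours of debugging once.
test_tool = net_hours_saved(build_hours=2, hours_saved_per_use=4, uses=1)

# Automation (assumed figures): 120 hours to build, replaces a 16-hour manual task.
automation_runs = uses_to_break_even(build_hours=120, hours_saved_per_use=16)

print(test_tool, automation_runs)
```

With those assumed figures the automation needs eight runs to break even. If the manual task only comes up once a year, that is eight years before it has paid for itself.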

It is too easy for projects to invest too much time in the wrong areas.

A skilled and experienced software developer learns not to take any effort for granted. The rewards of the work may or may not pay off the effort. Sometimes it makes more sense to not automate something. To not add fancy diagrams, or to just not commit to some huge effort right away. Sometimes stupid and ugly code is the best approach (rarely, and only at higher levels).

For people who don't know and are trying to figure it out there is a very simple way to think about it. Trace everything backwards through time. If you can't associate the work with some useful tangible (and necessary) effect (either in the project or outside of it), chances are, it has no value and should not be done. In environments with constrained resources, there always needs to be a practical reason for everything. Nothing should be random, nothing should be "just because".

If you're relating the work to "assumed" facts, then it might make some real sense to validate your assumptions before investing too much time. Simple things have a nifty way of getting out of control if they are not tied to specific outcomes (in fact most things out of control get that way precisely because they are unbounded).

All of this comes back to getting the most value out of your work. The biggest investment in effort in all software development projects comes from working on the code (not including sales and operations). You can't find the time to do it right, if you can't figure out how to leverage any existing work. You can't get ahead if you are always behind.

You can't win if you're not making long-term trade-offs, and you can't make these unless you have enough resources. Thus:

6. Minimizing the work allows for making more longer-term trade-offs.


SUMMARY

Computers are amazing machines, but our current (disappointing) state of software construction diminishes their usefulness.

Well-structured programming code is part of the balance required for large and small software projects to be able to find the time to correct any significant issues with their efforts. Poor development practices leave the programming teams scrambling to make tight deadlines, often forcing them to accrue significant technical debt. Debt eats back into the resources, forcing the teams to further compound their weak practices. This downward cycle is fairly standard for most new and upcoming projects.

This cycle can only be broken by minimizing all of the overall work, and then using what is saved to increase the number of long-term choices made. As more of the debt is paid off, the "crunch" from each development cycle will ease and more time can be dedicated towards improving the system, not just patching it.

So I think programming is bound by the following laws:

1. All programming code is time-consuming to write.
2. All programming code has some unexpected behavior (bugs).
3. Every line of code, comments and document counts as work.
4. It is less work to update a smaller amount of code, comments and documentation.
5. Lower level code is easier to re-use.
6. Minimizing the work allows for making more longer-term trade-offs.