The Programmer's Paradox: Bag O'Tricks

There are at least two different approaches to computer programming.

The first approach comes from slowly building up an understanding of coding ‘tricks’. These are simple ways to solve simple problems. Initially, people start with the basic language features: assignments, conditionals, loops, and functions. As they figure them out, these go into their Bag O’Tricks. Then they start adding in language library functions for string handling, files, data structures, and so on. Gradually, as they learn more, their Bag O’Tricks gets larger and larger. Many people move on to adding higher-level paradigms like design patterns. Most add in specific tricks for different technologies, like databases, networks or frameworks. Over time, programmers end up with a fairly large collection of ways to solve lots of sub-problems within different languages and technologies.

When confronted with a new problem, they quickly break it down into sub-problems, continuing until the pieces are small enough to be solved with their existing Bag O’Tricks. If they are curious folk, they generally try to learn more tricks from examples or their fellow programmers. They collect these up and apply them as necessary.

This is a very valid way of programming, but it does have one significant weakness. At any time during their career, a programmer’s Bag O’Tricks contains only a finite number of different solutions. They can arrange them in different ways, but their capabilities are limited by their tricks. That works wonderfully when the problems are the same or substantially similar to ones they have dealt with in the past.

The trouble comes when they encounter a problem that is of a new or different caliber. What happens -- you can see this quite clearly in a lot of code -- is that they start applying their tricks to the sub-problems, but the tricks don’t pack together well enough. These solutions become very Tetris-like: odd-fitting blocks with many gaps between them. Of course, past success clouds present judgment, and since the programmers have no reasonable alternatives -- given the ever-present time constraints -- they keep heading down their chosen path. It’s the only path they know. When this goes wrong, the result is an unstable mess. A problem outside of the scope of a programmer’s tricks is one that they aren’t going to be able to solve satisfactorily. The industry is littered with examples, too numerous to count.

The second approach to programming is to drop the notion that ‘code’ is “the thing”. That is the key, to let go of the idea that creating software is all about assembling lists of instructions for a computer. Yes, there is always ‘code’, but the code itself is only a secondary aspect of a larger issue. The ‘thing’ is what is happening ‘underneath’ the code. The root of everything in the system. The foundation.

Right down at the bottom is data. It is what the users are collecting, what the database is storing and what people are seeing on their screens, reports, everything. All coding problems can be seen in the light that they are just instructions to take data -- stored in one place -- and move it to somewhere else. Along the way, the structure or shape of the data may have to change as part of the move. And on top of the data, there may be a requirement for ‘dynamic data’ that is calculated each time it is used, but this is only to avoid storing that data redundantly. Ultimately it is all about the data.
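As a small illustration of ‘dynamic data’, here is a minimal Python sketch. The `OrderLine` record and its field names are hypothetical, invented just for this example. The stored fields are the real data; the total is calculated each time it is asked for, so it never has to be stored a second time.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    # Stored data: what the user entered and what the database keeps.
    # (Hypothetical fields, chosen only for illustration.)
    unit_price_cents: int
    quantity: int

    @property
    def total_cents(self) -> int:
        # 'Dynamic data': derived on every access instead of being
        # stored redundantly alongside the fields it depends on.
        return self.unit_price_cents * self.quantity


line = OrderLine(unit_price_cents=250, quantity=4)
```

Because the total is derived, changing `quantity` can never leave a stale total behind; there is only one copy of the underlying data.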

So the second approach is to forget about the code. It’s just a vehicle for getting data from somewhere, transforming it and then passing it on. The list of instructions is meaningless; the system is all about how data flows from different locations, is manipulated and then flows elsewhere. You can visualize the entire system as just data moving about, going from disks to memory, heading in from the keyboard and heading out to the network, getting dumped to the printers, being entered and endlessly modified by users. You don’t really need to understand the specifics of the algorithms that tweak it as it moves, but rather just its starting and final structure. The system is the data, and that data is like a car, where the code is simply the highway that the car follows to get to specific locations.

This second approach has considerable advantages. The best one is that a programmer seeing their work as just taking data D and getting it to D’ is no longer restricted by their finite Bag O’Tricks. Although they can permute their tricks endlessly, they are still heavily restricted from solving particular problems correctly. But a transformation from what is essentially one data-structure to another is a well-defined problem. There may be some sub-algorithmic issues involved in the transformation, but once broken down into discrete pieces, figuring out the code or researching how to do it properly are very tangible activities. So the programmers are in a good place to solve the system problems correctly, rather than just trying to endlessly combine tricks in the hopes that most issues go away.
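To make the D to D’ idea concrete, here is a rough Python sketch under made-up assumptions: D is a flat list of `(customer, item, quantity)` rows, and D’ is a nested mapping of totals per customer and item. The shapes and the function name `to_totals_by_customer` are hypothetical; the point is only that once both structures are written down, the code in between is a well-defined problem.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# D: flat rows, as they might arrive from a file or a query.
Row = Tuple[str, str, int]  # (customer, item, quantity)

# D': totals nested by customer, then by item.
Totals = Dict[str, Dict[str, int]]


def to_totals_by_customer(rows: List[Row]) -> Totals:
    """Transform the flat structure D into the nested structure D'."""
    totals = defaultdict(lambda: defaultdict(int))
    for customer, item, quantity in rows:
        totals[customer][item] += quantity
    # Convert to plain dicts so the result has exactly the shape of D'.
    return {customer: dict(items) for customer, items in totals.items()}


rows = [("ann", "bolt", 2), ("bob", "nut", 1), ("ann", "bolt", 3)]
result = to_totals_by_customer(rows)
```

With these sample rows, `result` groups both of ann's bolt rows into a single total and keeps bob's row separate.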

Another major advantage is that a data perspective on the code allows for easy and natural optimizations. The programmer is no longer combining pieces, which often throw the data through unwanted gyrations. Instead, the data goes directly from point A to point B. It’s a straight line from one format to another. As well, the programmer can widen their scope from just a localized problem all the way up to ‘every use’ of a particular type of data, anywhere in the system. This opens up huge possibilities for macro-optimizations that generally provide huge boosts to the overall performance.

One common difficulty in software development is system upgrades. The code upgrades really easily: you just replace a block you don’t like with a block that you do. Data, however, is a major pain. Upgrades force merges, backfilling and endless restructuring. If you are initially focused on the code, then the upgrade problem gets ignored, and it quietly grows and becomes dangerous. Focusing on the data, however, brings it front and center. It becomes just another sub-problem of moving the data from one place to another, but this time across versions rather than just inside of the system. It makes tackling a big upgrade problem no worse than any other aspect of the system.
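Seen this way, an upgrade is just one more transformation, from the old shape of a record to the new one. The Python sketch below assumes a hypothetical record format: version 1 keeps a single `name` field, and version 2 splits it into two fields and backfills a missing `email` with `None`. The field names and the `MIGRATIONS` table are invented for illustration, not taken from any real system.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def upgrade_v1_to_v2(record: Record) -> Record:
    # Hypothetical change: v1 stored one 'name'; v2 splits it in two
    # and backfills 'email' with None when v1 never recorded one.
    first, _, last = record["name"].partition(" ")
    return {
        "version": 2,
        "first_name": first,
        "last_name": last,
        "email": record.get("email"),
    }


# One step per version, so upgrades can be chained across releases.
MIGRATIONS: Dict[int, Callable[[Record], Record]] = {1: upgrade_v1_to_v2}


def upgrade(record: Record, target_version: int) -> Record:
    """Move a record forward one version at a time until it reaches the target."""
    current = dict(record)
    while current["version"] < target_version:
        current = MIGRATIONS[current["version"]](current)
    return current


old = {"version": 1, "name": "Ada Lovelace"}
new = upgrade(old, target_version=2)
```

Each later version would add one more entry to the table, so the upgrade path stays a chain of small, well-defined data moves.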

Added to all of this, it is far easier to visualize the data moving about in a system instead of seeing a mountain of poorly organized code. This makes architecture, debugging and testing far simpler as well. For example, a large inventory system with lots of eclectic functionality becomes conceptually simple when viewed as just a way to collect and display very specific items. This twist then leads to ways to combine and organize the existing functionality so that it is easier for the user to wield. Generalizations flow naturally.

Over the years, I’ve seen many a programmer hit the wall with their current Bag O’Tricks approach. Their ability to correctly solve problems is limited, so it is easy for them to get into a position where their code becomes convoluted. However, seeing the data first breezes right through these issues. It becomes very quick and convenient to either manipulate the data into the correct form or to determine if such manipulations are even possible (there are many unsolvable problems). Getting back to the earlier analogy, if you don’t have a viable car, you don’t really need to consider which off-ramp would be best.

Often I like to refer to programmers who rely solely on their Bag O’Tricks as having ‘one eye open’. The programmer may be very good at coding, but they’re too constrained by the limits of their existing tricks. If they spend their career staying within those boundaries, there are no problems. But if they want to get out there and build spectacular things that people will love, then they’ve got to get that second eye open as well. Once they’ve done that, they are no longer limited by what they know, just by the available time and their ability to correctly analyze the problem space. A whole new world of possibilities opens up. They just have to learn to change their perspective.

Tuesday, May 8, 2012
