The Programmer's Paradox: About, Next and Garbage

Trouble often starts trivially. It's those little subtle irrationalities that most people don't think worth investigating, that gradually percolate their way into bigger, more significant issues.

Most professions accept these inconsistencies as the bias of their conventions. Software developers, however, straddle the abstract mathematical world of computing machines, and the grittiness of the real world. Those tiny little things that others might miss become essential for us to understand, or we end up watching them slowly eat away at our efforts.

A fascination with the trivial is a healthy one for someone whose income depends on mapping the real world down to a deterministic set of symbolic tokens.

The mathematical world is clean and simple. We can get so used to it that we hopelessly try to project it back onto the real world. Why are things so messy? Why can't people get organized? Why do stupid mistakes happen, and then not get caught early? We mistakenly seek to apply order to the chaos.

When you spend all day contrasting these two very different worlds, it is easy to fail to appreciate the depth of the real one. But keeping the two separate, and having separate expectations for each one, makes it easier to correctly map back and forth. If we're not trying to jam mathematical purity onto the real world we're far less likely to be disappointed by the results.

Still, where and when things are strange, erratic or inconsistent, we need to pay closer attention. Computer Science begins and ends with trivialities, they inhibit the world around us and we have to learn to deal with them.

In the following simple examples, I'll start with three really trivial issues, then get into two more that are domain-specific. I will initially present each one of the five as simply as possible, with little or no conclusions. Each of them is trivial in their own right, yet each, in their own way are examples of the very small things that work upwards, causing greater issues.

GARBAGE

Garbage day comes once a week. Thanks to diminishing space at toxic landfills, we're gradually separating more and more of our household garbage into specific sub-categories. At first, it was just glass, then metal cans, then cardboard and more recently compostable food materials.

Over the decades, garbage has gotten considerably more complex. What started with simple garbage bags, morphed into garbage cans (to save the bags), blue boxes for glass and cans, grey boxes for paper and finally green bins for food scraps. Recently, though the grey boxes have been dropped, their contents now allowed in a newer merged blue bin.

Since it is expensive to pick up all of the different types each week, they alternate on a weekly basis. Every week they pick up the green bin, but they switch between the recyclables one week and the rest, the week after. To make it faster for pickup, and to allow them to charge us more money they provided new (huge) bins to all of the people, nicely labeled 'garbage'.

So, now garbage day is either a recycle week, or a garbage week. Sadly, the term garbage now means both the overall stuff we throw away, and all of the stuff we throw away that isn't something else (a wildcard). We have two distinctly different uses for the word garbage.

NEXT

"Let's get together next Saturday" seems like such a simple sentence. If today is Tuesday, for many people this is an invitation to meet in 11 days. The closest Saturday is often referred to as "this Saturday", while the proceeding one is "next Saturday".

Of course, the dictionary definition of the word 'next' likes to use the term "nearest" in its definition. If you were sitting on a chair, the "next chair" would be the one you could reach out and touch (if it were close enough). Your next-door neighbor is the house adjacent.

Some people -- but they seem to be in the minority -- would take that initial invitation to mean a meeting in 4 days, instead of 11. Next for them, really is the next thing, regardless of position.

Of course, it might just be that "next Saturday" is a short-form for something like "next week, Saturday". There might be some reasonable historic answer for why next doesn't reference the nearest thing but instead references something two positions over.

ABOVE

Once, while playing a guessing game, I was asked if the name of an object started with a letter above or below F.

We read text from left to right, top to bottom, so if we were to write out the alphabet, it would be in that order. On a very narrow sheet of paper, A would start out on top and the letters would descend, one after each other. In that sense, A is clearly above F.

In a computer, however, the letters are all encoded as numbers. A starts with relatively low value, and it gets larger and larger until we get to Z. In a numerical sense, it is not uncommon to take a number like 28 and say that it is higher than 2, which implies that it is above 2. In that sense, as encoded numbers, Z is clearly above F.

So is A above F, or is Z above F?

BOND CALCULATIONS

Bonds are a fairly complex financial instrument that is used to indicate a debt between two parties. One party essentially borrows money from the other, at the terms specified by the bond. Because the instrument is generic, the money loaner is free to sell the bond to other holders, if they no longer wish to keep it (they may need the underlying cashback to do something else with it).

Lots of people like buying bonds because they are a reasonably safe and consistent instrument. There is a big market. However, for the sellers (borrowers), the cost of issuing a bond can be significant. They are interested in raising the most cash they can for their efforts. Their focus is on finding new ways to get the buyers to over-value their instruments and pay more. On the flip side, with all of the buyers out there, bonds with more complicated features are likely to be under-valued. Strangely this makes buyers interested in them, also hoping for a deal. Hoping that they are under-valued.

As such, the bond market is continually turning out new types of bonds with increasingly complex features, so as to obscure the price. Both buyers and sellers are placing small implicit bets that the other players can't or don't know how to value the instrument correctly. Even after hundreds of years, there is a steady stream of new types of financial instruments getting created. The change, and occasionally the scandals caused by gross under-valuations of instruments (like a CDS), are important parts of keeping the pressure on the markets to balance the real and perceived value of everyone's investments. Gradually, people always start believing that things are worth a bit more than they really are. We are optimistic by nature.

PRICE QUOTES

Big companies have to buy things, often in great quantities. In an age where everyone is looking for the deal, many different industries have grown up to supply these needs. Since a large company holds a tremendous amount of purchasing power, they often use that as a tactical weapon against their own suppliers.

In industries where this is common, the suppliers generally create individual and specific quotes for all of their available goods and services. Quoting is a time consuming manual process, which may seem out-of-date in the computer age, but there is an underlying need for it.

Most suppliers have quantitative price breakdowns for their wares, or at least they have some reasonable idea about how much they need to charge in order to stay in business. Those numbers are nice, but with their clients occasionally and inconsistently trying to force through a bargain of some type, suppliers have to continually make up for lost revenues.

Thus the prices for most items fluctuate depending on how negatively or positivity the business dealings with the company have been. In short, if a big company forces its supplier into a deal, the supplier will record that, and eventually, the supplier will recoup the money (and often lots more). There is constant tension on the relationship, as both parties try to outmaneuver each other.

FROM ABOVE

Getting back to 'garbage', we see that in building computer systems it is not uncommon to come across terminology that while matching domain conventions, is horribly inconsistent. it just builds up that way.

The domain experts get it, yet an outsider would have little hope quickly detecting the ambiguities. We step on these types of landmines frequently, like natural language, and our sciences are founded on a base of massive inconsistencies.

Even in a new discipline like Computer Science, it is still not uncommon to find a definition coming from an established work like "the Mythical Man Month" by Frederick P. Brookes, to be using Aristotle's ancient definition of the word accidental, rather than a more modern one. The differences in meaning make a significant literal different from the ideas.

Even with respect to some term as simple and obvious as 'next', we do not come either to an agreement or a consistent definition. It is well defined, yet not used that way. If most people use a term in a manner that disagrees with its definition, the convention easily overrides the definition. But when slightly more than half do, it becomes complicated. A significant enough number of people have a more intuitive definition for "next Saturday" so that it is an exceptionally risky way to specify a date. The term is ambiguous enough to render it useless.

Relative terms, like 'above', are especially at risk since their conversion into an absolute piece of information can easily be tied to perspective. If the term is not used heavily, and it has at least one intuitive meaning, we have to be careful that we're not assuming that our general understanding of it, is the correct one. Because of this, it is always advised that we specify everything in absolute terms if we want to make sure there are no problems. Relative phrasing courts miscommunication.

Even if we're not getting lost in multiple contradictory definitions, the real world holds a tremendously large amount of inconsistency and irrationality. Things in the mathematical world are clean and simple, yet the real world, even with simple issues is deep and complex.

The underlying nature of the bond industry, for example, forces a constant inherent complexity over the whole process. In order to give the various different parties advantage over each other in extracting money, the underlying financial math is needlessly complicated. It's a game in which both sides are equally guilty in wanting it to be confusing. Both are hoping to win.

Sometimes, the real problems come mainly from one party, as in pricing. If it wasn't for a steady supply of aggressive executives out to make their careers by gauging deals from suppliers, the pricing could be considerably more rational. While the excutroids get their deals, the suppliers often win bigger in the long run and use their client's nasty tactics as an excuse to over-charge. There is little incentive to stop these types of dealing on either side. The system is founded on an irrational, steady stream of price haggling. Most pricing has a few irrational components built-in.

Business itself -- the more you dig into it -- is essentially irrational. Or at very least, like whether it is so inherently complex, that one can understand the big picture while still not being able to predict whether or not it will rain the next day.

For all of the big business theories, courses, management consultants and universities claiming to understand and quantify it, the whole system always migrates back to an irrational state of being. It does this because if it didn't then it would be really easy for every party to win, which means that none would. A rational business environment would be a fair one for all, but that does not suit any smaller subset of people in business together. Fair does not equal profits.

FINALITY

Software development is about creating computer systems to build up large piles of data. Done well, we can use that data to automate and improve many different aspects of our lives.

But in building these systems we have to dig deeply into the underlying domains and try to impose a rational and deterministic structure on top. This is no trivial feat because all of this domain information is rooted in a long history of messy and irrational elements.

It is great that we get to dig deeply into other people's business, but it is also surprisingly frustrating at times. The deeper we dig, the scarier it becomes.

The biggest and most common mistake software developers make is confusing the 10,000 feet view with actually understanding the underlying domain.

Any and all of our assumptions are dangerous and have a high probability of being wrong. To get past this, you have to start by not expecting the trivial to be trivial. If you are always willing to accept the possibility that a misunderstanding has crept into the picture, then you're able to prevent yourself from being too fragile. Assumptions, even simple ones, are the source of most analytical errors.

Still, most professions constrain their practitioners into just keeping up with their own industry. Software developers, however, to survive year after year, have to both be experts in building systems, and also experts in a specific domain. General programming knowledge is good for the first five years of a programming career, but it's the domain expertise that makes it possible to last longer.

Unless you're only interesting in building software for a fraction of your career, it is important to start building up significant domain knowledge. Even, if like myself, you find yourself skipping back and forth between radically different domains, digging often helps to give one a good sense of the world around them.

The Programmer's Paradox

Sunday, March 29, 2009

About, Next and Garbage

No comments:

Post a Comment