Friday, January 27, 2012

The First Principle of Software

The very worst thing that any computer can do is lie to you. You have to be able to trust it when it says it will accomplish some work on your behalf. If it tells you one thing, then goes off and does something completely different -- causing a huge mess -- you have the right to be mad. If it shows you deliberately incorrect or misnamed data then you should be pissed. Except during hardware failure, computers can precisely record, manipulate and display data without any of the frailties of human nature. Users should be able to rely on this.

Thus the very first ‘most sacred’ principle of building software is that it should never, ever, mislead the user in any way. The software shouldn’t lie to them. That sounds simple enough but the current state of our industry is to wantonly ignore this principle.

Most programmers think of ‘users’ as the end-users, but they aren’t the only ones who rely on the software. Operations people are users too, and so are all of the other programmers that are directly or indirectly leveraging the code. All of these people are on the front-line of utilizing a programmer’s list of instructions to the machine. Behind them often stands another huge group of people who rely on the information coming out of these systems. Thus even a simple problem can affect a massive number of people. There are far more primary and secondary users than most programmers realize. If a programmer writes code that lies to people, they could be lying to a really large group of people for a very long time.

There are many different interpretations of ‘lying’ -- our modern age has really lowered the bar for this word -- but it is commonly considered to be ‘intentional’ false information. However, a computer itself has no hidden agenda. It is just happily doing what it is told. So if there is ‘intent’, it lies with the programmers. Sometimes the programmers that assembled the initial list of ‘instructions’ are trying to be malicious -- viruses are a real and present danger -- but most often the programmer’s lack of knowledge, time pressure or just plain laziness is behind the false results. Given what we know collectively as a profession, it is clear that these causes are all preventable, which makes ignoring them an intentional act. Thus a programmer who deliberately ignores any long-established conventions about building robust software is intentionally misleading the users.

Software can lie to users in many different ways including:
  1. Misnaming the data.
  2. Misnaming the code and/or functionality.
  3. Misrepresenting the status of work.

The first issue is the simplest. If you write code to store people’s full names into a system, then that data should be named something appropriate like ‘fullName’. That applies not only to the database that stores the data, but also to any interface that handles it at any time. Wherever the data is stored or presented, any associated names should be valid. Now if you put the data into a field like ‘address’, which in this case isn’t related to someone’s full name in any way, then the data is misnamed in the system. You are lying about it. Also, if you invent your own abbreviation that is non-standard and only known by you, such as ‘sFlNm’, then you are also lying, since no one else knows what it means and it is essentially garbage text.
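To make the naming point concrete, here is a minimal sketch in Python. The `Person` record and the `to_row` helper are hypothetical names invented for this illustration, and `full_name` is just the snake_case spelling of the ‘fullName’ field mentioned above. The idea is only that the same honest name follows the data from the code to whatever gets stored.

```python
from dataclasses import dataclass, asdict


@dataclass
class Person:
    # Each field holds exactly what its name says (hypothetical record).
    full_name: str
    address: str


def to_row(person: Person) -> dict:
    """Build a storage row whose column names match the field names."""
    return asdict(person)


# The misleading alternatives described above would look like this:
#   Person(full_name="12 Example Street", address="Ada Lovelace")  # swapped
#   sFlNm = "Ada Lovelace"                                         # private code
```

Because the row keys come straight from the field names, nobody reading the stored data has to guess which column holds the full name.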

Of course, sometimes we want the code to handle a more generic range of data, so to do this we can lift up the terminology to something like ‘name’, ‘text’ or even ‘string’. That type of categorical lifting isn’t misleading, it’s just using a more general term that still pertains to the underlying data.
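As a small sketch of that kind of lifting, assume a hypothetical helper that tidies up whitespace. It works on any string, so calling its parameter ‘text’ is still truthful even when the caller happens to pass a full name.

```python
def collapse_whitespace(text: str) -> str:
    """Trim the ends and squeeze internal runs of whitespace to one space."""
    return " ".join(text.split())


# The caller keeps the specific name; the generic helper keeps the general one.
full_name = collapse_whitespace("  Ada   Lovelace ")
```

Naming the parameter ‘full_name’ here would be the misleading choice, since the function makes no assumptions about names at all.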

In a system where all of the long-term (persistent) data is stored in a technology like a relational database, anyone should be able to go directly to the schema and, if they understand the data model, know exactly what each and every datum stored there means. Most of these data-store technologies have the capacity to support reasonable naming standards; old limitations like 8 character names have long since vanished into the dregs of history, so there is rarely a valid excuse for not naming stuff appropriately. It’s a matter of professionalism.

But it’s not only the names of that data that matter. Modern software relies on millions, if not hundreds of millions, of lines of code, all of it whipping data back and forth around the system. Since the software industry is subject to on-going growth and change, very little programming code is ever just worked on by a single programmer. So if you belt out some code with cryptic variable names, then leave, you are essentially going to lie to the next programmer that unfortunately comes along. If you’re not lying to them, then it’s because they’ve junked all of your work, which isn’t particularly nice either. Methods, functions, configuration files, processes, etc. all require correct names. If a professional programmer has to name things, then those names have to be correct.

One of the worst things software can do is insist that it completed some work when it didn’t. This is easily the trickiest issue when it comes to lying with software, but solutions to mitigate or prevent problems have been around for nearly half a century.

For instance, we do have ways of ensuring reasonable transactional integrity that can be implemented in products. However, getting this type of code correct requires significant knowledge of the underlying issues, theory and practice. In reality, few programmers ever bother to access this type of knowledge, so they end up re-inventing broken versions of it. This hasty behavior shows up in many areas of software development including transactions, locking, concurrency and distributed processing. Failures in these areas, coupled with bad error handling, often produce misleading results. The software ends up lying to its users. Except in very rare cases, a professional programmer with the right prerequisite knowledge can avoid this type of lie.
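As one illustration of leaning on an existing transaction mechanism rather than reinventing it, here is a minimal sketch using Python’s built-in `sqlite3` module. The `accounts` table, the `transfer` function and the demo data are all assumptions made up for this example. The point is that either both updates are committed or neither is, and the return value reports only what actually happened.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


def transfer(conn: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> bool:
    """Move amount between accounts; return True only if it was committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            debit = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?",
                (amount, from_id, amount),
            )
            if debit.rowcount != 1:
                raise ValueError("unknown source account or insufficient funds")
            credit = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            if credit.rowcount != 1:
                raise ValueError("unknown destination account")
    except ValueError:
        # The rollback has already undone any partial debit.
        return False
    # Database errors (sqlite3.Error) are deliberately not swallowed here.
    return True
```

If the credit step fails, the debit is rolled back and the caller is told `False`, so the function never claims work that did not happen. A real system would need more than this (concurrency, retries, auditing), but the honest status report is the part that matters for this argument.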

With the exception of some unavoidable coordination issues, there is no reason for a computer to ever lie to its users. Someone may have typed in the wrong values for the data but the computer should never contribute to the problem. It collects stuff, distills it and then shows it. It doesn’t need to distort what it shows in unreasonable ways. So it is pretty clear that the people creating the instructions for the computer should follow suit. It’s a matter of professionalism. We don’t need to misname data, obfuscate our code or reinvent broken functionality. These mistakes are avoidable and avoiding them is the foundation of providing usable software. Almost all violations of the first principle of software are completely unnecessary.