Saturday, February 10, 2018

The Value of Software

A huge whack of code, on its own, is pretty useless. The code needs data to process. The user’s real-world issues only get resolved when the software enables better decisions based on the persistent data. Data is the heart and soul of any software system.

What’s really changed over the decades is not that the code we write got any better, or that the frameworks and libraries became easier to use, but rather that data became hugely abundant. The World Wide Web is a great example. From a coding perspective, it is a dog’s breakfast of good and bad ideas munged together in rapidly evolving chaos that often barely works and has had exponential explosions of obscene complexity. Technically, taken all together, it is pretty crappy. But the Web itself, at least until recently, was a monumental achievement. With a bit of patience and perseverance, you could find a tremendous amount of knowledge. It opened up the world, allowing people to learn about things that were previously off limits. It was the data, the sheer scale of it, that made the Web wonderful.

That explosion, of course, is diminishing now, as the quality of available data is watered down by noise. The tools we built were predicated on not having enough data; they cannot deal with having too much low-quality stuff.

Programmers still, however, hyperfocus on code. It’s as if an ‘algorithm’ really has the power to save the day. They think all we need is something better that can magically separate the good stuff from the bad. Ironically, for our industry, we have known for at least half a century that shoving garbage into code produces garbage as output. And, at least algorithmically, nothing short of intelligence can reliably distinguish data quality. If we want better data then we have to train an army of people to sift through the garbage to find it. The best we can do is to craft tools that would make this work less painful.

The promise of computers was that they could remove some of the drudgery from our lives, that they could help keep us better organized and better informed. The reality is they now waste our time, while constantly distracting us from the stuff that really matters. The flaky code we rely on is a big issue, but the real problems come from streams and streams of bad data. If we let everyone type in whatever they want, then not only does the conversation degrade, but it also becomes increasingly difficult to pick out value. A massive network of digital noise isn’t going to drive positive changes. Code is a multiplier; the real underlying value of software comes from the quality of the data that it captures. We won’t get better software systems until we learn how to get better data.