Saturday, May 17, 2014


For as long as I've been following the industry, hardware innovations have always been the leading indicator of where software is heading. 

Mainframes were followed by minis, then micros, networks and servers, and now smart phones and tablets. A few years after each new wave of hardware, software responded with its own set of waves: operating systems, better languages, data structures, GUIs, networks, libraries, client-server, object-oriented programming, the web, NUIs, distributed systems and mobile apps. These came as a response to the increased capability of the new hardware. Since it was all new territory, most of these software technologies were built from scratch. In this fashion we've built up billions of lines of code, in multiple distinct generations, managed by millions of programmers. 

In a very general sense, software has never had the chance to really mature because adapting to the next wave of hardware took precedence. This dates back well before my time and might loosely fall under the category of "the software crisis" from the 60s. It is an ongoing epidemic in which each new generation of programmers ignores the knowledge of the previous ones because it is too busy keeping up with the new demands brought on by advances in hardware. We don't stop to fix the existing problems; we just move on to the next ones. 

The Internet of Things promises to continue this trend for yet another round. There'll be another huge wave of small hardware devices, followed by billions of lines of code trying to do something useful with all of these interfaces and the data they collect.

The 'crisis' part of this evolution is that although software has innovated to catch up, its general quality has been sacrificed to keep pace. A big change in approach like object-oriented programming did open the doors for us to build considerably larger systems, but each new generation compensated for the increased complexity by giving up some aspect of quality. 

There are, of course, lots of counterexamples of great modern software that is extremely well-built, but considering the billions of lines of code out there today, they just don't account for a sizeable share of it. Most code is fragile. More importantly, our critical data is kept on or processed by older, more reliable technologies like mainframes, COBOL or even Fortran. That's partly because they are better debugged by now, but also because they were better engineered. Cruder hardware forced more concentration on the solutions, which resulted in software that was smaller and more dependable. More thought went into the work, because there was more time available for it.

All of the new hardware coming out for the Internet of Things will once again hide this basic problem. Data was initially generated mostly by people, so it was small, prone to error and needed extensive editing. But it was highly structured and deep. With hardware generating the data, the opposite will be true. The result is a massive amount of data that is extremely accurate, but also very shallow. Mostly it will be endless time-series of specific points in the real world, like a living room lamp. We'll get great volumes of stuff like ON 3:46pm, OFF 4:23pm, ON 7:23pm, and so on. In a sense, it is data that just 'outlines' some base information, like the fact that someone came home at quarter to four, stayed for a while, and then returned later for dinner. It doesn't say that directly, but it can be inferred.
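To make the 'shallow data' point concrete, here is a minimal Python sketch that turns the lamp's raw ON/OFF events into on-intervals. The event list and the `on_intervals` helper are hypothetical, assuming a single lamp that reports timestamped state changes in order. Any conclusion about who was home is still an inference layered on top; the code itself does not make it.

```python
from datetime import datetime

# Hypothetical raw events from a single living room lamp: (time, state).
events = [("3:46pm", "ON"), ("4:23pm", "OFF"), ("7:23pm", "ON")]


def parse(t):
    """Parse a clock time like '3:46pm' into a datetime.time."""
    return datetime.strptime(t, "%I:%M%p").time()


def on_intervals(events):
    """Pair each ON with the next OFF.

    Repeated ONs while already on, or OFFs while already off, are ignored.
    A trailing ON with no matching OFF becomes an open interval (end=None).
    """
    intervals = []
    start = None
    for t, state in events:
        if state == "ON" and start is None:
            start = parse(t)
        elif state == "OFF" and start is not None:
            intervals.append((start, parse(t)))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


for start, end in on_intervals(events):
    end_text = end.strftime("%H:%M") if end is not None else "still on"
    print(f"lamp on {start:%H:%M} -> {end_text}")
```

This prints two intervals, 15:46 to 16:23 and 19:23 onward. Nothing in that output says "someone came home"; that meaning only emerges when a person, or more software, interprets the shape of the data.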

The rush to utilize this new hardware input will once again obscure the fundamental problems with software development. We'll get an app to flick the lights on and off, for example, but beyond the basic tricks, the software side will only marginally improve our lives at the cost of making them more complex once again. Now, to turn on the lights, you'll have to fumble around to find that lamp app buried among dozens of others. 

In fact, this has been the ongoing trend for software since the beginning. Computers haven't increased the 'quality' of our lives as much as they could have, and at the same time they've allowed us to amplify the complexities of our societies to a dangerous level. 

This is best seen in all of the security problems we've been having lately. It has become far too easy for people to get important data on anything they want, and frequently they choose to use this data in ways that harm both individuals and our societies. Identity theft, credit card fraud and illegal spying are all consequences of the software industry's inability to reliably create large software systems. Since security isn't visible to consumers, it is one of those issues that can be left for later.

Added to that, there are great questions that we could answer now about various behaviours of 'things' in our societies. We've collected the data and it is stored, but unfortunately it spans so many repositories that it is essentially unmineable. We converged on this 'siloed' version of our collective data because the problems we could have tackled are just too large for any single individual to comprehend, and there have been no market or financial incentives for organizations or individuals in the software industry to work together to solve this. Thus we have very few standards, many, many exceptions and great walls between our data.

With more hardware on the way, the industry will obviously continue on its current trajectory. We'll just build more hardcoded ways to view the new data, and gradually combine little bits of it together, here and there. The new features will keep people amused long enough that they forget about all of the underlying bad things that are happening. That will last us at least another decade at its present rate, but what then? What do we do when we have trillions of lines of code, erratically spread all over, generating data that makes petabytes look tiny?

At some point, hardware just isn't going to be able to cover up the sins of the software industry anymore. Software will start to change very slowly and there will be a great outcry for it to start living up to its promise. Of course, by then the collective mess that we've built will be well beyond any individual or group's comprehension. And worse, it will be nearly locked in stone because we won't have the resources to chase through the tangled webs to unravel the fundamental problems. Still, once progress is no longer apparent, there will finally be a demand to correct what was so hastily thrown together.

I hesitate to guess what happens at that point, but I think many people are just hoping we invent artificial intelligences by then and can make it their problem. They'll have the patience to slowly unwind what we've built up over decades of haste. With a bit of luck we can avoid the consequences of our immaturity.

We could, of course, start working on research to find a solution to our growing problem. Most likely it would take decades and involve going all of the way back to the basic models of computation. There are a few groups out there with this type of goal, but there seems to be very little funding available to really put significant effort into it. It's not short-term enough to interest investors, and if something great were discovered, it wouldn't achieve widespread usage unless it was essentially 'shared' with the rest of the world. Those qualities are generally bad for profits. It would also be too long, deep and slow to be a viable goal for movements like open source. Any notoriety or payoff would come so late that interest would have long since waned. People would just give up and move on to something more topical. 

With all of that in mind, what we'll most likely see over the next decade is a new round of technology, driven by the new hardware and meant to capitalize on the Internet of Things wave. It will probably center around being distributed, and will need to handle large volumes of simple data. These waves usually come with a new round of languages, so some of the more interesting existing distributed languages might grow in popularity, but it is more likely that the next set will be loosely based on what is out there right now. Like all of the other waves, it will ignore the ongoing problems with version changes, and it may even degrade the representational quality of the data itself to save space, time or both. It will be harder to install, and it will inadvertently take more effort to keep constantly upgraded; both of these attributes have been getting worse lately. Of course, its security will be abysmal and any serious logging or auditing will be absent (sacrificed to save space). Still, people will flock to it, and the next generation of coders will pronounce the rest of us out of touch with modern software for not degrading our practices enough to keep up.

History will repeat itself at least once more. Some trends are just too strong to escape. On a related note, if anyone has millions of spare dollars and a hankering to finally break this cycle, I've got enough crazy ideas to last decades...