Tuesday, July 31, 2007

Two Great Problems

For me, the two greatest programming problems in Computer Science right now are a) how to bring masses of data together and b) how to easily deploy functionality. Certainly there are lots of people working on parts of these problems, but it does not seem as if people have really put the issues into focus or looked at the bigger picture.

Underneath, software is just a tool to manipulate data. We can capture massive amounts of data, but we have trouble using it. There are enough degrees of freedom in our technologies that each group of developers can choose to implement their models differently. As such, it is a higher order problem in general to combine any two sets of data. No amount of code or algorithms will ever solve this issue. If we can't bind the data together, then we can't make use of it as a single collection. Concepts such as data warehouses try to avoid the issue by making copies of the data in other locations in other formats. The amount of effort and administration needed to make this work is tremendous, but many organizations, if they persevere, have been able to keep these types of systems running. The longer the system runs, the more of an undertow builds up against it, making it harder to change. At some point, the frequency of changes crosses over the threshold of barriers against change. Unless corrected, the pending changes to the project grow faster than the ability to make them.
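To make the point about differing models concrete, here is a minimal sketch in Python. The two teams, their field names, and the shared target shape are all hypothetical, invented only for illustration. It shows that the same customer, modelled two ways, cannot be combined until someone who understands both models writes the mapping by hand.

```python
from datetime import date

# Two hypothetical teams store the same customer in different models.
team_a = [{"id": 17, "name": "Ada Lovelace", "joined": "2007-07-31"}]
team_b = [{"cust_no": "C-0017", "first": "Ada", "last": "Lovelace",
           "since": date(2007, 7, 31)}]


def from_team_a(row):
    # Team A keeps a numeric id, a single name field, and an ISO date string.
    return {
        "customer_id": row["id"],
        "full_name": row["name"],
        "joined": date.fromisoformat(row["joined"]),
    }


def from_team_b(row):
    # Team B keeps a prefixed string id, split names, and a real date object.
    return {
        "customer_id": int(row["cust_no"].split("-")[1]),
        "full_name": f'{row["first"]} {row["last"]}',
        "joined": row["since"],
    }


# Only after both mappings exist can the data act as a single collection.
combined = [from_team_a(r) for r in team_a] + [from_team_b(r) for r in team_b]
```

Neither mapping function could be derived from the data alone. Each one encodes a human decision about what the fields mean, which is where the higher order reasoning comes in.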

The other big approach to combining data comes from the Internet search folks. They can combine masses of data, but they do it by essentially removing all of the type information and making it into one big mass of characters. It is an interesting approach, but without maintaining structure on the data, we become severely limited in the types of questions we can ask. We also move away from discrete working algorithms that provide 100% accuracy, into messy statistical heuristics that only answer the question for some percentage of the data. The results are interesting, but somehow they appear crude when we consider what types of calculations can really be done by a computer.
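A small sketch in Python of what is lost when structure is removed. The records and function names are made up for illustration, and the keyword match is a deliberately crude stand-in for a search engine, not how any real one works.

```python
# Hypothetical structured records.
records = [
    {"item": "laptop", "price": 1200, "city": "Toronto"},
    {"item": "phone", "price": 300, "city": "Ottawa"},
    {"item": "desk", "price": 1200, "city": "Ottawa"},
]


def structured_query(rows, max_price):
    # With types intact we can ask an exact question: price below a limit.
    return [r["item"] for r in rows if r["price"] < max_price]


def flatten(row):
    # Discard the field names and types, leaving one run of characters.
    return " ".join(str(v) for v in row.values())


def keyword_search(rows, term):
    # On flattened text, all we can do is look for matching characters.
    return [r["item"] for r in rows if term in flatten(r)]
```

Here `structured_query(records, 1000)` can answer "which items cost less than 1000", while `keyword_search` can only find records that happen to contain a given string such as "1200". The comparison itself has no meaning once the price is just characters.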

Most functionality out there is pretty simple. It wouldn't take long to write it, but we are forced to write all of the other bits and bytes that are necessary to wrap it up, including the application and the packaging. If it came down to a simple set of manipulations that were allowable on a specific set of data, we could make significant progress in enhancing our applications. There have been attempts at frameworks or simplified languages, but Frederick P. Brooks's insistence that there is no such thing as a Silver Bullet has scared away most people from delving into this issue. While we could never remove the issue -- it too involves higher order reasoning -- that doesn't imply that we can't build a foundation onto which new functionality is easily integrated. The limits of the foundation carry over into the limits of the functionality, but we can easily build a simple, very wide foundation. There is nothing magical here.

I think both problems are within our reach, but to solve them we need to stop proceeding along our current path. Generally now we just pound out reams of code, over and over again in each new and upcoming technology, without really looking at the problems we are trying to solve. The entire industry, and that probably includes the academic community as well, is so focused on coding that we have forgotten to ask whether or not we are writing the 'right' code. It is unlikely that we are, mostly because we wrote the same thing last decade, and then the decade before that... We have also forgotten to ask if we are using the right procedures to build the code, "but that's another story" as Hammy Hamster would say.