Sunday, May 31, 2015

Get Serious

It is time we get serious about building software systems.

Small programs used for entertainment can be hastily written, but they do not scale up for industrial usage. Large organizations need massive amounts of computation that they can rely upon, both to keep up with the ever-changing times and to master the dragons of complexity that we have unleashed.

So far, in software development, all we've been doing is idly tossing code at a sea of independent silos, while the more questionable people in our industry keep making fabulous claims about how their stuff, this time, will really fix everybody else's problems. The thing is that each time we dump in more disjoint technology, we're just making our problems worse, not better. Another silo is not a positive accomplishment.

It would be really nice if there were some new "magical" technology on the horizon that would rectify the mistakes of the past, but so far I haven't seen anything that is substantially different from our earlier offerings. If we don't change the focus, we'll just perpetually re-invent the same house of cards that is already collapsing under the weight of our dysfunction.

To get us out of this self-imposed 'vicious cycle' we need a reasonable set of goals. These would allow us to make better choices along the way and give us some tangible indication of progress. With that in mind, I am suggesting seven basic requirements, or constraints, all driven from outside the technologies, that, if met, would better position any new solutions to avoid repeating the past insanities.

Seven constraints for better software systems:

  1. There is one and only one system per organization.
  2. All of the data is in one logical data persistence location, and all of it is accessible from one general interface. It has global ACID properties as a single system (even if it is distributed across a massive number of machines).
  3. There is one all covering general user interface, for everything. There can also be specialized interfaces for particular roles, but all of that functionality is available from the general interface.
  4. Data and code partitioning is malleable, on-the-fly (access, security, visibility of both data and code).
  5. Functionality is loosely bound to all of the user interfaces; all versions of all functionality are always available. Users can choose which version they prefer.
  6. Almost everything works at near real time (fast enough to be usable and interactive). For any really slow computations, there is one simple mechanism to schedule work to be completed and users notified. The user can decide to have immediate or delayed access to any and all functionality.
  7. Data and code organization is completely handled by the system itself, not by the users. They can name things in any way they want, but the things themselves are organized by the computer and thus are always up-to-date.

Each one of these constraints fits together to encapsulate behaviours that address well-known problems.

Having only one system per organization ensures that the data and functionality aren't siloed. It is potentially usable by all. Organizations are fluid arrangements, so this also means that you can easily split and merge systems. 'Easily' means that it doesn't take years or months. There is only one operations department, which can gain the full range of experience required by dealing with just one system. Expertise can be built up, not thinly spread across dozens of incompatible systems.

There must be one overall, consistent, fast and easy way to access absolutely every piece of data in the system through one programmatic interface (including logging and errors). Underneath, the data is likely distributed, shared and partitioned, but those physical attributes are encapsulated away. Thus at the data level there are no silos. This type of persistence solution, as will become apparent in later constraints, may share concepts with some modern technologies, but it won't really come into existence until we address it specifically. It will have ACID properties. It will be dependable.

Absolutely every single piece of new or old functionality should be accessible, fairly easily, from one complete overall general user interface. A general interface, however, would never be optimal for highly specialized users with deeper or more complex needs, so there would also be several more specific interfaces tuned to specific users, groups or roles. The overall interface ensures that, at the very least, all of the code and all of the data are always accessible to the application support roles, without their needing specific training on specialized interfaces. It might be slower to navigate around, but the consistency and accompanying organization would mean that someone, with the assistance of some domain experience, could reasonably view data, edit data or run some computation. This ensures that shifts in the organization and quirks of the specialized interfaces don't represent risks; at worst, administration might be slower at times.

Organizations change; code tends to be static. That causes an unstoppable amount of rust. Security, sub-groups and any number of access issues crop up. A system should be easily reconfigured in far less than a day, so that it can keep pace with the organization's structure. Re-orgs should not be hampered by technology; they may be unpleasant, but they form an important part of preventing stagnation. The system needs to deal with what happens in real life, not just take a convenient way around it.

The programmers should be consistently building functionality, not playing hurry up and wait with the users. Some users will be early adopters, many won't. As such, the users themselves should have some control over how they move forward with any new functionality, which will continue to grow along with the organization (at a near exponential rate sometimes). Once tested, the functionality should be available, but the older functionality must also continue to work correctly. This is a thorny problem, but it can be handled by decoupling the structure of data from the data itself, in the same manner that we already decoupled the indexing of data. In that sense, making this work demands a primary structure and many secondary ones, all of which are usable by the code. The data should be restructurable on the fly, so that an early version of some functionality that depends on a given structure works whether that structure is primary or secondary. Also, for any set of related data, some of it can have one primary structure, while some of it can have an older one. This makes it easy to schedule batch processes to slowly convert the database over time, rather than having to commit to large downtimes. Given that the underpinnings can naturally support many versions, the users should have the ability to explicitly control which versions are bound to their interfaces. That is, one user might set their functionA to version 1.2 while another sets it to 3.4, and both should be able to access and add in similar data at the same time. Meanwhile, the programmers could have moved on to implementing version 5.1. With that achieved, the system really does decouple the differences in what the users are doing from the software construction process.
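To make the idea a little more concrete, here is a minimal sketch in Python of per-user version binding over restructurable data. Everything in it is an assumption made for illustration: the names REGISTRY, VIEWS, BINDINGS and call, the two record layouts, and the behaviour of each functionA version are all hypothetical. It only shows the shape of the idea, not how a real system would have to do it.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (function name, version) -> implementation.
# Every released version stays registered, so none ever disappears.
REGISTRY: Dict[Tuple[str, str], Callable[[dict], str]] = {}


def register(name: str, version: str):
    """Decorator that files an implementation under its name and version."""
    def wrap(fn):
        REGISTRY[(name, version)] = fn
        return fn
    return wrap


@register("functionA", "1.2")
def function_a_v1_2(record: dict) -> str:
    # The older code expects a flat structure: {"name": "First Last"}.
    return record["name"].upper()


@register("functionA", "3.4")
def function_a_v3_4(record: dict) -> str:
    # The newer code expects a nested structure.
    return f'{record["name"]["last"]}, {record["name"]["first"]}'


# The structure is decoupled from the data: each view reshapes the same
# stored values on the fly into the layout a given version depends on.
def as_flat(stored: dict) -> dict:
    return {"name": f'{stored["first"]} {stored["last"]}'}


def as_nested(stored: dict) -> dict:
    return {"name": {"first": stored["first"], "last": stored["last"]}}


VIEWS: Dict[str, Callable[[dict], dict]] = {"1.2": as_flat, "3.4": as_nested}

# Each user explicitly chooses which version is bound to their interface.
BINDINGS = {
    "alice": {"functionA": "1.2"},
    "bob": {"functionA": "3.4"},
}


def call(user: str, name: str, stored: dict) -> str:
    """Run the version of `name` that this user has bound, on a matching view."""
    version = BINDINGS[user][name]
    return REGISTRY[(name, version)](VIEWS[version](stored))


stored = {"first": "Ada", "last": "Lovelace"}
print(call("alice", "functionA", stored))  # ADA LOVELACE
print(call("bob", "functionA", stored))    # Lovelace, Ada
```

Both users work against the same stored data at the same time, and a version 5.1 could be registered later without touching either binding.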

Processing speed is important, but it is also important for the users to maintain control over their expected results. If some computation is slow, scheduling it to be completed in the background should be trivial and convenient. Any uncertainty about timing needs to be removed so that the users can make better plans around their computations. They need to be kept informed of the progress. We know that no matter how fast the hardware gets, there will always be slow computations, but so far we've been ignoring this.
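As a rough illustration of what 'one simple mechanism' might look like, the sketch below uses Python's standard concurrent.futures module. The Job class, the schedule function, the slow_sum stand-in and the notifications list are hypothetical names invented for this example; a real system would need persistence, restarts and proper user messaging on top of it.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class Job:
    """Tracks progress so the user is never left guessing (hypothetical)."""

    def __init__(self, name: str, steps: int):
        self.name = name
        self.steps = steps
        self.done_steps = 0
        self._lock = threading.Lock()

    def advance(self) -> None:
        with self._lock:
            self.done_steps += 1

    def progress(self) -> float:
        """Fraction of the work completed, from 0.0 to 1.0."""
        with self._lock:
            return self.done_steps / self.steps


def slow_sum(job: Job, values: List[int]) -> int:
    """Stand-in for a slow computation; reports progress after each step."""
    total = 0
    for value in values:
        total += value
        job.advance()
    return total


notifications: List[str] = []  # stand-in for whatever tells the user


def schedule(pool: ThreadPoolExecutor, job: Job, work: Callable, *args) -> Future:
    """Run the work in the background and notify the user on completion."""
    future = pool.submit(work, job, *args)
    future.add_done_callback(
        lambda f: notifications.append(f"{job.name} finished: {f.result()}")
    )
    return future


with ThreadPoolExecutor(max_workers=2) as pool:
    # Immediate access: the user simply waits for the answer.
    now = slow_sum(Job("quick", 3), [1, 2, 3])

    # Delayed access: the same function, scheduled with one call.
    job = Job("nightly", 4)
    later = schedule(pool, job, slow_sum, [10, 20, 30, 40])
    result = later.result()

print(now, result, job.progress(), notifications)
```

The same function serves both the immediate and the delayed path, so the choice between them stays with the user rather than with the programmer.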

The last constraint is one that I've been pondering for a while now. Allowing many users to organize assets gradually over time always seems to result in significant disorganization. Computers as a tool are most valuable when they automatically organize stuff for us. It is an incredibly complex issue, but I think it can be tackled. The idea would be similar to my earlier library example from my post in January called "Organization". As the assets, both code and data, grow larger, the computer would automatically add in newer navigational indexing schemes, allowing the users to meaningfully name them, but not to create new arbitrary half-completed clumps of their own. Mostly this means sets, groups, hierarchies and indexing, on any number of dimensions with some limited combination of subsets. As the underlying size grows, the system would introduce new forms of organization, ensuring that all of them are always 'complete' at all times.

The underlying essence of any trustworthy, effective system is a single well-defined architecture spanning the full organization that allows the data, its structure and any necessary code to keep pace with the growth and changes that are natural. This constrained flexibility is necessary for any organization that wants to fully utilize the data that it has been collecting. We have most of the elements for this type of system buried in our technologies, but to my knowledge nothing this comprehensive actually exists, although many a software developer has dreamed of it late at night. There is nothing in this description that is known to be impossible, although some constraints directly confront difficult problems that we have been avoiding so far. It is possible to build something that matches these constraints, but it is also possible that we won't achieve this sort of system for many decades to come; both the software industry and our programming culture have other agendas. Still, modern organizations are crying out to stop the insanity, and there has been a long history of people promoting non-existent solutions to these problems. The demand is there, the seeds of the technology are also there, we just haven't oriented ourselves in the right direction yet.