Friday, June 27, 2014

Technology

I'll start by proposing a significantly wider definition for the word 'technology'. 

To me it is absolutely 'any' and 'all' things that we use to manipulate our surrounding environment. Under this rather broad definition it would include such age-old technologies as fire, clothes and shelter. 

I like this definition because it helps lay out a long trajectory for how technologies have shaped our world, and since many of our technologies are so firmly established -- like fire or clothing -- it really frames our perspective on their eventual impact.

My view is that technologies are neither good nor bad, they just are. It's what we choose to do with them that matters. 

Fire, for instance, is great when it is contained; we can use it for light, warmth or cooking food. It is dangerous when it is burning down houses, forests or whatever else is in its path. We've long since developed a respect for it and an understanding of its dangers, so we react reasonably whenever its destructive side emerges. 

You wouldn't try to ban fire, or declare that it is bad for our societies. People don't protest against it, and to my knowledge pretty much every living human being utilizes it in some way. It has been around long enough that we no longer react to it directly, but rather to the circumstances in which it appears.

This holds true for any technology, whether it be fire, clothing, machines, radio or computers. 

Upon emergence, people take strange positions on the 'goodness' or 'badness' of the new technology, but as time progresses most of us integrate it ubiquitously into our lives. Specific usages of the technology might still be up for debate, but the technology itself becomes mainstream. 

Still, new technologies have a significant impact.

Marshall McLuhan seemed to take a real dislike to TV, particularly as it displaced the dominance of radio. His famous tag line 'the medium is the message' was once explained to me as capturing how the creation of any new technology inevitably transforms us. 

That certainly rings true for technologies like lightbulbs, radio and TVs. Their initial existence broadened our abilities. 

Lightbulbs made us free from the tyranny of daylight. Radios personalized information dissemination well beyond the limits of newsprint and pamphlets. And TVs dumbed it all down for the masses into a form of endless entertainment. 

Each came with great advantages, but also with significant dark sides. By now we've absorbed much of the impact of both sides, such that there are fewer and fewer adverse reactions. Some people choose to live without any of these technologies -- some still don't have access -- but they are few.

Technology acquisition seems to have been fairly slow until the early 19th Century spawned the industrial revolution, with its nearly endless series of clever time-saving machines. 

We amplified these wonders in the 20th Century to create mass production factories and then added what seems like a huge new range of additional technologies: computers, networks and cell phones. 

As these new inventions have swept through our societies, they too have shown their good and bad sides. 

The Internet as a technology went where previous information communications technology could never go, but it also has its own massive dark underbelly. A place where danger lurks. Computers have overturned many of the tedious jobs created in the industrial revolution, but replaced their physical aspects with intellectual ones. Cell phones broke the chains on where we could access computers, but chained the users back to an almost mindless subservience to their constant neediness. 

None of these things are bad, but then neither are they good. They are just part of our slow assimilation of technologies over the ages. 

To many it may seem like we are in a combinatorial explosion of new technologies, but really I don't think that is the case. Well, not directly. 

Somewhere I remember reading that it takes about twenty years for a technology to go from idea to adoption. That jibes with what I've seen so far, and it also makes sense in that that period is roughly a 'generation'. 

One generation pushes the existing limits, but it takes a whole new one to really embrace something new. Collectively, we are slow to change.

If this adoption premise is valid, then the pace for inventions is basically independent from our current level of progress. It remains constant.

What I think has changed, particularly since the start of the industrial revolution, is the sheer number of people available to pursue new inventions. 

Changes to our ability to create machines enhanced our ability to produce more food, which in turn swelled our populations. Given that weapons like nukes have dampened the nature of conflicts around the globe, we are experiencing the largest population our species has ever had at any time in history (that we are aware of). 

Technology spawned this growth and as a result it freed up a larger segment of the population to pursue the quest for new technologies. It's a cycle, but likely not a sustainable one.

It's not that -- as I imagined when I was younger -- we are approaching the far reaches of understandable knowledge. We are far from that. We don't know nearly as much as we think we do and that extends right down to the core of what we know. 

Our current scientific approach helps refine what we learn, but we built it on rather shaky foundations. There is obviously a great deal of stuff left to learn in practically every discipline out there, and there is just a tonne of stuff that we only kinda know that needs to be cleaned up and simplified. 

Healthcare, software, economics, weather, management; these are all things that we do optimistically, but the results are not nearly as predictable as we would like, or as people claim. On those fronts our current suite of technologies certainly has a huge distance left to go. 

Each new little rung of better predictability -- better quality -- represents at least an exponential explosion of work and knowledge acquisition. For any technology, it takes a massively long time to stabilize and really integrate it into our civilizations. 

Controlling fire was exotic at one point, but now it is no longer so magical. Gradually we collectively absorbed the ability to get reliable usage from it and lessened its negative side, or, as in the case of firemen, at least we built up a better understanding of how to deal with any problems rapidly.

For each new technology, such as software, it is a long road for us to travel before we achieve mastery. It will take generations of learning, experience and practice, before these technologies will simply become lost in the surroundings. They'll no longer be new, but we'll find better ways to leverage them for good, while minimizing the bad. This is the standard trajectory for all technologies dating right back to the first one -- which was probably just a stick used to poke stuff.

Because this broader definition of technologies extends so far back, it makes it somewhat easier to project forwards. 

If we have been gradually acquiring new technologies to allow us to manipulate our environment, it is likely that we have been chasing low hanging fruit. That is, we have been inventing technologies and integrating them roughly in the order that they were needed. 

Shelter might have been first, followed by fire then perhaps clothing. Maybe not, but it would not be unreasonable to assume that people tended to put their energies into their most significant problems at the moment; we do not generally have really good long-term vision, particularly for things that go beyond our own lifetimes. 

With that in mind, whether or not you believe in global warming, it has become rather obvious that our planet is not the nice, consistent, stable environment that we used to dream that it was. It's rather volatile and possibly easily influenced by the life forms traipsing all over it. 

That of course suggests that the next major technological trend is probably going to be related to our controlling the planet's environment in the same way that clothing and shelter helped us deal with the fickle weather. 

To continue our progress, we'll need to make sure that the continual ice ages and heat waves don't throw us drastically off course. Any ability we gain that can help there is a technology by my earlier definition. 

As well, the space available on our planet is finite. 

Navigating outer space seemed easy in the science fiction world of last century, but in practice it does appear to be well beyond our current technological sophistication. 

We don't even have a clue how to create the base technologies like warp drives or anti-gravity, let alone keep a huge whack of complicated stuff like a space shuttle running reliably. 

We're talking a lot about space exploration but our current progress is more akin to our ancestors shoving out logs into the ocean to see if they float. It's a long way from there to their later mastery of crafting sailing ships and another massive leap to our state-of-the-art cruise liners. Between all of those is obviously a huge gulf, and one that we need to fill with many new technologies, great and small.

Given our short life spans, we have a tendency to put on blinders and look at the progress of the world across just a few decades. That incredibly tiny time horizon doesn't really do a fair job of laying out the importance, or lack of importance, of what is happening with us as a species. 

We're on a long-term trajectory to somewhere unknown, but we certainly have been acquiring lots of sporadic knowledge about where we have come from. 

Of course it will take generations to piece it all together and further generations to consolidate it into something rational, but we in our time period at least get to see the essence of where we have been and where we need to go. 

Our vehicles for getting there are the technologies that we have been acquiring over millennia. They are far from complete, far from well understood, but we should have faith that they form the core of our intellectual progress. 

They map out the many paths we have been taking. 

Technology is the manifestation of us applying our intellect, which is the current course set by evolution. It tried big and powerful, but failing that it is now trying 'dynamic'; an ability to adapt to one's surroundings much faster than gradual mutations ever could. 

Thursday, June 19, 2014

Recycling

I was chatting with a friend the other day. We're both babysitting large systems (>350,000 lines) that have been developed by many, many programmers over the years. Large, disorganized mobs of programmers tend towards creating rather sporadic messes, with each new contributor going further off in their own unique direction as the chaos ensues. As such, debugging even simple problems in that sort of wreckage is not unlike trying to make sense of a novel where everyone wrote their own paragraphs, in their own unique voice, with different tenses and with different character names, and now all the paragraphs are hopelessly intertwined into what is supposedly a story. Basically, flipping between the many different approaches is intensely headache inducing even if they are somehow related.

Way back in late 2008, I did some writing about the idea of normalizing code in the same way that we normalize relational database schemas:


There are a few other discussions and papers out there, but the idea was never popular. That's strange given that being able to normalize a decrepit code base would be a huge boon to people like my friend and me who have found ourselves stuck with somebody else's shortsightedness.

What we could really use to make our lives better is a way to feed the whole source clump into an engine that will non-destructively clean it up in a consistent manner. It doesn't really matter how long it takes; it could be a week or even a month, just so long as, in the end, the behaviour hasn't gotten worse and the code is now well-organized. It doesn't even matter anymore how much disk space it uses, just that it gets refactored nicely. Not just the style and formatting, but also the structure and perhaps the naming as well. If one could create a high-level intermediate representation around the statements, branches and loops, in the same way that symbolic algebra calculators like Maple and Mathematica manipulate mathematics, then it would just be straightforward processing to push and pull the lines matching any normalizing or simplification rule. 
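
As a rough illustration of what such an engine might look like, here is a minimal sketch in Python, using the standard ast module as the intermediate representation (ast.unparse needs Python 3.9 or later). Only one normalizing rule is shown, rewriting '== None' comparisons into 'is None'; that rule is my own example, and a real normalizer would carry a large catalogue of rules and keep applying them until the code stops changing.

```python
# Minimal sketch of a rule-based code normalizer built on Python's ast module.
import ast


class NormalizeNoneComparison(ast.NodeTransformer):
    """Rewrite 'x == None' / 'x != None' into 'x is None' / 'x is not None'."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        new_ops = []
        for op, comparator in zip(node.ops, node.comparators):
            is_none = isinstance(comparator, ast.Constant) and comparator.value is None
            if is_none and isinstance(op, ast.Eq):
                new_ops.append(ast.Is())
            elif is_none and isinstance(op, ast.NotEq):
                new_ops.append(ast.IsNot())
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def normalize(source: str) -> str:
    """Parse source into the intermediate form, apply the rules,
    and regenerate consistently formatted code."""
    tree = ast.parse(source)
    tree = NormalizeNoneComparison().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    messy = "def f(x):\n    if x == None:\n        return 0\n    return x * 2\n"
    print(normalize(messy))
```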

Picking between the many names for variables holding the same type or instance of data would require stopping for human intervention, but that interactive phase would be far less time-consuming than refactoring by hand or even with the current tool set that is available in most IDEs. And a reasonable structural representation would allow identifying not only duplicate code, but also code that was structurally similar yet contained a few different hard-coded parameters. That second case opens the door to automated generalization, which, given most code out there, would be a huge boost in drastically reducing the code size. 
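
Here is a small sketch of how that structural comparison might work, again assuming Python source and the ast module: literals are masked out and the function name is ignored, so functions that share the same shape but differ only in hard-coded values get grouped together as candidates for generalization. The example functions at the bottom are made up.

```python
# Sketch: group functions whose structure matches once constants are masked.
import ast
from collections import defaultdict


class MaskConstants(ast.NodeTransformer):
    """Replace every literal with a placeholder so only structure remains."""

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value="<CONST>"), node)


def structural_fingerprint(func: ast.FunctionDef) -> str:
    clone = ast.parse(ast.unparse(func)).body[0]   # deep copy via round-trip
    clone.name = "<FN>"                            # ignore the function's name
    clone = MaskConstants().visit(clone)
    return ast.dump(clone, include_attributes=False)


def find_similar_functions(source: str):
    tree = ast.parse(source)
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            groups[structural_fingerprint(node)].append(node.name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    code = """
def tax_ontario(amount): return amount * 1.13
def tax_alberta(amount): return amount * 1.05
"""
    print(find_similar_functions(code))  # [['tax_ontario', 'tax_alberta']]
```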

One could even apply meta-compiler type ideas to use the whole infrastructure to convert easily between languages. The code-to-intermediate-representation part could be split away from the representation-to-code part. That second half could be supplied with any number of processing rules and modern style guides so that most programmers who follow well-known styles could easily work on the revised ancient code base. 

Of course another benefit is that once the code was cleaned up, many bugs would become obvious. Non-symmetric resource handling is a good example. If the code grabbed a resource but never released it, that might have been previously buried in spaghetti, but once normalized it would be a glaring flaw. Threading problems would also be brought quickly to the surface.
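
As a toy example of the kind of symmetry check that a structural representation makes easy, here is a crude sketch that flags functions calling open() on a name without ever calling .close() on it or wrapping it in a 'with' block. A real tool would need proper flow analysis; this only illustrates the idea.

```python
# Sketch: flag unbalanced open()/close() usage per function.
import ast


def unbalanced_opens(source: str):
    findings = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        opened, closed, with_managed = set(), set(), set()
        for node in ast.walk(func):
            # name = open(...)
            if (isinstance(node, ast.Assign)
                    and isinstance(node.value, ast.Call)
                    and isinstance(node.value.func, ast.Name)
                    and node.value.func.id == "open"
                    and isinstance(node.targets[0], ast.Name)):
                opened.add(node.targets[0].id)
            # name.close()
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == "close"
                    and isinstance(node.func.value, ast.Name)):
                closed.add(node.func.value.id)
            # with ... as name:
            if isinstance(node, ast.With):
                for item in node.items:
                    if isinstance(item.optional_vars, ast.Name):
                        with_managed.add(item.optional_vars.id)
        for name in opened - closed - with_managed:
            findings.append((func.name, name))
    return findings


if __name__ == "__main__":
    code = "def leak(path):\n    f = open(path)\n    return f.read()\n"
    print(unbalanced_opens(code))  # [('leak', 'f')]
```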

This of course leads to the idea of code recycling. Why belt out new bug-riddled code, when this type of technology would allow us to reclaim past efforts without the drudgery of having to unravel their mysteries? 

A smart normalizer might even leverage the structural understanding to effectively apply higher level concepts like design patterns. That's possible in that functions, methods, objects, etc. are in their essence just ways to slice and dice the endless series of instructions that we need to supply. With structure, we can shift the likely DAG-based representations around, changing where and how we insert those meta-level markers. We could even extract large computations buried in global variables into self-standing stateless engines. Just that capability alone would turbo charge many large projects.

With enough computing time -- and we have that -- we could even compute all of the 'data paths' through the code that would show how basically the same underlying data is broken apart, copied and recombined many times over, which is always an easy and early target when trying to optimize code. Once the intermediate representation is known and understood, the possibilities are endless.
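
A very small sketch of what extracting those 'data paths' might look like: for each assignment inside a function, record which names feed the assigned name. Chaining those edges together shows how the same underlying data gets split apart, copied and recombined. The example function is invented for illustration, and the builtin names it picks up (like len) show how crude this version is.

```python
# Sketch: list which names feed each assigned name, per function.
import ast


def data_paths(source: str):
    edges = []  # (function, target, sources)
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
                sources = sorted({n.id for n in ast.walk(node.value)
                                  if isinstance(n, ast.Name)})
                edges.append((func.name, node.targets[0].id, sources))
    return edges


if __name__ == "__main__":
    code = """
def report(raw):
    cleaned = raw.strip()
    parts = cleaned.split(',')
    total = len(parts) + len(cleaned)
    return total
"""
    for func, target, sources in data_paths(code):
        print(f"{func}: {' + '.join(sources) or '(constants)'} -> {target}")
```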

There are at least trillions of lines of code out there, much of which has been decently vetted for usability. That stuff rusts constantly, and our industry has shied away from learning to really utilize it. Instead, year after year, decade after decade, each new wave of programmers happily rewrites what has already been done hundreds of times before. Sure it's easier, and it's fun to solve the simple problems, but we're really not making any real progress by working this way. Computers are stupid, but they are patient and quite powerful, so it seems rather shortsighted for us to not be trying to leverage this for our own development work in the same way that we try to do it for our users. Code bases don't have to be stupid and ugly anymore; we have the resources now to change that, and all we need to do is put together what we know into a working set of tools. It's probably about time that we stop solving CS 101 problems and move on to the more interesting stuff.

Sunday, June 1, 2014

Requirements

The idea of specifying programs by defining a set of requirements goes way way back. I saw a reference from 1979, but it is probably a lot earlier. Requirements are the output from the analysis of a problem. They outline the boundaries of the solution. I've seen many different variations on them, from very formal to quite relaxed.

Most people focus on direct stakeholder requirements; those from the users, their management and the people paying the bill for the project. These may frame the usage of the system appropriately, but if taken as a full specification they can lead to rampant technological debt. The reason for this is that there are also implicit operational and development requirements that, although unsaid, are necessary to maintain a stable and usable system. You can take shortcuts to avoid the work initially, but it always comes back to haunt the project.

For this post I'll list out some of the general requirements that I know, and the importance of them. These don't really change from project to project, or by domain. I'll write them in rather absolute language, but some of these requirements are what I would call the gold or platinum versions, that is, they are above the least-acceptable bar.

Operational Requirements

Software must be easily installable. If there is an existing version, then it must be upgradeable in a way that retains the existing data and configuration. The installation or upgrade should consist of a reasonably small number of very simple steps, and any information that the install needs that can be obtained from the environment should automatically be filled in. In-house systems might choose not to do a fresh install, but if they do, then they need another mechanism for setting up and synchronizing test versions. The installation should be repeatable and any upgrade should have a way to roll back to the earlier version. Most installs should support having multiple instances/versions on the same machine, since that helps with deployments, demos, etc.
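
One common way to get cheap upgrades, rollbacks and side-by-side versions is to keep every release in its own directory and flip a 'current' symlink between them. The sketch below shows that shape in Python; the /opt/myapp layout is purely a hypothetical example, not something from the requirement itself.

```python
# Sketch: versioned release directories plus an atomically swapped symlink.
import os

BASE = "/opt/myapp"          # hypothetical install root
RELEASES = os.path.join(BASE, "releases")
CURRENT = os.path.join(BASE, "current")


def activate(version: str) -> None:
    """Atomically switch the 'current' symlink to the given release."""
    target = os.path.join(RELEASES, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"release {version} is not installed")
    tmp_link = CURRENT + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, CURRENT)   # atomic rename on POSIX filesystems


def rollback(previous_version: str) -> None:
    """Rolling back is just activating the prior release again."""
    activate(previous_version)
```

Because every release stays on disk, multiple versions can coexist for demos or testing, and a failed upgrade is undone by pointing the symlink back.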

Having the ability to easily install or upgrade provides operations and the developers with the ability to deal with problems and testing issues. If there is a huge amount of tech debt standing in the way, the limits force people into further destructive shortcuts. That is, they don't set things up properly, so they get caught by surprise when the testing fails to catch what should have been an obvious problem. Since full installations occur so infrequently, people feel this is a great place to save time.

Software should handle all real world behaviours; it should not assume a perfect world. Internally it should expect and handle every type of error that can be generated from any sub-components or shared resources. If it is possible for the error to be generated, then there should be some consideration of how to handle that error properly. Across the system, error handling should be consistent, that is, if one part of the system handles the error in a specific way, all parts should handle it in the same way. If there is a problem that affects the users, it should be possible for them to work around the issue, so if some specific functionality is unavailable, the whole system shouldn't be down as well. For any and all shared resources, the system should use the minimal amount, and that usage should be understood and monitored/profiled before the software is considered releasable. That includes both memory and CPU usage, but it also applies to resources like databases, network communications and file systems. Growth should be understood and it should be in line with reasonable operational growth. 
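
One way to keep that handling consistent is to funnel every call to a shared resource through a single helper, so failures get logged and degraded the same way everywhere instead of each module improvising. A minimal sketch, with made-up names, might look like this:

```python
# Sketch: one shared wrapper so every resource failure is handled the same way.
import logging

log = logging.getLogger("myapp.resources")


class Unavailable(Exception):
    """Raised when a shared resource cannot serve a request right now."""


def call_resource(name, func, *args, fallback=None, **kwargs):
    """Run func(); on any failure, log once in a standard format and return
    the fallback so only this feature degrades, not the whole system."""
    try:
        return func(*args, **kwargs)
    except Exception:
        log.exception("resource %s failed; serving fallback", name)
        if fallback is not None:
            return fallback
        raise Unavailable(name)


# hypothetical usage:
# rates = call_resource("fx-service", fetch_rates, "CAD", fallback={})
```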

Many systems out there work fine when everything is perfect, but are downright ornery when there are any bumps, even if they are minor. Getting the code to work on a good day is only a small part of writing a system. Getting it to fail gracefully is much harder, but it is often ignored, which is an invalid shortcut.

When a problem with software does occur, there should be sufficient information generated to properly diagnose the problem and zero in on a small part of the code base. If the problem is ongoing, there should not be too much information generated; it should not eat up the file system or hurt the network. It should just provide a reasonable amount of information initially and then advise on the ongoing state of the problem at reasonable intervals. Once the problem has been corrected it should be automatic, or at least easy, to get back to normal functionality. It should not require any significant effort.
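
A small sketch of that behaviour: report the problem in full the first time it appears, then only summarize it at fixed intervals while it persists, and note when it clears. The interval and logger names are just placeholders.

```python
# Sketch: report a recurring problem fully once, then throttle the updates.
import logging
import time

log = logging.getLogger("myapp")


class ThrottledReporter:
    def __init__(self, interval_seconds=300):
        self.interval = interval_seconds
        self.first_seen = None
        self.count = 0
        self.last_report = 0.0

    def report(self, message):
        now = time.time()
        self.count += 1
        if self.first_seen is None:
            self.first_seen = now
            self.last_report = now
            log.error("%s (first occurrence, full details follow)", message)
        elif now - self.last_report >= self.interval:
            self.last_report = now
            log.warning("%s (still failing: %d occurrences since first report)",
                        message, self.count)

    def clear(self):
        """Call when the problem is resolved, so the next failure reports fully."""
        if self.first_seen is not None:
            log.info("problem cleared after %d occurrences", self.count)
        self.__init__(self.interval)
```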

Quirky systems often log excessively, which defeats the purpose of having a log since no one can monitor it properly. Logs are just another interface for the operations personnel and programmers, so they should be treated nicely. Some systems require extensive fiddling after an error to reset poorly written internal states. None of this is necessary and it just adds an extra set of problems after the first one. It is important not to over-complicate the operational aspects by failing to address the existence or frequency of these problems.

If there is some major, unexpected problem then there should be a defined way of getting back to full functionality. A complete reboot of the machine should always work properly; there may also be faster, less severe options such as just restarting specific processes. Any of these corrective actions should be simple, well-tested and trustworthy, since they may be chosen in less than ideal circumstances. They should not make the problems worse, even if they do not fix them outright. As well, it should be possible to easily change any configuration and then do a full reboot to ensure that that configuration is utilized. There may be less severe options, but again there should always be one big, single, easy route to getting everything back to a working state.

It is amazing how fiddly and difficult many of the systems out there are right now. Either correcting them wasn't considered or the approach was not focused. In the middle of a problem, there should always be a reliable hail mary pass that is tested and ready to be employed. If it is done early and tested occasionally, it is always there to provide operational confidence. Nothing is worse than a major problem being unintentionally followed by a long series of minor ones.

Development Requirements

Source code control systems have matured to the point where they are mandatory for any professional software development. They can be centralized or distributed, but they all provide strong tracking and organizational features such that they can be used to diagnose programming or procedural problems at very low cost. All independent components of a software system must also have a unique version number for every instance that has been released. The number should be easily identifiable at runtime and should be included with any and all diagnostic information.
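
A tiny sketch of the runtime side of that: stamp the version into the build and prepend it to every diagnostic line, so any log or crash report can be traced back to an exact release. The version string and logger setup here are purely illustrative.

```python
# Sketch: make the release version visible at runtime and in every log line.
import logging
import platform

VERSION = "2.4.1+g3f9c2ab"   # hypothetical value injected at build time

logging.basicConfig(
    format=f"%(asctime)s {VERSION} %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)

log = logging.getLogger("myapp")
log.info("starting on %s", platform.platform())
# every subsequent log line now carries the exact version that produced it
```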

When things are going well, source repos don't take much extra effort to use properly. When things are broken they are invaluable at pinpointing the source of the problems. They also work as implicit documentation and can help with understanding historic decisions. It would be crazy to build anything non-trivial without using one.

A software system needs to be organized in a manner that encapsulates its sub-parts into pieces that can be used to control the scope for testing and changes. All of the related code, configurations and static data must be placed together in a specific location that is consistent with the organization of the rest of the system. Changes to the system are scoped and the minimal number of tests is completed to verify their correctness. Consistency is required to allow programmers the ability to infer system-wide behaviour from specific sub-sections of code.

The constant rate of change for hardware and software is sufficiently fast that any existing system that is no longer in active development starts to 'rust'. That is, after some number of years it has slipped so far behind its underlying technologies that it becomes nearly impossible to upgrade. As such, any system in active operation also needs to maintain some development effort as well. It doesn't have to be major extensions, but it is always necessary to keep moving forward on the versions. Because of this it is important that a system be well-organized so that, at the very least, any changes to a sub-part of the system can be completed with the confidence that they won't affect the whole. This effectively encapsulates the related complexities away from the rest of the code. This allows any change to be correctly scoped so that the whole system does not need a full regression test. In this manner, it minimizes any ongoing work to a sub-part of the system. The core of a software architecture is this necessary system-wide organization. Beyond just rusting, most systems get built under severe enough time constraints that it takes a very long time and a large number of releases before the full breadth of functionality has been implemented. This means that there are ongoing efforts to extend the functionality. Having a solid architecture reduces the amount of work required to extend the system and provides the means to limit the amount of testing necessary to validate the changes.

Finally

There are lots more implicit non-user requirements, but these are the ones that I commonly see violated on a regular basis. With continuously decreasing time expectations, it is understandable why so many people are looking for shortcuts, but these never come for free, so there are always consequences that appear later. If these implicit requirements are correctly accounted for in the specifications of the system, the technical debt is contained so that the effort needed to operate and further develop the system is minimized. If these requirements are ignored, ever-increasing hurdles get introduced which compromise the ability to correctly manage the system and make it significantly harder to correct later.