Thursday, August 27, 2009

Tightly Bound

I'd like to start this post off with a simple home-grown definition. Sometimes a bit of terminology can encapsulate a collection of ideas and make a conversation a little easier.

A computer software system is 'loosely bound' if, for any specific functionality implemented within the program, most of the code involved in that functionality is specific to it and is not used for any other functionality; i.e., for each different functional behavior in the code, almost none of the underlying code is shared.

Of course, to make this definition complete, the opposite is true as well. A system is 'tightly bound' if almost none of the underlying code is unique or specific to any given bit of functionality. Almost all of it is shared. The processing passes quickly from some specific block of code to something generic that then does most of the heavy lifting in the program. There is always some specific code or data at the end-points, but most of the work in the system is done by generic code.

It is best to give a simple example of this.

Most software we build these days follows a really simple pattern of being some type of GUI that allows us to navigate around, view or edit data. Within these applications, there is generally some heavy duty processing -- an 'engine' of some type -- that performs the in-depth work on the data. Mostly we persist a great deal of data either in a database or in a large set of configuration files. These basic elements are common to a huge collection of systems.

Within these types of systems, the flow generally starts from the user. The user requests to see some data, which then percolates back to the data source. The data is looked up, formatted in some way, and then sent back to the user interface for handling. So, we have two main pieces: a) the user sends a request to the persistent store, and b) the persistent store satisfies the user's request and sends back the data.

Mostly the user is navigating through some complex data landscape with the occasional action to create or modify some of what they see. This general pattern fits a large number of systems.

If you can visualize a program as a series of connections between a user and several other data stores (databases, configuration files, etc.) then it is not a big leap to think of the code as just "lines of instructions" connecting the two together in a specific direction for some specific functionality.

For example, functionality like a user "Sign in" takes some basic information about the user, then looks up a valid user record in the database. "Edit a file" allows the user to navigate down to another (crude) type of database (the file-system), and identify a file for editing. The second half of that functionality opens the file and passes that data back to the user to be displayed in some context-relative way (either by what was saved with the file, or what the user had recently set within the running system).

In a very offbeat way, we could see these various lines of code as being like a bundle of twigs standing between the user and the data. One twig to send the request from the user to the data source, and then another to send the results back again. Thus most functionality, as it is launched by the user, finds its way down one twig and then comes back on another.

In a loosely bound system, each piece (half) of functionality is essentially its own twig. Except for some minor shared points, each twig stands alone, representing one half of the user's requested functionality. A loosely bound system is one where there is almost nothing holding the various twigs together; they are independent pieces of code that share no context dependencies. They could all mostly stand alone. They are just in the same system because that's how they were packaged, how they ended up.

But in a tightly bound system, the twigs quickly come together at both ends to form branches, and the branches come together to form a trunk. In a really tightly bound system, there is only one massive trunk between the user and data, and absolutely all interaction between the two moves up or down this massive generic pathway.

In other words, the user's request to see some specific data via some on-screen control like a button gets generalized and sent down the main pathway to the back-end, which binds that to some specific subset of the persistent data. Once the database has assembled the data, it is packed into a generic container and shipped back to the front-end down the main pathway. At the front, it is dynamically tied back to some presentation information, and then shown to the user.

Most of the time, in most of the code, the management of the data is entirely generic. The code knows almost nothing about the underlying data.

In a tightly bound system, the data, which starts out strongly typed at one end or the other, becomes loosely typed as it flows through the system. In a very tight implementation the strongly typed static code at both ends is entirely minimized. It must always be there, but it doesn't need to have a large scope within the execution, e.g., the "UserName" field is also a data type, although stored as something more generalized in the database and the GUI widgets.

In a loosely bound system, we would copy the data from the database directly into a series of variables called UserName and then copy them, one after another, through the system and finally into the appropriate display screen. Each link in the chain of code would know explicitly the exact type and name of all of its data. None of this code would be shared.

In a tightly bound system we would likely convert UserName into something more generic, a string perhaps -- or even farther out, an object -- and then pass that up through the system with some type of keyword or label. At the other side, we'd tie the keyword back to the data, convert it to a "UserName" again and then display it. In the middle of the code, all of the data is generic and loosely typed. At the ends, it is as specific and as strongly typed as necessary in order to complete the presentation or storage requirements (the screens or schema could actually be generic as well and not care, but that does limit display and functionality).
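To make the contrast concrete, here is a minimal Java sketch of the two styles (all of the names are invented for illustration). The loosely bound version copies a strongly typed field link by link; the tightly bound version ships a generic container down one shared pathway and only re-types the data at the end-points:

    import java.util.HashMap;
    import java.util.Map;

    public class Binding {

        // Loosely bound: every link in the chain knows the exact name and
        // type of the data. This block gets re-written for each new field.
        static String fetchUserNameLoosely() {
            String userName = "jsmith";              // read from the database
            String serviceCopy = userName;           // copied through a service layer
            String screenCopy = serviceCopy;         // copied into the screen
            return screenCopy;
        }

        // Tightly bound: the data travels as a generic container keyed by
        // labels; only the two end-points care what "userName" means.
        static Map<String, Object> fetchRecordTightly() {
            Map<String, Object> record = new HashMap<String, Object>();
            record.put("userName", "jsmith");        // packed at the back-end
            return record;                           // shipped down the one pathway
        }

        public static void main(String[] args) {
            System.out.println(fetchUserNameLoosely());
            Map<String, Object> record = fetchRecordTightly();
            // unpacked and re-typed only at the presentation end-point
            System.out.println((String) record.get("userName"));
        }
    }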

Getting back to this weird visualization, if we see all of the functionality of a system as this large bundle of twigs between the user and the persistent data, the tighter we bind it at the center, the more that center code gets generalized and carries an increased load. If we bind the twigs tight enough, they become one big indistinguishable trunk that then becomes the main pathway to move all of the data through the system.

Of course, the point of tightly binding the system is to try very hard to remove or reduce as much repetitive or duplicate code as possible. The twigs themselves are highly redundant.

We've long known that any redundancies are intrinsically dangerous because they can and do easily fall out of synchronization with each other, leading to expensive and difficult bugs. Time and changes both happen regularly, and both tend towards creating inconsistencies.

In most modern systems, there are two main places where we see this type of duplication the most.

On the user's side, our primary code interaction with the users is either by some set of highly repetitive screens, or in some type of MVC framework with a set of highly repetitive 'actions'. Either way, the user entry-points that manage the context and launch the functionality tend to be hotbeds of highly repetitive code. Most of the functionality is similar, thus most of the code driving it is similar too.
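As a rough sketch of what tightening this side can look like (the names here are invented, not taken from any particular framework), the pile of near-identical entry-points can collapse into one generic dispatch path:

    import java.util.HashMap;
    import java.util.Map;

    public class Dispatcher {

        // One shared interface replaces N near-identical action handlers.
        interface Action {
            String run(Map<String, String> request);
        }

        private final Map<String, Action> registry = new HashMap<String, Action>();

        void register(String name, Action action) {
            registry.put(name, action);
        }

        // The single shared entry-point: context handling, logging and error
        // checks live here once, instead of being repeated in every action.
        String dispatch(String name, Map<String, String> request) {
            Action action = registry.get(name);
            if (action == null) {
                return "unknown action: " + name;
            }
            return action.run(request);
        }
    }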

On the other side, near the database, although the data is stored in only a very limited set of introspective fields, typical relational database code is also highly repetitive and very static. Each and every query -- usually several even for the same underlying entities -- is explicitly moved around in very static parameters in the system. As the system grows, this code generally becomes massively repetitive and also very ugly. It's often the part of the system that most programmers try very hard to avoid or ignore.
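One common way to tighten this side, sketched here with plain JDBC (an assumption about the stack, not a full data layer), is to let the result set describe itself, so a single generic routine can read back any query instead of one static block per entity:

    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class GenericReader {

        // Reads any result set into a list of name->value maps, so one piece
        // of generic code serves every query in the system.
        static List<Map<String, Object>> readAll(ResultSet rs) throws SQLException {
            ResultSetMetaData meta = rs.getMetaData();
            int columns = meta.getColumnCount();
            List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<String, Object>();
                for (int i = 1; i <= columns; i++) {
                    // loosely typed in the middle of the system
                    row.put(meta.getColumnName(i), rs.getObject(i));
                }
                rows.add(row);
            }
            return rows;
        }
    }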

In small and some medium sized systems, these two generators for redundant code can often be passed over without too many problems. The programmers can just do all of the extra work needed to pound out all of the extra code. Brute force will save the day.

In larger systems however, these two areas can start to push the overall complexity upwards into dangerous territory. If the programmers are just dumping code erratically into these parts, the system tends to become very unstable, very quickly. At some point, many systems reach a point where it is essentially like stuffing straw into a sack with a lot of holes. As you push it in one side, it falls out the others. It becomes a nearly endless sink for wasting resources.

By now we know that this type of redundant code can cause significant development problems. Still, even given all of our modern advice about not repeating the same code over and over again, most programmers just accept loosely bound systems as a necessity in programming, not realizing that there are alternatives.

For most programmers, it is far easier to write code if that code is as specific and static as possible. Most people have trouble generalizing; there seem to be far more detail-oriented people, who find it easier to focus on the trees rather than the surrounding forest. It's not that surprising; it comes along with the general notion of "math anxiety", where people often fear the detachment from reality that comes from working through some highly generalized and abstract problem.

So, left to their own, most programmers will repetitively retype the same blocks of code, over and over again. It's habit, and it is easier. They don't have to think deeply, and the quick progress makes them feel constructive. Many like the similarity -- are drawn to it -- even if deep down inside they know that there is probably a better, shorter way.

However, no matter how comfortable programmers are with making their systems loosely bound, there are several very strong reasons why tightly bound systems are always a far better development strategy.

The biggest reason is that if there are lots of twigs that can be generalized, then generalizing is significantly less code. Not just a 10% or 20% reduction; tightly bound systems are often orders of magnitude smaller. Why build something in 2 million lines of code, if 150,000 will do?

Programmers code at what can mostly be taken as a constant rate. A good Java programmer -- who isn't doing a lot of bad cutting and pasting -- is probably creating about 30,000 to 50,000 lines of code per year. Given that medium systems start in the hundreds of thousands of lines of code, and large ones easily get into the millions, 50,000 lines of code per year is a relatively small amount. Still, by the time planning, design, debugging, support, etc. are all taken into account, getting 50,000 lines of good, clean, non-repetitive code out "consistently" would be a significant accomplishment for any talented programmer.

Within whatever bound, the amount of code that can be created by a programmer is still small. It is clear that most significant software is going to require many man-years to develop. A quick hacker might get a prototype out in a few months, but to get the full product out, with all the proper documentation and packaging expected, is years and years worth of work. Programming is long and slow. It always has been that way.

Given the sizable effort involved, any significant reduction in code size, one that can particularly cut some of the base tasks into radically smaller pieces, is going to have a huge impact on the success of the project. If I know I can write the same system with one third of the code, I'd have to be crazy to write it any other way. Less code is better, way better.

Still, as most significantly experienced programmers have probably figured out, it is not that initial "new" code that lands most development projects in hot water.

It's each new iteration, particularly on a foundation of constant erratic changes, that rapidly starts to burn through the resources.

As always, the initial versions and prototypes come into being very quickly, and then the "tar pit" so eloquently described by Brooks hits, and hits hard. And it is there, in between all of the massive dependencies, that having a code base that is one quarter or one tenth of the size really starts to pay off quickly.

As changes get made to loosely bound systems, the twigs start to quickly drift away from each other. There is, after all, no technical reason why they should be bound or consistent with each other.

Inconsistencies build up, faster and faster. The rate of decay accelerates.

Our modern tool set makes it easier for programmers to keep searching through the code to find similar code that is falling out of sync, but we seem to shy away from utilizing these tools well, in favor of just ignoring the problems until testing. Until it is too late.

As an added bonus, not only are the original twigs falling out of sync with each other, but most programmers are ignoring this and hastily adding more and more new twigs, compounding the inconsistencies. The extremely repetitive nature of a loosely bound system causes all of the problems we would expect from having the same code blocks repeated over and over again. We know not to make some types of repetitions within our code, particularly with variables or data, but most programmers don't really see that this is exactly what they are doing with their loosely bound systems.

As if that weren't bad enough already, the whole nature of the system means that it is really hard to isolate the new changes and just retest some sub-pieces. Unless you explicitly track all of the changes, the nature of the twigs forces one to retest the entire system, each time, just in case. That, in itself, if done properly, is a huge effort; one that is clearly skipped far too often.

Now contrast all of that with a tightly bound system. If the amount of unique and distinct code is small, then it doesn't take long before the processing has moved into some generic routine.

Adding new data to the database, or adding new screens to the GUI is all about just putting in that barest minimal amount of code into the system. All of the other handling is generalized and used by other code. Unlike a loosely bound system, as the code base grows, the ability of the programmer grows as well. The system gets easier to add to later, rather than harder.

If there are hundreds of functionality entry-points all using the same underlying generic code, then testing one of those points essentially tests all of them. There are, of course, still some high-level or low-level differences, slight variations on the screen or in the database, but mostly you find that if the main pathways are working for the main data, there is a very diminished likelihood that they are failing for something else.

The bugs shift from being implementation problems to being presentation problems or small inconsistencies. If you have to pick your bugs, then these are a far better choice.

Of course, initially in the new development, generalizing the code is a hard prospect. You have to think about the problems a lot more, instead of just banging out the lines at high speed.

And, since none of us really sees the full generalization right from the start, getting a really tightly bound system requires way more effort in refactoring the code, iteration after iteration, to bring the repeating pieces together into denser and denser implementations.

Another point about a really dense, tightly bound system is that the underlying primitives are quite obviously more difficult to understand. That is inevitable, given the ever-increasing pressure to keep making the code do more things. Dense code is harder to understand, which adds yet another layer of work: trying to keep it as clear as possible while still making it dense.

And still another problem with a tightly bound system is that it is much harder to distribute amongst a large group of programmers. That is often why you see the big organizations resorting to loosely bound systems. Brute force works with lots of coders even though the results are predictably fugly and repetitive.

Building a culture where coders want to extend what is there, not just quickly re-hack their own lame version, requires getting some very difficult team dynamics and overall system architecture right, as well as training and documentation. Programmers shouldn't sit around stranded on critical resources, but they also shouldn't just be splatting out code at some over-the-top rate.

Finding organizational arrangements that intrinsically facilitate good architecture with tightly bound results is an unanswered question in computer science.

So quite obviously, a tightly bound system is considerably harder to write. Programmers can't just throw themselves at the coding in the hopes of accidentally discovering the perfect generalizations. It takes slow deliberate thinking in order to get that tight binding at the center of the system.

Still, if and when it is done well, the extra effort in design and thinking in the initial part of the project pays off hugely at release time. Gradually, as the overall savings accrue, the work quickly puts a project ahead. And if the developers have been disciplined about keeping the code clean and consistent, the overall complexity of the code isn't growing at a dangerous rate.

A tightly bound system is a much better and more workable development project. Once over the initial design hump, it increases the likelihood of success and can really make the difference in being able to deliver new extended versions of the system.

Unfortunately, it is easier to develop loosely bound systems, and more importantly, most of the writing, advice, tutorials, etc. out there that tries to teach programming, explain design or help with technical issues also pushes in that direction. It is hard enough to build a tightly bound system; writing about it is just too complex an endeavor for most people.

Regardless, the problem remains that a significant chunk of our development resources are burnt up in trying to beat on loosely bound systems. The low quality, frequent failures and high inconsistency rates in our modern software industry are testimonies to the fact that most developers are not being nearly as effective with their efforts as they could be. As an industry we flush away a huge amount of resources simply because we misunderstand where we can get the best use of them.

It is ironic because so many programmers strongly believe that their work involves a significant amount of intellectual creativity, yet they quickly fall back on using the most mindless brute force approaches to splat out as much brain-dead code as possible. They want their work to be recognized as intellectual, even if they don't want to put in the effort to make it so.

Sunday, August 9, 2009

Halting a Problem

I can remember quite vividly, well over twenty years ago, when I first came across a really good description of the halting problem. It was in an early Computer Science course. It had a huge impact, mostly because I was quite skeptical about accepting it as being real.

In its nature, it is similar to those many strange loops that Douglas Hofstadter explains so well in "Gödel, Escher, Bach: An Eternal Golden Braid". I talked about some of them in my previous post, so I won't repeat it again here.

That early description of the halting problem came as a simple story. It was about a kingdom -- somewhere imaginary -- that was having terrible problems with its computer systems getting caught in infinite loops, thus wasting tonnes of valuable resources.

This was a much larger problem with the older "central" mainframe architecture, where everyone shared the same massive system. One person wasting CPU is drawing it from others that might put it into better service. Most things ran as batch, so an infinite loop might run for a huge length of time before someone noticed and terminated it.

I tried to find an Internet reference to the original fable we were taught, but as it predates the web, there doesn't seem to be an online reference to it anywhere.


IN A KINGDOM FAR FAR AWAY

In the story, the King calls together his best programmers, pushing them to work tirelessly trying to create a program that will check all of the others for possible infinite loops. In that way, they can be examined before they are run to prevent any wastage.

They toil long and hard looking for a solution, but to no avail. With failure after failure mounting, some savior comes along and proves that it's not possible. They can't do it, it will never work. They are wasting their time (more time really).

It was a really good, simple explanation that takes one a long way toward understanding the problem.

The halting problem, as it is discussed in Wikipedia, doesn't capture the real essence of the issue. It talks about things like undecidability, and other strict formal references, but in formalizing its mathematical description it leads people away from a more intuitive understanding.

Computers can emulate other computers, but only by doing exactly what the other machine should be doing. They contain no ability to look into the future, to guess or even estimate. They contain no ability to reason. So given some arbitrary program, they cannot make any inferences about it, other than whether it is in fact a syntactically correct program. If they want to know whether or not it will halt, they need to execute it. Line by line. And if they do, and it fails to halt, then the question is never answered.

If they just cut out after some pre-set length of time, then it was always possible that the code might have halted, that the proper conditions to break the loop could have occurred. The question is not answered decisively.

Sure, obvious endless loops are simple to find, but there is an infinite number of ways to create an infinite variety of endless loops, all of which are beyond the finiteness of the code to detect properly. It's another fun version of the problems with size in formal systems.
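The classic argument behind this can even be sketched in code. Suppose, for contradiction, that a working halts() routine existed; then a program could be built that defeats it by asking about itself (this is just the standard proof sketch -- the whole point is that halts() can never actually be filled in):

    public class Halting {

        // Suppose this could be written: true if running 'program' on 'input'
        // eventually halts. (It cannot actually exist; this is the assumption
        // we are about to contradict.)
        static boolean halts(String program, String input) {
            throw new UnsupportedOperationException("no such routine is possible");
        }

        // Feed a program to itself, and do the opposite of the prediction.
        static void paradox(String program) {
            if (halts(program, program)) {
                while (true) { }   // predicted to halt, so loop forever
            }
            // predicted to loop forever, so halt immediately
        }

        // Running paradox on its own source would halt if and only if it does
        // not halt -- a contradiction, so halts() can never be written.
    }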

And because programs are locked into being formal systems, they contain absolutely no intelligence other than exactly what we pound in there. That is a huge limiting factor.

I wish I could find that original story, because it also deals with hubris, a common programmer affliction.

The King and his coders pursued a hopeless direction, repeatedly, in spite of obvious failures. They did so primarily because their intuitions on this matter were incorrect.

What initially seemed simple was beyond possibility.

We live in a day and age where many people still believe that our world is mostly deterministic, and that through action we can and will fix all of our problems, regardless of whether or not a solution is really possible.

Hofstadter's strange loops lay out a series of problems that just don't fit this perspective. They break with our intuition, making us uncomfortable. But that doesn't take away from their truthfulness. The world is not nearly as simple as our minds would like it to be. Strange problems abound.

Some, such as regression, have gradually become accepted over time, while others have been pushed to the margins. It seems that many of us would rather ignore these types of problems, as with infinities in Cantor's time, than actually face and admit to them.

We're barely more enlightened in this day and age than they were 200 years ago. Even with all of our fancy toys, we've only scratched the surface of trying to really understand all the information that we have amassed. We have a lot of data, yet very little knowledge.


SOME CONSEQUENCES

A consequence of the halting problem is the general understanding that computing is unbounded. There are no limits to the number of things we can or should compute. There is no length of time that bounds all computations.

Certainly with Moore's law, and our ability to distribute processing or run it symmetrically we've been able to consistently find more and more computing power. And we've been consistently using more and more of this power for each new generation of software system.

So it would be absurd, then, if someone came forth and chose an arbitrary number for the total number of steps that a computer could or should execute. If they just simply locked all of us into a fixed limit of instructions. If they set a limit and insisted that anything more was not acceptable.

It would be simple madness, even in the sense that whatever limit might be applicable ten years ago is clearly just a fraction of modern usage. Performance doubles, then it doubles again. And we've grown to consume all of our massive power increases, often requiring more, driving our modern hardware into over-processing fits. We'll use whatever we are given, and then come back for more.


HOW LONG IS INFINITY

The halting problem is about infinities, and in its definition it usually takes place over a Turing machine with an infinitely long tape. The infinities that bind to it are an important part of its formulation.

I've often wondered if the finiteness of our real world -- bounded on a grand scale by a fixed universe, and on a minute one by quantum theory -- bypasses these purely mathematical issues. After all, infinite things seem to be more of a mathematical reality, than a physical one. Our world has natural bounds, so maybe our theories should too.

The halting problem disappears quite swiftly if we just remove a few of the infinities from our model. If an infinite loop isn't infinite, then we can just execute it to the end and produce an answer. If the model restricts the size of a maximum loop to some fixed number then, presto blammo, there is no more halting problem.

Not only that, but machines can suddenly do a whole lot more with themselves. Great chunks of computer theory fall away because they are predicated on the halting problem being true. The world becomes a lot more interesting. Computers become way more powerful.

Well, almost. We know, pretty much, that from our own perspective time at least is still infinite, and in that context (at least for us) an endless loop is really endless. These things do not change in our real world, and they will not change. So a model with a finite number of computations is a severely broken one that does not match our reality.

No matter how much we do not like the halting problem, it cannot be dismissed from our knowledge. We're stuck with it.

Now, all of this was, to some degree, pretty standard in the study of Computer Science (CS).

I found that most people in CS hated the theoretical courses and tried to drop this knowledge as useless the moment they graduated. Many programmers don't want the limits of theory to interfere with their day-to-day programming grind. Most programmers just don't want to know.

Also, a lot of people come into programming from different backgrounds, so I really don't expect them to understand the base points of computer theory. It was hard enough for most people to sit through the required course, it must be brutal to try and learn this stuff without being forced into it.

I do think that all programmers should have some idea about what's out there and why it is in fact important. Theory isn't useless, particularly if it keeps you from wasting your time trying to build impossible things. Still, it is the nature of programmers to want to ignore as many things as possible, while focusing as tightly as possible on what they are trying to build.


CONTROLLING CHAOS

Getting back to halting: our computer systems, as Turing-complete machines, are and will always be subject to running in infinite loops. More importantly, there is no real or correct way to prevent this. It is the nature of what they are, so we need to learn to deal with it as part of our technology. We can manage the situation, but we cannot remove it.

We know we can't write an infinite loop detector, but if we are building a base for others to run their components, some of us clearly would like to do something to prevent problems with rogue code.

The reasonable way to handle the problem would be to contain the CPU usage so it doesn't make the rest of the system sluggish, and to provide some easy way for the users to interrupt the processing if they've lost faith in it running to conclusion.

Computers can't be intelligent, so, as always in these cases, we need to push the choice back up to the user. The user should be able to easily decide at some point that they have waited long enough.
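A minimal sketch of that arrangement in Java: the work runs off to one side, and the decision to give up stays outside of it. The two-second timeout below is just a stand-in for the user clicking a cancel button; the point is that the limit belongs to the user, not the code:

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class UserInterrupt {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            // The possibly-endless computation runs in its own thread...
            Future<Long> result = pool.submit(new Callable<Long>() {
                public Long call() {
                    long n = 0;
                    while (!Thread.currentThread().isInterrupted()) {
                        n++;  // stand-in for work that may never converge
                    }
                    return n;
                }
            });
            // ...while the user decides how long is long enough.
            try {
                System.out.println("result: " + result.get(2, TimeUnit.SECONDS));
            } catch (TimeoutException lostFaith) {
                result.cancel(true);  // the user pulls the plug
                System.out.println("interrupted by the user");
            }
            pool.shutdown();
        }
    }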

We could also add some type of direct resource limitations, but these need to be tailorable to the specific systems, code, users or any other circumstance that may require a longer or shorter bound to be implemented.

Since we will never pick the "correct" finite limit, there will always be a large number of occasions where we need to easily change any limits quickly and painlessly. You can't trust a computer to make an intelligent choice, and you can't encode enough logic into the system to cover all possible contingencies.

A really trivial way to "restrict" resources would be to pull some arbitrary number out of a hat and prevent all programs from running for longer than that number of instructions. Of course, no one in their right mind would do such a horrible thing, it's an obviously bad hack. Any, and every random fixed limitation eventually becomes a nasty operational problem to someone, somewhere. Limits can be useful, but they have to be implemented very, very carefully.

We've been at this long enough to know better than to just set hard arbitrary parameters. We go through a massive amount of trouble to make things more configurable than necessary, just to avoid the types of stupid limitations that plagued so many systems in the past.

If software hasn't progressed very far over the last couple of decades, at the very least we've learned enough to make our configuration parameters massively more pliable. We've learned to stop second-guessing operational environments, and to not restrict users from doing things unless it is absolutely necessary. Well, at least some of us have learned that.


INTERNET EXPLOITER

Given all of the above, you can imagine how annoyed I was to have a dialog from IE pop up on my screen with the following message:

"A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive. Do you want to abort the script?"

I had just finished adding some advanced encryption code in JavaScript that was ensuring that any critical information in the DOM was properly secured. Not a bad choice, given how easy it was for people to inject some rogue JavaScript into an open interface.

Our choices with these types of data access problems are to batten down the hatches so tight that the users are absolutely restricted in what they can input, or to protect the internal data so that neither the browser nor the user gets caught by any tricky things like Trojan horses.

Rigid restrictions are probably the second most popular reason for users to swear at their machines and hate their software.

In my most recent system I want openness for the users; I am rather hoping that the system allows them to move in and occupy the space, rather than trying to keep them at a distance while providing awkward fragile tools.

To do this, I need to let them control as much as possible, which means that I need to restrict the absolute minimum from them.

But I can't do this at the cost of making the system hopelessly insecure. It's a poor trade-off. Thus they can inject things, but they can't make sense of any of the critical internal data. A great idea, but to make it work, when they log into the system I need some serious CPU in the browser for a few seconds.

To be fair, it doesn't really matter what I was trying to do. If it wasn't this particular piece of code, this week, it would have been something else next week. As the hardware gets faster, our expectations for code will increase as well. We're always trying to push the limits in one way or another.

This message pops up when IE thinks that the code you've executed in your browser is running in an endless loop. This, as I've said, is a way to deal with the problem. A bad way, but still a way.

Some group of people at Microsoft made an unfortunate decision to add a parameter called MaxScriptStatements into their code, which limits all JavaScript execution to a rather small five million statements. And they set it up so that the only way to increase that number is to update it directly in the registry. It's beyond the programs and most users to be able to change it.

This is exactly the type of bad decision that has been holding back software for years and years. When programmers make these types of rash choices, they ultimately propagate upwards through all of the dependent code.

Our systems are built on layer after layer of bad choices. By now we know what works, but somehow that doesn't stop us from building on shaky foundations. Some of our best known modern technologies are just cobbled together masses of poorly written code. Somehow, the industry has decided that quantity will magically make up for low quality.

And even though the original choice was supposedly to enforce good behavior, it isn't long before people start working around the limits and the problems return.

The GWT library, for example, already has an IncrementalCommand class to allow the programmer to work around this limitation. It just automatically chops up the computation into smaller pieces -- a function that is really the responsibility of the operating system, not a framework.
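From memory, the usage looked roughly like this (doSlice and TOTAL_SLICES are invented placeholders). Each pass does a small chunk of the work and then yields back to the browser's event loop, so IE never sees one long uninterrupted run of statements:

    import com.google.gwt.user.client.DeferredCommand;
    import com.google.gwt.user.client.IncrementalCommand;

    public class SlicedWork {
        static final int TOTAL_SLICES = 100;   // hypothetical

        static void doSlice(int slice) {
            // one small chunk of the long-running computation (hypothetical)
        }

        public static void runInSlices() {
            DeferredCommand.addCommand(new IncrementalCommand() {
                private int slice = 0;
                public boolean execute() {
                    doSlice(slice++);
                    return slice < TOTAL_SLICES;  // true = schedule another pass
                }
            });
        }
    }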

Sure, it's a hack, but that is what always follows from a lower-level hack: just an endless series of unnecessary complexities, each of them making it a tiny bit harder to actually get something to work reliably.

For example, whole generations of systems stupidly distinguish between text and binary files even though modern computers couldn't care less. It's some ancient stupid choice in an early DOS system that still has ramifications today.


INFORMATION WANTS TO STAY HIDDEN

I'm not really the type to sit on something that I've found, particularly if I know it is important with respect to software development. In this case, I wanted to get this dreadful limitation in IE out there to as many programmers as possible, in a way that they would hopefully learn something.

What was done is done, but that doesn't mean we can't learn from it, perhaps some day we could stop repeating these types of bad choices. Perhaps someday, we could stop pouring more bad code over ever shakier foundations.

I figured humor would do it, but then being funny isn't really my strong suit in writing (I'm learning, but it's very slow :-).

With this in mind, I drafted a question for StackOverflow (SO) that I was hoping was interesting enough to get some attention, while being amusing enough to get people to really think about what I was trying to say.

http://74.125.47.132/search?q=cache:0heGdkeTOJQJ:stackoverflow.com/questions/566010/has-microsoft-solved-the-halting-problem+Paul+W.+Homer&cd=16&hl=en&ct=clnk&gl=ca&client=firefox-a

I made a couple of huge horrible miscalculations.

First was that my sense of humor doesn't always work in writing. I think it comes off as too sarcastic or too arrogant or something offensive. I'm not really sure, but I rarely seem to get the types of reactions I am hoping for. I guess it's one of those great skills that I just haven't acquired yet (if ever).

The second miscalculation was to pick on Microsoft. Our industry is divided, with some people strongly in favor of Microsoft and other people strongly against.

Since this was an IE problem, and yet another in a long line of similar Microsoft induced problems, I really felt that they deserved a bit of sarcasm. All big companies have a "culture" that shapes and defines their products, and in the case of Microsoft their specific culture has always been in favor of turning out mass amounts of code, and against trying to find short, simple and elegant approaches to their problems. In essence, they use their bulk to brute force their way through, each and every time. They're a big company that likes to dominate more than innovate.

That might be enough in some forums to bring down the public wrath, but StackOverflow is overwhelmingly biased towards Microsoft. The roots of the people involved come from the MS side of the industry, and the site does a lot to promote the use of Microsoft products.

Thus it was a relatively bad idea to go into a Microsoft love-fest and start making rude noises about them. A worse choice if you can't write well either (and you're not funny).

Still, some things just have to be done. If I were really good at making smart choices, I'd retire to my private island and leave all of this squabbling to some other guys.


A VERY BAD REACTION

Mostly to my surprise, my question bombed really badly at SO. It picked up some immediate ire and lots of rapidly expanding negative points, only to be shut down a few minutes later.

SO has a "system" for eliminating unpopular opinion. They try to base it around the questions not being legitimate "programming" questions, but the truth is that anything with even the slightest bias towards a statement gets shut down real fast.

The questions are limited towards just the things that can be retyped out of manuals. Discussions are restricted towards just simple facts.

The sentiment is that they don't want the site to become choked by religious wars between rival programmers, spam or other noise, but the consequence is that there is a very nasty authoritarian streak in their behavior.

There are lots of overly zealous programmers running around "scrubbing" the questions and shutting down what they don't like. The rules are ambiguous enough that the real enforcement is left up to the individuals, always a dangerous practice. If SO was a library, out front there would be a raging pile of books, feeding an eternal flame. And librarians toasting marshmallows on sticks.

It's a good place to ask some simple questions, but it was probably a very bad choice as a place to try and communicate with any fellow developers.

Still, in all ways I found the responses to the question to be quite amusing.

Over the years in analysis, I learned to always look deeply at things that don't quite behave as I predicted. In domain analysis, the best place to learn things is when the users bring up unexpected issues.

What we know is less important than what we don't. I've always found that behind the complaints that other developers quickly dismiss as not valid lie all sorts of interesting and fascinating truths. So when things change frequently, instead of getting angry, I often choose to ask deeper and deeper questions.

What we understand of the world around us is often only the shallow surface of a much greater and deeper sea of knowledge. There is so much buried behind things that sometimes even the most trivial fact is the gateway to a vast new understanding.


THE SLOW SINK TO THE BOTTOM

If SO hates the question, then I am entirely fascinated by why this is happening. If they sent it nearly to the bottom of their list of questions, then there is far more there than just a bad piece of writing.

After the initial surprise reaction to my question, it started a slow, deep sinking. Gradually it worked its way towards the bottom of the pack.

I do have to admit that it wasn't entirely based on merit. At least a couple of those negative votes were encouraged by me.

Although many people were irritated by the post, I did receive some positive feedback, and a few people found it to be quite amusing. Of course, since it was already negatively rated, most supporters chose to sink it to a more obvious place. Positive, negative votes.

Now, a smarter and possibly more conscientious man might have been dreadfully embarrassed by getting such a negative rank. The bottom of the pack. The worst question ever.

I however thought that, for right or for wrong, the extremely negative rank was a perfect balance to my overall tone in the piece. It seemed to have found a place at the bottom of the list, happily bringing up the rear. If I can't be famous, at least I can be infamous, or so I had hoped.


REVISIONIST HISTORY

Another most interesting thing about SO is that the surprises never seem to cease. Five months after I posed the question -- most of that time it dominated the very last position in the list of questions -- some people at SO decided to delete it.

Well, it wasn't really deleted, just hidden so only "privileged" 10K users can see it. A rather strange process, given that the site is trying to be a public resource, not a private stronghold of secrets like some ancient church quietly burying uncomfortable knowledge in its vaults.

http://meta.stackoverflow.com/questions/11798/where-did-my-question-go

Deleting it, of course, is a very bad idea. If I, as the author, had asked for it to be removed, that would be one thing, but to choose to do it because several individuals felt that they didn't like it is quite different.

If it makes people feel uncomfortable, that in all ways is a good thing. We do not get significant progress from the status quo; it always comes from without. Sure, the in-crowd manages a slow and steady push forward that gradually helps, but it's always someone on the outside that makes for the fantastical leaps.

The inside is filled with convenient popular ideas that are easily acceptable, that is the definition of inside. The outside is filled with stupidity, madness and wanton acts of hubris. Along with all of the fools on the hill, are a whole series of crazy voices proclaiming a dizzying array of weird ideas.

It's not that I think the question was deep, or that it was funny, or even right or interesting. For all I know, the majority is correct and the question sucks. Badly written and obnoxious. But that is only a minor issue.

It is minor because we should not now or ever wipe out ideas just because the "group think" doesn't like them. We should not prune the bottom of the list or try to cover up our own stupidities.

That the smartest men on the planet often make really stupid choices should go without saying; we are human after all.

But even in that, who are we to say that the crazy ideas are wrong? After all, our history is filled with men like Galileo, Cantor and Turing, and a host of others whose ideas were initially greeted as being wacky. If we are to learn anything at all from history, it is not to judge ideas, even when they appear to be completely crazy.

In our most modern time, Google's wonderful map-reduce concept was overlooked for decades. It's not what is popular that counts, it is what lies on the fringes that will be the next great thing.

So if people get together and nuke the fringes, just because some of what is there makes them uncomfortable, for any reason, right or wrong, then we don't just risk missing out on the next great ideas, we absolutely guarantee that we will.

What is crazy today, is often reasonable tomorrow. No doubt people once said things like: "The earth orbits the sun, you'd have to be mad to believe that."


FINAL THOUGHTS

Ultimately, other than it was really amusing, I don't care that much about my question. I tried to make a point, I failed, and then slowly it sank back into making a point again.

Being deleted -- well, hidden actually -- allowed it to make an even more valuable point than I had originally intended.

Right or wrong, stupid or perceptive, it doesn't really matter what the question is, it only matters what happened to it. It matters how SO dealt with the question, and how the people in SO got together and removed it.

There was once a time when we had stronger moral convictions, and most people understood the difference between right and wrong. When they realized that silencing the world around them doesn't lead to a cleaner more orderly existence, but rather towards more bad acts of control and abuse of power.

We tolerate dissent because we used to know that quashing it doesn't work. That it is necessary, and that not all great ideas come from the party line; from the authorities and from those in power. From any small group of narrow minded people too focused on protecting their own self interests.

If we are interested in really making public forums, ones that truly support real discussion and progress, we can't just wipe out the unpopular stuff. Removing spam is one thing, removing stupid and ugly questions is quite another. It's taken centuries for us to really learn these principles, but only decades for us to unlearn them.

Beyond just the control and abuse issues, it seems that one of our most severe casualties in this modern age is our ability to really understand things. Ironic, given our quick and easy access to such vast seas of knowledge. Having a fast reference source seems to make it harder to actually apply the information.

Sure we have all of the facts about morality in Wikipedia, but if people don't really understand them, they can't apply them in the various circumstances in their lives. That knowledge, along with our right to privacy is dying quickly in our modern web. And we are allowing this to happen.

No doubt, we will wake up one day soon, in a society not of our liking. Each little step we take, heads us farther in a direction that people have been fighting against for generations. History has a funny way of replaying itself over and over again, and we seem to be just setting up the next great failures.

Saturday, August 1, 2009

Dynamic Ramblings

A while ago I was discussing some software architectural options with a fellow developer and we ran into a little hitch with our conversation. Put quite simply, the word 'dynamic' has a couple of possible meanings. He was using it one way, and I was off on a completely different definition.

For me, I've always taken 'dynamic' to literally mean the opposite of 'static', and 'static' -- in the context of coding -- to mean the very precise specification of something specific, be it code or data.

Static variables, static data, static data-types, static code, etc. all imply that there are no inherent degrees of freedom within any of the code or data to do anything other than its basic function. It is what it is.

As we code, we can replace instructions with all sorts of variables, and with each new bit of variability the code gets a little more dynamic. Things that are well-defined are static; if they can vary to any degree, they are dynamic.

More interestingly, if we allow the data-types of the variables themselves to go unspecified, the code gets even more dynamic.

There are of course techniques to do this with the code itself, like polymorphism or function pointers, so the code can be dynamic in this sense as well. In a similar fashion, as the code runs, the underlying objects and methods can be left unspecified until they are needed. Then they are bound to some concrete implementation. Exactly the same as above.

Most interpreters even allow for code to be created on the fly, another way of making it dynamic, although the construction techniques are generally bound to static permutations (the generated code is limited in breadth by the generating code).
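A small Java sketch of these increasing degrees of freedom: the same call made statically, then chosen at run time through polymorphism, then located by name at run time through reflection:

    import java.lang.reflect.Method;

    public class Degrees {
        public static void main(String[] args) throws Exception {
            // Fully static: the exact code and data are fixed when this is written.
            System.out.println("hello".length());

            // More dynamic: the concrete method is chosen at run time through
            // polymorphism; the declared type is left general.
            Object thing = "hello";
            System.out.println(thing.toString().length());

            // More dynamic still: the code itself is looked up by name at run
            // time, via reflection.
            Method length = String.class.getMethod("length");
            System.out.println(length.invoke("hello"));
        }
    }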

In a sense my use of dynamic means stuff is generalized, but only in its behavior relative to a set of fixed actions or the data. The more dynamic a routine, the wider the variety of actions and data it can handle.

Of course, software is always grounded in practical things, which means that at some point in the running life of the code, all of those degrees of freedom must collapse down to something specific. The code must actually do something; the data must actually be something.

The variable nature of the data must go from being abstract to concrete. Specific lines of code must be executed. The whole thing must actualize. There must be static data and data-types moving through the low levels of the code at all times, or else the code can't eventually make heads or tails of its data.

You can pass around some generalized data structure, but to do something specific with it, you need to know exactly what the structure is. If it's too general it's useless. Thus we can dynamically pass around pointers to things, but we have to de-reference them to get the actual data back.

So, I tend to think of dynamic as just introducing more and more degrees of freedom into a body of code and data.

Ultimately, to really do something, everything that is dynamic must eventually become static in the end. What binds these degrees of freedom to their ultimate counterparts could be code-driven, come from config files, or even be passed in as some type of syntax.

Every computer system falls back on the same basic principles: a) the total amount of code is finite (or generated in a finite manner) and b) the total number of different types of data is also finite (although some types can be very general). These two points become very important later in this discussion.


ANOTHER PERSPECTIVE

My developer friend however has a slightly different definition of dynamic. He sees it as the ability of something to extend itself. In his sense a dynamic protocol, for instance, would be one that could grow and get larger, adding new things as it goes. In my definition of dynamic, a protocol would be one where fewer things were initially bound to static values. It could range over a wide variety of outputs, but it couldn't actually grow.

I understand his definition, but I tend to think of this property of something really growing itself as being far beyond just dynamic. In a sense, I think that what he is getting into is a far larger attribute that needs its own term, something more closely connected with extending a formal system, not just opening up the degrees of freedom.

Our discussion was interesting because lately I've been reading 'Gödel, Escher, Bach: An Eternal Golden Braid' written by Douglas Hofstadter. It's an awesome book, written almost thirty years ago, that has been on my must-read list forever. I've delayed reading it over the years partially because it bears a lot of similarity to what I studied in university as part of the combinatorics and optimization branch of mathematics. Partially because I know it is a very hard slog, although well worth it.

The book really focuses on strange loops with respect to formal systems in mathematics. By that I mean particular corner cases in math such as recursion, self-description, stacks, infinity and other interesting, but non-intuitive, concepts. It ties these ideas back to Escher's works in art, and Bach's work in music, as a way of trying to make such very difficult abstract topics map back onto very concrete things in reality.

He centers his ideas around formal systems: some well-defined way of specifying the rules of a system through a set of parameters and axioms. Basically, a rigorous set of rules that, when applied correctly, can generate all and everything within the system. Arithmetic is a simple example. Logic is probably the best known one (thanks to Spock). Various branches of mathematics are larger and more complex formal systems.

While I've seen these ideas in various courses in my university days, it has been decades since they've held a place so active in my thoughts.

In another more recent conversation with a friend -- over beers -- I found myself headed back over some of Hofstadter's key concepts. Most notably that Gödel and Turing showed that all formal systems were finite.

Not only were they finite, but also there is always at least one axiom that should (could) be in the formal system, but is not actually contained there. A valid axiom that falls out of bounds. Thus formal systems are incomplete by definition.

Another consequence of these ideas is that the overall space of all formal systems is infinitely large. It is effectively boundless. Always larger than the largest formal system.

While these ideas are simple in concept, they are hard for humans to accept since for centuries we've been categorizing the world as one large deterministic system.

The core of our beliefs is to accept that one day science will have all of the answers, and the core of Gödel's work was to show that mathematics itself was a formal system and that it was incomplete and would always be so. Our sciences are all founded on mathematics to some degree or another. So no matter how much we know, there will always be things we could know, that are beyond our knowledge. It never ends.


UNEXPECTED REFERENCES

Oddly, I recently saw an excellent video called Dangerous Knowledge, based on the lives of Georg Cantor, Ludwig Boltzmann, Kurt Gödel and Alan Turing, which focused on how all of their lives ended tragically, trying in many ways to blame it on their obsessions with the very kinds of strange loops that Hofstadter likes so much.

And even more recently I was having an interesting discussion with another blogger Mark Essel at his site called Victus Spiritus. That conversation started with the semantic web, but gradually drifted over to some concepts from AI.

I was finding it more than curious how all of these different references to the same underlying bodies of knowledge were crisscrossing each other just in time for me to be working on my next post.

Getting back to the beers, as they loosened my tongue, I found myself talking about how formal systems are always finite, and how computers themselves are intrinsically formal systems.

In that sense, as I have said before, the only intelligence in a computer program comes directly from someone putting it there.

Mapping that concept back to a formal system, we see that a formal system is composed of the base axioms that define it, and nothing else. The size of a formal system is dependent on the number of base axioms in it, although the axioms themselves might lead to the production of other valid axioms in the system. These derived axioms are entirely limited by the initial ones (like generated code). Thus the system is only as large as someone makes it, and only as complex as someone makes it. What really feeds a formal system is outside of it. The intelligence of its creator.

Leaping back and forth between "programs" and "formal systems" we can see that the code and knowledge we program into the computer are just axioms in a formal system. That isomorphism bothers some people, but it is far too convenient to pass up or ignore. Our programs are only as smart as we make them.


CONSEQUENCES

Gödel's ideas however have some interesting consequences. One of which is that humans do not appear to be limited by formal systems. We seem quite able to re-write our axioms over longer periods -- changing as we need to -- so any type of simple model like logic is just too restrictive to account for our current level of sophistication (Vulcans aren't possible either, I guess).

That brings up another interesting bit, as I was reading Scientific American and they had a good article on the two halves of the brain. They were suggesting that the two halves work differently in many other species besides man. Mammals and other creatures show split brain behaviors.

They hypothesize that one half of the brain handles the normal day to day living issues. We can see it as a formal system that contains all of the axioms on how to spend our days and nights.

The other half of the brain is more interesting. They implied that it was some form of almost error handling. The part of the brain that handles special circumstances, strange conditions, etc.

As we work through the day, both halves of our brains are searching for answers, both working on the identical problems at the same time. However, one half or the other finishes first, and we respond to that result. It's a heuristic with a constant race condition between throwing an error and computing the results.

Although I suspect it may be too simple, I really like this model of the way a brain functions. It means, in a sense, that we are always thinking about what we are doing in two different ways. Assuming that the day-to-day stuff works, it finishes first, so we do that and move on. If it doesn't finish first, we react with the other half of our brain, the error half. It seems to explain why people get funny under unusual circumstances. Perhaps it explains why things like beer liberate our behavioral patterns. It certainly is an interesting concept, isn't it?
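For fun, the race-condition model can even be sketched with a thread pool (a playful analogy, nothing more): both halves get the identical problem, and whichever answer arrives first is the one we act on:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SplitBrain {
        public static void main(String[] args) throws Exception {
            ExecutorService brain = Executors.newFixedThreadPool(2);
            List<Callable<String>> halves = new ArrayList<Callable<String>>();
            halves.add(new Callable<String>() {   // the day-to-day half
                public String call() throws Exception {
                    Thread.sleep(50);             // usually faster
                    return "routine response";
                }
            });
            halves.add(new Callable<String>() {   // the special-circumstances half
                public String call() throws Exception {
                    Thread.sleep(200);            // slower, handles the odd cases
                    return "error-handling response";
                }
            });
            // Both halves work on the identical problem; the first answer wins.
            System.out.println(brain.invokeAny(halves));
            brain.shutdown();
        }
    }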

But if we're trying to tie this back to a formal system concept we need to go deeper. Two formal systems are really just one larger formal system, and they are still both finite and incomplete. It wouldn't change us as people to have two halves of the brain or just one, unless something was different between them.


ANTI-SYSTEMS

Beer and a bit of tobacco (I quit, but occasionally I misbehave) got me thinking about the opposite of a formal system. After all, formal systems sit in an infinite space. No one system can occupy it all, and it goes on forever. There are an infinite number of formal systems, and even the sum of all of them does not fill the space.

What if we had the opposite of a formal system? An anti-formal system. If it's defined as a set of rules which are not in the system and the whole thing is unbounded and infinitely large then it would be exactly the theoretical opposite of a formal system. I vaguely recall Hofstadter talking about how such a thing isn't well defined, but I sense that his meaning was a little different than mine.

If one half of our brains were a set of axioms about what we should do, then the other half might be a set of axioms about what we should not do. I.e., we should not hang around, or we should not get that glass of water, or we should not go to the store. Really, it's this negative space of axioms -- about what doesn't fit -- that makes up the anti-formal system. A system that is infinitely large.

That's a neat idea because we know that the only way a formal system can grow is if we (as intellectual beings) grow it to be larger. But two formal systems are just one giant one, so they can't really grow on their own (derived axioms don't count).

This gets around that: if a human is a combination of a formal system in one half of the brain and an anti-formal one in the other half, suddenly we have a model (albeit peculiar) where one half of the brain can add rules to the other half, but is itself not bounded (though it's also not enough on its own to drive us to do things). In some funky way, that secondary half could be seen as the subconscious, while the formal half is the conscious.

OK, it's a bit weird, but in that way we have a solid (and nearly deterministic) perspective on people that explains why we're not bound by formal systems. Of course "intelligence" sits on top of all of this, since according to the article other mammals have the split brain as well, yet they are not (nearly) as intelligent as we are (although I have occasionally been outwitted by my dog, so who really knows).

In thinking about this I did have another idea as well. Although computers are bounded by finite formal systems, and a near infinite capacity is a necessity for intelligent behavior, we don't necessarily need to break that restriction in order to have something close enough.

If we build a system that makes it nearly trivial to add new intelligent axioms, although the system would have to go outside of itself to get it (to a human), it could possibly grow fast enough to more or less appear infinite. I.e. a rapidly expanding finite system converges in the long run with infinity (although it never fully gets there).

Get enough people adding enough data to some finite base and to most people it seems to go on forever, Wikipedia is an example of that.

We might not be able to get to AI, but at the very least we might be able to create formal systems that, for all intents and purposes, are completely and totally indistinguishable from AI.


A NEARLY DETERMINISTIC UNIVERSE

Just because it walks like AI, and talks like AI, doesn't mean it really has to be AI.

In many ways this is the lesson that Hofstadter, Cantor, Boltzmann, Gödel and Turing have left us with, that the universe is not as simple and deterministic as we think, but it is not entirely un-understandable either. Once we grok the strange loops, we can get beyond just trying to force the world around us into our narrow human perspective.

We try to orient things into our simplest viewpoint, but in doing so we often conveniently leave out the really awkward, but hugely important ugly bits. Like infinity in Cantor's time or the space of all formal systems being infinite in Gödel's and Turing's time.

Getting back to my friend's definition of dynamic, my version meant that to any degree a dynamic system was still a formal system. The dynamic nature is in how the axioms are specified, not the overall size of the system itself.

My friend's definition however, meant that any dynamic system in his sense was not a formal system, but one that was larger and complete (and possibly not bounded). Theoretically I'm not sure where that stands, but certainly with respect to computers it is somewhat of an oxymoron. A dynamic system, in his sense of the word, is not a system that could or would fit into a computer. It is one where the system grows all on its own, under its own power. A fabulous thing if we could get one, but not something we're going to see any time soon.


EPILOGUE

A little farther down in the beer, I started talking about some little blog post I wrote for one of my lesser blogs called The Differences. Mostly the ideas in that blog are just jotted down to be forgotten in time, but one of them was particularly interesting. I had watched a video on Richard Feynman and it got me thinking about gravity.

Insanely I wrapped my ideas of the universe into a 5D version, but I figured that I had probably overlooked something obvious.

With my friend, who has a little bit of a mathematical background, but like me is no physicist, we started talking about some of these underlying ideas. At least in the beer haze it seemed as if they weren't entirely offbeat.

They managed to pass the "let's talk about it" test, where one should really bounce their weirder concepts off friends first. Given that, I figure someone out there can give a good explanation why it isn't so. Or maybe take it further if it is possible (but don't forget to include me in your Nobel prize speech).