Sunday, August 9, 2009

Halting a Problem

I can remember quite vividly, well over twenty years ago, when I first came across a really good description of the halting problem. It was in an early Computer Science course. It had a huge impact, mostly because I was quite skeptical about accepting it as being real.

In its nature, it is similar to those many strange loops that Douglas Hofstadter explains so well in "Gödel, Escher, Bach: An Eternal Golden Braid". I talked about some of them in my previous post, so I won't repeat it again here.

That early description of the halting problem came as a simple story. It was about a kingdom -- somewhere imaginary -- that was having terrible problems with its computer systems getting caught in infinite loops, thus wasting tonnes of valuable resources.

This was a much larger problem with the older "central" mainframe architecture, where everyone shared the same massive system. One person wasting CPU is drawing it from others that might put it into better service. Most things ran as batch, so an infinite loop might run for a huge length of time before someone noticed and terminated it.

I tried to find an Internet reference to the original fable we were taught, but as it predates the web, there doesn't seem to be an online reference to it anywhere.


IN A KINGDOM FAR FAR AWAY

In the story, the King calls together his best programmers, pushing them to work tirelessly trying to create a program that will check all of the others for possible infinite loops. In that way, they can be examined before they are run to prevent any wastage.

They toil long and hard looking for a solution, but to no avail. With failure after failure mounting, some savior comes along and proves that it's not possible. They can't do it, it will never work. They are wasting their time (more time really).

It was a really good simple explanation that takes one a long way down the understanding of the problem.

The halting problem as it is discussed in Wikipedia, doesn't capture the real essence of the issue. It talks about things like undecidability, and other strict formal references, but in formalizing its mathematical description it leads people away from a more intuitive understanding.

Computers can emulate other computers, but only by doing exactly what the other machine should be doing. They contain no ability to look into the future, to guess or even estimate. They contain no ability to reason. So given some arbitrary program, they cannot make any inferences about it, other than whether it is in fact a syntactically correct program. If they want to know whether or not it will halt, they need to execute it. Line by line. And if they do, and it fails to halt, then the question is never answered.

If they just cut out after some pre-existing length of time, then it was always possible that the code might have halted, that the proper conditions to break the loop could have occurred. The question is not answered decisively.

Sure obvious endless loops are simple to find, but there is an infinite number of ways to create an infinite variety of endless loops, all of which are beyond the finiteness of the code to detect properly. It's another fun version of the problems with size in formal systems.

And because programs are locked into being formal systems, they contain absolutely no intelligence other than exactly what we pound in there. That is a huge limiting factor.

I wish I could find that original story, because it also deals with hubris, a common programmer affliction.

The King and his coders pursued a hopeless direction, repeatably in spite of obvious failures. They did so primarily because their intuitions on this matter were incorrect.

What initially seemed simple, was beyond possibility.

We live in a day and age where many people still believe that our world is mostly deterministic, and that through action we can and will fix all of our problems, regardless of whether or not a solution is really possible.

Hofstadter's strange loops lay out a series of problems that just don't fit this perspective. They break with our intuition, making us uncomfortable. But that doesn't take away from their truthfulness. The world is not nearly as simple as our minds would like it to be. Strange problems abound.

Some, such as regression have gradually become accepted over time, while others have been pushed to the margins. It seems that many of us would rather ignore these types of problems, as in Cantor's time with infinities, than to actually face and admit to them.

We're barely more enlighten in this day and age then they were 200 years ago. Even with all of our fancy toys, we've only scratched the surface of trying to really understand all the information that we have amassed. We have a lot of data, yet very little knowledge.


SOME CONSEQUENCES

A consequence of the halting problem is the general understanding that computing is unbounded. There are no limits to the amount of things we can or should compute. There is no length of time that bounds all computations.

Certainly with Moore's law, and our ability to distribute processing or run it symmetrically we've been able to consistently find more and more computing power. And we've been consistently using more and more of this power for each new generation of software system.

So it would be absurd then if someone came forth and choose an arbitrary number for the total number of steps that a computer could or should execute? If they just simply locked all of us into a fix limit of instructions. If they set a limit and insisted that anything more was not acceptable.

It would be simple madness, even in the sense that whatever limit might be applicable ten years ago is clearly just a fraction of modern usage. Performance doubles, then it doubles again. And we've grown to consume all of our massive power increases, often requiring more, driving our modern hardware into over-processing fits. We'll use whatever we are given, and then come back for more.


HOW LONG IS INFINITY

The halting problem is about infinities, and in its definition it usually takes place over a Turing machine with an infinitely long tape. The infinities that bind to it are an important part of its formulation.

I've often wondered if the finiteness of our real world -- bounded on a grand scale by a fixed universe, and on a minute one by quantum theory -- bypasses these purely mathematical issues. After all, infinite things seem to be more of a mathematical reality, than a physical one. Our world has natural bounds, so maybe our theories should too.

The halting problem disappears quite swiftly if we just removes a few of the infinities from our model. If an infinite loop isn't infinite, than we can just execute it to the end and produce an answer. If the model restricts the size of a maximum loop to some fixed number than, presto blammo, there is no more halting problem.

Not only that, but machines can suddenly do a whole lot more with themselves. Great chunks of computer theory fall away because they are predicated on the halting problem being true. The world becomes a lot more interesting. Computers become way more powerful.

Well almost. We know pretty much that from our own perspective, time at least is still infinite, and in that context (at least for us) an endless loop is really endless. These things do not change in our real world, and they will not change. So a model with a finite number of computations is a severely broken one that does not match our reality.

No matter how much we do not like the halting problem, it cannot be dismissed from our knowledge. We're stuck with it.

Now all of this was to some degree, pretty standard in the study of Computer Science (CS).

I found that most people in CS hated the theoretical courses and tried to drop this knowledge as useless the moment they graduated. Many programmers don't want the limits of theory to interfere with their day-to-day programming grind. Most programmers just don't want to know.

Also, a lot of people come into programming from different backgrounds, so I really don't expect them to understand the base points of computer theory. It was hard enough for most people to sit through the required course, it must be brutal to try and learn this stuff without being forced into it.

I do think that all programmers should have some idea about what's out there and why it is in fact important. Theory isn't useless, particularly if it keeps you from wasting your time trying to build impossible things. Still, it is the nature of programmers to want to ignore as many things as possible, while focusing as tightly as possible on what they are trying to build.


CONTROLLING CHAOS

Getting back to halting, our computer systems, as Turing-complete machines, are always and will always be subject to running in infinite loops. More importantly there is no real or correct way to prevent this. It is the nature of what they are, so we need to learn to deal with it as part of our technology. We can manage the situation, but we cannot remove it.

We know we can't write an infinite loop detector, but if we are building a base for others to run their components, some of us clearly would like to do something to prevent problems with rogue code.

The reasonable way to handle the problem would be to contain the CPU usage so it doesn't make the rest of the system sluggish, and to provide some easy way for the users to interrupt the processing if they've lost faith in it running to conclusion.

Computers can't be intelligent, so, as always in these cases we need to push the choice back up the user. The user should be able to easily decide at some point that they have waited long enough.

We could also add some type of direct resource limitations, but these need to be tailorable to the specific systems, code, users or any other circumstance take may require a longer or shorter bound to be implemented.

Since we will never pick the "correct" finite limit, there will always be a large number of occasions where we need to easily change any limits quickly and painlessly. You can't trust a computer to make an intelligent choice, and you can't encode enough logic into the system to cover all possible contingencies.

A really trivial way to "restrict" resources would be to pull some arbitrary number out of a hat and prevent all programs from running for longer than that number of instructions. Of course, no one in their right mind would do such a horrible thing, it's an obviously bad hack. Any, and every random fixed limitation eventually becomes a nasty operational problem to someone, somewhere. Limits can be useful, but they have to be implemented very, very carefully.

We've been at this long enough to know better than to just set hard arbitrary parameters. We go through a mass amount of trouble to make things more configurable than necessary just to avoid the types of stupid limitations that plagued so many systems in the past.

If software hasn't progressed very far over the last couple of decades, at the very least we've learned enough to make our configuration parameters massively more pliable. We've learned to stop second-guessing operational environments, and to not restrict users from doing things unless it is absolutely necessary. Well, at least some of us have learned that.


INTERNET EXPLOITER

Given all of the above, you can imagine how annoyed I was to have a dialog from IE pop up on my screen with the following message:

"A script on this page is causing Internet Explorer to run slowly. If it continues to run, your computer may become unresponsive. Do you want to abort the script?"

I had just finished adding in some advanced encryption code in JavaScript that was insuring that any critical information in the DOM was properly secured. Not a bad choice, given how easily it was for people to inject some rogue JavaScript into an open interface.

Our choices with these types of data access problems are to batten down the hatches so tight that the users are absolutely restricted in what they can input, or to protect the internal data and accept that the browser or the user won't get caught by any tricky things like Trojan horses.

Rigid restrictions are probably the second most popular reason for users to swear at their machines and hate their software.

In my most recent system I want openness for the users; I am rather hoping that the system allows them to move in and occupy the space, rather than trying to keep them at a distance while providing awkward fragile tools.

To do this, I need to let them control as much as possible, which means that I need to restrict the absolute minimum from them.

But I can't do this at the cost of making the system hopelessly insecure. It's a poor trade off. Thus they can inject things, but they can't make sense of any of the critical internal data. A great idea, but to make it work when they log into the system I need some serious CPU in the browser for a few seconds.

To be fair, it doesn't really matter what I was trying to do. If it wasn't this particular piece of code, this week, it would have been something else next week. As the hardware gets faster, our expectations for code will increase as well. We're always trying to push the limits in one way or another.

This message pops up when IE thinks that the code you've executed in your browser is running in an endless loop. This, as I've said is a way to deal with the problem. A bad way, but still a way.

Some group of people at Microsoft made an unfortunate decision to add the parameter called MaxScriptStatements into their code which limits all JavaScript execution to a rather small five million statements. And they set it up so that the only way to increase that number was to update it directly in the registry. It's beyond the programs and most users to be able to change it.

This is exactly the type of bad decision that has been holding back software for years and years. When programmers make these types of rash choices, they ultimately propagate upwards through all of the dependent code.

Our systems are built on layer after layer of bad choices. By now we know what works, but somehow that doesn't stop us from building on shaky foundations. Some of our best known modern technologies are just cobbled together masses of poorly written code. Somehow, the industry has decided that quantity will magically make up for low quality.

And even though the original choice was to supposedly enforce good behavior, it isn't long before people start working around the limits and the problems returned.

The GWT library for example already has an IncrementalCommand class to allow the programmer to work around this limitation. It just automatically chops up the computation into smaller pieces. A function that is really the responsibility of the operating system, not a framework.

Sure it's hack, but that is what always follows from a lower-level hack, just an endless series of unnecessary complexities, each of them making it a tiny bit harder to actually get something to work reliably.

For example, whole generations of systems stupidly distinguish between text and binary files even though modern computers could care less. It's some ancient stupid choice in an early DOS system that still has ramifications today.


INFORMATION WANTS TO STAY HIDDEN

I'm not really the type to sit on something that I've found, particularly if I know it is important with respect to software development. In this case, I wanted to get this dreadful limitation in IE out there to as many programmers as possible, in a way that they would hopefully learn something.

What was done is done, but that doesn't mean we can't learn from it, perhaps some day we could stop repeating these types of bad choices. Perhaps someday, we could stop pouring more bad code over ever shakier foundations.

I figured humor would do it, but then being funny isn't really my strong suit in writing (I'm learning, but its very slow :-).

With this in mind, I drafted a question for StackOverflow (SO) that I was hoping was interesting enough to get some attention, while being amusing enough to get people to really think about what I was trying to say.

http://74.125.47.132/search?q=cache:0heGdkeTOJQJ:stackoverflow.com/questions/566010/has-microsoft-solved-the-halting-problem+Paul+W.+Homer&cd=16&hl=en&ct=clnk&gl=ca&client=firefox-a

I made a couple of huge horrible miscalculations.

First was that my sense of humor doesn't always work in writing. I think it comes off as too sarcastic or too arrogant or something offensive. I'm not really sure, but I rarely seem to get the types of reactions I am hoping for. I guess its one of those great skills that I just haven't acquired yet (if ever).

The second miscalculation was to pick on Microsoft. Our industry is divided, with some people strongly in favor of Microsoft and other people strongly against.

Since this was an IE problem, and yet another in a long line of similar Microsoft induced problems, I really felt that they deserved a bit of sarcasm. All big companies have a "culture" that shapes and defines their products, and in the case of Microsoft their specific culture has always been in favor of turning out mass amounts of code, and against trying to find short, simple and elegant approaches to their problems. In essence, they use their bulk to brute force their way through, each and every time. They're a big company that likes to dominate more than innovate.

That might be enough in some forums to bring down the public wrath, but StackOverflow is overwhelming biased towards Microsoft. The roots of the people involved come from the MS side of the industry, and the site does a lot for promoting the use of Microsoft products.

Thus it was a relatively bad idea to go into a Microsoft love-fest and start making rude noises about them. A worst choice if you can't write well either (and you're not funny).

Still, somethings just have to be done. If I were really good at making smart choices, I'd retire to my private island and leave all of this squabbling for some other guys.


A VERY BAD REACTION

Mostly to my surprise, my question bombed really badly at SO. It picked up some immediate ire and lots of rapidly expanding negative points, only to be shutdown a few minutes later.

SO has a "system" for eliminating unpopular opinion. They try to base it around the questions not being legitimate "programming" questions, but the truth is that anything with even the slight bias towards a statement gets shutdown real fast.

The questions are limited towards just the things that can be retyped out of manuals. Discussions are restricted towards just simple facts.

The sentiment is that they don't want the site to become choked by religious wars between rival programmers, spam or other noise, but the consequence is that there is a very nasty authoritarian streak in their behavior.

There are lots of overly zealous programmers running around "scrubbing" the questions and shutting down what they don't like. The rules are ambiguous enough that the real enforcement is left up to the individuals, always a dangerous practice. If SO was a library, out front there would be a raging pile of books, feeding an eternal flame. And librarians toasting marshmallows on sticks.

It's a good place to ask some simple questions, but it was probably a very bad choice as a place to try and communicate with any fellow developers.

Still, in all ways I found the responses to the question to be quite amusing.

Over the years in analysis I always learned to look deeply at things that don't quite behave as I predicted. In domain analysis, the best place to learn things is when the users bring up unexpected issues.

What we know is less important than what we don't. I've always found that behind the complaints that other developers quickly dismiss as not valid, lies all sorts of interesting and fascinating truths. That when things changes frequently, instead of getting angry, I often choose to ask deeper and deeper questions.

What we understand of the world around us, is often only the shallow surface of a much greater and deeper sea of knowledge. There is much buried behind things, that sometimes even the most trivial fact is the gateway to a vast new understanding.


THE SLOW SINK TO THE BOTTOM

If SO hates the question, then I am entirely fascinated by why this is happening. If they sent it nearly to the bottom of their list of questions, than there is far more there than just a bad piece of writing.

After the initial surprise reaction to the my question, it started a slow deep sinking. Gradually it worked its way towards the bottom of the pack.

I do have to admit, that it wasn't entirely based on merit. At least a couple of those negative votes were encouraged by me.

Although many people were irritated by the post, I did receive some positive feedback, and a few people found it to be quite amusing. Of course, since it was already negatively rated, most support choose to sink it to a more obvious place. Positive, negative votes.

Now, a smarter and possibly more conscientious man might have been dreadfully embarrassed by getting such a negative rank. The bottom of the pack. The worst question ever.

I however thought that for right or for wrong, the extremely negative rank was a perfect balance to my overall tone in the piece. It had seemed to have found a place at the bottom of the list, happily bring up the rear. If I can't be famous, at least I can be infamous, or so I had hoped.


REVISIONIST HISTORY

Another most interesting thing about SO is that the surprises never seem to cease coming. Five months after having posed the question, most of the time dominating the very last position in the list for questions, some people in SO decided to delete it.

Well, it wasn't really deleted, just hidden so only "privileged" 10K users can see it. A rather strange process, given that the site is trying to be a public resource, not a private stronghold of secrets like some ancient church quietly burying uncomfortable knowledge in its vaults.

http://meta.stackoverflow.com/questions/11798/where-did-my-question-go

Deleting it, of course, is a very bad idea. If I, as the author, asked for it to be removed that would be one thing, but to choose to do it because several individuals felt that they didn't like it was quite different.

If it makes people feel uncomfortable, that in all ways is a good thing. We do not get significant progress from the status quo, it always comes from without. Sure the in-crowd manages a slow and steady push forward, that gradually helps, but it's always someone on the outside that makes for the fantastical leaps.

The inside is filled with convenient popular ideas that are easily acceptable, that is the definition of inside. The outside is filled with stupidity, madness and wanton acts of hubris. Along with all of the fools on the hill, are a whole series of crazy voices proclaiming a dizzying array of weird ideas.

It's not that I think the question was deep, or that it was funny, or even right or interesting. For all I know, the majority is correct and the question sucks. Badly written and obnoxious. But that is only a minor issue.

It is minor because we should not now or ever wipe out ideas just because the "group think" doesn't like them. We should not prune the bottom of the list or try to cover up our own stupidities.

That the smartest men on the planet often make really stupid choices should go without saying, we are human after all.

But even in that, who are we to say that the crazy ideas are wrong? After all our history is filled with men like Galileo, Cantor and Turing, and a host of others whose ideas were initially greeted as being wacky. If we are to learn anything at all from history, it is not to judge ideas, even when they appear to be completely crazy.

In our most modern time, Google's wonderful map-reduce concept was overlooked for decades. It's not what is popular that counts, it is want lies on the fringes that will be the next great thing.

So if people get together and nuke the fringes, just because some of what is there makes them uncomfortable, for any reason, right or wrong, then we don't just risk missing out on the next great ideas, we absolutely guarantee that we will.

What is crazy today, is often reasonable tomorrow. No doubt people once said things like: "The earth orbits the sun, you'd have to be mad to believe that."


FINAL THOUGHTS

Ultimately, other than it was really amusing, I don't care that much about my question. I tried to make a point, I failed, and then slowly it sank back into making a point again.

Being deleted, well hidden actually, allowed it to make a even more valuable point then I had originally intended.

Right or wrong, stupid or perceptive, it doesn't really matter what the question is, it only matters what happened to it. It matters how SO dealt with the question, and how the people in SO got together and removed it.

There was once a time when we had stronger moral convictions, and most people understood the difference between right and wrong. When they realized that silencing the world around them doesn't lead to a cleaner more orderly existence, but rather towards more bad acts of control and abuse of power.

We tolerate dissent because we used to know that quashing it doesn't work. That it is necessary, and that not all great ideas come from the party line; from the authorities and from those in power. From any small group of narrow minded people too focused on protecting their own self interests.

If we are interested in really making public forums, ones that truly support real discussion and progress, we can't just wipe out the unpopular stuff. Removing spam is one thing, removing stupid and ugly questions is quite another. It's taken centuries for us to really learn these principles, but only decades for us to unlearn them.

Beyond just the control and abuse issues, it seems that one of our most sever casualties in this modern age is our ability to really understand things. Ironic given our quick and easy access to such vast seas of knowledge. Having a fast reference source seems to make it harder to actually apply the information.

Sure we have all of the facts about morality in Wikipedia, but if people don't really understand them, they can't apply them in the various circumstances in their lives. That knowledge, along with our right to privacy is dying quickly in our modern web. And we are allowing this to happen.

No doubt, we will wake up one day soon, in a society not of our liking. Each little step we take, heads us farther in a direction that people have been fighting against for generations. History has a funny way of replaying itself over and over again, and we seem to be just setting up the next great failures.

5 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Good post!

    It´s really a shame to remove a non-spam question from our sights.

    Now I´m curious...

    ReplyDelete
  3. Hi Seiti,

    Thanks for the comment :-)

    It does make one suspicious about what else is being cleansed from the slate, doesn't it? In a world were we claim to be free and civilized, it is still amazing how quickly people will fall back into bad habits without thinking or accepting the consequences. We seem to leap into the trap of thinking that our own "rational" for doing something, somehow means that we won't succumb to the same flaws that drove other's actions to be ill. Yet, no matter the reason, bad actions always produce bad results.

    Paul.

    ReplyDelete
  4. The first thing I thought about the halting problem in your post is maybe is not so bad, at least in dayly problems... because chess programs can easily detect the third repetition rule draw.

    ReplyDelete
  5. Hi xlr8,

    Thanks for the comment. It is true, up to a point, that the halting problem is based on an infinite tape, and in reality most computers are bounded by more finite resources. That always bothered me when I first learned about the halting problem in University (I refused to believe in it at first :-)

    Still, in a general case, with a large amount of resources, no program could "reliably" determine if another will stop. It's that "reliably" that is important, particularly if it's my bank account on the line, if the program goes nuts.

    Chess is a big tree of all possible moves. At any point in the tree, you can look farther down and make guesses about the state of the lower branches (OK, it's an upside down tree :-) Loops in the tree can easily be ignored (or marked as such), so the tree pretty much has a finite ending. In a sense, it is no more complex that handling "*" in a regular expression (via a NDFA).

    Paul.

    ReplyDelete

Thanks for the Feedback!