Saturday, August 23, 2008

97 Things -- The Strength of Communication

Some things you just connect with right away.

I stumbled onto a new project by Richard Monson-Haefel and O'Reilly in a series called "97 Things" where they have created a site for axioms based around "97 Things That Every Software Architect Should Know":

http://97-things.near-time.net/wiki

Software architects and developers are encouraged to submit their best advice to the project, for possible inclusion in a final book. The format is an axiom with a 250-to-500-word description: long enough to explain the axiom, short enough to be right to the point.

This is a great idea, and a great way to handle it.

There is a huge amount of knowledge and experience out there buried in the rank and file that hasn't made it back to the mainstream. Software developers learn a lot from hard-fought battles, but little of it makes its way back to the overall industry.

This idea taps all of that potential and opens it up to an industry that is hungering for information. We get tutorials, reference books, and textbooks, but that type of information is rather limited. It tells you what things are, but not how to utilize them. You get to understand the parts, but not how to apply them.

We're looking for higher-level understandings. Things beyond technology and syntax. Things to make our lives easier. Things to make sure that we'll be able to finish our projects as successfully as we start them.

There already has been an explosion of bloggers pushing the established topics, feeding the front line programmers and software developers with alternative viewpoints. This trend is clearly going to continue. But I also expect that the more traditional media will come to understand that although the web has replaced the older technical book markets, there are new ones opening up that will be based on distilling a wider range of less formal information. A kind of grassroots movement for utilizing and understanding the impact of technologies.

We're coming to another fork in the road for software development; a time where we can jump up a few steps and new technologies can spew forth at an astonishing rate. If we're to break out of our current limitations, these new offerings have to build on our real experiences, and for that we have to leave the comfort of our cubicles and spread our ideas far and wide.

What is known out there, and what is currently talked about, is only a fraction of what is really happening. It's not the loudest that we need, it's the masses that should be feeding their input back into the process. We need to find a way to conquer the complexities that cripple our systems. We need to find a way to make massive systems simple again. Only then can we reach further into the potential of our machines.

It is up to us, the people building with these technologies, to make sure that future generations fix the current problems. Too often history is forgotten, and the bugs of the past revisited. Too many technologies fail before they even leave the drawing board. Without feedback, nothing changes.

Contribute; it's the only way that you can ensure that your underlying dependencies are going to get better and easier, not bloated and ugly.

ADMINISTRIVIA: On a minor, unrelated note, but I keep forgetting to mention it, I've tied my bookmarks to a FeedBurner/Blogger RSS feed called "Way Overflow". So, like Raganwald's feed, if you subscribe you can see what I am bookmarking. The only difficult bit is that the name, unfortunately, needs quotes to find it in an app like Google Reader because it matches so many different entries as a string. Maybe I'll rename it to something else?

Sunday, August 17, 2008

Social Disorder Revisited

It tanked. I'm not sure why I'm finding this so funny; generally, if my posts don't do well at the various news sites, I, like most mildly narcissistic bloggers, feel somewhat depressed. This time, however, it's different.

The post in disgrace is about the analysis of the very thing in which it is now failing: news sites. This one even tanked on DZone, a place where I've often eked out at least a couple of friendly positive votes. The sites implicitly ignoring a post about themselves is quite possibly a telling statement about their essential qualities.

Or is it? Less dramatically, it might just be that the title sucks and my description was lame. Perhaps my timing was off? Who really knows? Or can you even really know? That's the subject of this follow-up post to my earlier one on social sites:

http://theprogrammersparadox.blogspot.com/2008/08/social-disorder.html

The sense that you can't really understand why this most recent post failed is, in all ways, the underlying essence of my point about analysis. You can have all of the facts readily at hand, yet still be surprised because there is some previously unknown variable that has come into play. I can look at the fact that the post didn't do well, but I don't actually know if the problem is packaging, content or timing. I can guess, but I am just guessing.

The inherent nature of people in organizations is often irrational and unpredictable. Facts can help you construct a model, but the conclusions you draw from those facts can vary widely.

A lot is written about the issues or easy solutions, but if it isn't based on the underlying nature being non-deterministic, then it is unlikely to be usable; wishful thinking more than serious analysis. Predicting the future is hard, and predicting chaotic systems is nearly impossible. People don't seem to realize whether they are basing things on facts or on predictions. Understanding that distinction is vital.

For this post I just wanted to fall back onto some seemingly disconnected points about analysis that I missed in my original writings. Small things, but relevant. I've tied most of these back to other people or their statements, it's just easier that way. Everything needs a structure, even if it seems arbitrary (which it never is).


STEVEY

In one of his recent posts, Steve Yegge talks about gathering requirements being too little, too late:

http://steve-yegge.blogspot.com/2008/08/business-requirements-are-bullshit.html

He's right, I think, but for the wrong reasons. Analysis and programming are nearly quantifiable pursuits; they are founded on transferable skills. Not that everyone can do them, but if a person can handle it, you can teach them to do it, even if it takes some time. It might be hard to explain, at times, but it is explainable. There is a method to the madness.

But even with all the analysis in the world, it still doesn't help you envision the right tool for the job. You may know what the users are doing, and what they need, but you'll never know the side-effects of switching to a new tool until they try it.

In that, there are millions of variables, but we ignore most of them, and focus on the fewer than twenty that our minds can cope with. The significance of the things we don't know is impossible to gauge, but hopefully it's not huge.

There is something more there that supersedes the raw facts of the matter. Understanding how sites work doesn't change the reaction to the content of a post. The system, and the individual behavior at some point are not the same thing. You may know the rules, but not understand when and why they are applied.

More to the point, like a lot of things in life, if it were easy to predict what would work, more people would do it, and that would disrupt the predictions. Some volatile systems consistently maintain their volatility as an intrinsic property, one that is constantly changing. Like many complex systems, such as stock markets, prediction is impossible because the existence of the prediction disrupts the system causing it to change.

Steve's point about the requirements is close to the mark; you are using the requirements to drive the analysis, which isn't going to find anything because you don't know where to look. It's a needle in a haystack search. In his description, people are actually looking for meta-requirements, not the base ones.

Analysis, one type of output of which is requirements, is how you go about validating your understanding. But you have to have that understanding first, in order to direct the analysis in some meaningful direction, otherwise you're just randomly searching.

If you do know what you are going to build, then gathering requirements is a reasonable next step. It's not that the "requirements" are the problem, it's where and when in the process you are looking for them that is off. I'll get back to that a little later.


JOHN

While discussing my original post in email, a friend of mine John Siegrist asked "Would anyone want to use a system that couldn't be gamed?" Which, in its very essence is a deep fundamental question.

This question examines the system as well as the consumers and producers using it. The weakness of the current systems allows people to game them, but is that an essential quality for drawing in producers, many of whom, on this type of site, are amateurs? Does this give them the incentive to publish? Is the game an essential quality?

My guess was that the early adopters looking at these types of systems are drawn to the gaming aspects, but the later ones, who possibly haven't discovered them yet, will use them more as tools of convenience.

As the market matures, a different crowd of people will get drawn in, and their behavior and expectations will be significantly different. Strangely, while the producers later on will be more conservative, I suspect that the consumers will be more forgiving.

Any which way, looking at all of the back-door features and communication methods, gaming the system is currently an important part of the "whole" system functionality.


PHIL

My friend and classmate Philip Haine has an interesting view of analysis and design in his blog:

http://stealthisidea.com/articles/design-pyramid

While I like his perspective, I think that we could improve on it by making it a bit simpler; possibly just vision -> analysis -> design as the 'three' layers in a pyramid. I see his use of "understanding" as just a side-effect of analysis (with a mix of vision) and I see "requirements" as the way analysis is documented, making these two faces of the same coin.

Following from what Steve Yegge says, you have to mostly know where you are going first, before you start to do the analysis to back up your beliefs and assumptions. Requirements can't take you there, they can only confirm what you know (or don't). In that sense, "understanding" in Phil's diagram is really partly "vision". You need to know in what direction you're headed before you can travel to get there.

Once you know what you are going to build, then you have to nail down the specifics. A set of requirements is just one of many different forms of documenting the analysis. We don't need requirements per se, but we need the analysis to be tangible (although some small projects just leave the results in the programmer's head).

What makes this all so interesting is that "vision" is not a quantifiable or teachable skill. You can teach analysis, and you can teach programming, but vision is really just being able to predict the future. Not only can you not teach it, it is more or less based on some strange cross between instinct and luck.

It is not a repeatable skill either. It's not uncommon to see someone get it correct one day, only to miss by a mile on another. It's not all luck, but being lucky helps.


NICK

On another strangely related note, Nicholas Carr was wondering if we are getting stupider:

http://www.roughtype.com/archives/2008/08/is_google_makin.php

The fast access to junk info-lets leads to a steady diet of fast knowledge. This type of diet does little to enhance the strength of one's thinking, and a lot to make it sluggish.

This also fits into the overall theme because the news sites are the magnets for the dispensation of information, and the new way to do that is to make it entertaining. Quality of content is not really as important as getting a good title, or easily understandable platitudes. Consequently, our choices in this world are getting made based on poorer and poorer models of our surrounding circumstances. We're sacrificing quality for instant popularity.

The news sites are transforming the way we see information. What was once significant and very carefully constructed is now just whipped out. The freedom of the masses to publish means the degeneration of the publications. Overall, we trade quality for quantity.


ME

I wrap this up in my being a producer, and being able, at the amateur level, to easily publish my ideas, along with millions of other people. In the past I would have needed either a scholarly journal or a magazine to get noticed. Now my crazy ideas are getting out there easily.

The problem is that I want to be more than info-tainment. I think that my ideas are fairly strong, and I dislike the idea that the medium is making them wishy-washy.

I suspect that maybe some of Reg Braithwaite's decision to quit blogging may have been partly driven by the futility of publishing in this modern age. He implied that he had reached a point where he had nothing left to say. But if we're not talking, is anybody going back to the "old" posts to read them again? Do they just stop listening? Many of the news sites won't allow existing URLs to be reposted; few of them easily retain your reading history. Yesterday's news is old, not worth reading.

Yet, for myself, and I'm sure a large number of other bloggers, we have aspirations of filling our posts with more than timely tidbits. The things I say, rightly or wrongly are meant to stay around for a while.

Many other software development bloggers are the same. We truly mean to change the world, to take an immature discipline and inject some new vibrant ideas into the core. Once you've been doing it long enough, software development becomes frustrating because real progress is so slow. But for all of the effort that people are putting into their blogs, the masses are passing us by, or just reading the works to help them smile. It's hard to judge the effects of my efforts.

That's the irony of my last post: its title was unlikely to be 'entertaining' enough to have survived on the various sites. A premature death for a reasonable understanding of the phenomenon that killed it. Funny that.


ANALYSIS REVISITED

Given what I said about the unpredictability of the future, it may seem to be impossible to apply analysis, and use it in a practical way to succeed. It is essentially random after all. But oddly, that is not the case.

The only way to see if something works is to try it. It depends on luck. But I love the expression "hard work generates luck" because it is true, and very true in this case.

You can't tell if something is going to work just by thinking about it; you can't correctly account for all of the variables. But that doesn't mean you can't devise an algorithm to walk through the whole thing and find success. The only thing you can't cope with is how long it will take.

If you set forth some metric, say sales for example, and then float a simple inexpensive version of the tool, you can determine on a small scale if there is demand. With a little care, you can difference the normal background growth from the effect of any new product changes, to see if they are helping or hurting. The different "causes" blend together, but they are still there, and measurable.
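As a minimal sketch of the kind of differencing I mean, with entirely made-up numbers and names (a real version would obviously need more data and more care), it might look something like this in Java:

// Rough sketch, hypothetical numbers: estimate the baseline growth rate from
// the months before a product change, then compare it to the months after,
// to see whether the change appears to be helping or hurting.
public class GrowthCheck {
    public static void main(String[] args) {
        double[] salesBefore = {100, 104, 108, 112};   // a steady ~4%/month baseline
        double[] salesAfter  = {118, 126, 135};        // the months after the change

        double baseline = averageMonthlyGrowth(salesBefore);
        double actual   = averageMonthlyGrowth(salesAfter);

        System.out.printf("baseline growth: %.1f%% per month%n", baseline * 100);
        System.out.printf("post-change growth: %.1f%% per month%n", actual * 100);
        System.out.println(actual > baseline
            ? "the change seems to be helping"
            : "the change seems to be hurting, or doing nothing");
    }

    // Average month-over-month growth rate for a series of sales figures.
    static double averageMonthlyGrowth(double[] sales) {
        double total = 0;
        for (int i = 1; i < sales.length; i++)
            total += (sales[i] - sales[i - 1]) / sales[i - 1];
        return total / (sales.length - 1);
    }
}

Crude, but it makes the point: the background growth is still in the numbers, you just have to subtract it out before drawing conclusions about the change.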

In a less deterministic way this is what entrepreneurs do all of the time. The philosophy is to try a million things and see what works. Sometimes you just have to keep testing it over and over again until you blunder into an intuitive sense that it is the right direction. There is no quantifiable skill here; it is just luck, instinct and some strange rare ability to take a huge number of variables, ignore the right ones, and tweak the remaining ones. This inherent form of creativity, since it involves creatively ignoring reality, is not teachable or trainable. Some people focus in on the right aspects; most people confuse or blind themselves.

And it is worth noting that analysis, if it is not based on emotion, prediction, instinct, etc., is just a way of applying a structure over a large series of facts and observations. Once you've found a problem and analyzed it, you can make a few predictions about its future behavior and use all of that to create a tool that should, mostly, help users solve their problems.

And most real problems are neither that hard, that different, nor that volatile, so a little experience in the same domain goes a long way towards being able to make better predictions.

In a simple analogy, analysis is like searching for a specific point in a very large field. The larger the area, the harder the search and the less likely you are to find what you are looking for. Shrink the area and the odds of the search get considerably better. In software you predicate the type of solution you are looking for, and then use analysis as the search to find it. On a good day, if you stick with it, you'll probably get lucky. But only if you're reasonable.

Sunday, August 10, 2008

Social Disorder

I haven't blundered through a discussion about software-oriented domain analysis for a while. It is an important topic, but a hard one to discuss.

Good analysis drives a successful software project, but its underlying nature makes it difficult to quantify and describe.

Analysis sits at the intersection of the real-world and the abstract. Its complications lie in trying to force a discrete model over top of incomplete information and quasi-irrational behavior. We observe, but then we skew that to meet our own expectations. We blind ourselves into making nice little simple rules. We often don't see what is really there.

Even if we see it, quantifying it can also fail. If you don't compact things down into simple discrete chunks, the analysis has no value. If you pack too many different pieces onto each other there is also trouble. We like to look for one unifying rule that contains everything, but that desire is often the problem. The real world is rarely that simple.

Perspective can influence the analysis. It can change radically based on which things you observe. We only ever see a sliver of the problem at any one time, and that can produce misleading results.

An analyst need not delve into every nook and cranny; only the domain knowledge relevant to the software needs to be understood. Compiling it is time-consuming. Wasted effort if it can't be used.

Analysis just for the sake of it is bound to lead in directions that are not practically useful for programmers. That is why earlier attempts to split off business analysis from programming were unsuccessful. The understanding required from the process need only be enough to implement the software properly. If you don't understand what is required at the end of the analysis, you don't know how deep to go. All we really need is just enough to get the code written.

Analysis is very different from programming, but it is still a necessary skill for software developers. A big project may not need a huge number of analysts, complex development can often pivot more around technical issues, but their work is vital in assuring that the code being developed is really a usable tool. Good analysis contains the problem.

Over the years, every successful project that I have been on has been based on solid underlying analysis. Most of the failures came from lightly dealing with, or skipping it. A project that does not know what it is building is doomed from the start.

For success, I have learned to leave my emotions, assumptions and expectations at the door, preventing them from tainting my findings. But it is always difficult to transcend oneself and achieve a level of objectivity. In analysis, I am my own worst enemy.


A SIMPLE EXAMPLE

The easiest way to explore analysis is to work through a concrete example; a simple and publicly available one. For this I think that the Web 2.0 social bookmarking sites are interesting and deep enough. They keep popping up in blog articles. This includes sites such as Slashdot, Reddit, Digg, Del.icio.us, StumbleUpon and Hacker News. These sites revolve around the sharing of information links amongst a large community.

Most of these sites are fairly simple in concept, users submit interesting links that they find around the web. The good stuff rises to the top, so everyone can spend less time surfing. Technically this is all about sharing some common resources (links) in an organized manner.


SOME HIGH LEVEL ANALYSIS

On the surface this is a simple problem: web surfers want to access the latest and greatest bits in a massively expanding web of information, but they don't want to spend forever looking for it. Collectively, a large group of people can do the work necessary to filter out the bad stuff, making the rest publicly available.

But, if we dig deeper we find that there are lots of people-related problems bubbling just below the surface. For instance, a lot of what Clay Shirky says about social group interactions holds true:

http://www.shirky.com/writings/group_enemy.html

These sites quickly degenerate, and anti-social behaviors start to pile up. People are constantly trying to manipulate or "game" the system. Complaints start up about control and quality, and then the users move on to another site. It is a recurring theme. Giles Bowkett is just a common example of another dissatisfied user:

http://gilesbowkett.blogspot.com/2008/05/summon-monsters-open-door-heal-or-die.html

There are tonnes of these types of rants available for any and all of the main sites. The big question is why? Why does this keep happening again and again?

To know that, we need to understand what is underneath. We need to quantify it in some way.

The first part of any analysis is to look at the people involved. People always form the base layer in the real world that interacts with the system. Most systems are just ways to pile up data related to people. Intrinsically, any or all complication draws its source or inspiration from the actions or history of people. The messiness always starts there.

In these systems the users can be broken down into consumers or producers. The consumers are passively using the system, taking advantage of the work put into the filtering. The producers are actively going out to the web and finding new and interesting links to add in. The roles mix to some degree, most users are a little of one or the other, but being a producer takes a higher precedence; it is a more interesting role than being a consumer.


CONSUMERS

To understand the consumers, you have to start by asking yourself: "why do people come to a site full of links?" What is it they are looking for?

The often forgotten central point of analysis is that the software tool is only there to automate some specific effort; to help in managing a pile of data. Knowing what's in the pile, and why it is important, is the key. If you don't understand what motivates a consumer to spend time using the system, then it's hard to augment that experience. Playing with the pile of data has to be worthwhile, or people won't bother.

The consumers, in this case, are looking to the computer to find current, relevant or significant information that relates to them, their lives and their interests. If you're interested in computer equipment, you'll spend a lot of time browsing and reading about hardware. If you are not interested in cameras, you're not going to want to spend a lot of time reading about the latest offerings from Canon and Nikon. Each user is different.

Interest is a driving factor, but not the only one. An easily misunderstood point is that all of the information on the web has a timeliness factor. Some of the information only has value if it is delivered right away. Some of it has value no matter when it is delivered. Time is significant.

That complicates the problem because consumers are really looking for two very different types of things. They want to be kept up-to-date, but they also want to see the 'must-read' information for their particular topics of interest. There are two different types of data intermixing in these sites.

Classically, both types of information are most often combined, and then falsely assumed to be timely. That places a careless quality over top, turning much of the underlying knowledge into a shallow form of info-tainment. A light-hearted way of just skimming over the surface of the web, without generating any lasting effects. Easily read, easily forgotten.

People gravitate to these systems, but the ties are loose. They easily pick up and go to the next one, once the fashion has changed. Loosely bound users make for a volatile audience.

The changing landscape of both the content and the users accounts for a lot of the complaints about these types of systems. Consumers get accustomed to some specific steadiness, and are then annoyed when it changes.

For myself, I want some idea of the latest news, but I also want to see the must-read papers and articles for the software industry. Sometimes I am interested in the weird and wonderful, sometimes just the core stuff. If I have some time to kill, it is sometimes nice to get something light and funny. But, I also have a more serious side.

I want to keep up with the big news stories, and I'd like the computer to help me keep track of all of the important 'critical' industry-specific papers that I have read, and help me find more that will enhance my professional development. Sometimes info-tainment is fine, but I want a tool to keep me up-to-date and to help me grow as a professional.


PRODUCERS

Producers are intrinsically more complicated than consumers. You have the clearly motivated ones, such as bloggers, writers or advertisers that are trying to promote their work. These are the root producers of content on the web, some are professional, many are not. Some have noble goals of trying to share their understanding, many are just after fame or money. All of them are driven by some need to get their work out to as many people as possible.

For the social bookmarking sites, you have another lightweight type of producer that scans the web looking for interesting material and then posts it. These are the heart and soul of any social networking site. The motivations of these people are far more difficult to pin down; many are just out there for the sheer kick of trying to influence the market. Some are bored, some just want to contribute.

Intentionally or not, all of the producers are always trying to game the system. On either side, the more they get control of the underlying resources, the happier they are. It's not that their goals are necessarily destructive, it's just that the need to get the widest possible audience is fundamental to being a producer. Why produce if nobody is going to listen? More readers means more incentive.

The heavy content producers tend to frequent many different sites, and prefer to minimize their time at each site. They often make illicit trades with other users to get clicks -- there are a huge number of sites dedicated to these markets -- but they manipulate the game primarily to get their own work into the top-ranking positions. They're not really interested in the site, just its ability to distribute their efforts.

Lightweight producers stick to specific sites, where they try to dominate the flow. The lightweight ones also band together in cliques, realizing control through their mass, but most of these groups form internally in the systems. Usually these groups are more cohesive and long-lasting than any heavyweight arrangements. They often have very strong identities, some set of underlying moral conduct, and some purpose.

It's hard to tell, but the lightweight producers seem to keep it up for a while and then get bored, leaving the field open for other cliques. A large site will have a huge number of different cliques fighting it out for the top positions. It can be competitive, and likely pretty nasty at times. Many sites encourage this with rankings and secretive back-door ways for their users to communicate. A good clique can put in a tremendous effort to get quality links into the system.

The lightweights most directly influence the nature of their sites. Large-scale shifts in content generally mean significant changes in the underlying cliques. Often what is thought of as diminishing quality in a growing site is nothing more than an increasingly competitive market, with a few earlier, more conscientious players leaving the field.

Ultimately, because there is little direct or long-term incentive, the controlling groups get larger and less selective. The system by its nature will degenerate at times, cycling between periods of more and less effort. Bigger and smaller groups. More or fewer links. People always need a reason to do things; the stronger the reason, the more time and effort they'll put into it. There are exceptions, of course, but the incentives for a lightweight producer are so low that the market will always be highly volatile.

In some social sites there are also groups of people actively trying to disrupt the site, a set of anti-producers that get some entertainment out of corrupting the works. These seem to be short-term affairs, but they cause process problems for the rest of the site. They effectively gum up the works, by forcing the site to implement too many rules to stop them. In the same category come the entirely self-serving spammers, people whose only interest is in generating some noise that arbitrarily hooks people; they don't care at all about content. These are the intentionally negative producers.

Mostly, though, people's intent is on the positive side. There is some content that they would like to share for some reason. There are many different ways to approach that type of desire.

My goals for my blogs, for instance, are to get them read by the largest possible 'relevant' audience, a goal that is often harder than it sounds. With all of the different available sites for 'trading' clicks, you'd think it would be easy to just join them, put in some effort, and get noticed. The problem with gaming many of these systems, however, is that it is self-defeating. I get readers, but not the ones I want. Since most of my writing is targeted towards a niche software developer audience, attracting lots of people interested in video games, for example, doesn't help me get my message out, or extend my readership. It just wastes my time and misleads me into thinking it is more popular than it really is.

Few sites support the content producers, or give them tools to more effectively target their intended audience. From a producer's perspective, many of these systems are barely accessible or just plain annoying. Over the last couple of years I have definitely built up a list of sites to avoid.

Some sites assume that producers are spammers by default, but that type of punishment is counter-productive. Ultimately the quality of the site depends on its content, so attracting and keeping the interest of good content producers should be a core requirement. Allowing them to help target their material should also be deemed important.


THE NEXT STEP

In analysis, now that we've examined the pieces fairly carefully, it is time to step backwards and look at the bigger picture. The way things interact with each other is important. Until you understand how the puzzle pieces fit, it's hard to put the elements in context, so it's hard to construct something workable from that understanding. Understanding the little bits isn't that difficult, it's just observation, but people often fail to draw meaningful understandings from what they see.

The social sites, at their core, are very simple. They are just resources collected by producers and shared with consumers. It is amazing how something so simple in its essence has spawned so many different related sites, policies, groups, etc., and how many people are dedicated to gaming these systems in some way.

The first and most important question is whether or not the current sites are actually meeting their users' needs. The answer, I think, is no. That is why there are so many new sites and so much constant migration between them. The site-du-jour is simply the next place to go for the few months before the quality falls again.

One guesses that it is either failed analysis (trying to map it all onto one type of resource), or just technical limitations that have been driving the current generation of systems. Either way, they are not meeting the needs of their users.

We know what users are looking for: both timely news and the best-of information on a specific topic. The computer as a tool is in its element when we use it to remember things for us. The information pile that we personally want to build up is the "seen it" type, while the information pile we want the whole system to build up is the "ranked X in this category" type.

Although it is a lot of information, what we really want is to be away from the computer for a while, but know that we have not missed anything important in that period. The tool should 'monitor' the flow for us, and 'synchronize' us with the important links.

The best way to handle this is to categorize the resources into different types, possibly overlapping. News and articles might be considered reasonably suitable names for these types.

For the news-based timely resources, they should be presented to the user within their life-span, in order of their popularity. A "what's hot now" list. For the historic article-based ones, the system should remember where the user is in the reading list. They are slowly making their way through all of the articles; it is important not to miss any. That may be a lot of information, but it's not impossible to compute, and optimizations can be found that approximate this behavior with fewer resources.
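As a rough sketch of that split, with hypothetical names and types (nothing close to a real implementation), it might look something like this:

import java.time.Instant;
import java.util.*;

// Hypothetical sketch: resources are split into timely "news" and long-lived
// "articles", and the system remembers each reader's personal "seen it" pile
// so that nothing important on the ranked reading list gets missed.
class Resource {
    final String url;
    final boolean timely;     // news expires, articles do not
    final Instant posted;
    int votes;
    Resource(String url, boolean timely, Instant posted) {
        this.url = url; this.timely = timely; this.posted = posted;
    }
}

class ReadingState {
    private final Set<String> seen = new HashSet<>();   // the personal "seen it" pile

    // News: whatever is popular right now and still within its life-span.
    List<Resource> whatsHotNow(List<Resource> all, Instant now, long lifespanSecs) {
        List<Resource> hot = new ArrayList<>();
        for (Resource r : all)
            if (r.timely && now.getEpochSecond() - r.posted.getEpochSecond() < lifespanSecs)
                hot.add(r);
        hot.sort((a, b) -> Integer.compare(b.votes, a.votes));
        return hot;
    }

    // Articles: the ranked reading list, minus whatever this reader has already seen.
    List<Resource> nextToRead(List<Resource> all) {
        List<Resource> remaining = new ArrayList<>();
        for (Resource r : all)
            if (!r.timely && !seen.contains(r.url))
                remaining.add(r);
        remaining.sort((a, b) -> Integer.compare(b.votes, a.votes));   // "ranked X in this category"
        return remaining;
    }

    void markRead(String url) { seen.add(url); }
}

Keeping the two lists separate is the whole point: the "seen it" pile belongs to the reader, while the rankings belong to the system.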

Slashdot, Digg, Reddit and Hacker News have the news concept down. Del.icio.us and StumbleUpon do better at the article one, since they don't impose specific lists onto their users. None of the sites really cope with ensuring that you don't miss out on a 'changing' critical reading list. I want the computer to track my progress.

Another point that is important with these sites is that the cliques in charge of editing the flow of data are volatile, so the quality of data is volatile as a result. Most of these systems are unstable by definition. The nice part about an old-fashioned newspaper or magazine is that the staff is fairly consistent over the years and is motivated to make the work likable to a well-known set of subscribers. Quality is why they get paid.

Motivated editing cannot be under-valued. Tossing this responsibility to the masses means unpredictability, which is exactly how these sites work. A small group of paid employees is far more likely to produce a consistent package than a large group of volunteers. This will always be the case. The sites' founders are hoping to exploit cheap labor, but like all things the trade-off is an inherent instability.

Thinking long and hard about it, the way the sites have been growing popular and then fading is exactly the way it should work. The short-term motivation to dominate a site dissipates as the site becomes too large, thus degenerating into something more chaotic. The later stage forces lots of people away, which makes it controllable again. A new group finds the system and starts gaming it. A pendulum swinging back and forth.

Human-based problems are often driven by irrational behavior that fluctuates over time. It makes sense; we straddle the line between our primitive evolutionary emotions and our growing abilities of intellect. We push forward on the intellectual front, only to be pulled back by our underlying nature.

We are inherently inconsistent, and will always be that way. We're all individually ping-ponging between these different internal forces, so it's not a surprise to see them affecting us in groups as well.


BACK UP TO ANALYSIS

A group of people sharing some common links leads to a lot of interesting and messy problems. The key to really understanding their difficulty is to realize that it is a people problem, and that people are emotional, irrational and slippery. It's easy to miss that point in analysis, leading to failure.

Even if you account for our nature, it is more difficult than most people realize to take a full step backwards and see something 'entirely' objectively, but that distance is absolutely required to understand how to automate something with a computer.

The precise pure abstract nature of a computer means that we have to transcend our own sloppy nature and find consistent deterministic discrete ways to map that back into an abstract world. We've never had to apply that type of rigor to our thinking so far in history. It is a new type of problem. In the past, if it was messy, we could just become accustomed to it. Now, however, we clearly see that the messiness snowballs into serious complexity problems. We have to understand why it is messy, so that we can unwind it, or at least model it correctly.

The computer as a common tool has forced us to see ourselves in a whole new way. We can't just ignore our flaws anymore and hope they go away. They don't and only get magnified when automated.

Thus, the only way to really make computer systems work well, is to account for the irrational mess underneath, and work around it. Mostly it is not the type of thing that will change for at least a generation, probably many, so it is not really incorrect to enshrine it into the data. It is the real-world and that is what we want to capture. Analysis is primarily based on understanding this. The things we do not understand are the ones that come back to haunt us later.

Saturday, August 2, 2008

Pattern for Pattern

I was buried deep in the coding trenches, again. Once I get going, the lines fly by, each fitting comfortably into place, finding its own groove.

As I sat back and looked over what I had written, I achieved one of those "Aha" moments. That instant when you've come to a deep understanding that has eluded you for a long time, possibly years. The type of thing that is so obvious, that once you know it you're left to wonder why it took this long for you to really see it.

But before dumping out my revelation, it is best to understand this type of knowledge in its own context. A little bit of history helps.


SOME EXPOSURE

My first exposure to the Object Oriented (OO) paradigm was back in the late eighties. I saw C++ used successfully for developing complex, yet highly visual programs. The one-to-one correspondence between the things on the screens and the objects in the source code meant that any desired changes to the presentation of the program were easily mapped back to the correct responsible source code. If you saw something you didn't like, it was obvious where it needed to be changed in the code. The extra work to decompose the code as objects in this case was hugely paying off.

While I appreciated the strengths of the paradigm, it was a long time before my employers came to allow the related technologies into production systems. Software development is often slow to change.

The early nineties left me programming with a strict Abstract Data Type (ADT) philosophy, similar to objects but a little different. The code is decomposed into lists, arrays, trees, etc. These form the building blocks for the system. It's a very similar decomposition to Object Oriented, but it is not enforced by the implementation language. It is close enough that you could use Object Oriented design techniques.

In those days Object Oriented programming was gaining in popularity, but a movement towards real-world objects had taken over. The result was the application of 'brute force' towards the development of objects. The results were ugly.

Instead of thinking about it in abstract terms, the dumbed-down solution was to just start pounding as much behavior into every object as possible. Presumably, this was to get going fast and avoid deep thought. As with all "ain't it easy if you just follow these 3 simple rules" approaches it works for simple code, but bloat and poor performance were soon to follow. Then came un-maintainability, the death knell for any fragile system.

At this point two good things happened: Java arrived, and the Design Patterns book by the GoF was published. The language was clean and simple, but the patterns were the big leap. To me it was a renewal of abstract programming techniques in the Object Oriented world. I spent considerable time on a pet project implementing many of the design patterns in Java, just to see how well they worked.

Bending the focus back towards the abstract helped in trying to elevate the quality of the code. Programming isn't hard, but it isn't thoughtless either; you still need to spend some internal wetware cycles to find a clean, readable way to accomplish your goals. You have to be doing it for a very long time before it becomes so instinctual that you no longer need to think about it. Thus schemes to make programming simple and easy never work.


OVER THE YEARS

I ping-ponged between ADTs and OO code over the next few years. In one notable case -- possibly because of the language but also because it was easier to debug -- I shied completely away from OO code, and fell back into ADTs. I didn't have the luxury of being able to spend a lot of time fixing problems, so the code had to be ultra simple. ADTs don't have the 'philosophical' problems that objects have; there are no conflicting 'right' ways to use them. Put them together, and off you go.

It was in and around that point that I kept feeling that the decomposition of the world into this object-type model was exceedingly awkward at times. Some things fit easily into objects, but many did not. If there isn't enough value to justify the decomposition, all of the extra work is wasted, and quite possibly negative.

I was following the developments and discussions, but I kept failing to climb on the "objects rule" train. But worse, I was beginning to suspect that there was something horribly wrong with patterns as well.

At one point, in a job interview, someone asked me what patterns I would use to create an email server. That question really bothered me. I fumbled through my answer, not convincingly, but in the back of my head I knew that the question itself was completely wrong. Had I been keen on the job I might have questioned it there and then, but it was just one of many warning signals to avoid this particular position. Some fights aren't worth having.

A while back, I returned to the Object Oriented world, specifically using Java. For all of the advances in libraries, frameworks and things, I was very disappointed with the state of the code I was seeing. It should have been more elegant and less fragile. Most of it wasn't. It seems that the standards and conventions skewed back towards being brute force again, probably because programmers get scared, so they belt out the code.

What was different this time was how Design Patterns, which I thought were a great movement, had somehow become subservient to more brute-force techniques. How could something so abstract be wound down into such crude implementations? I knew there was a reason for this, but I couldn't put my finger on it, until now.


A BIT OF A REVELATION

At this point in my career it's fairly safe to say that I am proficient with Objects and Design Patterns, but I'm not what you might call a "true believer". I know all the techniques and rules, but for readability I'll often stray from them. I know all of the styles and conventions, but I only pick from some of them. I'm comfortable building up a huge Object Oriented system, but if I was extremely tight on resources, again, I'd prefer ADTs. I use the technologies because they are there and popular, but I am far from convinced that they are truly paying for all of their insecurities. For some code it works well, for some it does not.

Back to the problem at hand.

So here I was trying to leverage polymorphism to save me having to splat out a lot of permutations of simple code. I also wanted to use inheritance to cut down on the duplicated sequences of code. The thing I was trying to write broke down into three base objects, one of which broke out into dozens of different but similar ones. The parent, I used as an abstract factory for creating the kids. Each of the child objects was to act as a proxy controlling a specific set of underlying system widgets. The proxy aspect was critical because it meant that I could explicitly limit resource use in the case where there were huge structures, and there were going to be huge structures. From the outside the kids shouldn't be visible, just the parent. I want to encapsulate their differences away from the rest of the system. And most of this should be in the same file, so it's easier to see and visually inspect the code for inconsistencies.

So, it was a hierarchy/composition type model, where one of the nodes acted as a singleton factory facade over proxies that were partially flyweights. And there was an observer to collate and redirect events. Really, there were a whole lot of design patterns that I wanted to get working in this tiny code space.
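To give a feel for the shape of it, here is a heavily stripped-down sketch, with hypothetical names and nowhere near the real code; the interesting part is that a couple of small classes end up carrying several patterns at once:

import java.util.*;

// Hypothetical sketch: the parent is at once a singleton, a facade and a
// factory over its children; the children are proxies and partial flyweights
// over the underlying system widgets. No pattern names appear anywhere.
class Screen {
    private static final Screen INSTANCE = new Screen();         // singleton-ish
    static Screen instance() { return INSTANCE; }

    private final Map<String, Panel> panels = new HashMap<>();   // flyweight-ish cache

    // factory-ish: the rest of the system only ever asks the Screen for its panels
    Panel panel(String name) {
        return panels.computeIfAbsent(name, Panel::new);
    }

    // facade/observer-ish: collate events and redirect them to the right child
    void onEvent(String panelName, String event) {
        panel(panelName).handle(event);
    }
}

class Panel {
    private final String name;
    private Object widget;               // proxy-ish: the expensive widget is created lazily

    Panel(String name) { this.name = name; }

    void handle(String event) {
        if (widget == null)
            widget = new Object();       // stand-in for a real, resource-heavy system widget
        System.out.println(name + " handling " + event);
    }
}

The point isn't that this is good code; it's that singleton, facade, factory, flyweight, proxy and a sliver of observer all land on the same two tiny classes, and none of them need to show up in the names.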

Now at this point, one has to understand that once I became suspicious of design patterns I stopped trying to push them hard. I felt it wasn't right to name something xxxxFactory, for instance, but I had no clue as to why. It just felt vaguely Hungarian. I kept using them, but I wanted to deemphasize their presence.

In this code I kept to the rule of naming all objects for the real-world things they represent, while avoiding any mention of their underlying pattern construction. Also, even though I know the patterns, I didn't limit myself to their strict implementations. I simply wanted them to be vaguely like a singleton or factory, not exactly like that. They are patterns after all, just starting places for code construction, not Lego-like building blocks.

And that is where and when it hit me. Hit me hard.

Of course, patterns are just patterns, not building blocks. Not ADTs. Not things that you use to compose a system. They are just suggestions. But that is not really the big revelation; I had always believed that. The real understanding came from looking back at my code.

I had just badly implemented seven or eight patterns, but I had only done so with a few basic classes.

Ohh. Dang. Ahhhh. Yes. It hit like a meteorite.

A pattern, being just a pattern means that for any sequence of code multiple patterns can and will "overlap".

They'll all be placed in the same small set of code. You can have a Singleton/Proxy/Factory or a Composite/FlyWeight/Iterator, or whatever crazy combination of patterns will make the underlying objects most appropriately behave in the simplest manner that makes them usable.

The way we model the data inside of the computer is not composed of a bunch of patterns as building blocks, instead it is composed of a bunch of objects that span several different patterns all at once. It was how this had been flipped, that was bothering me so much.

Then because there are many possible patterns existing in the same space, the obvious corollary is that you clearly don't want the pattern names to influence the objects.

My hunch about not calling the objects xxxxFactory was absolutely correct. That's just ugly and will cause trouble over time, for exactly the same argument that proved it was a bad idea to stick the 'data type' into a variable name. Hungarian notation was clearly wrong. Code changes, but programmers are lazy. Thus things quickly get misleading, which wastes time, lots of it.


AND FINALLY

So, that is it underneath. Patterns, I've always known, shouldn't be taken as literal building blocks for code construction. Doing so results in massively complex code, because implementing overlapping patterns as many disconnected artificial objects is a massive increase in complexity for no real reason. That type of "right" way to do it isn't even close.

The worst thing we can do as developers is to deliberately introduce unnecessary complexity into the solution. Things are complicated enough without having to make it more so.

Following some rigid, fixed breakdown for coding comes at the expense of making it readable. Oddly, the industry has always had a bipolar nature where many of the younger programmers pine about how programming is an art form, while looking for quick and easy ways to mindlessly build code. It's a funny sort of hypocrisy that pulls programmers towards the extremes.

Patterns are a place to start, and they should overlap each other. They are very different from data structures, and should not be confused with a set of axiomatic building blocks. Patterns shouldn't appear in class or method names, but then again neither should types or data structures. The "thing" that the object represents in the real world, abstract or not, should be used as the simple name for the object. That keeps it obvious and cuts down on the length of the names.
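A trivial, purely hypothetical illustration of the naming point:

class Portfolio { }                        // named for the real-world thing it represents
class PortfolioSingletonProxyFactory { }   // named for its construction, which will change and mislead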

That fixes some of my complaints about OO, but I still have trouble applying the paradigm to very long sequences of actions. Noun-based code is easy, but protocol-based stuff is just damned ugly. Still, I am looking for ways to just plunk an algorithm into one single big containing object, ignoring the proper way to use the paradigm, so that it is easy to debug. It might not be "right", but at 4am I'm less interested in being right, and more interested in getting back to sleep. One has to keep their priorities straight.