Sunday, August 10, 2008

Social Disorder

I haven't blundered through a discussion about software-oriented domain analysis for a while. It is an important topic, but a hard one to discuss.

Good analysis drives a successful software project, but its underlying nature makes it difficult to quantify and describe.

Analysis sits at the intersection of the real world and the abstract. Its complications lie in trying to force a discrete model over top of incomplete information and quasi-irrational behavior. We observe, but then we skew that to meet our own expectations. We blind ourselves into making nice little simple rules. We often don't see what is really there.

Even if we see it, quantifying it can also fail. If you don't compact things down into simple discrete chunks, the analysis has no value. If you pack too many different pieces onto each other there is also trouble. We like to look for one unifying rule that contains everything, but that desire is often the problem. The real world is rarely that simple.

Perspective can influence the analysis. It can change radically based on which things you observe. We only ever see a sliver of the problem at any one time, which can produce misleading results.

An analyst need not delve into every nook and corner; only the domain knowledge relevant to the software needs to be understood. Compiling it is time-consuming. Wasted effort if it can't be used.

Analysis just for the sake of it is bound to lead in directions that are not practically useful for programmers. That is why earlier attempts to split off business analysis from programming were unsuccessful. The understanding required from the process need only be enough to implement the software properly. If you don't understand what is required at the end of the analysis, you don't know how deep to go. All we really need is just enough to get the code written.

Analysis is very different from programming, but it is still a necessary skill for software developers. A big project may not need a huge number of analysts, since complex development can often pivot more around technical issues, but their work is vital in assuring that the code being developed is really a usable tool. Good analysis contains the problem.

Over the years, every successful project that I have been on has been based on solid underlying analysis. Most of the failures came from dealing with it lightly, or skipping it. A project that does not know what it is building is doomed from the start.

For success, I have learned to leave my emotions, assumptions and expectations at the door, preventing them from tainting my findings. But it is always difficult to transcend oneself and achieve a level of objectivity. In analysis, I am my own worst enemy.


A SIMPLE EXAMPLE

The easiest way to explore analysis is to work through a concrete example; a simple and publicly available one. For this I think that the Web 2.0 social bookmarking sites are interesting and deep enough. They keep popping up in blog articles. This includes sites such as Slashdot, Reddit, Digg, Del.icio.us, StumbleUpon and Hacker News. These sites revolve around the sharing of information links amongst a large community.

Most of these sites are fairly simple in concept: users submit interesting links that they find around the web. The good stuff rises to the top, so everyone can spend less time surfing. Technically this is all about sharing some common resources (links) in an organized manner.


SOME HIGH LEVEL ANALYSIS

On the surface this is a simple problem: web surfers want to access the latest and greatest bits in a massively expanding web of information, but they don't want to spend forever looking for it. Collectively, a large group of people can do the work necessary to filter out the bad stuff, making the rest publicly available.

But, if we dig deeper we find that there are lots of people-related problems bubbling just below the surface. For instance, a lot of what Clay Shirky says about social group interactions holds true:

http://www.shirky.com/writings/group_enemy.html

These sites quickly degenerate, and anti-social behaviors start to pile up. People are constantly trying to manipulate or "game" the system. Complaints start up about control and quality, and then the users move on to another site. It is a recurring theme. Giles Bowkett is just a common example of another dissatisfied user:

http://gilesbowkett.blogspot.com/2008/05/summon-monsters-open-door-heal-or-die.html

There are tonnes of these types of rants available for any and all of the main sites. The big question is why? Why does this keep happening again and again?

To know that, we need to understand what is underneath. We need to quantify it in some way.

The first part of any analysis is to look at the people involved. People always form the base layer in the real world that interacts with the system. Most systems are just ways to pile up data related to people. Intrinsically, any or all complication draws its source or inspiration from the actions or history of people. The messiness always starts there.

In these systems the users can be broken down into consumers or producers. The consumers are passively using the system, taking advantage of the work put into the filtering. The producers are actively going out to the web and finding new and interesting links to add in. The roles mix to some degree; most users are a little of one or the other. But being a producer holds a higher precedence; it is a more interesting role than being a consumer.


CONSUMERS

To understand the consumers, you have to start by asking yourself: "why do people come to a site full of links?" What is it they are looking for?

The often forgotten central point of analysis is that the software tool is only there to automate some specific effort; to help in managing a pile of data. Knowing what's in the pile, and why it is important, is the key. If you don't understand what motivates a consumer to spend time using the system, then it's hard to augment that experience. Playing with the pile of data has to be worthwhile, or people won't bother.

The consumers, in this case, are looking to the computer to find current, relevant or significant information that relates to them, their lives and their interests. If you're interested in computer equipment, you'll spend a lot of time browsing and reading about hardware. If you are not interested in cameras, you're not going to want to spend a lot of time reading about the latest offerings from Canon and Nikon. Each user is different.

Interest is a driving factor, but not the only one. An easily misunderstood point is that all of the information on the web has a timeliness factor. Some of the information only has value if it is delivered right away. Some of it has value no matter when it is delivered. Time is significant.

That complicates the problem because consumers are really looking for two very different types of things. They want to be kept up-to-date, but they also want to see the 'must-read' information for their particular topics of interest. There are two different types of data intermixing in these sites.

Classically, both types of information are combined, and then falsely assumed to be timely. That places a careless quality over the top, turning much of the underlying knowledge into a shallow form of info-tainment. A light-hearted way of just skimming over the surface of the web, without generating any lasting effects. Easily read, easily forgotten.

People gravitate to these systems, but the ties are loose. They easily pick up and go to the next one, once the fashion has changed. Loosely bound users make for a volatile audience.

The changing landscape of both the content and the users accounts for a lot of the complaints about these types of systems. Consumers get accustomed to some specific steadiness, and then are annoyed by it changing.

For myself, I want some idea of the latest news, but I also want to see the must-read papers and articles for the software industry. Sometimes I am interested in the weird and wonderful, sometimes just the core stuff. If I have some time to kill, it is sometimes nice to get something light and funny. But, I also have a more serious side.

I want to keep up with the big news stories and I'd like the computer to help me keep track of all of the important 'critical' industry-specific papers that I have read, and help me find more that will enhance my professional development. Sometimes info-tainment is fine, but I want a tool to keep me up-to-date and to help me grow as a professional.


PRODUCERS

Producers are intrinsically more complicated than consumers. You have the clearly motivated ones, such as bloggers, writers or advertisers that are trying to promote their work. These are the root producers of content on the web, some are professional, many are not. Some have noble goals of trying to share their understanding, many are just after fame or money. All of them are driven by some need to get their work out to as many people as possible.

For the social bookmarking sites, you have another lightweight type of producer that scans the web looking for interesting material and then posts it. These are the heart and soul of any social networking site. The motivations of these people are far more difficult to pin down; many are just out there for the sheer kick of trying to influence the market. Some are bored, some just want to contribute.

Intentionally or not, all of the producers are always trying to game the system. On either side, the more they get control of the underlying resources, the happier they are. It's not that their goals are necessarily destructive, it's just that the need to get the widest possible audience is fundamental to being a producer. Why produce if nobody is going to listen? More readers means more incentive.

The heavy content producers tend to frequent many different sites, and prefer to minimize their time at each site. They often make illicit trades with other users to get clicks -- there are a huge number of sites dedicated to these markets -- but they manipulate the game primarily to get their own work into the top-ranking positions. They're not really interested in the site, just its ability to distribute their efforts.

Lightweight producers stick to specific sites, where they try and dominate the flow. The lightweight ones also form together in cliques, realizing control with their mass, but most of these groups form internally in the systems. Usually these groups are more cohesive and long lasting than any heavyweight arrangements. They often have very strong identities, some set of underlying moral conduct, and some purpose.

It's hard to tell, but the lightweight producers seem to keep it up for a while and then get bored, leaving the field open for other cliques. A large site will have a huge number of different cliques fighting it out for the top positions. It can be competitive, and likely pretty nasty at times. Many sites encourage this with rankings and secretive back-door ways for their users to communicate. A good clique can put in a tremendous effort to get quality links into the system.

The lightweights most directly influence the nature of their sites. Large-scale shifts in content generally mean significant changes in the underlying cliques. Often what is thought of as diminishing quality in a growing site is nothing more than an increasingly competitive market, with a few earlier, more conscientious players leaving the field.

Ultimately, because there is little direct or long-term incentive, the controlling groups get larger and less selective. The system by its nature will degenerate at times, cycling between periods of better and lesser effort. Bigger and smaller groups. More or fewer links. People always need a reason to do things; the stronger the reason, the more time and effort they'll put into it. There are exceptions, of course, but the incentives for a lightweight producer are so low that the market will always be highly volatile.

In some social sites there are also groups of people actively trying to disrupt the site, a set of anti-producers that get some entertainment out of corrupting the works. These seem to be short-term affairs, but they cause process problems for the rest of the site. They effectively gum up the works, by forcing the site to implement too many rules to stop them. In the same category come the entirely self-serving spammers, people whose only interest is in generating some noise that arbitrarily hooks people; they don't care at all about content. These are the intentionally negative producers.

Mostly, the intent for most people is on the positive side. There is some content that they would like to share for some reason. There are many different ways to approach that type of desire.

My goals for my blogs, for instance, are to get them read by the largest possible 'relevant' audience, a goal that is often harder than it sounds. With all of the different available sites for 'trading' clicks, you'd think it would be easy to just join them, put in some effort, and get noticed. The problem with gaming many of these systems, however, is that it is self-defeating. I get readers, but not the ones I want. Since most of my writing is targeted towards a niche software developer audience, attracting lots of people interested in video games, for example, doesn't help me get my message out, or extend my readership. It just wastes my time and misleads me into thinking my writing is more popular than it really is.

Few sites support the content producers, or give them tools to more effectively target their intended audience. From a producer's perspective, many of these systems are barely accessible or just plain annoying. Over the last couple of years I have definitely built up a list of sites to avoid.

Some sites assume that producers are spam by default, but that type of punishing is counter-productive. Ultimately the quality of the site depends on its content, so attracting and keeping the interest of good content producers should be a core requirement. Allowing them to help target their material should also be deemed important.


THE NEXT STEP

In analysis, now that we've examined the pieces fairly carefully, it is time to step backwards and look at the bigger picture. The way things interact with each other is important. Until you understand how the puzzle pieces fit, it's hard to put the elements in context, so it's hard to construct something workable from that understanding. Understanding the little bits isn't that difficult, it's just observation, but people often fail to draw meaningful understandings from what they see.

The social sites, at their core, are very simple. They are just resources collected by producers and shared with consumers. It is amazing how something so simple in its essence has spawned so many different related sites, policies, groups, etc., and that so many people are dedicated to gaming these systems in some way.

The first and most important question is whether or not the current sites are actually meeting their users' needs. That answer, I think, is no. That is why there are so many new sites and so much constant migration between sites. The site-du-jour is simply the next place to go for the few months before the quality falls again.

One guesses that it is either failed analysis (trying to map it all onto one type of resource), or just technical limitations that have been driving the current generation of systems. Either way, they are not meeting the needs of their users.

We know what users are looking for: both timely news and the best-of information on a specific topic. The computer as a tool is in its element when we use it to remember things for us. The information pile that we personally want to build up is the "we've seen it" type, while the information pile we want the whole system to build up is the "ranked X in this category" type.

Although it is a lot of information, what we really want is to be away from the computer for a while, but know that we have not missed anything important in that period. The tool should 'monitor' the flow for us, and 'synchronize' us with the important links.

The best way to handle this is to categorize the resources into different types, possibly overlapping. News and articles might be considered reasonably suitable names for these types.

The news-based timely resources should be presented to the user within their life-span, in order of their popularity. A "what's hot now" list. For the historic article-based ones, the system should remember where the user is in the reading list. Users are slowly making their way through all of the articles, so it is important not to miss any. That may be a lot of information, but it's not impossible to compute at all, and optimizations can be found that approximate this behavior with fewer resources.
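To make the two piles a little more concrete, here is a rough sketch of how they might be modelled. It is only an illustration under my own assumptions: the names (Link, Reader, hot_now, reading_list), the use of raw vote counts as "popularity", and the two-day life-span for news are all made up for the example, not taken from any of the sites mentioned.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Link:
    url: str
    votes: int          # stand-in for "popularity" (an assumption)
    posted: datetime
    kinds: set          # {"news"}, {"article"}, or both; types may overlap


# Hypothetical life-span for a news item; the right value is a guess.
NEWS_LIFESPAN = timedelta(days=2)


def hot_now(links, now):
    """News links still inside their life-span, most popular first."""
    live = [link for link in links
            if "news" in link.kinds and now - link.posted <= NEWS_LIFESPAN]
    return sorted(live, key=lambda link: link.votes, reverse=True)


@dataclass
class Reader:
    # The personal "we've seen it" pile, kept per user.
    seen: set = field(default_factory=set)

    def mark_seen(self, link):
        self.seen.add(link.url)

    def reading_list(self, links):
        """Unread articles, best-ranked first. Recomputing from the
        seen set means nothing is skipped when the ranking changes."""
        unread = [link for link in links
                  if "article" in link.kinds and link.url not in self.seen]
        return sorted(unread, key=lambda link: link.votes, reverse=True)
```

Tracking a set of seen links, rather than a position in a list, is a deliberate choice here: a position would break as soon as the ranking shifted, while the set still tells the system exactly which articles remain unread.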

Slashdot, Digg, Reddit and Hacker News have the news concept down. Del.icio.us and StumbleUpon do better at the article one, since they don't impose specific lists onto their users. None of the sites really cope with ensuring that you don't miss out on a 'changing' critical reading list. I want the computer to track my progress.

Another point that is important with these sites is that the cliques in charge of editing the flow of data are volatile, so the quality of data is volatile as a result. Most of these systems are unstable by definition. The nice part about an old-fashioned newspaper or magazine is that the staff is fairly consistent over the years and is motivated to make the work likable by a well-known set of subscribers. Quality is why they get paid.

Motivated editing should not be undervalued. Tossing this responsibility to the masses means unpredictability, which is exactly how these sites work. A small group of paid employees is far more likely to produce a consistent package than a large group of volunteer ones. This will always be the case. The sites' founders are hoping to exploit cheap labor, but like all things the trade-off is an inherent instability.

Thinking long and hard about it, the way the sites have been growing popular and then fading is exactly the way it should work. The short-term motivation to dominate a site dissipates as the site becomes too large, thus degenerating into something more chaotic. The later stage forces lots of people away, which then makes it controllable again. A new group finds the system and starts gaming it. A pendulum swinging back and forth.

Human-based problems are often driven by irrational behavior that fluctuates over time. It makes sense; we straddle the line between our primitive evolutionary emotions and our growing abilities of intellect. We push forward on the intellectual front, only to be pulled back by our underlying nature.

We are inherently inconsistent, and will always be that way. We're all individually ping-ponging between these different internal forces, so it's not a surprise to see them affecting us in groups as well.


BACK UP TO ANALYSIS

A group of people sharing some common links leads to a lot of interesting and messy problems. The key to really understanding their difficulty is to realize that it is a people problem, and that people are emotional, irrational and slippery. It's easy to miss that point in analysis, leading to failure.

Even if you account for our nature, it is more difficult than most people realize to take a full step backwards and see something 'entirely' objectively, but that distance is absolutely required to understand how to automate something with a computer.

The precise pure abstract nature of a computer means that we have to transcend our own sloppy nature and find consistent deterministic discrete ways to map that back into an abstract world. We've never had to apply that type of rigor to our thinking so far in history. It is a new type of problem. In the past, if it was messy, we could just become accustomed to it. Now, however, we clearly see that the messiness snowballs into serious complexity problems. We have to understand why it is messy, so that we can unwind it, or at least model it correctly.

The computer as a common tool has forced us to see ourselves in a whole new way. We can't just ignore our flaws anymore and hope they go away. They don't; they only get magnified when automated.

Thus, the only way to really make computer systems work well is to account for the irrational mess underneath, and work around it. Mostly it is not the type of thing that will change for at least a generation, probably many, so it is not really incorrect to enshrine it into the data. It is the real world, and that is what we want to capture. Analysis is primarily based on understanding this. The things we do not understand are the ones that come back to haunt us later.