Saturday, December 15, 2018

Scaling Development

Given a software project that would require three people a full year to build, is it possible to get it completed in six months?

The first key point is whether or not it is cleanly partitionable. Can it be broken up into two separate, independent projects? For some types of software development, this is fairly easy. A large website, for instance, can often be split up conveniently by screens. One team works on one set of screens, while another team works on a different set. A complication arises if both sets of screens rely on the same underlying database schema: the schema definition then moves to the beginning of the project and blocks both teams until it is completed. It also means that any changes to the schema now have double the impact; both teams are affected, and both need to recode their parts of the system.

If the partition crosses a natural boundary, such as one team working on the user side of the system while the other works on the administrative side, there are few collisions, so minimal coordination is needed. Still, some work is required to keep the two teams aligned at the management level, so there is at least a little extra overhead.

In most systems, there are significant opportunities to reuse identical code. GUI screens actually have a huge amount of redundancy, although most programmers do not try to capitalize on this. Splitting the work means there will definitely be some redundancy, not just in the code but also in the design and testing stages. If these redundant parts are not fully independent, there will also be bugs created by the two teams being out of sync with each other. That problem can be mitigated early, but it requires more upfront design, particularly at the boundaries where the two halves intersect. Many systems can be built effectively with just a top-down, high-level specification, but the intersections require additional mid-level specifications to be properly coordinated. Not doing that work results in more testing and integration problems, generally growing exponentially, so a lot of time is saved by coordinating up front.

The differences between the teams would inevitably lead to differences in styles and conventions, so along with the intersections, this would result in significant technical debt. That might not be noticed in the earlier versions, but it would definitely impede any efforts to extend the code base, and so would ultimately shorten the effective life of the code.

Now that would be the situation for fairly partitionable, routine software development. It is basically doable, but it is highly likely that the resulting quality would be lower than if the project weren't scaled (although there is no intrinsic guarantee of this, since it depends on the skill and knowledge of the original team). As with any concurrency issue, doubling the number of people does not directly halve the time; the intersection overhead kicks in to prevent it from being linear.

For a project that is highly interdependent and/or deeply on the cutting edge, the situation would be far worse. Since crafting mid- and low-level specifications is far harder, either the time grows exponentially or there are exponentially more integration bugs. As well, either the project adopts an onion-like architecture (which impedes growth) or the impact of changes becomes more disruptive (often referred to as 'merge hell'). If a routine development project has a multiple of around 2.2x, the management overhead for a complex project would start to hit ranges like 3x-5x, or possibly worse.
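
As a rough, back-of-the-envelope sketch (my own toy model, with made-up overhead numbers, not anything measured), you can picture the effect by treating coordination as extra effort layered on top of the base work. Using the opening example of three people for a year, roughly 36 person-months:

    # Toy model: calendar time when the base work is spread across more people,
    # with some fraction of the total effort lost to coordination/integration.
    def calendar_months(person_months, people, overhead):
        # 'overhead' is the fraction of extra effort created by intersections,
        # merge conflicts, duplicated design, etc. (purely illustrative values)
        return person_months * (1 + overhead) / people

    print(calendar_months(36, 3, 0.0))   # 12.0 months: the original three-person team
    print(calendar_months(36, 6, 0.2))   # 7.2 months: routine, cleanly partitioned work
    print(calendar_months(36, 6, 0.8))   # 10.8 months: dense, interdependent work

Doubling the team only halves the time when the overhead term stays near zero, which is exactly what dense, interdependent projects fail to do.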

Extending this out, it's clear that if you need 200 lines of code, having 201 programmers means that at least one programmer has nothing to do. If the project is composed of ten 20-line pieces, then 11 programmers is too many. As such, for any development, you might be able to halve the time (scale the development), but there is some fixed physical limitation that means you cannot recursively apply this halving forever. That is, as always, scaling has fixed limitations. There could never be such a thing as 'infinite scaling'.
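
Put in toy-model terms again (the piece count is just the one from the example above, nothing more):

    # Even with zero coordination overhead, the speedup is capped by how many
    # independent pieces the work actually splits into.
    def best_case_speedup(people, pieces):
        # At most one person can usefully work on each independent piece.
        return min(people, pieces)

    for people in (5, 10, 11, 20):
        print(people, "people ->", best_case_speedup(people, pieces=10), "x at best")

Past ten people, every extra body is pure overhead.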

This is important to understand, particularly if we look at a project from the perspective of 'density'. A routine website development, for example, can be considered low density: the code is fairly obvious and straightforward. As the underlying complexity and interdependencies grow, we consider the work to be denser. If we can partition a routine project cleanly, it is because it has low density, but the two resulting sub-projects are now denser than the original. The fundamental reason we cannot partition forever is that the density increases with each split until it passes a threshold where partitioning is no longer effective.

Getting back to code that is already dense: by its very nature, it is far less partitionable.

What this says, overall, is that yes, there are some projects that you can do twice as quickly. There is a real cost to doing this, but if you can accept lower quality, it will work. But there are definitely projects where this is absolutely impossible, where adding more bodies would just stall or significantly slow down the project.

Realistically, as an industry, we have known this for a very long time, but there have been some major changes to the way we develop code that have obscured this knowledge.

One huge change is that much of software development has become just writing light 'glue' code to sit between frameworks and libraries. While this degrades the quality of the final system (since the different components don't fit together nicely), the glue code itself is far more partitionable. In this type of development, scaling is easier, and any reductions in quality are intrinsic to the underlying components themselves rather than a consequence of the scaling effort. This shift has really only reinforced the myth that software development is always linearly scalable (with a minor overhead). Unfortunately, most, if not all, of the software development projects with this attribute are straight-up routine development, and we've been redundantly applying a massive amount of effort to repeat this work over and over again, with only slight variations in design. Given its nature, it is fairly obvious that we could use software itself to automate significant aspects of this type of work, but as an industry, we have chosen to ignore this and continue forward with the manual approach of just gluing together misfitting components. Software that doesn't fit into this category is obviously denser and, as shown earlier, has real constraints on its scalability, and on automation as well.

So the answer is 'it depends', but the corollary is that if scaling does work, then it is highly likely that there was a more automated approach that could have made it unnecessary anyway. If we wanted to improve our industry, it would make a lot of sense to really clarify this notion of density, and also to build more tools to eliminate the huge amount of manual effort required in continuously rebuilding the same routine systems over and over again.

Saturday, February 10, 2018

The Value of Software

A huge whack of code, on its own, is pretty useless. The code needs data to process.

The user's real-world issues only get resolved when the persistent data enables better decisions. Data is the heart and soul of any software system.

What's really changed over the decades is not that the code we write got any better, or that the frameworks and libraries became easier to use, but rather that data became hugely abundant. The World Wide Web is a great example. From a coding perspective, it is a dog's breakfast of good and bad ideas munged together in rapidly evolving chaos that often barely works and has had exponential explosions of obscene complexity. Technically, taken all together, it is pretty crappy. But the Web itself, at least until recently, was a monumental achievement. With a bit of patience and perseverance, you could find a tremendous amount of knowledge. It opened up the world, allowing people to learn about things that were previously off limits. It was the data, the sheer scale of it, that made the Web wonderful.

That explosion, of course, is diminishing now, as the quality of available data is watered down by noise. The tools we built were predicated on not having enough data; they cannot deal with having too much low-quality stuff.

Programmers, however, still hyperfocus on code. It's as if an 'algorithm' really has the power to save the day. They think all we need is just something better that can magically separate the good stuff from the bad. Ironically, our industry has known for at least half a century that shoving garbage into code produces garbage as output. And, at least algorithmically, nothing short of intelligence can reliably distinguish data quality. If we want better data, then we have to train an army of people to sift through the garbage to find it. The best we can do is craft tools that make this work less painful.

The promise of computers was that they could remove some of the drudgery from our lives, that they could help keep us better organized and better informed. The reality is that they now waste our time while constantly distracting us from the stuff that really matters. The flaky code we rely on is a big issue, but the real problems come from streams and streams of bad data. If we let everyone type in whatever they want, then not only does the conversation degrade, but it also becomes increasingly impossible to pick out value. A massive network of digital noise isn't going to drive positive changes. Code is a multiplier; the real underlying value of software comes from the quality of the data that it captures. We won't get better software systems until we learn how to get better data.