Sunday, July 11, 2021

Time Consuming

For most development projects, the bulk of the construction time ends up in just two places:


  • Arranging and wiring up widgets for the screens

  • Pulling a reliable window of persistent data into the runtime


The third most expensive place is usually the ETL code needed to import data. Constantly changing “static” reports often take the fourth spot.


Systems with deep data (billions of rows, all similar in type) face easier ETL coding than systems with broad data (hundreds or even thousands of distinct entities).


Reporting can often be off-loaded onto some other ‘reporting system’, generally handled outside of the current development process. That works way better when the persistent schema isn’t a mess.


So, if we are looking at a greenfield project, whose ‘brute force’ equivalent is hundreds or thousands of screens sitting on a broad schema, then the ‘final’ work might weigh in as high as 1M lines of code. If you had reasonably good coders averaging 20K lines per year, that’s roughly 50 person-years to get to maturity, which is a substantial amount of time and effort. And that doesn’t include all of the dead ends that people will go down as they meander around building the final stuff.
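
Just to make the assumptions in that estimate explicit, here is the back-of-the-envelope arithmetic as a tiny Python sketch; the line count and the productivity figure are the rough guesses from the paragraph above, not measured values:

# Back-of-the-envelope effort estimate using the rough figures above.
total_lines = 1_000_000         # assumed size of the 'brute force' system
lines_per_person_year = 20_000  # assumed output of a reasonably good coder

person_years = total_lines / lines_per_person_year
print(f"{person_years:.0f} person-years")   # prints: 50 person-years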


When you view it that way, it’s pretty obvious that it would be a whole lot smarter to not just default to brute force. Instead of allowing people to spend weeks or months on each screen, you work really hard to bring that down to days or even hours of effort. The same is true on the backend. It’s a big schema, which means tonnes of redundant boilerplate code just to make getting at specific subsets of the data convenient.


What if neither of these massive work efforts is actually necessary? You could build something that lays out each screen based on a readable text file. You could have a format for describing data in minimalistic terms that generates both the code and the necessary database configurations. If you could bring that 1M-line behemoth down to 150K, you could shave the effort down to just 7.5 person-years, a better than 6x reduction. If you offload the ETL and reporting requirements to other tools, you could possibly reach maturity in 1/6th of the time that it would take anyone else to get there.
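
To make the data side of that idea concrete, here is a minimal sketch of what such a minimalistic description might look like. Everything in it is hypothetical: the entity, its fields, the type mappings, and the two generator functions are invented for illustration, not taken from any particular product. A single description drives both the database configuration and the runtime code, and a readable screen layout file could be handled the same way.

# A minimal sketch of 'describe it once, generate the rest'.
# The entity description and type mappings below are hypothetical examples.

CUSTOMER = {
    "name": "customer",
    "fields": [
        ("id",        "int",    "primary key"),
        ("full_name", "string", None),
        ("joined_on", "date",   None),
    ],
}

SQL_TYPES = {"int": "INTEGER", "string": "TEXT", "date": "DATE"}
PY_TYPES = {"int": "int", "string": "str", "date": "datetime.date"}

def to_create_table(entity):
    """Generate the database configuration (a CREATE TABLE) from the description."""
    cols = []
    for name, kind, extra in entity["fields"]:
        col = f"    {name} {SQL_TYPES[kind]}"
        if extra:
            col += f" {extra.upper()}"
        cols.append(col)
    return f"CREATE TABLE {entity['name']} (\n" + ",\n".join(cols) + "\n);"

def to_dataclass(entity):
    """Generate the matching runtime code (a Python dataclass) from the same description."""
    lines = ["import datetime", "from dataclasses import dataclass", "",
             "@dataclass", f"class {entity['name'].title()}:"]
    lines += [f"    {name}: {PY_TYPES[kind]}" for name, kind, _ in entity["fields"]]
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_create_table(CUSTOMER))
    print()
    print(to_dataclass(CUSTOMER))

Adding a field then becomes a single edit to the description, rather than separate changes to the schema, the data access code, and every screen that touches it.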


Oddly, the above is not a pipe dream. It has been done by a lot of products over the decades; it is a more common approach to development than our industry lets on. Sure, it’s harder, but it’s not that much riskier, given that starting any new development work is always high risk anyway. And the payoff is massive.


So, why are you manually coding the screens? Why are you manually coding the persistence? Why not spend the time learning about all of the ways people have found over the last 50 years to work smarter in these areas? 


There are lots of notions out there that libraries and frameworks will help with these issues. The problem is that at some point the rubber has to hit the road; that is, all of the degrees of freedom have to be filled in with something specific. When building code for general purposes, the more degrees of freedom you plug up, the more everyone will describe it as ‘opinionated’. So, there is a real incentive to keep as many degrees open as possible, but the final workload is proportional to how many are left open.


The other big problem is that once the code has settled on an architecture, approach, or philosophy, that can’t be easily changed if lots of other people are using the code; it is a huge disruption for them. But it’s an extraordinarily difficult task to pick the correct approach out of a hat, without fully knowing where the whole thing will end up. Nearly impossible, really. If you built the code for yourself and you realized that you were wrong, you could just bite the bullet and fix it everywhere. If it’s a public library, the authors are under more pressure to leave it alone than to actually correct any problems. So, the flaws of the initial construction tend to propagate throughout the work, getting worse and worse as time goes by. And there will be flaws, unless the authors keep rewriting the same things over and over again. What that implies is that the code you write, if you are willing to learn from it, will improve. The other code you depend upon still grows, but its core has a tendency to get permanently locked up. If it were nearly perfect, that’s not really a big problem, but if it were rushed into existence by programmers without enough experience, then it has a relatively short life span. Too short to be usable.


Bad or misused libraries and frameworks can account for an awful lot of gyrations in your code, which add up quickly and can land in the top five areas of code bloat. If a library doesn’t fit tightly, it will end up costing a lot. If it doesn’t eliminate enough degrees of freedom, then it’s just extra work on top of what you already had to do. Grabbing somebody else’s code seems like it might be faster, but in a lot of cases it ends up eating way more time.
