Wednesday, July 8, 2020

Defensive Coding: Direction

A rather big question in software development that typically gets avoided is whether or not the development project is going well. 

That seems like an easy question. There is more code than last year, there are more features, more people are using it, etc. But those types of metrics really don’t capture momentum. They are still somewhat short term. 

For example, you start building a domain-based inventory system, and it all seems great. It’s using a fairly recent tech stack, there are a growing number of users and lots of new features are in development. So, it’s a success, yes? If you could fast-forward 2 years, you might find that the system has become hopelessly over-complicated, it’s kinda ugly and slow now, the database is full of questionable data, the code is a mess and the original dev team has moved on to greener pastures. What happened? 

We could have looked at the project 2 years earlier and seen the seeds of its destruction. They were there, in the workmanship, the process, and generally the direction. 

Those ongoing little problems gradually become the dominant, fatal issues. They start small but multiply quickly. To see them through the noise requires looking at higher-level properties of the project. 

For instance, it’s not really the amount of code you have, it’s the amount of code that isn’t crappy or misplaced that matters. It’s not how many features you have, but rather the number that is easily accessible and obvious to a user during their normal workflows. It’s not the number of releases you’ve done, or whether you have made it on schedule, but really the operational stability that matters. Looking at these types of metrics gives a better sense of momentum.

We could really build up some serious underlying metrics that are geared towards showing these growing problems, but there is a much easier way to see it. 

You’re working hard for this upcoming release, but after that is done, will the work get easier or harder for the next release? That’s it. That is all there is to it. 

If the ongoing work is getting easier, it’s because you’ve built up and refined better code, processes, knowledge, etc. Then the momentum of the project is positive. 

If each time it is getting harder, because the things that you are trying to ignore, work around, or just hope will go away are getting bigger and becoming more of a blocker, then the momentum of the project is negative. If a project suffers from a bunch of negative releases, it has most likely gotten caught in a cycle, and that is very difficult to get out of. 

With that foundation then it isn’t hard to start getting into particular details. What would make some upcoming work easier? What would make it harder? We just want to spend time reducing friction and providing more ability to make the work go easily. 

We can look at a few specific issues. 

First, if there are 4 or 5 programmers, and each one is coding in their own unique style, then unless they are fully siloed from each other, their ability to utilize, fix or incorporate each other’s work is compromised. Slower. 

The opposite is also true. If there are well-defined standards and conventions that everyone is forced to follow, then moving around the entire codebase isn’t that difficult. There might be some type of domain understanding necessary, but the technical implementations are obvious. 

Following standards is obviously a bit slower, and there is a bit of ramp-up to learn them, but when traded off against a detached, siloed codebase or a big ball of mega-mud, it is a huge improvement.

The same is true for frameworks and libraries. If everyone has thrown in their own massive set of dependencies, then moving around the code requires epic amounts of learning, which eats time.

Often in big systems, there is a lot of build mechanics and configuration floating around. If that is set up cleanly, then it’s not too hard to absorb it and enhance it. If it is spaghetti, then it just becomes another obstacle.

Abstractions are the double-edged sword of most development. On the one hand, they often reduce code by orders of magnitude, and in doing so they kick up the quality. If they are reused all over, they also cut down on the redundancies and impedance mismatches. On the other hand, they can be weird enough that most other programmers have little hope of understanding them, or their implementations can be impenetrable. If they are nicely documented, cleanly decomposed, and well encapsulated, they are a huge strength and a massive reduction in work, code, time, etc. But they need to be clean, and there needs to be a way for most of the current and future team to understand them. That has been a growing problem, particularly with the over-reliance on question-and-answer sites for patching code while avoiding understanding how it works. 

The overall flow of the development matters too. 

Are the ideas for new changes coming from feedback by people who actually use the system, or is it more of a wanton creative exercise based on assumptions? The way the work enters ‘the development pipeline’ usually defines its quality. A weird, non-essential, misplaced feature is just wasting space and time; its contribution is negative. 

Once the work is in the pipeline, there are lots of questions that need to be answered, usually with very precise details. Again, if that is getting skipped, it’s substituted with more assumptions or bad facts, so any downstream work is unlikely to be positive. 

There is also a strong need to ensure that the goals are technologically feasible. Adding a feature to search everything doesn’t help if it takes hours to return, or forces a crazy complex replication and caching architecture into being. That might work for Google search, but it outscales most systems’ limitations. The costs (money and time) are unrecoverable.

In larger shops, there is usually a need to parallelize the pipelines, so making sure that they don’t knock each other off course means having a process to control, track, and adjust as the priorities and schedules shift around. 

I could continue, adding a lot more, but stepping back and assessing whether the endless series of releases is getting easier or harder is such a strong way of focusing in on slow-brewing problems that the details don’t need to be explicit. It’s an easy question to ask, and you can get valid answers directly from the development teams themselves.

If things are getting harder, then you mostly need to identify the many reasons why and start to mitigate them one by one, in order of contribution. For example, if releasing the code is hard and requires lots of steps that are often forgotten, then coding those steps as a single script is obviously going to shave off a significant part of the blockage. Getting rid of the blockages gets rid of the friction, and frees up more time to spend on better issues like quality, readability, or even performance. A positive direction for a big project is far better than letting gravity do its job.
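The release-script fix is easy to sketch. This is a minimal illustration, not a real release pipeline: the step names (run_tests, build_artifacts, tag_release) are hypothetical placeholders for whatever commands a project actually runs.

```python
# Folding a manual, error-prone release checklist into one script, so that
# steps can't be forgotten and a failure halts the release immediately.

def run_tests(log):
    log.append("tests")       # e.g. shell out to the project's test runner
    return True

def build_artifacts(log):
    log.append("build")       # e.g. compile, package, checksum
    return True

def tag_release(log):
    log.append("tag")         # e.g. tag the repo, push the artifact
    return True

def release(steps):
    """Run every step in order; stop at the first failure."""
    log = []
    for step in steps:
        if not step(log):
            return log, False
    return log, True

log, ok = release([run_tests, build_artifacts, tag_release])
```

Once the checklist lives in a script, the steps always run in the same order, and a failed step stops the release instead of letting a half-done one ship.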

Monday, July 6, 2020

Defensive Coding: Cleanup

In one of my first programming jobs, long ago, my boss would make us clean up the office on the last Friday of every month. Even though we were really busy, we’d take the whole day and rearrange stuff, wipe off the dust, go out and buy new furniture if needed, and just focus on making sure our physical office space looked presentable. We weren’t allowed to work on coding or other digital tasks. Even if we were running late for a deadline. 

At the time, I didn’t get it and thought it was eccentric. After all, we were super busy and had lots of technical stuff to do, so why would we waste time on the physical environment?

It wasn’t until a bit later, when I worked with people who kept procrastinating on any and all cleanup tasks, that I began to figure it out. If there is a little bit of cleanup work building up over time, and you keep a fairly regular schedule of making sure it is done, it is not a significant problem. But as you leave it for longer, it compounds, until it becomes a rather huge problem. Once it is big, scheduling it is harder, so you procrastinate even more, and eventually find yourself increasingly hampered by it, while gradually losing the ability to fix the issue.

If you watch a master craftsman or painter go about their work, you often see similar habits. They make sure their workspace and tools are perfectly arranged, as the warm-up task for getting the job done. Working in a messy environment interferes with their ability to move fluidly through the task, which ultimately hurts the quality of the work.

In software, it’s not just the physical environment that needs tending. The setup and maintenance of the tools we use, and the build, test, and release cycle, are important as well. If you struggle to run even basic tests on the code, that will eat away at your concentration on the code itself. It’s always worth spending time on these parts of the work to get them smoother and working better. If a large team loses a couple of minutes per day on scratchy, ill-fitting tools, it may seem like a small issue, but when you start piling it up over years, it adds up quickly. And if you account for the intangibles, like the effect on mood and context-switching time, it adds up a lot more rapidly than people realize. 

Even if the physical and development environments are all neat and tidy and kept up to date, there are still plenty of internal code issues that need cleaning up as well. The structural organization for a large project consists of a high-level architecture, some mid-level component organization, and a lot of low-level arrangements. If these are missing or disorganized, it becomes harder to know where to put new pieces of code, and when that happens, that messiness slows down the rest of the work. The pieces don’t fit, or they are awkward, or that code already exists in multiple other places. Any doubts that come from that eat away at morale and confidence. In that sense, if you know what code to write but aren’t sure where it belongs, then the structure isn’t organized, the codebase is messy, and you are losing valuable time because of it. 

Nobody likes cleaning up. But the longer you avoid it, the worse it becomes. Living in a huge mess is a lot harder than living in a nice and tidy environment. If everything you need to do starts with yak shaving, it is a lot of wear and tear on your ability to get things done. Some people are better at ignoring it, tunneling their focus onto tiny parts of the mess at any one time, but as a habit, all that does is ensure that the quality remains low. It’s far better to identify what is blocking you or slowing you down, and to spend a little time each day, week, month, etc. on trying to improve it. 

What matters is that you take it on as many, many small incremental cleanup tasks, fit in as often as necessary. It is also important to make it an ongoing, regular habit, instead of treating it as some sort of special case that you can ignore. 

Not only does it help with the morale and flow of the work, but it also helps with larger issues like estimation. If you dedicate one day a week to cleaning up, and you know that the next task should take 10 days, you also know that it isn’t going to be done in 2 weeks. It will spill over into the 3rd week, so it is better to predict that in advance, rather than have it turn out to be a surprise later.
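That spill-over arithmetic is easy to mechanize. A tiny sketch, assuming a 5-day work week with one day reserved for cleanup:

```python
import math

def calendar_weeks(task_days, week_days=5, cleanup_days=1):
    """Weeks a task takes when some days each week are reserved for cleanup."""
    productive_per_week = week_days - cleanup_days
    return math.ceil(task_days / productive_per_week)

# A 10-day task at 4 productive days per week spills into a 3rd week.
calendar_weeks(10)  # returns 3
```

The point is not the formula itself, but that reserving the cleanup time up front makes the schedule honest instead of optimistic.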

Building big systems isn’t a dynamic, exciting profession. It’s a lot of work, most of it is routine, and to be good at it requires patience, discipline and a set of really strong habits to make sure that each time you do work and add to the codebase, it is good enough to not blow up in your face later. If you keep that up, while the world spins wildly around you, you end up building something that is worthwhile and people will use it to make their lives easier. If you get caught up in the whirlwind and lose concentration, then you will wake up one day with a huge pile of unusable code that is causing everyone around you to be pissed off. 

Saturday, July 4, 2020

Opaque Boxes

The idea is to create an opaque box. You can use it to do a few well-specified things, but you need to stay within its tolerances.

The box has some finite set of entry-points. One or more of them may allow dynamic behavior or statefulness, as represented by a question-and-answer paradigm.

If the box can do X, there is a way of testing whether X has been done, or not. 

If the box provides optimized methods, any and all of the information needed for those optimizations is entirely constrained to being in the box and is not available on the outside. If there is a means of toggling the optimization, then there is a way to see that it is set or not.

If the box stores data for later use, the outside format is RAW and it is converted to something else once inside the box. There is a means of getting the RAW data back as it was specified. If there is some other useful internal format, there might be a way of returning that directly, but only if that format itself does not explicitly depend on any of the ‘code’ in the box. So, it’s more likely that if the box doesn’t implement an external standard, then to get back FORMATTED data requires the box to format it on the way out. 
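The RAW-in, RAW-out rule can be sketched with a toy box. Everything here (the comma-separated input format, the internal dict) is an assumption for illustration; the point is only that the caller can always recover exactly what it stored, and formatted output is produced on the way out rather than by exposing the internal representation.

```python
# A toy opaque box that stores RAW data, keeps its internal format private,
# and formats on the way out.

class InventoryBox:
    def __init__(self):
        self._raw = {}       # the data exactly as it was supplied
        self._parsed = {}    # internal convenience format; never exposed

    def store(self, key, raw_line):
        self._raw[key] = raw_line
        name, _, qty = raw_line.partition(",")
        self._parsed[key] = {"name": name, "qty": int(qty)}

    def get_raw(self, key):
        # Byte-for-byte what went in.
        return self._raw[key]

    def get_formatted(self, key):
        # No external standard here, so the box formats on the way out
        # instead of leaking the internal dict.
        p = self._parsed[key]
        return f"{p['name']} x{p['qty']}"
```

The internal dict can be reorganized at will without breaking any caller, because nothing outside the box ever sees it.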

A simple box equates to the computational model of a Turing Machine (TM). That is, it is just 1 computational engine, from the outside perspective. It takes some input, runs a computation, and produces some output. It is deterministic and will always produce the same output with the same input. 
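As a sketch of the simple-box model, any pure function will do from the caller's perspective; sort_box below is a made-up stand-in for the single computational engine, and determinism is observable from the outside just by repeating the call:

```python
# A simple box: one computational engine. Input goes in, output comes out,
# and the same input always produces the same output.

def sort_box(items):
    return sorted(items)

# Only the input/output behavior is visible from outside; repeated calls agree.
first = sort_box([3, 1, 2])
second = sort_box([3, 1, 2])
assert first == second == [1, 2, 3]
```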

The communication between the calling program and the box may be synchronous or asynchronous, but if it is asynchronous, then there is a means of binding together different entry-points into the same operation (they all happen or they all fail; there is no in-between). 

A non-deterministic box is a simple box where the output can change, or is guaranteed to change, even though the inputs are consistent. 

A composite box is one that encapsulates the interaction between many TMs. So a request, whether it is read-only or a write, is bound to the behavior of a set of different computational engines, that should be assumed to be asynchronous. If there are multiple TMs, but they are sitting on a synchronous bus, we still shouldn’t consider them as just a single TM. 

A composite box may or may not preserve order. It may or may not protect interdependencies. The onus then is on the user to ensure that a composite box is only used under reasonable circumstances, where independence is either easily provable or where each and every dependence is guaranteed by known properties of the specific box.

A simple box, subject to underlying resource partitioning, will usually return results within a fixed range of time. There is a potential for the halting problem to kick in, so it could possibly never return. On top of this, a composite box may also wait forever on the results of an underlying asynchronous communication, so some of its properties may require it to pause operations and eventually exhaust its queuing, so there is a flow problem on top as well. The caller of either type of box needs to set a bound for the operation and react appropriately when the box is indisposed. This is different from being unavailable.
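The caller-side bound can be sketched as follows, assuming the box is just a Python callable; slow_box and its timings are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def slow_box(x):
    # Stand-in for a box whose computation may take too long, or never return.
    time.sleep(0.2)
    return x * 2

def call_with_bound(box, arg, seconds):
    """Invoke a box, but refuse to wait forever for its answer."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(box, arg)
        try:
            return future.result(timeout=seconds), "ok"
        except TimeoutError:
            # The box exists and is running, but is not answering: indisposed.
            return None, "indisposed"
```

One honest caveat about this sketch: the `with` block still waits for the worker thread to finish on exit, so the caller gets its answer on time but the process does not reclaim the stuck work; a production version would also give the box a way to be cancelled.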

A composite box can be built around a collection of other simple and composite boxes. If any of those boxes is asynchronous or non-deterministic, the outer box picks up those properties, unless something has been explicitly done to negate them.
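That inheritance of properties can be modeled with simple flags. The Box/CompositeBox classes and the deterministic/synchronous attributes here are illustrative assumptions, not a real framework; they only show the outer box picking up the weakest guarantees of its parts.

```python
# Sketch: an outer composite box inherits the attributes of its parts unless
# something explicitly negates them (negation not shown).

class Box:
    def __init__(self, name, deterministic=True, synchronous=True):
        self.name = name
        self.deterministic = deterministic
        self.synchronous = synchronous

class CompositeBox(Box):
    def __init__(self, name, parts):
        # A single non-deterministic or asynchronous part taints the whole.
        super().__init__(
            name,
            deterministic=all(p.deterministic for p in parts),
            synchronous=all(p.synchronous for p in parts),
        )
        self.parts = parts

cache = Box("cache")                             # simple and deterministic
feed = Box("market-feed", deterministic=False,
            synchronous=False)                   # async, answers vary
system = CompositeBox("pricing", [cache, feed])  # picks up both weaknesses
```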

It is possible to frame any one-time or continuous computations in terms of boxes. It is complete. In decomposing the problem this way, the attributes of the boxes have to be explicit and understood so that anything built on top needs to handle the communications with care. While this perspective is somewhat abstract, it is also formal enough that it can be used as the structural guidelines for a larger architecture. That is, we could decompose a very large system into a great number of boxes that could cover 100% of the code. If the boxes work properly and all of the attributes are explicitly handled, then the system works properly.