Thursday, February 29, 2024

Coding

Major points:
  1. Coding is always slow
  2. Coding produces both code & bugs
  3. The code always needs to be edited, the first version is just roughed in.
  4. Do not use disposable code in industrial-strength projects.
The primary goal is to produce a minimal amount of readable code.

You want the code to be as small as possible; smaller code is easier to deal with. Larger codebases are worse, not better.

You want the code to be as readable as possible so it is easier to edit. If it is a choice between smallness and readability, readability wins. If it is a choice between readability and performance, readability wins.

You can always fix readable code later. But it must be readable first, and remain readable afterwards.

You don’t want the code to be redundant, because you’ll always forget to change all of the different manifestations of it for the same bug. Redundancies are bugs or potential bugs. Changes to redundant code cause the copies to drift apart.

You need the codebase to be tightly organized so that it is easier to find and fix the problems. You can easily waste more time fixing bugs than writing or refactoring code, so you need to optimize for that.

There should be one and only one place to put each line of code. All of the code should be in its one place. If there are lots of different places where you can put the code, you are disorganized.

The author is not the only one who needs to reread the code. Others will have to read it as well. Good code will be read by a lot of people.

Because you didn’t just magically get it right the first time, you and other people will have to go over the code, again and again, in order to make it better. Code doesn’t get written, it evolves.

The claim that layers are bad is poor advice. Layers are the main way you keep the code organized. Without them, the code is just one huge flat mess. That is far worse; it is totally unreadable.

Layers can be abused; there can be too many of them. But having none at all is worse. It is easier to remove a layer than to add one.

A good function is specific to one part of the computation, yet general within that role. It does the one thing that it says it does, nothing more. Sub-level detail processing is below it, in other functions. High-level flow is above it. Once you know what it does, and trust that it does exactly and only that, then you can ignore it, which makes life easier.
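As a sketch of that layering, with hypothetical names: the top-level function reads as pure flow, and each lower function does exactly the one thing its name says.

```python
# Hypothetical sketch: high-level flow above, sub-level detail below.

def send_invoice(order, mailer):
    """High-level flow: reads as a plan; the detail is delegated."""
    invoice = build_invoice(order)
    body = render_invoice(invoice)
    mailer.send(order["email"], body)

def build_invoice(order):
    """One specific job: turn an order into an invoice with a total."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"items": order["items"], "total": total}

def render_invoice(invoice):
    """One specific job: format an invoice as plain text."""
    lines = [f'{i["qty"]} x {i["price"]:.2f}' for i in invoice["items"]]
    lines.append(f'Total: {invoice["total"]:.2f}')
    return "\n".join(lines)
```

Once you trust that `build_invoice` does exactly and only that, you never need to reread it while working on the flow above.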

All data should come into the code from somewhere else. It should never be hardcoded in the code, nor hardcoded at the call site when passed down. Thus all inline strings, integers, constants, etc. are suspect. The best code has zero hardcoded values.

If you need to do a bunch of steps each time for a common action, wrap the steps. The fewer things you call the better your code is. If you rely on remembering that all things have to always be done together, either you’ll forget or someone else who never knew will do it incorrectly. Either way, it is now a bug, and it may not be an obvious one, so it will waste time.
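A sketch of wrapping the steps, using hypothetical names: three things that must always happen together get one entry point, so nobody has to remember the sequence.

```python
# Hypothetical sketch: validate, store, and audit must always happen
# together, so they are wrapped in the one call everyone uses.

def save_record(db, audit_log, record):
    """The single entry point; the steps cannot be forgotten."""
    validate(record)
    db.append(record)
    audit_log.append(("saved", record["id"]))

def validate(record):
    if "id" not in record:
        raise ValueError("record needs an id")
```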

If you need to move some data around, move it all together. Wrap the data in a struct, an object, or whatever the language supports. Composite variables are far better than lots of independent variables.

If you need to decompose the data (aka parse it) to use it somewhere, decompose it once and only once. Keep it decomposed in a struct, object, etc., and move it around as a composite.
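A sketch of parsing once into a composite, with a hypothetical `Endpoint` type: the raw string is split in exactly one place, and everything downstream receives the struct instead of re-splitting the string.

```python
# Hypothetical sketch: decompose "host:port" once, then pass the
# composite around instead of re-parsing the raw string everywhere.
from dataclasses import dataclass

@dataclass
class Endpoint:
    host: str
    port: int

def parse_endpoint(raw: str) -> Endpoint:
    """The one and only place the raw form is decomposed."""
    host, port = raw.split(":")
    return Endpoint(host=host, port=int(port))
```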

If the call for some library/technology/etc. is messy, wrap it. Wrapping is a form of encapsulation; it helps to avoid bugs and reduce complexity.

If there are strange lines of code that are nonintuitive or don’t make sense, wrap them. At minimum, it gives you a chance to name it appropriately; at maximum, it leaves just one place to change it later.
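A tiny sketch of wrapping a nonintuitive line, with a hypothetical helper: the cryptic check gets a name that explains it, and there is now exactly one place to change it later.

```python
# Hypothetical sketch: text.startswith("\ufeff") is cryptic inline;
# wrapped and named, it explains itself and lives in one place.

def strip_bom(text: str) -> str:
    """Remove a leading Unicode byte-order mark, if present."""
    return text[1:] if text.startswith("\ufeff") else text
```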

Too many functions are way better than too few. If you have to get it wrong, create a billion functions. They force you to find reasonable names for the parts of work you are doing. If you don’t know how to name a function, then you don’t understand what you are doing. If you have too many functions it is easy to compact them. If you have too few, you are screwed.

Don’t use language features if you don’t understand them. The goal of coding for a system is not to learn new technology, it is to write industrial-strength code that withstands the test of time. If you want to play, good, but don’t do it in a real project, do it in little demos (which can be as messy as you want).

Do not pack lines. Saving yourself a few lines of code, but packing together a whole bunch of mechanics, just hides the mechanics and misguides you as to the amount of code you have. Separate out each and every line of code, it doesn’t take any real time and it lays out the mess in its full ugliness. If the mess is ugly fix that, don’t hide it.

Never do the same thing in a system in two or more different ways. You need to do something, do it one way and only one way, wrap it in a function, and reuse it in all other instances. This cuts down on complexity. By a huge amount. It cuts down on code, thus it cuts down on bugs.

Build up the mechanics to work at a higher level. That is, if you need an id to get to a user, and the user to get to their profile, then you should have a FindUser(id) which is supplied to the call FindProfile(user). Build up reusable pieces, don’t code down into stuff.
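A sketch of the FindUser/FindProfile idea (Pythonic names, with made-up in-memory stand-ins for the real storage): each piece is reusable on its own, and the higher-level call composes the lower ones so the flow reads at one level.

```python
# Hypothetical stand-ins for real storage, just for illustration.
USERS = {42: {"name": "Ada", "profile_id": 7}}
PROFILES = {7: {"bio": "mathematician"}}

def find_user(user_id):
    """Reusable piece: id -> user."""
    return USERS[user_id]

def find_profile(user):
    """Reusable piece: user -> profile."""
    return PROFILES[user["profile_id"]]

def profile_for(user_id):
    """Built up, not coded down: the flow reads at one level."""
    return find_profile(find_user(user_id))
```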

Thursday, February 22, 2024

Self-Inflicted Pain

The difference between regular programmers and 10x programmers is not typing speed. In some cases, it is not even knowledge.

It is that 10x programmers are aware of and strongly avoid self-inflicted injuries while coding.

That is, they tend to avoid shortcuts and work far smarter than harder. They don’t tolerate a mess, they don’t flail at their work.

They need some code, they think first before coding, they code what they need, and then they refine it rapidly until it works. Then they leverage that code, over and over again, to save crazy large amounts of time. This is why their output is so high.

If you watch other programmers, they jump in too fast. They don’t fully understand what they are doing. The code gets messier and messier. Debugging it consumes massive effort. Then they abandon that work and do it all over again for the next part that is similar. They burn through time in all the wrong places.

All of these are self-inflicted injuries.

Writing code when you only half understand what it should do will go badly. It’s not that you should be able to predict the future, but rather that your knowledge today should span the code you write. If there is something you don’t understand, figure that out before starting to code. If you have to change the code later because things change, that is okay. But if you are coding beyond your current knowledge it will go badly and eat through time.

Trying to fix crappy code is a waste of time. Clean it up first, then fix it. If the code doesn’t clearly articulate what it was supposed to do, then any perceived bug may be predicated on top of a whole lot of other bugs. Foundations matter.

So, when debugging, unless it is some crazy emergency patch, you find the first bug you encounter and correct that first. Then the next one. Then the next one. You keep that up until you finally find and fix the bug you were looking for. Yes, it takes way longer to fix that bug, but not really, as you are saving yourself a lot of time down the road. Those other bugs were going to catch up with you eventually.

If you see bad names, you fix those. If you see disorganization, you fix it, or at a minimum write it down to be fixed later. If you see extra variables you get rid of them. If you see redundant functions, you switch to only using one instance. If you see poorly structured code or bad error handling, you fix that. If you see a schema or modeling problem, you either fix it now or write it down to fix it later. The things you wrote down to fix later, you actually fix them later.

The crap you ignore will always come back to haunt you. The time you saved by not dealing with it today is tiny compared to the time you will lose by allowing these problems to build up and get worse. You do not save time by wobbling through the code, fixing it at random. Those higher-level fixes get invalidated by lower-level changes, so they are a waste of time and energy.

And then, the biggest part. Once you have some good code that mostly does what you want it to do, you leverage that. That is, putting minimal effort into highly redundant code is as slow as molasses. Putting a lot of effort into a piece of code that you can use over and over again is exponentially faster. Why keep solving the same lower-level problems again and again, when instead you can lift yourself up and solve increasingly higher-level problems at faster speeds? That is the 10x secret.

If you have to solve the same basic problems again and again, it is self-inflicted. If you lose higher-level work because of lower-level fixes, it is self-inflicted. If you have to do spooky things on top of broken code, it is often self-inflicted. If you get lost or confused in your own code, it is self-inflicted. If you want to be better and faster at coding and to have less stress in your job, stop injuring yourself, it isn’t helping.

Thursday, February 15, 2024

A Rose by Any Other Name

Naming is hard. Very hard. Possibly the hardest part about building software.

And it only gets harder as the size of the codebase grows, since there are far more naming collisions. Code scales very, very badly. Do not make it worse than it has to be.

This is why naming things correctly is such a fundamental skill for all programmers.

Coding itself is oddly the second most important skill. If you write good code but bury it under a misleading name, then it doesn’t exist. You haven’t done your job. Eventually, you’ll forget where you put it. Other people can’t even find it. Tools like fancy IDEs do not save you from that fate.

There are no one-size-fits-all naming conventions that always work correctly. More pointedly, there can never be such a convention. Naming is not mindless; you have to think long and hard about it. You cannot avoid thinking about it.

The good news is that the more time you spend trying to find good names, the easier it gets. It’s a skill that takes forever to master, but at least you can learn to not do it badly.

There are some basic naming rules of thumb:

First is that a name should never, ever be misleading. If the name is wrong, it is as bad a name as possible. If someone reads it and comes to the wrong conclusion, then it is the author's fault. When you name something you have to understand what that thing is and give it the best possible name.

Second is that the name should be self-describing. That is, when someone reads the name, they should arrive at the right conclusion. The variable should hold the data they expect. The function should do what it says. The repo that holds a given codebase should be obvious.

“Most people never see the names I use in my code …”

No, they do see them. All of them.

And if they see them and they are poor or even bad, they will recommend that your code gets rewritten. They will throw away your work. Nothing else you did matters. If the code is unreadable, it will not survive. If it doesn’t survive, you aren't particularly good at your job. It’s pretty simple.

Occasionally, some really awful code does get frozen way deep in a ball of mud. But that unfortunate situation is not justification for you being bad at your job. Really, it isn’t.

Third, don’t put litter into your names. Made-up acronyms and strange ‘pre’ or ‘post’ text are not helping. Stop typing in long crazy names; spend some time thinking about it. Find short, reasonable names that are both descriptive and correct.

Fourth, don’t put irrelevant or temporary stuff in there either. If some unrelated thing in an organization changes and now the name is either wrong or needs to be changed, you did it wrong. Names should be nearly timeless. Only if the nature of the problem changes should they need changing, and you should do that right away. Names that used to be correct suck.

Names are important. They form the basis of readability, and unreadable code is just an irritant. If you were asked to really write some code, you need to really write it properly. If it takes longer, too bad. Good naming only slows you down until you get better at it. You need to be better at it.

Wednesday, February 7, 2024

Natural Decompositions

Given a large problem, we start by breaking it down into smaller, more manageable pieces. We can then solve all of the smaller problems and combine them back together to solve the original problem.

The hiccup is that not all decompositions are created equal. If you break a big problem down into subparts that have cross-dependencies on each other, you can’t work on them independently. The dependencies invalidate the decomposition.

So we call any decomposition where all of the subparts are fully independent a ‘natural’ decomposition. It is a natural, complete, hard ‘line’ that completely separates the different parts.

Do natural decompositions actually exist?

Any subpart that has no dependencies on other outside parts is fully encapsulated. It is a black box.

A black box can have an interface. You can put things into the box. It’s just that whatever happens in the box stays in the box. You don’t need to know anything about how the box works inside, just on the outside.

A car engine is a good example. You put in fuel, and you play with the pedals, then the car moves. If you are just driving around, you don’t need to know much more than that. Maybe if you are pushing it on the highway or a racetrack, you’d need to understand gearing, acceleration, or torque better, but to go to the grocery store with an automatic transmission it isn’t necessary.

Cars have fairly good natural decompositions. They are complex machines, but most people don’t really need to understand how they work. Mechanics and race car drivers do.

Software though is much harder to decompose because it isn’t visible. The lines between things can be messed up and awful, but very few people would know this. A five-wheeled car/truck/motorbike monstrosity would be quickly discounted in reality, but would likely survive as a software component.

Although we don’t see it the same way, we can detect when a decomposition is bad. The most obvious test is that if you have to add a line of code, how many places are there that it would fit reasonably? The answer should be one. If that is not the answer then the lines are blurred somewhere.

And that is the crux. A good decomposition eliminates the degrees of freedom. There is just one place for everything. Then your code is organized if everything is in its one place. It’s simple, yet not simple at all.

For example, if you break off part of the system as a printing subsystem, then any and all code that is specifically tied to printing must be in that subsystem.

Now it’s not to say that there isn’t an interface to the printing subsystem. There is. Handling the user context and the specific GUI contexts is done elsewhere and must be passed in. But no heavy lifting is ever done outside. Only on the inside. You might have to pass in a print-it-this-way context that directs what is done, but it only directs it from the outside; the ‘doing it’ part is inside the box.

One of the hardest problems in software is getting a group of programmers to agree on defining one place for all of the different types of code and actually putting that code in the one place it belongs.

It fails for two reasons. The first is that it is a huge reduction in freedom. You aren’t free anymore to put the code anywhere. The culture of programming celebrates freedom, even when it makes our lives way harder or even tragic.

The other reason is the difficulty of making it quick and easy for newer programmers to know where to put stuff. If we fully documented all of those places it would be far too much to read, and if we don’t, most people won’t read the code to try to figure it out for themselves. Various standards and code reviews have tried to address it over the decades, but more often than not people just create a mess and pretend like they didn’t. Occasionally you see large projects with good discipline; it happens.

This shows up in other places too. Architecture is the drawing of lines between things. An enterprise architect should draw enough lines in a company to keep it organized; a system architect should draw enough lines in a system for the same effect. Again, these lines need to be natural to be useful. If they are arbitrary they make the problems worse not better.

Decomposition is the workhorse of software development, but it’s far too easy to get it wrong. Fortunately it’s not hard to figure out if it’s wrong and fix it. Things go a lot smoother when the decompositions are natural and the work is organized. Programming is hard enough sometimes; we don’t need to find ways to make it worse.

Thursday, February 1, 2024

Anti-patterns

“Calling something an anti-pattern is an anti-pattern.”

There are lots of ways to accomplish things with software. Some of them are better than others. But the connotation for the term ‘anti-pattern’ is that the thing you are doing is wrong, which is often not the case.

Realistically, the ‘pattern’ part of the phrase is abused. A design pattern is just a generalized abstraction of some work you are doing. It is less specific than an ‘idiom’. It is less specific than a ‘data structure’. It is just essentially a code structuring arrangement that is common. That is, it is effectively a micro-architecture, a way to structure some functionality so that it is easier to understand and will behave as expected.

So, mostly what people mean when they call something an anti-pattern is just that it is not the ‘best’ alternative. But even if it is not the best, that doesn’t make it a bad choice. Or basically, the set of alternatives for coding is not boolean. It’s not a case of right or wrong. It’s a large gradient, there are a huge number of ways to code stuff, some are better than others. And sometimes, for some contexts, a lesser approach is actually better.

We saw this in the 70s with sorting, but the understanding doesn’t seem to have crystallized.

There are lots of different ways to sort, with different performance. We can track ‘growth’, which is effectively how an algorithm’s cost scales with the size of the data. A clever algorithm like a pivot sort (quicksort) has nearly optimal growth, O(N log N) on average. Bubble sort, however, is considerably worse at O(N^2).

So, you should always implement a pivot sort if you have to implement your own sort?

No. If you have a large amount of data to sort, then you probably want to spend the time to implement a pivot sort. But… if you usually only have a few things to sort, then just putting in a bubble sort is fine.

Why?

Because the code for a bubble sort is way, way easier to write and visually validate. And performance is not even close to an issue; the set of data is always too small. With that tiny size, it wouldn’t matter if one algorithm were a few instructions different from the other, since it doesn’t loop long enough for that to become a meaningful amount of time.
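The whole bubble sort fits in a few lines you can validate at a glance, which is the point:

```python
def bubble_sort(items):
    """In-place bubble sort; O(N^2), fine for a handful of items."""
    n = len(items)
    for i in range(n):
        # After pass i, the last i slots hold their final values.
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items
```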

So, in that reduced context, the shorter, easier, less-likely-to-have-a-bug code is the better alternative. More significantly, for front-end devs, whipping together a bubble sort is fine; for back-end ones, learning to implement pivot sorts is better.

But in modern programming, since most stacks implement pretty darn good sorting, the issue is moot. It is presented in data structure courses as a means of learning how to correctly think about implementation details, rather than an explicit skill.

In modern terms, I’m sure that a lot of people would incorrectly call a bubble sort an anti-pattern, which it is not. Most ‘lesser’ patterns are not anti-patterns. An actual anti-pattern would be to have multiple copies of the same globals, when what you really just needed was one. Another less common anti-pattern would be using string splits as the way to parse LR(1) grammars, as it would never, ever work properly but that is a longer and far more difficult discussion.

In general though, the software industry has a real problem with using hand waving to summarily dismiss significant technical issues. Programmers resort to claiming that something is “right” or “wrong” far too quickly, when neither case applies. It is a form of boolean disease: spend too much of the day crafting booleans, and you start to see the rest of the world only in those terms.