The Programmer's Paradox: Self-describing Names

Sunday, August 15, 2021

Self-describing Names

If you are trying to keep track of an X in a system and you name any variable that holds information about it as X then your intent it’s really obvious.

If you call the variable that holds the data something like ‘foorbarizilla’ then you’ve intentionally misnamed that data in the system. If it’s foobarInfo or foobarData, that isn’t any better. Nor is xFactory, prj1_X or ptrX.

But it’s not just variables. If the function should be called Y but instead it’s called GetN, you messed up. If the process tracks Zs but it says it is monitoring Qs, then it is also wrong.

These are all just variations on the same problem. You aren’t correctly telling other people what you’ve put in your variable or what you are doing with the code or data.

You may have given the variable a name, but it doesn’t match the contents. But you really want to tell other people what data you’ve used the variable to store. You want to describe the contents as precisely as you can, and then embed that description right into the name.

You know it is a ‘self-describing’ variable name because it doesn’t need a comment, description, or an explanation; that would be redundant. The name says it all. If the variable is a composite, people can look up its underlying variables and get even more enlightened. It's all clear and obvious. No one is ever confused.

Naming is hard. It’s a difficult skill to learn and it’s way harder if you are coding in a second language. That said, it is also one of the most important skills you need to acquire to be able to consider yourself a professional programmer. Anyone can write code, but very few can do it properly.

It’s not enough to just get the code working. There are always bugs and any useful code will evolve. If the work you are doing isn’t just throw-away or wasted time, then it is done very poorly if you’ve explicitly made it harder for it to be supported or enhanced.

Part of professional programming is building things that last for a long time. That means ensuring that the code lasts by communicating how it works to others. But it’s not just other programmers that you need to worry about. If you’ve picked way too many random crappy names, later, when you end up revisiting the code in a few months you’ll have forgotten what they mean too.

It's worth noting that your names for technical things should be the correct technical names as used in our industry, and your names for domain issues should match the vocabulary of your users. Mostly variables should be nouns. Functions, end-points, and any other actions should be verbs. Most of the time you don’t need to make up names, but there are often many situations where a bunch of different names are equally applicable. In general, picking the shortest name is usually best.

Code should not be overly dense; names are one of the key modifiers of density. Well-balanced code makes the readability pop. The meaning is not obscured by useless noise, it comes out clearly.

You can easily test the quality of a name. If it needs a description for people to understand it, then it is a poor name. If it exactly describes itself, then it is not.

There are many bad things to use as names or to embed into them:

Jargon (if there is a better, simpler choice)
Made-up acronyms
Redundant terms, stuttering or stuttering within a larger context
Meaningless junk/filler words

Naming takes time. If you are in a rush, the names will reflect that. If you are misfocused on some other aspect of the code, the names will also reflect that.

If you make bad low-level changes that create or don’t fix naming issues, any code on top is also wrong. You can’t just throw extra data into any old column in a database, for example, you need to make sure the schema is accurate. A few bad names won’t wreck an entire project, but a lot of them will inject enough instability into everything else that it will cause unnecessary grief.

Your first attempt at naming stuff is probably wrong. That’s okay, and that’s why it is so important to revisit the code, files, configs, etc. over and over again throughout the life of the project. This lets you head off this easily avoidable technical debt before it spirals out of control. Code isn’t just written and then forgotten. It evolves and grows with the project, it needs constant tending.

Having a bunch of redundant little lumps of code does not avoid this problem, in fact, it usually makes it far worse as it piles up and wastes a lot of time and energy.

If you have to tackle a sophisticated problem, you’re going to need some really well-thought-out, sophisticated code to accomplish that. There are no easy ways around it; trying to find them just makes it all worse. But it all starts with a bunch of names. If you don’t even know what to call it, you’re trying to code way too early.

The Programmer's Paradox

Sunday, August 15, 2021

Self-describing Names

No comments:

Post a Comment