The Programmer's Paradox: Megafunctions

For a long time, I have been attempting to write something specific about how to decompose code into reasonable functions.

The industry takes this to be an entirely subjective issue.

Programming styles come in waves. Sometimes the newest generation of coders writes nicely structured code, but sometimes they start writing all sorts of crazy “megafunctions”.

Over the decades there has been an all too obvious link between megafunctions and bugs. Where you find one, you tend to find a lot of the other.

Still, it oscillates, and every so often you start to see one-function-to-rule-them-all code examples, and people arguing about how this is so much better. Once I was even rejected for a job because after the interview they said I had too many functions in my code, which was madness.

Functions should be small. But how small?

Within the context of software code, a ‘concern’ is a sequence of related instructions applied to a group of variables within the same contextual level.

If the code is low-level, the operations applied to those variables are all low-level. If the code is structural, then the operations applied are structural. If the code is domain-based, then the operations are all domain-based, and so on.

This is an organizational category. It starts with “some of these instructions are not like the others” and flows through “there is a place for everything, everything in its place, and not too many similar things in the same place.”

It is that last point that is key. The code is not organized if some of the instructions in a function should not be in that function because they are dealing with a completely different concern.

It is an objective definition, but somewhat subjective on the margins. You can have a wider or narrower view of concerns, but at some point, a different concern is a different concern. When that is ignored it is usually obvious.

If an operation needs to be applied to one or more variables at a different level than the context, it is a different concern. If an operation mixes different types of data, that is a different concern. If its scope is global then it is a bunch of concerns. If it is for optimization then it is a different concern.
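To make that concrete, here is a small hypothetical megafunction, sketched in Python. The function name, the input format, and the discount rule are all invented for illustration. The comments mark each point where the code crosses into a different concern.

```python
def report_total(raw_text):
    """Hypothetical megafunction: three concerns tangled into one body."""
    # Concern 1 (low-level): picking apart strings.
    orders = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(",")
        orders.append((name.strip(), float(value)))

    # Concern 2 (domain): an invented business rule, where any
    # order over 100 gets a 10% discount.
    total = 0.0
    for _name, amount in orders:
        if amount > 100:
            amount = amount * 0.9
        total += amount

    # Concern 3 (presentation): formatting for display.
    return f"Total: {total:.2f}"


if __name__ == "__main__":
    print(report_total("widget, 50\ngadget, 200\n"))
```

Even at this size, a change to the discount rule means reading past the parsing code, and the parsing cannot be reused or checked without running everything else.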

This categorization is important.

Limiting the number of lines of code in a function is not a reasonable way to control function scope. Rather, any given function should take care of one and only one concern, however large or small it is. If part of doing that involves a different concern, then the function should call out to another function to do that work. That is, each function is an atomic primitive operation that handles exactly one concern.

So then it is a one-to-one relationship. If the function says it does X, then the code in it does X and nothing else.
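A minimal sketch of that one-to-one relationship, in Python. Every name here, along with the input format and the discount rule, is hypothetical and chosen only for illustration. Each function does what its name says, and the top-level function only sequences the others.

```python
def parse_orders(raw_text):
    """Low-level concern: turn 'name, amount' lines into (name, amount) pairs."""
    orders = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(",")
        orders.append((name.strip(), float(value)))
    return orders


def discounted(amount):
    """Domain concern: invented rule, orders over 100 get 10% off."""
    return amount * 0.9 if amount > 100 else amount


def total_due(orders):
    """Domain concern: sum the orders after applying the discount rule."""
    return sum(discounted(amount) for _name, amount in orders)


def format_total(total):
    """Presentation concern: render the total for display."""
    return f"Total: {total:.2f}"


def report_total(raw_text):
    """Structural concern: sequence the steps, with no details of its own."""
    return format_total(total_due(parse_orders(raw_text)))


if __name__ == "__main__":
    print(report_total("widget, 50\ngadget, 200\n"))
```

There are more functions here than strictly necessary for so little work, but each one can be read, tested, and reused on its own, and the top-level function reads like a summary of the whole job.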

If you don’t know if two things are the same concern or different ones, then it is far better to err on the side of caution and treat them as different concerns. Fixing that later by joining them together is a lot easier than separating them.

This seems to be a hard concept to get across to some programmers. There are a number of reasons why they want megafunctions:

Some programmers just don’t know any better, or they don’t care.
Some programmers feel that not creating functions is an optimization.
Some programmers like to see everything all together at once.
Some programmers think the code is easier to step through in a debugger.

None of these is a good reason.

If you don’t care about what you code, it will show, both in readability and in quality. Your job will be stressful and difficult.

Functions execute quickly, so it is almost never reasonable to avoid using them just to save a tiny insignificant overhead.
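If in doubt, measure rather than guess. The sketch below uses Python's standard `timeit` module to compare an inlined loop against the same loop calling a tiny helper. The function names are hypothetical. The timings depend on the machine and the interpreter, so no expected output is shown. In an interpreted language like Python the call cost is visible in a micro-benchmark like this one, but it is typically dwarfed once the function does any real work.

```python
import timeit


def sum_doubled_inline(values):
    """Everything in one body, with no helper call."""
    total = 0
    for v in values:
        total += v * 2
    return total


def double(v):
    """Tiny helper, so the call overhead dominates its cost."""
    return v * 2


def sum_doubled_with_calls(values):
    """Same result, but one function call per element."""
    total = 0
    for v in values:
        total += double(v)
    return total


def measure(repeats=200):
    """Return (inline_seconds, with_calls_seconds) for `repeats` runs each."""
    values = list(range(1000))
    inline_s = timeit.timeit(lambda: sum_doubled_inline(values), number=repeats)
    calls_s = timeit.timeit(lambda: sum_doubled_with_calls(values), number=repeats)
    return inline_s, calls_s


if __name__ == "__main__":
    inline_s, calls_s = measure()
    print(f"inline:     {inline_s:.4f}s")
    print(f"with calls: {calls_s:.4f}s")
```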

Big functions may help in writing the code slightly faster, but they really hurt when you have to debug them. They only get worse with time.

Using a debugger is a vital way of figuring out what the code is doing, but it is far better if you just code the right thing in a clean and readable way and don’t have to step through it. It saves a lot of time.

Functions are a main feature of all programming languages. Learning to use them properly is a critical skill.

Keeping the scope of any function down to just a single concern has a big impact:

The code is readable
The code is debuggable
You can deal with ensuring that the concern itself is correct without getting lost in the details
The code is extendable
The code can be reused as many times as you need

Megafunctions are often an indication that the programmer does not have a lot of experience. However, sometimes it is just that the programmer got stuck in really bad habits. The more you code, the more you come to realize that keeping your code organized isn’t optional. It is key to getting high-quality work out to deployment quickly.

It’s like cooking. If your kitchen is nicely organized and clean, cooking is the main task. If your kitchen is dirty, disorganized, and a total mess, then trying to cook in that environment is extremely hard. The friction from your kitchen being a dumpster fire is preventing you from focusing on the cooking itself. Whether you like or hate cooking, you still want a clean kitchen. Coding is the same.

Thursday, May 16, 2024
