When I first started coding, functions had come out of the procedural paradigm. I guess, long ago, maybe in assembler, a program was just one giant list of instructions. That would be a little crippling, so one of the early remedies was to break it up into smaller functions. An added benefit was that you could reuse them.
By the time I started coding, the better practice was to break up the code along lines of similarity: code that is similar gets clumped together.
As the data-structure and object-oriented paradigms started taking hold, the practices switched to being targeted at the structures themselves. For instance, you’d write a lot of little ‘atomic’ primitive functions, one for each action you did against the structure, like create, add, or traverse. Indirectly, that gave rise to the notion of a function having just a single responsibility.
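A minimal sketch of what those ‘atomic’ primitives looked like, using a hypothetical singly linked list (the names `create`, `add`, and `traverse` are my own illustration, echoing the actions above):

```python
# Hypothetical example: one small primitive function per action
# against a structure, each with a single responsibility.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def create():
    """Create an empty list (represented by its head, initially None)."""
    return None

def add(head, value):
    """Prepend a value; returns the new head."""
    node = Node(value)
    node.next = head
    return node

def traverse(head):
    """Yield each value from the head onward."""
    while head is not None:
        yield head.value
        head = head.next
```

Each primitive is tiny on its own, but everything the rest of the program does to the structure goes through them.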
In data structures, you might end up coding up a whole bunch of structures, then stacking them one on top of the other, mostly trying to get to one rather giant data structure for the whole program. That was excellent in building up sophistication from reusable parts, but a lot of programmers just saw it as excessive layering, not one big interactive structure. People kept wanting to decompose, without ever reassembling.
Object-oriented programming followed suit, but seemed to lose that notion of building up application objects; there were often dozens of objects at the top instead of one. It also renamed functions to ‘methods’, but I’ll skip that. It was initially a very successful paradigm change, but later people started objecting to the feeling of layering, and to the idea that the entry points were somehow a ‘god’ object.
The very early Smalltalk object-oriented code had lots of functions, many of which were just one-liners. My first encounter surprised me: there were so many functions...
I guess, as a later reaction against those earlier styles, common practice moved back towards procedural: almost no layers and huge, giant functions. This reopened the door for the brute-force practices that those earlier paradigms had pushed away.
I’ve always known that huge functions are a nightmare. With too much stuff all tangled together, they are unreadable and hard to follow. But the earlier attempts to limit function size were too restrictive and too fragile; you can’t just say that all functions must be less than 10 lines of code, for example. The attempts to categorize a function as a single responsibility were pretty good, but because of the intense dislike of layers, they didn’t get across the notion that it should be one thing at just one level. So, it would be coded as one thing, but with all of the raw instructions below that, as far down as the coder could go, all intertwined together. If you have messy low-level stuff to do, it should be hidden below in more functions, effectively a layer, though not really one. For example, string fiddling in the middle of business logic is distracting and quickly kills readability.
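As a hypothetical sketch of that point, the string fiddling gets pushed down into its own small function so the business-level code stays at one level (the account-matching scenario and all names here are invented for illustration):

```python
# Hypothetical sketch: keep low-level string fiddling out of the
# business logic by hiding it one function below.

def normalize_account_id(raw: str) -> str:
    """Low-level detail: strip whitespace, drop dashes, upper-case."""
    return raw.strip().replace("-", "").upper()

def is_same_account(a: str, b: str) -> bool:
    """Business-level question, readable at a single level."""
    return normalize_account_id(a) == normalize_account_id(b)
```

The caller of `is_same_account` reads the business question at a glance; the character-level mess is findable but out of the way.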
Layers, as they first came out, were really architectural lines, not stacked data structures. For example, you cut a hard line between code that messes with persistence and code that computes derived objects, so you don’t mix and match. The derived stuff then sits in a layer above the persistence.
The point of those types of lines is to make the code super easy to debug later. If you know it is a derived calculation problem, you can skip the other code.
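A minimal sketch of such a line, assuming a toy in-memory store (the `Storage` and `Derivations` classes are hypothetical stand-ins for real persistence and calculation code):

```python
# Hypothetical sketch of a hard architectural line: Storage is the only
# code that touches raw records; Derivations only computes from records
# it is given. A wrong derived number can't be a Storage bug.

class Storage:
    """Persistence layer: reads and writes raw records, nothing else."""
    def __init__(self):
        self._records = {}

    def save(self, key, record):
        self._records[key] = record

    def load(self, key):
        return self._records[key]

class Derivations:
    """Derivation layer: pure computation over loaded records."""
    def __init__(self, storage):
        self._storage = storage

    def total(self, key):
        record = self._storage.load(key)
        return sum(record["amounts"])
```

If a total comes out wrong, you start in `Derivations`; if a record goes missing, you start in `Storage`. The line tells you where to look.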
Overall, I’d say that there can never be too many functions. Each one is a chance to attach a self-describing name to a chunk of code. Think of each name as a concise, language-embedded comment. If you were tight about coding with some of the older paradigms, then the data structures or even objects are pretty much all named with nouns, and the functions are all verbs. Using that, you can implement the code with the same vocabulary you might use to describe it to a friend or another programmer. The closer the implementation is to descriptive paragraphs of what it does, the more you can verify its behaviour on sight. That doesn’t absolve you of testing, or, for some code, of creating a real formal proof of correctness, but it does cut down a lot of the later work of catching bugs.
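A hypothetical sketch of that noun/verb vocabulary, with invented names, where the top-level function reads close to how you’d say it aloud ("archive each invoice that is overdue"):

```python
# Hypothetical sketch: nouns for the data, verbs for the functions,
# so the code shares its vocabulary with a spoken description.

from dataclasses import dataclass

@dataclass
class Invoice:                               # a noun
    id: str
    days_outstanding: int
    archived: bool = False

def is_overdue(invoice: Invoice) -> bool:    # a verb phrase
    return invoice.days_outstanding > 30

def archive(invoice: Invoice) -> None:       # a verb
    invoice.archived = True

def archive_overdue_invoices(invoices):      # reads like the sentence
    for invoice in invoices:
        if is_overdue(invoice):
            archive(invoice)
```

You can verify `archive_overdue_invoices` on sight because it says what it does in the same words you would use.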
If you also put a hard split between the domain logic and the technical necessities, you can usually just jump right to the incorrect block of code. When someone describes the bug, you know immediately where it is located. Since diagnostics and debugging eat way more time than coding, any sort of practice to reduce friction for them will really help with scheduling and reducing stress. Code that you can fix effortlessly is worth far more than code that you can write quickly.
I think we should return to that notion of stacking up data structures and objects in order to build up sophistication. The best code I’ve seen does this, and it has a crazy long shelf life. Its strength is that it encapsulates really well and makes reuse easy. It is also quite defensive, and it helps to zero in quickly on bugs. Realistically, it isn’t layering; it is a form of stacking, and the two should not be confused with each other. A layer is a line in the architecture that you should have; stacking is just depth in the call chain. If the stacking is really encapsulated, programmers don’t have to go down a rabbit hole to understand what is happening at the higher levels. Entangling it all together is worse.
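A minimal sketch of stacking in that sense, using the classic two-stack queue as a hypothetical example: a `Queue` built entirely out of reusable `Stack` parts, with the depth fully encapsulated rather than being an architectural layer:

```python
# Hypothetical sketch of stacking: a Queue built on top of two Stacks.
# Callers of Queue never see the Stacks; the depth in the call chain is
# encapsulated, which is not the same thing as an architectural layer.

class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def empty(self):
        return not self._items

class Queue:
    """Built from two Stacks: one for arrivals, one reversed for departures."""
    def __init__(self):
        self._inbox = Stack()
        self._outbox = Stack()

    def enqueue(self, item):
        self._inbox.push(item)

    def dequeue(self):
        # Refill the outbox (reversing order) only when it runs dry.
        if self._outbox.empty():
            while not self._inbox.empty():
                self._outbox.push(self._inbox.pop())
        return self._outbox.pop()
```

Someone reading the `Queue` level never needs to descend into `Stack` to follow it, which is exactly the rabbit-hole-avoidance the stacking buys you.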
You can always get a sense of the code quality by quickly looking at the functions. Big, bloated functions with ambiguous, convoluted, or vague names are just nurseries for bugs. If you can skim the code and mostly know what it should do, then it is readable. If you have to pick over it line by line, it is a cognitive nightmare. If the function says DoX, and the code looks like it might actually do X, then it is pretty good.