There is the real world. The physical one has been around for as long as humans can remember.
Then there is the digital world, an artificially constructed realm built on top of millions, possibly billions or even trillions, of interconnected computers.
Hardware always forms the sub-structure. The foundation. It is what binds the digital realm to reality.
What’s above that is just data and code. Nothing else.
Anything else you can imagine in that realm is either data, code, or a combination of the two.
Data is static. It just exists as it is. You can really only change it by writing some other data on top of it, wiping the original copy out of existence.
Code is active. It is a list of instructions, often crazy long, sometimes broken up into countless pieces spread across all sorts of places.
Code ‘runs’. Something marches through it, effectively instruction by instruction, executing it in a more or less deterministic fashion.
Code is data long before it is code. That is because it is a ‘list’ of instructions; when it is not running it is just a list of things. It is data when inactive.
Data can effectively be code. You can declare a whack load of data that is interpreted as ‘high-level’ code to trigger very broad instruction sets.
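To make that concrete, here is a tiny sketch in Python (with made-up instruction names, not anything from a real system): a plain list of records that a short loop interprets as if it were a little high-level program. The list is pure data until something marches through it.

```python
# A 'program' that is nothing but data: a list of records.
program = [
    {"op": "set", "name": "x", "value": 10},
    {"op": "add", "name": "x", "value": 5},
    {"op": "print", "name": "x"},
]

def run(program):
    """Walk the data, treating each record as an instruction."""
    variables = {}
    for step in program:
        if step["op"] == "set":
            variables[step["name"]] = step["value"]
        elif step["op"] == "add":
            variables[step["name"]] += step["value"]
        elif step["op"] == "print":
            print(step["name"], "=", variables[step["name"]])
        else:
            raise ValueError(f"unknown op: {step['op']}")

run(program)  # prints: x = 15
```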
Data is not just bits and bytes. It is not just single pieces of information encoded in some fashion. Most data only has value if it is used in conjunction with related data. Those groups have structure, whether it is a collection of individual data points or a list of stuff. There are higher-level structural relationships too, like DAGs, trees, graphs, and hypergraphs. Mostly, but not always, the individual parts and their various structures have some names associated with them. Metadata, really. Information about how all the individual points relate back to each other. Data about the structure of the underlying data.
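As a small, hypothetical illustration of that idea: a handful of related data points grouped into a structure, plus separate metadata describing what each point means and how they fit together.

```python
# Data: a group of related points that only make sense together.
reading = {
    "sensor_id": "A-17",
    "temperature": 21.4,
    "humidity": 0.55,
}

# Metadata: data about the structure and meaning of the data above.
reading_schema = {
    "sensor_id":   {"type": "str",   "meaning": "which device produced the reading"},
    "temperature": {"type": "float", "meaning": "degrees Celsius"},
    "humidity":    {"type": "float", "meaning": "relative humidity, 0..1"},
}

# The raw values are nearly useless without their names and the schema.
for name, value in reading.items():
    print(f"{name} = {value}  ({reading_schema[name]['meaning']})")
```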
In its simplest sense, data corresponds to the way we use nouns in language, and code corresponds to verbs. We blur the lines for some sophisticated usage, but most forms of programming tend toward keeping them separate and distinct.
We know we need to secure data. It is the heart and soul of the information we are collecting with our computers. That information could be used by malicious people for bad ends. But we also need to secure code. Not just when it is data, but also as it executes. As they are distinct, one means of securing them will never cover both; they are, in effect, two different dimensions. Thus, we need two different and distinct security models, each of which covers its underlying resource. They won’t look similar; they cannot be blended into one.
Software is a static list of instructions, which we are constantly changing.
Friday, February 28, 2025
Saturday, February 22, 2025
Mastery
Oddly, mastering programming is not about being able to spew out massive sets of instructions quickly. It’s about managing cognitive load. Let me explain.
Essentially, we have a finite amount of cognitive horsepower, most of which we tend to spend on our professions, but life can be quite demanding too.
So that’s the hard limit to our ability to work.
If you burn through that memorizing a whack load of convoluted tidbits while coding, it will consume most of your energy. I often refer to this as ‘friction’.
So, to make the best possible progress on your work, you need to reduce as much of that friction as you can. You don’t want to spend your time thinking about ‘little’ things. Instead, you want to use it diving deeply into the substantial problems. The big ones that will keep you from moving forward directly.
In that sense, less is more. Far more.
For example, I’ve known for a long time that a million little bits of functionality are not all that helpful. They don’t add up to a coherent solution.
It’s better to generalize it a little, then pack it all together and encapsulate it into larger ‘lego’ bricks. Now you have fewer things that you can call, but they do a far wider range of tasks. You still need to supply some type of configuration knowledge to them, but that too can be nicely packaged.
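As a rough sketch of what that packaging can look like (the names and tables here are invented for illustration): many nearly identical little loaders collapse into one slightly generalized ‘brick’, and the configuration knowledge gets packaged alongside it.

```python
# Before: a pile of nearly identical little functions to remember.
# def load_users_csv(path): ...
# def load_orders_csv(path): ...
# def load_products_csv(path): ...

import csv

def load_table(path, required_columns, delimiter=","):
    """One generalized 'brick': load any CSV table and check its shape."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter=delimiter))
    for column in required_columns:
        if rows and column not in rows[0]:
            raise ValueError(f"missing column: {column}")
    return rows

# The 'configuration knowledge' is packaged too, so callers stay simple.
TABLES = {
    "users": {"path": "users.csv", "required_columns": ["id", "name"]},
    "orders": {"path": "orders.csv", "required_columns": ["id", "user_id", "total"]},
}

def load(name):
    return load_table(**TABLES[name])
```

Callers now remember one name, load(), and the details stay packaged behind it.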
This fits with what I said initially about cognitive load. You don’t have to remember a million little functions. You don’t have to keep rewriting them. Once you have solved a problem, big or small, you package it up and lean on that solution later. So now you can forget about it. It’s done, it's easily findable. You get a bunch of those at a low level, you can build higher level ones on top.
Since you rarely need to work at that previous level now, you have far less to think about. It is a solved problem. As you go, you get higher and higher, the friction is less, and the work you are doing is far more sophisticated.
It was known long ago that good projects get easier to work on with time, bad ones do not. Easier because there is less friction and you can do larger things quickly with better accuracy. You’ve got this codebase that solves the little problems, so you can now work on bigger ones. That’s why we have system libraries for languages, for instance.
If you’ve got some good pieces and someone asks for something complicated, it is not that hard to extend them. But it is really hard to go right back to ground zero and do it all over again. Your brain has to cope with lots of levels now; it will get overwhelmed. But if you can skip over all of the things already solved, then you can focus on the good stuff. The new stuff. The really hard stuff.
In a similar way, disorganization and inconsistencies eat through massive amounts of cognitive capacity. It is inordinately harder to work in a messy environment than in a neat and tidy one. If all your tools are neatly lined up and ready when you need them, then jumping around is fluid. If you have to struggle through a mess just to find something, it bogs you down. Struggling through the mess is what you're doing, not solving the problems you need to solve.
So you learn that the dev shop and its environment need to be kept as clean as time allows. And tidying up your working environment is almost always worth the time. Not because you are a neat freak, but because of how the friction will tire you out.
If you manage your cognitive load really well then you don’t have to spend it on friction. You can spend it on valuable things like understanding how the tech below you really works. Or what solutions will really help people with their problems.
The less time you spend on things like bizarre names, strange files, and cryptic configurations, the more you have to spend on these deeper things, which helps you see straighter and more accurate paths to better solutions. In that sense, the ‘works on my machine’ excuse from someone who exhausted themselves drowning in a mess is really just a symptom of losing control over the tornado of complexity that surrounds them.
Thursday, February 13, 2025
Control
I’ve often written about the importance of reusing code, but I fear that our industry’s notion of it has drifted far away from what I mean.
As far as time goes, the worst thing you can do as a programmer is write very similar code, over and over and over again. We’ve always referred to that as ‘brute force’. You sit at the keyboard and pound out very specific code with slight modifications. It’s a waste of time.
We don’t want to do that because it is an extreme work multiplier. If you have a bunch of similar problems, it saves orders of magnitude of time to just write it once a little generally, then leverage it for everything else.
But somehow the modern version of that notion is that instead of writing any significant code, you just pile up as many libraries, frameworks, and products as you can. The idea is that you don’t write stuff, you just glue it together for construction speed. The corollary is that stuff written by other people is better than the stuff you’ll write.
The flaw in that approach is ‘control’. If you don’t control the code, then when there is a problem with that code, your life will become a nightmare. Your ‘dependencies’ may be buggy. Those bugs will always trigger at the moment you don’t have time to deal with them. With no control, there is little you can do about some low-level bug except find a bad patch for it. If you get enough bad patches, the whole thing is unstable, and will eventually collapse.
You get caught in a bad cycle of wasting all of your time on things you can’t do anything about, so you don’t have the time anymore to break out of the cycle. It just sucks you down and down and down.
The other problem is that the dependencies may go rogue. You picked them for a subset of what they do, but their developers might really want to do something else. They drift away from you, so your glue gets uglier and uglier. Once that starts, it never gets better.
In software, the ‘things’ you don’t control will always come back to haunt you. Which is why we want to control as much as possible.
So, reusing your own stuff is great, but reusing other people’s stuff has severe inherent risks.
The best way to deal with this is to write your own version of whatever you can, given the time available. That is, throwing in a trivial library just because it exists is bad. You can look at how they implemented it, and then do your own version which is better and fits properly into your codebase. In that sense, it's nice that these libraries exist, but it is far safer to use them as examples for learning than to wire them up into your code.
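For instance, as a hedged example of my own (not something from the post): a tiny ‘slugify’ helper is exactly the kind of trivial dependency you can read once for the idea and then write yourself, shaped to your codebase.

```python
import re
import unicodedata

def slugify(text):
    """Turn 'Hello, World!' into 'hello-world'.

    Small enough to own outright; no dependency needed.
    """
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-zA-Z0-9]+", "-", text).strip("-")
    return text.lower()

assert slugify("Hello, World!") == "hello-world"
```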
There are some underlying components, however, that are super hard to get correct. Pretty much anything that deals with persistence falls into this category, as it requires a great deal of knowledge about transactional integrity to make the mechanics fault-tolerant. If you do it wrong, you get random bugs popping up all over the place. You can’t fix a super rare bug simply because you cannot replicate it, so you’d never have any certainty that your code changes did what you needed them to do. Where there is one heisenbug, there are usually lots more lurking about.
You could learn all about low-level systems programming, fault tolerance, and such, but you probably don’t have the decade available to do that right now, so you really do want to use someone else’s code for this. You want to leverage their deep knowledge and get something nearly state-of-the-art.
But that is where things get complicated again. People seem to think that ‘newer’ is always better. Coding seems to come in waves, so sometimes the newer technologies are real actual improvements on the older stuff. The authors understood the state of the art and improved upon it. But only sometimes.
Sometimes the authors ignore what is out there, have no idea what the state of the art really is, and just go all the way back to first principles to make every old mistake again. And again. There might be some slight terminology differences that seem more modern, but the underlying work is crude and will take decades to mature if it does. You really don't want to be building on anything like that. It is unstable and everything you put on top will be unstable too. Bad technology never gets better.
So, sometimes you need to add stuff you can’t control, and that is inherently hazardous.
If you pick something trendy that is also flaky, you’ll just suffer a lot of unnecessary problems. You need to pick the last good thing, not the most recent one.
That is always a tough choice, but crucial to building stable stuff. As a consequence, though, it is important to recognize that sometimes the choice made was bad: you picked a dud. Admit it early, since it is usually cheaper to swap it for something else as early as possible.
Bad dependencies are time sinks. If you don’t control a dependency and can’t fix it when it breaks, then at the very least you need it to be trustworthy. Which means it is reliable and relatively straightforward to use. You never need a lot of features, and in most cases, you shouldn’t need a lot of configuration either. Just stuff that does exactly what it is supposed to do, all of the time. You want it to encapsulate all of the ugliness away from you, but you also want it to deal with that ugliness correctly, not just ignore it.
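One practical way to keep that control, sketched here with invented names: let a single thin wrapper you own be the only place that touches the dependency, so a dud can be admitted and swapped without rewriting every caller.

```python
# storage.py: the only file allowed to import the third-party client.
# The rest of the codebase talks to this interface, which we control.

class DocumentStore:
    """Thin wrapper around whichever storage dependency we currently trust."""

    def __init__(self, client):
        self._client = client  # e.g. an instance of some vendor's SDK

    def save(self, key, document):
        # Adapt our call to whatever the current dependency expects.
        self._client.put(key, document)

    def load(self, key):
        return self._client.get(key)

# If the dependency goes rogue, only this adapter changes;
# callers keep using save()/load() unchanged.
```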
If you are picking great stuff to build on, then you get more time to spend building your own stuff, and if you aren’t just retyping similar code over and over again, you can spend this time keeping your work organized and digging deeply into the problems you face. You are in control. That makes coding a whole lot more enjoyable than just rushing through splatting out endless frail code. After all, programming is about problem-solving, and we want to keep solving unique high-quality problems, not redundantly trivial and annoying ones. Your codebase should build on your knowledge and understanding. That is how you master the art.
Tuesday, February 4, 2025
Integrated Documentation
Long ago, we built some very complex software.
We had a separate markdown wiki to contain all of the necessary documentation.
Over time, the main repo survived, but that wiki didn’t. All of the documentation was disconnected and thus was lost.
When I returned to the project years later, it was still in active use; they needed it, but the missing documentation was causing chaos. They had shot themselves in the foot.
Since then, I have put the documentation inside the repo with the rest of the source code. Keeping track of one thing in a large organization is difficult enough, trying to keep two different things in sync is impossible.
By now, we should be moving closer to literate programming: https://en.wikipedia.org/wiki/Literate_programming
Code without documentation is just a ball of mud. Code with documentation is a solution that hopefully solves somebody’s problems. Any nontrivial lump of code is complicated enough that it needs extra information to make it usable.
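One small, concrete step in that direction is keeping the explanation in the same file as the code it describes. The sketch below uses Python's standard doctest module, so the documentation is also checked and cannot silently drift away from the code.

```python
def median(values):
    """Return the middle value of a sorted copy of `values`.

    The documentation lives with the code and doubles as a test:

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 2, 3])
    2.5
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if the docs and code disagree
```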
For repo hosting sites like GitHub and GitLab, if they offer some type of wiki for documentation, that wiki should be placed in the main project repo as a subdirectory. The markdown files are effectively source files. Any included files are effectively source files. They need to get versioned like everything else.
There has always been this misunderstanding that source files must be ‘text’ and that, for the most part, they are always ‘code’. That is incorrect. The files are the ‘source’ of the data, information, binaries, etc. It was common practice to put binary library files into the source repo, for example, when they had been delivered to the project from outside sources. Keeping everything together with a long and valid history is important. The only thing that should not be in a repo is secrets, as they should remain secret.
Otherwise, the repo should contain everything. If something has to be pulled from other sites, it should be pinned to very explicit versions; it should not be left to chance. If you go back to a historical version, it should be an accurate historical version, not a random mix of history.
A fully documented self-standing repo is a thing of beauty. A half-baked repo is not. We keep history to reduce friction and make our lives easier. It is a little bit of work, but worth it.