Recently, I was reading some AI-generated code.
At a high level, it looked like what I would expect for the work it was trying to do, but once I dug into the details, it was kind of bizarre.
It was the kind of code one might expect from a struggling novice. Its intent was muddled. Its author was clearly confused.
It does make a useful starting point, though, for a discussion of readability.
Really good code just shows you what it is going to do. It makes it obvious. You don’t need to think too hard or work through the specifics. The code says it is going to do X, and the code does exactly that, as you would expect. Straightforward, and totally boring, as all good code should be.
In thinking about that, it all comes down to intentions. “What did the programmer intend the code to do?” Is it some application code that moves data from the database and back to the GUI again? Does it calculate some domain metric? Is it trying to span some computation over a large number of resources?
To make it readable, the programmer has to make their intent clear.
The most obvious thing is that if there is a function called GetDataFromFile, which actually gets data from the database, you know the stated intent is wrong, obscured, or messed up. Shouldn’t the data come from a file? Why is it going to a database underneath? Did they set it up one way and duct-tape it later, without bothering to properly update the name?
If the code is lying about what it intends to do, it is not readable. That’s an easy point, in that if you have to expend some cognitive effort to remember all of the places where the code is lying to you, that is just unnecessary friction getting in your way. Get enough misdirection in the code, and it is totally useless, even if it compiles. Classic spaghetti.
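As a minimal sketch of the problem, with hypothetical names and a hypothetical records table: the first function’s name lies about where the data comes from, while the second says what it actually does.

```python
import sqlite3

# The lie: callers expect file I/O, but the data actually
# comes from a database underneath.
def get_data_from_file(path):
    conn = sqlite3.connect("app.db")  # 'path' is silently ignored
    rows = conn.execute("SELECT payload FROM records").fetchall()
    conn.close()
    return rows

# The honest version: the name matches the intent.
def get_records_from_database(db_path):
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT payload FROM records").fetchall()
    conn.close()
    return rows
```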
Programming is construction, but it is also, unfortunately, a performance art. It isn’t enough to just get the code in place; you also have to keep it going, release after release.
Intent also makes single-responsibility issues clear.
If the intent of a single function is to “get data from the db”, “... AND to reformat it”, “... AND to check it for bad data”, “... AND to ....” then it’s clear that the function is not doing just one thing, it is doing a bunch of them.
You’d need a higher function that calls all of the steps: “get”, “reformat”, “validate”, etc., as functions. Its name would indicate that it is both grabbing the data and applying some cleanup and/or verification to it. Raw work and derived work are very different beasts.
Programmers hate layering these days, but mashing a bunch of different things into one giant function not only increases the likelihood of bugs, but also makes it hard for anyone else to understand what is happening. Nobody ever means to write DoHalfOfTheWorkInSomeBizarreInterlacedOrder, but that should really be a far more common function name in a lot of codebases out there. The coder’s intent was to avoid splitting out functions so they wouldn’t have to name them. What the code itself was doing was forgotten.
If you decompose the code nicely into decent bite-sized chunks and give each chunk a rational, descriptive name, it is pretty easy to follow what the code is trying to do. If it is layered, then you only need to descend into the depths if there is an underlying bug or problem. You can quickly assert that GenerateProgressReport does the 5 steps that you’d expect, and move on. That is readable, easy to understand, and you probably don’t need to go any deeper. You now know what those 5 steps are. You need that capability in huge systems; there is often more code there than you can read in a lifetime. If you always have to see it with all of the high and low steps intertwined together, it cripples your ability to write complex or correct code.
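A minimal sketch of that layering, with hypothetical step names and stubbed data; there are four steps here rather than five, but the shape is the same. The top-level function reads as its steps, and you only descend into one if something is wrong underneath.

```python
def fetch_progress_rows():
    # Raw work: grab the data (stubbed with fixed rows here).
    return [{"task": "load", "done": 80}, {"task": "index", "done": 35}]

def validate_rows(rows):
    # Derived work: drop anything malformed.
    return [r for r in rows if 0 <= r["done"] <= 100]

def reformat_rows(rows):
    return [(r["task"], f'{r["done"]}%') for r in rows]

def render_report(pairs):
    return "\n".join(f"{task}: {pct}" for task, pct in pairs)

def generate_progress_report():
    # The name promises a report; the body reads as the steps.
    rows = fetch_progress_rows()
    rows = validate_rows(rows)
    pairs = reformat_rows(rows)
    return render_report(pairs)

print(generate_progress_report())
```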
In OO languages, you can make it even nicer: progress.report.generate() is nearly self-documenting. The nouns are stacked in the way you’d expect, even though “progress” is really a nounified verb. If the system were running overnight batches, and the users occasionally wanted to check in on how it was going, that is where you’d expect to find the steps involved. So, if there was a glitch in the progress report, you pretty much know where it has to be located.
A long, long time ago, in the Precambrian days of OO, I remember watching one extremely gifted coder in action. He was working with extremely complicated graphic visualizations. He had structured his code so well that, as he’d see a problem on the screen while testing, he pretty much knew exactly which line was wrong. That kind of code is super readable, and the readability has the extra property of making it highly debuggable as well. The bug nicely tells you exactly where in the code you have a problem. That is a very desirable property.
His intent was to render this complicated stuff; his code was written in a way that made it easy to know whether it was doing the right thing or not. This let him quickly move forward with his work. If his code had been badly named spaghetti, it would have taken several lifetimes to knock out the bugs. Anybody who does not think those readability and debuggability properties are necessary doesn’t realize how much more work they’ve turned it into, how much time they are wasting.
If the intent of the code is obscured or muddled, it limits the value of the code. That’s why we have comments: for adding in extra commentary that the code itself cannot express. Run-once code, even if it works, is too expensive in time to ever pay for itself. You don’t want to keep writing nearly similar pieces of code each time if you can just solve it once in a slightly generalized fashion and then move on to other, larger pieces of work.
It takes a bit of skill to let intent shine through properly. It isn’t something intuitive. The more code from other people you read, the more you learn which mistakes hurt the readability. It’s not always obvious, and it certainly changes a bit as you get more and more experience.
You might, for example, put in a clear idiom that you have often seen in the past, but it could confuse less experienced readers. That implies that you have to stick to the idioms and conventions that match the type of code you are writing. Simple application idioms for the application’s code, and more intricate systems programming idioms for the complex or low-level stuff. If you shove some obscure functional programming idiom into some basic application code, it will be hard for other people to read. There is a more ‘application coding’ way of expressing the code.
It is a lot like writing. You have to know who you are coding for and consider your audience. That gives you the most readable code for the work you are doing. It is necessary these days because most reasonably sized coding projects are a team sport now.
It’s worth noting that deliberately hiding your intent and the functionality of the code in an effort to get ‘job security’ is generally unethical. If it were some code that you and a small group of people would absolutely maintain for its entire lifetime, it might make sense, but that is never actually the case. More often, we write stuff, then move on to writing other stuff.
Friday, October 31, 2025
The Structure of Data
A single point of data -- one ‘variable’ -- isn’t really valuable.
If you have the same variable vibrating over time, then it might give you an indication of its future behavior. We like to call these ‘timeseries’.
You can clump together a bunch of similar variables into a ‘composite variable’. You can mix and match the types; it makes a nexus point within a bunch of arbitrary dimensions.
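A minimal sketch of those first few shapes, with made-up values:

```python
from dataclasses import dataclass

# A single point of data: one variable, not much value on its own.
temperature = 21.7

# The same variable over time: a timeseries of (timestamp, value) pairs.
temperature_series = [(1699000000, 21.7), (1699000060, 21.9), (1699000120, 22.1)]

# A composite variable: mixed types clumped into one nexus point.
@dataclass
class Reading:
    sensor_id: str
    celsius: float
    humidity_pct: int
```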
If you have a group of different types of variables, such as some identifying traits about a specific person, then you can zero in on uniquely identifying one instance of that group and track it over time. You have a ‘key’ to follow it; you know where it has been. You can have multiple different types of keys for the same thing, so long as they are not ambiguous.
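As a sketch, with hypothetical fields: two different kinds of key, both unambiguous, following the same instance.

```python
person = {"ssn": "123-45-6789", "email": "pat@example.com", "name": "Pat"}

# Two different types of keys for the same thing.
by_ssn = {person["ssn"]: person}
by_email = {person["email"]: person}

# Either key tracks the same underlying instance over time.
assert by_ssn["123-45-6789"] is by_email["pat@example.com"]
```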
You might want to build up a ‘set’ of different things that you are following. There is no real way to order them, but you’d like them to stay together, always. The more data you can bring together, the more value you have collected.
If you can differentiate on at least one given dimension, you can keep them all in an ordered ‘list’. Then you can pick out the first or last ones ahead of the others.
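A small sketch of the difference, with hypothetical tickets: the set just keeps them together; the list orders them on one dimension (priority), so first and last become meaningful.

```python
# A set: things stay together, but with no inherent order.
open_tickets = {"T-101", "T-204", "T-307"}

# A list ordered on one dimension: now first and last mean something.
by_priority = sorted([("T-204", 1), ("T-101", 2), ("T-307", 5)],
                     key=lambda t: t[1])
most_urgent = by_priority[0]    # ("T-204", 1)
least_urgent = by_priority[-1]  # ("T-307", 5)
```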
Sometimes things pile up, with one thing above a few others. A few layers of that and we get a ‘tree’. It tends to be how we arrange ourselves socially, but it also works for breaking down categories into subcategories or combining them back up again.
Once in a while, it is a messy tree. The underlying subcategories don’t fit uniquely in one place. That is a ‘directed acyclic graph’ (DAG), which also lends itself to some optimizing forms of memoization.
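A small sketch of the two shapes, with made-up categories: in the tree every node has exactly one parent; in the DAG, “laptop” sits under two parents, and that shared substructure is where memoization pays off.

```python
from functools import lru_cache

# A tree: each subcategory fits uniquely in one place.
tree = {"goods": ["food", "tools"], "food": ["fruit"],
        "tools": [], "fruit": []}

# A DAG: "laptop" sits under both parents, so it is no longer a tree.
dag = {"catalog": ["electronics", "office"],
       "electronics": ["laptop"], "office": ["laptop"], "laptop": []}

# Memoization: each node's size is computed once, no matter
# how many parents reach it.
@lru_cache(maxsize=None)
def size(node):
    return 1 + sum(size(child) for child in dag[node])

print(size("catalog"))  # size("laptop") is only computed once
```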
When there is no hierarchical order to the whole thing, it is just a ‘graph’. It’s a great way to collect things, but the flexibility means it can be dangerously expensive sometimes.
You can impose some flow, making the binary edges into directional ones. It’s a form of embedding traits into the structure itself.
But the limits of a single-dimensional edge may be too imposing, so you could allow edges that connect more than one entry, which is called a ‘hypergraph’. These are rare, but very powerful.
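A compact sketch of the three, using adjacency sets and hypothetical node names:

```python
# An undirected graph: flexible, but traversals can get expensive.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

# Impose a flow: directed edges embed a trait (direction) in the structure.
directed = {"a": {"b", "c"}, "b": {"c"}, "c": set()}

# A hypergraph: a single edge can connect more than two entries.
hyperedges = [frozenset({"a", "b", "c"}), frozenset({"b", "d"})]
```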
We sometimes use the term ‘entity’ to refer to our main composite variables. They relate to each other within the confines of these other structures, although we look at them slightly differently in terms of, say, 1-to-N relationships, where both sides are effectively wrapped in sets or lists. It forms some expressive composite structures.
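A sketch of a 1-to-N relationship between two hypothetical entities, with the N side wrapped in a list:

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: str
    total: float

@dataclass
class Customer:
    customer_id: str                                  # the key that follows the entity
    orders: list[Order] = field(default_factory=list)  # the N side of 1-to-N

pat = Customer("C-1")
pat.orders.append(Order("O-17", 99.50))
```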
You can loosely or tightly structure data as you collect it. Loose works if you are unsure about what you are collecting; it is flexible, but costly. Tight tends to be considerably more defensive, with fewer bugs and better error handling.
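A minimal sketch of the trade-off, with a hypothetical record: the loose version accepts anything, while the tight version defends itself at the boundary.

```python
from dataclasses import dataclass

# Loose: flexible, but nothing stops bad data from getting in.
loose_record = {"name": "Pat", "age": "forty-two"}

# Tight: defensive; errors surface at construction time.
@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if not isinstance(self.age, int) or not (0 <= self.age <= 150):
            raise ValueError(f"implausible age: {self.age!r}")
```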
It’s important not to collect garbage; it has no inherent value, and it causes painful ‘noise’ that makes it harder to understand the real data.
The first thing to do when writing any code is to figure out all of the entities needed and to make sure their structures are well understood. Know your data, or suffer greatly from the code spiraling out of control. Structure tends to get frozen far too quickly; just trying to duct tape over mistakes leads to massive friction and considerable wasted effort. If you misunderstood the structure, admit it and fix the structure first, then the code on top.
Monday, October 27, 2025
Fishbowl
I’ve often felt that software development projects were representative fishbowls for the rest of reality.
They have the same toxic mix that we see everywhere else, just on a smaller scale. It’s technology mixed with people mixed with business mixed with time.
Because of its turbulent history, technology is an ever-growing mess. It kinda works, but it's ugly and prone to glitches. We’ve spent decades desperately trying to avoid applying any serious engineering to our efforts.
People all have their own personal agendas that they often prioritize over the success of the collective work.
Some executives would prefer a low-quality early release so they can quickly claim success and move on. Software developers often pick inappropriate technologies to check off boxes on their resumes. All sorts of other players poke their fingers into the pie, hoping to make their careers.
People do what they think is best for themselves, which can be negative overall.
Meanwhile, they are circled by sales and business sharks hoping for a quick buck. They’ll promise anything if it will get their foot in the door. Lots of money is at stake. The software industry is slimy.
As the clock is ticking, too often the domain and engineering aspects fall to the ground. People stop caring about building solid tech that solves the user’s problem; they are more focused on their own issues.
This chaos devolves into producing a mess. Stuff tossed together too quickly for all the wrong reasons. The codebase turns to mud. It becomes a time vortex, with people desperately circling the edge, trying not to get sucked further in.
What usually works in software development is to return to the real underlying priorities. Ignore the surrounding circus. Keep producing reasonable code that goes as far as it can to really solve deep problems. It needs to be neat and consistent. A clean workspace avoids a lot of friction. All of the little details need attention. Programming involves a lot of patience.
If the codebase is solid, all the other problems stay at bay. If the codebase collapses, it opens the floodgates and lets the game get way out of control.
In that sense, it is all about controlling and minimizing the complexities. Fighting all the time to keep the artificial complexities from spawning, while making sure the inherent ones are satisfied.
Mastering this requires both a lot of knowledge and a lot of experience. It is a tricky juggling act. Keeping a million little balls in motion and off the ground while people scream at you about time.
That’s why people promising simple answers to these types of complex situations are inevitably destructive. They pick a few balls and improve those while the rest fall to the ground with a resounding thud. It seems to work right up until it inevitably collapses.
You can eliminate as much of the artificial complexity as possible, but never any of the inherent complexity. It remains and cannot be ignored. In software, you either have a reasonable codebase or you have a mess. This seems to be true elsewhere in meatspace as well.
Thursday, October 16, 2025
Patience
The biggest difference between now and when I started programming 35 years ago is patience.
Many of the people who commission software development projects are really impatient now.
The shift started with the dot-com era. There was a lot of hype about being the first into any given market. So, lots of people felt that it was better to go in early with very low quality than to wait and produce something that was well refined.
That made sense then; a lot of those early markets were brand new, and many of the attempts to establish them were total failures. So, it doesn’t make a lot of sense to invest heavily in building a great piece of software if, in the end, nobody would want it anyway.
In the crater left behind, the industry shifted heavily to reactivity. Forget any sort of long-term planning or goals; just survive in the short term, throwing together whatever people say they want. That is a recipe to create a mess, but recreating that mess over and over again kept people busy.
Behind the scenes, we started sharing code a lot more. When I started coding, you had to write everything yourself. That took a long time, but if you were good, it also provided really great quality.
As more code became available, people would blindly throw in all sorts of stuff. It would bump up the functionality rapidly, but it also tended to bloat the code and leave a lot of dark corners in the codebase. They would wire up stuff that they barely understood, and it would seem to work for a while, only to end in tears.
Because of that, someone could toss together a quick demo that was really promising with a few neat features, without understanding that a real serious version of the same thing would require exponentially more effort. It started with websites, but quickly infected all software development. Fast-glued balls of mud became the de facto base for lots of systems, and they scale really poorly.
As the web dominated even more, with so many available components and documentation that never really matured, Q&A sites emerged. If you’re rushing through a piece of work, with impatient people screaming at you, you can jump online, grab some example code, and slap it in. It just amplified the problems.
Mobile phones compounded the effect. An endless stream of noise made it hard to think deeply about anything. But shallow knowledge is effectively low-quality knowledge. You might know how to combine a bunch of things together, but when it doesn’t work as expected, there is very little you can do about it, except try again.
There are all sorts of trends about scaling software, and people get sucked into believing that it should be easy, but the first major failure point is the ability of people to deal with a big, ugly, messy, poorly constructed codebase. You will never get any sort of effective or reasonable behavior out of a pile of stuff that you don’t understand. Scaling requires deep knowledge, but impatience prevents us from acquiring that.
So I find it frustrating now. People run around making great claims about their software, but most of it is ugly, bloated, and buggy. We’re an industry of prioritizing marketing over engineering.
My favorite jobs were decades ago, back in what was at least the golden age of programming for me. Long before the central requirement became “just whack it out, we’ll fix it later”. What you don’t understand is a bug; it just may not have manifested yet.
Thursday, October 9, 2025
Experimentation
There are two basic ways of writing software code: experimentation and visualization.
With experimentation, you add a bunch of lines of code, then run it to see if it worked. As it is rather unlikely to work the first time, you modify some of the code and rerun. You keep this up until a) you have all the code you need and b) it does what you expect it to do.
For visualization, you think about what the code needs to do first. Maybe you draw a few pictures, but really, the functionality of the code is in your head. You are “seeing” it in some way. Then, once you are sure that that is the code you need, you type it out line by line to be as close as you can to the way you imagined it. After you’ve fixed typos and syntactic problems, the code should behave in the way you intended.
Experimentation is where everyone starts when they learn programming. You just have to keep trying things and changing them until the code behaves in the way you want it to.
What’s important, though, is that when the code does not work as expected, which is common, you dig a little to figure out why it didn’t work. Learn from failure. But some people will just keep making semi-random changes to the code, hoping to stumble on a working version.
That isn’t so bad where there are only a small number of permutations; you end up visiting most of them. But for bigger functionality, there can be a massive number of permutations, and in some cases, it can be infinite. If you are not learning from each failure, it could take an awfully long time before you stumble upon the right changes. By avoiding learning something from each failure, you cap your abilities at fairly small pieces of code.
Instead, the best approach is to hypothesize about what will happen each time before you run the code. When the result differs, and it mostly will, you use that difference as a reason to dig into what’s underneath. Little by little, you will build up a stronger understanding of what each line of code does, what they do in combination, and how you can better leverage them. Randomly changing things and ignoring the failures wastes a lot of time and misses the necessity for you to learn stuff.
Visualization comes later, once you’ve started to build up a strong internal model of what’s happening underneath. You don’t have to write code to see what happens; instead, you can decide what you want to happen and then just make the code do that. This opens the door not only to writing bigger things, but also to writing far more sophisticated things. A step closer to mastering coding.
Experimentation is still a necessity, though. Bad documentation, weak technologies, weird behaviours; modern software is a mess and getting a little worse each year. As long as we keep rushing through the construction, we’ll never get a strong, stable foundation. We’re so often building on quicksand these days.