Thursday, July 2, 2026

Language Ideas

Decades ago, our project had a convention where we never directly used ANSI C primitive types, unless it was trivial in a loop. But for any and all other variables, everything had to be typedef’d explicitly. Everything.

Some people would see that as excessive and way over the top, but it actually turned out to be a very good habit that helped with getting really high quality.

If you went to call something, you’d need an X, and that would force you to go into the code to find out how to create one. If the code is all ints, strings, doubles, etc., then people skip that effort and just find some hacky way to kludge the value they need. But now you can’t.

I’ve often thought that I’d like to see a language that has zero primitive types for variables. None. If you need a variable type, you have to declare it.

But to add to the fun, there aren’t even open types like ‘int’. If you needed an integer, you’d create the type, but as you did, you’d have to explicitly specify the range.

For example:

type Counter: integer 0..INTMAX

It would be more fun for strings, as they are not open either:

type Token: string [a-zA-Z]*

Each would be constrained by an RE state machine.
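To get a feel for it in an existing language, here is a minimal Python sketch of those two declarations. The helpers `ranged_int` and `patterned_str` are hypothetical names invented for this illustration, and `INT_MAX` is assumed to be a 64-bit limit since the declaration above does not pin it down. It only approximates the idea: the constraint is checked when a value is created, not enforced by a compiler.

```python
import re

INT_MAX = 2**63 - 1  # assumption: the declaration above leaves INTMAX undefined


def ranged_int(name, low, high):
    """Build an int subclass that only accepts values in low..high."""
    class Ranged(int):
        def __new__(cls, value):
            value = int(value)
            if not low <= value <= high:
                raise ValueError(f"{name}: {value} is outside {low}..{high}")
            return super().__new__(cls, value)

    Ranged.__name__ = name
    return Ranged


def patterned_str(name, pattern):
    """Build a str subclass whose whole value must match the pattern."""
    compiled = re.compile(pattern)

    class Patterned(str):
        def __new__(cls, value):
            if compiled.fullmatch(value) is None:
                raise ValueError(f"{name}: {value!r} does not match {pattern}")
            return super().__new__(cls, value)

    Patterned.__name__ = name
    return Patterned


# type Counter: integer 0..INTMAX
Counter = ranged_int("Counter", 0, INT_MAX)

# type Token: string [a-zA-Z]*
Token = patterned_str("Token", r"[a-zA-Z]*")
```

`Counter(5)` and `Token("abc")` succeed, while `Counter(-1)` and `Token("abc1")` raise `ValueError`. Arithmetic on a `Counter` hands back a plain `int`, so this sketch only guards construction; a language built around the idea would have to carry the range through every operation.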

What I think would happen, maybe naturally, is that programmers would get bored with creating a million different types and a billion conversions between them. So, instead, they’d start to pack things together into larger structures all the time.

type UserName:
    First: string [a-zA-Z]*
    Last: string [a-zA-Z]*

And for any discerning programmers out there who suspect that using just alphabetical characters is not wide enough to handle all of these types of names across the planet, staring at this declaration would trigger a need to further investigate and correct the model.

But the trick would be that in correcting the model, in that one place, it would also be corrected everywhere else. So the benefit would be that all of the validation code, both at the interface and at persistence, that would have needed to have also been updated when the model changes, would actually not need to be changed, since it is all implicit in the language. A recompile would do the trick.
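A rough Python sketch of that "fix it in one place" effect, under some assumptions: `NAME_PATTERN`, `from_form`, and `from_row` are hypothetical names standing in for the model, the interface code, and the persistence code. In the imagined language this would all be implicit; here the type's constructor does the checking, so both entry points inherit whatever the model says.

```python
import re
from dataclasses import dataclass

# The one place the model lives. Widening it (say, to allow hyphens or
# non-ASCII letters) changes every check below without touching them.
NAME_PATTERN = re.compile(r"[a-zA-Z]*")


@dataclass(frozen=True)
class UserName:
    first: str
    last: str

    def __post_init__(self):
        # Validate both fields against the single shared pattern.
        for field_name in ("first", "last"):
            value = getattr(self, field_name)
            if NAME_PATTERN.fullmatch(value) is None:
                raise ValueError(f"{field_name}: {value!r} is not a valid name")


def from_form(form):
    """Interface side: build a UserName from submitted form fields."""
    return UserName(form["first"], form["last"])


def from_row(row):
    """Persistence side: rebuild a UserName from a stored (first, last) row."""
    return UserName(row[0], row[1])
```

With the pattern as written, `UserName("Jean-Luc", "Picard")` is rejected at both the form and the database boundary, which is exactly the kind of failure that should send someone back to correct the model rather than patch a validator.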

I’d go even further, though. I’d not have any primitive data structures in the language’s library; they’d all be baked into the language too.

For example:

type UserList linkedlist:
    Entries: User all
    Sorted by User.UserName.Last

If it’s starting to smell like SQL, I apologize. The type semantics is essentially declarative, but the rest of the language syntax would be imperative, with some extended paradigm on top. Probably more like Golang than any of the other OO or FP variants.

The point, though, is that you would use the type mechanism to build up larger and larger data structures, and all of the base ones, like lists, trees, dags, graphs, and even hypergraphs, would already be there. For fun, you’d have all others like stacks, queues, and pagodas. All with various options, but to use more advanced ones, you would have to explicitly declare that they were not just trivial implementations.

That would ensure that anyone reading would not have to infer anything about the underlying implementations. The crudest thing would always be the default thing. Any special ability, property, enhancement, or optimization would have to be explicitly mentioned. It would minimize confusion. It would aid in avoiding bloat.
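As a small illustration of "crude by default, anything extra stated up front", here is a Python sketch of a linked list. The class name `LinkedList` and its `sorted_by` option are assumptions made for this example, not part of any real library. Without the option it is the dumbest possible list; sorting only happens if the declaration says so.

```python
class _Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Singly linked list. Plain prepend by default; sorted only if asked."""

    def __init__(self, sorted_by=None):
        self._head = None
        self._sorted_by = sorted_by  # optional key function, declared up front

    def add(self, value):
        if self._sorted_by is None:
            # Crude default: constant-time insert at the head.
            self._head = _Node(value, self._head)
            return
        # Declared option: walk to the first node with a larger key.
        key = self._sorted_by(value)
        prev, cur = None, self._head
        while cur is not None and self._sorted_by(cur.value) <= key:
            prev, cur = cur, cur.next
        node = _Node(value, cur)
        if prev is None:
            self._head = node
        else:
            prev.next = node

    def __iter__(self):
        cur = self._head
        while cur is not None:
            yield cur.value
            cur = cur.next
```

Something like `LinkedList(sorted_by=lambda user: user["last"])` plays the role of the "Sorted by" line in the declaration above: a reader sees the extra behaviour at the point where the structure is created and never has to guess at it.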

Over the decades, the trend in languages was to make them more dynamic and to minimize expressions. The result of adding those freedoms in the hands of a disciplined programmer was great, but the abuses heavily outweighed the elegant examples. Learning from that, I think we should find more ways to restrict the freedoms, but not cripple the expressiveness. That is, you’re not forced to type in reams of boring boilerplate, but you also can’t write code that is cryptic enough to win the International Obfuscated C Code Contest either. The code you have to write always makes its intent clear. Readability is part of the language.

Thursday, June 25, 2026

The Craft of Programming

Back in the eighties, when I was a student, there were only two choices: you used ‘vi’ or you used ‘emacs’.

I picked vi, my roommate picked emacs. The stuff he did with his editor was way cooler, but over the decades, vi has pretty much been everywhere I have stumbled.

I assumed it would fall out of fashion, but it persisted. Vim and all of those embeddings in the IDEs keep it alive.

I read something about how someone thought that using a complicated editor like vi was unnecessary. They implied that it was a waste of time to learn lots of its features. There were so many other things to learn, why not just use a lame mouse-based editor and move on?

For me, the answer is that it is all about craft.

When you build a large program, it’s a lot more than just throwing together the thousands of instructions necessary to tell the computer what to do. The code is just the output, but if you are hasty and careless on your way there, the whole thing ends up as a house of cards; any little breeze will knock it over. That is, it isn’t just coming up with some code; it is coming up with good, solid code that is readable and will survive all of the craziness that the world throws at it.

I’d rather have 150K lines of something that is super tight, abstract, solid, dependable than have 1M lines of stuff that could be served in an Italian restaurant. A whole lot of bad code is just a few steps closer to an apocalypse than a system.

So, it’s not about how much code you can create or how fast you can create it. Instead, it is about how carefully you embed precision into that code. Not for tomorrow, but for its entire lifespan. You want strong solutions to the real problems you need to solve, and you want them encoded in a way that you can keep them moving forward and leverage them for all sorts of related problems.

In the act of doing that sometimes tedious, thoughtful, and precise work, you want to develop habits that are more likely to produce high-quality output. Quality matters, not quantity.

Getting back to vi, if you are concerned about your code, you care very deeply about how it is ordered in any sort of file. It should not be random; you should have come up with some consistent reason why one function is placed ahead of the others. In the daily grind, you have probably violated that order accidentally a few times. In most editors, you can use the mouse to highlight a chunk of code, hit ctrl-X to cut it, then ctrl-V to paste it back where it should go. It works, but it’s a touch haphazard. You could be interrupted, or get confused, or run into any other disruption that might interfere with the outcome you desire. In vi, you ‘mark’ the start, then ‘mark’ the end, then issue a few commands to move the code into place. It’s a small difference, but it’s transactional integrity. It either happens or it does not. If you get interrupted, it doesn’t matter; nothing is lost, nothing is messed up. If you’re unsure about where to place the code, you can scroll around with the keyboard until you are sure. It isn’t any slower than using the mouse, but it is a whole lot more precise.

That’s a trivial example, for sure, but it’s just one of those things that you can do with a good tool when you’ve invested the time to understand it fully. Being a good programmer, a reliable one that is always there in a crisis, is the act of investing a whole lot of time in honing these acts of precision. It’s not that you can whack out a whole lot of code that you barely understand; it is that when you are given a piece of work, you can ensure that that work is done satisfactorily, even in environments that are difficult.

That is the essence of good programmers. They are the ones that everybody leans on, especially when it gets difficult, because they know that they’ll be there and make sure it is at least as good as it can get right now. They are reliable. Not always efficient, never fast, but always making sure that the work going out the door is solid.

We’ve always had the expression “garbage in, garbage out” in reference to the necessity to have good input data, but it also applies to code. If you get a group of people that churn out millions of lines of questionable code, it’s just a field of landmines waiting to go off. If you get a group of people that solve complex problems with the least amount of code they can, carefully, then it doesn’t take that long before that quality of work starts to pay for itself. The craft of programming isn’t coding; the craft is in providing solutions. It just so happens that most of them involve code in some way.

Thursday, June 18, 2026

Structureless

The most common mistake I have seen in big ugly balls of mud is to try to capture data without enough structure.

Chopping up some incoming data into a lot of little strongly typed fields is a pain. Sometimes it seems like an unwarranted pain. You may need to get a mailing address. Why not just give the user a big textbox to fill in?

The problem isn’t that the users can’t carefully type in the text with appropriate structure; it’s that sometimes they won’t.

And the code to parse unstructured text is stupidly complicated. They can type anything; you have to be able to apply some type of structure to each and every possible variation. Since there are an infinite number of those, you are going to lose.

You can add a ton of validation, but if it’s not rigorous within a fully interactive interface, then if there is any tiny way to bypass it, it was a total waste of effort.

You could scream at the users and make them format it correctly, but as time wears on, unless you keep screaming, eventually that practice will degrade. It will just delay the inevitable.

Which is to say that in a computer, a big box full of text is absolutely nothing more than a big box full of text. It has no other use or value. That makes it useful for someone putting a “personal note” somewhere in a report, or something like that, but there is nothing beyond that. It’s not really data; it’s just an extra external comment of some type.

If what you intend to do is collect data with a very specific structure, you should never subvert text boxes to do that for you. It’s not a shortcut; you haven’t “figured it out”, you just made a very bad mistake. And sadly, doing it right wasn’t that much more time.
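For the mailing address case, a minimal Python sketch of "doing it right" might look like the following. The field names and the two-letter country rule are assumptions picked for illustration; real address models vary a lot by country and deserve their own investigation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MailingAddress:
    street: str
    city: str
    region: str
    postal_code: str
    country: str  # assumption: a two-letter uppercase code such as "CA"

    def __post_init__(self):
        # Every field must carry something; blank strings are rejected.
        for name in ("street", "city", "region", "postal_code"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if re.fullmatch(r"[A-Z]{2}", self.country) is None:
            raise ValueError("country must be a two-letter uppercase code")
```

It is only a handful of lines more than a single text field, and every later use of the address (sorting by city, filtering by country, printing a label) gets to work with named, checked pieces instead of re-parsing free text.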

Likewise, if you are using some questionable software, and it has a lot of text boxes, so you come up with a clever idea about how to put structured data into those, just because you can’t or don’t know how to change the program, then it isn’t brilliant. It’s just a sloppy, hacky way of trying to get around some other code. One step worse than duct tape.

Data is not data without structure. Untyped data is not data. A big string of characters is a mess. Mostly.

If it originated under the strict control of some code somewhere, and the pathway was closed and guarded, then sure, two programs can use strings and complicated parsers to pass data back and forth. It starts with structure, is transported in an unstructured container, and then it is restructured again. But if the pathway is open or one end of that game is a human, then a big string of stuff is just potential garbage. At some point, usually in the not-too-distant future, someone will fill that string with a problem, and it will end up wasting a lot of time.

Do note that there are subtle variations on this. A user might use one program to render XML that they upload manually into another. In a case like this, the human isn’t the endpoint; they are just the transfer medium. It's an open pathway, but still just between two programs. And you can close down that pathway by only strictly accepting XML with a very specific schema. Since you can verify that and reject garbage as necessary, it becomes closed.
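A sketch of closing that pathway in Python, assuming a made-up `<order>` document with an `id` attribute and exactly two children, `<sku>` and `<quantity>`. The function name `parse_order` and the whole layout are hypothetical. The standard library parser used here does no real schema validation, so the structural checks are done by hand; its documentation also warns about feeding it untrusted input, so a production version would want a hardened parser and a proper schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout:
#   <order id="..."><sku>...</sku><quantity>...</quantity></order>
EXPECTED_CHILDREN = ["sku", "quantity"]


def parse_order(xml_text):
    """Return (order_id, sku, quantity) or raise ValueError for anything else."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    # Reject unexpected roots, missing ids, and stray attributes.
    if root.tag != "order" or set(root.attrib) != {"id"}:
        raise ValueError("expected a single <order id=...> element")

    # Reject missing, extra, or reordered children.
    children = list(root)
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        raise ValueError("expected exactly <sku> then <quantity>")

    sku = (children[0].text or "").strip()
    quantity_text = (children[1].text or "").strip()
    if not sku or not quantity_text.isdecimal():
        raise ValueError("sku must be non-empty and quantity a whole number")
    return root.attrib["id"], sku, int(quantity_text)
```

Anything that is not exactly the expected shape is refused at the door, so the human carrying the file between the two programs cannot accidentally (or deliberately) turn it back into a big box of text.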

The foundation of all software systems is to collect data. The full and complete structure of that data, as it relates to the real and digital worlds, is an essential part of that data.

There are extraordinary times when it makes sense to only sub-model some external data and live with the consequences of that choice. But the default should always be that if the program needs some data that is key to its usage and computations, then that data needs to be fully and correctly modelled. That is, it needs the right structures (for example, not shoving a tree into a list), and it needs all of the individual fields in that data to be very strongly typed. It needs this in order to make the correct choices with its instructions based on what is actually there. It can’t be vague or ambiguous; computers are not smart enough to grok external context. They can only act on the very specific information that they have.

It’s also worth noting that while parsing may look easy, and in some tiny instances, it is not that complicated, it should always be considered a hard problem to properly solve. As such, if you can avoid parsing, or at least push it to a human somewhere, then you will get far fewer bugs and the code will be more likely to behave as expected. If you do have to venture into parsing, then it is one of the coding areas where reading a boatload of stuff in advance will pay off huge dividends. Trial and error with parsing is a massive bug generator.

Thursday, June 11, 2026

Software Systems

I use the term ‘software system’ loosely. I usually intend it to mean: all of the boundaries for a set of related solutions that have been or will be implemented with software.

In that sense, it is less about the technical parts of the ‘system’ and more about how they all come together to help people.

I do this mostly because I tend to visualize a ‘problem space’ as a flat 2D terrain. It is a convenient oversimplification. It is a big, wide, open, empty field of grass which spans over related problems.

When I am doing greenfield work, I see the start as picking one spot in that field. You start there, building up enough structure to be useful. First, you lay down some common foundations, then you start adding in functionality that implements the features you know will help solve the problem.

As you do this, people will see the effort and start making suggestions. Some will want to go off in one direction, while others will prioritize the opposite way.

The trick to keeping it all as usable as possible is to slowly expand out your borders, but not in too many directions all at once.

Someone once told me that for software, you should never pick a path unless you are willing to walk it. From this perspective, it usually means that you won’t expand into another area in the field haphazardly. If you do choose to go there, it needs to be done correctly. That is, adding a few really good, solid features is way better than adding a million lame ones.

The same is true for the data. If you need new data, you add it carefully, properly structured, or not at all.

Overall, though, you start at one specific spot and keep growing. If it’s a good set of programs and people find it valuable, you’ll probably be at it for years, if not decades. So, it’s really crucial to its lifespan that the work you do in the very early days is as good as it can be. It needs to be neat, tidy, organized, and carefully thought out.

With that in mind, calling the current work and any of the rather obvious future work a ‘system’ works quite well. The system isn’t the code, but rather it is all of the territory that the code is trying to cover at some point. You might build a system for handling the account problems in a large corporation, for example. There might be lots of included pieces, and even some nearly stand-alone sub-systems, but they are all trying to fit together to deal with the same problems.

So, it’s similar to seeing the forest rather than just the trees. The boundaries of the expected work are the system, but the system may not stretch right up to those boundaries yet.

From this view, it makes it easier to understand a bottom-up implementation. You might not know all of the features that people will ask for, but you should have a reasonable sense of the territory you are covering right now. Lots of that territory is similar, so building reusable components and engines will really help in getting more ground covered at a faster rate.

The classic example is reporting. You know people will need it at some point, so instead of just hardcoding a couple of static examples, it would be better to either offload it to somewhere else as exported data, or write some generic engine that is flexible enough to cover its rather massive width. The trick is not to write a lot of code, but rather to leverage any code you do write to cover the largest parts of that territory. In software, a little foresight goes a long way.
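To make the "generic engine" option a little more concrete, here is a deliberately tiny Python sketch. The function `build_report` and its parameters are hypothetical, and a real reporting engine would need far more (filters, formatting, paging), but it shows the shape: one piece of code, driven by a description of the report rather than hardcoded per report.

```python
from collections import defaultdict


def build_report(rows, group_by, measures):
    """Group rows (dicts) by one field and aggregate the named measures.

    measures maps an output column name to (source_field, aggregate_function).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)

    report = []
    for key in sorted(groups):
        line = {group_by: key}
        for column, (field, aggregate) in measures.items():
            line[column] = aggregate(r[field] for r in groups[key])
        report.append(line)
    return report
```

A cost-by-department report and a headcount-by-office report then become two different calls, for example `build_report(rows, "dept", {"total": ("cost", sum)})`, instead of two different chunks of code.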

Thinking of a software system this way really helps in making a lot of the implementation decisions. If your field doesn’t cover having a million users, then designing an architecture to support that scale doesn’t make any sense.

More importantly, if you are located in one corner of the field, then trying to expand way over to the other side of a nearby hill doesn’t make a lot of sense either. That’s far enough away that it is clearly another system, another project and thus another codebase.

Building even medium-sized software is surprisingly complicated, so finding ways to frame it nicely really helps with making better decisions. Since time is a precious resource, getting to the right code as quickly as possible is important. Seeing it all as a system occupying some territory in an endless field is a good guide.

Thursday, June 4, 2026

Round Holes

A classic expression describes shoving a square peg into a round hole. Basically, you’re matching the wrong part with the wrong location.

I see this often in software architecture. Sometimes I like to use the term impedance mismatch. There is a component that someone is suggesting for use in a different variation of its problem space. It fits badly.

Sometimes the issue is access. There are hard limits on the availability of stacks, libraries, frameworks, tools, and services. In some organizations, they need to be vetted and approved first. This can be very slow, but I still think it is a good idea; using too many technologies is a nightmare.

In some cases, it is knowledge. People tend to gravitate to the things they already know, so they’ll prefer technologies they’ve used in the past, even if it means forcing them into place. That makes little sense if the architect has that preference, but it’s a different development team that does the construction.

Sometimes it is a misunderstanding. The marketing for the component says it will do everything, perfectly, but the reality is that it is far more than a stretch. The features were hastily cobbled together to help sales. Still, the people funding the effort got swayed, so now everyone else is forced to jam weak pieces into the wrong place.

The habit I’ve always encouraged is to look carefully and admit that a round hole is, in fact, just a round hole. 

I don’t start with the square pegs; that is my last step. Not surprisingly, this can frustrate people, in that there might not be any round pegs available. They don’t exist or can’t be used. For me, though, I still want to know that the hole is round, even if I can’t fill it correctly right now, or in some cases, ever.

But if you can do that, and imagine for any bunch of holes what would fit rather perfectly, first, before getting lost in the messiness, it will really help both simplify the effort and get it as good as possible. 

Otherwise, you risk unintentionally creating a Rube Goldberg machine. For people unfamiliar with those machines, they are works of art that are deliberately overcomplicated. Just a collection of mismatching pieces made to do something interesting. They make great entertainment, but are not something that you’d want to have to rely on.

I’ve seen that too often in enterprise architecture, systems built out of odd, mismatching components sloppily glued together into a giant house of cards. That, paired with excessive brute force for the glue, tends to generate an endless amount of support and bug fixing, while never really working correctly. The system exists, but it is just off by enough that it would be better if it didn’t. It’s a time sink. Now, instead of a solution, it is an ill-placed speed bump.

Often, to avoid that fate, I want to just look at the way the data needs to flow around at the high, rather abstract level. 

You need to get the major entities from other sources, persist it all, and then deliver to interfaces, reporting, other systems, etc. If you understand the amount of information, its timeliness, and frequency, you start to get a sense of the minimum pegs you need below it. If you can grok that, then you can start the torturous phase of trying to see what is actually available and whether or not it is close enough to be workable. But if you go the other way and pick the components first, you’ll quickly get lost in gluing together odd parts for no reason.

It’s the same form of thinking that is needed to get good, simple, clean code, too. You have to see it first from a top-down perspective, before trying to build it up from what’s already available. It’s really the only way to keep from getting lost, but also to leverage reuse, encapsulation, etc. You want to know the scope of the problem first, come up with a near-perfect solution, and then map that impossible solution back to things that are possible. It probably sounds a bit crazy to people who can’t see it that way, but it is a perspective that anyone can learn to leverage. A superpower of sorts.

So we can get there with three easy questions.
  • What is the ‘full’ scope of the problem?
  • What would solve this perfectly?
  • What’s available to approximate that perfect solution?

In an enterprise that might be building up a replacement system for tracking some type of inventory or case management. The primary features are pretty well known; the useful secondary ones are findable with a bit of investigation.

Perfection might be a dynamic data store to accommodate wide but slowly changing shallow data. The users need a nice GUI to get at this and keep control. The incoming data is real-time, vibrates occasionally, so a queue would protect it and help with integrity. The system feeds a few others that specialize in other forms of management. It’s always a smallish number of people. It should all run in a managed environment.

This then is the hole that needs to be filled in with whatever technologies are available now, in the future, or can be suitably crafted in a “reasonable” time.

Contrast that with something where the data rarely changes, there are millions of users constantly accessing it, and they are the primary source of the data. It’s a very different hole that likely needs industrial-strength pegs in order to keep it going. It’s not a system running on one or two boxes, but requires a large cluster of machines all cooperating to cope with its huge and variable load. The scale is so large that there is no overlap with that first medium system, so it’s unlikely that they should share any common technologies. It’s more of a star-shaped hole, needs special stuff to fill it.

The converse is also true, in that any of the technologies suitable for the second design would be grossly over-engineered for the first one. You can’t just cherry-pick a few and shove them into place. One is a 2D circle that needs to be painted, the other is a 3D hole that needs to be filled.

In that sense, you learn as much as you can about the full width of the problem, then let your imagination run wild with getting it perfect. With those boundaries in place, you can start picking the fewest number of pieces that come close to filling it. There will be ugliness and rough edges, but you’ve found them early and minimized them, which is the best you can do if you can’t just build it all from the metal to the top.