Thursday, December 28, 2023

Identity

The biggest problem with software security is that we have wired up a great deal of our stuff to rely on ‘anonymous’ actions.

A user logs into some device, but with distributed computing that machine will talk to a very large number of other machines, which often talk to even more machines behind the scenes. Many of those conversations default to anonymous. When we implement security, we only take ‘some’ of those conversations and wrap them in some type of authentication.

The most common failures are that we either forget to wrap some important conversations, or there are bugs in the ones we do wrap.

A much better way is to insist that ‘all’ conversations have the originating user’s identity “attached” to them. Everything. All of the way down. No verifiable identity, no computation. Simple rule.

"Why?"

Security as an afterthought will always get forgotten in a rush. And if it’s complicated and redundant, the implementations will vary, most landing on the broken side. It is a losing battle.

“But we don't need that much security...”

Actually, we do, and we’ve always needed that much security. It’s just that long, long ago when these things were first written, there were so few people involved and they tended to be far more trustworthy. Now everyone is involved and the law of averages prevails. We put security on everything else in our lives, why wouldn’t we do it properly on software?

“It’s too hard to fix it...”

Nope. It certainly isn’t easy, but if you look at the technologies we’ve been using for a long, long time now, they have the capacity to be extended to do this. It won’t be trivial, but it isn’t impossible either. If we get it wired up correctly, we can gradually evolve it to improve.

“It doesn’t work for middleware...”

Any incoming request must be associated with an identity. The use of generic identities would be curtailed. So, the web server runs as SERVER1, but all of its request threads run as the identity of the caller. No identity, no request.
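To make that concrete, here is a minimal sketch in Java of what the rule might look like at the edge of a middleware service. All of the names here (Identity, Request, RequestGate) are invented for illustration; this is not any particular framework’s API.

    import java.util.Optional;

    class RequestGate {
        // Verify the caller's credentials; empty means we could not establish who they are.
        // (A real implementation would verify a signed token, not just check for presence.)
        Optional<Identity> verify(Request req) {
            return req.credentials() == null
                    ? Optional.empty()
                    : Optional.of(new Identity(req.credentials()));
        }

        String dispatch(Request req) {
            Optional<Identity> caller = verify(req);
            if (caller.isEmpty()) {
                return "403: no verifiable identity, no computation";
            }
            return handleAs(caller.get(), req);
        }

        // The server process itself runs as SERVER1, but this work runs as the caller,
        // and every downstream call made from here carries that identity along.
        String handleAs(Identity caller, Request req) {
            return "handled '" + req.payload() + "' as " + caller.name();
        }

        public static void main(String[] args) {
            RequestGate gate = new RequestGate();
            System.out.println(gate.dispatch(new Request(null, "read prefs")));    // rejected
            System.out.println(gate.dispatch(new Request("alice", "read prefs"))); // handled as alice
        }
    }

    record Identity(String name) {}

    record Request(String credentials, String payload) {}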

“That’s crazy, we’d have to know all of the user's identities in advance...”

Ah, but we do. We always do. Any non-trivial system has to authorize, which means that some functionality is tagged directly to a set of users. If you have user preferences for example, then only you are authorized to modify them (in most cases). There could be anonymous access, but that is mostly for advertising or onboarding. It is special, so it should not be the default.

Some systems could have anonymous identities, and they could be turned on or off, in the same way that we learned to live with them in FTP. But they wouldn’t be the default; you’d have to do a lot of extra work to add them, and you’d only do that for very special cases.

Every thread in middleware could have an identity attached to it that is not the ‘system identity’, aka the base code that is doing the initialization and processing the requests. It’s pretty simple and it should be baked in so low that people can’t change it. They could only ‘add’ some other anonymous identity if they explicitly wanted to bypass the security. It’s analogous to the split between processes and the kernel in a reasonable operating system.

“But the database doesn’t support it...”

Oddly, the problem with most databases does not seem to be technical. It is all about licenses. Historically, the way companies figured out how to make extra money was through licensing users. It’s a great proxy for usage, and usage is a way of sizing the bill to fit larger companies. You set a price for small companies, then add multipliers to get more out of the bigger ones.

We should probably stop doing that now. Or at least stop using ‘users’ as proxies for it, especially if that is one of the root causes of all of our security issues.

Then any statement to the database is also attached to an identity. Always. The database has all of the individual users, and every update is automatically stamped with user and time. No need to rewrite an application version of this anymore. It is there for all rows and all tables, always.
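As a sketch of what that could look like from the application side, again in Java. The table, the column names, and the idea that the stamping lives in one wrapper are all assumptions for illustration; the point is only that the identity travels with every statement.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Timestamp;
    import java.time.Instant;

    class AuditedDb {
        private final Connection conn;

        AuditedDb(Connection conn) { this.conn = conn; }

        // No identity, no statement. Every update carries who did it and when.
        int updatePreference(String identity, long userId, String value) throws SQLException {
            if (identity == null || identity.isBlank()) {
                throw new SecurityException("no verifiable identity, no statement");
            }
            String sql = "UPDATE preferences SET value = ?, updated_by = ?, updated_at = ? "
                       + "WHERE user_id = ?";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, value);
                ps.setString(2, identity);                         // stamped with the acting user
                ps.setTimestamp(3, Timestamp.from(Instant.now())); // and the time of the change
                ps.setLong(4, userId);
                return ps.executeUpdate();
            }
        }
    }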

“That’s too much processing, some rows need far less...”

Programmers cheat the game in their applications and don’t properly audit some of the changes. Usually, that seems like a great idea right up until someone realizes that it isn’t. Whenever you collect data, you always need a way of gauging its reliability, and that is always the source of the data. If it comes from somewhere else, you need to keep that attached to the data. If a user changes it, you need to know that too. If a user changes it and it jumps through 18 systems, then if you lose its origins, you also lose any sense that it is reliable. So, it would make far more sense if, during an ETL, you kept that information too, and honored it. It would increase your data quality and certainly make it a whole lot easier to figure out how bugs and malicious changes happened.
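A rough sketch of carrying the origin through an ETL step, with invented names (Provenance, Sourced); the shape matters more than the specifics.

    import java.time.Instant;
    import java.util.List;
    import java.util.function.Function;

    record Provenance(String sourceSystem, String changedBy, Instant changedAt) {}

    // The value never travels without its origin attached.
    record Sourced<T>(T value, Provenance origin) {
        <R> Sourced<R> map(Function<T, R> f) {
            return new Sourced<>(f.apply(value), origin);  // transform the value, keep the origin
        }
    }

    class EtlStep {
        // Normalizing a batch keeps each row's origin; nothing becomes anonymous on the way through.
        List<Sourced<String>> normalize(List<Sourced<String>> rows) {
            return rows.stream().map(r -> r.map(String::trim)).toList();
        }
    }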

“That’s too much disk space...”

Most large organizations store their data redundantly. I’ve actually seen some types of data stored dozens of times in different places. We really should stop doing that. Eliminating that redundancy would be a macro optimization, saving a huge amount of badly used disk space; skimping on audit information is a micro optimization that only lowers the data quality.

“But what about caching...”

I’ve said it before, and I’ll say it again: you should not be rolling your own caching. Particularly not adding in a read cache when you have writable data. You’re just causing problems. So, realistically, you initialize with a system identity, and then it primes the cache under that identity. If someone builds a real working cache for you, it needs user identities, and it figures out how to weigh those against the system identity work to appropriately account for each. It does that both for security and to ensure that as a cache it is effective. If the system identity reads a whack load of data for one user but never uses it again, then the cache is broken. So, a weight of 100%, for example, would mean that the caching was totally and utterly useless. A weight less than 0.01% would probably be quite effective. Security and instrumentation, combined.
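To make the weighting idea a bit more concrete, here is a sketch of the bookkeeping such a cache might do. The definition of the weight, the fraction of system-primed entries that no user identity ever reads back, is my own interpretation of the above, not an established metric.

    import java.util.HashMap;
    import java.util.Map;

    class IdentityAwareCache<K, V> {
        static final String SYSTEM = "SYSTEM";

        private final Map<K, V> data = new HashMap<>();
        private final Map<K, String> loadedBy = new HashMap<>();
        private final Map<K, Boolean> reusedByUser = new HashMap<>();

        // Every entry remembers which identity caused it to be loaded.
        void put(K key, V value, String identity) {
            data.put(key, value);
            loadedBy.put(key, identity);
            reusedByUser.put(key, false);
        }

        V get(K key, String identity) {
            if (!SYSTEM.equals(identity) && data.containsKey(key)) {
                reusedByUser.put(key, true);   // a real caller actually used this entry
            }
            return data.get(key);
        }

        // Fraction of system-primed entries that no user identity ever read back.
        // Near 1.0 (100%) means the priming was useless; near 0 means it was effective.
        double wastedPrimingWeight() {
            long primed = loadedBy.values().stream().filter(SYSTEM::equals).count();
            if (primed == 0) return 0.0;
            long wasted = loadedBy.entrySet().stream()
                    .filter(e -> SYSTEM.equals(e.getValue()) && !reusedByUser.get(e.getKey()))
                    .count();
            return (double) wasted / primed;
        }
    }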

“But what about ex-users...”

People come and go. Keeping track of that is an organizational issue. They really shouldn’t forget that someone worked for them a few decades back, but if they wanted to do that, they could just swap to a single ‘ex-employee’ identity. I wouldn’t recommend this myself; I think it makes far more sense that if you return to a company it reconnects you to your previous identity, but that should be a company-wide decision, not left to the whims of each application. When you start building something new, the ‘group’ of people that can use it should already be established, otherwise, how would you know that you need to build the thing?

“What about tracking?”

If you know all of the computations that an identity triggers and all of the data that they have changed, then you have a pretty powerful way of assessing them. That’s not necessarily a good thing, and it would have to be dealt with outside of the scope of technology. It would not be accurate though, because it is really easy to game, so if a company used it as a performance metric, it would only end up hurting them.


“But I want to roll my own Security...”

Yeah, that is the problem with our security. It takes a crazy amount of knowledge to do it correctly, everyone wants to do it differently, most attempts get it wrong, and while it would be fun to code up some super security, in reality, it is always the first functionality that gets slashed when everyone realizes they aren’t going to make the release deadlines. If your job is effectively to rush through the coding, then most of what you code should stick to the straightforward. It sucks, but it is reality. It also plays back to the notion that you should always do the hard stuff first, not last. That is, the first release of any application should be a trivial shell that sets the foundations, but effectively has no features. Then the first release of the application is actually an upgrade. Doing it this way will eliminate a lot of pain and is easier to schedule.

"There are too many vendors, they won't agree to this..."

The industry is notoriously addicted to locking customers in. This type of change would not affect that, so if we crafted it as an ISO standard, and then there was pressure to be compliant, most of them would comply simply because it was good for sales. The downside is that in some cases it would affect their invoicing, but I'm sure they could find another proxy for organization size that is probably easier and cheaper to implement.

Identity, like a lot of other software development problems, is difficult simply because we like to shoot ourselves in the foot. If we could stop doing that, then we could put in place some technologies that would help ensure that the things we build work far better than they do now. Oddly, these solutions are not hard to implement, and we basically know how to do them correctly. The issue isn’t technological; it has nothing to do with computers themselves, it is all about people.

Thursday, December 21, 2023

Software Knowledge

There are at least two different categories of knowledge in software.

One is the specifics of using a tech stack to make a computer jump through specific hoops to do some specific computations. It is very specific. For example, the ways to store certain arrangements of bits for persistence when using Microsoft stacks.

When you are building software, if you know how to do something, it will make the work go faster. But people are right when they say this type of knowledge has a half-life. It keeps changing all of the time; if you haven't done it for a while, you’ve either forgotten it or it has moved out from under you.

This is the stuff you look up on StackOverflow.

The other category of knowledge is far more important.

There are all sorts of ways of working and building things with software that have not changed significantly for decades. They are as true now as when I started forty years ago. They are the same no matter what tech stack you use, be it COBOL or JavaScript. Sometimes they are forgotten and then reinvented later under different names. A good example is that we used to edit our code, now we refactor it.

Fundamentally, building software is a construction effort. The medium appears more malleable than most, but it is not immune to the issues of any other kind of construction. And because programmers intentionally freeze far too much, we change things too fast, and we often need backward compatibility, it is rarely as malleable as it could be.

There are a couple of obvious anchors.

The first is that size matters. It is a whole lot easier to build tiny things than massive ones. Size is everything.

The second is that as the size of the effort grows, disorganization causes worse problems. If you write some tiny spaghetti, it is okay, you can still change it. But if you have a million lines of spaghetti you are screwed.

Organizing stuff isn’t fun, and oddly it isn’t a one-time task either. It is an ongoing effort. The data and code are only as organized as the explicit effort you put into them to keep them organized. If you aren’t doing anything, it is likely a mess.

But even more specifically, there is a lot of general knowledge about how to code things like data structures, algorithms, data normalization, or GUI conventions that holds true regardless of the stack. You may not need to create a hash table yourself anymore, but you still need to understand how to leverage it and its limits. People will always need to ensure their data is at least 3NF or they will pay the price for storing it badly. A poorly wired GUI will diminish trust; it may be marginally workable, but it generates ill will.

The tools too. Learning to properly configure and use an editor or IDE tends to stay relevant for a very long time. There are all sorts of build tools and scripting, most of which haven't changed for decades, although sometimes they get obscured by trendy stuff that doesn’t last. But the need for the tools and the usage of them never changes. If you spend the time to figure one out, the others come easily. It also helps in understanding why some newer trends are poor and should be avoided.

Of course, all of the issues with people and politics never, ever change. We build software as a solution to some users' problems. If you don’t fully understand what you are trying to solve, then the things you’ve built are far less likely to work as needed. There is also a lot of gymnastics involved with funding software development, often resulting in too much stress and rushing through the work. Stress is bad for thinking; rushing is bad for quality.

Most of what you do specifically in software changes. The trends come and go in roughly five-year waves, developers need to keep up but not every wave. You can skip some waves, but if you skip too many your opportunities narrow. Once you are old and out, it is brutal getting back in.

Most of the general knowledge is far more important than the specifics. It is what keeps the projects from chaos, ensures that the work is at least good enough, and helps control the expectations of the people on the margins. If you know generally how to build things well, you can always look up the specifics. But if you don’t know how to properly persist the data, for instance, the work is doomed before it even starts. If you don’t understand fragmentation, you won’t understand why your work keeps failing when you bring it all together. If you don’t understand the components, you cannot craft a reasonable architecture.

You actually need more general knowledge to ensure that a large project is successful than specific knowledge. It is what keeps it all out of trouble while people are coding like mad. This is probably why the high failure rate of modern software is independent of the methodologies used. It's more likely experience-related. A bunch of kids who really know their specifics well will still usually fail in very predictable, general ways. That was true when I was a kid and it still holds true today.

Thursday, December 14, 2023

Fragmentation

When you have never experienced large-scale software development, you often prefer fragmentation. You want lots of little parts.

It’s understandable.

Most people are taught to decompose large problems into smaller ones. Once they have completed that, they assume that all of those smaller sub-solutions are mostly independent from each other.

They believe that dealing with each piece independently would be easier. That way you can just focus on one and ignore the rest.

It’s just that when people learn to decompose stuff, they tend to choose rather arbitrary lines of decomposition. There are lots of options; they choose the easiest. But that means that the pieces below are more likely to have dependencies between them. That they are not independent.

If the problem was decomposed based on ‘natural’ lines that tend away from dependencies, then the idea of treating stuff as fragments would work. But they don’t know what that means, so it doesn’t happen.

The other part of this issue then comes into play.

If you decompose a problem into parts, you need to ensure that the parts themselves still hold together to solve the original problem. That at least they cover everything necessary. That is, after you break it down, you build it back up again to make sure it is right. Breaking it down is only the first half of the effort.

So overlaps, gaps, and dependencies tend to derail a lot of decomposition attempts.

Once any complexity gets fragmented, it grows exponentially. If there are enough fragments, it becomes nearly impossible to assert anything reasonable about the overall behavior. That is, each of the components may work as expected, but the combination of them does not. This is an all too common problem in software.

The cure is to be suspicious of fragmentation. It’s not the same as encapsulation.

With encapsulation, all of the rough edges are hidden nicely inside of a box. With fragmentation, the edges are exposed and are effectively their own pieces, throwing the complexity out of whack. You quickly end up with far more pieces than you can handle.

You can see this as an issue of ‘scope’ and most programming languages provide strong tools to control it, but very few programmers take advantage of them. We first figured this out with global variables, but there are endless ways to create similar issues.

If your decomposition into an Object is correct, for example, then all of the internal variables in the Object can be set to private. They are not modifiable from the outside. They are not visible from the outside. The entire interface is methods, and all of the variables are properly encapsulated. Instead, we have crazy ideas like ‘getters’ and ‘setters’ that drop functions directly over the variables, so that we can pretend that we encapsulated them when clearly we didn’t.
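A small Java illustration of the difference; the account example is just for show.

    // Encapsulated: the internal state is private and only behaviour is exposed.
    class Account {
        private long balanceCents;   // not visible, not modifiable from the outside

        void deposit(long cents) {
            if (cents <= 0) throw new IllegalArgumentException("deposit must be positive");
            balanceCents += cents;
        }

        boolean canCover(long cents) {
            return balanceCents >= cents;
        }
    }

    // "Encapsulated" in name only: the getter and setter drop functions directly over the
    // variable, so any caller can still do whatever it likes with the raw state.
    class LeakyAccount {
        private long balanceCents;

        long getBalanceCents() { return balanceCents; }
        void setBalanceCents(long cents) { balanceCents = cents; }
    }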

Other fun examples of fragmentation include the early attempts to distribute code throughout lots of static HTML files, making it nearly impossible to correctly predict the behavior of anything non-trivial.

Modern frameworks are often based around fragments as well. You know there is a problem if you need to access ‘globals’ in a lot of little ‘callbacks’; it will quickly become a mess.
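A contrived Java sketch of that smell, and the scoped alternative:

    // Fragmented: every little callback reaches into shared globals to coordinate.
    class Globals {
        static int total;   // read and clobbered from many unrelated callbacks
    }

    class FragmentedCart {
        Runnable onItemAdded(int priceCents) {
            return () -> Globals.total += priceCents;   // behaviour can't be reasoned about locally
        }
    }

    // Scoped: the state and the behaviour live together, and the callbacks are handed
    // what they need instead of fishing it out of a global.
    class Cart {
        private int totalCents;

        Runnable onItemAdded(int priceCents) {
            return () -> totalCents += priceCents;
        }

        int totalCents() { return totalCents; }
    }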

Even a lot of modern data storage philosophies make the same mistake. Just dumping all of the data in little files into a disorganized pool or lake is only going to blow out the complexity. Sure, you save time while collecting the data, but if it is nearly impossible to find stuff when you need it, then the collection will grow into a swamp.

Breaking things down into smaller pieces without fully encapsulating them is fragmentation. It is bad, in that while encapsulation wraps and controls complexity, fragmentation just amplifies it. Complexity is the impassable barrier for size. If you can’t manage it, you cannot get any larger or more sophisticated. If you encapsulate parts of it properly, you can grow the solution until it covers the whole problem.

Thursday, December 7, 2023

Solving Hard Problems

Trivial solutions are great for solving trivial problems. But if you have a hard problem, then no combined set of trivial solutions will ever end up correctly solving it.

The mechanics of this are fairly easy to understand.

If you try to solve only a small subset of a big problem with a solution that targets just that subset, then it will spin off more problems; it is a form of fragmentation.

You’ve addressed the subset, but now it needs to interact with many of the other parts, and those interactions are artificial complexity. They would not have been necessary if you had addressed the whole problem.

If you try to solve a hard problem, either one that is huge or one that is complex, with a lot of trivial solutions, there will be an endless stream of these unsolved fragments, and trying to solve each of them just spins off more, making the work effectively perpetual. You can’t get a perpetual motion machine in our physical universe, but you can effectively spend nearly forever creating and patching up little holes in a misfitting solution.

If you want to solve a hard problem, then the solution itself cannot be trivial. It will not be simple, it will not be easy. Any desire or quest to get around this is hopeless.

“Things should be made as simple as possible, but no simpler” -- possibly Albert Einstein

The really important part of that possibly misattributed quote is at the end. That there is a notion of too simple.

The belief that one can get away with over-simplifying solutions is the underlying cause of so many software problems. It’s nice when software is simple and easy to understand, but if it isn’t the right solution, then it will cause trouble.

Yes, you can whack out a large number of trivialized silos into software. You can get these into active usage really quickly. But if they do not fit properly, the net effect is now worse than before. You’ve created all sorts of other problems that will accumulate to be worse than the original one. The software isn’t really helping, it’s just distracting everyone from really fixing the problem.

This is quite common in a lot of large companies. They have an incredible number of systems that spend more of their resources pushing data back and forth than they do working with their intended users. And the more they move data around, the worse it gets. The quality issues spin off secondary systems trying to fix those data problems, but by then it is all just artificial complexity. The real underlying problems have been lost.

If any of the silos aren’t fully encapsulated, then either the partitioning is wrong, or the problem is hard and can’t be split up.