Thursday, December 28, 2023

Identity

The biggest problem with software security is that we have wired up a great deal of our stuff to rely on ‘anonymous’ actions.

A user logs into some device, but with distributed computing that machine will talk to a very large number of other machines, which often talk to even more machines behind the scenes. Many of those conversations default to anonymous. When we implement security, we only take ‘some’ of those conversations and wrap them in some type of authentication.

The most common failures are that either we forget to wrap some important conversations, or that there are various bugs when we do.

A much better way is to insist that ‘all’ conversations have the originating user’s identity “attached” to them. Everything. All of the way down. No verifiable identity, no computation. Simple rule.

"Why?"

Security as an afterthought will always get forgotten in a rush. And if it’s complicated and redundant, the implementations will vary, most landing on the broken side. It is a losing battle.

“But we don't need that much security...”

Actually, we do, and we’ve always needed that much security. It’s just that long, long ago when these things were first written, there were so few people involved and they tended to be far more trustworthy. Now everyone is involved and the law of averages prevails. We put security on everything else in our lives, why wouldn’t we do it properly on software?

“It’s too hard to fix it...”

Nope. It certainly isn’t easy, but the technologies we’ve been using for a long, long time now have the capacity to be extended to do this. It won’t be trivial, but it isn’t impossible either. If we get it wired up correctly, we can gradually evolve it to improve.

“It doesn’t work for middleware...”

Any incoming request must be associated with an identity. The use of generic identities would be curtailed. So, the web server runs as SERVER1, but all of its request threads run as the identity of the caller. No identity, no request.

“That’s crazy, we’d have to know all of the user's identities in advance...”

Ah, but we do. We always do. Any non-trivial system has to authorize, which means that some functionality is tagged directly to a set of users. If you have user preferences for example, then only you are authorized to modify them (in most cases). There could be anonymous access, but that is mostly for advertising or onboarding. It is special, so it should not be the default.
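To make the point that authorization already presupposes known identities, here is a minimal Python sketch of the preferences example. The `Identity`, `PreferenceStore` and `NotAuthorized` names are hypothetical, and the only rule assumed is “only the owner may read or modify their own preferences”.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """A verified caller identity (hypothetical; 'name' is a unique user id)."""
    name: str


class NotAuthorized(Exception):
    pass


class PreferenceStore:
    """Hypothetical per-user preferences: functionality tagged to one user."""

    def __init__(self) -> None:
        self._prefs: dict[str, dict[str, str]] = {}

    def _check(self, caller: Identity, owner: str) -> None:
        # Authorizing at all requires knowing exactly who the caller is.
        if caller.name != owner:
            raise NotAuthorized(f"{caller.name} may not access {owner}'s preferences")

    def set(self, caller: Identity, owner: str, key: str, value: str) -> None:
        self._check(caller, owner)
        self._prefs.setdefault(owner, {})[key] = value

    def get(self, caller: Identity, owner: str, key: str) -> str | None:
        self._check(caller, owner)
        return self._prefs.get(owner, {}).get(key)
```

With this sketch, `store.set(Identity("alice"), "alice", "theme", "dark")` succeeds, while `store.set(Identity("bob"), "alice", "theme", "light")` raises `NotAuthorized`. The store cannot even express its rule without a known identity for every call.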

Some systems could allow anonymous identities that can be turned on or off, in the same way that we learned to live with them in FTP. But they wouldn’t be the default, you’d have to do a lot of extra work to add them, and you’d only do that for very special cases.

Every thread in middleware could have an identity attached to it that is not the ‘system identity’ (the base code that does the initialization and processes the requests). It’s pretty simple, and it should be baked in so low that people can’t change it. They could only ‘add’ some other anonymous identity if they wanted to bypass the security issues. It’s analogous to the split between processes and the kernel in a reasonable operating system.
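A minimal sketch of that rule, assuming Python’s standard `contextvars` module as the per-request identity slot. `SERVER1` echoes the example above; `Identity`, `Server`, `handle`, `current_identity` and `MissingIdentity` are hypothetical names. Each request runs under the caller’s identity, the system identity is restored afterwards, and anonymous access exists only when it is explicitly switched on.

```python
from __future__ import annotations

import contextvars
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Identity:
    name: str
    anonymous: bool = False


# The system identity: only used for initialization and request dispatch.
SYSTEM = Identity("SERVER1")

# The identity the current unit of work is running on behalf of.
_current: contextvars.ContextVar[Identity] = contextvars.ContextVar(
    "current_identity", default=SYSTEM
)


class MissingIdentity(Exception):
    pass


def current_identity() -> Identity:
    return _current.get()


class Server:
    def __init__(self, allow_anonymous: bool = False) -> None:
        # Anonymous access is off by default; as with anonymous FTP, it must be enabled.
        self.allow_anonymous = allow_anonymous

    def handle(self, caller: Optional[Identity], handler: Callable[[], str]) -> str:
        """Run handler under the caller's identity: no identity, no request."""
        if caller is None:
            if not self.allow_anonymous:
                raise MissingIdentity("no identity, no request")
            caller = Identity("anonymous", anonymous=True)
        token = _current.set(caller)
        try:
            return handler()
        finally:
            # Restore whatever identity was active before (normally SYSTEM).
            _current.reset(token)
```

The design choice here is that request code never sets the identity itself; only `handle` does, which is a small-scale version of baking it in “so low that people can’t change it”.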

“But the database doesn’t support it...”

Oddly, the problem with most databases does not seem to be technical. It is all about licenses. Historically, the way companies figured out how to make extra money was through licensing users. It’s a great proxy for usage, and usage is a way of sizing the bill to fit larger companies. You set a price for small companies, then add multipliers to get more out of the bigger ones.

We should probably stop doing that now. Or at least stop using ‘users’ as proxies for it, especially if that is one of the root causes of all of our security issues.

Then any statement to the database is also attached to an identity. Always. The database has all of the individual users, and every update is automatically stamped with user and time. No need to rewrite an application version of this anymore. It is there for all rows and all tables, always.
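Since the point is that stamping should be unavoidable rather than re-implemented per application, here is a minimal sketch that emulates, in a thin wrapper around Python’s built-in `sqlite3`, what the post asks the database itself to do: a session cannot exist without an identity, and every insert or update it issues is stamped with who and when. The `IdentitySession` class and the `updated_by` / `updated_at` column names are hypothetical; in the design argued for here, this would live inside the database, not in application code.

```python
import sqlite3
from datetime import datetime, timezone


class IdentitySession:
    """Hypothetical session: no identity, no statement; every write is stamped."""

    def __init__(self, conn: sqlite3.Connection, user: str) -> None:
        if not user:
            raise PermissionError("no identity, no statement")
        self._conn = conn
        self.user = user

    def _stamp(self, values: dict) -> dict:
        # Every row written through this session records who changed it and when.
        return dict(values, updated_by=self.user,
                    updated_at=datetime.now(timezone.utc).isoformat())

    def insert(self, table: str, values: dict) -> None:
        row = self._stamp(values)
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        # Table and column names come from trusted code, never from user input.
        self._conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                           tuple(row.values()))

    def update(self, table: str, key_col: str, key: object, values: dict) -> None:
        row = self._stamp(values)
        assignments = ", ".join(f"{col} = ?" for col in row)
        self._conn.execute(f"UPDATE {table} SET {assignments} WHERE {key_col} = ?",
                           (*row.values(), key))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefs (id INTEGER PRIMARY KEY, theme TEXT, "
             "updated_by TEXT NOT NULL, updated_at TEXT NOT NULL)")
IdentitySession(conn, "alice").insert("prefs", {"id": 1, "theme": "dark"})
IdentitySession(conn, "bob").update("prefs", "id", 1, {"theme": "light"})
print(conn.execute("SELECT theme, updated_by FROM prefs").fetchone())  # ('light', 'bob')
```

Making the audit columns `NOT NULL` means a write that somehow skips the stamping fails outright instead of silently producing an unattributed row.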

“That’s too much processing, some rows need far less...”

Programmers cheat the game in their applications and don’t properly audit some of the changes. Usually, that seems like a great idea right up until someone realizes that it isn’t. Whenever you collect data, you always need a way of gauging its reliability, and that way is always the source of the data. If it comes from somewhere else, you need to keep that attached to the data. If a user changes it, you need to know that too. If a user changes it and it jumps through 18 systems, then if you lose its origins, you also lose any sense that it is reliable. So, it would make far more sense to keep that information during an ETL too, and honor it. It would increase your data quality and certainly make it a whole lot easier to figure out how bugs and malicious crimes happened.
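A minimal sketch of keeping provenance attached as data moves through an ETL chain. `Hop`, `Tracked` and `etl_step` are hypothetical names; the assumption is simply that each record carries its origin (which identity changed it, in which system) plus the list of systems it has passed through, and that every step appends to that list instead of dropping it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Hop:
    system: str    # hypothetical system name
    identity: str  # identity that made the change or ran the transfer


@dataclass(frozen=True)
class Tracked:
    value: object
    origin: Hop                 # where, and by whom, the value was last changed
    hops: Tuple[Hop, ...] = ()  # every system it has passed through since


def etl_step(record: Tracked, system: str, identity: str,
             transform: Callable[[object], object] = lambda v: v) -> Tracked:
    """Move a record into another system, keeping its origin and recording the hop."""
    return Tracked(transform(record.value), record.origin,
                   record.hops + (Hop(system, identity),))


record = Tracked("42 Main St", Hop("crm", "alice"))
for n in range(1, 19):
    record = etl_step(record, f"system{n}", "etl-service")
print(record.origin)     # Hop(system='crm', identity='alice')
print(len(record.hops))  # 18
```

After 18 hops the value still says who originally changed it, which is exactly the information that is normally lost.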

“That’s too much disk space...”

Most large organizations store their data redundantly. I’ve actually seen some types of data stored dozens of times in different places. We really should stop doing that. Eliminating that redundancy would be a macro optimization, saving a huge amount of badly used disk space, as opposed to the micro optimization of skimping on audit data, which lowers the data quality.

“But what about caching...”

I’ve said it before, and I’ll say it again: you should not be rolling your own caching, particularly not adding a read cache when you have writable data. You’re just causing problems. So, realistically, you initialize with a system identity, and then it primes the cache under that identity. If someone builds a real working cache for you, it needs user identities, and it figures out how to weigh those against the system identity’s work to account appropriately for each. It does that both for security and to ensure that it is effective as a cache. If the system identity reads a wack load of data for one user but that data is never used again, then the cache is broken. So, a weight of 100%, for example, would mean that the caching was totally and utterly useless. A weight of less than 0.01% would probably be quite effective. Security and instrumentation, combined.
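A minimal sketch of that accounting, under an assumption the post leaves open: here the “weight” is read as the number of loads done under the system identity divided by the number of reads made by user identities. A cache that has to go back to the store on every read scores 100% (useless); one that loads once and then serves many reads scores close to 0%. `IdentityCache`, `SYSTEM` and the `loader(identity, key)` signature are hypothetical.

```python
from typing import Callable, Dict, Hashable

SYSTEM = "SYSTEM"  # hypothetical name of the system identity


class IdentityCache:
    """Read cache that attributes every access to an identity."""

    def __init__(self, loader: Callable[[str, Hashable], object]) -> None:
        self._loader = loader  # loader(identity, key) fetches from the backing store
        self._data: Dict[Hashable, object] = {}
        self.user_reads: Dict[str, int] = {}
        self.system_loads = 0

    def prime(self, key: Hashable) -> None:
        # Loads always run under the system identity.
        self._data[key] = self._loader(SYSTEM, key)
        self.system_loads += 1

    def get(self, identity: str, key: Hashable) -> object:
        # Reads must carry a real user identity, never the system one.
        if not identity or identity == SYSTEM:
            raise PermissionError("reads must run as a user identity")
        self.user_reads[identity] = self.user_reads.get(identity, 0) + 1
        if key not in self._data:
            self.prime(key)
        return self._data[key]

    def weight(self) -> float:
        """System loads per user read: 1.0 (100%) or more means the cache is useless."""
        reads = sum(self.user_reads.values())
        if reads == 0:
            return float("inf") if self.system_loads else 0.0
        return self.system_loads / reads


hot = IdentityCache(lambda who, key: f"{key} loaded as {who}")
for _ in range(100_000):
    hot.get("alice", "report")
cold = IdentityCache(lambda who, key: f"{key} loaded as {who}")
for n in range(1_000):
    cold.get("bob", f"row{n}")
print(hot.weight(), cold.weight())  # 1e-05 1.0
```

The hot cache does one system load for 100,000 user reads (0.001%), while the cold one reloads on every read and lands at exactly 100%. Because every read is tagged, the same counters also show which identities are actually using what the system identity loaded.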

“But what about ex-users...”

People come and go. Keeping track of that is an organizational issue. They really shouldn’t forget that someone worked for them a few decades back, but if they wanted to, they could just swap to a single ‘ex-employee’ identity. I wouldn’t recommend this myself; I think it makes far more sense that if you return to a company, they reconnect you to your previous identity. But it should be a company-wide decision, not left to the whims of each application. When you start building something new, the ‘group’ of people that can use it should already be established; otherwise, how would you know that you need to build the thing?

“What about tracking?”

If you know all of the computations that an identity triggers and all of the data that they have changed, then you have a pretty powerful way of assessing them. That’s not necessarily a good thing, and it would have to be dealt with outside of the scope of technology. It would not be accurate though, because it is really easy to game, so if a company used it as a performance metric, it would only end up hurting them.


“But I want to roll my own security...”

Yeah, that is the problem with our security. It takes a crazy amount of knowledge to do it correctly, everyone wants to do it differently, and most attempts get it wrong. While it would be fun to code up some super security, in reality, it is always the first functionality that gets slashed when everyone realizes they aren’t going to make the release deadlines. If your job is effectively to rush through coding, then you should mostly stick to straightforward coding. It sucks, but it is reality. It also ties back to the notion that you should always do the hard stuff first, not last. That is, the first release of any application should be a trivial shell that sets the foundations but effectively has no features. Then the first feature release of the application is actually an upgrade. Doing it this way will eliminate a lot of pain and is easier to schedule.

"There are too many vendors, they won't agree to this..."

The industry is notoriously addicted to locking customers in. This type of change would not affect that, so if we crafted it as an ISO standard, and then there was pressure to be compliant, most of them would comply simply because it was good for sales. The downside is that in some cases it would affect their invoicing, but I'm sure they could find another proxy for organization size that is probably easier and cheaper to implement.

Identity, like a lot of other software development problems, is difficult simply because we like to shoot ourselves in the foot. If we could stop doing that, then we could put in place some technologies that would help ensure that the things we build work far better than they do now. Oddly, these problems are not hard to solve, and we basically know how to do them correctly. The issue isn’t technological; it has nothing to do with computers themselves. It is all about people.
