Thursday, August 1, 2024

Massively Parallel Computing

A long time ago I often thought about the computer architectures of the day. They were a little simpler back then.

The first thing that bothered me was why we had disks, memory, and registers. Later we added pipelines and caches.

Why can’t we just put some data directly in some long-term persistence and access it there? Drop the redundancy. Turn it all upside down?

Then the data isn’t moved into a CPU, or FPU, or even the GPU. Instead, the computation is taken directly to the data. Ignoring the co-processors, the CPU roves around persistence. It is told to go somewhere, examine the data, and then produce a result. Sort of like a little robot in a huge factory of memory cells.
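The "robot in a factory of memory cells" idea can be sketched in a few lines. This is purely illustrative; the names `Cell` and `RovingUnit` are invented here, and a real design would be hardware, not Python:

```python
# Hypothetical sketch: instead of loading data into a CPU, a tiny
# processing unit is dispatched to an address and computes in place.

class Cell:
    """One addressable memory cell holding a value."""
    def __init__(self, value):
        self.value = value

class RovingUnit:
    """A minimal 'robot' that travels to cells and computes there."""
    def visit(self, cells, start, count, op):
        # Move to the region [start, start+count), examine the data,
        # and produce a result without copying it anywhere else first.
        region = cells[start:start + count]
        return op(cell.value for cell in region)

memory = [Cell(v) for v in [3, 1, 4, 1, 5, 9]]
unit = RovingUnit()
total = unit.visit(memory, 2, 3, sum)  # sum of cells 2..4
```

The point is the direction of travel: the result leaves the region, but the data never does.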

It would get a little weirder of course, since there might be multiple memory cells and a bit of indirection involved. And we’d want the addressable space to be gigantic, like petabytes or exabytes. Infinite would be better.

Then a computer isn’t time slicing a bunch of chips, but rather it is a swarm of much simpler chips that are effectively each dedicated to one task. Let’s call them TPUs. Since the pool of them wouldn’t be infinite, they would still do some work, get interrupted, and switch to some other work. But we could interrupt them a lot less, particularly if there are a lot of them, and some of them could be locked to specific code sequences like the operating system or device drivers.

If, when they moved, they fully owned and uniquely occupied the cells they needed, it would be a strong form of locking. Built right into the mechanics.
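Occupancy-as-locking could look something like this. It's a minimal sketch, assuming an ownership map per cell; the class and method names are made up for illustration:

```python
import threading

class OwnedCells:
    """Cells where occupancy *is* the lock: a unit must claim a whole
    range exclusively before it can touch any of it."""
    def __init__(self, size):
        self.values = [0] * size
        self.owner = [None] * size       # which unit occupies each cell
        self._guard = threading.Lock()   # protects the ownership map only

    def claim(self, unit_id, start, count):
        # Atomically take ownership of the whole range, or fail.
        with self._guard:
            span = range(start, start + count)
            if any(self.owner[i] is not None for i in span):
                return False
            for i in span:
                self.owner[i] = unit_id
            return True

    def release(self, unit_id, start, count):
        with self._guard:
            for i in range(start, start + count):
                if self.owner[i] == unit_id:
                    self.owner[i] = None

cells = OwnedCells(8)
assert cells.claim("A", 0, 4)       # unit A occupies cells 0..3
assert not cells.claim("B", 2, 4)   # overlap: B is refused
cells.release("A", 0, 4)
assert cells.claim("B", 2, 4)       # now the span is free
```

In hardware, the point would be that there's no separate lock object at all; the unit physically sitting on the cells is the mutual exclusion.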

We couldn’t do this before because CPUs are really complicated, but all we’d need is for each one to be a fraction of that size. A tiny, tiny instruction set, just the minimum. As small as possible.

Then the bus is just mapped into the addressable space. Just more cells, but with a different type of implementation beneath. The TPUs won’t know or care about the difference. Everything is polymorphic in this massive factory.
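The polymorphism is easy to picture: every address answers the same read/write interface, but some cells are plain storage while others are fronts for a device. A toy sketch, with all names invented here:

```python
class RamCell:
    """Ordinary cell backed by local state."""
    def __init__(self):
        self._v = 0
    def read(self):
        return self._v
    def write(self, v):
        self._v = v

class DeviceCell:
    """Cell whose reads and writes go to a device instead of RAM.
    Here the 'device' is just a log, for illustration."""
    def __init__(self, log):
        self._log = log
    def read(self):
        return len(self._log)     # e.g. a status register
    def write(self, v):
        self._log.append(v)       # e.g. writing to an output port

log = []
space = [RamCell(), RamCell(), DeviceCell(log)]  # one flat address space
space[0].write(7)
space[2].write("hello")   # same interface, different implementation
```

A unit walking this address space never branches on what kind of cell it's standing on.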

Besides a basic computation set, they’d have some sort of batch capability. That way they could lock a lot of cells all at once, then move or copy them somewhere else in one optimized operation. They might have different categories too, so some could behave more like an FPU.
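A batch operation, conceptually, is just "own both spans, move everything in one step." A plain-Python stand-in for that hardware primitive, with `batch_copy` invented here:

```python
def batch_copy(cells, src, dst, count):
    """Copy a whole range as one operation. In the imagined hardware,
    the unit would exclusively occupy both the source and destination
    spans for the duration; here a single slice assignment plays that
    role (the source slice is read in full before the write begins)."""
    cells[dst:dst + count] = cells[src:src + count]

memory = [10, 20, 30, 40, 0, 0, 0]
batch_copy(memory, 0, 4, 3)   # replicate cells 0..2 into cells 4..6
```

The win over a per-cell loop is that the lock is taken once over the whole range, not once per cell.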

It would be easy to add more, different types. You would start with a basic number and keep dumping in more. In fact, you could take any two machines and easily combine them as one, even if they are of different ages. You could keep combining them until you had a beast. Or split a beast in two.

I don’t do hardware, so there is probably something obvious that I am missing, but it always seemed like this would make more sense. I thought it would be super cool if instead of trashing all the machines I’ve bought over the decades, I could just meld them together into one big entity.
