There are lots of different ways to decompose large software projects.
A strong decomposition, applied consistently across a system, forms the base of good organization, which makes development smoother and improves quality.
One way to look at the different types of code in any large system is to separate it into end-points and computations.
We’ll start with computations.
If you have a bunch of inputs, you can apply some work to them, and you’ll end up with a bunch of outputs. That is a simple, rather pure, stateless computation.
Way down at the nearly trivial end of the scale, we have operators like addition. You take two integers, add them together, and provide the result. You can go slightly higher up to something like string concatenation, where you join two strings to form a larger one.
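The trivial end of the scale can be sketched as pure functions, here in Python:

```python
def add(a: int, b: int) -> int:
    # A pure, stateless computation: the output depends only on the inputs.
    return a + b

def concat(left: str, right: str) -> str:
    # Same shape, slightly richer data: join two strings into a larger one.
    return left + right
```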
But the same idea applies to much larger groups of instructions. For instance, you might calculate some complex metric like a bond yield from the description of the bond and the current time series around it in a market. Way more information than addition or concatenation, but still the same general idea. It’s just a computation.
It’s stateless, and everything you need to compute successfully comes in from the inputs. Then it either works or it gives you a reason for not being successful.
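That shape can be sketched as a function that returns either a result or a reason for failing. (Current yield, coupon divided by price, is a far simpler stand-in than a real yield-to-maturity calculation, but the shape is the same.)

```python
def current_yield(annual_coupon: float, price: float):
    # A stateless computation: everything it needs arrives as inputs, and
    # it returns either a result or a reason for not being successful.
    # NOTE: current yield is a deliberately simplified stand-in for a
    # real bond-yield computation.
    if price <= 0:
        return None, "price must be positive"
    return annual_coupon / price, None
```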
Described that way, you can see that ‘compiling’ for a language like C or Golang is in itself also just a computation. You give it the ‘source’ and you end up with a binary of some type or a list of very specific errors.
But we can go even higher. You might give some piece of code a URL and some navigation parameters, and it will return a clump of data like JSON. It's still a computation, just one where part of it is distributed. It triggers one or more other machines to do their computations based on the input you sent.
So you could structure the code that calls someone else’s REST API as a series of stateless computations. And if the API is somewhat stateful itself, you can take the output of one call and use it as the input for another, and still keep things mostly stateless. At least each of the calls is stateless, even if the combined interaction is not.
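A minimal sketch of that chaining, where `fetch` is a hypothetical stand-in for a real HTTP GET (e.g. `requests.get`), backed here by an in-memory table so the example is self-contained:

```python
# Hypothetical API responses; a real version would hit the network.
FAKE_API = {
    "/users/42": {"id": 42, "orders_url": "/users/42/orders"},
    "/users/42/orders": {"orders": [101, 102]},
}

def fetch(url: str) -> dict:
    # Stateless from the caller's point of view: URL in, JSON-like data out.
    return FAKE_API[url]

# Each call is stateless; any "state" is threaded through as inputs,
# by feeding the output of one call into the next.
user = fetch("/users/42")
orders = fetch(user["orders_url"])
```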
We can also see that going to some large backend for persistent data, say an RDBMS or NoSQL database, is the same. We might give it an id for something, and it returns all of the data associated with that id, in a particular structure. Still a computation, and still devoid of state on each call.
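An in-memory SQLite database can stand in for a real RDBMS to show the shape; the table and column names here are made up for illustration:

```python
import sqlite3

# An in-memory database stands in for a real backend; the lookup itself
# is still "id in, structured data out" on each call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bonds (id INTEGER PRIMARY KEY, name TEXT, coupon REAL)")
conn.execute("INSERT INTO bonds VALUES (1, 'ACME 2030', 5.0)")

def load_bond(conn: sqlite3.Connection, bond_id: int):
    # Everything needed comes in as inputs; the result (or None) comes out.
    row = conn.execute(
        "SELECT id, name, coupon FROM bonds WHERE id = ?", (bond_id,)
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1], "coupon": row[2]}
```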
That leads us to the definition of an end-point: really, it is any leftover code that is not itself a computation.
For instance, in a backend REST API, there is some routing code that binds the URL to the code you want it to execute. Sort of a computation, but not really. It’s just the end-point mechanics that route incoming requests to the right handlers. You could pull any simple computations out of the mechanisms used to actually trigger the code.
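A toy routing table makes the split visible; the route and handler names are hypothetical:

```python
def double(payload: dict) -> dict:
    # The pure computation, pulled out of the wiring.
    return {"result": payload["value"] * 2}

# The end-point mechanics: bind URLs to handlers.
ROUTES = {"/double": double}

def dispatch(url: str, payload: dict) -> dict:
    # Routing is "sort of a computation, but not really": it is just
    # plumbing that hands incoming requests to the right handler.
    handler = ROUTES.get(url)
    if handler is None:
        return {"error": "not found"}
    return handler(payload)
```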
A GUI might have a bunch of buttons that people can press. As they do, sometimes a ‘context’ builds up. Then at some point the flow leaves the interface end-points and triggers the desired computation. If it’s a web app, the app itself is mostly end-points, and the backend directs each request to the correct computation.
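A stripped-down sketch of that context build-up, with hypothetical names standing in for real UI events:

```python
class FormContext:
    """End-point side: accumulates a 'context' as the user interacts."""

    def __init__(self):
        self.fields: dict = {}

    def on_input(self, name: str, value) -> None:
        # End-point mechanics: each interaction adds to the context.
        self.fields[name] = value

    def on_submit(self, compute):
        # The hand-off: the accumulated context becomes the input
        # to a stateless computation.
        return compute(self.fields)

ctx = FormContext()
ctx.on_input("a", 2)
ctx.on_input("b", 3)
result = ctx.on_submit(lambda fields: {"sum": fields["a"] + fields["b"]})
```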
So, any end-point code is stateful, contextual, configurable, etc., and often quite messy. It is all of the other code necessary to wire things up to users or other computers and keep them running correctly. That can include operational issues, platform issues, configuration, and so on.
And it tends to be the code that runs into the most difficulty.
It’s not that hard to write a computation, and while it might take a bit of work to get a multi-party distributed computation working correctly, it is fairly easy to test its behavior.
It is hard, though, to set up a bunch of end-points and make sure that they are durable enough to withstand the chaos around them. So end-points tend to have a lot more bugs. They are the front line, where all of the problems originate.
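The testability gap is easy to see: a stateless computation can be verified with nothing but a table of inputs and expected outputs, no servers or mocks required.

```python
def concat(left: str, right: str) -> str:
    # A stateless computation under test.
    return left + right

# Table-driven testing: (inputs, expected output) pairs, nothing else.
CASES = [
    (("a", "b"), "ab"),
    (("", "x"), "x"),
    (("x", ""), "x"),
]
for args, want in CASES:
    assert concat(*args) == want
```

An end-point, by contrast, needs its surrounding chaos simulated before you can test it at all.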
So, now if you can clearly separate these two different types of code for a large system, it opens up a lot of good organizational properties.
For instance, you know to put all of your computations into shared libraries so that a lot of other people can use them too. But you also know that the end-points are specific, and tend to be ugly and redundant, so you don’t waste a lot of time trying to figure out how to reuse them. They tend to be single-use. At best, maybe you provide a skeleton or template to get people up and going faster.
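One way that split might look on disk, with all names hypothetical:

```
shared/              # reusable computations, imported everywhere
  pricing.py
  parsing.py
services/            # thin, single-use end-points
  orders_api.py      # wiring: routes, config, auth; calls into shared/
  reports_cli.py     # different wiring, same shared computations
```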
If you lean on that perspective, you realize that minimizing end-points is a great thing, and maximizing the computations is good too.
When we have talked about building up reusable Lego blocks from the ground up, it usually means the computations. When we have talked about just writing things up quickly, it usually means the end-points. And if you have a lot of thin end-points separated from libraries of shared computations, you have a great deal of flexibility in how you deploy things, but also the ability to leverage the bulk of the work you have already done.