Thursday, January 12, 2023

Software Evolution

As software gets more complicated, it would be nice to see newer tech stacks absorb some of the older complexities so that developers no longer have to deal with them. Over the decades, I've seen three places where we could do a much better job of this.

These days, API endpoints follow a nearly universal, repeating pattern. So it would be nice if there were a really cheap way to spin them up that included security, configuration, and monitoring.

That is, all you provide is a table of endpoints, with behaviors turned on or off, and some code matched to each entry that does all of the heavy lifting. There would be some persistent, secure configuration parameters as well that you might need to change occasionally.
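
To make that concrete, here is a minimal sketch, in TypeScript, of what such an endpoint table might look like. EndpointSpec, its behavior flags, and the example entry are all hypothetical, not an existing framework.

```typescript
// Hypothetical sketch: a declarative endpoint table where the platform,
// not the handler, owns security, configuration, and monitoring.
type EndpointSpec = {
  path: string;                         // route the platform exposes
  method: "GET" | "POST" | "PUT" | "DELETE";
  behaviors: {                          // platform features toggled on or off
    auth: boolean;
    rateLimit: boolean;
    audit: boolean;
  };
  configKeys: string[];                 // persistent, secure parameters this entry may read
  handler: (input: unknown, config: Record<string, string>) => Promise<unknown>;
};

// All a team provides is the table; the platform does the rest.
const endpoints: EndpointSpec[] = [
  {
    path: "/accounts/:id",
    method: "GET",
    behaviors: { auth: true, rateLimit: true, audit: true },
    configKeys: ["accounts.db.url"],
    handler: async (input, config) => ({ /* the team's actual business logic */ }),
  },
];
```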

To help people use it properly, there is a set of well-documented patterns for every possible scenario; you just have to follow the instructions.

It includes not just REST endpoints but also background executions (aka batch jobs, repeated or one-off). So it is everything, all in one well-organized place, completely put together so that programmers can concentrate on what their code needs to do, not how to get it set up. It should be so easy that programmers would dread doing it any other way.
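
A rough sketch of what a background entry might look like alongside the endpoints, again with hypothetical names (JobSpec, schedule, behaviors):

```typescript
// Hypothetical sketch: background executions declared in the same place,
// either repeated on a schedule or run once, so nothing lives outside the table.
type JobSpec = {
  name: string;
  schedule: { kind: "repeated"; cron: string } | { kind: "one-off"; runAt: Date };
  behaviors: { retryOnFailure: boolean; alertOnError: boolean };
  run: (config: Record<string, string>) => Promise<void>;
};

const jobs: JobSpec[] = [
  {
    name: "nightly-reconciliation",
    schedule: { kind: "repeated", cron: "0 2 * * *" },   // every night at 2 a.m.
    behaviors: { retryOnFailure: true, alertOnError: true },
    run: async (config) => { /* the actual batch work */ },
  },
];
```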

It would have a facility to browse what is available, with security controls to restrict some of that information. If you were on a dev team, you could see what was running, when, and how often. You could get a list of the configuration parameters available, but not their current values. Basically, after an upgrade, you’d be able to verify that all the pieces were in all of the places that you expected. It would satisfy a lot of documentation requirements as well.

Another piece that would be good is solid data plumbing. I think in the past there were lots of excellent examples, but their price was so high that they were unaffordable most of the time. So the costs here would have to be reasonable.

You want a nice clean inventory screen that lists out all of the persistent data for the organization. Developers can browse. The permissions are easy to open up to match the business needs. The security requirements for the data are baked in. That is, if a business unit can see some data, the developers for that unit can see the definitions for that data, its model, and examples. They may not be able to see the data itself, though.

So, it is both the full inventory of all persistent data and the home of all documentation necessary to be able to use that data in development. It is not tied to vendors or specific tech stacks. So, it includes RDBMSes, NoSQL, file systems, and external references, such as APIs. Anything, anywhere that is persisted. If the organization gets or uses any form of data, even from the outside, it is included.
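
A minimal sketch of what one inventory record might look like; DatasetRecord and its fields are hypothetical, but the point is that the same record shape covers a database table, a file, or an external API:

```typescript
// Hypothetical sketch: one inventory record per persisted dataset,
// regardless of where or how it is stored.
type DatasetRecord = {
  name: string;
  owner: string;                                 // business unit that controls access
  store: "rdbms" | "nosql" | "filesystem" | "external-api";
  location: string;                              // connection string, bucket, or URL
  model: string;                                 // link to the schema or definition
  examples?: string;                             // sample records, if permitted
  visibleTo: string[];                           // who may see the definition
};

const inventory: DatasetRecord[] = [
  {
    name: "customer-orders",
    owner: "retail",
    store: "rdbms",
    location: "postgres://orders-primary/orders",
    model: "docs/models/customer-orders.md",
    visibleTo: ["retail-dev", "retail-analytics"],
  },
];
```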

But the real strength is the third part: plumbing. It also contains the list of all internal jobs that move data from one place to another. It distinguishes between real-time and ETL. It holds all configurations and transformations needed to move all of the data around. From those lists, it explicitly does all of the scheduling and monitoring. You would be able to see all jobs that were executed over months or years, their status, and the number of times they were rerun. You’d know when they were last updated and by whom.
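
A sketch of a single plumbing entry, under the same assumptions; PlumbingJob, its fields, and the example feed are all hypothetical:

```typescript
// Hypothetical sketch: a plumbing entry that names the source, destination,
// transformation, and schedule, so the system itself can run and monitor it.
type PlumbingJob = {
  name: string;
  mode: "real-time" | "etl";
  source: string;                              // dataset name from the inventory
  destination: string;
  transform: (row: Record<string, unknown>) => Record<string, unknown>;
  schedule?: string;                           // cron expression, for ETL jobs
};

const feeds: PlumbingJob[] = [
  {
    name: "orders-to-warehouse",
    mode: "etl",
    source: "customer-orders",
    destination: "warehouse.orders_daily",
    transform: (row) => ({ ...row, loadedAt: new Date().toISOString() }),
    schedule: "30 1 * * *",                    // nightly at 1:30 a.m.
  },
];
```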

If a team needs some new data in their system, they can arrange and schedule it from here. It should be simple, and it should allow that plumbing to come up rapidly. It should also behave like a code repository, keeping a list of all deltas made to its instructions and configurations, and who made them. Everything, so that it can be tested in one environment, then safely moved to another.
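
Continuing the sketch, a change to a feed could be recorded as a simple delta (again, hypothetical names):

```typescript
// Hypothetical sketch: every change to a feed's instructions or configuration
// is recorded, so history can be inspected and changes promoted safely.
type FeedDelta = {
  feed: string;
  changedBy: string;
  changedAt: Date;
  environment: "dev" | "test" | "prod";
  before: Partial<PlumbingJob>;     // the fields as they were
  after: Partial<PlumbingJob>;      // the fields as they are now
};
```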

In this way, we would know that a specific feed was changed months ago, know what those changes were, and know whether or not that made it work better or worse.

It can move data from outside to inside, from databases to files, and from primary dbs to read-only replicas. It is a complete inventory of all data and all the plumbing.

No longer would a system rely on its own custom ETL madness. This would organize all of it in one place.

The final piece is the ability to spin up simple interfaces, quickly.

A complex interface takes a long time and a lot of work to build, but most interfaces needed by most systems are fairly trivial. So, it should be possible to just whip them up quickly.

We can do this now, on two fronts.

The first is having frameworks that allow people to build “super” widgets. You can build them and use them alongside all of the regular widgets, but you design them to handle complex data. That is, you have a widget that displays a complex composite type like user options. Then, instead of just passing it a primitive variable, you give it your composite one. Everything else stays the same. Basically, it is analogous to adding composite types to a programming language, except you are adding super widgets to the GUI.
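
A minimal sketch of the idea in TypeScript; the Widget interface and userOptionsWidget are hypothetical, but they show a widget bound to a composite type rather than a primitive:

```typescript
// Hypothetical sketch: a composite type and a "super" widget that handles it whole.
type UserOptions = {
  theme: "light" | "dark";
  language: string;
  notifications: { email: boolean; sms: boolean };
};

type Widget<T> = {
  render: (value: T) => void;     // draw the current value on the screen
  onChange: (next: T) => void;    // called when the user edits the value
};

// Built and used like any plain widget, just with a richer type.
function userOptionsWidget(onChange: (next: UserOptions) => void): Widget<UserOptions> {
  return {
    render: (value) => { /* lay out theme, language, and notification toggles */ },
    onChange,
  };
}
```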

The second part is dynamic forms. Well, not really “forms” but screens full of widgets that can be arranged and rearranged dynamically. Each widget in the form has a textual description of itself and some type of binding id. If you put the form on the screen, you can give it one big piece of composite data and it will traverse that and find all of the pieces that match the widget ids. In its simplest arrangement, the ids are just unique names that match up. It can be a bit more sophisticated with a hierarchy or scope, but simple names work well.

Then you get the definition of a form and some large clump of data. You draw the form on the screen and give that screen the data. You get another data clump back from that and give it to the backend. You don’t have to go through the data or understand it. If there is other stuff there that doesn’t bind, it doesn’t show up in widgets but does stay around after edits. Instead of being fixated on this or that variable, you only have to worry about this or that clump. Basically, you have lifted your efforts up a level.
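
Here is a small sketch of that binding, with hypothetical Form, FormField, and Clump types; bind pulls out the values the widgets match, and collect merges edits back while leaving unbound fields untouched:

```typescript
// Hypothetical sketch of the binding: a form is a list of widgets with ids,
// the data is one big clump, and matching is done by name.
type FormField = { id: string; label: string; kind: "text" | "number" | "date" };
type Form = { title: string; fields: FormField[] };
type Clump = Record<string, unknown>;

// Pull out just the values the form's widgets bind to.
function bind(form: Form, data: Clump): Clump {
  const bound: Clump = {};
  for (const field of form.fields) {
    if (field.id in data) bound[field.id] = data[field.id];
  }
  return bound;
}

// Merge edited values back into the original clump; unbound fields survive.
function collect(form: Form, original: Clump, edited: Clump): Clump {
  const result: Clump = { ...original };
  for (const field of form.fields) {
    if (field.id in edited) result[field.id] = edited[field.id];
  }
  return result;
}
```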

To really make it flexible, the forms can be recursive, and they can contain super widgets. Even more flexibly, they can also contain buttons, links, and images. All of the widgets can be either editable or read-only, so you can use the same layout for viewing and editing. All of the widgets have a type and a validation description. Forms implicitly know not to return invalid data.

Doing this, the main screen is just a form with a top-level menu, and as the user interacts with it, other forms are loaded. Any data is synced with the backend as needed, so the only two real things a developer needs to do are define a few forms and define the structure, or model, of the data. With that work, the interface is mostly done.

Any complex interactions, basically cross-widget wiring like country/state mechanics, are handled by super widgets. The super widget binds to both the country and the sub-regional breakdown, and it keeps them synchronized on the screen as the user would expect.
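
Reusing the hypothetical Widget interface from the earlier sketch, a country/region super widget might look something like this:

```typescript
// Hypothetical sketch: a super widget bound to both values, keeping them
// consistent as the user changes the country.
type CountryRegion = { country: string; region: string };

function countryRegionWidget(
  regionsByCountry: Record<string, string[]>,
  onChange: (next: CountryRegion) => void,
): Widget<CountryRegion> {
  return {
    render: (value) => { /* draw the country and region dropdowns */ },
    onChange: (next) => {
      // If the chosen region is not valid for the new country, fall back to one that is.
      const regions = regionsByCountry[next.country] ?? [];
      const region = regions.includes(next.region) ? next.region : regions[0] ?? "";
      onChange({ country: next.country, region });
    },
  };
}
```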

If you have some persistent data in a relational database, all you need is a corresponding form to be able to supply a basic GUI for viewing or editing it. If the data is normalized and modeled reasonably, you can automatically generate that form. Mistakes in the schema would have to be worked around by hand.
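
Reusing the hypothetical Form and FormField types from above, generating that form from column metadata could be as simple as this sketch:

```typescript
// Hypothetical sketch: derive a basic form from relational column metadata.
// Assumes a reasonably normalized table; odd schemas still need hand work.
type Column = { name: string; type: "varchar" | "integer" | "date"; nullable: boolean };

function formFromTable(table: string, columns: Column[]): Form {
  const kindFor = (t: Column["type"]): FormField["kind"] =>
    t === "integer" ? "number" : t === "date" ? "date" : "text";
  return {
    title: table,
    fields: columns.map((c) => ({ id: c.name, label: c.name, kind: kindFor(c.type) })),
  };
}
```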

Of course, that won’t work for every screen in every system. The really complicated ones, or the screens that don’t match the persistent representation, will still need a lot of custom work. Super widgets may cover some of it, but some screens may need full customization outside of this framework. Still, it gives developers more time to craft those properly, while the trivial stuff just gets thrown together easily. In urgent situations, you could put up some interim screens for a while, until the correct ones are ready. Then you can throw away the interim ones, since they weren't a lot of work.

Together I think these three ideas would form an incredibly strong infrastructure for most large organizations. You’d end up knowing exactly what is in production and how it is doing. You’d know what data is persisted and where it is all located. And you'd be able to quickly throw together simple interfaces to deal with any urgent problems.

It would not however take away from any of the heavier work that still needs to be done. You’d still need some complex custom interfaces, there would still be data representation problems, some of the data would still need expensive migrations or strange transformations, etc. That work never goes away, but at least it wouldn’t be happening in the middle of a huge disorganized mess. Everything would have a place, and there would be a place for everything.

About the only concern with this might be performance. You wouldn’t want the demands of one side of the business consuming resources for another side. That is an easy fix: you partition them away from each other. In really large companies, they would have independent setups.

Costs are often another big problem, and in the past, we have seen that result in a lot of disorganization. People try to do things cheaply, but the resulting mess usually makes it way more expensive. Big systems require big efforts. You can’t keep cheating the game and expect the results to be any different.

1 comment:

  1. Check out DataHub, it handles a big chunk of the data discoverability and observability, plus some governance.
