Sometimes, the software industry tries to sell products that people don’t actually need and that won’t solve any real problems. For the vendor, they are very profitable and need minimal support.
The classic example is the set of products that falls loosely under the “data warehouse” category.
The problem some people think they have is that if they did not collect data when it was available, they won’t have access to it when they need it later. In a lot of cases, you can’t simply go back and reconstruct it or get it from another source; it is lost forever.
So, people started coming up with a long series of products that let you capture massive amounts of unstructured data now; then later, when you need it, you apply some structure on top and use it.
That makes sense, sort of. Collect absolutely everything and dump it into a giant warehouse, then use it later.
The first flaw, though, is that collecting data and keeping it around for a long time is expensive. You need the storage devices, but you also need a way to tell when the data has expired. Consumer data from thirty years ago may be of interest to a historian, but it is probably useless for most businesses. So, you have all of this unstructured data; when do you get rid of it? If you don’t know what the data really is, then sadly, the answer is ‘never’, which is insanely expensive.
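To make that concrete, here is a tiny sketch (the categories and retention periods are invented for illustration) of why expiry depends on knowing what each record actually is:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention periods, keyed by a known data category.
RETENTION = {
    "order": timedelta(days=7 * 365),    # keep orders for seven years
    "clickstream": timedelta(days=90),   # keep raw clicks for ninety days
}

def can_expire(category: Optional[str], captured_at: datetime) -> bool:
    """A record can only be expired if we know what it is.
    Unclassified data has no retention rule, so the only safe answer
    is to keep it forever, which is the expensive case described above."""
    if category not in RETENTION:
        return False  # unknown data: the answer is 'never'
    return datetime.now(timezone.utc) - captured_at > RETENTION[category]
```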
The other obvious problem is that some of the data you have captured is ‘raw’, and some of it is ‘derived’. It is a waste of money to persist the derived stuff, since you can always reconstruct it from the raw data. But again, if you don’t know what you have captured, you cannot tell the two apart.
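Again, a small hypothetical example: if the raw orders are captured properly, a derived summary like monthly totals can be recomputed whenever it is needed, so there is little point in warehousing it too:

```python
from collections import defaultdict

# Hypothetical raw records: (month, amount) pairs for individual orders.
raw_orders = [("2024-01", 120.0), ("2024-01", 80.0), ("2024-02", 50.0)]

def monthly_totals(orders):
    """Derived data: it can be rebuilt from the raw orders at any time,
    so persisting it alongside the raw feed buys you nothing."""
    totals = defaultdict(float)
    for month, amount in orders:
        totals[month] += amount
    return dict(totals)

print(monthly_totals(raw_orders))  # {'2024-01': 200.0, '2024-02': 50.0}
```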
The bigger problem, though, is that you now have this sea of unstructured data, and thanks to various changes over time, it does not fit together nicely. So the act of putting a structure on top is non-trivial. In fact, it is orders of magnitude more complicated now than if you had carefully sorted out what you needed first.
Changes make it hard to stitch together, and bugs and glitches pad it out with lots and lots of garbage. The noise overwhelms the signal.
It’s so difficult to meticulously pick through it all and turn it into usable information that it is highly unlikely anyone will ever bother. Or if they did for a while, eventually they’d stop. If they didn’t have the time and patience to do it earlier, why would that change later?
So, you're burning all of these resources to prevent a problem you really shouldn’t have, in the unlikely case that someone may later go through a Herculean effort to get value out of the data.
If there is a line of business, then there are fundamental core data entities that underpin it. Mostly, they change slowly, but you will always have ongoing work to keep up with any changes; you can’t escape that. If you did reasonable analysis and modelled the data correctly, then you could set up partitioned entry points for all of the data and keep teams around to stay synced with any gradual changes. In that case, you know what data is out there, you know how it is structured and what it means, and you have the appropriate systems in place to capture it. Your IT department is organized and effective.
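As a rough, hypothetical illustration of what a ‘partitioned entry point’ might look like: each core entity gets an explicit, owned schema, and anything that doesn’t match a known entity is rejected rather than dumped somewhere unstructured ‘just in case’:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical core entity for an order-taking line of business; its shape
# changes slowly and is owned by a team that tracks those changes.
@dataclass
class Order:
    order_id: str
    customer_id: str
    placed_on: date
    total: float

# One explicit entry point per core entity; everything arriving here is
# already known, structured, and understood.
ENTRY_POINTS = {
    "orders": Order,
    # "customers": Customer, "products": Product, ...
}

def ingest(entity, payload):
    """Reject anything that does not match a known core entity, rather than
    stashing it away unstructured 'just in case'."""
    schema = ENTRY_POINTS.get(entity)
    if schema is None:
        raise ValueError(f"unknown entity: {entity}")
    return schema(**payload)
```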
The derived variations of this core data may go through all sorts of weird gyrations, but the fundamentals are easy enough to understand and capture. So, if you are organized and functioning correctly, you wouldn’t need an insurance technology to double up the capture ‘just in case’.
Flipped around, if you think you “need” a data warehouse in order to capture stuff that you worried you might have missed, your actual problem is that your IT department is a disorganized disaster area. You still don’t need a warehouse; you need to fix your IT department. So someone selling you a warehouse as a solution to your IT problems is selling you snake oil. It ain’t going to help.
Now, it is possible that there is a lag between changes to the business and the ability to update how the data is collected. You might think that a warehouse solves that, but the same argument applies.
The changes aren’t frequent enough and really aren’t surprises, so the lag is caused by other internal issues. If you have an important line of business, and it changes every so often, then it would make sense (and be cheaper) to just have a team waiting to jump into action and keep up with those changes. If they are not fully occupied between changes, that is not a flaw or a waste of money; they are firefighters and need to be on standby. Sometimes you need firefighters, or a little spare capacity, or some insurance. That is reasonable resource management. Don’t site and build your house on the beach based on low tides; do it based on at least king tides or even tsunamis.
There are plenty of other products sold to enterprises that are similar. If you look at what they do, and you ask reasonable questions about why they exist, you’ll often find that the answers don’t make any real sense. The industry prefers to solve these easy problems on a rotating basis.
There will be wave after wave of questionable solutions to secondary problems that ultimately just compound the whole mess. They make it worse. Then, as people realize that they don’t work very well, a whole new wave of uselessness hits. So long as everyone is distracted and chasing the latest wave, they will be too busy to question the sanity of what they are implementing.