If two people can edit the same data concurrently, there is a serious problem, as was explained to me when I first started working. Someone way back then called it the dirty write problem; these days it is more often called the lost update problem.
Bob opens up the data and starts to edit. But then he goes to lunch.
During lunchtime, Alice opens up the same data, edits it, then saves it.
After lunch, Bob returns, makes a couple more changes, and then saves his work.
Bob's write will obliterate Alice’s changes. Alice did not know that Bob was also working on the data. Bob did not know that Alice had changed the data.
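To make the sequence concrete, here is a minimal Python sketch of it. It assumes a toy in-memory record and a save that blindly overwrites whatever is there; the `store`, `load`, and `save` names and the field values are made up for illustration and do not come from any real system.

```python
# Hypothetical in-memory "database": one shared record, no concurrency control.
store = {"title": "Q3 report", "owner": "unassigned"}

def load():
    # Each editor gets a private copy of the record, like opening a form.
    return dict(store)

def save(record):
    # Blind last-writer-wins save: the whole record is overwritten.
    store.clear()
    store.update(record)

bob = load()                          # Bob opens the data, then goes to lunch
bob["title"] = "Q3 report (draft)"

alice = load()                        # Alice opens the same data at lunchtime
alice["owner"] = "Alice"
save(alice)                           # Alice's change is stored

save(bob)                             # Bob saves his stale copy after lunch

print(store)                          # Alice's "owner" edit is no longer there
```

Nothing fails and nobody is warned. Bob's copy still carried the old `owner` value, so saving it quietly put that value back.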
This problem exists in far too many GUIs to count. Way back, people would actually put in code to solve it, but these days that is considered too complicated. Too long, didn’t code. Alice will just have to remember to check later to see whether her changes actually persisted. Lucky Alice. Bob should probably check all of the time too. Maybe they should both write down their changes on paper first...
One way to solve this is to have Bob lock the file for editing. Alice will find out at lunchtime that she cannot edit. If her changes are urgent, that is a big problem, since Bob might be having a very long lunch. Alice will be upset.
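A rough sketch of the locking approach, assuming a single in-memory record and a lock that is released on save. `LockingStore`, `LockedError`, `checkout`, and `save` are hypothetical names for illustration, not a real library.

```python
class LockedError(Exception):
    """Raised when someone else holds the edit lock."""

class LockingStore:
    def __init__(self, record):
        self._record = dict(record)
        self._holder = None           # user currently holding the lock, if any

    def checkout(self, user):
        # Only one user at a time may open the record for editing.
        if self._holder not in (None, user):
            raise LockedError(f"locked by {self._holder}")
        self._holder = user
        return dict(self._record)

    def save(self, user, record):
        if self._holder != user:
            raise LockedError(f"{user} does not hold the lock")
        self._record = dict(record)
        self._holder = None           # saving releases the lock

store = LockingStore({"title": "Q3 report", "owner": "unassigned"})

bob = store.checkout("bob")           # Bob locks the record, then goes to lunch
try:
    store.checkout("alice")           # Alice tries at lunchtime
    alice_blocked = False
except LockedError:
    alice_blocked = True              # ...and is refused

bob["title"] = "Q3 report (draft)"
store.save("bob", bob)                # Bob finally saves, releasing the lock
alice = store.checkout("alice")       # only now can Alice get in
```

No work is lost, but Alice waits for as long as Bob's lunch lasts.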
Another way to solve the problem is to have a warning pop up when Bob starts to make his post-lunch edits, saying the underlying data has changed. It would give Bob a screen to deal with it. It’s a bit tricky since Bob has already made changes, so any merge tool would be three-way at that point: the original, Bob’s version, and Alice’s version. This might hurt Bob’s brain, and it isn’t the easiest stuff to code either.
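For a sense of what that three-way comparison involves, here is a minimal field-by-field sketch. It assumes the data is a flat dictionary and that `None` means a field is absent; `merge3` and the sample fields are hypothetical, and a real merge screen would have far more to handle.

```python
def merge3(base, mine, theirs):
    """Field-level three-way merge; returns (merged, conflicting_keys)."""
    merged, conflicts = {}, []
    for key in sorted(base.keys() | mine.keys() | theirs.keys()):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            value = m                 # both sides agree (or neither changed it)
        elif m == b:
            value = t                 # only the other side changed it
        elif t == b:
            value = m                 # only my side changed it
        else:
            conflicts.append(key)     # both changed it differently
            value = m                 # keep mine provisionally; a human must decide
        if value is not None:         # assumption: None means "field absent"
            merged[key] = value
    return merged, conflicts

original = {"title": "Q3 report", "owner": "unassigned", "status": "new"}
bob = {"title": "Q3 report (draft)", "owner": "unassigned", "status": "draft"}
alice = {"title": "Q3 report", "owner": "Alice", "status": "review"}

merged, conflicts = merge3(original, bob, alice)
print(merged)
print(conflicts)
```

The fields that only one person touched merge cleanly. The `status` field, which both changed, is the part that ends up on Bob's screen.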
A variation on the above is to just error out and lose Bob’s changes. Bob will be upset. It was a long lunch, so now he can’t remember what he did earlier.
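Detecting that the data changed underneath Bob is usually done with some kind of version check. A minimal sketch, assuming a simple integer version counter on the record; `VersionedStore` and `StaleWriteError` are made-up names, not a real API.

```python
class StaleWriteError(Exception):
    """Raised when a save is based on an out-of-date copy."""

class VersionedStore:
    def __init__(self, record):
        self._record = dict(record)
        self._version = 1

    def load(self):
        # Hand out a copy plus the version it was read at.
        return dict(self._record), self._version

    def save(self, record, loaded_version):
        # Refuse the write if someone else saved since this copy was loaded.
        if loaded_version != self._version:
            raise StaleWriteError(
                f"loaded version {loaded_version}, current is {self._version}"
            )
        self._record = dict(record)
        self._version += 1
        return self._version

store = VersionedStore({"title": "Q3 report", "owner": "unassigned"})

bob, bob_version = store.load()       # Bob opens the data, then goes to lunch
alice, alice_version = store.load()   # Alice opens it at lunchtime
alice["owner"] = "Alice"
store.save(alice, alice_version)      # accepted; the version moves on

bob["title"] = "Q3 report (draft)"
try:
    store.save(bob, bob_version)      # Bob's copy is now stale
    bob_rejected = False
except StaleWriteError:
    bob_rejected = True               # Alice's work survives, Bob's does not
```

The check itself is only a few lines. What to do with Bob's rejected edits afterwards is the expensive part.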
In that sense, locking Alice out but giving her some way to create a second copy seems better. But then you also need to add some way to reconcile the two copies after lunch, so we are back to a diff screen again.
Because it is complicated and there are no simple solutions, it is ignored. Most systems are optimistic, assuming that overlapping edits don’t occur on shared data. They do nothing about it. And if work is lost, most people will forget about it or at least not exactly remember the circumstances, so there are no valid bug reports.
Occasionally I will run across a rare programmer who will realize that this is a problem. But then I usually have to explain that because we are working with a shoestring budget, it is best if they just look away. Software, it seems, is increasingly about looking away. Why solve a problem if you can pretend you didn’t see it?
This is what CRDTs (and operational transforms) were created to solve!