In case of conflicts, CouchDB assumes the most modified branch of the document (i.e., the document with the higher revision number) is the winner. You can resolve the conflict by choosing a different branch/revision manually, but you can also choose to not do anything.
Yes, it picks a winner, which it shows on all machines (so all machines that have seen the same changes will pick the same winner). But it also keeps conflicts around, so users who care about them can correctly resolve them.
Sometimes the winner it picks is not what the users want. That could be surprising, but it is correct because it really is a user-level conflict.
(Now, a user may very well add a timestamp field to the document, hope NTP works well, and resolve the conflicts based on that if they appear, but CouchDB tries not to make such assumptions on behalf of the user.)
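A timestamp-based resolver along those lines might look like the sketch below. It assumes each revision body carries an application-defined `updated_at` field (a hypothetical name) and that the conflicting revisions have already been fetched, for example by reading the document with `conflicts=true` and then each revision listed in `_conflicts`. The `resolve_by_timestamp` helper and the sample data are made up for illustration; the helper only builds a payload in the shape `_bulk_docs` accepts and never talks to a server.

```python
def resolve_by_timestamp(revisions):
    """Keep the revision with the latest updated_at; tombstone the rest.

    `revisions` is a list of full document bodies, one per conflicting
    leaf revision, each with _id, _rev and an updated_at value.
    """
    # ISO 8601 strings in the same format and timezone sort chronologically.
    keep = max(revisions, key=lambda doc: doc["updated_at"])
    tombstones = [
        {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}
        for doc in revisions
        if doc["_rev"] != keep["_rev"]
    ]
    return keep, {"docs": tombstones}


# Made-up example data: two conflicting leaves of the same document.
revisions = [
    {"_id": "note", "_rev": "2-aaa", "updated_at": "2013-05-01T10:00:00Z", "text": "A"},
    {"_id": "note", "_rev": "2-bbb", "updated_at": "2013-05-01T10:05:00Z", "text": "B"},
]
keep, payload = resolve_by_timestamp(revisions)
```

The payload would then be POSTed to the database's `_bulk_docs` endpoint, leaving the kept revision as the only live leaf. This is only as trustworthy as the clocks that wrote `updated_at`.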
Really? That's nearly undifferentiated from just picking one at random. How is, "whoever hits the queue most often" a useful deterministic resolution strategy? I mean I guess it's functionally no worse than wall-clock time or something, but still kinda funny. :-)
It is not random. It consistently picks the same document on all servers that have seen the same changes. By default it picks the one with the most changes. That consistency (the "same" part is very important) means that if you replicate and bring in some conflicts, both sides will show the same state. So after replicating A to B and B to A, you won't see document 1 as the winner on A but document 2 on B. They'll both pick 1, or both pick 2, so both settle on the same state.
Also, it doesn't delete or remove conflicting siblings; it is very good about not doing that to user data. Only users know exactly how to resolve particular conflicts.
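To make the "same winner on every server" point concrete, here is a rough Python sketch of the default rule as the CouchDB guide describes it: the longest revision history wins, and ties are broken by comparing `_rev` values in ASCII order. It is a simplified illustration, not CouchDB's actual code. The `pick_winner` name is made up, the number before the dash stands in for the length of the revision history, and deleted revisions are ignored.

```python
def pick_winner(revs):
    """Return the winning _rev from a list of conflicting leaf revisions."""
    def sort_key(rev):
        # "2-de0e..." -> (2, "de0e..."): position first, then the hash
        # compared as a plain string (ASCII order).
        pos, _, digest = rev.partition("-")
        return (int(pos), digest)
    return max(revs, key=sort_key)


# The two revisions used as the example in the CouchDB guide.
a = "2-de0ea16f8621cbac506d23a0fbbde08a"
b = "2-7c971bb974251ae8541b8fe045964219"

# Order of arrival does not matter, so every replica agrees.
assert pick_winner([a, b]) == pick_winner([b, a]) == a
```

Because the result depends only on the set of revisions and not on which node saw which one first, two servers that have replicated in both directions end up showing the same winner.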
> How is, "whoever hits the queue most often" a useful deterministic resolution strategy?
It's a deterministic resolution strategy, and is thus useful.
> I guess it's functionally no worse than wall-clock time or something,
Wall-clock time is not deterministic; therefore it's far worse.
When dealing with distributed systems, deterministic processes are critical. Multiple systems all being right is awesome, but multiple systems being wrong in different ways is a nightmare. :)
Is it deterministic in a way that's useful? From the perspective of the end-user it's going to appear random because they don't control the system environment where "highest revision number" can mean something useful to them. In fact, the CouchDB guide seems to allude to this when it talks about not relying on this scheme for complex conflict resolution.
Two nodes split. A and B. Say there are 100 updates to A and 500 updates to B. The split heals, the system picks B because 500 > 100, but the write you actually want to dominate is A. The user can't control which replica gets hit more often, or when a split happens, so while this might be deterministic inside the DB it is semantically random from the user's perspective. So the system can make the same choice on all replicas, assuming it can guarantee it has seen all replicas, which I guess allows you to push merging management to each replica instead of requiring an intermediate coordination replica and then re-publishing the merge state to the replicas? So there's a system optimization benefit there.
But consider if the system did pick a winner at random, how would this look any different to the user? The user doesn't necessarily know if A or B should be picked.
Deterministic behavior is really important, but it seems like it really only looks non-random to the end user when deterministically picking a least upper bound for converging a join-semilattice, or when all operations on the data are commutative or idempotent, doesn't it?
Yes, because it lets the system maintain a consistent state across distributed nodes.
> From the perspective of the end-user its going to appear random
...but consistent. If every node picks a random revision on conflict, then when multiple clients try to continue editing, they'll end up increasing the conflicts.
> The split heals, the system picks B because 500 > 100, but the write you actually want to dominate is A. The user can't control which replica gets hit more often, or when a split happens, so while this might be deterministic inside the DB it is semantically random from the user's perspective.
Yeah, but what happens if B picks B, and A picks A? Now the write you're looking for is either there or not there, depending on which node you're talking to.
> how would this look any different to the user?
Everything is going to look random to the user, no?
It is only that random if the systems remain isolated for a long time. Since CouchDB requires you to send the last revision number when updating a document, if the systems are live replicating between themselves, the guy who is hitting the queue more rapidly will be forced to fetch the latest winning revision every time before hitting the queue (and that may be a revision from a different guy). This will give him time to think about the revision he just received, perhaps examine the document linked to that revision number, see if everything is in place, perhaps merge changes himself manually... all that before updating the document in the database.
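The revision check described above can be modelled with a toy in-memory store. This is only a sketch: `TinyStore`, `Conflict`, and the counter-based revision strings are hypothetical, and real CouchDB answers a stale `_rev` with an HTTP 409 rather than a Python exception.

```python
class Conflict(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class TinyStore:
    def __init__(self):
        self.docs = {}  # doc id -> (rev, body)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"stale revision {rev!r}, current is {current_rev!r}")
        # Toy revision string: just a generation counter plus a fixed suffix.
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        new_rev = f"{generation}-toy"
        self.docs[doc_id] = (new_rev, body)
        return new_rev


store = TinyStore()
rev1 = store.put("doc", {"n": 1})
rev2 = store.put("doc", {"n": 2}, rev=rev1)      # another writer got there first
try:
    store.put("doc", {"n": 3}, rev=rev1)         # fast writer still holds rev1
except Conflict:
    latest_rev, latest_body = store.get("doc")   # forced to re-read first
    store.put("doc", {"n": 3}, rev=latest_rev)
```

The re-read in the `except` branch is the point where the fast writer gets a chance to look at what changed before writing again.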
I don't see how this could be better for a deterministic approach. The recommendations are always that the developer must implement a saner way to resolve the conflicts.
In the CouchDB world, however, I have the impression that conflict resolution is ignored most of the time, so we are left with this.
(I say this based on what I do, other people's code I read on the internet, and the concerns of the CouchDB core developers about educating users and developers to set up saner conflict resolution approaches themselves.)
I don't think so. Though "most recently changed" is pretty useless too. Clocks won't be synchronized in a distributed system, and even on a single machine, if it's set up to use something like NTP, the time won't be monotonically increasing, since the clock-sync mechanism can move time both forward and backward.
"Each revision includes a list of previous revisions. The revision with the longest revision history list becomes the winning revision. If they are the same, the _rev values are compared in ASCII sort order, and the highest wins. So, in our example, 2-de0ea16f8621cbac506d23a0fbbde08a beats 2-7c971bb974251ae8541b8fe045964219."