-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CBL 1.1 (iOS) Docs not syncing when I think they should #723
Comments
This may be a Sync Gateway issue; paging @adamcfraser ... |
Unfortunately it's not that simple. It's also possible that when CBL sends the saved checkpoint, SG doesn't correctly determine which changes are newer than it. (That logic is surprisingly complex.) Resetting the replication on the client side means no checkpoint is sent. |
...Or the checkpoint is wrong on the client-side because of #728 |
I just tried CBL 1.1 1621267 and its happening noticeably less, but still happening. Previously it was really easy for me to reproduce in our app, but now it takes a bit more effort. So, definitely better, but I'm still needing to call |
Adding some more info with lastSequence and remoteCheckpointDocID since #728 got me thinking. If there is anything in here that is useful, or anything else I could add that might help, please let me know:
|
Not sure if this is helpful, but I cherry-picked some log lines from SG for a doc ID that I knew was not synced properly: 555e73a54a6f731321b60000 in this case, and then based on lastSequence in CBL, I think I may have spotted something: https://gist.github.com/joenoon/36b36c9307c922d6630c When the CBL pull replication thinks its done, its lastSequence was 637. In SG, 637 seems to be: But above After this, I "reset"/restarted the pull replicator, and then lastSequence becomes "637:616". I don't know exactly what that means, but it seems the missing sequences were the ones with the colons. I can see the doc that was missing (555e73a54a6f731321b60000) come through on Seq:637:616:
|
@adamcfraser could you help checking this, especially the last two comments from @joenoon? |
At first glance it looks like the user placeholder sequence (which should trigger backfill of the newly added channel for the user) isn't being sent correctly, and so the backfill (of sequence 616 in this case) isn't happening. I haven't been able to reproduce yet - still working on it. @joenoon If you are able to share the full SG logs that your last gist is excerpted from, that may help. |
Or the backfill of the 616 (result to 637:616) did happen but somehow not be sent in the changes feed along with 637. I'm just guessing as after resetting the replicator to pull everything, 637:616 did arrive. |
@adamcfraser @pasin I just recorded full logs on both iOS and SG from a clean start (so ignore previous seq numbers and ids): CBL iOS 1.1 1621267 SG: https://gist.github.com/joenoon/820c98f056b72c73e423 Comments are inline and can be found by searching "JOECOMMENT"... the comments are the same in both logs to try and make it easier to understand the state in each. Please let me know if I can make any changes and re-run a new set of logs. |
Thanks very much for the detailed logs @joenoon - digging through those now. |
I looked into both log files and captured some logs (by components and timestamp) that might be useful. |
Waiting confirmation that this is actually a SG issue, which it currently appears to be. |
@snej @adamcfraser @pasin @zgramana @tleyden I hope this helps: https://github.com/joenoon/ToDoLite-iOS/tree/issue_723 I've taken over the Logout button to add some steps: Always start with an empty SG/no-data. Log in with Facebook the first time - once you're in you shouldnt need to do it again. The buttons in the Logout menu are ordered the way you should go through them. Some steps hit SG, so you should watch SG logs to make sure its finished working before proceeding. The sync function is slightly different - lists aren't automatically routed to anyone, just given a channel. Want docs route the lists -- this is to match the issue I'm having in my app. Here are what the steps do:
Here is my log as an example:
|
Thanks a lot for the test app. I could use the app to reproduce the issue. |
Thanks @joenoon. Based on the unit testing I've done thus far, this looks like a race condition in Sync Gateway between the _changes feed and heavy write-side access() load. I'll give your modified ToDoLite a try today to confirm. I also want to try it against earlier SG builds to find out whether this issue has been recently introduced. I suspect it's a long-standing issue, but it's possible that it's being exacerbated by recent enhancements to handle Couchbase Server feed latency. There isn't an obvious quick fix. I've filed Sync Gateway issue couchbase/sync_gateway#880 to track. |
Great work, Adam! Based on that, should we close this issue in favor of the one you opened? |
Yes - we can close this and track on the SG issue. |
CouchbaseLite iOS 1.1 (25f3967)
SG 1.1.0-16
CB 4.0.0-dp
The pull replicator seems to think its done before it actually is, and is missing docs. I just found out if I do this:
... it catches up. So I can hopefully come up with a short-term workaround using that periodically. Because I can get the replication to correct using that reset switch, I'm guessing this issue is only in CBL and doesn't have anything to do with SG.
Background info: My iOS app is creating "Want" docs, which the SG sync function uses to grant access to the user to the doc id they are requesting. For example: If I need docId place1, I create a Want{want_id:"place1", want_type:"Place", user_id:"user1"} and the sync function does: access("user1", "Place-place1"). a Place is always in channel("Place-"_id), so this all should line up. (more details here https://gist.github.com/joenoon/7cc8e0aa2db7d51cc9d2 )
So when the user "Wants" several places, those Want docs get synced up to SG, processed, and then Place docs should get Pulled back to iOS. Some of that is happening correctly, but this seems to trigger the Pull replicator just thinking its done and not trying for anything else, leaving some docs unsynced.
Here are logs throughout the process:
The first run. At the bottom of the cbl log you will see several ** BROKEN ** log lines which is my periodic check for expected docs telling me they aren't available. I've been trying to take one of the BROKEN ids and trace it through all the logs.
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-cbl_before_restart_repl-txt
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-sg_before_restart_repl-txt
This is after just killing the old replicators and re-creating them (it doesnt fix the problem)
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-cbl_after_reset_repl-txt
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-sg_after_reset_repl-txt
This is after repl.customProperties = @{@"reset": @yES}; [repl restart];
THIS triggered all the missing docs to get synced.
You can see this by the ******* document:didChange lines I'm logging
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-cbl_after_restart_repl-txt
https://gist.github.com/joenoon/a32fc8db1e9c0a70e545#file-sg_after_restart_repl-txt
The text was updated successfully, but these errors were encountered: