Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COGNITO] [SYNC] Unhandled (by SDK) conflict breaks synchronization #391

Closed
sebas86 opened this issue Jun 20, 2016 · 7 comments
Closed

[COGNITO] [SYNC] Unhandled (by SDK) conflict breaks synchronization #391

sebas86 opened this issue Jun 20, 2016 · 7 comments
Labels

Comments

@sebas86
Copy link

sebas86 commented Jun 20, 2016

Hi, I have one another, much bigger problem with synchronization using Cognito Sync with Unity.

After few synchronizations I can't save any new data in cloud. When I check communication between client app and server I see many 409 Conflict responses but in application I gets only synchronization success event (Dataset.OnSyncSuccess). Also in server response I get always this same message:

{
    "message": "Current SyncCount for: SomeKeyName is: 2 not: 1"
}

My code handles conflicts just like in provided sample. Conflict code also was tested and it just works.

This bug occurs more often when many unrelated keys is saved in random order and time. After saving all data under one key as simple JSON string this bug occurs less often, but still occurs.

Application restart doesn't solve problem, error disappear only if I clean application data.

Problem occur in Unity Editor and on iOS devices.

@pfleeter
Copy link

pfleeter commented Jun 20, 2016

I see this issue very frequently in the wild, as well. I believe the issue is that the local device sqlite db has a record that is 1) modified, 2) has the same value as the remote record, and 3) has a different sync count.

The conflict detection code in RunSyncOperation ignores the differing SyncCount's:

// if conflict, prompt developer/user with callback
List<SyncConflict> conflicts = new List<SyncConflict>();
List<Record> conflictRecords = new List<Record>();
foreach (Record remoteRecord in remoteRecords)
{
    Record localRecord = Local.GetRecord(IdentityId,
                                          DatasetName,
                                          remoteRecord.Key);
    // only when local is changed and its value is different
    if (localRecord != null && localRecord.IsModified
        && !StringUtils.Equals(localRecord.Value, remoteRecord.Value))
    {
        conflicts.Add(new SyncConflict(remoteRecord, localRecord));
        conflictRecords.Add(remoteRecord);
    }
}

I also only see this issue on iOS (not on Android), which seems odd since I don't immediately notice any platform specific code related to this.

As an aside, it would be nice if this condition was not handled as a conflict at all, since the local device is trying to achieve what is already true on the remote store, albeit at a different sync count. I have considered trying to patch this by removing the modified flag, updating the sync count, and leaving the value alone.

@sebas86
Copy link
Author

sebas86 commented Jun 23, 2016

After latest tests we also gets this same error on Android devices. Problem is almost 100% to reproduce just by setting some new values in Dataset during synchronization – next synchronization should just fail with 409 conflict and block all future synchronizations.

@pfleeter
Copy link

I have been reading through the SDK's code some more, and my initial comment now seems off base. It looks like in the case where an identical value is reported from the server with a different sync count than a locally modified record, it should just stomp the local record (resetting the is modified flag and upping the sync count).

It is interesting that you have traced a problem to updating the local record during the sync. There is code in the sync operation specifically to handle that case. In our application, I implemented my own locking code so that we do not change values during synchronizations (I didn't know the safeties were already attempted in the SDK itself). Provided that my locking code is working the way I think it is, we would not have a problem of updating a local record during a sync. But, we do see the "Current sync count is x not y" error.

I'm confused about "record sync counts" vs "dataset sync counts" and have asked a question on the AWS forums about that. My suspicion is that somehow the dataset sync count tracking can become wrong, so that updates are not seen by a client. Suppose a record is updated, but when a client asks for updates it is told there are none. It could then have modifications to an out of date record (out of date by sync count) because the "stomp" mentioned above would not happen. If it tries to push that change, it will be told that it is modifying the wrong sync count.

@sebas86
Copy link
Author

sebas86 commented Jun 27, 2016

I implemented my own locking code so that we do not change values during synchronizations

Good joke… I replace my custom DB with Cognito Sync and I can't just disable data save to Dataset as long as is not synchronized… This case is so obvious it must be handled by SDK. I also get suggest from AWS support that I can't call any next synchronization request as long as previous called isn't done with success or failure response – obviously I handle in my code in that way and this doesn't help at all, also documentation doesn't prohibit this use case.

@pfleeter
Copy link

I'm still trying to figure out why 1) I see this sync conflict so much more (maybe only) on IOS than on Android, and 2) how it can happen at all.

It seems like the only way it could happen (and not fix itself, which I know is not happening because my SyncFailure handler is being called back) is with a partial write of the records that are pushed.

For instance, if two records are pushed, but locally only one of the records' sync counts is updated (supposed to be done by Local.ConditionallyPutRecords) AND the dataset sync count is updated, then on a subsequent synchronize, the record that did not update will be considered out of date. (The ListUpdates call will return nothing because the Dataset sync count is current, but it can't push a still-marked-as-modified record because it has a record sync count that is not in agreement with the service.) The attempted fix logic in the DataConflictException handler will not run if there is any current record.

But, the dataset sync count update occurs after the record sync counts should be updated. I'm now suspicious that the reason this problem is showing on iOS and not Android, is that they have different versions of sqlite. The Android one is bundled as a .so file in the Assets/Plugins/Android folder. The iOS one (I believe) is no newer than 3.8.10.2 according to this wiki and is part of iOS. I do not know what version of sqlite is used on Android. Maybe there is something asynchronous about the db writes on iOS? Maybe this opens a window of possibility that some records and the dataset sync count are updated, while others are not?

@pfleeter
Copy link

I found a potential hole in my own sync locking scheme. It may be that my users are also seeing a race condition where (on iOS mostly, for some reason) the dataset is being put to during the synchronization, particularly during the call to Local.ConditionallyPutRecords.

There is a problem with the DataConflictException handling when attempting to put multiple records to the service. In my case, I have two records that always update together. In the case where one of them happens to be updated before the call to Local.ConditionallyPutRecords and one updates after (the sync and the local modifications occur in two different threads), it will cause one of the records to be updated with the remote metadata for sync count, and the other to not be.

On the next synchronization, both will be modified again. But, the maxPatchSyncCount will be equal to the record that Local.ConditionallyPutRecords was able to update. The other record will be "behind", and sync retry in the DataConflictException will fail repeatedly until it is out of retry attempts.

I believe the fix for this is to set the sync count to minPatchSyncCount before attempting a retry. This will probably cause a conflict, and means that the conflict handler should be prepared to resolve things with something more complicated than "take remote". (Basically, there isn't really a conflict, it just looks like it since the device has created a conflict with itself).

@klaytaybai
Copy link
Contributor

Closing due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants