DataStore would not attempt to re-establish subscriptions when subscription timeout or handshake error occurred #6488
Comments
Hi @nubpro - Just discussed with the team. First, can you validate that this is still an issue with our recent changes? Second, we need to check internally on reducing or customizing this timeout window. Also, regarding your suggestion, after the reconnection happens on the client a sync process starts, so this should ensure that no data is lost. Can you confirm that this is the case? |
Yes, I believe this issue is still present on the latest Amplify. Currently DataStore would only restart the sync process if:
DataStore doesn't restart for the following case:
We should all understand that connectivity drops are common occurrences and currently, DataStore is simply too fragile to recover from these scenarios. I would also want to highlight that by constantly restarting the subscriptions, we are hitting I hope this answered some of your questions. To get a response after 17 days is a bit disappointing honestly. |
Are you still experiencing this issue? I did the same thing you mentioned (iPad > Developer setting > Network Link Conditioner > Enable Bad network) and everything works as expected. Have you tried the latest release? |
This is fairly difficult to reproduce but it does happen. I don't have any more details to share as we are moving away from using websockets. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because of inactivity. Please open a new issue if you are still encountering problems. |
I am still experiencing this due to my wifi connection intermittently dropping and reconnecting.
|
It's confusing because one of the main reasons for DataStore is online/offline syncing, but as soon as I am offline I get this error. |
@laclance Are you able to consistently get this behavior to happen, and if so, would you be able to share some steps? We are still having issues trying to reproduce the error. Also, if you could share any logs ( |
This is relatively difficult to reproduce as it depends on network stability. Would it be plausible to force the problem by commenting out this chunk of code? |
@nubpro Good thought. Testing locally, that does seem to throw this error consistently, but it doesn't necessarily give me much confidence about how the library would actually handle this scenario. With that said, enabling logging does show that PubSub is attempting to reconnect multiple times on connection timeout since I'd be interested to see what the logs look like without commenting out any code. If you could try to catch the error when attempting to reproduce, that'd be great! Logs
|
Yes, I just have to switch off my internet to get the error.
|
@laclance Could you post more of the logging output? I'm mainly interested in what happens leading up to when the error occurs. And be sure to remove any sensitive data that might be contained in the logs. |
Logs
|
Maybe it has something to do with Cognito? |
I basically spent my entire day trying to prove there's indeed a problem (or a gap) with the underlying mechanism. What I did was change AWSAppSyncRealTimeProvider.ts#L634-L653 to force it to throw a reject error (simulating I encountered 2 tiny bugs after I did that, hence I've submitted a PR for it, #7225. Here's a video that shows DataStore does not recover from it: My observation: How does this translate to a real-world scenario? You might wonder why I really hope my information will be useful in expediting the fix. |
From the way I am understanding this, DataStore is still making multiple attempts to reconnect whenever it encounters |
It is actually PubSub doing the handshake jittered retry here. I actually think increasing the Would there be any implication if DataStore were to re-attempt the connection indefinitely? On top of that, I think exposing an API that allows the developer to restart the connection manually isn't too bad of an idea. |
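For readers unfamiliar with the term, a jittered retry of the kind discussed above can be sketched roughly as follows. This is a minimal illustration, not the PubSub implementation: the names `jitteredDelay` and `retryWithJitter`, the 100 ms base delay, and the 30 s cap are all assumptions made up for this example. Capping the delay is one way an indefinite retry loop could stay bounded in how long it waits between attempts.

```typescript
// Hypothetical helpers for illustration only; these are not Amplify APIs.
const BASE_DELAY_MS = 100;   // assumed starting delay
const MAX_DELAY_MS = 30000;  // assumed cap on the wait between attempts

// Full-jitter exponential backoff: the ceiling doubles per attempt up to
// the cap, and the actual delay is picked uniformly in [0, ceiling).
function jitteredDelay(attempt: number, random: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retries `operation` until it succeeds or `maxAttempts` is exhausted.
// Passing Infinity (the default) retries indefinitely.
async function retryWithJitter<T>(
  operation: () => Promise<T>,
  maxAttempts: number = Infinity,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt + 1 < maxAttempts) {
        await sleep(jitteredDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

With a finite `maxAttempts` the caller eventually sees the last error and can decide what to do; with the default the loop keeps trying, and the cap keeps the gap between attempts from growing without bound.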
Calling |
Yes that is correct, it was only recently added with the introduction of selective sync. |
It seems that DataStore does not re-attempt the subscriptions when any of the subscriptions has timed out (again due to a network drop). amplify-js/packages/pubsub/src/Providers/AWSAppSyncRealTimeProvider.ts Lines 541 to 558 in b7eef9d
|
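As a rough picture of the timeout being discussed in this thread, the sketch below tracks when the last keep-alive message arrived and reports a timeout once a configurable window has passed. It is not the code from AWSAppSyncRealTimeProvider.ts; the class name `KeepAliveMonitor` and its methods are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```typescript
// Hypothetical sketch, not the code in AWSAppSyncRealTimeProvider.ts.
class KeepAliveMonitor {
  private readonly timeoutMs: number;
  private lastKeepAliveAt: number;

  // `now` is the time the connection was established, in milliseconds.
  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastKeepAliveAt = now;
  }

  // Call whenever a keep-alive message arrives from the server.
  recordKeepAlive(now: number): void {
    this.lastKeepAliveAt = now;
  }

  // True once no keep-alive has been seen for at least `timeoutMs`.
  isTimedOut(now: number): boolean {
    return now - this.lastKeepAliveAt >= this.timeoutMs;
  }
}

// Example: a 90 second window (the value suggested in the issue
// description) instead of the 5 minute default mentioned in the thread.
const monitor = new KeepAliveMonitor(90 * 1000, 0);
monitor.recordKeepAlive(60 * 1000);
const stillAlive = !monitor.isTimedOut(120 * 1000); // 60 s since last keep-alive
```

A shorter window detects a dead connection sooner, at the cost of more false positives on slow networks, which is presumably part of why the default is conservative.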
@amhinson Don't mean to be rude, but are there any updates on this? |
We are still thinking through this one, as it also relates to #7036. To summarize, it appears that there are actually 2 separate (but related) issues within this. One being the subscription timeout regarding the default timeout of 5 minutes, and the other being how the handshake error is handled. For the subscription timeout, this could potentially be some configuration added in the future to override the default value of 5 minutes:
There are certain intricacies as to why this was used, but we will treat this as a feature request for now and make sure it is accounted for with any future updates around this behavior since this is the same across all other platforms and potentially involves AppSync changes as well. For the handshake error, this is also a tough one to reason about because replicating the issue reliably has still proven to be difficult without changing the code as you mentioned. However, it seems like the main issue here is dealing with network connectivity issues when NetInfo is unable to detect and transient issues, and thus doesn't send an With that said, one potential solution to the handshake error would be to emit a Hub event so that the developer can listen and retry ( We are still thinking through these solutions as they relate to other issues, so I will keep you posted as things progress. |
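To make the "listen and retry" idea above concrete, here is a minimal sketch of what the developer-side handling could look like. Everything in it is an assumption: `TinyHub` is a stand-in event bus written for this example (not Amplify's Hub), the event name `subscriptionHandshakeError` is invented, and `restart` is whatever the application would use to stop and restart syncing.

```typescript
// Stand-in event bus for illustration; this is NOT Amplify's Hub.
type HubListener = (payload: { event: string }) => void;

class TinyHub {
  private listeners: HubListener[] = [];

  // Registers a listener and returns a function that removes it.
  listen(listener: HubListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  dispatch(event: string): void {
    // Copy the array so listeners may unsubscribe while being notified.
    for (const listener of this.listeners.slice()) {
      listener({ event });
    }
  }
}

// Invented event name; no such event is confirmed anywhere in this thread.
const HANDSHAKE_ERROR_EVENT = "subscriptionHandshakeError";

// Calls `restart` each time the handshake error event fires, up to
// `maxRestarts` times, and returns a function that stops listening.
function restartOnHandshakeError(
  hub: TinyHub,
  restart: () => void,
  maxRestarts: number,
): () => void {
  let restarts = 0;
  return hub.listen(({ event }) => {
    if (event === HANDSHAKE_ERROR_EVENT && restarts < maxRestarts) {
      restarts += 1;
      restart();
    }
  });
}
```

The `maxRestarts` cap is a design choice for this sketch: restarting without any limit within a short period is the pattern that, per the issue description, can run into the `Max subscriptions` error.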
Just wanted to reach out and see if there is any update on this. Thanks! |
It's been almost 5 months, erm, what can we expect here? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
no bad bot |
up |
Hey! We're also interested in a solution here. This is a major blocker for our app rollout. Let us know. We will keep watching. |
Hey, sorry everyone, I'm closing this issue as it has become cluttered and I could not provide clear reproduction steps despite listing the flaws it has in words. I have given up on DataStore and I wholeheartedly cannot vouch for using it in production due to the number of critical bugs it still has and its architecture that I can't reason with. You should file a new issue if you are still facing the same problem I had in the past. |
So what are we going to do? I think it is clear that the community is experiencing this issue and AWS needs to fix it. For us it is highly reproducible: it happens after 10-15 minutes of going back and forth from the background. It's not like the phone just blinked out of existence, and the handle for the subscription is still active. We have an existing open ticket with Amazon support to figure this out. https://us-east-1.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=10220611901&language=en |
our code snippet looks like this and still fails
|
This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs. Looking for a help forum? We recommend joining the Amplify Community Discord server |
Describe the bug
Subscriptions with PubSub in DataStore can be very fragile and would not attempt to re-establish themselves when an internet disconnection happens over a short period of time. Network drops are common, and therefore DataStore's subscriptions need the resilience to recover from such scenarios.
After the merge of PR #6366, DataStore would only re-establish the subscriptions in the case of `Connection closed` and `Timeout disconnect`. However, if there is a failure when conducting a handshake or when the client didn't receive `start_ack` from the server, DataStore would not do anything. Do note that, on React Native, NetInfo is unable to detect that a transient network drop has occurred. Hence, it will tell DataStore that the client is online throughout despite a network drop.
I have highlighted in red the failure point from which DataStore subscriptions do not recover.
![image](https://user-images.githubusercontent.com/762914/89309159-a2f2f200-d6a5-11ea-8a49-07619ab5d50d.png)
In purple, currently the client will only know that the websocket has broken off after 5 minutes, when it has not received `KA` from the server. In my opinion, 5 minutes is too long for PubSub to detect that `Timeout disconnect` has happened. We would lose all the data that happened during this period; it simply doesn't cut it for our project. Perhaps we can lower this down to a minute and a half?
To Reproduce
Steps to reproduce the behavior:
This can be tricky to reproduce, so you need to be patient.
Expected behavior
DataStore subscriptions should recover or at least re-establish themselves when a handshake error or subscription timeout has occurred.
I've made some changes to force DataStore to re-establish the subscriptions when a subscription timeout has occurred. However, if we constantly re-establish the subscriptions on the same websocket over and over within a short period, the client would easily hit the `Max subscriptions` error. I don't know what else I can do apart from this.
What is Configured?
Environment
Smartphone (please complete the following information):