Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(@aws-amplify/datastore): mutation hub event drops during reconnect #11132

Merged
merged 2 commits into from
Mar 27, 2023

Conversation

svidgen
Copy link
Member

@svidgen svidgen commented Mar 24, 2023

Description of changes

outboxMutationProcessed Hub events were being dropped for mutations that were being raced against reconnect activities. A fringe case, but during reconnect, this could lead to customer code waiting for outboxMutationProcessed events that would never arrive.

This PR fixes the issue by properly removing the observer from the mutation processor during an unsubscribe or stop(). An undefined observer signals to the mutation processor not to immediately resume on mutation, but instead wait for an observer (the sync engine) to register itself and re-start() the mutation processor.

This PR also incidentally tamps down on console.warn noise in the scope of connectivityHandler.test.ts.

Issue #, if available

Description of how you validated changes

Un-skipped the test that originally caught the error. Made it pass.

Checklist

  • PR description included
  • yarn test passes
  • Tests are changed or added
  • Relevant documentation is changed or added (and PR referenced)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

Codecov Report

Merging #11132 (8dc25a9) into main (4e3b8fc) will increase coverage by 0.00%.
The diff coverage is n/a.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##             main   #11132   +/-   ##
=======================================
  Coverage   82.36%   82.36%           
=======================================
  Files         204      204           
  Lines       19505    19507    +2     
  Branches     4213     4213           
=======================================
+ Hits        16065    16067    +2     
  Misses       3154     3154           
  Partials      286      286           

see 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@david-mcafee david-mcafee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few questions

@@ -256,7 +267,7 @@ class MutationProcessor {
hasMore = (await this.outbox.peek(storage)) !== undefined;
});

this.observer.next!({
this.observer?.next?.({
Copy link
Contributor

@david-mcafee david-mcafee Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's entirely possible I'm being dense, but according to the PR description: "An undefined observer signals to the mutation processor not to immediately resume on mutation, but instead wait for an observer (the sync engine) to register itself and re-start() the mutation processor." Wouldn't this mean that resume doesn't happen if the observer is undefined? And if so, why is it possible that the observer is possibly undefined, here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrestled with this little part of the diff for longer than I care to admit, actually. TypeScript isn't convinced that this.observer always exists here, or that .next exists on the observer. So, with that said, I'm actually not 100% sure PR eliminates all cases where a mutation event goes missing. It does provide an incremental improvement to one edge I identified while trying to add tests.

Maybe it makes more sense to revert to bangs to at least get errors logged if someone hits that edge. I think we can shake out more edges after #11131, which will give us a little more control to stress the protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, though we check that the observer isn't undefined at the beginning of resume (i.e. the status of isReady), it's possible that between there and here, the observer can now become undefined. I think I'm just most concerned about silent failures, and think that error logs would be better. Thoughts?

Copy link
Member Author

@svidgen svidgen Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, there are 100% still some potential edges here. This PR only addresses a semi-related edge wherein reconnect fundamentally pits mutation processing and sync engine restart against each other. I want to be cautious about changing the failure mode at this line. Today, it will be silent if the observer has unsubscribed, AFAIK. This line will happily call .next() on a closed subscriber, if I'm not mistaken and silently deliver that message into the ether.

I could be wrong. Happy to be shown I'm wrong. I just want to avoid entangling the purpose of this PR with the lesser-known edges here. This is especially true if we don't have any repros or reports of those edges, and when the right solution there is really unclear to me.

I dunno. Happy to chat on Monday about this if needed!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will happily call .next() on a closed subscriber, if I'm not mistaken and silently deliver that message into the ether.

Just following up on this, I've tested my recollection on this and it looks correct. Completed and/or unsubscribed observers simply cause next() to be a no-op. No error message will be logged. It will just a silent failure. I'll create a backlog item for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think TypeScript is just confused that this.isReady actually asserts that this.observer is defined. Even if you place this.isReady directly before this.observer.next you will still get the same error.

if (this.isReady()) {
  // still thinks that this.observer could be undefined
  this.observer.next ...

You could use assertion predicates to give TypeScript enough information to know it would be defined in this code path. But it looks like it isn't well supported on private class properties and would require some hacks.

https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-7.html#assertion-functions

https://stackoverflow.com/questions/74264474/in-typescript-how-do-i-assert-the-value-of-a-class-variable-using-a-method

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's still a potential issue that could occur if an unsubscribe occurs while a request is in flight. But, it's an existing and less likely race, outside the scope of the more predictable and easy to repro race this PR addresses.

In any case, there's a story in the backlog and on the grooming list to revisit this at some point.

* receive all of the expected Hub events.
*/
test.skip('survives online -> offline -> save/online race', async () => {
test('survives online -> offline -> save/online race', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on also adding tests for update and delete immediately before reconnect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I have a compelling reason not to! I'll add them in.

manueliglesias
manueliglesias previously approved these changes Mar 24, 2023
Copy link
Contributor

@manueliglesias manueliglesias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I only have one question (non-blocking)

Copy link
Contributor

@manueliglesias manueliglesias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@svidgen svidgen merged commit 69a9cf0 into aws-amplify:main Mar 27, 2023
cshfang added a commit to cshfang/amplify-js that referenced this pull request Mar 28, 2023
* feat(cognito): remove required domain param when create CookieStorage

* test: enable integ test

* test(cognito): update unit test

* Revert "test: enable integ test"

This reverts commit e3db965.

* feat(cognito): make cookiestorage constructor parameter optional

* chore: Docs updates (aws-amplify#11082)

* chore(release): Publish [ci skip]

 - @aws-amplify/analytics@6.0.19
 - @aws-amplify/api-graphql@3.1.7
 - @aws-amplify/api-rest@3.0.19
 - @aws-amplify/api@5.0.19
 - @aws-amplify/auth@5.1.13
 - aws-amplify@5.0.19
 - @aws-amplify/cache@5.0.19
 - @aws-amplify/core@5.1.2
 - @aws-amplify/datastore-storage-adapter@2.0.19
 - @aws-amplify/datastore@4.1.1
 - @aws-amplify/geo@2.0.19
 - @aws-amplify/interactions@5.0.19
 - @aws-amplify/notifications@1.0.19
 - @aws-amplify/predictions@5.0.19
 - @aws-amplify/pubsub@5.1.2
 - @aws-amplify/pushnotification@5.0.19
 - @aws-amplify/storage@5.1.9
 - @aws-amplify/xr@4.0.19

* chore(release): update version.ts [ci skip]

* refactor(datastore): storage adapters (aws-amplify#11073)

* chore: Docs updates (aws-amplify#11091)

* chore: Tweaked `main` pre-id. (aws-amplify#11093)

* chore(deps): bump activesupport from 6.1.7.2 to 7.0.4.3 in /docs

Bumps [activesupport](https://github.com/rails/rails) from 6.1.7.2 to 7.0.4.3.
- [Release notes](https://github.com/rails/rails/releases)
- [Changelog](https://github.com/rails/rails/blob/v7.0.4.3/activesupport/CHANGELOG.md)
- [Commits](rails/rails@v6.1.7.2...v7.0.4.3)

---
updated-dependencies:
- dependency-name: activesupport
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(release): Publish [ci skip]

 - @aws-amplify/analytics@6.0.20
 - @aws-amplify/api-graphql@3.1.8
 - @aws-amplify/api-rest@3.0.20
 - @aws-amplify/api@5.0.20
 - @aws-amplify/auth@5.1.14
 - aws-amplify@5.0.20
 - @aws-amplify/cache@5.0.20
 - @aws-amplify/core@5.1.3
 - @aws-amplify/datastore-storage-adapter@2.0.20
 - @aws-amplify/datastore@4.1.2
 - @aws-amplify/geo@2.0.20
 - @aws-amplify/interactions@5.0.20
 - @aws-amplify/notifications@1.0.20
 - @aws-amplify/predictions@5.0.20
 - @aws-amplify/pubsub@5.1.3
 - @aws-amplify/pushnotification@5.0.20
 - @aws-amplify/storage@5.1.10
 - @aws-amplify/xr@4.0.20

* chore(release): update version.ts [ci skip]

* chore(@aws-amplify/datastore): initial decomposition of test helpers (aws-amplify#11096)

* moved test helpers to folder

* first pass, rough decomposition of test helpers file

* added readme for helpers

* test helpers readme typo

* Update packages/datastore/__tests__/helpers/README.md

Co-authored-by: Ivan Artemiev <29709626+iartemiev@users.noreply.github.com>

---------

Co-authored-by: Ivan Artemiev <29709626+iartemiev@users.noreply.github.com>

* fix: add warning message when trying to use SQLite with CPK enabled (aws-amplify#11027)

* fix(data): update SQLite utils to use 'instr' instead of 'like' when constructing WHERE conditions from predicates (aws-amplify#11108)

* chore: Docs updates (aws-amplify#11111)

* docs(cognito): excluding subdomains when domain cookie attr is omitted

* chore(release): Publish [ci skip]

 - @aws-amplify/analytics@6.0.21
 - @aws-amplify/api-graphql@3.1.9
 - @aws-amplify/api-rest@3.0.21
 - @aws-amplify/api@5.0.21
 - @aws-amplify/auth@5.1.15
 - aws-amplify@5.0.21
 - @aws-amplify/cache@5.0.21
 - @aws-amplify/core@5.1.4
 - @aws-amplify/datastore-storage-adapter@2.0.21
 - @aws-amplify/datastore@4.1.3
 - @aws-amplify/geo@2.0.21
 - @aws-amplify/interactions@5.0.21
 - @aws-amplify/notifications@1.0.21
 - @aws-amplify/predictions@5.0.21
 - @aws-amplify/pubsub@5.1.4
 - @aws-amplify/pushnotification@5.0.21
 - @aws-amplify/storage@5.1.11
 - @aws-amplify/xr@4.0.21

* chore(release): update version.ts [ci skip]

* test: increase coverage for delete connection by model field (aws-amplify#11099)

* Update config.yml

Removes `Usage Question` link that sends users to an old 404 page for discussions

* fix(datastore): stale observeQuery snapshot with sort param (aws-amplify#11119)

* chore(notifications): Integrate refactored Android common utils

* chore: Docs updates

* Update config.yml

Switched usage question to point at discord rather than deleting it

* chore(release): Publish [ci skip]

 - amazon-cognito-identity-js@6.2.0
 - @aws-amplify/analytics@6.0.22
 - @aws-amplify/api-graphql@3.1.10
 - @aws-amplify/api-rest@3.0.22
 - @aws-amplify/api@5.0.22
 - @aws-amplify/auth@5.2.0
 - aws-amplify@5.0.22
 - @aws-amplify/cache@5.0.22
 - @aws-amplify/core@5.1.5
 - @aws-amplify/datastore-storage-adapter@2.0.22
 - @aws-amplify/datastore@4.1.4
 - @aws-amplify/geo@2.0.22
 - @aws-amplify/interactions@5.0.22
 - @aws-amplify/notifications@1.0.22
 - @aws-amplify/predictions@5.0.22
 - @aws-amplify/pubsub@5.1.5
 - @aws-amplify/pushnotification@5.0.22
 - @aws-amplify/storage@5.1.12
 - @aws-amplify/xr@4.0.22

* chore(release): update version.ts [ci skip]

* fix(@aws-amplify/datastore): mutation hub event drops during reconnect (aws-amplify#11132)

* fix(@aws-amplify/datastore): fixes mutation hub event dropped during races with reconnect

* add explicit observer completion, expanded test coverage

* fix(@aws-amplify/datastore): adds serialization for empty predicates (aws-amplify#11133)

adds serialization for empty predicates

* test(notifications): Add native Android unit tests (aws-amplify#11139)

* test(notifications): Add native Android unit tests

* Disable android tests from being automatically run in CI

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Allan Zheng <zheallan@amazon.com>
Co-authored-by: aws-amplify-bot <aws@amazon.com>
Co-authored-by: Ivan Artemiev <29709626+iartemiev@users.noreply.github.com>
Co-authored-by: Jim Blanchard <jim.l.blanchard@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jon Wire <iambipedal@gmail.com>
Co-authored-by: Dane Pilcher <dppilche@amazon.com>
Co-authored-by: David McAfee <mcafd@amazon.com>
Co-authored-by: Olya Balashova <42189299+helgabalashova@users.noreply.github.com>
Co-authored-by: Bannon Tanner <bannonta@amazon.com>
Co-authored-by: helgabalashova <helga.stolyarova@gmail.com>
Co-authored-by: Aaron S <94858815+stocaaro@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants