Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TokenNetwork events fetching #1822

Merged
merged 3 commits into from
Jun 29, 2020
Merged

Refactor TokenNetwork events fetching #1822

merged 3 commits into from
Jun 29, 2020

Conversation

andrevmatos
Copy link
Contributor

@andrevmatos andrevmatos commented Jun 27, 2020

Fix #1787
Fix #1780
Should fix #1792

Short description
Complete rewrite of TokenNetwork events monitoring logic.
This should not have drawbacks, and UX shouldn't be changed (besides things just working™)

Before:

  • tokenMonitoredEpic was responsible for monitoring ChannelOpened events, was stateful on which blocks it had already seen and couldn't recover it network errors made it lose events (issue 1780 & 1787)
    • it was also inneficient because although it did only 1 getLogs call, it was too wide (any ChannelOpened event with anyone) and filtered client-side, could be thousands of events on first launch e.g. on mainnet
  • channelMonitoredEpic then, for each channel, did a getLogs call on every startup (narrowed by channelId) and kept 1 filter

Now:

  • channelEventsEpic: a single epic per TokenNetwork
    • 2-3 getLogs on each startup, but which are ranged from latest seen event blockNumber, and are strictly limited to channels of interest (with us), so even on initial sync, won't need to fetch thousands of irrelevant channels as before
    • 1 single filter per TokenNetwork (for all channels), only on new blocks it's a catch-all filter (filtering client-side), which can't be too many events but are more performant on number of network requests required
    • If network issues which can't be detected makes it not get some logs, it'll scan again from the narrowest safe range possible (since latest event seen), and will pick up any missed event

Costs per TokenNetwork monitored (N is number of channels):
Before: (1 + N) * (getLogs, filters) (plus first TN getLogs could be very expensive)
Now: 2..3 * getLogs + 1 * filter (all getLogs are as specific as possible)

Definition of Done

  • Steps to manually test the change have been documented
  • Acceptance criteria are met
  • Change has been manually tested by the reviewer (dApp)

Steps to manually test the change (dApp)

  1. Not much to say, just test past and new events are detected

@andrevmatos andrevmatos added sdk 🖥 optimization ⚡ Optimizations for the implementation or protocol labels Jun 27, 2020
@andrevmatos andrevmatos self-assigned this Jun 27, 2020
@codecov
Copy link

codecov bot commented Jun 27, 2020

Codecov Report

Merging #1822 into master will decrease coverage by 0.13%.
The diff coverage is 98.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1822      +/-   ##
==========================================
- Coverage   95.41%   95.28%   -0.14%     
==========================================
  Files         153      153              
  Lines        5387     5426      +39     
  Branches     1039     1052      +13     
==========================================
+ Hits         5140     5170      +30     
- Misses        193      200       +7     
- Partials       54       56       +2     
Flag Coverage Δ
#dapp 91.52% <ø> (ø)
#sdk 96.84% <98.58%> (-0.21%) ⬇️
#sdk_integration ?
Impacted Files Coverage Δ
raiden-ts/src/services/epics.ts 99.23% <ø> (ø)
raiden-ts/src/channels/epics.ts 98.43% <98.27%> (-0.51%) ⬇️
raiden-ts/src/channels/actions.ts 100.00% <100.00%> (ø)
raiden-ts/src/raiden.ts 95.64% <100.00%> (-0.26%) ⬇️
raiden-ts/src/transport/epics/presence.ts 97.64% <100.00%> (ø)
raiden-ts/src/transport/epics/rooms.ts 96.66% <100.00%> (ø)
raiden-ts/src/utils/ethers.ts 98.41% <100.00%> (+0.05%) ⬆️
raiden-ts/src/utils/rx.ts 100.00% <100.00%> (ø)
raiden-ts/src/polyfills.ts 71.42% <0.00%> (-28.58%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 083a040...7fba563. Read the comment docs.

@andrevmatos
Copy link
Contributor Author

Ok, testing this PR by hand, it definitely improved the situation by ensuring we're able to recover from lost events upon restarts, which is nice to have/keep, but not essential if we get the events fetching working properly, plus the good performance improvements.

Unfortunately, while testing it, I've seen the event loss happening even on top of this PR. I've tested with a local ethereum node (geth.goerli.ethnodes...) and couldn't reproduce it there, only on Metamask + Infura.

My best bet is that this is caused by how ethers (v4 at least) does filters subscriptions: instead of web3js filters logic of registering remotely (which actually doesn't work very well, and I think not at all with Infura), it registers it locally, and upon every new block, it does a getLogs under the hood from previous block to current. This work well with a local node which is the only one consistently being used, but Infura AFAIK actually load-balance these requests between several nodes, and since it doesn't work with the concept of confirmation blocks, the different Infura nodes could see the mined transaction at different points in time, which then would give weird behaviors when relying on getLogs over only latest block.

Fortunately, if the above is the case, there's a nice way to fix it: implement my own algorithm on top of ether's provider to register filters locally and, on new blocks, perform getLogs on a larger range, e.g. [blockNumber - confirmationBlocks, blockNumber], with local/client-side event deduplication. Let's do it.

@kelsos
Copy link
Contributor

kelsos commented Jun 29, 2020

@andrevmatos How do we proceed with this PR, do we merge it and you do the new algorithm implemented in a new one?

It would make sense for that to go to a new PR since this is already big.

Copy link
Contributor

@kelsos kelsos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I commented I would suggest merging that and implementing the new algorithm in a new PR.

Also, a question I have is, do we know if ethers event filters have improved in ethers v5?

@andrevmatos
Copy link
Contributor Author

Ok, I agree, let's merge this and do a follow-up PR with the additional fixes, they're complementary anyway. Thanks

@andrevmatos andrevmatos marked this pull request as ready for review June 29, 2020 11:11
@andrevmatos andrevmatos merged commit 8df0d3c into master Jun 29, 2020
@andrevmatos andrevmatos deleted the fix/events branch June 29, 2020 11:11
Copy link
Contributor

@weilbith weilbith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good work! I think this has potential so solve many issues by their root cause. 👍

@weilbith
Copy link
Contributor

Oh my review took to long... 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization ⚡ Optimizations for the implementation or protocol sdk 🖥
Projects
None yet
3 participants