Relayer slows down exponentially in some circumstances #2008

Closed
5 tasks
romac opened this issue Mar 24, 2022 · 0 comments · Fixed by #2007
Labels
A: bug Admin: something isn't working I: logic Internal: related to the relaying logic O: performance Objective: cause to improve performance
romac commented Mar 24, 2022

Summary of Bug

The relayer sometimes slows down exponentially.

The cause of the slowness is that the height of the events we pull from the subscriptions drifts exponentially from the real latest height, because we are not pulling them from the event monitor stream fast enough.

We are trying to get an event from the stream of events (with try_recv_multiple) every 500ms.

So with two chains we have:

  • call try_recv_multiple and get a NewBlock from chain A
  • wait 500ms
  • call try_recv_multiple and get a NewBlock from chain B
  • wait 500ms
  • call try_recv_multiple and get a NewBlock from chain A
  • etc.

So we get an event per chain roughly once every second.

With three chains:

  • call try_recv_multiple and get a NewBlock from chain A
  • wait 500ms
  • call try_recv_multiple and get a NewBlock from chain B
  • wait 500ms
  • call try_recv_multiple and get a NewBlock from chain C
  • wait 500ms
  • call try_recv_multiple and get a NewBlock from chain A
  • etc.

So we get a NewBlock event per chain only once every 1.5s.

But since the block time in testing is 1s, we fall further and further behind.
I guess that's why we only see this in testing and not in prod: in prod we pull events often enough, relative to the block time, that we are always up to date.

The problem gets worse the lower the block time and the higher the number of chains the relayer is connected to.
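
To make the reasoning above concrete, here is a minimal simulation sketch of the polling loop. It is not Hermes code: the function name simulate_backlog and its parameters are made up for illustration, and it assumes every chain emits exactly one NewBlock per block time and that each poll pulls at most one event, round-robin across chains.

```rust
/// Hypothetical model of the polling loop described above (not Hermes code).
/// Returns, for each chain, how many NewBlock events have been emitted but
/// not yet pulled after `duration_ms` of simulated time.
fn simulate_backlog(
    chains: usize,
    block_time_ms: u64,
    poll_interval_ms: u64,
    duration_ms: u64,
) -> Vec<u64> {
    let mut consumed = vec![0u64; chains];
    let mut next = 0usize; // round-robin cursor over the chains
    let mut t = 0u64;

    while t <= duration_ms {
        // Assumption: every chain has emitted one block per elapsed block time.
        let produced = t / block_time_ms;

        // One poll pulls at most one event, from the next chain that has one.
        for offset in 0..chains {
            let c = (next + offset) % chains;
            if consumed[c] < produced {
                consumed[c] += 1;
                next = (c + 1) % chains;
                break;
            }
        }

        // Wait for the next poll.
        t += poll_interval_ms;
    }

    let produced = duration_ms / block_time_ms;
    consumed.iter().map(|&c| produced - c).collect()
}
```

Under these assumptions, simulate_backlog(2, 1000, 500, 60_000) stays bounded, while simulate_backlog(3, 1000, 500, t) keeps growing as t increases, and adding chains or lowering block_time_ms makes it grow faster.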

To fix this, we should use a blocking recv_multiple on the subscriptions stream so that we get the events as fast as they are emitted, which solves the drift.
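
As a rough illustration of the blocking approach, the sketch below uses only the Rust standard library. It is an assumption-laden stand-in, not the actual recv_multiple used by Hermes: the per-chain subscriptions are modelled as producer threads that all send into one shared std::sync::mpsc channel, and the names NewBlock, spawn_chains and drain_blocking are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a NewBlock event coming from a chain.
#[derive(Debug, Clone, PartialEq)]
struct NewBlock {
    chain: String,
    height: u64,
}

/// Spawns one producer thread per chain. All producers send into a single
/// shared channel, so the consumer can block on one receiver.
fn spawn_chains(chains: &[&str], blocks: u64, block_time: Duration) -> Receiver<NewBlock> {
    let (tx, rx) = channel();

    for &chain in chains {
        let tx: Sender<NewBlock> = tx.clone();
        let chain = chain.to_string();

        thread::spawn(move || {
            for height in 1..=blocks {
                thread::sleep(block_time);
                let event = NewBlock { chain: chain.clone(), height };
                if tx.send(event).is_err() {
                    return; // the receiver is gone, stop producing
                }
            }
        });
    }

    // The original sender is dropped here, so the channel closes once
    // every producer thread has finished.
    rx
}

/// Blocks on the receiver and collects events as soon as they are emitted.
/// There is no fixed sleep between receives, so the consumer cannot fall
/// behind because of its own polling interval.
fn drain_blocking(rx: Receiver<NewBlock>) -> Vec<NewBlock> {
    rx.iter().collect()
}
```

Calling drain_blocking(spawn_chains(&["A", "B", "C"], 5, Duration::from_millis(10))) returns once all three producers are done. The point of the design is that the time between two receives is driven by the chains rather than by a timer in the relayer.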

Version

v0.13.0-rc0

Steps to Reproduce

  1. Spawn 3 chains with a block time of 1s
  2. Create a channel between 2 chains
  3. Start Hermes
  4. Wait a few minutes
  5. Do a ft-transfer
  6. See that the relayer only processes the transfer after a long time
  7. Wait more
  8. Do another ft-transfer
  9. It takes even longer until the relayer processes the transfer

Acceptance Criteria

The relayer does not exhibit this issue anymore.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@romac romac added A: bug Admin: something isn't working I: logic Internal: related to the relaying logic O: performance Objective: cause to improve performance P-critical labels Mar 24, 2022
@romac romac added this to the v0.13.0 milestone Mar 24, 2022
@romac romac self-assigned this Mar 24, 2022