Relayer slows down exponentially in some circumstances #2008

romac · 2022-03-24T14:28:44Z

Summary of Bug

The relayer sometimes slows down exponentially.

The cause of the slowness is that the height of the events we pull from the subscriptions drifts exponentially from the real latest height, because we are not pulling them from the event monitor stream fast enough.

We are trying to get an event from the stream of events (with try_recv_multiple) every 500ms.

So with two chains we have:

call try_recv_multiple and get a NewBlock from chain A
wait 500ms
call try_recv_multiple and get a NewBlock from chain B
wait 500ms
call try_recv_multiple and get a NewBlock from chain A
etc.

So we get an event per chain roughly once every second.

With three chains:

call try_recv_multiple and get a NewBlock from chain A
wait 500ms
call try_recv_multiple and get a NewBlock from chain B
wait 500ms
call try_recv_multiple and get a NewBlock from chain C
wait 500ms
call try_recv_multiple and get a NewBlock from chain A
etc.

So we get a NewBlock event per chain every 1.5s.

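But since the block time for testing is 1s, each chain emits a NewBlock every second while we only consume one per chain every 1.5s, so we end up drifting behind more and more.
I guess that's why we only see this in testing and not in prod: in prod we query often enough relative to the block time that we are always up to date.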

The problem gets worse the lower the block time and the higher the number of chains the relayer is connected to.
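
To make the arithmetic above concrete, here is a minimal sketch of the polling loop as a discrete simulation. It is a simplified model, not Hermes code: the function name `simulate_backlog` and its parameters are hypothetical, and it assumes every chain produces blocks on the same schedule and that each poll takes at most one event, round-robin across chains.

```rust
/// Simplified model of the polling loop described in this issue (not Hermes code).
/// Every `block_time_ms` each chain queues one NewBlock event; every
/// `poll_interval_ms` the relayer takes at most one event, round-robin.
/// Returns the number of events still queued per chain after `elapsed_ms`.
fn simulate_backlog(
    chains: usize,
    poll_interval_ms: u64,
    block_time_ms: u64,
    elapsed_ms: u64,
) -> Vec<u64> {
    let mut queued = vec![0u64; chains];
    let mut next_chain = 0;

    for t in 1..=elapsed_ms {
        // Each chain emits a NewBlock at every multiple of the block time.
        if t % block_time_ms == 0 {
            for q in queued.iter_mut() {
                *q += 1;
            }
        }

        // One non-blocking poll per interval: take a single event from the
        // next chain (in round-robin order) that has something pending.
        if t % poll_interval_ms == 0 {
            for offset in 0..chains {
                let idx = (next_chain + offset) % chains;
                if queued[idx] > 0 {
                    queued[idx] -= 1;
                    next_chain = (idx + 1) % chains;
                    break;
                }
            }
        }
    }

    queued
}
```

Under these assumptions, `simulate_backlog(2, 500, 1000, 60_000)` stays bounded because two polls per second keep up with two blocks per second, while `simulate_backlog(3, 500, 1000, 60_000)` keeps growing the longer it runs, since three events arrive per second and only two are consumed.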

To fix this, we should use a blocking recv_multiple on the subscriptions stream so that we get the events as fast as they are emitted, which solves the drift.
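
As a rough illustration of the blocking approach, the sketch below fans several producers into a single `std::sync::mpsc` channel and drains it with a blocking `recv`, so each event is handled as soon as it is emitted rather than once per sleep interval. This is not the Hermes `recv_multiple` implementation: the `NewBlock` struct, `collect_blocking`, and `run_demo` are hypothetical names, and the fan-in over one standard-library channel is an assumption made to keep the example self-contained.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Hypothetical stand-in for an event delivered by a chain's subscription.
#[derive(Debug, Clone, PartialEq)]
struct NewBlock {
    chain: &'static str,
    height: u64,
}

/// Blocks on the shared channel and collects every event as it arrives.
/// The loop ends once all senders have been dropped.
fn collect_blocking(rx: Receiver<NewBlock>) -> Vec<NewBlock> {
    let mut events = Vec::new();
    while let Ok(event) = rx.recv() {
        events.push(event);
    }
    events
}

/// Spawns one producer thread per chain, all feeding the same channel,
/// then drains the channel with a blocking receive.
fn run_demo(chains: &[&'static str], blocks_per_chain: u64) -> Vec<NewBlock> {
    let (tx, rx) = channel();

    let handles: Vec<_> = chains
        .iter()
        .map(|&chain| {
            let tx = tx.clone();
            thread::spawn(move || {
                for height in 1..=blocks_per_chain {
                    tx.send(NewBlock { chain, height }).expect("receiver is alive");
                }
            })
        })
        .collect();

    // Drop the original sender so recv() reports disconnection
    // once every producer thread has finished.
    drop(tx);

    let events = collect_blocking(rx);
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    events
}
```

Calling `run_demo(&["A", "B", "C"], 5)` returns all fifteen events with no sleep between receives; the interleaving across chains depends on thread scheduling, but events from a single chain arrive in height order.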

Version

v0.13.0-rc0

Steps to Reproduce

Spawn 3 chains with a block time of 1s
Create a channel between 2 chains
Start Hermes
Wait a few minutes
Do a ft-transfer
See that the relayer only processes the transfer after a long time
Wait more
Do another ft-transfer
It takes even longer until the relayer processes the transfer

Acceptance Criteria

The relayer does not exhibit this issue anymore.

For Admin Use

Not duplicate issue
Appropriate labels applied
Appropriate milestone (priority) applied
Appropriate contributors tagged
Contributor assigned/self-assigned

romac added the labels A: bug (Admin: something isn't working), I: logic (Internal: related to the relaying logic), O: performance (Objective: cause to improve performance), and P-critical Mar 24, 2022

romac added this to the v0.13.0 milestone Mar 24, 2022

romac self-assigned this Mar 24, 2022

romac mentioned this issue Mar 24, 2022

Fix slowness of relayer with 3+ chains and low average block time #2007

Merged

6 tasks

romac closed this as completed in #2007 Mar 25, 2022

romac mentioned this issue Mar 25, 2022

Channel establishment is slow #1714

Closed

5 tasks
