Do not retry clearing with same height forever #2348

ancazamfir · 2022-06-28T09:18:35Z

Closes: #2155

Description

spawn_packet_cmd_worker runs handle_packet_cmd, passing a "command" that can be NewBlock.
The first thing handle_packet_cmd does is try to clear packets. If this fails with an ignorable error, the old code retries immediately with the same command and height. If the error persists (as in #2155), the worker enters an infinite loop and new IBC events (coming via IbcEvent commands) are never processed.
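
To make the failure mode concrete, here is a minimal model of the old behaviour. It is a sketch, not the relayer's code: `Cmd`, `clear_packets`, `run_old`, the `pruned_below` failure model, and the `max_spins` cap are all hypothetical names introduced for illustration. The cap exists only so the demonstration terminates; the real loop had no such bound.

```rust
use std::collections::VecDeque;

// Hypothetical stand-ins for the worker's commands (not the relayer's real types).
#[derive(Debug, Clone, PartialEq)]
enum Cmd {
    NewBlock { height: u64 },
    IbcEvent { seq: u64 },
}

// Assumed failure model: clearing at a height below `pruned_below` always fails,
// which imitates a persistent, "ignorable" error.
fn clear_packets(height: u64, pruned_below: u64) -> Result<(), String> {
    if height < pruned_below {
        Err(format!("state at height {height} is pruned"))
    } else {
        Ok(())
    }
}

// Old behaviour, modelled: on failure the same command is retried right away,
// so the command at the front of the queue is never removed.
// `max_spins` is an artificial bound so this sketch can return.
fn run_old(mut queue: VecDeque<Cmd>, pruned_below: u64, max_spins: usize) -> (Vec<Cmd>, usize) {
    let mut handled = Vec::new();
    let mut spins = 0;
    while let Some(cmd) = queue.front().cloned() {
        let result = match &cmd {
            Cmd::NewBlock { height } => clear_packets(*height, pruned_below),
            Cmd::IbcEvent { .. } => Ok(()),
        };
        match result {
            Ok(()) => {
                handled.push(cmd);
                queue.pop_front();
            }
            Err(_) => {
                // Same command, same height: nothing has changed, so it fails again.
                spins += 1;
                if spins >= max_spins {
                    break;
                }
            }
        }
    }
    (handled, spins)
}

fn main() {
    let queue = VecDeque::from(vec![
        Cmd::NewBlock { height: 5 },
        Cmd::IbcEvent { seq: 1 },
    ]);
    let (handled, spins) = run_old(queue, 10, 1000);
    println!("handled {} commands after {} failed attempts", handled.len(), spins);
}
```

In this model the `IbcEvent` queued behind the failing `NewBlock` is never reached, which mirrors the symptom reported in #2155.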

I believe the infinite loop started with #2238. Before that, the worker would abort (due to TaskError::Fatal(RunError::retry(e))).

The proposed change is to move to the next command even if the previous one has failed. The reasons are:

- it avoids the infinite loop
- the failed events will still be processed at the next clear interval with a fresh height
- at different levels down in handle_packet_cmd there are retry mechanisms bounded by MAX_RETRIES (current value hardcoded at 5).
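
The proposed behaviour can be sketched in the same spirit. This is an illustrative model under stated assumptions, not the code in this PR: `Cmd`, `clear_packets`, `retry_bounded`, `run_new`, and the `pruned_below` failure model are hypothetical. Only the name `MAX_RETRIES` and its value of 5 come from the description above.

```rust
// Hypothetical stand-ins for the worker's commands (not the relayer's real types).
#[derive(Debug, Clone, PartialEq)]
enum Cmd {
    NewBlock { height: u64 },
    IbcEvent { seq: u64 },
}

// Value quoted in the PR description; where it lives in the code base is not shown here.
const MAX_RETRIES: usize = 5;

// Assumed failure model: clearing at a height below `pruned_below` always fails.
fn clear_packets(height: u64, pruned_below: u64) -> Result<(), String> {
    if height < pruned_below {
        Err(format!("state at height {height} is pruned"))
    } else {
        Ok(())
    }
}

// Illustrative bounded retry: try at most MAX_RETRIES times, then give up
// and return the last error instead of looping forever.
fn retry_bounded<F>(mut op: F) -> Result<(), String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_err = String::new();
    for _ in 0..MAX_RETRIES {
        match op() {
            Ok(()) => return Ok(()),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}

// New behaviour, modelled: a failed command is recorded and the worker moves on,
// so later commands (including IBC events) are still handled.
fn run_new(cmds: Vec<Cmd>, pruned_below: u64) -> (Vec<Cmd>, Vec<Cmd>) {
    let mut handled = Vec::new();
    let mut skipped = Vec::new();
    for cmd in cmds {
        let result = match &cmd {
            Cmd::NewBlock { height } => {
                let h = *height;
                retry_bounded(|| clear_packets(h, pruned_below))
            }
            Cmd::IbcEvent { .. } => Ok(()),
        };
        match result {
            Ok(()) => handled.push(cmd),
            // Left for a later clear attempt, which would use a fresh height.
            Err(_) => skipped.push(cmd),
        }
    }
    (handled, skipped)
}

fn main() {
    let cmds = vec![
        Cmd::NewBlock { height: 5 },
        Cmd::IbcEvent { seq: 1 },
        Cmd::NewBlock { height: 12 },
    ];
    let (handled, skipped) = run_new(cmds, 10);
    println!("handled: {handled:?}");
    println!("skipped: {skipped:?}");
}
```

The design trade-off is that a failed clear is dropped for now rather than retried in place. The sketch relies on the assumption, stated in the list above, that the next clear interval picks the pending packets up again.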

PR author checklist:

Added changelog entry, using unclog.
Added tests: integration (for Hermes) or unit/mock tests (for modules).
Linked to GitHub issue.
Updated code comments and documentation (e.g., docs/).

Reviewer checklist:

Reviewed Files changed in the GitHub PR explorer.
Manually tested (in case integration/unit/mock tests are absent).

.changelog/unreleased/improvements/ibc-relayer/2155-remove-infinite-loop.md

romac

Works just as expected: the relayer recovers properly, does not get stuck in a loop, and then attempts to clear the packets (which also fails in the test scenario with pruning=everything, but should otherwise work).

Thanks @ancazamfir!

## Description

`spawn_packet_cmd_worker` runs `handle_packet_cmd`, passing a "command" that can be `NewBlock`. The first thing `handle_packet_cmd` does is try to clear packets. If this fails with an ignorable error, the old code retries immediately with the same command and height. If the error persists (as in informalsystems#2155), the worker enters an infinite loop and new IBC events (coming via `IbcEvent` commands) are never processed.

I believe the infinite loop started with informalsystems#2238. Before that, the worker would abort (due to `TaskError::Fatal(RunError::retry(e))`).

The proposed change is to move to the next command even if the previous one has failed. The reasons are:

- it avoids the infinite loop
- the failed events will still be processed at the next clear interval with a fresh height
- at different levels down in `handle_packet_cmd` there are retry mechanisms bounded by MAX_RETRIES (current value hardcoded at 5).

## Commits

* Do not retry clearing with same height forever
* Undo one-chain script changes
* Add changelog
* Reword changelog entry
* Remove dbg! statement
* Formatting

Co-authored-by: Romain Ruetschi <romain@informal.systems>

ancazamfir added 3 commits June 28, 2022 11:15

Do not retry clearing with same height forever

507ade1

Undo one-chain script changes

1fb3d16

Add changelog

4ac802c

romac reviewed Jun 28, 2022

.changelog/unreleased/improvements/ibc-relayer/2155-remove-infinite-loop.md (outdated)

Reword changelog entry

2d587aa

ancazamfir marked this pull request as ready for review June 29, 2022 07:15

ancazamfir requested review from adizere and seanchen1991 as code owners June 29, 2022 07:15

romac self-assigned this Jun 29, 2022

romac added 2 commits June 30, 2022 14:35

Merge branch 'master' into anca/2155_infinite_clear

eba1bd8

Remove dbg! statement

d6c19ff

romac approved these changes Jun 30, 2022

Formatting

4cec200

romac merged commit 7371aef into master Jun 30, 2022

romac deleted the anca/2155_infinite_clear branch June 30, 2022 14:57

ancazamfir mentioned this pull request Jul 15, 2022

Hermes selective retry upon account sequence mismatch #2411

Closed

5 tasks
