BCFR-1087 Detect Finality Violation on Backfill #15825
Conversation
AER Report: CI Core ran successfully ✅
AER Report: Operator UI CI — Breaking Changes GQL Check
1. Workflow-triggered downstream job failed: breaking-changes-gql-check
Why: the error indicates that the downstream workflow triggered by this job did not complete.
Suggested fix: investigate the logs of the downstream workflow at the provided URL to identify the specific cause of the failure, then address the issue in the downstream workflow so it completes successfully.
    if err != nil {
        lp.lggr.Errorw("Failed to poll and save logs, retrying later", "err", err)
    }
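The "more observable" approach floated below could look roughly like this. A minimal sketch, assuming a stand-in: `pollAndSaveFailures` here is a stdlib atomic counter used in place of a real `prometheus.Counter`, and `pollAndSave`/`run` are hypothetical simplifications of LogPoller's poll loop, not the actual Chainlink code.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// pollAndSaveFailures is a hypothetical failure counter; in the actual
// suggestion this would be a prometheus.Counter registered per chain.
var pollAndSaveFailures atomic.Int64

// pollAndSave is a stand-in for LogPoller's poll-and-save step; the real
// method talks to an RPC node and writes logs to the DB.
func pollAndSave(fail bool) error {
	if fail {
		return errors.New("rpc error")
	}
	return nil
}

// run mirrors the error branch from the diff above: log the failure and,
// per the suggestion, bump a counter alongside the log line.
func run(fail bool) {
	if err := pollAndSave(fail); err != nil {
		pollAndSaveFailures.Add(1)
		fmt.Println("Failed to poll and save logs, retrying later:", err)
	}
}

func main() {
	run(true)
	run(false)
	fmt.Println("failures:", pollAndSaveFailures.Load()) // failures: 1
}
```

A counter like this is cheap to maintain, but as the discussion notes, its usefulness on dashboards is debatable because individual failures are expected.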
Food for thought: should we track these errors using a more "observable" approach, e.g. by incrementing a Prometheus counter whenever pollAndSave or backup fails?
I'm not sure it's really helpful to track this as a metric. Failures of pollAndSave and backup are normal (e.g. the RPC failed for one of the requests), so we wouldn't use it on overview dashboards. We might want to introduce something higher-level, like taking the latest block successfully processed by LogPoller and comparing it to the one observed by HT or another component; a high delta between these two values would signal that pollAndSave is failing too often.
In any case, it seems to be out of scope for this PR.
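The higher-level signal described above can be sketched in a few lines. This is a hedged illustration only: `laggingTooFar`, its parameters, and the threshold are all hypothetical names, and a real implementation would read the two block heights from LogPoller and the head tracker and export the delta as a gauge.

```go
package main

import "fmt"

// laggingTooFar compares the latest block processed by LogPoller with the
// latest block observed by another component (e.g. the head tracker).
// A delta above maxDelta suggests pollAndSave has been failing repeatedly.
func laggingTooFar(lpLatest, htLatest, maxDelta int64) bool {
	return htLatest-lpLatest > maxDelta
}

func main() {
	fmt.Println(laggingTooFar(100, 105, 10)) // small lag, within threshold: false
	fmt.Println(laggingTooFar(100, 150, 10)) // large lag, alert-worthy: true
}
```

Unlike a raw failure counter, this tolerates transient errors by design: the signal only fires when failures persist long enough for the poller to fall measurably behind.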
Yeah, I think the "failed to poll and save logs due to finality violation" case is worth tracking, but as I understand it we're already tracking that indirectly, because Healthy() will start returning false as soon as it happens. The other errors are fairly normal; they'll happen under any sort of temporary network instability, which presumably is also tracked elsewhere.
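The indirect tracking mentioned above could work along these lines. A minimal sketch under stated assumptions: the `logPoller` struct, `recordFinalityViolation`, and a `Healthy() error` signature are simplified stand-ins for the real service's health-report machinery, not the actual Chainlink API.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// logPoller sketches a service that latches an unhealthy state once a
// finality violation is detected during backfill.
type logPoller struct {
	finalityViolated atomic.Bool
}

// recordFinalityViolation is called when backfill observes a finalized
// block being re-orged; the flag stays set until the node is restarted.
func (lp *logPoller) recordFinalityViolation() {
	lp.finalityViolated.Store(true)
}

// Healthy returns a non-nil error after a violation, so any health-check
// endpoint or alerting built on top of it picks the condition up.
func (lp *logPoller) Healthy() error {
	if lp.finalityViolated.Load() {
		return errors.New("finality violated")
	}
	return nil
}

func main() {
	lp := &logPoller{}
	fmt.Println(lp.Healthy()) // <nil>
	lp.recordFinalityViolation()
	fmt.Println(lp.Healthy()) // finality violated
}
```

Because the flag latches rather than resets, this surfaces the one failure mode that is never "normal", while routine RPC hiccups leave the health status untouched.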
Improves detection of finality violations during the backfill operation. DD
Test namespace load-ccip-d6901