Fix volatile result sent before it is fully committed (24-1) #2624

snaury
Member

@snaury snaury commented Mar 11, 2024

Changelog entry

Fix volatile result sent before it is fully committed.

Changelog category

  • Bugfix

Additional information

A rare failure was detected with Jepsen when the volatile transactions feature is enabled and used. Investigation showed that YDB would sometimes reply SUCCESS to a commit that had actually failed.

The underlying issue is that volatile transactions are prepared in memory, so a PREPARED reply is sent very early in the pipeline, before the (read-only) propose transaction is committed. This is expected: volatile transactions don't have storage in their critical path until they are executed. When storage is a bit slow, the volatile transaction may be fast enough to be planned and executed while a previous localdb transaction is still committing. When the propose localdb transaction finally commits, it erroneously observes the new operation result (which is optimistically SUCCESS while the operation waits for the localdb commit and for other participants) and sends it, treating it as a propose phase result. The localdb commit with execution side effects may still fail, but by that time the SUCCESS result has already been sent.

The fix marks operations that send their propose result early and ignores the operation result on propose completion for such operations.
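For readers unfamiliar with the code path, the ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the names `Operation`, `propose_result_sent_early`, `propose`, `execute`, and `on_propose_committed` are hypothetical and do not correspond to actual YDB classes or functions, and the real change is in the C++ datashard code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Operation:
    """Toy stand-in for an operation; not a real YDB type."""
    result: Optional[str] = None             # result currently attached to the operation
    propose_result_sent_early: bool = False  # hypothetical marker added by the fix


def propose(op: Operation, sent: List[str]) -> None:
    # A volatile transaction is prepared in memory, so PREPARED is sent
    # before the propose localdb transaction commits.
    sent.append("PREPARED")
    op.propose_result_sent_early = True


def execute(op: Operation) -> None:
    # Execution optimistically attaches SUCCESS while the localdb commit
    # and the other participants are still pending.
    op.result = "SUCCESS"


def on_propose_committed(op: Operation, sent: List[str], fixed: bool) -> None:
    # With the fix, an operation that already replied early is skipped here.
    if fixed and op.propose_result_sent_early:
        return
    # Without the fix, whatever result is attached is sent as if it were
    # the propose phase result.
    if op.result is not None:
        sent.append(op.result)


def run(fixed: bool) -> List[str]:
    """Replay the slow-storage ordering: propose, execute, then propose commit."""
    sent: List[str] = []
    op = Operation()
    propose(op, sent)
    execute(op)
    on_propose_committed(op, sent, fixed)
    return sent
```

In this model, `run(fixed=False)` sends SUCCESS from the propose completion path before the execution commit is known to have succeeded, while `run(fixed=True)` sends only the early PREPARED reply and leaves the final result to the execution path.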

Fixes KIKIMR-21156. Merges #2505. Merges #2598.

@snaury snaury self-assigned this Mar 11, 2024
@snaury snaury marked this pull request as ready for review March 11, 2024 15:58
@snaury snaury requested a review from a team as a code owner March 11, 2024 15:58
github-actions bot commented Mar 11, 2024

2024-03-11 16:11:36 UTC Pre-commit check for b979ea5 has started.
2024-03-11 16:11:38 UTC Build linux-x86_64-relwithdebinfo is running...
🟢 2024-03-11 16:17:12 UTC Build successful.
2024-03-11 16:17:25 UTC Tests are running...
🔴 2024-03-11 16:43:43 UTC Test run completed, no test results found for commit 8192d38. Please check build logs.

github-actions bot commented Mar 11, 2024

2024-03-11 16:20:57 UTC Pre-commit check for b979ea5 has started.
2024-03-11 16:20:59 UTC Build linux-x86_64-release-asan is running...
🟢 2024-03-11 16:26:37 UTC Build successful.
2024-03-11 16:26:50 UTC Tests are running...
🔴 2024-03-11 17:58:17 UTC Some tests failed, follow the links below.

Test history

| TESTS | PASSED | ERRORS | FAILED | SKIPPED | MUTED? |
|-------|--------|--------|--------|---------|--------|
| 16050 | 15949  | 0      | 11     | 51      | 39     |

@snaury snaury merged commit ed139af into ydb-platform:stable-24-1 Mar 12, 2024
2 of 4 checks passed
@snaury snaury deleted the KIKIMR-21156-fix-early-volatile-reply-24-1 branch March 12, 2024 07:53
@mregrock mregrock mentioned this pull request May 15, 2024
This was referenced Jun 7, 2024
@CyberROFL CyberROFL mentioned this pull request Jun 19, 2024