
replication: fix io-threads possible race by moving waitForClientIO #1422

Merged


uriyage
Contributor

uriyage commented Dec 11, 2024

Fix race with pending writes in replica state transition

The Problem

In #60 (Dual channel replication), a new connWrite call was added before the waitForClientIO check. This created a race condition: the main thread could write to a replica client while IO threads still had pending writes for that same client.

The Fix

Moved the waitForClientIO() call earlier in syncCommand, before any connWrite call. This ensures that all pending IO operations on the client have completed before the main thread writes to it.
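To make the ordering concrete, here is a minimal, single-threaded C model of the two orderings. Everything in it is hypothetical: `model_client`, `model_wait_for_client_io` and `model_conn_write` are stand-ins for the real client struct, `waitForClientIO()` and `connWrite()`, and the IO thread is reduced to an `io_write_pending` flag rather than real concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real client struct. */
typedef struct {
    bool io_write_pending; /* an IO thread still owns a write for this client */
    size_t unsafe_writes;  /* main-thread writes issued while IO was pending */
} model_client;

/* Stand-in for waitForClientIO(): here the pending IO is simply marked as drained. */
static void model_wait_for_client_io(model_client *c) {
    c->io_write_pending = false;
}

/* Stand-in for a main-thread connWrite(): counts writes that would race with IO threads. */
static void model_conn_write(model_client *c) {
    if (c->io_write_pending) c->unsafe_writes++;
}

/* Ordering introduced in #60: write first, then wait for IO threads. */
static void model_sync_buggy(model_client *c) {
    model_conn_write(c);
    model_wait_for_client_io(c);
}

/* Ordering after this PR: wait for IO threads before any write. */
static void model_sync_fixed(model_client *c) {
    model_wait_for_client_io(c);
    model_conn_write(c);
}
```

With a client whose IO write is still pending, `model_sync_buggy` records one unsafe write while `model_sync_fixed` records none. The real race is timing dependent, so the model only captures the ordering, not the interleaving itself.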

…arlier

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
uriyage force-pushed the fix_unprotected_write_to_replica_client branch from 930267a to bbc1829 on December 11, 2024 08:03
ranshid
Member

ranshid commented Dec 11, 2024

@uriyage I am fine with the fix, but let's also introduce an assertion (or debug assertion) in connWrite to catch these cases.

ranshid self-requested a review December 11, 2024 08:14

codecov bot commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.83%. Comparing base (789a73b) to head (756c3dc).
Report is 141 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1422      +/-   ##
============================================
+ Coverage     70.66%   70.83%   +0.16%     
============================================
  Files           114      119       +5     
  Lines         63150    64861    +1711     
============================================
+ Hits          44624    45943    +1319     
- Misses        18526    18918     +392     
Files with missing lines Coverage Δ
src/replication.c 87.57% <100.00%> (+0.31%) ⬆️
src/socket.c 92.65% <100.00%> (+1.53%) ⬆️

... and 70 files with indirect coverage changes

uriyage
Contributor Author

uriyage commented Dec 17, 2024


We need a flag on the connection indicating that it belongs to a client struct, which will allow us to assert on it in connWrite. We will add this flag in PR #1338, and I will add the assertion after that PR is merged.
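A rough sketch of what such a debug assertion could look like, assuming a connection-level flag that marks client-owned connections. `SKETCH_CONN_FLAG_CLIENT`, the `io_pending` field, `sketch_write_allowed` and `sketch_conn_write` are hypothetical names for illustration only; the flag actually added in #1338 and the real `connWrite` signature may differ, and real code would likely consult the owning client rather than store the pending state on the connection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag; the name used in #1338 may differ. */
#define SKETCH_CONN_FLAG_CLIENT (1 << 0)

typedef struct {
    int flags;
    bool io_pending; /* hypothetical: set while an IO thread owns the client */
    char out[64];    /* toy output buffer standing in for the socket */
    size_t out_len;
} sketch_conn;

/* Only connections that back a client struct are subject to the IO-thread check. */
static bool sketch_write_allowed(const sketch_conn *conn) {
    return !(conn->flags & SKETCH_CONN_FLAG_CLIENT) || !conn->io_pending;
}

/* Toy connWrite: asserts no IO thread owns the client, then appends to the buffer. */
static int sketch_conn_write(sketch_conn *conn, const void *data, size_t len) {
    assert(sketch_write_allowed(conn)); /* compiled out when NDEBUG is defined */
    size_t room = sizeof(conn->out) - conn->out_len;
    if (len > room) len = room;
    memcpy(conn->out + conn->out_len, data, len);
    conn->out_len += len;
    return (int)len;
}
```

Keeping the check behind a flag means non-client connections (for example, outgoing connections that have no client struct) are never affected. Standard `assert` is used here only to keep the sketch self-contained; the server would presumably use its own assertion macros.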

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
uriyage
Contributor Author

uriyage commented Dec 24, 2024


@ranshid done

ranshid
Member

ranshid commented Jan 1, 2025

@uriyage, sorry, can you please rebase this so I can merge it later?

ranshid merged commit ae70c54 into valkey-io:unstable on Jan 2, 2025
50 checks passed