Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor leader.c to fix stack growth in handle_exec_sql #697

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

cole-miller
Copy link
Contributor

@cole-miller cole-miller commented Sep 2, 2024

This PR is another attempt to fix the stack growth issue with handle_exec_sql (related to #679), this time in a more principled way that involves a significant refactor of leader.c. Details follow.

Recall that the issue arises with a call stack like this (callees at the top)

handle_exec_sql_next
handle_exec_sql_cb
leaderExecV2
leader__barrier
leader__exec
handle_exec_sql_next
...

that is, indirect recursion that can generate a number of stack frames proportional to the number of ;-separated statements in an EXEC_SQL request. This happens because leader__barrier and leader__exec can invoke their callbacks synchronously when suspending is not required (respectively, because the FSM is up to date with the raft log and and because sqlite3_step generated no changed pages).

This PR rewrites those two functions, establishing a new contract: the callback is only invoked if we yielded to the event loop at least once while processing the request; if the request was processed synchronously, a magic value LEADER_NOT_ASYNC is returned to indicate that the caller should invoke the callback. Existing callsites are updated to reflect this new contract.

With these new implementations available, we can fix the original problem by rewriting handle_exec_sql_next to iteratively process as many statements as possible until one of them suspends or returns an error. The PR includes a regression test to validate this fix.

The new implementations of the leader.c functions are intentionally different in style from the old ones. Effectively, each barrier request and exec request is a coroutine driven by a state machine. With this approach the LEADER_NOT_ASYNC feature falls out rather naturally, and we get the added benefits of more compact code and built-in observability.

Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
@cole-miller cole-miller changed the title Leader pseudo coroutines Refactor leader.c to fix stack growth in handle_exec_sql Sep 2, 2024
@cole-miller cole-miller marked this pull request as draft September 2, 2024 17:05
Copy link

codecov bot commented Sep 2, 2024

Codecov Report

Attention: Patch coverage is 82.57576% with 46 lines in your changes missing coverage. Please review.

Project coverage is 81.19%. Comparing base (b74145c) to head (5f139a2).
Report is 97 commits behind head on master.

Files with missing lines Patch % Lines
src/leader.c 82.00% 10 Missing and 17 partials ⚠️
src/gateway.c 78.16% 11 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #697      +/-   ##
==========================================
+ Coverage   77.90%   81.19%   +3.29%     
==========================================
  Files         197      197              
  Lines       28011    29148    +1137     
  Branches     5535     4068    -1467     
==========================================
+ Hits        21822    23667    +1845     
+ Misses       4331     3829     -502     
+ Partials     1858     1652     -206     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cole-miller
Copy link
Contributor Author

cole-miller commented Sep 2, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursion in EXEC_SQL request handling
1 participant