Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Runloop might be suspended when assigned with many partitions #986

Merged
merged 1 commit into from
Jul 19, 2023

Conversation

yaarix
Copy link
Contributor

@yaarix yaarix commented Jul 19, 2023

  • The Runloop command queue is a bounded queue, with a fixed size of 1024
  • The Runloop stream takes commands from the queue, process them and offer a new Poll command into the stream (when applicable).
  • The .offer can suspend the fiber when the command queue is already full. this, in turn, makes the entire stream freeze. the underlying kafka consumer will eventually (after max.poll.interval.ms) be dropped from the group without any notification to the zstream.
  • to fix this, we can change the command queue from bounded to unbounded which will not suspend the fiber that adds the Poll command

 - The Runloop command queue is a bounded queue, with a fixed size of 1024
 - The Runloop stream takes commands from the queue, process them and offer a new `Poll` command into the stream (when applicable).
 - The `.offer` can suspend the fiber when the command queue is already full. this, in turn, makes the entire stream freeze. the underlying kafka consumer will eventually (after `max.poll.interval.ms`) be dropped from the group without any notification to the zstream.
 - to fix this, we can change the command queue from bounded to unbounded which will not suspend the fiber that adds the `Poll` command
@CLAassistant
Copy link

CLAassistant commented Jul 19, 2023

CLA assistant check
All committers have signed the CLA.

@svroonland
Copy link
Collaborator

👍

As expressed on Discord, unbounded memory usage is something to be avoided in general, but in this case it's preferrable over freezing. Should this lead to trouble, it's at least easily diagnosed as OOM errors.

@yaarix yaarix marked this pull request as ready for review July 19, 2023 08:07
@yaarix yaarix requested a review from erikvanoosten as a code owner July 19, 2023 08:07
@yaarix
Copy link
Contributor Author

yaarix commented Jul 19, 2023

@svroonland any idea why the benchmark fails? it fails on lack of permissions

@erikvanoosten erikvanoosten merged commit 4a89a7a into zio:master Jul 19, 2023
@guizmaii
Copy link
Member

@yaarix benchmarks don't work on forks

@svroonland
Copy link
Collaborator

@yaarix It's been a long time since this issue, but have you actually experienced this issue or was it just a theoretical one?

@yaarix
Copy link
Contributor Author

yaarix commented Nov 16, 2024

I have experienced it and merged a fix a long time ago

@svroonland
Copy link
Collaborator

Thanks for the quick response. Alright, then we'll cancel plans for #1386 (completely forgot about this issue here) unless we have a good idea what exactly caused it and how we can mitigate that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants