Allow ScoreUploader to process replays concurrently #235

Merged: 3 commits merged into ppy:master, May 22, 2024

Conversation

bdach (Collaborator) commented May 21, 2024

As things stand, ScoreUploader is inherently single-threaded. Today, @peppy noticed that replay uploads have started piling up:

[screenshot: pending replay uploads piling up]

and @ThePooN pointed out that this wasn't exactly new: it had already started happening a few days earlier, as bursts of pending uploads that eventually drain back down, which also has the effect of increasing the memory footprint of the spectator server instance:

[screenshots: pending upload bursts and spectator server memory usage]

With no apparent direct cause (as in, no deploy or change that obviously broke it), the working theory is that something may be throttling the upload process (it was apparently doing ~1 score per second, which seems way too slow) and that throwing more threads at it may help. So this diff attempts to do that, using channels (think BlockingCollection, but async).

I admittedly don't have much experience working with Channel<T>, so I'm not super sure how well this is going to behave. One notable thing to read in the docs of that class is this:

> Whenever a Channel<TWrite,TRead>.Writer produces faster than a Channel<TWrite,TRead>.Reader can consume, the channel's writer experiences back pressure.

I'm not sure I understand the full implications of that; I know what back pressure means, but depending on how it's implemented here, it might be fine or it might not. This probably needs more testing in general, but I'm not sure how to test this sort of thing, since it can only really be observed working properly under load. Still, I wrote it and it might be useful, so I'm PRing it to see whether there's at least interest in pursuing this further and maybe doing some stress testing.
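
To make that quote a bit more concrete, here's a rough standalone sketch (not the actual PR code) of the two channel flavours: an unbounded channel never pushes back on the writer, while a bounded one is where the writer actually experiences back pressure.

```csharp
using System.Threading.Channels;

// Unbounded: TryWrite always succeeds, so the writer never blocks,
// but a slow reader means the backlog (and memory usage) can grow without limit.
var unbounded = Channel.CreateUnbounded<int>();
unbounded.Writer.TryWrite(1);

// Bounded: once the capacity (here 100) is reached, WriteAsync does not complete
// until a reader frees a slot, so a slow consumer throttles the producer
// instead of letting the queue grow.
var bounded = Channel.CreateBounded<int>(new BoundedChannelOptions(100)
{
    FullMode = BoundedChannelFullMode.Wait
});
await bounded.Writer.WriteAsync(1);
```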

The degree of concurrency is controlled via an envvar (REPLAY_UPLOAD_THREADS). Yes, this means that changing it requires restarting the instance; that was considered acceptable in discussion.
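
For reference, the overall shape of the change is roughly the sketch below. ScoreUploader and Enqueue() are real names from the code, but everything else here (the shape of Score, uploadScoreAsync(), the unbounded channel choice) is an illustrative placeholder rather than the actual implementation:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public class ScoreUploader
{
    // Unbounded for illustration; the real choice of bounded vs unbounded
    // determines whether producers ever feel back pressure.
    private readonly Channel<Score> channel = Channel.CreateUnbounded<Score>();

    public ScoreUploader()
    {
        // Degree of concurrency is fixed at startup from the environment,
        // so changing it means restarting the instance.
        int readers = int.TryParse(Environment.GetEnvironmentVariable("REPLAY_UPLOAD_THREADS"), out int t) && t > 0 ? t : 1;

        for (int i = 0; i < readers; i++)
            Task.Run(readLoop);
    }

    // Producer side: writes never block on an unbounded channel.
    public void Enqueue(Score score) => channel.Writer.TryWrite(score);

    // Consumer side: each of the N readers competes for items off the same channel.
    private async Task readLoop()
    {
        await foreach (var score in channel.Reader.ReadAllAsync())
            await uploadScoreAsync(score);
    }

    // Placeholder for the actual upload logic.
    private Task uploadScoreAsync(Score score) => Task.CompletedTask;
}

public record Score(long Id);
```

With REPLAY_UPLOAD_THREADS set above 1, several uploads can be in flight at once instead of the strictly sequential behaviour we have today.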

(As an aside, I considered merging REPLAY_UPLOAD_THREADS with SAVE_REPLAYS, so that REPLAY_UPLOAD_THREADS=0 would mean "replays off". But that gets annoying because tests depend on SAVE_REPLAYS being toggleable at runtime, which just works when the check lives in Enqueue() as before, but is quite awkward if you need to kill and spin up threads on demand on the reader end just for tests. I also wasn't sure that was a reasonable thing to do to begin with, so I didn't in the end. Can reconsider if that would be considered nicer.)

peppy self-requested a review May 22, 2024 04:18

peppy (Member) left a comment:

Looks solid enough. Using channels has been on my bucket list and it's good to see they are pretty low overhead.

peppy merged commit 79810ab into ppy:master May 22, 2024
2 checks passed
bdach deleted the upload-scaling branch May 23, 2024 08:08