Scaling #85

How is everyone scaling this chart? Looking for ways to scale when the webserver gets busier, but also when there's a ton of items in the queue and my instance needs more workers to process the backlog.

Comments
Right now I have a manually written autoscaler that watches CPU on the web container. For sidekiq I made an entry in values for each queue, so each queue gets its own ReplicaSet, with default pod count set in values. I then keep a loose eye out for traffic jams and manually scale. That mostly only matters if I catch a burst of dead jobs and retry them all. I am interested in learning whether kubernetes supports custom metrics for autoscaling and, if so, how I could publish some sidekiq metrics to use there.
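For anyone following along, the CPU-watching setup described above maps onto a plain `autoscaling/v2` HorizontalPodAutoscaler. A minimal sketch, with `mastodon-web` standing in for whatever your release actually names the web Deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-web                # placeholder; match your release
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-web              # placeholder web Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out when average CPU crosses 70%
```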
Same, but watching CPU on sidekiq instead since I'm the only user.
Kubernetes supports custom metrics, but you might want to have a look at keda.sh, as it takes that to a whole new level.
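For the KEDA route, the scaler would watch a queue metric in Prometheus and drive the sidekiq Deployment directly. A rough sketch, assuming an in-cluster Prometheus at `prometheus.monitoring.svc:9090` and an exporter publishing a `sidekiq_queue_size` gauge (both placeholders, not anything shipped by the chart):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sidekiq-default
spec:
  scaleTargetRef:
    name: mastodon-sidekiq-default            # placeholder Deployment for the "default" queue
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed in-cluster Prometheus
        query: sum(sidekiq_queue_size{queue="default"})        # metric name depends on your exporter
        threshold: "250"                                       # target jobs per replica
```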
I can share a bit about what we're doing in production for mastodon.social and mastodon.online.

Sidekiq

In our clusters, we have deployed a prometheus exporter that exports sidekiq queue statistics, and then we ingest that into datadog. We have a datadog operator installed on the cluster, which means we can set up a custom metric from those queue statistics for an autoscaler to act on.

I haven't done this myself, but I know there's a prometheus operator available for kubernetes. You should be able to set up a custom metric for sidekiq autoscaling in much the same way.

Web

We've actually been trying to figure out a good way to scale web pods ourselves for a while. At present we don't actually have an autoscaler set up in production.

However, there is an upcoming feature in Mastodon that allows for exporting prometheus metrics for both the web and sidekiq pods' ruby processes. This will be officially available in version v4.4, but the helm chart is already updated with the relevant configuration, and we are actively testing this in production with nightly builds to see if we can use these to autoscale. Of particular note are the web process metrics, which may finally give us a usable scaling signal there.

I'll leave this issue open as we experiment, and will update in the future when we have more info~
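To make the sidekiq side concrete, here is a minimal sketch of what an HPA driven by exported queue statistics could look like. It assumes some metrics adapter (for example prometheus-adapter or the Datadog cluster agent) is already serving an external metric; the metric name `sidekiq_queue_enqueued` and the Deployment name `mastodon-sidekiq-default` are placeholders, not names taken from the chart or the comment above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-sidekiq-default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-sidekiq-default         # placeholder per-queue Deployment
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: sidekiq_queue_enqueued     # placeholder; depends on your exporter/adapter
          selector:
            matchLabels:
              queue: default
        target:
          type: AverageValue
          averageValue: "100"              # aim for ~100 enqueued jobs per replica
```

With an `AverageValue` target, the external metric is divided by the current replica count, so it is the backlog per worker pod that gets held near the target rather than the absolute queue size.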