
feat: 🎸 replace Queue.add_job with Queue.upsert_job #694

Merged
merged 3 commits into from
Jan 23, 2023

Conversation

severo (Collaborator) commented on Jan 23, 2023

upsert_job ensures there is only one waiting job for a given set of parameters. On every call to upsert_job, all previous waiting jobs for the same set of parameters are cancelled, and a new one is created with a fresh "created_at" date, which means it is placed at the end of the queue. This helps with datasets that are updated very often (e.g., every minute): they will only be processed when workers become available.

It's a quick PR to fix the issue that the queues are growing faster than the workers can drain them, and that most of these jobs would later be skipped anyway. It's better to reduce the number of results in the queries by reducing the number of waiting jobs.

@severo severo merged commit 984f0b5 into main Jan 23, 2023
@severo severo deleted the reduce-size-of-queue branch January 23, 2023 15:17
severo (Collaborator, Author) commented on Jan 23, 2023

See for example: https://huggingface.co/datasets/atokforps/latent_v1_fullrun_alpha3_13/commits/main. Two commits are pushed every minute. The queues contain thousands of waiting jobs.

[Screenshot, 2023-01-23 16:18: queue sizes before the fix]

severo (Collaborator, Author) commented on Jan 23, 2023

The issue is effectively fixed.

[Screenshot, 2023-01-23 16:46: queue sizes after the fix]

For the "/first-rows" queue, we will first have to wait for /splits to be run again on those datasets.

severo (Collaborator, Author) commented on Jan 23, 2023

OK, #695 helped reduce the number of waiting jobs for /first-rows. There is no way to go further, since there are no more duplicates (for example, allenai/nllb has 2,656 splits, and thus up to 2,656 jobs).
