Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add stop/delete experiment endpoint #13

Merged
merged 2 commits into from
Feb 16, 2023
Merged

add stop/delete experiment endpoint #13

merged 2 commits into from
Feb 16, 2023

Conversation

nn-ch
Copy link
Contributor

@nn-ch nn-ch commented Feb 14, 2023

  • currently there are 3 celery tasks that run inference, training and metrics. these are unbounded tasks which means queue is blocked if we reach max number of workers (currently set at 6 in docker-compose), that's 2 experiments.
  • add a small functionality to cancel running tasks for given experiment to be able to launch new experiments
  • track list of celery task ids for each experiment

feature pr: add additional way of running training and inference jobs that are infinite in its nature

- track list of celery task ids and cancel running tasks to free up the space
@nn-ch nn-ch requested a review from MaxHalford February 14, 2023 23:24
Copy link
Member

@MaxHalford MaxHalford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just added a few remarks.

@@ -149,9 +150,11 @@ class Experiment(sqlm.SQLModel, table=True): # type: ignore[call-arg]
runner: runners.Runner = sqlm.Relationship(back_populates="experiments")
sink_id: int = sqlm.Field(foreign_key="sink.id")
sink: sinks.Sink = sqlm.Relationship(back_populates="experiments")
task_ids: str | None = sqlm.Field(default=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a cleaner way than using str is to store the IDs in a list; see here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i was looking for that.
I tried task_ids: List[str] and variations of that but was not working.

if not exp:
raise HTTPException(status_code=404, detail="Experiment not found")
for task_id in exp.task_ids.split(","):
RUNNER.control.revoke(task_id, terminate=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!



@router.delete("/{experiment_id}")
def delete_experiment(experiment_id: int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the tasks need to be stopped if the experiment is deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! good catch

when deleting experiment also stop the corresponding training and inference tasks from running
@nn-ch nn-ch merged commit ca6ef00 into main Feb 16, 2023
@nn-ch nn-ch deleted the add_stop_experiment branch February 16, 2023 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants