Add the ability to specify services #77

Closed
FrankD412 opened this issue Feb 25, 2018 · 6 comments

Comments

@FrankD412
Member

We should support a way to launch services, where a service is a command that needs to be spun up before the steps in a study are executed. These services can be spun down once the study is determined to have completed (with either a failure or a success condition).

@FrankD412
Member Author

@jsemler has also requested this use case.

@FrankD412
Member Author

To specify services, we have two options. We could specify a study step with a type key or introduce a new section specifically for services. Maybe part of the env section? @dinatale2 -- Thoughts?

There are three parts to this issue:

  1. The local adapter has to be modified to use a thread pool so that it doesn't execute steps sequentially, waiting on each one to terminate. With this change, a service can be started and left running in the thread pool.
  2. Add a way to add services to the ExecutionGraph and launch them -- we also need to decide whether the graph will track those services.
  3. Modify the YAMLSpecification to allow for services to be specified.
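
As a rough illustration of the first point, here is a minimal sketch of a service occupying one slot in a thread pool while the steps run in the remaining slots. It is not Maestro's local adapter; `run_step` and `run_with_service` are hypothetical names, and it assumes every step and service is a plain subprocess.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_step(cmd):
    """Run one step to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_with_service(service_cmd, step_cmds, max_workers=4):
    """Hypothetical helper: keep a service in the pool while steps execute."""
    if max_workers < 2:
        # The service holds one worker for its whole lifetime.
        raise ValueError("need at least one worker for the service and one for steps")
    service = subprocess.Popen(service_cmd)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The service's wait() call sits in the pool until the service exits.
        pool.submit(service.wait)
        try:
            futures = [pool.submit(run_step, cmd) for cmd in step_cmds]
            # Collect exit codes in submission order.
            return [future.result() for future in futures]
        finally:
            # Spin the service down once the steps are done (or have raised).
            service.terminate()
```

Steps are no longer executed one at a time, and the pool only shuts down after the service has been terminated.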

@FrankD412
Member Author

After some discussion with @dinatale2, he and I think that the services should be placed into a new section within a study specification. That section would look as follows:

services:
    service1:
        cmd: <cmd>
        restart: <cmd>
        resources:
            nodes: 1
            procs: 1
            walltime: "00:00:00"

These would be spun up separately, ahead of executing the study section, with Maestro exiting if any of these services experiences an abnormal startup.
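
To make the startup behavior concrete, here is a minimal sketch that launches every entry of an already-parsed `services` mapping and bails out if one of them dies right away. Everything in it is an assumption for illustration: `start_services`, `ServiceStartupError`, and the grace period are hypothetical, and "abnormal startup" is taken to mean "the process exited within the grace period".

```python
import shlex
import subprocess
import time


class ServiceStartupError(RuntimeError):
    """Hypothetical error: a service exited during the startup grace period."""


def start_services(services, grace_seconds=1.0):
    """Launch each service's cmd and return a {name: Popen} mapping."""
    running = {}
    for name, spec in services.items():
        # Assumes cmd is a single shell-style command string.
        running[name] = subprocess.Popen(shlex.split(spec["cmd"]))

    # Give the services a moment, then see whether any has already exited.
    time.sleep(grace_seconds)
    failed = [name for name, proc in running.items() if proc.poll() is not None]
    if failed:
        # Do not leave the healthy services behind when aborting.
        for proc in running.values():
            if proc.poll() is None:
                proc.terminate()
                proc.wait()
        raise ServiceStartupError("services failed to start: " + ", ".join(failed))
    return running
```

A caller would run this before the study section and exit on `ServiceStartupError`. A service that forks into the background and exits cleanly would be misreported here, so a real check would likely need something better than a grace period.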

It's also a good way to introduce a new resources notation I've been wanting to push forward to help clean up the specification. I want to apply the resources notation to the study steps as well. I do want to note that the new notation will break existing specifications, so once this issue's PR gets pushed, existing specifications will need to be updated.

Another issue that @dinatale2 and I flagged with the implementation is how to handle services failing mid-study (or reaching their wall clock time, etc.). If a study is running and expects the service to be up, there is a corner case where it may not be, because the conductor has not yet restarted the service (if a restart was specified). We came to a loose conclusion that this case is outside of Maestro's scope. Maestro is only responsible for launching the services and making sure that they are restarted according to the conductor's refresh cycle. Codes relying on a service should handle the case when there is a lapse in service availability.
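
For illustration, one refresh pass could look something like the sketch below. It is not the conductor's actual loop; `refresh_services` is a hypothetical function that assumes the parsed `services` mapping plus a `{name: Popen}` mapping of the processes launched so far.

```python
import shlex
import subprocess


def refresh_services(services, running):
    """One refresh pass: relaunch exited services that define a restart command.

    Returns the names of the services that were restarted.
    """
    restarted = []
    for name, proc in list(running.items()):
        if proc.poll() is None:
            continue  # Still running, nothing to do.
        restart_cmd = services[name].get("restart")
        if restart_cmd is None:
            continue  # No restart specified, so leave the service down.
        running[name] = subprocess.Popen(shlex.split(restart_cmd))
        restarted.append(name)
    return restarted
```

Between the moment a service dies and the next call to this function, the service is simply unavailable, which is exactly the lapse that dependent codes would have to tolerate.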

@jsemler -- Feel free to chime in if you have any thoughts on this issue.

@dinatale2
Collaborator

@FrankD412 Another thought -- If a service fails and cannot be restarted for some reason, should Maestro cancel/fail the running study? That would make Maestro a "responsible user" of computing resources and free them up for another study.
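
One way to frame that policy, purely as a sketch: decide per service, on each refresh pass, whether to do nothing, restart, or give up and cancel the study. The `Action` enum, the `decide` function, and the `max_restarts` limit are all hypothetical and not part of Maestro.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART = "restart"
    CANCEL_STUDY = "cancel_study"


def decide(exit_code, has_restart_cmd, restarts_used, max_restarts=3):
    """Pick an action for one service; exit_code is None while it is running."""
    if exit_code is None:
        return Action.NONE
    if has_restart_cmd and restarts_used < max_restarts:
        return Action.RESTART
    # No restart command, or the restart budget is spent: free the resources.
    return Action.CANCEL_STUDY
```

For example, `decide(1, True, 3)` returns `Action.CANCEL_STUDY`, because the service has exited and the assumed budget of three restarts is used up.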

@jsemler
Collaborator

jsemler commented Mar 9, 2018

The service section seems reasonable to me. I think it would also be helpful to have a way to specify a post step to run before cancelling the study.

@FrankD412
Member Author

This conversation has died down, and I think the general notion is that Maestro may not need to be responsible for services. A solid division of responsibility is to assume that persistent services are already running. That way, an individual study remains simple.


3 participants