Create an orchestrator #816

Closed
severo opened this issue Feb 14, 2023 · 14 comments


@severo
Collaborator

severo commented Feb 14, 2023

Proposal

We should add a new service: orchestrator.

On one side, it would receive events:

  • webhooks when a dataset has changed (added, updated, deleted, changed to gated, etc.)
  • manual trigger: refresh a dataset, update all the datasets for a specific step, etc.

On the other side, it would command the jobs:

  • create only the jobs that are needed
  • receive the heartbeat to ensure a job is still running
  • receive the result of a job and:
    • store it,
    • create the dependent jobs based on the result

The orchestrator would have access to the queue and to the cache.
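The two sides described above could be sketched roughly as follows. This is only an illustration, not the project's implementation: the `Orchestrator` and `Job` classes, the step names in `GRAPH`, the in-memory `queue` and `cache`, and the rule that a failed result creates no dependent jobs are all assumptions made for the example. Heartbeat handling is left out to keep it short.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical processing graph: step name -> steps that depend on it.
GRAPH = {
    "config-names": ["split-names"],
    "split-names": ["first-rows"],
    "first-rows": [],
}
ROOT_STEPS = ["config-names"]


@dataclass(frozen=True)
class Job:
    dataset: str
    revision: str
    step: str


@dataclass
class Orchestrator:
    # In-memory stand-ins for the queue and cache databases (assumption).
    queue: deque = field(default_factory=deque)
    cache: dict = field(default_factory=dict)  # (dataset, step) -> (revision, result)

    def _create_job_if_needed(self, dataset, revision, step):
        job = Job(dataset, revision, step)
        cached = self.cache.get((dataset, step))
        if cached is not None and cached[0] == revision:
            return  # a result already exists for this revision
        if job in self.queue:
            return  # the same job is already waiting
        self.queue.append(job)

    def on_dataset_event(self, dataset, revision, deleted=False):
        """Event side: a webhook or a manual trigger."""
        if deleted:
            self.cache = {k: v for k, v in self.cache.items() if k[0] != dataset}
            self.queue = deque(j for j in self.queue if j.dataset != dataset)
            return
        for step in ROOT_STEPS:
            self._create_job_if_needed(dataset, revision, step)

    def on_job_result(self, job, result):
        """Job side: store the result, then create the dependent jobs."""
        self.cache[(job.dataset, job.step)] = (job.revision, result)
        if result.get("error"):
            return  # assumption: a failed step does not trigger its children
        for child in GRAPH[job.step]:
            self._create_job_if_needed(job.dataset, job.revision, child)
```

In this sketch, the decision to skip a job lives entirely in `_create_job_if_needed`, so nothing outside the orchestrator has to read the cache to make it.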

This means that the job runners can be a lot dumber:

  • they don't need to check if they should skip the job or not: they just run what they are given
  • they don't need to create the dependent jobs or even know about the processing graph
  • they don't need access to the cache (see "Refactor to have only one app accessing a mongodb database", #751) or to the queue. (The worker itself still has to access the queue. Alternatively, the orchestrator could launch the jobs on demand instead of having workers that loop and poll the queue for jobs, but that is a separate issue.)
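To make the "dumber job runner" idea concrete, here is a minimal sketch. The names `run_job`, `COMPUTE`, `compute_config_names`, the job dictionary layout, and the `report` callback are all hypothetical; the callback stands in for whatever channel would send the result back to the orchestrator.

```python
from typing import Any, Callable, Dict


def compute_config_names(dataset: str) -> Dict[str, Any]:
    # Placeholder computation (assumption): a real step would inspect the dataset.
    return {"config_names": ["default"]}


# Hypothetical mapping from step name to the function that computes it.
COMPUTE: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "config-names": compute_config_names,
}


def run_job(
    job: Dict[str, str],
    report: Callable[[Dict[str, str], Dict[str, Any]], None],
) -> None:
    """Run exactly the job that was given and report the outcome.

    The runner does not decide whether to skip, does not read or write
    the cache, and knows nothing about the processing graph.
    """
    try:
        result = {"content": COMPUTE[job["step"]](job["dataset"]), "error": None}
    except Exception as err:  # any failure is reported, never handled here
        result = {"content": None, "error": str(err)}
    report(job, result)
```

Everything after `report` is called (storing the result, creating dependent jobs) would then be the orchestrator's concern.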

See the related issues: #764, #736, #751, #741, #740

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Collaborator Author

severo commented Mar 17, 2023

Let's keep it for the moment, but I'm unsure if we will migrate to this infrastructure one day.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Collaborator Author

severo commented Apr 12, 2023

keep open

@severo
Collaborator Author

severo commented Apr 19, 2023

I think the orchestrator would live inside the admin service. We could possibly move most of the logic there; in particular, it should eventually be the only app that accesses the queue and cache databases.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Collaborator Author

severo commented May 15, 2023

keep open

@github-actions

github-actions bot commented Jun 8, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Collaborator Author

severo commented Jun 8, 2023

keep open. We're already halfway there.

@github-actions

github-actions bot commented Jul 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@severo
Collaborator Author

severo commented Jul 3, 2023

keep open

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Aug 6, 2023
@severo
Collaborator Author

severo commented Aug 7, 2023

I still think it would be good to have only one service that accesses:

  • the jobs and locks
  • the cache

and that provides the workers with the data they need (the cache entries of the previous steps).
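As a rough illustration of a single service handing workers the data they need, the sketch below builds a job payload that already contains the cache entries of the previous steps. The `PARENTS` mapping, the step names, the `build_job_payload` function, and the plain-dict cache are hypothetical and only serve the example.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: step name -> steps whose results it needs.
PARENTS = {
    "config-names": [],
    "split-names": ["config-names"],
}

Cache = Dict[Tuple[str, str], Any]  # (dataset, step) -> cached content


def build_job_payload(cache: Cache, dataset: str, step: str) -> Dict[str, Any]:
    """Bundle a job with the cache entries of its previous steps.

    Only the central service reads the cache; the worker receives
    everything it needs inside the payload.
    """
    inputs = {}
    for parent in PARENTS[step]:
        key = (dataset, parent)
        if key not in cache:
            raise LookupError(f"missing cache entry for step {parent!r}")
        inputs[parent] = cache[key]
    return {"dataset": dataset, "step": step, "inputs": inputs}
```

With a payload like this, a worker would not need its own connection to the cache database.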

severo reopened this Aug 7, 2023
severo added the "P2 Nice to have" label Aug 7, 2023
@severo
Collaborator Author

severo commented Feb 2, 2024

Partially done. The remaining ideas are not clear; let's close.

severo closed this as completed Feb 2, 2024