Move Job Scheduler plugin to core (modules) #147

Open
praveensameneni opened this issue Mar 17, 2022 · 14 comments

Comments

@praveensameneni
Member

praveensameneni commented Mar 17, 2022

Is your feature request related to a problem? Please describe.
The OpenSearch JobScheduler plugin provides a framework for OpenSearch plugin developers to schedule periodic jobs that run within OpenSearch nodes. You can schedule jobs by specifying an interval, or use a Unix cron expression to define a more flexible schedule.
Plugins which require this functionality take a dependency on job-scheduler.
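
As a rough illustration of the two schedule styles mentioned above, the Python sketch below computes the next run time for a fixed interval and for a simplified five-field cron expression. It does not use the JobScheduler plugin's actual API: the function names (`next_interval_run`, `next_cron_run`) are hypothetical, and the cron matcher is an assumption-laden subset that supports only `*`, `*/n`, and comma-separated integers.

```python
from datetime import datetime, timedelta


def next_interval_run(start: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first run time strictly after `now` for a fixed-interval schedule."""
    if now < start:
        return start
    elapsed_periods = (now - start) // interval  # whole intervals already elapsed
    return start + (elapsed_periods + 1) * interval


def _field_matches(spec: str, value: int, minimum: int) -> bool:
    """Match one cron field. Supports '*', '*/n', and comma-separated integers only."""
    if spec == "*":
        return True
    if spec.startswith("*/"):
        # Steps count from the field's minimum (0 for minutes, 1 for days, ...).
        return (value - minimum) % int(spec[2:]) == 0
    return value in {int(part) for part in spec.split(",")}


def next_cron_run(expression: str, now: datetime) -> datetime:
    """Return the first minute strictly after `now` matching a 5-field cron expression.

    Simplification: all five fields must match (logical AND). Real cron
    implementations treat day-of-month and day-of-week differently when
    both are restricted.
    """
    minute, hour, day, month, weekday = expression.split()
    candidate = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Brute-force scan, one minute at a time, bounded to roughly one year.
    for _ in range(366 * 24 * 60):
        # cron uses 0 = Sunday; Python's weekday() uses 0 = Monday.
        cron_weekday = (candidate.weekday() + 1) % 7
        if (_field_matches(minute, candidate.minute, 0)
                and _field_matches(hour, candidate.hour, 0)
                and _field_matches(day, candidate.day, 1)
                and _field_matches(month, candidate.month, 1)
                and _field_matches(weekday, cron_weekday, 0)):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no matching time within a year for {expression!r}")
```

For example, `next_cron_run("30 2 * * *", now)` looks for the next 02:30, while `next_interval_run(start, timedelta(minutes=15), now)` rounds up to the next 15-minute boundary counted from `start`. The minute-by-minute scan is chosen for readability, not efficiency.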

Describe the solution you'd like
I propose moving job-scheduler from an independent plugin into core under modules. That makes it part of core, so other plugins can extend it directly without taking a dependency on a separate plugin. It also ensures that some of the common use cases are available from core.

Part of the reasoning is that a large number of our plugins depend on job scheduler. During version upgrades, core must be upgraded first, then job scheduler, and then the plugins. Each step is done by a different team, which adds to the overall time needed to get ready. With job scheduler in core, it is upgraded automatically as soon as core is, and plugins can get started immediately without relying on another plugin team.

Describe alternatives you've considered
One alternative is to place job-scheduler under the plugins directory instead of modules. However, plugins in the plugins directory are optional and can be uninstalled by users.

@praveensameneni praveensameneni added the enhancement New feature or request label Mar 17, 2022
@bbarani
Member

bbarani commented Mar 17, 2022

@praveensameneni What do you mean by "moving it as independent plugin to the core"? Will it become part of core as a plugin, similar to native plugins (like repository-s3)?

@praveensameneni
Member Author

> What do you mean by "moving it as independent plugin to the core"? Will it become a part of core plugin similar to native plugins (like repository-s3)?

Job Scheduler is an optional plugin that users can install from https://github.com/opensearch-project/job-scheduler

The proposal is to make it a module under the modules directory. Modules are installed by default and provide value out of the box; plugins under the plugins directory are not installed by default. (https://github.com/opensearch-project/OpenSearch/tree/main/modules)

The repository-s3 plugin is an optional plugin stored under the plugins directory:
https://github.com/opensearch-project/OpenSearch/tree/main/plugins/repository-s3

@kartg
Member

kartg commented Mar 23, 2022

I understand the pain point you're describing from an upgrade perspective, but centralizing ownership of plugins doesn't strike me as the right way to alleviate this. Other than the upgrade scenario, can you elaborate on why you believe Job Scheduler should be a core functionality of OpenSearch?

cc @mch2 and @nknize

@praveensameneni
Member Author

> I understand the pain point you're describing from an upgrade perspective, but centralizing ownership of plugins doesn't strike me as the right way to alleviate this. Other than the upgrade scenario, can you elaborate on why you believe Job Scheduler should be a core functionality of OpenSearch?
>
> cc @mch2 and @nknize

One way to look at it is as centralizing ownership. Another way is as moving a core feature (cron-like scheduling) into core so that most plugins can leverage it without depending on a separate plugin. Moving it to modules provides out-of-the-box value that other plugins can use. Job scheduler as a plugin adds no value by itself if other plugins are not using it, unlike optional plugins that are truly independent, such as the analyzer plugins stored under the plugins directory.

Just to clarify: is your question about moving to modules vs. plugins, or about moving to core in general?

@anasalkouz
Member

I don't see enough justification to make this plugin required. Why load the module for users who don't need it? Do you have a breakdown of the usage? How many plugins currently use job-scheduler, and for which use cases do they need it?

@praveensameneni
Member Author

> I don't see enough justification to make this plugin required. Why load the module for users who don't need it? Do you have a breakdown of the usage? How many plugins currently use job-scheduler, and for which use cases do they need it?

There are currently four plugins that use a scheduling mechanism (cron-like functionality): Index Management, Anomaly Detection, Reporting, and Alerting (which has a built-in job scheduler that will be migrated to a common job scheduler). I foresee observability and security analytics (OpenSearch 2.2) using job scheduler as well.

The primary use cases are scheduling any process that needs to run in the background: search (async), analyzers (dynamic reload at a scheduled interval vs. reloading at search time), managing indices, running monitors/detectors in alerting and anomaly detection, scheduling jobs to send reports, and scheduling threat detection rules, among others.

Are you suggesting moving it to the plugins directory instead and, depending on usage, moving it to modules at a later point? What metric or criteria do we use to determine whether a plugin becomes a module or an optional plugin?

@dblock
Member

dblock commented Mar 28, 2022

I cannot find many reasons not to move job-scheduler into -min, but I also cannot find any compelling reasons to do it. So my opinion is that we shouldn't do something just because we can.

The first reason cited above to move job-scheduler to OpenSearch is that plugins need to depend on job-scheduler, and thus have to wait extra time until a build of OpenSearch that includes job-scheduler is available. How much time? As of 1.3.0 this extra time consists of 1) incrementing the version on OpenSearch (opensearch-project/OpenSearch#2509), 2) adding job-scheduler to the next version's distribution manifest (opensearch-project/opensearch-build#1833), and 3) incrementing the default version of OpenSearch in job-scheduler itself (#157). As part of opensearch-project/opensearch-build#1375, 1) and 3) are already automated, and 2) can be automated. These activities take minutes. The entire process takes ~24 hours waiting on CI/CD. It would still take almost as much time with or without job-scheduler because the bulk of the work is in 1).

The second reason cited is moving a core feature (scheduling like cron) so that plugins can leverage it without depending on a separate plugin. I don't see any demand for this from plugins that ship separately and thus install on top of opensearch-min. I cannot find any mention of anyone asking for it.

@dblock
Member

dblock commented Apr 19, 2022

I've commented on the proposed PR, opensearch-project/OpenSearch#2608 (comment). Please post your strong opinions there, and let's first figure out whether we want to finish and merge that one or close it unmerged.

@reta
Contributor

reta commented Apr 19, 2022

From my perspective, I would not label the job-scheduler plugin as a core one:

  • by and large, it has nothing to do with bare search or indexing
  • in many deployments it may not be needed at all (so it is not a necessary one)

On the other hand, it is quite a useful plugin (as are many others, actually) and a building block for others. I think we should not make a simpler release cycle the deciding factor for a plugin to become part of core (since that applies to every single plugin out there).

@dbbaughe
Contributor

dbbaughe commented Apr 19, 2022

> From my perspective, I would not label the job-scheduler plugin as a core one:
>
>   • by and large, it has nothing to do with bare search or indexing
>   • in many deployments it may not be needed at all (so it is not a necessary one)
>
> On the other hand, it is quite a useful plugin (as are many others, actually) and a building block for others. I think we should not make a simpler release cycle the deciding factor for a plugin to become part of core (since that applies to every single plugin out there).

True, it doesn't really relate specifically to search/indexing. It's a building block for other enhancements to core, as you said. The ideal path I see is for Job Scheduler to be refactored directly into the plugin framework that exists in core, such that a plugin can extend a new plugin interface that provides the functionality outlined in Job Scheduler. That would appear to be the best "place" for it. This is a step in that direction: first moving it into core itself under modules (or plugins, it doesn't really matter) and then eventually embedding it directly in the plugin framework itself.

It's a generalized solution to a problem (distributed job scheduling) and would fit in with the plugin framework. It's not a solution to the end user of OpenSearch. It's a solution to a common problem plugin developers face who are extending OpenSearch with new features for the former end users.
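
To make the "new plugin interface" idea concrete, here is a minimal Python sketch of how a plugin framework could expose scheduling as an extension point. Everything in it is hypothetical: `Plugin`, `SchedulingPlugin`, `ScheduledJob`, and `PluginRegistry` are illustrative names, not OpenSearch or Job Scheduler types, and the sketch ignores distribution across nodes entirely.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Dict, List


@dataclass
class ScheduledJob:
    """One recurring job a plugin asks the host to run (hypothetical type)."""
    name: str
    interval: timedelta
    run: Callable[[], None]


class Plugin:
    """Stand-in for the host's base plugin type (hypothetical)."""
    name: str = "unnamed"


class SchedulingPlugin(Plugin, ABC):
    """Hypothetical extension point: plugins that need scheduling implement this."""

    @abstractmethod
    def scheduled_jobs(self) -> List[ScheduledJob]:
        """Declare the jobs the host should schedule on the plugin's behalf."""


class PluginRegistry:
    """Host-side registry that collects jobs from plugins opting in to scheduling."""

    def __init__(self) -> None:
        self._jobs: Dict[str, ScheduledJob] = {}

    def register(self, plugin: Plugin) -> None:
        # Plugins that do not implement the extension point contribute no jobs.
        if not isinstance(plugin, SchedulingPlugin):
            return
        for job in plugin.scheduled_jobs():
            key = f"{plugin.name}/{job.name}"
            if key in self._jobs:
                raise ValueError(f"duplicate job: {key}")
            self._jobs[key] = job

    def job_names(self) -> List[str]:
        return sorted(self._jobs)

    def run_all_once(self) -> None:
        # A real scheduler would honor each job's interval; this runs each job once.
        for key in sorted(self._jobs):
            self._jobs[key].run()
```

A plugin would subclass `SchedulingPlugin`, return its jobs from `scheduled_jobs()`, and never depend on a separate scheduler plugin; plugins that do not need scheduling simply subclass `Plugin` and are ignored by the registry.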

@heemin32
Contributor

The GeoIP database auto-update feature could be one use case for job scheduler: opensearch-project/OpenSearch#5856

Since the GeoIP processor is inside OpenSearch core, this feature will be implemented in OpenSearch core as well. It needs to run a scheduled job in the background and would benefit from using job scheduler.

@peternied
Member

I don't believe job scheduler should be moved into core. We already have this plugin and we've shipped many releases with it in its current state. Don't fix what isn't broken.

> It's a generalized solution to a problem (distributed job scheduling) and would fit in with the plugin framework.

I agree with @dbbaughe

[Counter Proposal] - Build up features in the Task framework

Much of the functionality job scheduler offers is duplicated in core in the tasks system. I would rather see the scenarios that job scheduler is used for supported by core, such as scheduling a task for future execution. Similarly, I think the existing task ecosystem could be greatly improved by including additional scheduling information for operators.

Proposed additional task properties:

  • Owner: The identity of the caller that triggered the task. This would allow for root causing unexpected resource utilization from complex queries or long running operations.
  • Locked Resources: Some tasks should only be executed once in the cluster. Adding this support for tasks, and providing a way to discover tasks that are pending because they are waiting on a resource, would improve troubleshooting.

Existing _cat/tasks

action                      task_id       parent_task_id type      start_time timestamp running_time ip         node
cluster:monitor/tasks/lists 1v...w:168062 -              transport 16...71    04:56:49  489.5ms      172.18.0.4 odfe-node1
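
The two proposed properties can be sketched as a small data model. The Python below is only an illustration under stated assumptions: `TaskRecord`, `tasks_by_owner`, and `pending_on_resources` are hypothetical names, not part of the OpenSearch task API, and a pending task is assumed to be blocked whenever a running task holds one of the resources it needs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical task entry extended with the two proposed properties."""
    task_id: str
    action: str
    owner: str                                       # identity of the caller that triggered the task
    locked_resources: FrozenSet[str] = frozenset()   # resources the task needs exclusively
    running: bool = False                            # False means the task is still pending


def tasks_by_owner(tasks: Iterable[TaskRecord]) -> Dict[str, List[str]]:
    """Group task ids by owner, e.g. to trace unexpected resource use to a caller."""
    grouped: Dict[str, List[str]] = {}
    for task in tasks:
        grouped.setdefault(task.owner, []).append(task.task_id)
    return grouped


def pending_on_resources(tasks: Iterable[TaskRecord]) -> Dict[str, List[str]]:
    """Map each pending task id to the resources it waits on (held by running tasks)."""
    tasks = list(tasks)
    held = {res for task in tasks if task.running for res in task.locked_resources}
    blocked: Dict[str, List[str]] = {}
    for task in tasks:
        if not task.running:
            waiting = sorted(res for res in task.locked_resources if res in held)
            if waiting:
                blocked[task.task_id] = waiting
    return blocked
```

With records like these, an operator-facing listing could answer "who started this?" and "what is this task waiting for?" directly, which is the troubleshooting gain the proposal describes.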

@bbarani
Member

bbarani commented Feb 6, 2024

@peternied @prudhvigodithi @ryanbogan can you please confirm whether this change can be included in 2.x without breaking the existing API? Basically, can this change be added in a backward-compatible manner in the 2.x line?

We are evaluating whether this change requires the 3.0 release or can be included in the 2.x line, so we need your input.

@peternied
Member

@bbarani I believe my proposal can be built in a way that is additive and does not require a major version bump.

Projects
Status: 📦 Backlog
Development

No branches or pull requests

9 participants