[Task manager] Prevents edge case where already running tasks are rescheduled every polling interval #74606

gmmorris
Contributor

@gmmorris commented Aug 6, 2020

Summary

The fix in #73244 was correct, but it missed an edge case that causes an already running task to be rescheduled over and over, on every polling interval.

This PR prevents that edge case, which was affecting both Task Manager (TM) in general and Alerting specifically.

Closes #71390
Closes #72803

@gmmorris requested a review from a team as a code owner August 6, 2020 22:18
@gmmorris added Feature:Alerting release_note:fix Team:ResponseOps v7.10.0 v7.9.0 v8.0.0 labels Aug 6, 2020
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

…-task

* master: (42 commits)
  Allow any hostname for chromium proxy bypass (elastic#74693)
  [ML] ML on Kibana Management: Add ability to pass a group ID filter to job management page (elastic#74533)
  [Metrics UI] Fix No Data preview pluralization (elastic#74399)
  [Bug][Security_Solution][Telemetry] Capitalize S in macOS (elastic#74688)
  Remove karma tests  from legacy maps (elastic#74668)
  [Ingest Manager] stop creating events-* index pattern and placeholder index (elastic#74683)
  [Enterprise Search] Update the browser/document title on plugin navigation (elastic#74392)
  [visualizations] Add i18n translation for 'No results found' (elastic#74619)
  [maps] convert vector style properties to TS (elastic#74553)
  bump geckodriver binary to 0.27 (elastic#74638)
  fix: update apm agents to catch abort requests (elastic#74658)
  [Security Solution] Resolver children pagination (elastic#74603)
  add memoryStatus to df analytics page and analytics table in management (elastic#74570)
  [Ingest Manager] Allow prerelease in package version (elastic#74452)
  [App Arch]: remove legacy karma tests (elastic#74599)
  [i18n] revert reverted changes (elastic#74633)
  [Lens] Clear out all attribute properties before updating (elastic#74483)
  [Uptime] Fix full reloads while navigating to alert/ml (elastic#73796)
  Index pattern field class refactor (elastic#73180)
  [ML] Functional tests - stabilize DFA job type check (elastic#74631)
  ...
-   claimedTasks: numberOfTasksClaimed,
-   docs,
+   claimedTasks: documentsClaimedById.length + documentsClaimedBySchedule.length,
+   docs: docs.filter((doc) => doc.status === TaskStatus.Claiming),
Contributor Author

This is the key change here; the other changes are meant to make the code a little more readable and easier to maintain.
Here we're filtering out tasks that were returned by the pinned query but have already been claimed, i.e. are no longer in claiming status.
Omitting this filter can cause such a task to be rescheduled if it doesn't complete before TM calls markAsRunning on it.
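A minimal sketch of that filtering, assuming simplified stand-ins for Task Manager's types. The TaskDoc shape, the TaskStatus values other than Claiming, and the partition and buildClaimResult helpers are hypothetical illustrations, not Kibana's actual code. It shows a task that the pinned query returned while it was already running being dropped before anything could call markAsRunning on it.

```typescript
// Hypothetical, simplified stand-ins for Task Manager's types.
enum TaskStatus {
  Idle = 'idle',
  Claiming = 'claiming',
  Running = 'running',
}

interface TaskDoc {
  id: string;
  status: TaskStatus;
}

interface ClaimResult {
  claimedTasks: number;
  docs: TaskDoc[];
}

// Split items into [matching, nonMatching] by a predicate.
function partition<T>(items: T[], predicate: (item: T) => boolean): [T[], T[]] {
  const matching: T[] = [];
  const nonMatching: T[] = [];
  for (const item of items) {
    if (predicate(item)) {
      matching.push(item);
    } else {
      nonMatching.push(item);
    }
  }
  return [matching, nonMatching];
}

// `docs` is everything the post-claim sweep returned, including tasks pinned
// by id. Assumption: the pinned query does not exclude tasks that are already
// running, so only docs still in `claiming` status are counted and returned.
function buildClaimResult(docs: TaskDoc[], claimTasksById: string[]): ClaimResult {
  const stillClaiming = docs.filter((doc) => doc.status === TaskStatus.Claiming);
  // Mirrors the diff: count claimed-by-id and claimed-by-schedule separately.
  const [documentsClaimedById, documentsClaimedBySchedule] = partition(
    stillClaiming,
    (doc) => claimTasksById.indexOf(doc.id) !== -1
  );
  return {
    claimedTasks: documentsClaimedById.length + documentsClaimedBySchedule.length,
    docs: stillClaiming,
  };
}

// Usage: 'pinned-running' was returned by the pinned query but is already running.
const result = buildClaimResult(
  [
    { id: 'pinned-running', status: TaskStatus.Running },
    { id: 'pinned-new', status: TaskStatus.Claiming },
    { id: 'scheduled', status: TaskStatus.Claiming },
  ],
  ['pinned-running', 'pinned-new']
);
// result.claimedTasks === 2; result.docs contains 'pinned-new' and 'scheduled' only
```

Filtering on status rather than on id keeps the check tied to what this poll actually claimed, which is the property the edge case violated.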

I'd like to make a broader change here, preventing markAsRunning from updating a task that isn't in claiming status, but that feels like something that can be done in a follow-up PR.
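The follow-up idea could look roughly like the guard below. This is a hypothetical sketch, not Task Manager's real markAsRunning: the TaskDoc shape, the retryAt field and the timeoutMs parameter are assumptions for illustration. The point is that the status check moves into the update itself, so a stale claim is rejected instead of re-updating a task that is already running.

```typescript
// Hypothetical, simplified types; not Task Manager's actual definitions.
enum TaskStatus {
  Idle = 'idle',
  Claiming = 'claiming',
  Running = 'running',
}

interface TaskDoc {
  id: string;
  status: TaskStatus;
  retryAt: Date | null; // assumed field: when the task may be picked up again
}

// Hypothetical guard: only a task that is still in `claiming` status may be
// marked as running. An already running task is rejected, not rescheduled.
function markAsRunning(task: TaskDoc, now: Date, timeoutMs: number): TaskDoc {
  if (task.status !== TaskStatus.Claiming) {
    throw new Error(
      `Task ${task.id} is "${task.status}", not "claiming"; refusing to mark it as running`
    );
  }
  return {
    ...task,
    status: TaskStatus.Running,
    retryAt: new Date(now.getTime() + timeoutMs),
  };
}

// Usage: a freshly claimed task is updated; an already running one is not.
const now = new Date(0);
const started = markAsRunning({ id: 'a', status: TaskStatus.Claiming, retryAt: null }, now, 5000);

let rejected = false;
try {
  markAsRunning({ id: 'b', status: TaskStatus.Running, retryAt: new Date(1000) }, now, 5000);
} catch {
  rejected = true;
}
```

In a real store the same condition would need to be enforced atomically in the write (for example as part of the update's query), otherwise two pollers could still race between the check and the update.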

@gmmorris added v7.9.1 and removed v7.9.0 labels Aug 11, 2020
Member

@pmuellr left a comment

LGTM

Contributor

@YulNaumenko left a comment

LGTM

@gmmorris
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gmmorris merged commit eb03295 into elastic:master Aug 13, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Aug 13, 2020
…chedule every polling interval (elastic#74606)

Fixes flaky tests in Task Manager and Alerting.

The fix in elastic#73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over.

This prevents that edge case which was affecting both TM in general and Alerting specifically.
gmmorris added a commit to gmmorris/kibana that referenced this pull request Aug 13, 2020
* master: (28 commits)
  [Task manager] Prevents edge case where already running tasks are reschedule every polling interval (elastic#74606)
  [Security Solution] Fix the status of timelines' bulk actions (elastic#74560)
  Data plugin: Suggested enhance pattern (elastic#74505)
  Use jest.useFakeTimers instead of hard coded timeout for tooltip tests. (elastic#74642)
  [Security Solution][lists] Adds tests for exception lists and items part 2 (elastic#74815)
  [Security Solution][Resolver] fix presentation role on edgeline (elastic#74869)
  [Security Solution][Detections] Refactor ML calls for newest ML permissions (elastic#74582)
  [bin/kibana-plugin] support KP plugins instead (elastic#74604)
  Reduce number of indexed fields in index pattern saved object (elastic#74817)
  [reporting] Pass along generic parameters in high-order route handler (elastic#74892)
  Migrated last pieces of legacy fixture code (elastic#74470)
  Empty index patterns page re-design  (elastic#68819)
  [babel] coalese some versions to prevent breaking yarn install (elastic#74864)
  [Dashboard First] Decouple Attribute Service and By Value Embeddables (elastic#74302)
  Revert "[reporting] Pass along generic parameters in high-order route handler" (elastic#74891)
  [reporting] Pass along generic parameters in high-order route handler (elastic#74879)
  [src/dev/build] implement a getBuildNumber() mock (elastic#74881)
  [Enterprise Search] Add solution-level side navigation (elastic#74705)
  [DOCS] Canvas docs 7.9 refresh (elastic#74000)
  [Security Solution][Resolver]Enzyme test related events closing (elastic#74811)
  ...
gmmorris added a commit that referenced this pull request Aug 13, 2020
…chedule every polling interval (#74606) (#74940)

Fixes flaky tests in Task Manager and Alerting.

The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over.

This prevents that edge case which was affecting both TM in general and Alerting specifically.
gmmorris added a commit that referenced this pull request Aug 13, 2020
…chedule every polling interval (#74606) (#74941)

Fixes flaky tests in Task Manager and Alerting.

The fix in #73244 was correct, but it missed an edge case which causes the already running task to be rescheduled over and over.

This prevents that edge case which was affecting both TM in general and Alerting specifically.
@gmmorris added v7.9.0 and removed v7.9.1 labels Aug 13, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Aug 13, 2020
* upstream/master: (45 commits)
  [Metrics UI] Fix inventory footer misalignment (elastic#74707)
  Remove legacy optimizer (elastic#73154)
  Update design-specific GH code-owners (elastic#74877)
  skip test Reporting paginates content elastic#74922
  [Metrics UI] Add Jest tests for alert previews (elastic#74890)
  Fixed tooltip (elastic#74074)
  [Ingest Pipelines] Processor forms for processors A-D (elastic#72849)
  [Observability] change ingest manager link (elastic#74928)
  [Task manager] Prevents edge case where already running tasks are reschedule every polling interval (elastic#74606)
  [Security Solution] Fix the status of timelines' bulk actions (elastic#74560)
  Data plugin: Suggested enhance pattern (elastic#74505)
  Use jest.useFakeTimers instead of hard coded timeout for tooltip tests. (elastic#74642)
  [Security Solution][lists] Adds tests for exception lists and items part 2 (elastic#74815)
  [Security Solution][Resolver] fix presentation role on edgeline (elastic#74869)
  [Security Solution][Detections] Refactor ML calls for newest ML permissions (elastic#74582)
  [bin/kibana-plugin] support KP plugins instead (elastic#74604)
  Reduce number of indexed fields in index pattern saved object (elastic#74817)
  [reporting] Pass along generic parameters in high-order route handler (elastic#74892)
  Migrated last pieces of legacy fixture code (elastic#74470)
  Empty index patterns page re-design  (elastic#68819)
  ...