Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task channel aware rescheduling #3266

Merged
merged 4 commits into from
Aug 25, 2022

Conversation

yycptt
Copy link
Member

@yycptt yycptt commented Aug 24, 2022

What changed?

  • Make task rescheduler aware of task channel assignment and stop rescheduling if task channel is full.
  • Move task priority assigning into task executable and do it before adding to rescheduler since doing it upon rescheduling may change the task channel assignment.
  • Code refactoring
    • Remove error from the return value of history task Scheduler interface
    • Combine the impl of fifo scheduler and namespace priority scheduler
    • some misc. renaming

Why?

  • Existing impl of task rescheduler always pop all tasks that are due for rescheduling since all tasks are in one priority queue, and even if one task can't be submitted, we don't know about the next one. This will consume tons of cpu resource if most tasks in the rescheduler are from one namespace and the task channel for that namespace is always full.

How did you test it?

  • Tested on test cluster. Reduced cpu usage of rescheduler from ~80% to almost 0.

Potential risks

  • The number of priority queue may potential be large. But as long as namespace are not active on all shard at the same time (very unlikely) it should be ok.

Is hotfix candidate?

  • No.

@yycptt yycptt requested review from yux0 and yiminc August 24, 2022 23:27
@yycptt yycptt requested a review from a team as a code owner August 24, 2022 23:27
service/history/queues/executable.go Outdated Show resolved Hide resolved
service/history/queues/priority_assigner.go Show resolved Hide resolved
service/history/queues/rescheduler.go Outdated Show resolved Hide resolved
service/history/queues/rescheduler.go Outdated Show resolved Hide resolved
rescheduled.rescheduleTime.Add(rescheduleFailureBackoff)
failToSubmit = append(failToSubmit, rescheduled)
if pq.IsEmpty() {
delete(r.pqMap, key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth to consider object pooling if profiling shows frequent create/destroy pq become a bottleneck.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah make sense.

I am also thinking that maybe this check can be moved to the beginning of the loop, so we don't immediately clean up the pq when it's empty but wait for the next round to see if it's still empty. If there's no next round, storing some empty pq seems ok to me?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the logic to clean up empty PQ every 3 minutes to avoid keep creating & deleting pq, especially for scheduled queue.

I am going to land this PR. Let me know if you have any concern re. ^ change. If so, I am happy to change it in a later pr. Thanks.

@yycptt yycptt merged commit e084bab into temporalio:master Aug 25, 2022
@yycptt yycptt deleted the task-ch-aware-rescheduling branch August 25, 2022 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants