Add diagnostics for stuck FTE scheduler #19879
67fb915 to c4ab707
Updated. @findepi, @wweiss-starburst PTAL
c4ab707 to 885f11a
@@ -675,7 +676,7 @@ private Optional<Throwable> closeAndAddSuppressed(Optional<Throwable> existingFa
     private boolean processEvents()
     {
         try {
-            Event event = eventQueue.poll(1, MINUTES);
+            Event event = eventQueue.poll(EVENT_PROCESSING_ENFORCED_FREQUENCY.toMillis(), MILLISECONDS);
or call it EVENT_PROCESSING_ENFORCED_FREQUENCY_MILLIS
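For reference, the two naming options discussed here could look roughly like the sketch below. It is only an illustration: it uses `java.time.Duration` rather than whatever duration type the PR actually uses, the one-minute value is an assumption carried over from the old `poll(1, MINUTES)` call, and `PollTimeoutSketch` and `pollOnce` are hypothetical names that do not exist in Trino.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical holder class; not part of the PR.
final class PollTimeoutSketch
{
    // Option A: keep a Duration and convert at the call site, as in the diff.
    // The one-minute value is an assumption based on the previous poll(1, MINUTES).
    static final Duration EVENT_PROCESSING_ENFORCED_FREQUENCY = Duration.ofMinutes(1);

    // Option B: precompute the milliseconds and say so in the name,
    // as the review comment suggests.
    static final long EVENT_PROCESSING_ENFORCED_FREQUENCY_MILLIS =
            EVENT_PROCESSING_ENFORCED_FREQUENCY.toMillis();

    private PollTimeoutSketch() {}

    // Waits up to timeoutMillis for an event; an empty Optional means the poll timed out.
    static <T> Optional<T> pollOnce(BlockingQueue<T> queue, long timeoutMillis)
    {
        try {
            return Optional.ofNullable(queue.poll(timeoutMillis, TimeUnit.MILLISECONDS));
        }
        catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```

With option B the call site becomes `eventQueue.poll(EVENT_PROCESSING_ENFORCED_FREQUENCY_MILLIS, MILLISECONDS)`, which keeps the unit visible in the constant name.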
9100c98 to 903b8ea
Some updates - unfortunately we cannot just filter out empty SplitAssignmentEvents.
// We need to process empty events here so that stageExecution.taskDescriptorLoadingComplete()
// is called in the event handler. Otherwise IdempotentSplitSource may not be called again
// if there is no other SplitAssignmentEvent for this stage in the queue.
cc: @findepi
903b8ea to 94bbc08
Also add default implementations for EventListener methods, which delegate to the appropriate intermediate method according to the class hierarchy.
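A minimal sketch of what such hierarchy-based delegation could look like. All event types and method names here (`TaskEvent`, `TaskFinished`, `TaskFailed`, `Tick`, `onTaskEvent`, and so on) are hypothetical stand-ins, not the actual classes touched by this commit; the point is only that each specific handler defaults to the handler of its parent type, so a listener can override a single intermediate method.

```java
// All type and method names below are hypothetical, for illustration only.
interface Event
{
    void accept(EventListener listener);
}

// Intermediate type grouping task-related events.
abstract class TaskEvent implements Event {}

final class TaskFinished extends TaskEvent
{
    @Override
    public void accept(EventListener listener) { listener.onTaskFinished(this); }
}

final class TaskFailed extends TaskEvent
{
    @Override
    public void accept(EventListener listener) { listener.onTaskFailed(this); }
}

final class Tick implements Event
{
    @Override
    public void accept(EventListener listener) { listener.onTick(this); }
}

interface EventListener
{
    // Root handler: a no-op unless a listener overrides it.
    default void onEvent(Event event) {}

    // Intermediate handler for the TaskEvent family; falls back to the root handler.
    default void onTaskEvent(TaskEvent event) { onEvent(event); }

    // Leaf handlers delegate one level up the class hierarchy.
    default void onTaskFinished(TaskFinished event) { onTaskEvent(event); }

    default void onTaskFailed(TaskFailed event) { onTaskEvent(event); }

    default void onTick(Tick event) { onEvent(event); }
}

// Overrides only the intermediate method, yet sees every TaskEvent subtype.
final class CountingTaskListener implements EventListener
{
    int taskEvents;

    @Override
    public void onTaskEvent(TaskEvent event) { taskEvents++; }
}
```

Dispatching a `TaskFinished`, a `TaskFailed`, and a `Tick` through `accept` on a `CountingTaskListener` increments `taskEvents` twice; the `Tick` falls through to the no-op `onEvent`.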
04b1d30 to c6eb8c6
Add code which dumps debug information to the log when the FTE scheduler has not received any events for 10 minutes. This is to track down a rare bug where queries running with retry_policy set to TASK sometimes get stuck.
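The general idea can be sketched as a bounded poll combined with a "time since last event" check. This is not the code from the commit: `StuckSchedulerWatchdog`, `pollAndCheck`, the injected clock, and the `dumpDebugInfo` callback are all hypothetical names, and re-arming the timer after each dump is a design choice assumed here, not something the commit message states.

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper; names and structure are assumptions, not Trino code.
final class StuckSchedulerWatchdog<E>
{
    private final BlockingQueue<E> eventQueue;
    private final Duration pollTimeout;
    private final Duration stuckThreshold;
    private final Runnable dumpDebugInfo;
    private final LongSupplier nanoClock;
    private long lastEventNanos;

    StuckSchedulerWatchdog(
            BlockingQueue<E> eventQueue,
            Duration pollTimeout,
            Duration stuckThreshold,
            Runnable dumpDebugInfo,
            LongSupplier nanoClock)
    {
        this.eventQueue = eventQueue;
        this.pollTimeout = pollTimeout;
        this.stuckThreshold = stuckThreshold;
        this.dumpDebugInfo = dumpDebugInfo;
        this.nanoClock = nanoClock;
        this.lastEventNanos = nanoClock.getAsLong();
    }

    // Returns the next event, or null if the poll timed out.
    // Runs dumpDebugInfo when no event has arrived for at least stuckThreshold.
    E pollAndCheck()
    {
        E event;
        try {
            event = eventQueue.poll(pollTimeout.toMillis(), TimeUnit.MILLISECONDS);
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        long now = nanoClock.getAsLong();
        if (event != null) {
            lastEventNanos = now;
            return event;
        }
        if (now - lastEventNanos >= stuckThreshold.toNanos()) {
            dumpDebugInfo.run();
            // Re-arm so the dump repeats once per threshold, not on every timed-out poll.
            lastEventNanos = now;
        }
        return null;
    }
}
```

In production the clock would be `System::nanoTime` and the threshold `Duration.ofMinutes(10)`; injecting the clock just makes the behaviour testable without waiting.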
c6eb8c6 to f4d34fc
TODO: