Add diagnostics for stuck FTE scheduler #19879

Merged
losipiuk merged 7 commits into trinodb:master from lo/fte-diagnostics
Nov 28, 2023

Conversation

losipiuk
Member

@losipiuk losipiuk commented Nov 23, 2023

Add code that dumps debug information to the log when the FTE scheduler has not received any events for 10 minutes. This is to track down a rare bug where queries running with retry_policy set to TASK sometimes get stuck.

TODO:

  • Test outputSelector toString
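
The stall check described above can be sketched roughly as follows. This is a minimal illustration, not the code added in this PR: `StallDetector`, `recordEvent`, and `isStalled` are hypothetical names, and the clock is injected only so the logic can be exercised without waiting 10 minutes.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration; not the class introduced by this PR.
final class StallDetector
{
    private final long thresholdNanos;
    private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime
    private final AtomicLong lastEventNanos;

    StallDetector(Duration threshold, LongSupplier nanoClock)
    {
        this.thresholdNanos = threshold.toNanos();
        this.nanoClock = nanoClock;
        this.lastEventNanos = new AtomicLong(nanoClock.getAsLong());
    }

    // Called by the event loop each time an event is processed.
    void recordEvent()
    {
        lastEventNanos.set(nanoClock.getAsLong());
    }

    // Called periodically; returns true once no event has been seen for the threshold.
    boolean isStalled()
    {
        return nanoClock.getAsLong() - lastEventNanos.get() >= thresholdNanos;
    }
}
```

In a real scheduler the caller would presumably log the diagnostic dump when `isStalled()` first returns true, and construct the detector with `Duration.ofMinutes(10)` and `System::nanoTime`.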

@losipiuk losipiuk added the no-release-notes This pull request does not require release notes entry label Nov 23, 2023
@cla-bot cla-bot bot added the cla-signed label Nov 23, 2023
@losipiuk losipiuk force-pushed the lo/fte-diagnostics branch 4 times, most recently from 67fb915 to c4ab707 Compare November 26, 2023 13:02
@losipiuk
Member Author

Updated @findepi , @wweiss-starburst PTAL

@losipiuk losipiuk requested a review from findepi November 26, 2023 13:03
@@ -675,7 +676,7 @@ private Optional<Throwable> closeAndAddSuppressed(Optional<Throwable> existingFa
     private boolean processEvents()
     {
         try {
-            Event event = eventQueue.poll(1, MINUTES);
+            Event event = eventQueue.poll(EVENT_PROCESSING_ENFORCED_FREQUENCY.toMillis(), MILLISECONDS);
Member

or call it EVENT_PROCESSING_ENFORCED_FREQUENCY_MILLIS
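
For context, the pattern in the hunk above (polling with a bounded timeout so the loop wakes up even when no event arrives) can be sketched like this. It is an assumption-laden illustration: `EventLoopSketch`, `nextEvent`, and the `WAKE_UP` sentinel are hypothetical, the 50 ms value is made up for the example, and `java.time.Duration` stands in for whatever duration type the real constant uses.

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and values are illustrative only.
final class EventLoopSketch
{
    // Assumed value; the real constant's value is not shown in this thread.
    static final Duration EVENT_PROCESSING_ENFORCED_FREQUENCY = Duration.ofMillis(50);

    // Sentinel returned when the wait times out without an event.
    static final String WAKE_UP = "WAKE_UP";

    static String nextEvent(BlockingQueue<String> queue)
    {
        try {
            // poll returns null if nothing arrives within the timeout
            String event = queue.poll(EVENT_PROCESSING_ENFORCED_FREQUENCY.toMillis(), TimeUnit.MILLISECONDS);
            return event == null ? WAKE_UP : event;
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new RuntimeException(e);
        }
    }
}
```

Keeping the constant as a duration and converting at the call site, as the diff does, and naming it with a `_MILLIS` suffix and storing a plain number, as suggested here, are both workable; the trade-off is type safety versus brevity at the call site.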

@losipiuk losipiuk force-pushed the lo/fte-diagnostics branch 2 times, most recently from 9100c98 to 903b8ea Compare November 27, 2023 22:40
@losipiuk losipiuk requested a review from findepi November 27, 2023 22:59
@losipiuk
Member Author

Some updates - unfortunately, we cannot just filter out empty SplitAssignmentEvents.

Comment on lines 1417 to 1673
// we need to process empty events here so that stageExecution.taskDescriptorLoadingComplete()
// is called in the event handler. Otherwise IdempotentSplitSource may not be called again
// if there is no other SplitAssignmentEvent for this stage in the queue.
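
One way to picture why an empty event cannot simply be dropped, under the assumption that the event handler is what marks the current load as finished: if the handler never runs, the stage stays in the "loading" state and never asks for more splits. The sketch below is a simplified model with hypothetical names (`StageSketch`, `requestMoreSplits`, `onSplitAssignment`), not the scheduler's actual classes.

```java
import java.util.List;

// Hypothetical, simplified model of a stage that loads splits in batches.
final class StageSketch
{
    private boolean loadInProgress;
    private int splitsReceived;

    // Starts a new load unless one is already outstanding; returns true if started.
    boolean requestMoreSplits()
    {
        if (loadInProgress) {
            return false;
        }
        loadInProgress = true;
        return true;
    }

    // Event handler. It must run even for an empty batch, because it is the
    // only place where the outstanding load is marked as complete.
    void onSplitAssignment(List<String> splits)
    {
        splitsReceived += splits.size();
        loadInProgress = false;
    }

    int splitsReceived()
    {
        return splitsReceived;
    }
}
```

In this model, filtering out an empty batch before it reaches `onSplitAssignment` would leave `loadInProgress` set forever, so `requestMoreSplits` would keep returning false.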
Member Author

cc: @findepi

Also add default implementations for EventListener methods which
delegate to the appropriate intermediate method according to the class hierarchy.
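
The delegation idea in this commit message can be illustrated with a small sketch. The event types and method names below are simplified stand-ins chosen for the example (only `EventListener` and `SplitAssignmentEvent` appear in this thread); the actual interface in the PR may look different.

```java
// Hypothetical, simplified event hierarchy: Event <- StageEvent <- SplitAssignmentEvent.
interface Event {}

abstract class StageEvent implements Event {}

final class SplitAssignmentEvent extends StageEvent {}

// Each specific method defaults to the method for the parent event type,
// so an implementor can override at whatever level of the hierarchy it cares about.
interface EventListener<T>
{
    // Catch-all; the only method an implementor must provide.
    T onEvent(Event event);

    default T onStageEvent(StageEvent event)
    {
        return onEvent(event);
    }

    default T onSplitAssignment(SplitAssignmentEvent event)
    {
        return onStageEvent(event);
    }
}
```

With this shape, a listener that overrides only `onStageEvent` sees every stage-level event, including split assignments, without having to override each specific method.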
@losipiuk losipiuk force-pushed the lo/fte-diagnostics branch 2 times, most recently from 04b1d30 to c6eb8c6 Compare November 28, 2023 11:19
Add code that dumps debug information to the log when the
FTE scheduler has not received any events for 10 minutes.
This is to track down a rare bug where queries running with
retry_policy set to TASK sometimes get stuck.
@losipiuk losipiuk merged commit 01595bd into trinodb:master Nov 28, 2023
92 of 95 checks passed
Labels
cla-signed no-release-notes This pull request does not require release notes entry
2 participants