Handle interruption in RetryDriver and TaskRunner #18964

pettyjamesm · 2023-01-24T14:58:12Z

Cross port of trinodb/trino#15803 with one additional commit to fix handling of S3 client exceptions when interrupted. At a high level, this PR:

Adds AbortedException to the list of RetryDriver#stopOn exceptions for all S3 client operations in PrestoS3FileSystem and S3SelectLineRecordReader. When the S3 client receives an InterruptedException internally during an API call, it restores the current thread's interrupted flag but throws an AbortedException instead. When this occurs, the retry driver should stop attempting retries.
Adds logic to RetryDriver exception handling to stop retries when an InterruptedException is caught or when Thread.currentThread().isInterrupted() returns true. Failing to do so could result in drivers that were interrupted as part of task cancellation running significantly longer than necessary, because they proceed to retry and back off instead of exiting.
Changes TaskRunner to continue processing new splits instead of terminating when the current thread is interrupted because task cancellation interrupted the driver being processed, as long as the TaskExecutor has not been shut down. Before this change, those interruptions would cause the TaskRunner to stop processing new splits and submit a new TaskRunner to the cached thread pool executor, which could needlessly create new worker threads.

== NO RELEASE NOTE ==

ajaygeorge

Some comments

presto-hive/src/main/java/com/facebook/presto/hive/s3/PrestoS3FileSystem.java

presto-hive-metastore/src/main/java/com/facebook/presto/hive/RetryDriver.java

presto-main/src/main/java/com/facebook/presto/execution/executor/TaskExecutor.java

ajaygeorge

LGTM

When a thread is interrupted during an S3 client operation, the SDK client internally catches the InterruptedException, restores the thread's interrupted flag, and throws an AbortedException instead of an InterruptedException. When this occurs, the RetryDriver should stop attempting retries. This change adds AbortedException to the stopOn list for all retry drivers in PrestoS3FileSystem and S3SelectLineRecordReader.
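The behavior described in this commit message can be illustrated with a small stand-alone sketch. It does not use the AWS SDK: SimulatedAbortedException, StopOnSketch, simulatedClientCall, and shouldStop are hypothetical names introduced only for illustration, and the stop-list check is an assumed simplification rather than the actual RetryDriver#stopOn implementation.

```java
/** Hypothetical stand-in for the SDK's AbortedException so the sketch compiles without the AWS SDK. */
final class SimulatedAbortedException extends RuntimeException {
    SimulatedAbortedException(Throwable cause) {
        super(cause);
    }
}

final class StopOnSketch {
    /** Mimics the described client behavior: restore the interrupted flag, then throw a different exception type. */
    static void simulatedClientCall() {
        try {
            Thread.sleep(10_000);
        }
        catch (InterruptedException e) {
            // The caller never sees InterruptedException, only the flag and the aborted exception
            Thread.currentThread().interrupt();
            throw new SimulatedAbortedException(e);
        }
    }

    /** Returns true when the exception is an instance of any type on the stop list (assumed simplification). */
    static boolean shouldStop(Exception e, Class<?>... stopOn) {
        for (Class<?> type : stopOn) {
            if (type.isInstance(e)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the caller only sees the replacement exception type, a retry loop that matches solely on InterruptedException would treat it as an ordinary failure, which is why the exception type has to be on the stop list.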

Stop attempting retries in Hive's RetryDriver if an InterruptedException is caught or when the current thread is interrupted and immediately throw the current exception. Otherwise, RetryDriver might interfere with drivers terminating in a timely manner after tasks are terminated via a cancel, failure, or abort.
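A minimal sketch of a retry loop with this check is shown below. It reuses the condition from the PR's diff hunk, but everything around it is assumed: InterruptAwareRetry, maxAttempts, and sleepMillis are hypothetical, and the real RetryDriver has additional configuration (backoff, stop lists, logging) that is omitted here.

```java
import java.util.concurrent.Callable;

final class InterruptAwareRetry {
    /** Retries the callable up to maxAttempts times, but gives up immediately once an interrupt is observed. */
    static <V> V run(int maxAttempts, long sleepMillis, Callable<V> callable)
            throws Exception
    {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return callable.call();
            }
            catch (Exception e) {
                // Same condition as in the diff: stop retrying and let the exception propagate
                if (e instanceof InterruptedException || Thread.currentThread().isInterrupted()) {
                    throw e;
                }
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(sleepMillis);
                }
                catch (InterruptedException ie) {
                    // Restore the flag so callers further up can still observe the interrupt
                    Thread.currentThread().interrupt();
                    throw new RuntimeException(ie);
                }
            }
        }
    }
}
```

Note that the check uses isInterrupted(), which leaves the flag set, so code higher up the stack can still see that the thread was interrupted.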

Drivers may leave the TaskRunner thread's interrupt flag set during the course of processing, but doing so should not result in the TaskRunner terminating its own processing loop until the TaskExecutor is closed. Instead of allowing the interrupt to terminate the current task runner's loop and re-creating a new runner to replace the interrupted one, we can clear the interrupt flag after each iteration. Otherwise, TaskRunners that were interrupted would have to replace themselves and end up creating unnecessary threads in the cached threadpool executor in the process.
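The idea of clearing the interrupt flag after each iteration can be sketched as follows. This is a simplified single-threaded model, not the actual TaskRunner: RunnerLoopSketch, processSplits, and the use of Runnable for a split are assumptions made for illustration, and the closed flag stands in for the TaskExecutor having been shut down.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class RunnerLoopSketch {
    /**
     * Runs splits on the current thread and returns how many were processed.
     * The loop exits early only when the (hypothetical) executor is closed or the thread is still interrupted.
     */
    static int processSplits(Iterable<Runnable> splits, AtomicBoolean closed) {
        int processed = 0;
        for (Runnable split : splits) {
            if (closed.get() || Thread.currentThread().isInterrupted()) {
                break;
            }
            try {
                split.run();
                processed++;
            }
            finally {
                // Clear any interrupt left behind by the split; Thread.interrupted() reads and resets the flag
                if (Thread.interrupted() && closed.get()) {
                    // The executor is shutting down, so put the flag back and let the loop exit
                    Thread.currentThread().interrupt();
                }
            }
        }
        return processed;
    }
}
```

With the flag cleared in the finally block, a split that leaves the thread interrupted no longer ends the loop, so no replacement runner has to be submitted to the thread pool.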

arunthirupathi · 2023-02-15T06:34:29Z

presto-hive-metastore/src/main/java/com/facebook/presto/hive/RetryDriver.java

@@ -139,6 +139,11 @@ public <V> V run(String callableName, Callable<V> callable)
                return callable.call();
            }
            catch (Exception e) {
+                // Immediately stop retry attempts once an interrupt has been received
+                if (e instanceof InterruptedException || Thread.currentThread().isInterrupted()) {


Sorry for the late comment. Checking isInterrupted will clear the interrupt flag on the thread; is that expected? I am not sure who is responsible for clearing the interrupt flag in the Presto codebase.

The sleep on line 163 rightly tries to set the interrupted flag.

My bad, isInterrupted does not clear the flag, so this is fine. But there are still differences in how the InterruptedException is handled.

We’re rethrowing InterruptedException (or whatever other exception was thrown if the current thread is interrupted) but not “handling” it other than to ensure that we stop retrying and let the exception propagate.

Thanks, this makes sense.
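For reference, the distinction discussed in this thread can be checked with a few lines of standard Java. The instance method isInterrupted() only reads the flag, while the static method Thread.interrupted() reads it and clears it. InterruptFlagDemo and observe are hypothetical names used only for this sketch.

```java
final class InterruptFlagDemo {
    /** Sets the current thread's interrupted flag and records what each query method reports. */
    static boolean[] observe() {
        Thread.currentThread().interrupt();
        boolean first = Thread.currentThread().isInterrupted();  // reads the flag without clearing it
        boolean second = Thread.currentThread().isInterrupted(); // flag is still set
        boolean third = Thread.interrupted();                    // reads the flag and clears it
        boolean fourth = Thread.currentThread().isInterrupted(); // flag is now cleared
        return new boolean[] {first, second, third, fourth};
    }

    public static void main(String[] args) {
        // prints [true, true, true, false]
        System.out.println(java.util.Arrays.toString(observe()));
    }
}
```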

pettyjamesm requested a review from a team as a code owner January 24, 2023 14:58

pettyjamesm requested review from presto-oss and rschlussel January 24, 2023 14:58

pettyjamesm force-pushed the improve-interrupted-handling branch from 8df783b to e05a499 Compare January 24, 2023 17:30

ajaygeorge reviewed Jan 31, 2023

pettyjamesm force-pushed the improve-interrupted-handling branch 2 times, most recently from 2285985 to 394f561 Compare January 31, 2023 23:47

ajaygeorge approved these changes Feb 1, 2023

rschlussel approved these changes Feb 2, 2023

pettyjamesm added 3 commits February 2, 2023 15:00

pettyjamesm force-pushed the improve-interrupted-handling branch from 394f561 to db99445 Compare February 2, 2023 20:38

pettyjamesm merged commit 6c4459c into prestodb:master Feb 3, 2023

pettyjamesm deleted the improve-interrupted-handling branch February 3, 2023 16:08

arunthirupathi reviewed Feb 15, 2023
