Skip to content

Commit

Permalink
[SPARK-33669] Wrong error message from YARN application state monitor…
Browse files Browse the repository at this point in the history
… when sc.stop in yarn client mode

### What changes were proposed in this pull request?
This change make InterruptedIOException to be treated as InterruptedException when closing YarnClientSchedulerBackend, which doesn't log error with "YARN application has exited unexpectedly xxx"

### Why are the changes needed?
For YarnClient mode, when stopping YarnClientSchedulerBackend, it first tries to interrupt Yarn application monitor thread. In MonitorThread.run() it catches InterruptedException to gracefully response to stopping request.

But client.monitorApplication method also throws InterruptedIOException when the hadoop rpc call is calling. In this case, MonitorThread will not know it is interrupted, a Yarn App failed is returned with "Failed to contact YARN for application xxxxx;  YARN application has exited unexpectedly with state xxxxx" is logged with error level. which confuse user a lot.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
very simple patch, seems no need?

Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor.

Authored-by: suqilong <suqilong@qiyi.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
  • Loading branch information
suqilong authored and Mridul Muralidharan committed Dec 9, 2020
1 parent c88edda commit 48f93af
Show file tree
Hide file tree
Showing 2 changed files with 5 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -1069,7 +1069,7 @@ private[spark] class Client(
logError(s"Application $appId not found.")
cleanupStagingDir()
return YarnAppReport(YarnApplicationState.KILLED, FinalApplicationStatus.KILLED, None)
case NonFatal(e) =>
case NonFatal(e) if !e.isInstanceOf[InterruptedIOException] =>
val msg = s"Failed to contact YARN for application $appId."
logError(msg, e)
// Don't necessarily clean up staging dir because status is unknown
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@

package org.apache.spark.scheduler.cluster

import java.io.InterruptedIOException

import scala.collection.mutable.ArrayBuffer

import org.apache.hadoop.yarn.api.records.YarnApplicationState
Expand Down Expand Up @@ -121,7 +123,8 @@ private[spark] class YarnClientSchedulerBackend(
allowInterrupt = false
sc.stop()
} catch {
case e: InterruptedException => logInfo("Interrupting monitor thread")
case _: InterruptedException | _: InterruptedIOException =>
logInfo("Interrupting monitor thread")
}
}

Expand Down

0 comments on commit 48f93af

Please sign in to comment.