-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState #18648
Conversation
ping @gatorsmile could you help to review this? |
@@ -312,7 +323,7 @@ private[spark] object HiveUtils extends Logging { | |||
hadoopConf = hadoopConf, | |||
execJars = jars.toSeq, | |||
config = configurations, | |||
isolationOn = true, | |||
isolationOn = isCliSessionState(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain more about this? Why do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan According to HiveClientImpl.scala#L140, the cliSessionState shall be reused. But because of IsolateClientClassloader, originalState will be null
. Then it always goes to the else
branch to create and start a new session.SessionState
// properties and then loaded by SparkConf | ||
sysProps.put("spark.yarn.keytab", args.keytab) | ||
sysProps.put("spark.yarn.principal", args.principal) | ||
case Failure(exception) => throw exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just write
SparkHadoopUtil.get.loginUserFromKeytab(args.principal, args.keytab)
// the comments ...
sysProps.put("spark.yarn.keytab", args.keytab)
sysProps.put("spark.yarn.principal", args.principal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
val sparkConf = new SparkConf() | ||
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf) | ||
val hiveClient = HiveUtils.newClientForMetadata(sparkConf, hadoopConf) | ||
assert((hiveClient.toString == s1.toString) === expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it safe to just compare toString
result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
@@ -269,6 +232,8 @@ private[hive] class HiveClientImpl( | |||
} | |||
} | |||
|
|||
override def toString: String = state.toString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a reasonable toString
implementation for HiveClientImpl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May i add def getState(): SessionState
to HiveClientImpl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
OK to test |
val s1 = SessionState.get | ||
val sparkConf = new SparkConf() | ||
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf) | ||
val s2 = HiveUtils.newClientForMetadata(sparkConf, hadoopConf).getState |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about HiveUtils.newClientForMetadata(sparkConf, hadoopConf).asInstanceOf[HiveClientImpl].state
? then we don't need to add getState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with IsolateClientClassload, this seems to cause ClassCastException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, HiveClientImpl
is the only implementation of the HiveClient
interface.
ok to test |
Test build #80442 has finished for PR 18648 at commit
|
retest this please |
def loginUserFromKeytab(principalName: String, keytabFilename: String): Unit = { | ||
if (!new File(keytabFilename).exists()) { | ||
throw new SparkException(s"Keytab file: ${keytabFilename}" + | ||
" specified in spark.yarn.keytab does not exist") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: To be general, let's not mention the config name spark.yarn.keytab
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok,notice that
(hadoopConf.iterator().asScala.map(kv => kv.getKey -> kv.getValue) | ||
++ sparkConf.getAll.toMap ++ extraConfig).foreach { case (k, v) => | ||
if (k.toLowerCase(Locale.ROOT).contains("password")) { | ||
logDebug(s"Applying Spark config to Hive Conf: $k=xxx") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may also be Hadoop/Hive or extra config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks.
@@ -229,6 +230,17 @@ private[spark] object HiveUtils extends Logging { | |||
}.toMap | |||
} | |||
|
|||
def isCliSessionState(): Boolean = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should add comment for this method.
temp = temp.getSuperclass | ||
if (clientLoader.isolationOn) { | ||
// Switch to the initClassLoader. | ||
Thread.currentThread().setContextClassLoader(initClassLoader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the behavior change safe here? Previously, we switch the context ClassLoader for both conditions, while in this PR we only do that if isolationOn
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when isolation Off, we just switch a classloader to itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If SessionState.get()
is None, we should still call newState()
and init from initClassLoader
, should we also switch in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If SessionState.get
be null, then the IsolateOn will be turned on always. Only if we call SessionState.detachSession
, will this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A user app new an CliSessionState instance with built in hive jars to trigger isolate off, then it detach this state, and then new a hive client again, this time isolate off and SessionState.get()
will be None, newState()
will be called without changing the classloader, I think this is OK, because we never create a isolate class loader from beginning to end.
Test build #80448 has finished for PR 18648 at commit
|
} finally { | ||
Thread.currentThread().setContextClassLoader(original) | ||
} else { | ||
Option(SessionState.get()).getOrElse(newState()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since SessionState.get()
won't be None here, we can simplify the code to
SessionState.get()
, and add comment above to issue the reason of doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the condition I mentioned above, I think this should be kept
Test build #80458 has finished for PR 18648 at commit
|
Test build #80460 has finished for PR 18648 at commit
|
ping @jiangxb1987 @cloud-fan anymore suggestions? |
ping @jiangxb1987 @cloud-fan again |
LGTM |
ping @cloud-fan would you take another look? |
@jiangxb1987 Could this pr be merged? |
LGTM, merging to master! |
The description is not clear, at least I get understood after diving into the code changes. |
What changes were proposed in this pull request?
Set isolated to false while using builtin hive jars and
SessionState.get
returns aCliSessionState
instance.How was this patch tested?
1 Unit Tests
2 Manually verified:
hive.exec.strachdir
was only created once because of reusing cliSessionStatecc @cloud-fan @gatorsmile