[KSHC] Support Kerberized HMS in cluster mode w/o keytab #4560
Conversation
Codecov Report

```diff
@@             Coverage Diff              @@
##             master    #4560      +/-  ##
============================================
- Coverage     53.34%   53.31%    -0.03%
  Complexity       13       13
============================================
  Files           577      577
  Lines         31546    31557      +11
  Branches       4244     4244
============================================
- Hits          16827    16825       -2
- Misses        13135    13148      +13
  Partials       1584     1584
```

see 11 files with indirect coverage changes
```scala
private def hiveConf(
    sparkConf: SparkConf,
    hadoopConf: Configuration,
```
The Hadoop configuration here seems to be shared by all Hive clients, which means you share the same security key to open all the doors of different HMSes?
The catalog `hive2` could overwrite some dedicated Hadoop configurations via `spark.sql.catalog.hive2.<hadoop-conf-key>=<value>`.

What do you mean by "security key"?
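To make the per-catalog override behavior concrete, here is a minimal, dependency-free sketch. Plain `Map`s stand in for `SparkConf` and Hadoop `Configuration`, and the helper name `catalogHadoopConf` is illustrative, not the actual KSHC code:

```scala
// Illustrative sketch only: shows how catalog-scoped options such as
// spark.sql.catalog.hive2.<hadoop-conf-key>=<value> can overlay a shared
// base Hadoop configuration, so each catalog gets its own effective conf.
def catalogHadoopConf(
    baseHadoopConf: Map[String, String],
    sparkConf: Map[String, String],
    catalogName: String): Map[String, String] = {
  val prefix = s"spark.sql.catalog.$catalogName."
  // Collect the keys scoped to this catalog and strip the prefix.
  val overrides = sparkConf.collect {
    case (key, value) if key.startsWith(prefix) => key.stripPrefix(prefix) -> value
  }
  // Catalog-scoped values win over the shared base configuration.
  baseHadoopConf ++ overrides
}
```

Under this scheme, catalogs without an override still see the shared base value, so the "same key for every door" concern only applies to catalogs that don't configure their own.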
@yaooqinn main changes after your last review, please take a look again
Most users may not be familiar with `hive.metastore.token.signature`.
@zhouyifan279 good suggestion, I updated the code to make it fall back to `hive.metastore.uris`. The change passes local tests.
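The fallback discussed above can be sketched as follows. This is a dependency-free illustration over a plain `Map`, not the exact KSHC implementation (the real code reads a Hadoop `Configuration`):

```scala
// Sketch: if hive.metastore.token.signature is unset for a catalog, fall back
// to that catalog's hive.metastore.uris, so tokens obtained from different
// metastores are stored under distinct, unambiguous signatures.
def tokenSignature(catalogConf: Map[String, String]): Option[String] =
  catalogConf.get("hive.metastore.token.signature")
    .filter(_.nonEmpty)
    .orElse(catalogConf.get("hive.metastore.uris"))
```

Keying the signature on the metastore URIs means users federating several HMSes never need to set the unfamiliar `hive.metastore.token.signature` by hand.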
...-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/HiveTableCatalog.scala
Thanks all, merged to master
Closes apache#4560 from pan3793/shc-token.

Commits:
- fe8cd0c [Cheng Pan] Centralized metastore token signature fallback logic
- 8511595 [Cheng Pan] comments
- fc3b4d5 [Cheng Pan] hive.metastore.token.signature fallback to hive.metastore.uris
- fb7eb03 [Cheng Pan] unused import
- 858b390 [Cheng Pan] New catalog property delegation.token.renewal.enabled
- 28ec5a5 [Cheng Pan] disable hms client retry
- 52044d4 [Cheng Pan] update comments
- 33b2418 [Cheng Pan] [KSHC] Support Kerberos by implementing KyuubiHiveConnectorDelegationTokenProvider

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

This PR aims to make Kyuubi Spark Hive Connector (KSHC) support a kerberized HMS in cluster mode w/o keytab (which is the typical use case in Kyuubi) by implementing a `HadoopDelegationTokenProvider`.

To enable access to a kerberized HMS using KSHC, the minimal configurations are

```
spark.sql.catalog.warm=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.warm.hive.metastore.uris=<thrift-uris>
```

then it's possible to run federated queries across metastores

```
SELECT * FROM spark_catalog.db1.tbl1 JOIN warm.db2.tbl2 ON ...
```

In addition, it allows disabling token renewal for each catalog explicitly

```
spark.sql.catalog.warm.delegation.token.renewal.enabled=false
```

The current implementation has some limitations: the catalog configuration must be present at Spark application bootstrap, which means the catalog configurations should be set in `spark-defaults.conf` or appended as `--conf`, like:

```
spark-[sql|shell|submit] \
  --conf spark.sql.catalog.xxx=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog \
  --conf spark.sql.catalog.xxx.hive.abc=xyz
```

note: `spark-sql` may not work properly, because of some tricky code inside Spark ...

It does not work for dynamic registration through a SET statement, e.g. `SET spark.sql.catalog.xxx=`

### _How was this patch tested?_

- [ ] Add some test cases that check the changes thoroughly, including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate

```
> (select count(*) from hive_2.mammut.test_7) union (select count(*) from spark_catalog.test.test01 limit 1);
+-----------+
| count(1)  |
+-----------+
| 4         |
| 1         |
+-----------+
2 rows selected (8.378 seconds)
```

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request
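For readers unfamiliar with Spark's delegation-token machinery, the sketch below mocks the provider contract in dependency-free Scala. The real trait is Spark's `HadoopDelegationTokenProvider`, which takes `SparkConf`, Hadoop `Configuration`, and `Credentials` parameters; `ConfMap`, `TokenStore`, and `KyuubiHiveConnectorTokenProviderSketch` here are simplified stand-ins, not the PR's actual classes:

```scala
// Dependency-free stand-ins for Hadoop's Configuration and Credentials.
type ConfMap = Map[String, String]

final class TokenStore {
  private var tokens = Map.empty[String, String]
  def add(alias: String, token: String): Unit = tokens += alias -> token
  def get(alias: String): Option[String] = tokens.get(alias)
}

// Simplified shape of Spark's HadoopDelegationTokenProvider trait.
trait SimpleDelegationTokenProvider {
  def serviceName: String
  def delegationTokensRequired(conf: ConfMap): Boolean
  def obtainDelegationTokens(conf: ConfMap, creds: TokenStore): Option[Long]
}

// One provider per KSHC catalog: it obtains an HMS delegation token at
// application bootstrap, keyed by the catalog's metastore URIs.
final class KyuubiHiveConnectorTokenProviderSketch(catalog: String)
    extends SimpleDelegationTokenProvider {

  override def serviceName: String = s"kyuubi-hive-connector-$catalog"

  // Tokens are needed only when the catalog configures a metastore to reach.
  override def delegationTokensRequired(conf: ConfMap): Boolean =
    conf.contains(s"spark.sql.catalog.$catalog.hive.metastore.uris")

  override def obtainDelegationTokens(conf: ConfMap, creds: TokenStore): Option[Long] = {
    val uris = conf.getOrElse(s"spark.sql.catalog.$catalog.hive.metastore.uris", "")
    // A real provider asks the kerberized HMS for a delegation token over
    // thrift; here a fake token is stored, keyed by the metastore URIs.
    creds.add(uris, s"fake-token-for-$uris")
    val renewalEnabled = conf
      .get(s"spark.sql.catalog.$catalog.delegation.token.renewal.enabled")
      .forall(_.toBoolean)
    // Returning None tells the scheduler not to renew; Some(t) schedules
    // the next renewal attempt at time t.
    if (renewalEnabled) Some(System.currentTimeMillis() + 3600L * 1000) else None
  }
}
```

This also illustrates why the catalog configuration must exist at bootstrap: providers are discovered and invoked when the application starts, so a catalog registered later via `SET` has no chance to contribute tokens.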