[Bug] kyuubi-1.7 can't use multiple metastores #5181

Open
3 of 4 tasks
tomfans opened this issue Aug 20, 2023 · 3 comments
Labels: kind:bug, priority:major

Comments

@tomfans

tomfans commented Aug 20, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

I configured an Iceberg catalog whose metastore is different from the default Hive metastore (that is, I have two different Hive metastores). When I connect to the Iceberg catalog, the connection to its metastore fails with a delegation token expired error:

Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=hive, renewer=hive, realUser=hive/hostxxxxxxxxxxxx, issueDate=1692543549564, maxDate=1693148349564, sequenceNumber=67, masterKeyId=1
        at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:114)
        at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)

I consider this a bug because the same operation works fine with the plain spark-sql command.

Here is an example:

spark-sql (default)> 
                   > use hive_prod;
spark-sql (default)> 
                   > 
                   > show databases;
default
Time taken: 0.689 seconds, Fetched 1 row(s)
spark-sql (default)> 

But the same statements fail through Kyuubi 1.7.

Affects Version(s)

1.7

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
tomfans added the kind:bug and priority:major labels Aug 20, 2023
@github-actions

Hello @tomfans,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.

@pan3793
Member

pan3793 commented Aug 21, 2023

The differences may come from Spark client/cluster mode.

spark-sql and spark-shell only support client mode, while Kyuubi supports both client and cluster mode, and the two modes behave differently in a kerberized cluster.

By default, Kyuubi uses --proxy-user instead of --principal and --keytab, so in

  • client mode, the Spark driver can use the ticket cache to access kerberized components
  • cluster mode, the Spark driver only has the delegation tokens that were pre-requested (by a HadoopDelegationTokenProvider) and distributed during the spark-submit phase.

@pan3793
Member

pan3793 commented Aug 21, 2023

Something related to this issue: Kyuubi implements a DSv2-based Hive connector (a.k.a. KSHC).

And in #4560

... make Kyuubi Spark Hive Connector(KSHC) support kerberized-HMS in cluster mode w/o keytab(which is the typical use case in Kyuubi) by implementing a HadoopDelegationTokenProvider.

There are some notable tricks:

  1. spark-sql initializes HiveClient somewhat inconsistently, so you may see inconsistent results when you use spark-sql for testing. Jar-based Spark applications, spark-shell, and beeline + Kyuubi work well.
  2. We must set a different hive.metastore.token.signature for each HMS to distinguish the delegation tokens, otherwise the latter will overwrite the former. In #4560, we use the metastore URI as the signature for a KSHC catalog if hive.metastore.token.signature is not set explicitly. So technically, to allow Iceberg to use a different kerberized HMS, you can register an additional KSHC catalog and make sure it uses the same metastore URI and signature as the Iceberg catalog, so that the two can share the delegation token.
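
To illustrate why the signature matters, here is a minimal toy model in Python. It is not Kyuubi or Hadoop code: `DelegationToken`, `CredentialStore`, `effective_signature`, and the two metastore URIs are hypothetical names made up for this sketch. It only assumes that tokens are kept in a map keyed by signature, so two metastores with the same signature end up with a single token.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class DelegationToken:
    """Toy stand-in for an HMS delegation token (not a real Hadoop class)."""
    issuer_uri: str


@dataclass
class CredentialStore:
    """Toy credential store that keeps one token per signature."""
    tokens: Dict[str, DelegationToken] = field(default_factory=dict)

    def add(self, signature: str, token: DelegationToken) -> None:
        # A second token added under the same signature replaces the first.
        self.tokens[signature] = token


def effective_signature(metastore_uri: str, explicit: Optional[str] = None) -> str:
    # Mirrors the rule described above: fall back to the metastore URI
    # when hive.metastore.token.signature is not set explicitly.
    return explicit if explicit else metastore_uri


# Hypothetical metastore URIs, for illustration only.
HMS_A = "thrift://hms-a.example.com:9083"
HMS_B = "thrift://hms-b.example.com:9083"

# Same signature for both metastores: the later token overwrites the earlier one.
shared = CredentialStore()
for uri in (HMS_A, HMS_B):
    shared.add("shared-signature", DelegationToken(uri))

# URI-derived signatures: both tokens survive side by side.
distinct = CredentialStore()
for uri in (HMS_A, HMS_B):
    distinct.add(effective_signature(uri), DelegationToken(uri))

print(len(shared.tokens), len(distinct.tokens))
```

In the first store only the token for the second metastore remains, so a client that presents it to the first metastore would be rejected. With URI-derived signatures each metastore keeps its own token, and two catalogs that point at the same URI with the same signature resolve to the same entry.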
