HADOOP-17912. ABFS: Support for Encryption Context #3440
base: trunk
Conversation
Thanks for working on this. I left a few comments.
this.clientProvidedEncryptionKey = getBase64EncodedString(encryptionKey);
this.clientProvidedEncryptionKeySHA = getBase64EncodedString(
    getSHA256Hash(encryptionKey));
this.clientProvidedEncryptionKey = EncryptionAdapter.getBase64EncodedString(encryptionKey);
Shouldn't the encryption key passed in the configuration already be base 64 encoded? This is how it works e.g. for azCopy. Also, the headers will require base 64 encoded values. Maybe it would be more consistent if the configuration for the global key also expected a base 64 encoded key.
The driver needs to send two request headers:
- Base 64 encoded key
- Base 64 encoded (SHA256 hash of key)
As the second header requires the original key to compute the SHA256, the config needs to accept the key without the encoding
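The two header values described above can be computed as in the following standalone sketch. The helper names and the all-zero placeholder key are mine, not from the PR; the `x-ms-*` header names follow Azure's customer-provided-key convention:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class EncryptionHeaderDemo {
    // Base64-encode arbitrary bytes for use as a request header value.
    static String base64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    // SHA-256 digest of the raw key bytes (needed for the second header).
    static byte[] sha256(byte[] key) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        byte[] rawKey = new byte[32]; // placeholder 256-bit key, all zeros
        String encodedKey = base64(rawKey);            // value for x-ms-encryption-key
        String encodedKeySha = base64(sha256(rawKey)); // value for x-ms-encryption-key-sha256
        System.out.println(encodedKey.length() + " " + encodedKeySha.length());
    }
}
```

Note that the SHA-256 must be computed over the raw key bytes, then encoded, which is why a config holding only the encoded key still allows both headers to be derived after decoding.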
Well, it can just accept the encoded key and decode it. I am not sure which way is best; I was just pointing out that azCopy requires the encoded key instead.
azCopy takes precomputed values as configs, i.e., both the encoded key and the encoded SHA256 hash of the key are required as input. The ABFS driver currently computes these headers from the original key provided. The two approaches are quite similar, so I would like your opinion on whether you see value in changing the driver to take the two precomputed values as configs.
I am advocating in favor of the encoded version of the key for the following reasons:
- it might be easier to manage base 64 encoded strings in the configuration
- in ingestion flows where users use azCopy to upload data and then use the ABFS client to access them, they would need to provide both encoded (azCopy) and unencoded (this PR) versions of the key.
Alternatively, we could also use both precomputed values (key and sha) in the provided configuration for the client.
Have modified it to accept the encoded version, but it will require both the pre-computed key and the SHA hash value as configs.
} else if (encryptionAdapter == null) {
  // get encryption context from GetPathStatus response header
  encryptionAdapter = new EncryptionAdapter(encryptionContextProvider,
      new Path(path).toUri().getPath(),
Should we add some error handling and logging around this?
added
  return encryptionKey;
}

public SecretKey fetchEncryptionContextAndComputeKeys() throws IOException {
Perhaps it should be more explicit in the naming that this is used to create a new encryption context.
renamed
}

public SecretKey fetchEncryptionContextAndComputeKeys() throws IOException {
  encryptionContext = provider.getEncryptionContext(path);
If I understand correctly, this will only be reached when creating a new file, and will therefore make the provider create a new encryption context. Perhaps we could add a null check on the encryptionContext here.
added check
    default: return; // no client-provided encryption keys
  }
  requestHeaders.add(new AbfsHttpHeader(X_MS_ENCRYPTION_KEY, encodedKey));
Should there be some kind of validation of the values of the headers?
There are null checks for the encryption key values (both Global and EncryptionContext types) used to compute the headers, and the SHA256 hash and Base64 encoding are standard functions, so we can probably skip validation there.
String encodedKey, encodedKeySHA256;
switch (encryptionType) {
  case GLOBAL_KEY:
    encodedKey = clientProvidedEncryptionKey;
Did you consider creating an implementation of EncryptionContextProvider for the global key case, just for consistency between the two types? In that case, would the EncryptionAdapter still be required?
EncryptionContextProvider currently supports only the non-global case and is not initialized when encryption is of the global type. The two are mutually exclusive, so the encryptionAdapter passed by the API to the addEncryptionKeyRequestHeaders method will be null in the global scenario. As the global key is provided in the config, it can be added directly to the headers without going through the encryption framework.
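A runnable toy sketch of that mutually exclusive branching (the class, enum, and variable names here are illustrative, not the actual ABFS implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class HeaderSelectionSketch {
    enum EncryptionType { NONE, GLOBAL_KEY, ENCRYPTION_CONTEXT }

    static List<String> selectHeaders(EncryptionType type,
                                      String globalKey, String globalKeySha,
                                      String adapterKey, String adapterKeySha) {
        List<String> headers = new ArrayList<>();
        String encodedKey;
        String encodedKeySha;
        switch (type) {
            case GLOBAL_KEY:
                // Values come straight from config; no adapter is involved.
                encodedKey = globalKey;
                encodedKeySha = globalKeySha;
                break;
            case ENCRYPTION_CONTEXT:
                // Values come from the adapter, which is non-null in this branch.
                encodedKey = adapterKey;
                encodedKeySha = adapterKeySha;
                break;
            default:
                return headers; // no client-provided encryption keys
        }
        headers.add("x-ms-encryption-key: " + encodedKey);
        headers.add("x-ms-encryption-key-sha256: " + encodedKeySha);
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(selectHeaders(EncryptionType.GLOBAL_KEY, "key", "sha", null, null));
        System.out.println(selectHeaders(EncryptionType.NONE, null, null, null, null));
    }
}
```

The default branch returning early is what makes the "no encryption" case cheap: no headers are added and the adapter is never touched.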
  }
}

public EncryptionContextProvider initializeEncryptionContextProvider() {
Since the actual call to initialize() takes place in https://github.com/apache/hadoop/pull/3440/files#diff-94925ffd3b21968d7e6b476f7e85f68f5ea326f186262017fad61a5a6a3815cbR1630, maybe this should be renamed to createEncryptionContextProvider.
renamed
try {
  String configKey = FS_AZURE_ENCRYPTION_CONTEXT_PROVIDER_TYPE;
  if (get(configKey) == null) {
Should this config be strictly per account? Otherwise it might cause problems when instantiating a file system with no intent to use encryption: it will try to create a provider anyway. Or am I missing something?
Yes, keeping it account-agnostic would cause other accounts to instantiate an EncryptionContextProvider (ECP) as well. Have made the provider config account-specific. However, each account that uses ECP will need its own account-specific entry in the config file.
public void computeKeys() throws IOException {
  SecretKey key = getEncryptionKey();
  Preconditions.checkNotNull(key, "Encryption key should not be null.");
  encodedKey = getBase64EncodedString(new String(key.getEncoded(),
Why not just getBase64EncodedString(key.getEncoded())? I mean, won't new String(key.getEncoded(), StandardCharsets.UTF_8) basically give us a key of length != 32?
Modified to getBase64(byte[] encoded). The conversion to String was followed by a conversion back to bytes; it did not alter the value but was an unnecessary step, so it has been removed.
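The reviewer's concern can be demonstrated in isolation: for arbitrary key bytes, a UTF-8 String round trip is lossy, which is why encoding the raw byte[] directly is the safe choice. This is a standalone demo, not the PR's code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64RoundTripDemo {
    // The unsafe path: bytes -> String -> bytes. Invalid UTF-8 sequences are
    // replaced with U+FFFD (3 bytes each in UTF-8), so the length can change.
    static byte[] mangle(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] rawKey = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x42 }; // not valid UTF-8
        String direct = Base64.getEncoder().encodeToString(rawKey); // safe: bytes encoded directly
        String viaString = Base64.getEncoder().encodeToString(mangle(rawKey));
        System.out.println(direct.equals(viaString)); // false: the round trip changed the bytes
        System.out.println(mangle(rawKey).length);    // 8, not 4: each bad byte became U+FFFD
    }
}
```

For keys that happen to be pure ASCII the round trip is lossless, which is presumably why the PR never hit a corruption in practice; for random binary keys it would have.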
💔 -1 overall
This message was automatically generated.
…is of type ENCYPTION_CONTEXT.
…tion: javadocs in createEncryptedFile, tearDown() impl.
This reverts commit bb45ae9.
…ckage-protected methods for setting these fields.
yetus run for commit 0cd8c8c 💔 -1 overall
This message was automatically generated. Failed because of javadocs; other tasks were successful.
Test results were reported for the HNS-OAuth, HNS-SharedKey, NonHNS-SharedKey, and AppendBlob-HNS-OAuth configurations.
In the method addEncryptionKeyRequestHeaders of AbfsClient.java, encryptionAdapter is always going to be a non-null object when encryptionType == ENCRYPTION_CONTEXT. Hence, we do not need to call getPathStatus in this method. And as there is no logic for the case where encryptionAdapter is null, I have not made the change to add an EncryptionAdapter base class with a NoEncryptionAdapter child class.
@steveloughran, requesting you to kindly review the PR. Regards.
@mukund-thakur, @mehakmeet requesting you to kindly review the PR. Thanks.
This PR introduces the use of a different customer-provided key per encrypted file, superseding the global-key approach of HADOOP-17536.
It adds ABFS driver support for an EncryptionContextProvider plugin that retrieves encryption information; the implementation is to be supplied by the client. When encryption is enabled for an account, file creation involves the ABFS driver fetching an encryption context and an encryption key from the provider. These are sent as request headers to the server, which handles encryption and decryption, and the server stores the encryption context as system metadata for the file. Any subsequent REST call that accesses data or user metadata must send the encryption key headers. The encryption context of a file can be obtained from the response headers of a GetPathStatus call and then used to fetch the encryption key from the provider.
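To make the create-then-read flow concrete, here is a toy, in-memory provider sketch. The method names only approximate the plugin contract described above, and the key derivation is purely illustrative; a real provider would call out to a key-management service:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class ToyEncryptionContextProvider {
    private final byte[] masterSecret = "demo-master-secret".getBytes(StandardCharsets.UTF_8);

    // On file create: mint a fresh encryption context; the server stores it
    // as system metadata for the file.
    public String getEncryptionContext(String path) {
        return UUID.randomUUID().toString();
    }

    // On any later access: re-derive the per-file key from the path and the
    // context returned in the GetPathStatus response headers.
    public SecretKey getEncryptionKey(String path, String encryptionContext) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(masterSecret);
            md.update(path.getBytes(StandardCharsets.UTF_8));
            md.update(encryptionContext.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(md.digest(), "AES"); // 32-byte AES-256 key
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        ToyEncryptionContextProvider p = new ToyEncryptionContextProvider();
        String ctx = p.getEncryptionContext("/data/file.txt");    // create-time
        SecretKey k1 = p.getEncryptionKey("/data/file.txt", ctx); // create-time key
        SecretKey k2 = p.getEncryptionKey("/data/file.txt", ctx); // read-time key
        System.out.println(java.util.Arrays.equals(k1.getEncoded(), k2.getEncoded())); // true
    }
}
```

The essential property is that the derivation is deterministic given the stored context, so the same key is recovered on every access without the key itself ever being persisted server-side.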
New configs:
- fs.azure.encryption.encoded.client-provided-key: server-side encryption key, encoded in Base64 format
- fs.azure.encryption.encoded.client-provided-key-sha: SHA256 hash of the encryption key, encoded in Base64 format
- fs.azure.encryption.context.provider.type: custom EncryptionContextProvider type
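Assuming the provider config follows the usual ABFS account-specific suffix convention discussed in the review, the wiring might look like the fragment below. The account name, class name, and key values are placeholders, not from the PR:

```xml
<!-- Global-key mode: both values must already be Base64 encoded -->
<property>
  <name>fs.azure.encryption.encoded.client-provided-key</name>
  <value>BASE64_ENCODED_KEY</value>
</property>
<property>
  <name>fs.azure.encryption.encoded.client-provided-key-sha</name>
  <value>BASE64_ENCODED_KEY_SHA256</value>
</property>

<!-- Encryption-context mode: one account-specific provider entry per account -->
<property>
  <name>fs.azure.encryption.context.provider.type.myaccount.dfs.core.windows.net</name>
  <value>org.example.MyEncryptionContextProvider</value>
</property>
```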