-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the key management tools API for encryption #426
Conversation
/// This CryptoFactory instance must remain alive and not disposed until after any files using these | ||
/// decryption properties have been read, as internally the FileDecryptionProperties contains references to | ||
/// data in the CryptoFactory that cannot be managed by ParquetSharp. | ||
/// Failure to do so may result in native memory access violations and crashes that cannot be caught as exceptions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward. I couldn't find a way to handle this nicely without exposing users to possible segfaults, as the FileDecryptionProperties
class in C++ has a private constructor and can't be subclassed, so we can't do anything like combine it with a reference to the CryptoFactory. I'm hoping this can be fixed in Arrow so we can fix this limitation for the next version though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a PR in Arrow to address this at apache/arrow#40329
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's been merged, so we should be able to remove this warning in the 16.0.0 release
62b367d
to
7101437
Compare
I'm not sure what to do about the Windows build failure. It worked for me with MSVC 19.35.32216.1, and with 19.39.33521.0 after upgrading. The GitHub runner has version 19.38.33135.0 though. It looks like this is triggered by my changes pulling in a dependency on the Arrow FileSystem API, but this seems to be a bug in MSVC. I found a similar issue reported at https://developercommunity.visualstudio.com/t/Latest-Visual-Studio-version-1791-ca/10598722 We might need to wait until the runners get a newer version of MSVC, or disable this feature on Windows for now. |
Similar build failure issue reported at actions/runner-images#9398, but that was apparently caused by linking to a different STL version than what the compiler expected. Maybe there's something similar going on with our build. |
The build failure appears to be because Arrow is built with a different MSVC than ParquetSharp, which isn't good. I'm not sure why different versions are being selected: From the Arrow build logs:
And from the ParquetSharp build:
|
fff7f25
to
8f6c8dd
Compare
8f6c8dd
to
b57fc62
Compare
b57fc62
to
634b69a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work, well done!
set => ExceptionInfo.Check(EncryptionConfiguration_SetDataKeyLengthBits(Handle.IntPtr, value)); | ||
} | ||
|
||
private static string EncodeColumnKeys(IReadOnlyDictionary<string, IReadOnlyList<string>> columnKeys) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It can be implemented in a single line:
return string.Join(";", columnKeys.Select(kvp => $"{kvp.Key}:{string.Join(",", kvp.Value)}"));
Nonetheless, the current version is also fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that's much more readable, I've changed this now
Fixes #200
This adds bindings to the Parquet key management tools API, also known as the high level encryption API. The existing low-level encryption API supports setting encryption/decryption keys directly, whereas this API uses a Key Management System (KMS) to implement envelope encryption, where data encryption keys are randomly generated and then stored encrypted in the file metadata.