-
Notifications
You must be signed in to change notification settings - Fork 902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix KMS encryption context handling #435
Conversation
Codecov Report
@@ Coverage Diff @@
## master #435 +/- ##
==========================================
- Coverage 42.94% 36.42% -6.52%
==========================================
Files 16 20 +4
Lines 2254 2715 +461
==========================================
+ Hits 968 989 +21
- Misses 1197 1637 +440
Partials 89 89
Continue to review full report at Codecov.
|
Thank you for the PR and the detailed write-up, and great find. I will do a quick CR here shortly, but also definitely want @autrilla to take a look when he time. As well, as you stated, we will need to have a migration path for users who are affected by this bug. I'm going to get started on that here shortly. We won't want to merge this until that is ready to be merged as well. |
keyservice/server.go
Outdated
Role: svcKey.Role, | ||
EncryptionContext: ctx, | ||
AwsProfile: svcKey.AwsProfile, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please run go fmt
, will remove this semi-colon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad! I'll fix the formatting with go fmt
later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the very comprehensive description of the bug and especially for submitting a patch to fix it.
I fully agree with @ajvb that we want the upgrade path to be part of the codebase before we merge this.
My main concern with this patch is it increases the code complexity a bit just to make it testable. Since we already mock the AWS KMS interface (e.g. in https://github.com/mozilla/sops/blob/c968ce2e3b13fac738d1d0c144dca3c9c7cfacb8/kms/keysource_test.go#L16), I think we could check that that interface is called with the expected parameters, instead of adding an additional layer of abstraction. We could still have tests for this bug, but reduce the impact on code complexity. What do you think?
keyservice/server.go
Outdated
@@ -31,19 +52,19 @@ func (ks *Server) encryptWithPgp(key *PgpKey, plaintext []byte) ([]byte, error) | |||
func (ks *Server) encryptWithKms(key *KmsKey, plaintext []byte) ([]byte, error) { | |||
ctx := make(map[string]*string) | |||
for k, v := range key.Context { | |||
ctx[k] = &v | |||
value := v // Allocate a new string to prevent the pointer below from referring to only the last iteration value | |||
ctx[k] = &value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do ctx[k] = &key.Context[k]
too to avoid this as well, but what you did works just fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if there's any linters that catch or warn against this, since I know for a fact this has triggered a bug at least one other time in sops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to try to use &key.Context[k]
to get pointers to the map values, but that doesn't work, because apparently Go map values are not addressable. So the simplest fix I could find was to just assign the value to a new variable causing a new string to be allocated.
Testing that the encryption context is correctly passed all the way to the KMS SDK would be a good idea and I was initially considering that approach for the test cases. I didn't go with that approach, because I was a little reluctant exposing the kmsSvc kmsiface.KMSAPI
global variable to the keyservice
package, which I thought to be the best location for these tests. If it's okay to reuse that global variable to also enable KMS API mocking in keyservice
package, I don't see why I couldn't change the implementation to use that approach and remove the provider indirection I introduced.
Finally, using a linter to catch these types of bugs would be awesome. I'm not knowledgeable enough about the Go ecosystem to suggest a linter that would detect this problem. I did do some quick googling, but all I was able to find was this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now pushed changes that use the already existing KMS mock. These changes have the aforementioned downside that they expose kmsiface.KMSAPI
variable in kms
package in order to make it accessible from keyservice
package. It would be simpler to have this new encryption context test located within kms
package to allow it to access that kmsiface.KMSAPI
mock without publicly exporting the variable, but this would lead to a circular dependency between the packages (kms
-> keyservice
-> kms
).
In my opinion, it would be preferable to consider getting rid of the use of a global variable for mock injection purposes and instead refactor the kms
package to allow tests to provide the kmsiface.KMSAPI
implementation directly to an instance of kms.MasterKey
. This would remove the need to make the variable public for mocking purposes. Additionally, I'm wondering if the global variable might cause any kind of concurrency issues between tests. I'm not that well-versed in Go to know how concurrency might influence these testing scenarios, but I can see that if two tests were running concurrently and attempted to modify the global kmsiface.KMSAPI
, there would be trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that is somewhat ugly too. I think we're taking the wrong approach to testing this. This bug was caused by the conversion from a keyservice.KmsKey
to a kms.MasterKey
being buggy. We should split that out into a function and test it. I feel like the kind of testing you're doing right now is better suited for our "functional tests" (end-to-end tests). Unfortunately we don't have the infrastructure right now to use KMS keys during tests, but I feel like setting that up and running integration tests would be the way forward. Testing Go code across packages can get very messy, and mocking in Go is not great. I'm not really sure the tests we currently have in the kms
package (before this PR) are actually very useful. It'd be much better to have a real end-to-end test instead of using a mock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the global variable public is indeed ugly. On the other hand, your suggestion of extracting a helper method is a sound choice. I considered that initially, but opted to go with a mock-based test, because by using a mock I could check that the actual call path would provide the wanted encryption context. In contrast calling the helper method directly in a test will not guarantee that the actual call path will actually use that helper function, even if the test case shows the helper function to be working fine. That said, I'm fine with your suggestion, especially if a more comprehensive functional test is added in the future. I pushed new code changes that extract a helper function and test that function directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes again. It looks good to me now. If my information is still correct, @ajvb will work on setting up integration tests and an upgrade path for this PR, at which point, it can be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem! I rebased the changes, dropping irrelevant commits.
The code copying encryption context value strings to a map containing string pointers was incorrectly getting a pointer to a string variable which is being re-used by the for loop, causing all keys to point to the same value string.
It turns out that sops doesn't send correct KMS encryption context key-value pairs to KMS service during encryption and decryption operations. If an encryption context contains at least two key-value pairs and more than one unique value among the keys' values, sops will incorrectly call KMS API with an encryption context where all correct keys are present, but they all have the same value. This pull request fixes that and adds a regression test for KMS encryption context handling in
keyservice/server.go
, where the bug occurs.Example of the bug
Suppose sops is asked to encrypt a new file with the following encryption context.
In that case, sops sends to KMS an encryption context where every key has exactly the same value, which is chosen randomly from the values of the actually requested encryption context above. This random choice happens on every invocation of sops KMS encrypt or decrypt command and can lead to sops attempting to encrypt and decrypt with different encryption contexts between successive invocations!
Consequences
Due to the nature of the bug, it looks like the data key in every KMS encryption context using sops file has been incorrectly encrypted, provided that the following conditions are fulfilled.
As a result, this pull request's fix will break decryption of any such KMS-backed data keys!
Workaround suggestion
This pull request only includes the immediate bug fix, but it would be worthwhile for the sops team to consider implementing a compatibility feature that can detect the presence of KMS encryption context configuration that is impacted by this bug. Sops could then prompt the user if sops should be allowed to perform multiple KMS decrypt requests to find out which unique encryption context value was actually used for all context keys when the data key was previously successfully encrypted while the bug was present. It should be possible to detect such KMS encryption context configurations, because the bug only impacts what is sent to the KMS API, meaning that the serialised sops file should still have to the correct encryption context configuration.