-
Notifications
You must be signed in to change notification settings - Fork 14.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support AES256 encryption of sensitive params #45195
Support AES256 encryption of sensitive params #45195
Conversation
Starting with a draft with a POC. Please, let me know if this is a good direction. |
692732e
to
56d5d47
Compare
Implements support for `_ENCRYPTED` versions of env variables and `_encrypted` versions of config params. Those would be encrypted using a parameter `params_encryption_aes256_key` (which can be retrieved from a secret for example). The params encryption key itself doesn't support encrypted suffix, or it'd create an infinite recursion. closes: apache#45194 Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
56d5d47
to
c2ca77e
Compare
Before we go any deeper. I would like to dicuss the reasoning of doing it I simply think it does not achieve any "more" security than we currently have. Both parameters and decryption key are necesarlly in the same environment and you neeed to have access to both - if you don't, then you cannot decrypt them. We already have encryption using fernet key (symmetric) but there we use it to encrypt the database values, becuase then for example if you have only access to database backups/dumps or database itself, you are not able to decrypt it without access to the configuraiton of Airlfow - where the fernet key is stored. So here values and keys are kept in two separate "spaces" / "security scopes/perimeters". We also already support In this case and your proposal, seemlngly both encrypted values and the key are supposed to be stored in the same "space" - so if you have access to one, you have access to other, and you can use the key to decrypt the values, so at best it's security-by-obscurity. Do you see any scenario where someone could get hold of the values but not the key - where it could be helpful? What kind of attacks and which actor behaviours it would prevent ? It might be you have some concrete use cases in mind that I don't know. |
Sure. I think of this as more of a convenience feature rather than a feature that heightens a security.
I'd have to think separately on how to work with secret things mounted as files, e.g. Please, let me know if these arguments make sense. Happy to discuss further. |
Here's an example when there's a secret value in a config where it doesn't seem to be possible to hide it https://airflow.apache.org/docs/apache-airflow/1.10.8/security.html#github-enterprise-ghe-authentication. |
Just to note - not that I am against it totally, but I want you to think and come up with proposal how to tell the user to maintain "secret" status of such values? If you are keeping both - secret values and private key in the same configuration place (secure perimeter), then you can use the key to decrypt the values. What your advise should be for the users who are going to use that feature, how they should keep the key and what exactly will the encryption will be protecting against? If we are going to accept some form of it, we need have detailed documentation describing the users what kind of security they can expect and how they should protect (and against what) - the thing is that often when people see some "Encrypted" values, they think they are generally protected against the values leaking (but they don't realise for example that storing the key to decrypt the values in the same security perimeter/storage gives them false sense of security). What your "user documentation" for that feature and user recommendation would look ilike for it? BTW:
Not really. You can have a single command that is used there and pass the parameter to that command indicating which secret you want to retrieve - |
Also. I think it needs an explanation on why you cannot use "k8s secrets" (if you are talking about k8s) - and why is it problematic to deploy them separately and mount as config files. |
I think the problem I'm trying to solve is that the secrets are in different places in different formats. While some of them can be handled with _CMD params (thanks for pointing that a single script can be used!), in some cases this option is not offered, for example:
My proposal offers an option to replace these secrets with one, still probably a k8s secret, to simplify management while offering the same amount of security (e.g. reading one k8s secret in a namespace requires pretty much the same amount of permissions and effort as reading multiple). I'll set users' expectations that they are getting the same amount of security and still need to keep their k8s secret with encrypted key safe, but that this way would simplify managing a configuration. |
One alternative to the
|
Why not having it all configured through env variables? All airflow configurations can be configured via ENV variables in one place - same source for every secret value https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables The thing is that we tend to avoid having to add "features" to Airflow where they can be provided via "external" means in this case deployment that might be responsible to keep all secrets (i.e. env variables) that are providing all secret values from a single "secret" that can even be backed by a secret manager. Generally we are now more in the mode (for Airlfow 3) to search what non core issues we can remove from Airflow rather than what we can add to it. Similalry as your other issue (now discussion) #45221 - we are very, very, very cautious about adding things to Airflow that could be done "externally" - we are good at scheduling workfloads, we are not good in keeping secret configurations (K8S is way better for that) and authentication (KeyCloak is way better in this). You have to remember, that whatever we approve as contributions to Airlfow, we also have to maintain it - and if there are other ways to achieve it - for example via deployment features - we tend to be very hesitant about accepting those. You seem to be very focused on K8S / ArgoCD environment, so my question is - why "standard" feature where all the configurationsecrets can be configured in one place and exposed as env variables are not "good" for you ? |
Thanks for detailed replies. After more thinking I concede - marginal wins in maintenance of configs across shards are not worth the complexity of adding a new feature. I understand the complexity of maintaining open source. I can probably achieve my goals with some customized manifests overlays + a custom script to render the helm templates (which we have anyways). That said, I'm still looking to contribute something, maybe that KeyCloak auth manager. |
Implements support for
_ENCRYPTED
versions of env variables and_encrypted
versions of config params. Those would be encrypted using a parameterparams_encryption_aes256_key
(which can be retrieved from a secret for example). The params encryption key itself doesn't support encrypted suffix, or it'd create an infinite recursion.closes: #45194
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in newsfragments.