Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amazon MSK multiple authentication modes and updates to TLS encryption settings #20956

Closed
ewbankkit opened this issue Sep 18, 2021 · 28 comments · Fixed by #21005
Closed

Amazon MSK multiple authentication modes and updates to TLS encryption settings #20956

ewbankkit opened this issue Sep 18, 2021 · 28 comments · Fixed by #21005
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service.
Milestone

Comments

@ewbankkit
Copy link
Contributor

ewbankkit commented Sep 18, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports the simultaneous use of multiple authentication modes and updates to encryption-in-transit settings for Amazon MSK clusters.

New or Affected Resource(s)

Potential Terraform Configuration

# Copy-paste your Terraform configurations here - for large Terraform configs,
# please use a service like Dropbox and share a link to the ZIP file. For
# security, you can also encrypt the files using our GPG public key.

References

Requires AWS SDK v1.40.39:

@ewbankkit ewbankkit added enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service. labels Sep 18, 2021
@e-moshaya
Copy link

Any chance this can be worked on soon?

@james-bjss
Copy link
Contributor

james-bjss commented Sep 20, 2021

Have had a go at this one as it's something we need when swapping and adding PCA certs used by the cluster. I need to do some more testing but should be able to submit a PR later this week.

Out of interest how are PCA resources handled in the acceptance tests? These are quite an expensive resource to create and are billed for the month when created. For my own testing I will try and test this on a free tier account.

@ewbankkit
Copy link
Contributor Author

ewbankkit commented Sep 21, 2021

@james-bjss Take a look at the TestAccAWSMskCluster_ClientAuthentication_Tls_CertificateAuthorityArns test case to see how ACMPCA resources are currently being handled.
Maintainers can verify the tests if necessary.

Update: As part of #14627 I have made TestAccAWSMskCluster_ClientAuthentication_Tls_CertificateAuthorityArns run by activating the ACMPCA as part of the test, but @james-bjss you are correct, these resource can start to cost real money so we are happy for you to make changes to the tests as you think necessary but not run them and the maintainers can verify and fix as necessary.

@james-bjss
Copy link
Contributor

james-bjss commented Sep 22, 2021

@ewbankkit Some queries thrown up from my testing,

AWS provide a valid list of combinations for updates to cluster security. Assuming this should be validated in the provider prior to calling the API, should this be done using a custom validate functions or within the update function itself?

See the table here:
https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-security.html

Interpreting that table I believe it's saying:

  • Any combination of Unauthenticated, mTLS, SASL/SCRAM, SASL/IAM can be enabled

  • If any of mTLS, SASL/SCRAM or SASL/IAM are enabled then Broker-Broker Encryption must be enabled

  • If mTLS is enabled then either TLS or TLS_PLAINTEXT must be enabled

  • If either SASL/SCRAM, or SASL/IAM are enabled then TLS must be enabled

  • I'm also guessing that at least one authentication method should be enabled

Also

  • If the client_authentication was not defined and the config block added in a later update (with identical settings to the defaults) eg. If you define sasl=false then the change on the field is detected and updateSecurity is called. Whilst this doesn't change anything on the cluster it still performs a cluster operation which takes ~25mins. Is this acceptable behaviour?
    Edit: This occurs regardless of whether the encrpytion_info field is sent in the request or not (see screenshot of the first change TLS -> TLS)
    image

  • Currently there is no concept of client_authentication.unauthenticated in the current implementation either SASL or TLS are off or on. Would you agree it makes sense to add this as an additional option and default it to enabled?

@ewbankkit
Copy link
Contributor Author

@james-bjss

  • Implementation of those multi-attribute validation rules are best done in a CustomizeDiff function - There should be examples in the codebase
  • For the missing configuration block/default values issue you could take a look at suppressMissingOptionalConfigurationBlock
  • Yes, I see Unauthenticated is field in the API so we can implement it as an attribute

@james-bjss
Copy link
Contributor

james-bjss commented Sep 24, 2021

@james-bjss

  • Implementation of those multi-attribute validation rules are best done in a CustomizeDiff function - There should be examples in the codebase
  • For the missing configuration block/default values issue you could take a look at suppressMissingOptionalConfigurationBlock
  • Yes, I see Unauthenticated is field in the API so we can implement it as an attribute

Sorry for the delay, been quite busy and only have evenings to look at this.

  • Have implemented the CustomizeDiff
  • suppressMissingOptionalConfigurationBlock - It seems this doesn't work for this case as the clientAuthentication field does not appear in the json response from AWS when calling describeCluster if it not initially set.
  • Added the unauthenticated attribute
  • Have tested various combinations and it seems to be working as expected

To Do:
Will find some time this weekend to add acceptance tests and update the docs.

@james-bjss
Copy link
Contributor

james-bjss commented Sep 25, 2021

Whilst writing the tests I spotted some contradictions and extra validation in the documentation that has me confused.

On the updateSecurity table it states that
for SASL/SCRAM and SASL/IAM the only valid broker-encryption is TLS

However this is contradicted here:

Details for ClientAuthentication using SASL. To turn on SASL, you must also turn on EncryptionInTransit by setting inCluster to true. You must set clientBroker to either TLS or TLS_PLAINTEXT. If you choose TLS_PLAINTEXT, then you must also set unauthenticated to true.

To add to the confusion we have a test cluster that has been deployed prior to the unauthenticated setting being available in the API. This is configured with (m)TLS Client Authentication and TLS_PLAINTEXT Client-Broker authentication. The cluster shows that Unauthenticated access is disabled in the UI and describing the cluster via the API shows that the option is absent/unset; however it does infact allow unauthenticated access.

I will have raised a support ticket for clarification from the MSK team, but this suggests the SASL options in the table should also list TLS_PLAINTEXT.

UPDATE: Support confirmed that Unauthenticated must be set if SASL + TLS_PLAINTEXT is set.

@james-bjss
Copy link
Contributor

james-bjss commented Sep 28, 2021

An update on my investigation (have provided this to AWS support).

  • I have confirmed that when Unauthenticated is not specified in the API that the cluster is provisioned implicitly with Unauthenticated enabled if TLS_PLAINTEXT or PLAINTEXT are set
  • The describeCluster API does not return the field at all
  • The UI reports it as disabled in the main cluster overview.
  • When editing the config in the console the UI reports that Unathenticated Is Enabled
  • The observed behavior is Unauthenticated is enabled
  • When the clientAuthentication field is sent as part of the clusterCreate or securityUpdate at least one of the authentication options must be enabled, else the call fails

I have sent this info to support as this possibly a bug.

Update from AWS Support: If PLAINTEXT OR TLS_PLAINTEXT mode is set on the cluster then AWS will implicitly enable Unauthenticated access, the UI will report Unauthenticated access as disabled and the API will not report the value. Support have passed this back to the MSK team for review/comment.

@ewbankkit - In my PR I proposed to make the client_authentication and unauthenticated block/attributes mandatory. This would force users to explicitly set their desired behavior. Would this be a suitable compromise?
The risk is if we default this attribute it may cause undesired changes and if we leave it optional then there is the assumption that unset = disabled (which may not be the case as-per my findings above).

@gmanjunath
Copy link

Hello Guys,

Quick question, I'm trying to connect DMS to AWS MSK over SSL port (9094) by setting ssl_mode to verify-ca and I'm getting following error message:

aws_dms_endpoint.target: InvalidParameterCombinationException: SSL settings are invalid for EngineName 'kafka' and EndpointType 'target'

I know this thread is not related to DMS, but this thread is about AWS MSK SSL/TLS support, so wondering if anyone has tested using DMS to connect to AWS MSK over SSL. Any pointers/suggestions will be really helpful.

I'm using following version of terraform:

Terraform v0.11.15

provider.aws v2.70.0
provider.consul v2.12.0
provider.template v1.0.0

Many Thanks
Manju

@rocioemera
Copy link

Hi All,

In September Amazon launch an update that allow update the authentication mechanism in a cluster https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-msk-multiple-authentication-modes-tls-encryption-settings using the update-security API https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-security.html. I tried to update my MSK cluster to enable TLS authentication, but instead of just update the security mechanism it tries to replace the whole cluster (I dont want the old cluster be deleted, just update the security mechanism)

Any idea what this is happening??

Terraform CLI and Terraform AWS Provider Version
provider.terraform-provider-aws_v3.61.0_x5
Terraform/0.15.1

Affected Resource(s)
aws_msk_cluster

Expected Behavior
Update the security settings from the cluster that already exist

Actual Behavior
Try to replace the cluster (delete the current one and create a new one with the given settings)

Steps to Reproduce

  1. Create a cluster without any authentication mechanism using terraform
  2. Update the aws_msk_cluster resource to add client_authentication
  3. Run the terraform file again

I am using the latest provider version. We need this setting to enable the authentication in a cluster already exist but does not have authentication.

@james-bjss
Copy link
Contributor

Hi All,

In September Amazon launch an update that allow update the authentication mechanism in a cluster https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-msk-multiple-authentication-modes-tls-encryption-settings using the update-security API https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-security.html. I tried to update my MSK cluster to enable TLS authentication, but instead of just update the security mechanism it tries to replace the whole cluster (I dont want the old cluster be deleted, just update the security mechanism)

Any idea what this is happening??

Terraform CLI and Terraform AWS Provider Version provider.terraform-provider-aws_v3.61.0_x5 Terraform/0.15.1

Affected Resource(s) aws_msk_cluster

Expected Behavior Update the security settings from the cluster that already exist

Actual Behavior Try to replace the cluster (delete the current one and create a new one with the given settings)

Steps to Reproduce

  1. Create a cluster without any authentication mechanism using terraform
  2. Update the aws_msk_cluster resource to add client_authentication
  3. Run the terraform file again

I am using the latest provider version. We need this setting to enable the authentication in a cluster already exist but does not have authentication.

@rocioemera - This is what this change aims to resolve. I am still waiting on a response back from the AWS MSK team; however I believe I have enough information now to progress my PR. Will try to make these changes over the weekend.

@james-bjss
Copy link
Contributor

james-bjss commented Oct 19, 2021

Quick update. Support got back to me yesterday saying they have made a fix. I need to check if this was simply in the console or if any change was made to the API. Either way I think I have all the info I need now to finish up the PR.

Effectively If any PLAINTEXT option is set then AWS will implicitly enable the unauthenticated option, regardless of is this value is set in the API call. To remedy this I will update the validation to check it is explicitly set n the TF configuration.

Re: Unauthenticated showing differing values in the console (AWS support):

I am happy to share with you that an update has been made and deployed to the MSK Console. I would like to thank you for raising this case and I hope this update clarifies the MSK Console UI for better understanding for the customers.

@rocioemera
Copy link

Hi All,
In September Amazon launch an update that allow update the authentication mechanism in a cluster https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-msk-multiple-authentication-modes-tls-encryption-settings using the update-security API https://docs.aws.amazon.com/msk/latest/developerguide/msk-update-security.html. I tried to update my MSK cluster to enable TLS authentication, but instead of just update the security mechanism it tries to replace the whole cluster (I dont want the old cluster be deleted, just update the security mechanism)
Any idea what this is happening??
Terraform CLI and Terraform AWS Provider Version provider.terraform-provider-aws_v3.61.0_x5 Terraform/0.15.1
Affected Resource(s) aws_msk_cluster
Expected Behavior Update the security settings from the cluster that already exist
Actual Behavior Try to replace the cluster (delete the current one and create a new one with the given settings)
Steps to Reproduce

  1. Create a cluster without any authentication mechanism using terraform
  2. Update the aws_msk_cluster resource to add client_authentication
  3. Run the terraform file again

I am using the latest provider version. We need this setting to enable the authentication in a cluster already exist but does not have authentication.

@rocioemera - This is what this change aims to resolve. I am still waiting on a response back from the AWS MSK team; however I believe I have enough information now to progress my PR. Will try to make these changes over the weekend.

Hi, Any advance with this topic?? I tried this weekend to update the security settings to enable TLS in my current cluster but I am still getting the destructive behaviour.

Any idea when this will be solved?

@vicgp-te
Copy link

vicgp-te commented Nov 9, 2021

I also wanted to ask if there is any chance this could be sorted out soon. The linked pull request (#21005) hasn't have any activity in 21 days.
Without these changes, the MSK terraform module is not very usable given that is not compatible with the new features added to MSK.
Shouldn't this be considered a high priority issue?

@mgusiew-guide
Copy link
Contributor

One comment WRT Private CA. I had conversation with AWS support and it looks like Private CA is billed daily. Not sure if it changes much but maybe useful if someone decided to run Private CA tests weekly or monthly. Here is the fragment of the conversation:

  1. Regarding the "1 month of use is $400, but if you provision a PCA and then mark it for deletion 2 days later, you're only charged for the 2 days it was active" scenario. Assuming that:
  • the PCA was active for 2 full days, then marked for deletion and never re-activated,
  • month has 30 days
    Then I will be charged $400 * 2 days / 30 days = $26.67 for that PCA, correct?

-> Yes, in this case you will be charged for 2 days usage.

  1. What is the granularity of the PCA pro-rated billing? Is it days, hours or it is more granular than that? For example, if I created a PCA and deactivated it after exactly 30 minutes, then would I be charged for that 30 minutes only ($400 * 30 minutes / 30 days = $0.28) or for the whole day ($400 * 1 day / 30 days = $13.33)?

-> The granularity of PCA pro-rated billing is on a daily basis.

@nuttmeister
Copy link
Contributor

How are things progressing on this issue? Looks like our business will need both SASL/IAM and TLS enabled on our clusters.
If it's stalled I would be happy to look at it, possibly next week or next weekend. A bit unsure when I will have the time.

@james-bjss
Copy link
Contributor

How are things progressing on this issue? Looks like our business will need both SASL/IAM and TLS enabled on our clusters. If it's stalled I would be happy to look at it, possibly next week or next weekend. A bit unsure when I will have the time.

Sorry, have been slammed with work which took me away from looking at this. Have seen a number of messages chasing, so going to take another look today.

Will need to update the PR to reflect the refactoring of the codebase. I could probably use a hand with writing the tests and review. Testing this out locally is complicated due to the use of PCA (costly) and the fact it takes ~30mins to provision a cluster and a further ~25mins to perform the updates. If you would like to jump in and help please do, would be appreciated.

@waynetaylor
Copy link

Is there any ETA on resolution?

@james-bjss
Copy link
Contributor

james-bjss commented Jan 20, 2022

Is there any ETA on resolution?

The PR has been raised/submitted and is awaiting review by the maintainers.

@breathingdust
Copy link
Member

Hi all 👋 Just letting you know that this is issue is featured on this quarters roadmap. If a PR exists to close the issue a maintainer will review and either make changes directly, or work with the original author to get the contribution merged. If you have written a PR to resolve the issue please ensure the "Allow edits from maintainers" box is checked. Thanks for your patience and we are looking forward to getting this merged soon!

bradonkanyid added a commit to BlueOwlDev/terraform-aws-msk-cluster that referenced this issue Mar 22, 2022
Need this to merge before supporting properly

hashicorp/terraform-provider-aws#20956

[master]
@Gatsby-Lee
Copy link

@breathingdust
Thank you for sharing the updates.

@ashevade1
Copy link

Any ETA on when this will be merged, Multiple Authentication and the ability to not to destroy exiting cluster if the mode i changed from unauthenticated to IAM, SASL etc

@github-actions github-actions bot added this to the v4.13.0 milestone Apr 29, 2022
@ashevade1
Copy link

Thanks a lot for fixing. ANy ETA on when v4.13.0 might be released ? A time frame if possible , Really need this for a lot of issues i m seeing.

@ewbankkit
Copy link
Contributor Author

We usually release on Thursdays.

@ashevade1
Copy link

So it any particular Thursday of the month ?

@bryantbiggs
Copy link
Contributor

@ashevade1 put in a little effort at least - releases are done weekly, usually on thursdays - https://github.com/hashicorp/terraform-provider-aws/blob/main/CHANGELOG.md

@github-actions
Copy link

github-actions bot commented May 5, 2022

This functionality has been released in v4.13.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions
Copy link

github-actions bot commented Jun 5, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/kafka Issues and PRs that pertain to the kafka service.
Projects
None yet
Development

Successfully merging a pull request may close this issue.