Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle incompatible automount token setting In service account YAML #5549

Merged
merged 6 commits into from
Feb 12, 2021

Conversation

jimil749
Copy link
Contributor

Potential fix for #4651

This PR added a check in the proxy injector webhook stage. This checks if the concerned container has serviceaccount volume mounted. In case, the volume is not mounted, which essentially means that automountServiceAccountToken field in the service account used by the injected workload is set to false, we don't inject the sidecar proxy.

But there is a caveat here. Since I am performing the check at the webhook stage (unlike the other checks, which also take place at the cli stage), there is no error shown to the user when the injection fails.

I am not a 100% how to output the error to the users.

Screenshot from 2021-01-15 16-39-05
If we look at the logs of the proxy-injector, it skips the injection where automountServiceAccountToken is set to false.

Signed-off-by: Jimil Desai jimildesai42@gmail.com

@jimil749 jimil749 requested a review from a team as a code owner January 15, 2021 17:18
pkg/inject/report.go Outdated Show resolved Hide resolved
@adleong
Copy link
Member

adleong commented Jan 15, 2021

I think this may complicate our tests...

If I'm understanding correctly, the service account token volume mount is added by Kubernetes at pod creation time (by a kubernetes system mutating webhook?). However, this doesn't happen in our unit tests because the pods aren't actually being created in a Kubernetes cluster. Making all of our injection tests take this into account seems challenging.

To help me evaluate this trade-off, it would be helpful for me to know how serious of an issue this is to begin with. What are the consequences if we don't fix it? In practice, how much better is it to skip injecting pods which use non-automount service accounts versus injecting them and having them not work?

A different approach would be to allow these pods to be injected, but to be able to identify this situation after the fact with linkerd check. This would give users the information they need to diagnose and remedy the situation themselves and may present fewer technical challenges.

@jimil749
Copy link
Contributor Author

If I'm understanding correctly, the service account token volume mount is added by Kubernetes at pod creation time (by a kubernetes system mutating webhook?). However, this doesn't happen in our unit tests because the pods aren't actually being created in a Kubernetes cluster. Making all of our injection tests take this into account seems challenging.

Yes. The service account token volume is added by the service account admission controller before an object is persisted, hence this cannot be validated at the cli injection phase. I agree that this will complicate the tests, for the reasons mentioned.

To help me evaluate this trade-off, it would be helpful for me to know how serious of an issue this is to begin with. What are the consequences if we don't fix it? In practice, how much better is it to skip injecting pods which use non-automount service accounts versus injecting them and having them not work?

I am not totally sure about the seriousness of the issue. But when a pod has no service account volume mounted, it may cause the pods to go into a crashloop since they won't be able to fetch an identity. So I think when a user tries to inject a workload without service account volume mounted, maybe we should show some sort of error? I think that linkerd check approach mentioned is apt here. We could allow the injection, and check the volume mount with linkerd check to notify the users about the issue.

Do you propose that I move forward with this approach?

@adleong
Copy link
Member

adleong commented Jan 19, 2021

Do you propose that I move forward with this approach?

Yeah, I think that's my preference.

@jimil749
Copy link
Contributor Author

@adleong, where do you think does this check fit (category-wise)? Should introduce a new category for this check (maybe something like serviceaccount validation) or is linkerd config category, the right place for this?

@adleong
Copy link
Member

adleong commented Jan 20, 2021

I think it's a data-plane check since it checks the health of an injected workload rather than checking the health of the control plane itself. Therefore, I think it belongs in the data plane checks category.

@alpeb
Copy link
Member

alpeb commented Jan 22, 2021

To be honest, I would prefer not to introduce a special case here, and let this failure be handled like the other ones: hostnetwork enabled, existing sidecar, udp ports enabled, automountServiceAccountToken disabled. The latter would have to be expanded so the injector doesn't just check the pod's automountServiceAccountToken but also the associated ServiceAccount's. Note these failures are surfaced not only as log entries in the injector, but also as events in the pod's workload owner.

@jimil749
Copy link
Contributor Author

Hmm, the only problem with handling this failure like all the other failures, is testing. As @adleong mentioned, this may complicate testing. Since we are checking the service account mount in the containers, which is added by the service account admission controller; it will be a bit challenging to test this as we are not actually creating a k8s clusters while performing the unit tests.

@adleong
Copy link
Member

adleong commented Jan 25, 2021

Rather than checking for the service account mount in the container, is it instead possible to just check the automountServiceAccountToken field in the pod's ServiceAccount? I think this would also impact tests since we would need to add a ServiceAccount resource to all of the injector tests, but this might be easier than updating all mock pods to include the token mount.

@jimil749
Copy link
Contributor Author

Yes, that is certainly possible, but I would like to ping @alpeb here, because of this comment. Seems like @adleong is implying the same approach that I mentioned. Any opnions? 🤔

@alpeb
Copy link
Member

alpeb commented Feb 1, 2021

About the tests, pkg/inject/inject_test.go doesn't seem to imply lots of changes. cli/cmd/inject_test.go deals with manual injection (resourceTransformerInject.injectProxy being true), in which case the ServiceAccount check should be skipped. And then there are the integration tests, which should in theory have the proper SA resources (or should be added if they don't yet).

@jimil749
Copy link
Contributor Author

jimil749 commented Feb 2, 2021

Alright, so first I'll revert back to the earlier approach, and expand the serviceaccount check in report.go to accommodate the check for pod's service account.

@jimil749
Copy link
Contributor Author

jimil749 commented Feb 2, 2021

@alpeb do you think we need to update report_test.go too? Because the resourceConfig object in report_test.go does not have a service account mount, which is causing the tests to fail. (Because, resourceConfig.origin is set to OriginWebhook in report_test.go and that's when I'm checking for Volume Mount)

@alpeb
Copy link
Member

alpeb commented Feb 3, 2021

@alpeb do you think we need to update report_test.go too? Because the resourceConfig object in report_test.go does not have a service account mount, which is causing the tests to fail. (Because, resourceConfig.origin is set to OriginWebhook in report_test.go and that's when I'm checking for Volume Mount)

Yeah, the podSpecs need to have a stub container added with the appropriate volume mount.

Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
@jimil749
Copy link
Contributor Author

jimil749 commented Feb 3, 2021

So just to summarize I've added the following changes,

  • Added a check for service-account volume mount in the concerned container. (at webhook stage)
  • Updated tests in report_test.go: I've added the stub container with appropriate volume mount in all of the existing check, so they are technically not affected, and added an extra test to test the service-account volume mount.

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! And thanks for the extra test 😉

@jimil749 jimil749 requested a review from adleong February 7, 2021 15:42
@jimil749
Copy link
Contributor Author

Sorry to ping, but @adleong, can you take a look? 😅

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping, @jimil749, and thanks for your hard work and flexibility on this!

@alpeb alpeb merged commit 8bc732e into linkerd:main Feb 12, 2021
@jimil749 jimil749 deleted the service-account branch February 12, 2021 15:46
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Mar 23, 2021
…inkerd#5549)

Fixes linkerd#4651

This PR adds a check in the proxy injector webhook. This checks if the concerned container has serviceaccount volume mounted. In case, the volume is not mounted, which essentially means that automountServiceAccountToken field in the service account used by the injected workload is set to false, we don't inject the sidecar proxy.

Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
jijeesh pushed a commit to jijeesh/linkerd2 that referenced this pull request Apr 21, 2021
…inkerd#5549)

Fixes linkerd#4651

This PR adds a check in the proxy injector webhook. This checks if the concerned container has serviceaccount volume mounted. In case, the volume is not mounted, which essentially means that automountServiceAccountToken field in the service account used by the injected workload is set to false, we don't inject the sidecar proxy.

Signed-off-by: Jimil Desai <jimildesai42@gmail.com>
Signed-off-by: Jijeesh <jijeesh.ka@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants