alertmanager and ruler irsa is failing, added support to it and also f… #3740

@Nitesh-vaidyanath Nitesh-vaidyanath commented Jan 25, 2021

…ixed buy in dockerfile

Adds support for IAM Roles for Service Accounts (IRSA) and fixes the Dockerfile.

Which issue(s) this PR fixes:
IRSA is failing in the alertmanager and ruler pods:
level=warn ts=2021-01-25T11:29:36.056182464Z caller=multitenant.go:304 component=MultiTenantAlertmanager msg="error fetching all configurations, backing off" err="WebIdentityErr: failed to retrieve credentials\ncaused by: SerializationError: failed to unmarshal error message\n\tstatus code: 405

The ingester, store-gateway, and compactor use a different S3 client (cortex/vendor/github.com/thanos-io/thanos/pkg/objstore/s3/s3.go),
while the alertmanager and ruler use the S3 client defined in cortex/pkg/chunk/aws/s3_storage_client.go, so the issue shows up only in these two services.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Nitesh-vaidyanath and others added 3 commits January 25, 2021 14:39
…ixed buy in dockerfile

Signed-off-by: Nitesh Vaidyanath <nvaidyanath@stg3bastion.stg3.ap>
@Nitesh-vaidyanath Nitesh-vaidyanath force-pushed the support-for-web-identity-tokens branch from 010cd72 to a744a1e Compare January 25, 2021 14:40
…iable

Signed-off-by: Nitesh Vaidyanath <nvaidyanath@stg3bastion.stg3.ap>
@Nitesh-vaidyanath Nitesh-vaidyanath force-pushed the support-for-web-identity-tokens branch from 2d138a8 to 9e777b9 Compare January 25, 2021 23:22
Signed-off-by: Nitesh-vaidyanath <niteshbv@ymail.com>
@Nitesh-vaidyanath Nitesh-vaidyanath force-pushed the support-for-web-identity-tokens branch from d554ac6 to 437d5a6 Compare January 26, 2021 03:20
…int so got few errors

Signed-off-by: Nitesh-vaidyanath <niteshbv@ymail.com>
@Nitesh-vaidyanath Nitesh-vaidyanath force-pushed the support-for-web-identity-tokens branch from 8e69e79 to 5704bcf Compare January 26, 2021 03:31
Signed-off-by: Nitesh-vaidyanath <niteshbv@ymail.com>
@Nitesh-vaidyanath Nitesh-vaidyanath force-pushed the support-for-web-identity-tokens branch from bb44046 to 1a5cd04 Compare January 26, 2021 04:45
@Nitesh-vaidyanath Nitesh-vaidyanath changed the title alertmaneger and ruler irsa is failing added support to it and also f… alertmanager and ruler irsa is failing added support to it and also f… Jan 26, 2021
@Nitesh-vaidyanath Nitesh-vaidyanath changed the title alertmanager and ruler irsa is failing added support to it and also f… alertmanager and ruler irsa is failing, added support to it and also f… Jan 26, 2021
@pstibrany
Contributor

pstibrany commented Jan 28, 2021

Thank you for your PR. From reading the diff, my understanding is that it tries to use AssumeRoleWithWebIdentity to get credentials from a session token.

I don't think this is the right approach for Cortex. Credentials obtained from AssumeRoleWithWebIdentity are temporary and will only work for a short time (hours).

@Nitesh-vaidyanath
Author

Nitesh-vaidyanath commented Jan 29, 2021

@pstibrany Yes, credentials created by AssumeRoleWithWebIdentity have a default TTL of 1 hour. I am figuring out a way to add logic to the alertmanager and ruler to fetch new credentials whenever they expire. AssumeRoleWithWebIdentity works fine with the ingester and store-gateway because they use a different SDK for getting credentials.

@Nitesh-vaidyanath
Author

@pstibrany I don't think we need to change the code; we just need to upgrade aws-sdk from 0.18.0 to 0.20.0.
aws/aws-sdk-go-v2#475

@pstibrany
Contributor

> @pstibrany I don't think we need to change the code; we just need to upgrade aws-sdk from 0.18.0 to 0.20.0.
> aws/aws-sdk-go-v2#475

Would you like to send a PR updating aws-sdk?

@@ -4,7 +4,8 @@


## 1.7.0 in progress

* [ENHANCEMENT] Added support for web identity tokens
Contributor

Cortex release 1.8.0 is now in progress. Could you please rebase master and move the CHANGELOG entry under the master / unreleased section?

@Nitesh-vaidyanath
Author

@pstibrany Thanks for taking care of this. I will close this PR.

@dgonzalezruiz

dgonzalezruiz commented Apr 14, 2021

Hello! The Cortex ruler and alertmanager components still cannot integrate with IRSA, which makes their setup more complex: Cortex ends up requiring an IAM role plus an IAM user whose static credentials need to be handled as secrets, instead of a single IAM role with non-static credentials.

@Nitesh-vaidyanath @pstibrany are there any future plans to get back to this, or was this solved somehow?

Otherwise, if this was a comms issue, I don't mind opening a new PR with these changes + changelog + aws sdk update.

@abacus3

abacus3 commented May 9, 2022

Usually the AWS-Go-SDK takes care of acquiring temporary credentials via the AssumeRoleWithWebIdentity STS API call, and then uses them to interact with S3.
The most recent release, v1.11.1, still does not implement this properly for the alertmanager and ruler services.
Here is the error message that pops up if one tries to use the IRSA setup that works for all other components in Cortex:

level=error ts=2022-05-09T13:54:45.671446387Z caller=ruler.go:489 msg="unable to list rules" err="WebIdentityErr: failed to retrieve credentials
caused by: SerializationError: failed to unmarshal error message
	status code: 405, request id: 
caused by: UnmarshalError: failed to unmarshal error message
00000000  3c 3f 78 6d 6c 20 76 65  72 73 69 6f 6e 3d 22 31  |<?xml version=\"1|
00000010  2e 30 22 20 65 6e 63 6f  64 69 6e 67 3d 22 55 54  |.0\" encoding=\"UT|
00000020  46 2d 38 22 3f 3e 0a 3c  45 72 72 6f 72 3e 3c 43  |F-8\"?>.<Error><C|
00000030  6f 64 65 3e 4d 65 74 68  6f 64 4e 6f 74 41 6c 6c  |ode>MethodNotAll|
00000040  6f 77 65 64 3c 2f 43 6f  64 65 3e 3c 4d 65 73 73  |owed</Code><Mess|
00000050  61 67 65 3e 54 68 65 20  73 70 65 63 69 66 69 65  |age>The specifie|
00000060  64 20 6d 65 74 68 6f 64  20 69 73 20 6e 6f 74 20  |d method is not |
00000070  61 6c 6c 6f 77 65 64 20  61 67 61 69 6e 73 74 20  |allowed against |
00000080  74 68 69 73 20 72 65 73  6f 75 72 63 65 2e 3c 2f  |this resource.</|
00000090  4d 65 73 73 61 67 65 3e  3c 4d 65 74 68 6f 64 3e  |Message><Method>|
000000a0  50 4f 53 54 3c 2f 4d 65  74 68 6f 64 3e 3c 52 65  |POST</Method><Re|
000000b0  73 6f 75 72 63 65 54 79  70 65 3e 53 45 52 56 49  |sourceType>SERVI|
000000c0  43 45 3c 2f 52 65 73 6f  75 72 63 65 54 79 70 65  |CE</ResourceType|
000000d0  3e 3c 52 65 71 75 65 73  74 49 64 3e FF FF FF FF  |><RequestId>XXXX|
000000e0  FF FF FF FF FF FF FF FF  FF FF FF FF 3c 2f 52 65  |XXXXXXXXXXXX</Re|
000000f0  71 75 65 73 74 49 64 3e  3c 48 6f 73 74 49 64 3e  |questId><HostId>|
00000100  FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  |XXXXXXXXXXXXXXXX|
00000110  FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  |XXXXXXXXXXXXXXXX|
00000120  FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  |XXXXXXXXXXXXXXXX|
00000130  FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF  |XXXXXXXXXXXXXXXX|
00000140  FF FF FF FF FF FF FF FF  FF FF FF FF 3c 2f 48 6f  |XXXXXXXXXXXX</Ho|
00000150  73 74 49 64 3e 3c 2f 45  72 72 6f 72 3e           |stId></Error>|

caused by: unknown error response tag, {{ Error} []}"

It looks like the ruler instance is trying to perform the STS AssumeRoleWithWebIdentity API call against the S3 API.
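
To make the hex dump easier to read, here is a small Go sketch that parses the response body it contains. The body is transcribed from the dump with the redacted `RequestId` and `HostId` elements left out; the `s3Error` struct and `parseS3Error` helper are hypothetical names introduced for this illustration and are not part of Cortex or the AWS SDK.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// s3Error mirrors the elements visible in the hex dump above.
type s3Error struct {
	XMLName      xml.Name `xml:"Error"`
	Code         string   `xml:"Code"`
	Message      string   `xml:"Message"`
	Method       string   `xml:"Method"`
	ResourceType string   `xml:"ResourceType"`
}

// body is the response transcribed from the dump, without the redacted
// RequestId and HostId elements.
const body = `<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MethodNotAllowed</Code><Message>The specified method is not allowed against this resource.</Message><Method>POST</Method><ResourceType>SERVICE</ResourceType></Error>`

// parseS3Error decodes an XML document whose root element is <Error>.
func parseS3Error(data []byte) (s3Error, error) {
	var e s3Error
	err := xml.Unmarshal(data, &e)
	return e, err
}

func main() {
	e, err := parseS3Error([]byte(body))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s %s\n", e.Code, e.Method, e.ResourceType)
	// prints: MethodNotAllowed POST SERVICE
}
```

The decoded fields show a POST rejected with `MethodNotAllowed` at the service level, and the root element is a bare `<Error>`. That is consistent with the final "unknown error response tag, {{ Error} []}" line in the log: the code handling the credentials request did not recognise the root tag of the reply it received.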

Any further activities on this @Nitesh-vaidyanath @pstibrany ?

@friedrich-at-adobe
Contributor

@abacus3 please open a new issue with the error message.
