multiple aws accounts with chained providers #1544

Closed
rgaertner opened this issue Jun 24, 2021 · 9 comments
Labels
awaiting-upstream: The issue cannot be resolved without action in another repository (may be owned by Pulumi).
kind/bug: Some behavior is incorrect or out of spec
resolution/duplicate: This issue is a duplicate of another issue

Comments

@rgaertner

rgaertner commented Jun 24, 2021

We have a multi-account AWS setup. In the external accounts we need to modify resources, e.g. establish VPC peerings or add routes to a transit gateway.

Each external account has a dedicated role that carries the permissions for those external resource modifications.
A single role, called intermediateRole, is allowed to assume those roles in the external accounts.
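
To make the intended trust relationships concrete, here is a minimal Go sketch of the two-hop role chain. It is a toy model only: the `trust` type, the `assumeChain` helper, and all ARNs and account IDs are hypothetical, and nothing here calls AWS or Pulumi.

```go
package main

import "fmt"

// trust maps a role ARN to the set of principal ARNs allowed to assume it.
// Hypothetical stand-in for IAM trust policies, not an AWS API.
type trust map[string]map[string]bool

// assumeChain walks the role chain hop by hop, starting from caller.
// It returns the final identity, or an error naming the first hop
// whose trust policy does not list the current identity.
func assumeChain(policies trust, caller string, roles ...string) (string, error) {
	current := caller
	for _, role := range roles {
		if !policies[role][current] {
			return "", fmt.Errorf("%s cannot assume %s", current, role)
		}
		current = role
	}
	return current, nil
}

func main() {
	// Placeholder identities; the account IDs are made up.
	const (
		caller       = "arn:aws:iam::111111111111:user/deployer"
		intermediate = "arn:aws:iam::111111111111:role/intermediateRole"
		destination  = "arn:aws:iam::222222222222:role/destinationRole"
	)
	policies := trust{
		intermediate: {caller: true},
		destination:  {intermediate: true},
	}

	// Two hops: caller -> intermediateRole -> destinationRole.
	id, err := assumeChain(policies, caller, intermediate, destination)
	fmt.Println(id, err)

	// Skipping the intermediate hop is rejected by the destination's trust policy.
	_, err = assumeChain(policies, caller, destination)
	fmt.Println(err)
}
```

In this model the destination role trusts only the intermediate role, so the second hop succeeds only when it is made with the intermediate role's identity.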

To manage this multi-account setup with Pulumi, we tried to set up chained providers with the corresponding roles.
One particular external account/role fails while the other one works.

Steps:

  1. create a single intermediateProvider by assuming the intermediateRole
  2. create a destinationProvider by assuming the external role for controlling resources in an external aws account, based on the intermediateProvider
  3. create anotherDestinationProvider by assuming another external role for controlling resources in another external aws account, based on the intermediateProvider

Using the provider from step 2 works as expected.
Using the provider from step 3 fails, although it is built the same way as the one from step 2.

An example of our functions, expressed as code:
pulumi.Run( ..
  intermediateProvider, err := aws.NewProvider(ctx, "intermediateProvider", &aws.ProviderArgs{
    AssumeRole: &aws.ProviderAssumeRoleArgs{
      RoleArn:     intermediateRole.Arn,
      SessionName: pulumi.String("intermediateProviderSession"),
    },
    Region: pulumi.String(region.Name),
  })
  if err != nil {
    return err
  }
  
  // This code path works
  createTGWPeering(ctx, intermediateProvider, ...)

  // This code path fails
  for _, account := range peeringAccounts{
    peerAccounts(ctx, account, intermediateProvider, ....)
  }

Within each function, the corresponding chained destinationProvider is built the same way:

createTGWPeering(ctx *pulumi.Context, ...) {
	// Chain the destination provider off the intermediate provider.
	destinationProvider, err := aws.NewProvider(ctx, "destinationProvider", &aws.ProviderArgs{
		AssumeRole: &aws.ProviderAssumeRoleArgs{
			RoleArn:     pulumi.String(fmt.Sprintf("arn:aws:iam::%s:role/%s", AccountID, roleName)),
			SessionName: pulumi.String("destinationProviderSession"),
		},
		Region: pulumi.String(region.Name),
	}, pulumi.Provider(intermediateProvider))
	if err != nil {
		return err
	}
	// Use err rather than a variable named error, which shadows the built-in type.
	assumed, err := aws.GetCallerIdentity(ctx, nil, pulumi.Provider(destinationProvider))
	if err != nil {
		fmt.Printf("assuming role failed: %s\n", err)
		return err
	}
	fmt.Printf("assumed role via destination provider %s\n", assumed.Arn)
}

peerAccounts(ctx *pulumi.Context, ...) {
	// Resource names must be unique, so this provider gets its own name.
	anotherDestinationProvider, err := aws.NewProvider(ctx, "anotherDestinationProvider", &aws.ProviderArgs{
		AssumeRole: &aws.ProviderAssumeRoleArgs{
			RoleArn:     pulumi.String(fmt.Sprintf("arn:aws:iam::%s:role/%s", anotherAccountID, anotherRoleName)),
			SessionName: pulumi.String("anotherDestinationProviderSession"),
		},
		Region: pulumi.String(region.Name),
	}, pulumi.Provider(intermediateProvider))
	if err != nil {
		return err
	}

	// Use err rather than a variable named error, which shadows the built-in type.
	anotherAssumed, err := aws.GetCallerIdentity(ctx, nil, pulumi.Provider(anotherDestinationProvider))
	if err != nil {
		fmt.Printf("assuming another role failed: %s\n", err)
		return err
	}
	fmt.Printf("assumed role via another destination provider %s\n", anotherAssumed.Arn)
}

The aws.GetCallerIdentity calls stand in for any use of the created providers. For the latter call we get the error:

error in peering sts identity: rpc error: code = Unknown desc = invocation of aws:index/getCallerIdentity:getCallerIdentity returned an error: 1 error occurred:
* error configuring Terraform AWS Provider: IAM Role (arn:aws:iam::[anotherAccountID]:role/[anotherRoleName]) cannot be assumed.
There are a number of possible causes of this - the most common are:
* The credentials used in order to assume the role are invalid
* The credentials do not have appropriate permission to assume the role
* The role ARN is not valid
Error: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
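
Of the three causes listed in the error, the role ARN is the easiest to rule out locally. The following sketch builds the ARN with the same format string as the snippets above and checks its shape before it is passed to a provider. The `roleARN` helper and the regular expression are hypothetical, and the pattern only approximates IAM's naming rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// roleARNPattern is an approximation (an assumption, not an official grammar):
// an "aws" partition with an optional suffix, a 12-digit account ID, and a
// non-empty role path/name made of characters IAM commonly allows.
var roleARNPattern = regexp.MustCompile(`^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$`)

// roleARN formats the ARN like the issue's code does and rejects results
// that do not look like a role ARN, e.g. when a config value is empty.
func roleARN(accountID, roleName string) (string, error) {
	arn := fmt.Sprintf("arn:aws:iam::%s:role/%s", accountID, roleName)
	if !roleARNPattern.MatchString(arn) {
		return "", fmt.Errorf("malformed role ARN %q", arn)
	}
	return arn, nil
}

func main() {
	// Placeholder account ID and role name.
	arn, err := roleARN("123456789012", "peeringRole")
	fmt.Println(arn, err)

	// An empty account ID, e.g. from an unset config key, is caught early.
	_, err = roleARN("", "peeringRole")
	fmt.Println(err)
}
```

A check like this only rules out a malformed ARN; it says nothing about whether the credentials or the trust policy are correct.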

At this point I suspected that the role setup in the latter external AWS account was wrong, so I verified it via the AWS CLI, which worked fine.
I then tried a workaround: use the AWS SDK for Go to assume the intermediate role directly and create the failing destinationProvider from the resulting credentials:

	intermediateRole.Arn.ApplyT(func(roleArn string) string {

		cfg, err := config.LoadDefaultConfig(context.TODO())
		if err != nil {
			panic("configuration error, " + err.Error())
		}

		client := sts.NewFromConfig(cfg)

		session := "intermediateSession"
		input := &sts.AssumeRoleInput{
			RoleArn:         &roleArn,
			RoleSessionName: &session,
		}

		result, err := client.AssumeRole(context.TODO(), input)
		if err != nil {
			// Stop here: result is nil on failure and is dereferenced below.
			panic("error assuming the intermediate role, " + err.Error())
		}

		anotherDestinationProvider, err := aws.NewProvider(ctx, "AnotherDestinationProvider", &aws.ProviderArgs{
			AccessKey: pulumi.String(*result.Credentials.AccessKeyId),
			SecretKey: pulumi.String(*result.Credentials.SecretAccessKey),
			Token:     pulumi.String(*result.Credentials.SessionToken),
			AssumeRole: &aws.ProviderAssumeRoleArgs{
				RoleArn:     pulumi.String(fmt.Sprintf("arn:aws:iam::%s:role/%s", anotherAccountID, anotherRoleName)),
				SessionName: pulumi.String("AnotherSessionProvider"),
			},
			Region: pulumi.String(region.Name),
		},
		)
		if err != nil {
			fmt.Println("peering provider failed", err)
		}

This worked fine out of the box.
How can I debug this further? Right now I cannot see any difference that would explain the inconsistent behaviour of the chained providers.
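
One low-effort first check, offered as a sketch rather than a diagnosis: NoCredentialProviders is what the AWS SDK for Go reports when its default credential chain finds no credentials, so it can help to list which of the standard AWS credential environment variables are visible to the process running the program. The `setVars` helper is hypothetical; it reports variable names only and never prints their values.

```go
package main

import (
	"fmt"
	"os"
)

// credentialEnvVars lists environment variables read by the AWS SDKs and CLI.
var credentialEnvVars = []string{
	"AWS_ACCESS_KEY_ID",
	"AWS_SECRET_ACCESS_KEY",
	"AWS_SESSION_TOKEN",
	"AWS_PROFILE",
	"AWS_SHARED_CREDENTIALS_FILE",
	"AWS_CONFIG_FILE",
	"AWS_SDK_LOAD_CONFIG",
}

// setVars returns the names of the variables that lookup reports as set and
// non-empty. Values are deliberately dropped so secrets never reach the logs.
func setVars(names []string, lookup func(string) (string, bool)) []string {
	var found []string
	for _, name := range names {
		if value, ok := lookup(name); ok && value != "" {
			found = append(found, name)
		}
	}
	return found
}

func main() {
	found := setVars(credentialEnvVars, os.LookupEnv)
	if len(found) == 0 {
		fmt.Println("no AWS credential environment variables are set")
		return
	}
	for _, name := range found {
		fmt.Println("set:", name)
	}
}
```

Passing the lookup function as a parameter keeps the helper easy to exercise with a fake environment.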

Expected: the intermediate Pulumi provider should work for all chained destinationProviders
Actual: it works for only some of them

@rgaertner rgaertner added the kind/bug Some behavior is incorrect or out of spec label Jun 24, 2021
@Comradin

I have prepared a more complete example of the problem from our codebase.

I create just the intermediateProvider and the destinationProvider, invoke GetCallerIdentity on each, and print the ARN that each provider resolves to.

package main

import (
        "fmt"

        "github.com/pulumi/pulumi-aws/sdk/v4/go/aws"
        "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
        "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

var (
        destinationAccount string
        destinationRole    string
        intermediateRole   string
)

func main() {
        pulumi.Run(func(ctx *pulumi.Context) error {
                conf := config.New(ctx, "")

                destinationAccount = conf.Get("destinationAccount")
                destinationRole = conf.Get("destinationRole")
                intermediateRole = conf.Get("intermediateRole")

                current, err := aws.GetCallerIdentity(ctx, nil, nil)
                if err != nil {
                        return err
                }

                // Intermediate Provider && Get-Caller-Identity
                intermediateRoleArn := fmt.Sprintf("arn:aws:iam::%s:role/%s", current.AccountId, intermediateRole)
                intermediateProvider, err := aws.NewProvider(ctx, "intermediateProvider", &aws.ProviderArgs{
                        AssumeRole: &aws.ProviderAssumeRoleArgs{
                                RoleArn:     pulumi.String(intermediateRoleArn),
                                SessionName: pulumi.String("intermediateProviderSession"),
                        },
                        Region: pulumi.String("eu-central-1"),
                })
                if err != nil {
                        return err
                }

                currentIntermediate, err := aws.GetCallerIdentity(ctx, nil, pulumi.Provider(intermediateProvider))
                if err != nil {
                        return err
                }
                fmt.Println(currentIntermediate.Arn)

                // Destination Provider && Get-Caller-Identity
                destinationRoleArn := fmt.Sprintf("arn:aws:iam::%s:role/%s", destinationAccount, destinationRole)
                destinationProvider, err := aws.NewProvider(ctx, "destinationProvider", &aws.ProviderArgs{
                        AssumeRole: &aws.ProviderAssumeRoleArgs{
                                RoleArn:     pulumi.String(destinationRoleArn),
                                SessionName: pulumi.String("destinationProviderSession"),
                        },
                        Region: pulumi.String("eu-central-1"),
                }, pulumi.Provider(intermediateProvider))
                if err != nil {
                        return err
                }

                currentDestination, err := aws.GetCallerIdentity(ctx, nil, pulumi.Provider(destinationProvider))
                if err != nil {
                        return err
                }
                fmt.Println(currentDestination.Arn)

                return nil
        })
}

This leads to the error message described above: the role cannot be assumed.

@Comradin

I have the same code flow in Pulumi Python for AWS, and it fails like the Go code.

The error message is different:

AttributeError: 'NoneType' object has no attribute 'account_id'

I guess this is a follow-up error: I call get_caller_identity on the destination provider, and it returns an empty response.

The code:

"""An AWS Python Pulumi program"""

import pulumi
import pulumi_aws as aws

config = pulumi.Config()
destination_account = config.require("destinationAccount")
destination_role = config.require("destinationRole")
intermediate_role = config.require("intermediateRole")

current = aws.get_caller_identity()

role_arn = f"arn:aws:iam::{current.account_id}:role/{intermediate_role}"
print(role_arn)

intermediate_provider = aws.Provider("IntermediateProvider",
        region="eu-central-1",
        assume_role={
            "role_arn": role_arn,
            "session_name": "IntermediateSession",
        })

destination_role_arn = f"arn:aws:iam::{destination_account}:role/{destination_role}"
print(destination_role_arn)

peering_provider = aws.Provider("PeeringProvider",
        region="eu-central-1",
        access_key=intermediate_provider.access_key,
        secret_key=intermediate_provider.secret_key,
        token=intermediate_provider.token,
        assume_role={
            "role_arn": destination_role_arn,
            "session_name": "PeeringProviderSession",
        },
        )

# Invokes take InvokeOptions, and the result attribute is lowercase `arn`.
current_destination = aws.get_caller_identity(opts=pulumi.InvokeOptions(provider=peering_provider))
print(f"Destination Arn: {current_destination.arn}")

@leezen
Contributor

leezen commented Jun 25, 2021

I don't think the Python example you have above is functionally equivalent to your Go code as you don't actually construct the peering provider using the intermediate provider as a resource option. The Go example you have does look like what I'd expect and we'll need to do some investigation to understand what's going on there.

@Comradin

Comradin commented Jun 28, 2021

> I don't think the Python example you have above is functionally equivalent to your Go code as you don't actually construct the peering provider using the intermediate provider as a resource option. The Go example you have does look like what I'd expect and we'll need to do some investigation to understand what's going on there.

Correct, I do not use pulumi.ResourceOptions() but instead inject the token, access_key, and secret_key from the intermediate provider.

I could change the code to:

peering_provider = aws.Provider("PeeringProvider",
        region="eu-central-1",
        assume_role={
            "role_arn": destination_role_arn,
            "session_name": "PeeringProviderSession",
        },
        # Resource options go in the opts keyword; providers takes a list or mapping.
        opts=pulumi.ResourceOptions(providers=[intermediate_provider]),
        )

From my understanding this should be equivalent? It is a little irritating that in Python you have to use providers to set up a chained provider.

@leezen
Contributor

leezen commented Jun 28, 2021

@Comradin I don't think they'll be equivalent as the provider doesn't actually allow for using access_key and secret_key as outputs

@Comradin

Hmm, did I misread the provider documentation?

https://www.pulumi.com/docs/reference/pkg/aws/provider/#outputs

This part states that all inputs are implicitly available as outputs. So I should be able to chain the outputs of the intermediate provider into the inputs of the peering provider?

@Comradin

But following your suggestion I changed my code, and now I get the same "cannot be assumed" error message as with the Go code.

    Exception: invoke of aws:index/getCallerIdentity:getCallerIdentity failed: invocation of aws:index/getCallerIdentity:getCallerIdentity returned an error: 1 error occurred:
    	* error configuring Terraform AWS Provider: IAM Role (arn:aws:iam::ACCOUNT_ID:role/ASSUME_ROLE) cannot be assumed.
    
    There are a number of possible causes of this - the most common are:
      * The credentials used in order to assume the role are invalid
      * The credentials do not have appropriate permission to assume the role
      * The role ARN is not valid
    
    Error: NoCredentialProviders: no valid providers in chain. Deprecated.
    	For verbose messaging see aws.Config.CredentialsChainVerboseErrors
    error: an unhandled error occurred: Program exited with non-zero exit code: 1

OK, this is a bit surprising; I did not expect this.

@leezen
Contributor

leezen commented Jun 30, 2021

Thanks for the additional details. Digging into this further, it looks like this is an upstream issue as tracked in hashicorp/terraform-provider-aws#16841

@leezen leezen added the awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). label Jun 30, 2021
@lukehoban lukehoban added the resolution/duplicate This issue is a duplicate of another issue label Jun 24, 2023
@lukehoban
Member

I believe this is the same issue tracked in #673, so closing out to track there.
