IAM eventual consistency race condition for aws_iam_role leaves lambda function in defunct state #3972

Closed
samjgalbraith opened this issue Mar 29, 2018 · 6 comments · Fixed by #3988
Labels
bug Addresses a defect in current functionality. service/lambda Issues and PRs that pertain to the lambda service.

@samjgalbraith

To keep my system running through changes that require role replacement, I have used create_before_destroy = true with my Lambda function lifecycles. However, I have found that, in general, changes which require the role to be replaced fail on terraform apply with the error:

InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda

A second run a short time later succeeds. When I look in the AWS console between the two applies, the Lambda function appears to be left in an indeterminate state with no execution role. According to Terraform's output, what has happened is:

  • The new role is successfully created to replace the old one.
  • The old role has been destroyed.
  • The change of role for the Lambda function fails, citing "InvalidParameterValueException: The role defined for the function cannot be assumed by Lambda"

It looks like this might be because AWS returns from the role creation call before the new role is completely available to trusted AWS services such as Lambda.

I have been able to work around this issue by adding a sleep provisioner to every one of my roles, but this is not a tidy workaround:

resource "aws_iam_role" "lambda_execution_role" {
  name_prefix        = "${var.role_name_prefix}"
  assume_role_policy = "${data.aws_iam_policy_document.entrust_lambda_to_assume_function_role.json}"
  description        = "${var.role_description}"
  path               = "${var.path}"

  lifecycle {
    create_before_destroy = true
  }

  # AWS returns success for the IAM change, but the change is not yet available for a few seconds.
  # Sleep to avoid the race condition failure.
  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command     = "sleep 30"
  }
}

Terraform Version

Terraform v0.11.5

  • provider.archive v1.0.3
  • provider.aws v1.11.0
  • provider.local v1.1.0
  • provider.null v1.0.0
  • provider.template v1.0.0

Affected Resource(s)

  • aws_iam_role

Expected Behavior

IAM role replacement with create_before_destroy = true provides zero downtime for AWS Lambda.

Actual Behavior

Lambda function left in indeterminate state, unable to execute.

Steps to Reproduce

data "aws_iam_policy_document" "entrust_lambda_to_assume_function_role" {
  statement {
    effect = "Allow"
    principals {
      identifiers = ["lambda.amazonaws.com"]
      type        = "Service"
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "lambda_execution_role" {
  name_prefix        = "testing"
  assume_role_policy = "${data.aws_iam_policy_document.entrust_lambda_to_assume_function_role.json}"

  lifecycle {
    create_before_destroy = true
  }
}

Reference this IAM role from an aws_lambda_function resource, then create and apply the infrastructure. Now change the name_prefix field of the aws_iam_role and apply again.
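
For illustration, a minimal sketch of a function that references the role could look like the following. The resource name, function name, archive path, handler, and runtime are hypothetical placeholders and not part of the original report; only the role reference is taken from the configuration above.

```hcl
# Hypothetical Lambda function used only to complete the reproduction.
resource "aws_lambda_function" "example" {
  function_name = "iam-race-repro"  # assumed name
  filename      = "lambda.zip"      # assumed path to a prebuilt deployment package
  handler       = "index.handler"   # assumed handler
  runtime       = "nodejs8.10"      # assumed runtime

  # Changing name_prefix on the role forces a new role, which changes this ARN
  # and triggers an in-place update of the function's execution role.
  role = "${aws_iam_role.lambda_execution_role.arn}"
}
```

With a function like this in place, changing name_prefix should make the plan show the role being replaced and the function's role attribute being updated.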

@bflad bflad added bug Addresses a defect in current functionality. service/lambda Issues and PRs that pertain to the lambda service. labels Mar 29, 2018
@bflad
Contributor

bflad commented Mar 29, 2018

PR submitted to retry on this condition during update (the retry was previously only present during creation): #3988

@bflad bflad added this to the v1.14.0 milestone Mar 29, 2018
@bflad
Contributor

bflad commented Mar 29, 2018

The fix has been merged into master and will be released with v1.14.0 of the AWS provider, most likely within a week.

@samjgalbraith
Author

Thanks for that @bflad

@samjgalbraith
Author

samjgalbraith commented Mar 30, 2018

It turns out my problem was actually more general than this and related to create_before_destroy not working as expected. Terraform was creating the new dependency (the role), destroying the old one, and only THEN updating Lambda's reference to it. This meant that for a while the Lambda function was still referring to an already deleted role. When the Lambda update failed because of this IAM race condition, the infrastructure changes halted in that temporarily broken state. This applies more generally than Lambda or even AWS and is an issue with the core engine, which I've lodged as hashicorp/terraform#17735

I agree that the change you have made is a good one and will help in some cases; it just turns out my problem was more general.

@bflad
Contributor

bflad commented Apr 6, 2018

This has been released in version 1.14.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.
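
As a rough sketch of what upgrading might look like in a Terraform 0.11-style configuration, a version constraint on the provider block would require a release that includes the fix. The exact constraint shown is an assumption; adjust it to your own upgrade policy.

```hcl
# Assumed constraint: any 1.x release of the AWS provider from 1.14.0 onward.
provider "aws" {
  version = "~> 1.14"
}
```

After changing the constraint, running terraform init -upgrade should fetch a matching provider release.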

@ghost

ghost commented Apr 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 6, 2020