
Data resource forces update in modules with count #25759

Closed
atimush opened this issue Aug 6, 2020 · 12 comments
Labels
bug waiting for reproduction unable to reproduce issue without further information

Comments

@atimush

atimush commented Aug 6, 2020

Terraform Version

Terraform v0.13.0-rc1

Terraform Configuration Files

The submodule (./modules/eks_helm_components):

data "aws_region" "current" {}

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  namespace  = "kube-system"
  repository = "https://kubernetes-charts.storage.googleapis.com"
  chart      = "cluster-autoscaler"
  version    = lookup({ for k, v in var.helm_components : "chart_version" => v["chart_version"] if k == "cluster_autoscaler" }, "chart_version")
  values = [
    yamlencode(lookup({ for k, v in var.helm_components : "values_default" => v["values_default"] if k == "cluster_autoscaler" }, "values_default")),
    yamlencode(lookup({ for k, v in var.helm_components : "values" => v["values"] if k == "cluster_autoscaler" }, "values")),
    ### The below values can not be overridden by user
    yamlencode({
      awsRegion = data.aws_region.current.name
      autoDiscovery = {
        clusterName = var.cluster_name
      }
    })]
}
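
As an aside, each of the three lookup(...)/for expressions above builds a single-key map only to look the same key up again. A hedged, roughly equivalent sketch (assuming var.helm_components always contains a cluster_autoscaler key carrying these fields; this is not the reporter's code):

```hcl
# Hypothetical simplification: assumes var.helm_components["cluster_autoscaler"]
# always exists and provides chart_version, values_default, and values.
locals {
  ca = var.helm_components["cluster_autoscaler"]
}

# version = local.ca["chart_version"]
# values  = [yamlencode(local.ca["values_default"]), yamlencode(local.ca["values"]), ...]
```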

The root module:

module "eks_helm_components_2" {
  source                      = "./modules/eks_helm_components"
  count                       = var.cluster_2 != null ? 1 : 0
  cluster_name                = var.cluster_1.name
  providers = {
    helm = helm.cluster_1
  }
  helm_components = { for k, v in local.helm_components_properties : k => merge(v, lookup(var.cluster_2.base_components, k, {})) }
}

Debug Output

module.test.module.eks_helm_components_2[0].data.aws_region.current will be read during apply
  (config refers to values not yet known)
 <= data "aws_region" "current"  {
      + current     = (known after apply)
      ~ description = "Europe (Ireland)" -> (known after apply)
      ~ endpoint    = "ec2.eu-west-1.amazonaws.com" -> (known after apply)
      ~ id          = "eu-west-1" -> (known after apply)
      ~ name        = "eu-west-1" -> (known after apply)
    }

module.test.module.eks_helm_components_2[0].helm_release.cluster_autoscaler will be updated in-place
  ~ resource "helm_release" "cluster_autoscaler" {
        atomic                     = false
        chart                      = "cluster-autoscaler"
        cleanup_on_fail            = false
        create_namespace           = false
        dependency_update          = false
        disable_crd_hooks          = false
        disable_openapi_validation = false
        disable_webhooks           = false
        force_update               = false
        id                         = "cluster-autoscaler"
        lint                       = false
        max_history                = 0
        metadata                   = [
            {
                chart     = "cluster-autoscaler"
                name      = "cluster-autoscaler"
                namespace = "kube-system"
                revision  = 1
                values    = jsonencode(
                    {
                        autoDiscovery  = {
                            clusterName = "test"
                        }
                        awsRegion      = "eu-west-1"
                        cloudProvider  = "aws"
                        image          = {
                            tag = "v1.17.1"
                        }
                        nameOverride   = "cluster-autoscaler"
                    }
                )
                version   = "7.3.4"
            },
        ]
        name                       = "cluster-autoscaler"
        namespace                  = "kube-system"
        recreate_pods              = false
        render_subchart_notes      = true
        replace                    = false
        repository                 = "https://kubernetes-charts.storage.googleapis.com"
        reset_values               = false
        reuse_values               = false
        skip_crds                  = false
        status                     = "deployed"
        timeout                    = 300
      ~ values                     = [
          - <<~EOT
                "cloudProvider": "aws"
                "image":
                  "tag": "v1.17.1"
                "nameOverride": "cluster-autoscaler"
            EOT,
          - jsonencode({}),
          - <<~EOT
                "autoDiscovery":
                  "clusterName": "test"
                "awsRegion": "eu-west-1"
            EOT,
        ] -> (known after apply)
        verify                     = false
        version                    = "7.3.4"
        wait                       = true
    }

Crash Output

Expected Behavior

Terraform should not force an update of the Helm resource on every plan unless the data resource actually changes.

Actual Behavior

Terraform forces an in-place update of the Helm resource on every plan.

Steps to Reproduce

  1. terraform init
  2. terraform apply
  3. terraform plan

Additional Context

The state file contains "current": null for the data source. We run Terraform in Terraform Cloud.

"mode": "data",
      "type": "aws_region",
      "name": "current",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 0,
          "attributes": {
            "current": null,
            "description": "Europe (Ireland)",
            "endpoint": "ec2.eu-west-1.amazonaws.com",
            "id": "eu-west-1",
            "name": "eu-west-1"
          }
        }
      ]

References

@atimush atimush added bug new new issue not yet triaged labels Aug 6, 2020
@danieldreier
Contributor

@jbardin can you help me understand this? Is this a matter of the aws_region data source being evaluated during apply rather than during plan?

@danieldreier danieldreier added the waiting-response An issue/pull request is waiting for a response from the community label Aug 13, 2020
@jbardin
Member

jbardin commented Aug 13, 2020

Yes, it appears that data.aws_region.current is being deferred to apply, but I'm not certain what is causing this, since the data source has no configuration arguments that could be unknown. The only other thing that should cause it to be deferred is a depends_on reference to something with changes, but the only other resource shown is a direct dependent, and there are no depends_on attributes in the sample config.

The "current": null field in the state is expected with aws provider versions <3.0.0, since that field has been deprecated but not yet removed.

If this is still happening with 0.13.0, I would need a trace log or reproduction to learn more.
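
For context, the depends_on deferral mentioned above can be sketched with a hypothetical minimal config (not from the report): a data source that depends on a resource with pending changes is read during apply rather than during plan.

```hcl
# Hypothetical sketch: the null_resource changes on every plan because
# timestamp() is unknown until apply, so the data source that depends_on
# it is deferred ("will be read during apply").
resource "null_resource" "gate" {
  triggers = {
    always = timestamp() # unknown at plan time, forcing a change each run
  }
}

data "aws_region" "current" {
  depends_on = [null_resource.gate] # deferral trigger
}
```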

@danieldreier
Contributor

Thanks @jbardin! @atimush, do you think you can help generate a simplified reproduction case that we can run directly? The code you've provided has dependencies I can't see, so we'd need something we can copy-paste or git clone. Ideally, this would have a similar structure but leave out helm and eks, because the effort of setting up an EKS + helm infrastructure is pretty significant.

@qwerty1979bg

@danieldreier

Note: this might turn out to be an unrelated issue

It seems that at least this specific AWS data source (aws_vpcs) forces an update in v0.13.0. It does not have to be in a module or use count.

provider "aws" {
  version = "3.2.0"
  region  = "us-east-1"
}

data "aws_vpcs" "this" {
}

resource "null_resource" "test" {
}

This code applies only once, as expected, on TF v0.12.29.

With TF v0.13.0 it tries to read the data source during apply.

  # data.aws_vpcs.this will be read during apply
  # (config refers to values not yet known)

(Note: The AWS provider version is the same in both cases - 3.2.0)

@remilapeyre
Contributor

Hi @qwerty1979bg, I think the issue you are having is reported at hashicorp/terraform-provider-aws#14579.

@ghost ghost removed waiting-response An issue/pull request is waiting for a response from the community labels Aug 16, 2020
@qwerty1979bg

@remilapeyre, thanks! This is an exact match! :)

@atimush,
Your code uses data "aws_region" "current" {}, which is in the list of affected resources in hashicorp/terraform-provider-aws#14579

@apparentlymart
Contributor

Thanks for the pointer, @remilapeyre!

I believe the issue over in the AWS provider affects the aws_regions (plural) data source, but not the aws_region singular one, because the singular one sets its id to the name of the single region it looks up.

My sense is that this problem arose because a plural data source doesn't really have a single id string to use, yet the Terraform SDK requires that every data source result include a non-empty id, so the developers of these data sources used the current time as a placeholder ID value.

However, because aws_region does converge on a stable result (it uses the current region name as the id), I think the behavior reported here is separate from hashicorp/terraform-provider-aws#14579 and warrants some independent investigation.

With that said, as @danieldreier said the next step here would be to find a simplified configuration that we can run to reproduce the problem. Hopefully there's a smaller reproduction case that includes only aws_region and potentially an additional placeholder resource like null_resource to exercise its result, which would then allow us to observe it without needing a full Kubernetes cluster.
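
One possible shape for such a reproduction, sketched here as an untested starting point (the module path and resource names are made up):

```hcl
# Untested sketch of a minimal reproduction: aws_region inside a counted
# module, with its result consumed by a null_resource.

# main.tf
module "repro" {
  source = "./modules/repro"
  count  = 1
}

# modules/repro/main.tf
data "aws_region" "current" {}

resource "null_resource" "use_region" {
  triggers = {
    region = data.aws_region.current.name
  }
}
```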

@apparentlymart apparentlymart added the waiting-response An issue/pull request is waiting for a response from the community label Aug 19, 2020
@jbardin
Member

jbardin commented Aug 20, 2020

To be clear, the non-deterministic ID issues described in terraform-provider-aws#14579 are only the symptom. There are two primary causes we have found: data sources that alter non-computed attributes, and the SDK altering zero block values. Since those two situations cover all the known cases so far, we should probably just wait and confirm the status of this issue with the upcoming aws provider and core releases.

@atimush
Author

atimush commented Aug 23, 2020

Hello guys,
Thank you for your efforts, and apologies for the late reply. I've been trying to put together a simple local configuration to simulate the issue, but all runs are perfectly fine :(

Still, the existing EKS modules report the same issue. I noticed that the problem is not limited to the aws_region data source but affects any data resources (even some of the AWS IAM policy documents). I would assume it could be something related to the providers block:
providers = { helm = helm.cluster_1 }

Providing below some of the traces that might help:

2020/08/23 19:37:07 [TRACE] vertex "module.g8s.module.eks_helm_components_1.data.aws_region.current": expanding dynamic subgraph
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.ResourceCountTransformer
2020/08/23 19:37:07 [TRACE] ResourceCountTransformer: adding module.g8s.module.eks_helm_components_1[0].data.aws_region.current as *terraform.NodeRefreshableDataResourceInstance
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.ResourceCountTransformer with new graph:
  module.g8s.module.eks_helm_components_1[0].data.aws_region.current - *terraform.NodeRefreshableDataResourceInstance  ------
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.OrphanResourceInstanceCountTransformer
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.OrphanResourceInstanceCountTransformer (no changes)
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.AttachStateTransformer
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.AttachStateTransformer (no changes)
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.TargetsTransformer
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.TargetsTransformer (no changes)
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.ReferenceTransformer
2020/08/23 19:37:07 [DEBUG] ReferenceTransformer: "module.g8s.module.eks_helm_components_1[0].data.aws_region.current" references: []
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.ReferenceTransformer (no changes)
2020/08/23 19:37:07 [TRACE] Executing graph transform *terraform.RootTransformer
2020/08/23 19:37:07 [TRACE] Completed graph transform *terraform.RootTransformer (no changes)
2020/08/23 19:37:07 [TRACE] vertex "module.g8s.module.eks_helm_components_1.data.aws_region.current": entering dynamic subgraph
2020/08/23 19:37:07 [TRACE] dag/walk: visiting "module.g8s.module.eks_helm_components_1[0].data.aws_region.current"
2020/08/23 19:37:07 [TRACE] vertex "module.g8s.module.eks_helm_components_1[0].data.aws_region.current": starting visit (*terraform.NodeRefreshableDataResourceInstance)
2020/08/23 19:37:07 [TRACE] vertex "module.g8s.module.eks_helm_components_1[0].data.aws_region.current": evaluating
2020/08/23 19:37:07 [TRACE] [walkRefresh] Entering eval tree: module.g8s.module.eks_helm_components_1[0].data.aws_region.current
2020/08/23 19:37:07 [TRACE] eval: *terraform.EvalSequence
2020/08/23 19:37:07 [TRACE] eval: *terraform.EvalGetProvider

Feel free to close the case if we cannot find the root cause of this behaviour. I will try to update in the upcoming hours if I am able to reproduce it in a simpler way.

Thank you,
Andrei

@ghost ghost removed waiting-response An issue/pull request is waiting for a response from the community labels Aug 23, 2020
@jbardin
Member

jbardin commented Sep 1, 2020

@atimush

You mention that this happens with a module count in conjunction with the data sources. Without a repro we can't tell exactly what's going on, but some general debugging may help. I suspect the data source being shown in the plan is simply a side-effect of what's going on, and not the reason. From the partial config here, I would suspect it's being caused by something in the helm_components inputs being unknown.

  1. First thing to check: is this still a problem with 0.13.1?
  2. Does the situation change when hard-coding the count value?
  3. Does hard-coding the awsRegion value change anything?

Reducing the problem to a single config change that triggers the diff could give us a clue as to what is going on.
The relevant section of the trace log would be from the plan creation, but the entire log would be more helpful if possible.

Thanks

@jbardin jbardin added waiting for reproduction unable to reproduce issue without further information and removed new new issue not yet triaged labels Sep 1, 2020
@jbardin
Member

jbardin commented Oct 13, 2020

There hasn't been any new activity here, so I'm going to close this out.
Similar issues should be rectified in 0.14, as the merging of refresh+plan has allowed for more data resource lifecycle improvements.

Thanks!

@jbardin jbardin closed this as completed Oct 13, 2020
@ghost

ghost commented Nov 13, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 13, 2020