Failure when adding or removing cold or frozen tiers #466

Closed
4 tasks done
immavalls opened this issue Apr 12, 2022 · 8 comments
Labels: bug, framework-limitation, Known issue, Team:Stack-And-Solutions.important, theme:topology

Comments

@immavalls

Readiness Checklist

Expected Behavior

Adding a cold or frozen tier to an existing deployment, or removing one from it, should succeed.

Current Behavior

We get the error:

│ Error: failed updating deployment: 3 errors occurred:
│       * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│       * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│       * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│ 
│ 
│ 
│   with ec_deployment.multi_tier,
│   on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│   17: resource "ec_deployment" "multi_tier" {

This happens even when the topology blocks are defined in alphabetical order.

Steps to Reproduce

  • Create a simple deployment with Terraform by running terraform apply -auto-approve (this requires first setting the EC API key with EC_API_KEY="<ESS_API_KEY>" and running terraform init).
terraform {
  required_version = ">= 0.12.29"

  required_providers {
    ec = {
      source  = "elastic/ec"
      version = "0.4.0"
    }
  }
}

provider "ec" {}

# Create an Elastic Cloud deployment
resource "ec_deployment" "multi_tier" {
  name = "multi_tier"

  region                 = "gcp-europe-west3"
  version                = "7.17.1"
  deployment_template_id = "gcp-storage-optimized"

  elasticsearch {
    autoscale = "false"

    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }
    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }

  kibana {
    topology {
      size       = "1g"
      zone_count = 1
    }
  }
}
  • The resulting terraform.tfstate is correct:
                    "id": "hot_content",
                    "instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
                    "node_roles": [
                      "data_content",
                      "data_hot",
                      "ingest",
                      "master",
                      "remote_cluster_client",
                      "transform"
                    ]
                ....
                   
                    "id": "warm",
                    "instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
                    "node_roles": [
                      "data_warm",
                      "remote_cluster_client"
                    ]
  • Change the resource above to add a cold tier, then run terraform apply -auto-approve again:
  elasticsearch {
    autoscale = "false"

    topology {
      id         = "cold"
      size       = "4g"
      zone_count = 1
    }
    topology {
      id         = "hot_content"
      size       = "1g"
      zone_count = 1
    }
    topology {
      id         = "warm"
      zone_count = 1
      size       = "2g"
    }
  }
  • We'll get:
│ Error: failed updating deployment: 3 errors occurred:
│       * api error: clusters.cluster_invalid_plan: Instance configuration [gcp.es.datacold.n2.68x10x190] does not allow usage of node types [master,ingest]. You must either change instance configuration or use only allowed node types [data]. (resources.elasticsearch[0].cluster_topology[2].instance_configuration_id)
│       * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = cold] (resources.elasticsearch[0])
│       * api error: deployments.elasticsearch.node_roles_error: Invalid node_roles configuration: The node_roles in the plan contains values not present in the template. [id = hot_content] (resources.elasticsearch[0])
│ 
│ 
│ 
│   with ec_deployment.multi_tier,
│   on deployment.tf line 17, in resource "ec_deployment" "multi_tier":
│   17: resource "ec_deployment" "multi_tier" {
  • In terraform.tfstate we can see that the attributes have been mixed up: the cold entry has the hot attributes (the instance configuration, plus node roles that are not allowed in cold, such as ingest or master), the hot_content entry has the warm attributes, and the warm entry has empty attributes.
                    "id": "cold",
                    "instance_configuration_id": "gcp.es.datahot.n2.68x10x45",
                    "node_roles": [
                      "data_content",
                      "data_hot",
                      "ingest",
                      "master",
                      "remote_cluster_client",
                      "transform"
                    ],

... 

                    "id": "hot_content",
                    "instance_configuration_id": "gcp.es.datawarm.n2.68x10x190",
                    "node_roles": [
                      "data_warm",
                      "remote_cluster_client"
                    ],
                    "id": "warm",
                    "instance_configuration_id": "",
                    

Context

Trying to add a cold tier to a deployment that already has hot and warm tiers. Several combinations lead to this same error.

Possible Solution

We have found no solution or workaround so far. Once this is hit, we have to use the cloud UI to add or remove tiers, and then run terraform apply -refresh-only.

Your Environment

  • Version used: Terraform v1.1.7 on darwin_amd64 + provider registry.terraform.io/elastic/ec v0.4.0
  • Running against Elastic Cloud SaaS or Elastic Cloud Enterprise and version: ESS, stack version 7.17.1
  • Operating System and version: macOS Monterey 12.3.1
@immavalls added the bug label Apr 12, 2022
@AndriiLavrekha

Hi guys, any updates on this issue?
It seems I'm stuck with the same bug.

@AndriiLavrekha

@Kushmaro @jaggederest any idea when and how this can be fixed?
It looks like a critical issue to me, as it prevents using major Elastic features.
Also, @immavalls, have you found any solution or workaround for this issue? The ticket was opened in April and there have been no updates since.

@Kushmaro
Collaborator

Kushmaro commented Aug 8, 2022

We are looking into this, @AndriiLavrekha, but we can't provide any timelines yet.

@AndriiLavrekha

@Kushmaro Thank you for the comment.
Can you also confirm whether the issue affects only the 'cold' and 'frozen' topologies?

@Kushmaro
Collaborator

Kushmaro commented Aug 8, 2022

I can't, @AndriiLavrekha; this needs further investigation to confirm or deny that it affects only a single type of tier.

@pascal-hofmann
Contributor

pascal-hofmann commented Aug 9, 2022

I think this is due to #336.

Even if you specify the blocks in alphabetical order things don't always work.

In my case the order in the state changes after running terraform refresh. I'm trying to find out where this happens, but have had no luck so far.

@dimuon
Contributor

dimuon commented Sep 5, 2022

The defect is indeed caused by the same logic and limitations that cause #336.

A possible workaround:

If autoscale is disabled

Initial deployment creation

Topology elements (tiers) with non-zero sizes have to be listed in alphabetical order of their id fields.

Update - adding a new tier

  • add the new tier at the end of the topology list (which is already sorted alphabetically)
  • run terraform apply
  • reorder the topology list in alphabetical order
  • check that there are no pending changes with terraform plan - it should output an empty diff
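
As a rough sketch of the first two steps, the fragment below reuses the elasticsearch block of the multi_tier deployment from the report (the ids and sizes come from that report). It illustrates the ordering described above and is not a verified fix. The fragment belongs inside the ec_deployment resource.

```hcl
elasticsearch {
  autoscale = "false"

  # Existing tiers, already in alphabetical order of id.
  topology {
    id         = "hot_content"
    size       = "1g"
    zone_count = 1
  }
  topology {
    id         = "warm"
    size       = "2g"
    zone_count = 1
  }

  # New tier appended at the end for the first terraform apply.
  # Once that apply has succeeded, move this block to the top so the
  # list reads cold, hot_content, warm, then confirm with terraform plan.
  topology {
    id         = "cold"
    size       = "4g"
    zone_count = 1
  }
}
```

After the reorder, terraform plan is expected to report no changes; if it proposes changes, the reorder did not match the order recorded in the state.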

Update - removing an existing tier

  • set the tier's size to 0
  • run terraform apply
  • remove the tier from the topology list
  • check that there are no pending changes - terraform plan should output an empty diff
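
The removal steps could look roughly like the fragment below, here removing the warm tier from a cold, hot_content, warm list. The ids and non-zero sizes come from the report. Writing the zero size as "0g" is an assumption about how this provider version spells it.

```hcl
elasticsearch {
  autoscale = "false"

  topology {
    id         = "cold"
    size       = "4g"
    zone_count = 1
  }
  topology {
    id         = "hot_content"
    size       = "1g"
    zone_count = 1
  }

  # Step 1: zero out the tier being removed, then run terraform apply.
  # Step 2: after the apply, delete this whole block and check that
  # terraform plan shows an empty diff.
  topology {
    id         = "warm"
    size       = "0g" # assumed spelling of a zero size
    zone_count = 1
  }
}
```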

If autoscale is enabled

The idea is the same, but it applies to every tier that either has a non-zero size or can be resized by autoscaling (which happens when the corresponding deployment template specifies a non-zero autoscaling_max for the tier). All these tiers should be listed in alphabetical order of their id fields, even if their blocks don't specify any field besides id.

However, if the tier's size is zero and the corresponding deployment template either doesn't specify autoscaling_max for the tier or sets it to zero, the tier should be omitted from the topology list.

Also, make sure to ignore the size attributes if you'd like to specify initial sizes for tiers, since the sizes can be changed later by the autoscaler. For example, the snippet below ignores size updates for the topology entries at indices 2 and 4 (the third and fifth entries, since indices are zero-based):

  lifecycle {
    ignore_changes = [
      elasticsearch[0].topology[2].size,
      elasticsearch[0].topology[4].size
    ]
  }
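
To show how those indices map onto blocks, here is a hypothetical complete resource. The resource name "autoscaled", the choice of five tiers, and the initial sizes are illustrative assumptions. It also assumes the deployment template allows autoscaling for each listed tier. Region, version, and template are copied from the report.

```hcl
resource "ec_deployment" "autoscaled" {
  name                   = "autoscaled" # hypothetical name
  region                 = "gcp-europe-west3"
  version                = "7.17.1"
  deployment_template_id = "gcp-storage-optimized"

  elasticsearch {
    autoscale = "true"

    # Tiers in alphabetical order of id; list indices are zero-based.
    topology {
      id = "cold" # index 0
    }
    topology {
      id = "frozen" # index 1
    }
    topology {
      id   = "hot_content" # index 2
      size = "1g"          # initial size only (assumed value)
    }
    topology {
      id = "ml" # index 3
    }
    topology {
      id   = "warm" # index 4
      size = "2g"   # initial size only (assumed value)
    }
  }

  lifecycle {
    # Ignore later size changes made by the autoscaler for the
    # entries that declare an initial size.
    ignore_changes = [
      elasticsearch[0].topology[2].size,
      elasticsearch[0].topology[4].size
    ]
  }
}
```

Because ignore_changes addresses entries by position, adding or removing a tier shifts the indices, so the list would need revisiting after any topology change.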

@dimuon added the Known issue and framework-limitation labels Sep 5, 2022
@Kushmaro modified the milestones: 0.5.0, 0.6.0 Oct 18, 2022
@dimuon
Contributor

dimuon commented Mar 1, 2023

Closed by #567

@dimuon closed this as completed Mar 1, 2023