terraform crashes while creating emr cluster using cloudformation template #5487

Closed
pmann91 opened this issue Mar 7, 2016 · 11 comments · Fixed by #5606
Labels: bug, provider/aws, waiting-response

Comments

@pmann91

pmann91 commented Mar 7, 2016

Hi,

Terraform crashes with an unexpected EOF error while creating an EMR cluster from a CloudFormation template.

version being used:

terraform --version 
Terraform v0.6.12

script:

resource "aws_cloudformation_stack" "amp-emr-cluster" {
  name = "trialCluster"
  template_body = <<STACK
  { "Resources": {
      "TestCluster": {
        "Type": "AWS::EMR::Cluster",
        "Properties": {
          "Instances": {
            "MasterInstanceGroup": {
              "InstanceCount": 1,
              "InstanceType": "m3.xlarge",
              "Name": "Master"
            },
            "CoreInstanceGroup": {
              "InstanceCount": 1,
              "InstanceType": "m3.xlarge",
              "Name": "Core"
            },
            "TerminationProtected" : "False"
          },
          "Name": "trialCluster",
          "JobFlowRole" : "EMR_EC2_DefaultRole",
          "ServiceRole" : "EMR_DefaultRole",
          "ReleaseLabel" : "emr-4.2.0",
          "Tags": [
          {
            "Key": "Name",
            "Value": "trialCluster"
          }
          ],
        "LogUri" : "s3://trialclusterlog/log"
        }
      }
    }
  }
STACK
}

crash log:

Error applying plan:

1 error(s) occurred:

* aws_cloudformation_stack.amp-emr-cluster: unexpected EOF

Terraform does not automatically rollback in the face of errors.
panic: runtime error: index out of range
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: 
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: goroutine 93 [running]:
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: panic(0x10ef260, 0xc82000a090)
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws:     /opt/go/src/runtime/panic.go:464 +0x3e6
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: github.com/hashicorp/terraform/builtin/providers/aws.resourceAwsCloudFormationStackCreate.func1(0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws:     /opt/gopath/src/github.com/hashicorp/terraform/builtin/providers/aws/resource_aws_cloudformation_stack.go:152 +0x78e
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState.func1(0xc820423920, 0xc8204238c0, 0xc82048b470, 0xc82048b480, 0xc82048b460)
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws:     /opt/gopath/src/github.com/hashicorp/terraform/helper/resource/state.go:83 +0x1ef
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws: created by github.com/hashicorp/terraform/helper/resource.(*StateChangeConf).WaitForState
2016/03/07 13:49:12 [DEBUG] terraform-provider-aws:     /opt/gopath/src/github.com/hashicorp/terraform/helper/resource/state.go:129 +0x20c
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalWriteState
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalApplyProvisioners
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalIf
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalWriteDiff
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalIf
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalWriteState
2016/03/07 13:49:12 [DEBUG] root: eval: *terraform.EvalApplyPost
2016/03/07 13:49:12 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* aws_cloudformation_stack.amp-emr-cluster: unexpected EOF
2016/03/07 13:49:12 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* aws_cloudformation_stack.amp-emr-cluster: unexpected EOF
2016/03/07 13:49:12 [ERROR] root: eval: *terraform.EvalOpFilter, err: 1 error(s) occurred:

* aws_cloudformation_stack.amp-emr-cluster: unexpected EOF
2016/03/07 13:49:12 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* aws_cloudformation_stack.amp-emr-cluster: unexpected EOF
2016/03/07 13:49:12 [TRACE] [walkApply] Exiting eval tree: aws_cloudformation_stack.amp-emr-cluster
2016/03/07 13:49:12 [DEBUG] vertex provider.aws (close), got dep: aws_cloudformation_stack.amp-emr-cluster
2016/03/07 13:49:12 [DEBUG] waiting for all plugin processes to complete...
2016/03/07 13:49:12 [DEBUG] /root/terraform-dir/terraform-provider-aws: plugin process exited



Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
!!!!!!!!!!!!!!!!!!!!!!!!!!! TERRAFORM CRASH !!!!!!!!!!!!!!!!!!!!!!!!!!!!

Although the prompt reports a failure to create the EMR cluster, the cluster is created properly and is working when viewed in the UI.

@radeksimko
Member

Hi @pmann91
thanks for the report.

It looks like the API returned no stacks from DescribeStacks right after the stack was created.

@sumitkarn

Hi @radeksimko,
Terraform v0.6.1 uses older CloudFormation dependencies (which don't support EMR). The recent version of CloudFormation has EMR support. Do you think this might be the reason?
I am new here, but I had similar issues when I was investigating EMR support with Terraform, so I thought I would add my view. I also saw that the latest master of aws-sdk-go on Git has EMR support. Does this mean we need to update the vendored deps? Maybe I am missing something obvious; please let me know your view. Thanks.

@pmann91
Author

pmann91 commented Mar 9, 2016

Hi @radeksimko

While working on this, I found that the above Terraform script executes successfully if template_url is used, pointing at an S3 bucket path that contains the template body, instead of template_body. In that case Terraform doesn't crash and creates the cluster properly.
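
For illustration, a minimal sketch of that workaround, assuming the same JSON template has already been uploaded to S3. The bucket name and object key are hypothetical placeholders, not values from this report:

```hcl
resource "aws_cloudformation_stack" "amp-emr-cluster" {
  name = "trialCluster"

  # Assumption: the template shown in the original report was uploaded to this
  # (hypothetical) S3 location. template_url is used instead of template_body.
  template_url = "https://s3-us-west-2.amazonaws.com/example-bucket/emr-cluster.template"
}
```

The stack definition itself is unchanged; only the way the template reaches CloudFormation differs.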

@radeksimko
Member

I was able to reproduce a similar problem with the following config:

resource "aws_cloudformation_stack" "full" {
  name = "tf-full-stack"
  template_body = <<STACK
{
  "Parameters" : {
    "VpcCIDR" : {
      "Description" : "CIDR to be used for the VPC",
      "Type" : "String"
    }
  },
  "Resources" : {
    "MyVPC": {
      "Type" : "AWS::EC2::VPC",
      "Properties" : {
        "CidrBlock" : {"Ref": "VpcCIDR"},
        "Tags" : [
          {"Key": "Name", "Value": "Primary_CF_VPC"}
        ]
      }
    },
    "StaticVPC": {
      "Type" : "AWS::EC2::VPC",
      "Properties" : {
        "CidrBlock" : {"Ref": "VpcCIDR"},
        "Tags" : [
          {"Key": "Name", "Value": "Static_CF_VPC"}
        ]
      }
    },
    "InstanceRole" : {
      "Type" : "AWS::IAM::Role",
      "Properties" : {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [ {
            "Effect": "Allow",
            "Principal": { "Service": "ec2.amazonaws.com" },
            "Action": "sts:AssumeRole"
          } ]
        },
        "Path" : "/",
        "Policies" : [ {
          "PolicyName": "terraformtest",
          "PolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [ {
              "Effect": "Allow",
              "Action": [ "ec2:DescribeSnapshots" ],
              "Resource": [ "*" ]
            } ]
          }
        } ]
      }
    }
  }
}
STACK
  parameters {
    VpcCIDR = "10.0.0.0/16"
  }

  policy_body = <<POLICY
{
  "Statement" : [
    {
      "Effect" : "Deny",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "LogicalResourceId/StaticVPC"
    },
    {
      "Effect" : "Allow",
      "Action" : "Update:*",
      "Principal": "*",
      "Resource" : "*"
    }
  ]
}
POLICY
  capabilities = ["CAPABILITY_IAM"]
  notification_arns = ["${aws_sns_topic.cf-updates.arn}"]
  on_failure = "DELETE"
  timeout_in_minutes = 1
  tags {
    First = "Mickey"
    Second = "Mouse"
  }
}

resource "aws_sns_topic" "cf-updates" {
  name = "tf-cf-notifications"
}

which led me to this error:

* aws_cloudformation_stack.full: unexpected state 'DELETE_IN_PROGRESS', wanted target '[CREATE_COMPLETE]'

This happened because stack creation exceeded the given timeout (timeout_in_minutes = 1), and CloudFormation then started deleting the stack because of on_failure = "DELETE".
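
As a sketch of how that particular failure mode could be avoided, assuming the stack simply needs longer than one minute to create, the two relevant attributes might be set as below. The 30-minute value is an arbitrary illustrative assumption, and the remaining arguments are elided with a comment:

```hcl
resource "aws_cloudformation_stack" "full" {
  name = "tf-full-stack"

  # template_body, parameters, policy_body, capabilities, etc. stay as in the
  # config above; they are omitted here, so this fragment is not complete.

  # Assumption: 30 minutes is an illustrative value, not a recommendation.
  timeout_in_minutes = 30

  # With ROLLBACK the failed stack is rolled back rather than deleted,
  # so it does not enter DELETE_IN_PROGRESS after a failed create.
  on_failure = "ROLLBACK"
}
```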

When I took your example, I was also able to reproduce an error, but a graceful error:

* aws_cloudformation_stack.amp-emr-cluster: ROLLBACK_COMPLETE:
["The following resource(s) failed to create: [TestCluster]. . Rollback requested by user." "ElasticMapReduce Cluster failed to stabilize."]

i.e. not the index out of range error and crash. I fully believe that a user may hit this bug, but I can't believe it would happen with the exact config you provided (which is missing the on_failure attribute), because the default behaviour is rollback, not deletion.

@pmann91 Can you please double check you provided the exact config that caused the crash?

@radeksimko added the waiting-response label Mar 12, 2016
@radeksimko
Member

@sumitkarn

Terraform v0.6.1 uses older CloudFormation dependencies (which don't support EMR). The recent version of CloudFormation has EMR support. Do you think this might be the reason?

Since the CloudFormation API (not the Go SDK) is responsible for parsing the actual JSON definition for the stack, I don't believe the version of SDK would be causing such problems.

@sumitkarn

@radeksimko, thanks for the detail. I understand it now.

@pmann91
Author

pmann91 commented Mar 14, 2016

Hey @radeksimko ,

I have double-checked the config and it's the same. I even tried executing the same script again but got the same crash report.

@radeksimko removed the waiting-response label Mar 14, 2016
@radeksimko
Member

@pmann91 It is good (at least for debugging purposes) that you're getting this error consistently and not intermittently. However, I was not able to reproduce it, even after applying and destroying your config 5 times.

The last difference I can think of is region (I'm using us-west-2) or platform (OSX 10.10.5 / 64bit)?

I have prepared a "trap" for this kind of situation and I can further increase the logging verbosity around it. Would you be able to compile the code from my PR/branch and try rerunning your example? If not, I can send you a binary; I just need to know what platform you run on.

I'm afraid that I won't be able to get to the bottom of it without your help (since I cannot reproduce it).

@radeksimko added the waiting-response label Mar 14, 2016
@pmann91
Author

pmann91 commented Mar 14, 2016

@radeksimko

Region: us-west-2
Platform: Ubuntu 14.04, 64 bit, kernel version 3.16

If you can provide the binary, I can re-run the example and check the results.

@pedrodparkes

Tested the config on us-west-2.
Tested my own config in the same region.

It fails with "cluster failed to stabilize". What does it mean?

@ghost

ghost commented Apr 22, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 22, 2020