Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create/Destroy ruleset rule error #206

Closed
bartoszspiechowicz opened this issue Apr 15, 2020 · 12 comments
Closed

Create/Destroy ruleset rule error #206

bartoszspiechowicz opened this issue Apr 15, 2020 · 12 comments

Comments

@bartoszspiechowicz
Copy link

Hello,

It looks like there could be some kind of bug probably in plugin which implies randomly error occured during creating/destroying more than one rules at once. Create/Destroy only one rule at ones works correctly.

Terraform Version

0.12.24

PagerDuty Provider Version

1.6.1

Affected Resource(s)

  • pagerduty_ruleset_rule

Terraform Configuration Files

provider "pagerduty" {
  token = "xxx"
  version = "~> 1.6.1"
}

resource "pagerduty_ruleset" "ruleset1" {
    name = "Ruleset1"
    team { 
        id = "PL4AQLJ"
    }
}

resource "pagerduty_ruleset_rule" "warning" {
  ruleset = pagerduty_ruleset.ruleset1.id
  disabled = "false"
  conditions {
    operator = "or"
    subconditions {
      operator = "contains"
      parameter {
        value = "***WARNING***"
        path = "Summary"
      }
    }
  }
  actions {
    route {
      value = "PJ9WH2G"
    }
    severity  {
      value = "warning"
    }
    annotate {
      value = "From Terraform"
    }
  }
}

resource "pagerduty_ruleset_rule" "info" {
  ruleset = pagerduty_ruleset.ruleset1.id
  disabled = "false"
  conditions {
    operator = "or"
    subconditions {
      operator = "contains"
      parameter {
        value = "***INFO***"
        path = "Summary"
      }
    }
  }
  actions {
    route {
      value = "PJ9WH2G"
    }
    severity  {
      value = "info"
    }
    annotate {
      value = "From Terraform"
    }
  }
}

resource "pagerduty_ruleset_rule" "error" {
  ruleset = pagerduty_ruleset.ruleset1.id
  disabled = "false"
  conditions {
    operator = "or"
    subconditions {
      operator = "contains"
      parameter {
        value = "***HIGH***"
        path = "Summary"
      }
    }
  }
  actions {
    route {
      value = "PJ9WH2G"
    }
    severity  {
      value = "error"
    }
    annotate {
      value = "From Terraform"
    }
  }
}

resource "pagerduty_ruleset_rule" "critical" {
  ruleset = pagerduty_ruleset.ruleset1.id
  position = 0
  disabled = "false"
  conditions {
    operator = "or"
    subconditions {
      operator = "contains"
      parameter {
        value = "***CRITICAL***"
        path = "Summary"
      }
    }
  }
  actions {
    route {
      value = "PJ9WH2G"
    }
    severity  {
      value = "critical"
    }
    annotate {
      value = "From Terraform"
    }
  }
}

Debug Output

https://gist.github.com/bartoszspiechowicz/9d52adab81ec7138d3e13f8e187a8fba
https://gist.github.com/bartoszspiechowicz/c5bf308f3bfd40290b997e0545e57255

Expected Behavior

All Ruleset rules are created without errors.

Actual Behavior

PagerDuty API randomly return "500 Internal Server Error" during creating/destroying more than one rules at once.
It doesn't occurs during creating/destroying only one rule.

Steps to Reproduce

  1. Set correct PageDuty token
  2. Set correct Team Id
  3. Set correct Service Id
  4. terraform apply
@matt-canty-dragon
Copy link

Yes, seeing same behaviour. Same issue affects pagerduty_event_rule type:
https://github.com/terraform-providers/terraform-provider-pagerduty/issues/175

Use case for this is nesting rules with a service. When service module is instantiated, it creates rules which are unaware of other rules. To create dependencies would mean also making services depend on each other (at least in code).

@matt-canty-dragon
Copy link

My attempted hack to a solution also presents a problem. I'd hoped to be able to pass around references to other rules in the set. The following successfully creates 3 rules:

resource null_resource dependency_hack {}

module one {
  source = "../modules/simple-rule"

  service_id      = "ABCDEFG"
  service_name    = "unicorn-service1"
  priority        = "P1"
  rule_dependency = null_resource.dependency_hack.id
}

module two {
  source = "../modules/simple-rule"

  service_id      = "ABCDEFG"
  service_name    = "unicorn-service2"
  priority        = "P2"
  rule_dependency = module.one.rule_id
}

module three {
  source = "../modules/simple-rule"

  service_id      = "ABCDEFG"
  service_name    = "unicorn-service3"
  priority        = "P3"
  rule_dependency = module.two.rule_id
}

However if I then decide that I am no longer interested in module two I:

  1. Delete module two
  2. Set module.three.rule_dependency to module.one.rule_id

The same error occurs. This really needs to be fixed!

@matt-canty-dragon
Copy link

matt-canty-dragon commented Apr 20, 2020

Retry would be great interim fix. Any idea when this might be merged & available @stmcallister?

Also I am currently using pagerduty_event_rule, is this being deprecated?

@stmcallister
Copy link
Contributor

@matt-canty-dragon Let me know if that PR doesn't fix the issue for you.

We'll leave the pagerduty_event_rule resource available for a little while, since the API is still active. But, no future development will be done on the /event_rules API, so it's highly recommended you transition to using ruleset_rules at some point.

@matt-canty-dragon
Copy link

Hi @stmcallister I have transitioned already to pagerduty_ruleset_rule, things were looking promising locally, however I am now seeing 500 errors:

Error: PUT API call to https://api.pagerduty.com/rulesets/_default/rules/4336ff02-390a-4b85-b27f-25d6535abaed failed: 500 Internal Server Error

  on modules/datadog-base-service-rules/main.tf line 1, in resource "pagerduty_ruleset_rule" "p1":
   1: resource pagerduty_ruleset_rule p1 {

Overall, 202 event rules were expected to be created. However I am currently stuck at his position now. I think the problem is it is continually trying to change the position back to null.

image

I don't really want to set a position. Also I don't really want to store an arbitrary number anywhere which the service module would then pass down to the ruleset module.

I hope this makes sense.

@matt-canty-dragon
Copy link

matt-canty-dragon commented Apr 23, 2020

Hi @stmcallister a little more context and info.

In our solution I intend to:

  1. Follow the standard advice and use a single event rule. This gives me a single integration point between PagerDuty and Datadog.
  2. Raise incidents in PagerDuty based on the tags in Datadog. For example the tags priority:P1 and service:example-service will raise a P1 incident on Example Service. In a micro-service environment this quickly becomes 100+ rules.
  3. Nest event rules inside a service module: meaning groups of event rules have no knowledge of each other in my Terraform declaration.

Having experimented with the Terraform I have concluded that logic behind the API adjusts the position attribute and causes this problem.

Applying a rule and then re-applying with no change will always cause a change in the plan due to the computed (PagerDuty) and the declared (Terraform) priority differing.

I see the following options:

  1. Provider is altered to use the computed priority - I hope this is possible.
  2. I refactor my solution to store all rules in a single file - not ideal because I don't model them this way in my mind. They relate to the services.
  3. I provide a number to my rule module so that it uses that to set the priority - seems flaky, if you remove a service then all the numbers would have to be moved around.

I hope this helps. It's quite convoluted and largely a result of the API design rather than the Terraform provider I believe. Would appreciate some kind of timelines, I am thinking of interim workarounds however I don't like the sound of any of them!

@stmcallister stmcallister reopened this Apr 23, 2020
@stmcallister
Copy link
Contributor

@matt-canty-dragon Thanks for this feedback! I'm going to connect with the Engineer team over the rulesets API and share this to see if they have additional insight. Then we'll figure out the best way to adjust the provider, if needed.

@matt-canty-dragon
Copy link

I am going to implement a work-around in order to prevent further delay.

Thanks, look forward to hearing more.

@matt-canty-dragon
Copy link

For now I have worked around the issue, by creating all ruleset rules in a single file like so:

module unicorn_service_priority_rules {
  source = "./modules/datadog-base-service-rules"

  service_id    = module.unicorn_service.service_id
  service_name  = module.unicorn_service.service_name
  last_position = 0
  last_rule     = null
}

module my_service_priority_rules {
  source = "./modules/datadog-base-service-rules"

  service_id    = module.my_service.service_id
  service_name  = module.my_service.service_name
  last_position = module.unicorn_service_priority_rules.last_position
  last_rule     = module.unicorn_service_priority_rules.last_rule
}

This way all rules (each module creates 1 rule per priority) are created sequentially and in order of priority.

No updates are required if no changes are made to this file or the module now, which is great.

@stmcallister
Copy link
Contributor

Thanks for sharing the feedback @matt-canty-dragon! I shared this thread with the engineering team over rulesets and they noticed that the conflict error codes returned by the API were incorrect.

In reading through this again, I'm seeing that you're having issues with rules associated with different services conflicting with each other, and that service-level event rules will be ideal. Those types of rules are not available yet, but they are on the roadmap. While not ideal for how you think about the problem, I think your solution makes sense.

@matt-canty-dragon
Copy link

Interesting. I would like to discuss this further with our account manager. Happy to continue with this work-around for now.

Official advice is as I said to have one ruleset to minimise the number of integration points for Datadog. So, I look forward to service-level rulesets.

@stmcallister
Copy link
Contributor

Cool. Seeing as this issue has evolved a bit, and we have a temporary resolution, I'm going to close this one. Please feel free to open another issue if you have any further questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants