This issue contains a proposed workflow and set of commands that would be built to support the workflow. This should be decomposed into a number of additional issues, likely one per command, which cover the specific details related to implementing those commands.
The example command lines included are just that, examples. We should feel free to iterate on the UX around the command interface and make improvements as we use them in practice while they're being built.
Jane enters #ops and acknowledges that she has received the page and is investigating.
pagerduty:ack 123
Jane investigates the problem and confirms that the site is down.
statuspage:incident new -s investigating -c website "Site Outage" "We are currently investigating problems with example.com. We expect to resolve the issue shortly and will post updates as additional information is available. Thanks for your patience." *> here #support
Incident Id: 911
Jane fixes the problem outside of chat.
statuspage:incident update 911 The example.com site is back online. We will monitor the site closely to ensure that the problems are fully resolved. *> here #support
... time passes ...
Jane closes out the status page incident and marks the PagerDuty incident resolved.
statuspage:component status website green
statuspage:incident update -s resolved 911 All systems go. *> here #support
pagerduty:resolve 123
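The example command lines above share one shape: a `bundle:command` name, positional arguments and flags, and an optional `*>` followed by destinations. The following Python sketch shows one way such a line could be split into those parts. It is illustrative only: the `ParsedCommand` type and `parse_command` function are hypothetical names, and it assumes that `*>` sends output to every destination after it and that `here` means the room the command was typed in.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ParsedCommand:
    bundle: str                                   # e.g. "statuspage"
    command: str                                  # e.g. "incident"
    args: list[str] = field(default_factory=list)
    redirects: list[str] = field(default_factory=list)


def parse_command(line: str) -> ParsedCommand:
    """Split 'bundle:command args... *> dest dest' into its parts.

    Assumption: everything after '*>' is a list of output destinations.
    """
    tokens = shlex.split(line)  # honours the double-quoted title and body
    redirects: list[str] = []
    if "*>" in tokens:
        split_at = tokens.index("*>")
        tokens, redirects = tokens[:split_at], tokens[split_at + 1:]
    bundle, _, command = tokens[0].partition(":")
    return ParsedCommand(bundle, command, tokens[1:], redirects)


parsed = parse_command(
    'statuspage:incident new -s investigating -c website '
    '"Site Outage" "We are currently investigating." *> here #support'
)
print(parsed.bundle, parsed.command)  # statuspage incident
print(parsed.args)
print(parsed.redirects)               # ['here', '#support']
```

Quoting the title and body keeps them as single arguments. The unquoted messages in the `update` examples would instead arrive as one token per word, which is one of the UX details worth settling per command.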
Other Use Cases:
Support team notices a problem, needs help:
pagerduty:alert website Customers complaining the site is down. Need help in #support
Support team tweets from @ExampleComSupport to let customers know of an issue:
twitter:tweet Our engineers are currently investigating problems connecting to example.com. Watch status.example.com for more information
Developer wants to know who is on call for web so they can ask for help with a non-emergency issue without paging:
pagerduty:oncall web
Developer pushed changes to the website and wants to make sure that they didn't generate any monitoring issues:
Scenario Assumptions:
Scenario:
Example.com site goes down. The on-call engineer is paged and fixes the problem while managing status updates and incident state in PagerDuty.
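As a rough illustration of the "incident state in PagerDuty" part of the scenario, the sketch below models the states an incident moves through as the `pagerduty:ack` and `pagerduty:resolve` commands are run. The state names follow PagerDuty's triggered / acknowledged / resolved lifecycle. The `advance` function and the transition table are hypothetical and not part of any proposed bundle.

```python
from enum import Enum


class PagerState(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# (current state, chat action) -> next state.
# Assumption: an incident may be resolved without being acknowledged first.
TRANSITIONS = {
    (PagerState.TRIGGERED, "ack"): PagerState.ACKNOWLEDGED,
    (PagerState.TRIGGERED, "resolve"): PagerState.RESOLVED,
    (PagerState.ACKNOWLEDGED, "resolve"): PagerState.RESOLVED,
}


def advance(state: PagerState, action: str) -> PagerState:
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action} an incident that is {state.value}"
        ) from None


state = PagerState.TRIGGERED  # the page has just gone out
for action in ("ack", "resolve"):
    state = advance(state, action)
    print(f"pagerduty:{action} 123 -> {state.value}")
```

Rejecting invalid transitions up front would let a command report a clear error in chat instead of passing a confusing API failure back to the user.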
example.com is DOWN
.filter -p check_params.hostname -m example.com | filter -p current_state -m DOWN | statuspage:component website yellow *> #ops #support
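To make the intent of the two `filter` stages concrete, here is a minimal Python sketch of what they might do to incoming monitoring events. It assumes `-p` names a dotted property path and `-m` is a regular expression matched against the value at that path. Both are guesses at the eventual semantics, and the event shape is invented for the example apart from the `check_params.hostname` and `current_state` fields used in the pipeline.

```python
import re
from typing import Any, Iterable, Iterator

_MISSING = object()  # sentinel for "path not present in this item"


def get_path(item: dict, path: str) -> Any:
    """Walk a dotted path such as 'check_params.hostname'."""
    current: Any = item
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def filter_items(items: Iterable[dict], path: str, match: str) -> Iterator[dict]:
    """Yield only the items whose value at `path` matches the regex `match`."""
    pattern = re.compile(match)
    for item in items:
        value = get_path(item, path)
        if value is not _MISSING and pattern.search(str(value)):
            yield item


# Hypothetical monitoring events.
events = [
    {"check_params": {"hostname": "example.com"}, "current_state": "DOWN"},
    {"check_params": {"hostname": "example.com"}, "current_state": "UP"},
    {"check_params": {"hostname": "other.test"}, "current_state": "DOWN"},
]

stage1 = filter_items(events, "check_params.hostname", r"example\.com")
stage2 = list(filter_items(stage1, "current_state", "DOWN"))
if stage2:
    # Only now would the final stage of the pipeline run.
    print("would run: statuspage:component website yellow")
```

Because each stage only passes matching items downstream, the `statuspage:component` stage runs only when both conditions hold.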
pingdom:check list | filter -p hostname -m example.com | pingdom:check results $id
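The `$id` in the last stage suggests that fields from each upstream item get bound into the next command. The sketch below shows one plausible reading of that in Python, using the standard library's `string.Template` for the substitution. The `bind` helper and the shape of the check list are assumptions made for illustration, and the hostname comparison is a plain equality test for brevity.

```python
from string import Template


def bind(command: str, item: dict) -> str:
    """Replace $name placeholders with fields from one upstream item."""
    return Template(command).substitute(item)


# Hypothetical output of `pingdom:check list`; the field names are assumed.
checks = [
    {"id": 1001, "hostname": "example.com"},
    {"id": 1002, "hostname": "other.test"},
]

# Stand-in for `filter -p hostname -m example.com`.
matching = [c for c in checks if c["hostname"] == "example.com"]

# One downstream invocation per item that survived the filter.
commands = [bind("pingdom:check results $id", c) for c in matching]
print(commands)  # ['pingdom:check results 1001']
```

Under this reading, a filter that matches several checks would fan out into several `pingdom:check results` invocations, which may or may not be the desired behavior.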
Bundles