Add tips for debugging failed deployments #2458

Merged (2 commits) on Jun 25, 2023
Binary file added documentation/_static/ecs_events_tab.png
29 changes: 29 additions & 0 deletions documentation/general/deployment.md
@@ -95,6 +95,35 @@ of our CI. Nonetheless, given certain semantic merge conflicts, it is possible
that a PR could pass CI and still cause a build failure when merged, so it is
technically possible for a failure to occur during the first step.

#### Debugging Failures

```{tip}
Please refer to the [general ECS logging documentation](/meta/monitoring/cloudwatch_logs/index.md)
for details about how to find logs for individual tasks.

Beyond logs, the events list for a service is often a helpful additional
resource. You can find it under the "Events" tab of the ECS service's page:
Comment on lines +104 to +105
Member

Good tip 💯 I sometimes forget AWS CLI provides extra things to investigate.


![Example ECS events tab for the production API](/_static/ecs_events_tab.png)

This tab shows a chronological list of the 100 most recent "events". Please
[refer to the AWS ECS documentation for information on what each of these events means](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html).
```
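
The same events can also be read outside the console. The following is a minimal sketch, not part of the project's tooling, assuming `boto3` is installed and AWS credentials are configured. It relies on the `events` list returned by the ECS `describe_services` call, where each event carries `createdAt` and `message` fields. The cluster and service names in the usage comment are hypothetical placeholders.

```python
def fetch_service_events(cluster, service):
    """Return the events list that ECS reports for a single service."""
    # Imported lazily so the pure helpers below work without boto3 installed.
    import boto3

    ecs = boto3.client("ecs")
    response = ecs.describe_services(cluster=cluster, services=[service])
    services = response["services"]
    if not services:
        raise LookupError(f"service {service!r} not found in cluster {cluster!r}")
    return services[0]["events"]


def most_recent(events, limit=10):
    """Return up to `limit` of the newest events, ordered oldest to newest."""
    ordered = sorted(events, key=lambda event: event["createdAt"])
    return ordered[max(len(ordered) - limit, 0):]


def format_events(events):
    """Render each event as a 'timestamp  message' line."""
    return [f"{event['createdAt'].isoformat()}  {event['message']}" for event in events]


# Usage (hypothetical names; substitute the real cluster and service):
# events = fetch_service_events("example-cluster", "example-service")
# print("\n".join(format_events(most_recent(events))))
```

Sorting explicitly by `createdAt` avoids depending on the order in which the API happens to return events.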

#### Re-running Failed Deployments

```{tip}
To re-run a failed deployment with the same tag, re-run the failed deployment
workflow in the WordPress/openverse-infrastructure repository. **Do not re-run the
release workflow in the monorepo, as doing so will create duplicate tags**. This approach
is useful if you think the deployment failed due to a temporary fluke.

![Example failed production API deployment workflow.](/_static/rerun_failed_deployment.png)

To re-run this failed production API deployment, re-run the failed jobs via the GitHub
UI.
```
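
The GitHub UI is the documented route. As an alternative, the same action is exposed through GitHub's REST API as the "re-run failed jobs from a workflow run" endpoint. The sketch below is illustrative only and makes several assumptions: `run_id` is the numeric ID of the failed run in the infrastructure repository, `token` is a token with permission to manage Actions there, and both helper names are hypothetical.

```python
import urllib.request


def build_rerun_request(run_id, token, repo="WordPress/openverse-infrastructure"):
    """Build a POST request that re-runs only the failed jobs of a workflow run."""
    url = f"https://api.github.com/repos/{repo}/actions/runs/{run_id}/rerun-failed-jobs"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def rerun_failed_jobs(run_id, token):
    """Send the request and return the HTTP status code of the response."""
    with urllib.request.urlopen(build_rerun_request(run_id, token)) as response:
        return response.status


# Usage (hypothetical values):
# rerun_failed_jobs(run_id=1234567890, token="<token>")
```

Because the default `repo` is the infrastructure repository, the sketch cannot accidentally re-run the release workflow in the monorepo unless `repo` is overridden.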

## Staging

Staging is automatically deployed any time code is merged to the `main` branch