This repository was archived by the owner on Feb 8, 2024. It is now read-only.

Commit 7e8f4c0

Merge pull request rebuy-de#1 from aws-samples/feature-initial
Feature initial
2 parents 6291573 + 5e0e7b2 commit 7e8f4c0

5 files changed, +1034 -5 lines changed

README.md (+172 -5)

@@ -1,11 +1,178 @@
## AWS Account Cleanser framework using aws-nuke

AWS Nuke is an open source tool created by rebuy.de: https://github.com/rebuy-de/aws-nuke. It searches the provided AWS account for deletable resources and deletes those that are not considered "Default" or "AWS-Managed". In short, it takes your account back to Day 1, with a few exceptions.

```sh
$ First and Final Warning: This is a dangerous and very destructive tool and should not be deployed without fully understanding the impact it will have on the accounts you allow it to interface with.
```

## Overview
![infrastructure-overview](images/architecture-overview.png)

* The approach covered in this pattern suits customers who need an automated mechanism to periodically clean up obsolete resources in test or sandbox accounts. Customers commonly have a set of Dev/Sandbox accounts where developers create and experiment with various services and resources. These resources are then left unattended or obsolete and can quickly lead to high, unnecessary AWS costs. For example, a developer creates an expensive RDS DB instance or an EBS/EFS volume for testing and forgets to terminate it.

* This solution sets up an automated mechanism that uses the aws-nuke binary together with AWS Step Functions, EventBridge, and AWS CodeBuild. It can run on a daily schedule to scan for and delete resources in each region of the sandbox account in a scalable manner. The Step Functions Map state invokes a separate CodeBuild project run for each region of the account, and each run deletes only the resources in its region. This improves scalability and gives better control over run time and monitoring.

This architecture provides the following features:

1. A scheduled event trigger in AWS EventBridge starts the workflow by invoking the Step Functions state machine.
2. Step Functions orchestrates a separate CodeBuild project invocation for each region and passes it the region parameter and the customized nuke config. A custom Python class inside the CodeBuild project dynamically updates the region attribute in the nuke config file required by the aws-nuke binary. The Map state fans out dynamically to all the required regions in the sandbox account, which saves time and scales to a large number of resources.
3. The aws-nuke command line generally has to assume temporary STS credentials through IAM role chaining to handle the resources in each account. Role chaining limits the session to at most 60 minutes. If multiple regions are configured to run in one CodeBuild project execution, the session can expire first and leave many stale resources undeleted.
   - This pattern avoids that by running the cleanup for each region of a single account in parallel, using the Step Functions Map state.
   - It also gives the aws-nuke binary credentials through a profile configured in the shared AWS config file (~/.aws/config), which is not subject to the 1-hour session limit.
   - It also raises the CodeBuild project timeout from the default to about 2 to 4 hours, giving aws-nuke more time to finish deleting all resources in each region.
4. The aws-nuke binary works from the nuke-config.yaml file. This pattern updates that file dynamically with a Python filtering class, which gives flexibility in handling the resource filters and region constraints based on the supplied override parameters.
5. The workflow invokes the CodeBuild project synchronously and waits for it to succeed. If the build errors out, a retry mechanism triggers the CodeBuild project again after a configured interval. This ensures all resources are handled in the daily run without manual intervention.
6. After the job succeeds for each region, the workflow also sends a detailed report of the deleted resources to an SNS topic subscription. This saves you from traversing and parsing the complex logs written by the aws-nuke binary.
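For a quick manual test outside Step Functions, the per-region fan-out described above can be approximated from the shell. This is a sketch, not the pattern's implementation. It assumes the project is named `AccountNuker-NukeCleanser` (taken from the sample email further down) and that the buildspec reads the target region from a hypothetical `NukeTargetRegion` environment variable. Unlike the synchronous invocation in the Map state, it neither waits for the builds nor retries them.

```sh
#!/usr/bin/env bash
# Sketch: start one CodeBuild build per region, approximating the Map state fan-out.
# Assumptions: PROJECT_NAME and the NukeTargetRegion variable name are hypothetical.
PROJECT_NAME="${PROJECT_NAME:-AccountNuker-NukeCleanser}"

start_region_builds() {
  local region build_id
  for region in "$@"; do
    # start-build returns as soon as the build is queued, so the builds for
    # different regions run concurrently inside CodeBuild.
    build_id=$(aws codebuild start-build \
      --project-name "$PROJECT_NAME" \
      --environment-variables-override "name=NukeTargetRegion,value=${region},type=PLAINTEXT" \
      --query 'build.id' --output text) || return 1
    echo "${region} ${build_id}"
  done
}
```

For example, `start_region_builds us-west-1 us-east-1` prints one `region build-id` line per started build. Failure handling and retries are what the Step Functions version adds on top of this.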
## Prerequisites

1. https://github.com/rebuy-de/aws-nuke --> Open source tool owned by rebuy-de. Stage/download the aws-nuke binary to Artifactory or S3.

2. An AWS account alias must exist in the IAM dashboard of the target sandbox account for aws-nuke to work.

3. AWS CodeBuild project --> for runtime/compute.

4. AWS S3 bucket --> for storing the nuke config file and, if needed, the latest version of the aws-nuke binary. Make sure to source the latest aws-nuke binary into S3 (or your internal Artifactory).

5. AWS Step Functions --> for orchestrating the fan-out of parallel, multi-region CodeBuild invocations.

6. AWS SNS topic --> with an active email subscription to receive the CodeBuild job status and the daily detailed report of nuked resources.

7. AWS EventBridge rule --> configured with the required cron schedule to trigger the workflow periodically. Update the input parameters of the rule target with the required region list.

8. Make sure the VPC where this runs has sufficient network connectivity, because CodeBuild downloads the nuke binary from GitHub. In a restricted environment, upload the binary to the S3 bucket or Artifactory and reference it from there.
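Because aws-nuke depends on the account alias, a pre-flight check can prevent a failed run. The following is a minimal sketch that assumes the AWS CLI is configured for the target sandbox account. The function name and the example alias are hypothetical. With `--output text`, the CLI prints `None` when the query matches nothing, so the check treats that as "no alias".

```sh
#!/usr/bin/env bash
# Pre-flight check: succeed only if the current account has an IAM account alias.
# Pass an alias as $1 to skip the AWS call (useful for testing).
has_account_alias() {
  local alias="$1"
  if [ -z "$alias" ]; then
    alias=$(aws iam list-account-aliases --query 'AccountAliases[0]' --output text) || return 1
  fi
  # With --output text, a missing alias is printed as "None".
  [ -n "$alias" ] && [ "$alias" != "None" ]
}

# To create one (the alias name is a placeholder):
#   aws iam create-account-alias --account-alias my-sandbox-alias
```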
## Design

* CodeBuild provides a convenient way to spin up a container and run a script without having to provision and maintain resources. The whole project is defined in CloudFormation (nuke-cfn-stack.yaml), and the CodeBuild Project resource in that template holds the bulk of the configuration. In short, it builds a standard Amazon Linux Docker container, configures the log output channel, and assigns an IAM role. The nuke binary uses that role to assume the account roles (mentioned earlier) and start the cleanup based on the region and config file.

* AWS EventBridge provides both event-based and scheduled triggers for automated actions in other services. Here it is essentially a managed cron job that schedules the build. Like the CodeBuild project, the rule is a resource defined in the CloudFormation template. In short, it defines three things:
  - the schedule expression for the cron job (3:00 AM EST, Monday to Friday)
  - a role that allows the event trigger to run
  - the Step Functions workflow as its target

  The workflow then orchestrates the CodeBuild project invocations based on the supplied region list parameter and the nuke config file modified at runtime.

* AWS Step Functions gives this design scalability and achieves dynamic parallelism across accounts/regions using the Map state. Each region is handled by a separate invocation of the CodeBuild project, which receives the region parameter and the customized nuke config. A custom Python class inside the CodeBuild project dynamically updates the region attribute in the nuke config file required by the aws-nuke binary. The Map state fans out to all the required regions in the sandbox account, which saves time and scales to a large number of resources.
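EventBridge evaluates cron() rule schedules in UTC, so "3:00 AM EST, Monday to Friday" has to be converted. EST is UTC-5, and 3 + 5 = 8, which gives `cron(0 8 ? * MON-FRI *)`. The small helper below does this arithmetic. It is hypothetical and not part of the stack. It refuses hours that would cross midnight in UTC, because those would also shift the weekdays.

```sh
#!/usr/bin/env bash
# Converts a local hour at a fixed UTC offset into a Monday-Friday EventBridge cron() expression.
# EventBridge evaluates cron() rule schedules in UTC.
weekday_cron_utc() {
  local local_hour="$1" utc_offset="$2"   # e.g. 3 and -5 for 3:00 AM EST
  local utc_hour=$(( local_hour - utc_offset ))
  # Crossing midnight would also shift the weekdays, which this sketch does not handle.
  if (( utc_hour < 0 || utc_hour > 23 )); then
    echo "${local_hour}:00 at UTC${utc_offset} falls on a different UTC day" >&2
    return 1
  fi
  echo "cron(0 ${utc_hour} ? * MON-FRI *)"
}
```

During daylight saving time (EDT, UTC-4), the same expression fires at 4:00 AM local time. `aws events put-rule --name <rule-name> --schedule-expression "$(weekday_cron_utc 3 -5)"` would update an existing rule directly. Editing the cfn template instead keeps the stack as the source of truth.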
## Dry Runs vs Production

By default, this script takes no destructive action on any resources in your account(s). It logs the "dry run" output as if it had actually completed the specified actions. Once you have thoroughly tested it and whitelisted any resources you want to keep in your own aws-nuke-config.yaml, add the --no-dry-run flag to the aws-nuke command in this script to force a destructive run.

```sh
$ aws-nuke -c "$line.yaml" --force --no-dry-run --access-key-id "$ACCESS_KEY_ID" --secret-access-key "$SECRET_ACCESS_KEY" --session-token "$SESSION_TOKEN" | tee -a aws-nuke.log;
```
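Instead of editing the command by hand, the script can derive the flag from a dry-run setting. That way, the destructive flag is only added when the setting is explicitly `false`. The following is a minimal sketch. The function name and the true/false convention (borrowed from the `nuke_dry_run` input) are assumptions, and the credential flags are omitted for brevity.

```sh
#!/usr/bin/env bash
# Builds the aws-nuke arguments for one config file, one argument per line.
# Anything other than an explicit "false" keeps aws-nuke in dry-run mode.
build_nuke_args() {
  local config_file="$1" dry_run="${2:-true}"
  local args=(-c "$config_file" --force)
  if [ "$dry_run" = "false" ]; then
    args+=(--no-dry-run)   # only this flag makes aws-nuke actually delete resources
  fi
  printf '%s\n' "${args[@]}"
}
```

Here is an example, with `NUKE_DRY_RUN` as a hypothetical environment variable: `mapfile -t args < <(build_nuke_args "$line.yaml" "$NUKE_DRY_RUN")` followed by `aws-nuke "${args[@]}" | tee -a aws-nuke.log`. This keeps the default safe, because an unset or misspelled value still results in a dry run.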
## Setup and Installation

* Clone the repo.
* Determine the ID of the account to deploy the cleanup to (deploy this only to Dev/Test/Sandbox environments).
* Verify and update your nuke config file as needed, with specific filters for the resources/accounts.
* Deploy the stack using the command below. You can run it in any region. Replace the required parameter value with the ARN of the SNS topic for the notification email.
```sh
aws cloudformation create-stack --stack-name NukeCleanser --template-body file://nuke-cfn-stack.yaml --region us-east-2 --capabilities CAPABILITY_NAMED_IAM --parameters ParameterKey=NukeTopicArn,ParameterValue='arn:aws:sns:us-east-2:{ACCT_ID}:TestSNSTopic'
```
* Once the cfn template has created the S3 bucket, upload the generic nuke config file and the config update Python script:
```sh
aws s3 cp config/nuke_generic_config.yaml --region us-east-2 s3://nuke-account-cleanser-config
aws s3 cp config/nuke_config_update.py --region us-east-2 s3://nuke-account-cleanser-config
```
* Run the workflow manually by starting the Step Functions state machine with the sample input payload below. The same payload is preconfigured as a constant JSON input on the EventBridge target. To run in parallel on the required number of regions, update the region_list parameter.

```json
{
  "InputPayLoad": {
    "nuke_dry_run": "true",
    "nuke_version": "2.5",
    "nuke_config_bucket": "nuke-account-cleanser-config",
    "sns_notification_arn": "sns_topic_arn",
    "region_list": [
      "us-west-1",
      "us-east-1"
    ]
  }
}
```
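You can also generate the payload for any region list and pass it to the state machine from the AWS CLI. This is a sketch with stated assumptions. The state machine ARN is a placeholder that you need to look up in the stack's resources. The bucket and topic values are copied from the sample payload above. Both function names are hypothetical.

```sh
#!/usr/bin/env bash
# Prints an InputPayLoad document for the given dry-run flag and regions.
make_nuke_payload() {
  local dry_run="$1"; shift
  local regions="" region
  for region in "$@"; do
    regions+="${regions:+, }\"${region}\""   # comma-separate after the first entry
  done
  cat <<EOF
{
  "InputPayLoad": {
    "nuke_dry_run": "${dry_run}",
    "nuke_version": "2.5",
    "nuke_config_bucket": "nuke-account-cleanser-config",
    "sns_notification_arn": "sns_topic_arn",
    "region_list": [${regions}]
  }
}
EOF
}

# Starts one execution; STATE_MACHINE_ARN is a placeholder you must set first.
start_nuke_execution() {
  make_nuke_payload "$@" > nuke-input.json
  aws stepfunctions start-execution \
    --state-machine-arn "${STATE_MACHINE_ARN:?set STATE_MACHINE_ARN first}" \
    --input file://nuke-input.json
}
```

For example, `STATE_MACHINE_ARN=<your-arn> start_nuke_execution true us-west-1 us-east-1` starts a dry run across two regions.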
* The tool is currently configured to run on a schedule, typically off hours at 3:00 AM EST. You can change this with a rate() or cron() expression by editing the cfn template file.

* After the job succeeds for each region, the workflow also sends a detailed report of the deleted resources to an SNS topic with an active email subscription. This saves you from traversing and parsing the complex logs emitted by the aws-nuke binary.

* If the workflow is successful, the stack sends out:
  - One email for each region where the nuke CodeBuild job was invoked. It contains details of the build execution, the list of deleted resources, and the log file path.
  - Another email from the Step Functions workflow when the whole Map state process completes successfully. A sample email is shown below.

```text
Account Cleansing Process Completed;

------------------------------------------------------------------
Summary of the process:
------------------------------------------------------------------
DryRunMode : true
Account ID : 123456789012
Target Region : us-west-1
Build State : JOB SUCCEEDED
Build ID : AccountNuker-NukeCleanser:4509a9b5
CodeBuild Project Name : AccountNuker-NukeCleanser
Process Start Time : Thu Dec 2 02:04:40 UTC 2021
Process End Time : Thu Dec 2 02:06:45 UTC 2021
Log Stream Path : AccountNuker-NukeCleanser/logPath
------------------------------------------------------------------
################ Removed the following resources #################
```
* By default the stack runs aws-nuke in dry-run mode. To actually delete resources, update the stack with the AWSNukeDryRunFlag parameter set to false. Alternatively, update it manually in the environment variables section of the CodeBuild project.
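You could flip the flag from the CLI as sketched below. The sketch assumes the stack name `NukeCleanser` from the deploy command. It also assumes that `NukeTopicArn` and `AWSNukeDryRunFlag` are the template's only parameters. `update-stack` resets any parameter you leave out to its default, so if the template defines others, list each one with `UsePreviousValue=true`. The function name and the `AWS_CMD` override are hypothetical.

```sh
#!/usr/bin/env bash
# Switches the stack between dry-run ("true") and destructive ("false") mode.
# Set AWS_CMD=echo to preview the call without running it.
set_nuke_dry_run() {
  local flag="$1" stack="${2:-NukeCleanser}"
  case "$flag" in
    true|false) ;;
    *) echo "flag must be true or false" >&2; return 1 ;;
  esac
  ${AWS_CMD:-aws} cloudformation update-stack \
    --stack-name "$stack" \
    --use-previous-template \
    --capabilities CAPABILITY_NAMED_IAM \
    --parameters "ParameterKey=AWSNukeDryRunFlag,ParameterValue=${flag}" \
                 "ParameterKey=NukeTopicArn,UsePreviousValue=true"
}
```

After `set_nuke_dry_run false`, run `aws cloudformation wait stack-update-complete --stack-name NukeCleanser` to confirm the update finished before the next scheduled run.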
## Monitoring queries

* Using the AWS CLI (CloudWatch Logs)

```sh
aws logs filter-log-events \
    --log-group-name AccountNuker-nuke-auto-account-cleanser \
    --start-time 1628838256000 --end-time 1628839216000 \
    --log-stream-names "10409c89-a90f-4af7-9642-0df9bc9f0855" \
    --filter-pattern removed \
    --no-interleaved \
    --output text \
    --limit 5
```
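`filter-log-events` expects `--start-time` and `--end-time` in epoch milliseconds, as in the hard-coded values above. The hypothetical helper below computes a window covering the last N hours, so you don't have to paste timestamps by hand.

```sh
#!/usr/bin/env bash
# Prints "<start> <end>" in epoch milliseconds for a window of N hours
# ending now (or at $2, given in epoch seconds).
log_window_ms() {
  local hours="$1" now="${2:-$(date +%s)}"
  echo "$(( (now - hours * 3600) * 1000 )) $(( now * 1000 ))"
}

# Example: "removed" events from the last 24 hours.
recent_removals() {
  local start end
  read -r start end < <(log_window_ms 24)
  aws logs filter-log-events \
    --log-group-name AccountNuker-nuke-auto-account-cleanser \
    --start-time "$start" --end-time "$end" \
    --filter-pattern removed --output text
}
```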
* Using awslogs to analyze output from aws-nuke runs

```sh
awslogs get AccountNuker-nuke-auto-account-cleanser --filter-pattern '"Scan complete: "' --start='1d ago' --timestamp
awslogs get AccountNuker-nuke-auto-account-cleanser --filter-pattern '"Error: failed"' --start='1d ago' | sort -u
awslogs get AccountNuker-nuke-auto-account-cleanser --filter-pattern '"Removal requested: 0 waiting"' --start='1d ago' | sort -u
awslogs get AccountNuker-nuke-auto-account-cleanser --filter-pattern '"AccessDenied"' --start='1d ago' | sort -u | wc -l
```
* Using CloudWatch Logs Insights queries. These filter on CloudTrail event fields such as userIdentity and errorCode, so they assume the account's CloudTrail events are delivered to a CloudWatch Logs log group.

```sh
fields @timestamp, @message
| filter userIdentity.sessionContext.sessionIssuer.userName = "nuke-auto-account-cleanser" and ispresent(errorCode)
| sort @timestamp desc
| limit 500

fields @timestamp, @message
| filter ispresent(errorCode) and userIdentity.sessionContext.sessionIssuer.userName = "nuke-auto-account-cleanser"
  and errorCode != "AccessDenied" and eventName like "Delete"
| sort @timestamp desc
| limit 500

fields @timestamp, @message
| filter ispresent(errorCode) and userIdentity.sessionContext.sessionIssuer.userName = "nuke-auto-account-cleanser"
  and errorCode == "AccessDenied" and eventName like "Delete"
| sort @timestamp desc
| limit 500
```
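You can also run these queries without the console. Unlike `filter-log-events`, `start-query` takes epoch seconds rather than milliseconds. The following sketch assumes you pass the name of the log group that receives the CloudTrail events. The function name is hypothetical.

```sh
#!/usr/bin/env bash
# Runs one CloudWatch Logs Insights query over the last N hours and prints the JSON results.
# The log group name is an assumption: pass the group that receives the CloudTrail events.
run_insights_query() {
  local log_group="$1" hours="$2" query="$3"
  local start end query_id status
  end=$(date +%s)                    # start-query expects epoch seconds
  start=$(( end - hours * 3600 ))
  query_id=$(aws logs start-query \
    --log-group-name "$log_group" \
    --start-time "$start" --end-time "$end" \
    --query-string "$query" \
    --query queryId --output text) || return 1
  # Poll until the query is no longer Scheduled or Running.
  while true; do
    status=$(aws logs get-query-results --query-id "$query_id" \
      --query status --output text) || return 1
    case "$status" in
      Scheduled|Running) sleep 2 ;;
      *) break ;;
    esac
  done
  aws logs get-query-results --query-id "$query_id" --output json
}
```

For example, `run_insights_query <cloudtrail-log-group> 24 'fields @timestamp, @message | limit 20'` returns the matching events from the last day.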

## Security
