A Terraform module that automatically handles AWS ECS Fargate Spot interruption notices by forcing a new deployment of the affected services. Replacement tasks can then start before the interrupted ones are stopped, which helps keep services available when Spot capacity is reclaimed.
- Automatic Spot Interruption Handling: Monitors AWS EventBridge for ECS Fargate Spot termination events
- Proactive Service Redeployment: Triggers forced redeployments before hard interruption occurs
- Comprehensive Error Handling: Implements retry logic with exponential backoff for resilient operation
- Structured Logging: Provides detailed JSON-formatted logs for monitoring and debugging
- Minimal Permissions: Follows the principle of least privilege for security
- Configurable: Supports customization of Lambda function settings and resource naming
graph TB
A[ECS Fargate Spot Instance] -->|Interruption Notice| B[AWS EventBridge]
B -->|Filtered Event| C[Lambda Function]
C -->|Force New Deployment| D[ECS Service]
C -->|Logs| E[CloudWatch Logs]
subgraph "Terraform Module"
F[EventBridge Rule]
G[Lambda Function]
H[IAM Role & Policies]
I[CloudWatch Log Group]
end
B -.-> F
C -.-> G
G -.-> H
E -.-> I
The module creates:
- EventBridge Rule: Filters ECS Task State Change events with stopCode: "SpotInterruption"
- Lambda Function: Processes interruption events and triggers service redeployments
- IAM Role & Policies: Provides minimal required permissions for ECS operations
- CloudWatch Log Group: Centralized logging with configurable retention
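The filtering done by the EventBridge rule can be pictured as a small matching function. The sketch below is illustrative only: SPOT_INTERRUPTION_PATTERN is an assumption about what the rule's pattern looks like (the module's actual pattern may contain more or different fields), and matches is a hypothetical helper that supports only exact-value lists and nested objects, not the full EventBridge pattern language.

```python
# Assumed shape of the rule's event pattern; not taken from the module source.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Task State Change"],
    "detail": {"stopCode": ["SpotInterruption"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified pattern matching: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True
```

Fields that the pattern does not mention, such as group or clusterArn, are ignored by the match, so any service's interrupted task passes the filter.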
module "ecs_spot_handler" {
source = "path/to/terraform-aws-ecs-fargate-spot-handler"
# Optional: Customize function name
lambda_function_name = "my-spot-handler"
# Optional: Add resource tags
tags = {
Environment = "production"
Team = "platform"
}
}
module "ecs_spot_handler" {
source = "path/to/terraform-aws-ecs-fargate-spot-handler"
# Lambda Configuration
lambda_function_name = "custom-spot-handler"
lambda_timeout = 120
lambda_memory_size = 256
lambda_reserved_concurrent_executions = 10
log_level = "DEBUG"
# CloudWatch Configuration
log_retention_days = 30
# EventBridge Configuration
eventbridge_rule_name = "custom-spot-rule"
eventbridge_rule_state = "ENABLED"
# Resource Naming
name_prefix = "prod"
# Tags
tags = {
Environment = "production"
Team = "platform"
Module = "ecs-spot-handler"
}
}
# Production Environment
module "ecs_spot_handler_prod" {
source = "path/to/terraform-aws-ecs-fargate-spot-handler"
name_prefix = "prod"
lambda_function_name = "ecs-spot-handler"
lambda_timeout = 90
lambda_memory_size = 256
log_retention_days = 30
log_level = "INFO"
tags = {
Environment = "production"
Team = "platform"
}
}
# Staging Environment
module "ecs_spot_handler_staging" {
source = "path/to/terraform-aws-ecs-fargate-spot-handler"
name_prefix = "staging"
lambda_function_name = "ecs-spot-handler"
lambda_timeout = 60
lambda_memory_size = 128
log_retention_days = 14
log_level = "DEBUG"
tags = {
Environment = "staging"
Team = "platform"
}
}
Name | Version |
---|---|
terraform | >= 1.0 |
aws | >= 5.0 |
Name | Version |
---|---|
aws | >= 5.0 |
Name | Source | Version |
---|---|---|
spot_handler_lambda | terraform-aws-modules/lambda/aws | ~> 8.0 |
Name | Type |
---|---|
aws_cloudwatch_event_rule.spot_interruption | resource |
aws_cloudwatch_event_target.lambda_target | resource |
aws_lambda_permission.allow_eventbridge | resource |
aws_iam_policy_document.ecs_operations | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
eventbridge_rule_name | Name of the EventBridge rule | string | "ecs-spot-interruption" | no |
eventbridge_rule_state | State of the EventBridge rule (ENABLED or DISABLED) | string | "ENABLED" | no |
lambda_function_name | Name of the Lambda function | string | "ecs-fargate-spot-handler" | no |
lambda_memory_size | Lambda function memory size in MB | number | 128 | no |
lambda_reserved_concurrent_executions | Reserved concurrent executions for the Lambda function. Set to -1 for unreserved | number | -1 | no |
lambda_timeout | Lambda function timeout in seconds | number | 60 | no |
log_level | Log level for Lambda function | string | "INFO" | no |
log_retention_days | CloudWatch log retention period in days | number | 14 | no |
name_prefix | Prefix for resource names. If empty, no prefix will be used | string | "" | no |
tags | A map of tags to apply to all resources | map(string) | {} | no |
Name | Description |
---|---|
cloudwatch_log_group_arn | The Amazon Resource Name (ARN) of the CloudWatch log group |
cloudwatch_log_group_name | The name of the CloudWatch log group for the Lambda function |
eventbridge_rule_arn | The Amazon Resource Name (ARN) of the EventBridge rule |
eventbridge_rule_id | The ID of the EventBridge rule |
eventbridge_rule_name | The name of the EventBridge rule |
eventbridge_target_id | The ID of the EventBridge target |
lambda_execution_role_arn | The Amazon Resource Name (ARN) of the Lambda execution role |
lambda_execution_role_name | The name of the Lambda execution role |
lambda_execution_role_unique_id | The unique ID of the Lambda execution role |
lambda_function_arn | The Amazon Resource Name (ARN) of the Lambda function |
lambda_function_invoke_arn | The invoke ARN of the Lambda function, used for API Gateway integration |
lambda_function_name | The name of the Lambda function |
lambda_function_qualified_arn | The qualified ARN of the Lambda function (includes version) |
lambda_function_version | The version of the Lambda function |
module_name | The name of this Terraform module |
module_version | The version of this Terraform module |
- Spot Interruption Detection: AWS sends ECS Task State Change events to EventBridge when Spot instances receive termination notices
- Event Filtering: The EventBridge rule filters for events with stopCode: "SpotInterruption"
- Lambda Invocation: Matching events trigger the Lambda function
- Event Validation: Lambda validates the event structure and extracts cluster/service information
- Service Redeployment: Lambda calls ECS UpdateService with forceNewDeployment=True
- Error Handling: Comprehensive error handling with retry logic ensures reliability
The Lambda function processes EventBridge events with the following structure:
{
"version": "0",
"id": "9bcdac79-b31f-4d3d-9410-fbd727c29fab",
"detail-type": "ECS Task State Change",
"source": "aws.ecs",
"account": "111122223333",
"time": "2023-01-01T12:00:00Z",
"region": "us-east-1",
"resources": [
"arn:aws:ecs:us-east-1:111122223333:task/b99d40b3-5176-4f71-9a52-9dbd6f1cebef"
],
"detail": {
"clusterArn": "arn:aws:ecs:us-east-1:111122223333:cluster/default",
"stopCode": "SpotInterruption",
"group": "service:my-service"
}
}
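Given an event of this shape, the validation and redeployment steps might look roughly like the sketch below. It is not the module's actual handler: parse_spot_event, handle, and the returned status strings are hypothetical names chosen for illustration. The only AWS call assumed is boto3's ECS update_service with forceNewDeployment=True, and the client is passed in so the logic can be exercised without AWS access.

```python
from typing import Optional, Tuple


def parse_spot_event(event: dict) -> Optional[Tuple[str, str]]:
    """Return (cluster_arn, service_name), or None if no service owns the task."""
    detail = event.get("detail") or {}
    cluster_arn = detail.get("clusterArn")
    group = detail.get("group") or ""
    if not cluster_arn:
        raise ValueError("event is missing detail.clusterArn")
    # Tasks started by a service carry group "service:<name>"; others do not.
    if not group.startswith("service:"):
        return None
    return cluster_arn, group[len("service:"):]


def handle(event: dict, ecs_client) -> dict:
    """Force a new deployment of the service that owned the interrupted task."""
    target = parse_spot_event(event)
    if target is None:
        return {"status": "skipped"}
    cluster_arn, service_name = target
    ecs_client.update_service(
        cluster=cluster_arn,
        service=service_name,
        forceNewDeployment=True,
    )
    return {"status": "redeployed", "cluster": cluster_arn, "service": service_name}
```

Inside Lambda, ecs_client would typically be boto3.client("ecs"); in a unit test it can be any object exposing an update_service method.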
The Lambda function requires the following minimal IAM permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:DescribeServices",
"ecs:UpdateService",
"ecs:DescribeTasks"
],
"Resource": "*"
}
]
}
The Lambda function provides structured JSON logging with the following fields:
- timestamp: ISO 8601 timestamp
- level: Log level (DEBUG, INFO, WARNING, ERROR)
- message: Human-readable message
- function: Function name where log was generated
- line: Line number
- cluster: ECS cluster ARN (when applicable)
- service: ECS service name (when applicable)
- task_arn: ECS task ARN (when applicable)
- request_id: Lambda request ID for correlation
{
"timestamp": "2023-01-01T12:00:00.000Z",
"level": "INFO",
"message": "Service redeployment triggered successfully",
"function": "trigger_service_redeployment",
"line": 245,
"cluster": "arn:aws:ecs:us-east-1:111122223333:cluster/default",
"service": "my-service",
"deployment_id": "arn:aws:ecs:us-east-1:111122223333:service/default/my-service/deployment/123456789"
}
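Log lines in this format can be produced with a custom formatter on Python's standard logging module. The following is a minimal sketch, not the module's actual logging code: JsonFormatter, build_logger, and CONTEXT_FIELDS are hypothetical names, and the sketch assumes that context values such as cluster or service are supplied through the extra= argument of the logging call.

```python
import json
import logging
from datetime import datetime, timezone

# Optional context keys copied into the JSON record when passed via extra=.
CONTEXT_FIELDS = ("cluster", "service", "task_arn", "request_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "message": record.getMessage(),
            "function": record.funcName,
            "line": record.lineno,
        }
        # Include only the context fields that were actually provided.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


def build_logger(name: str = "spot_handler", level: str = "INFO") -> logging.Logger:
    """Create a logger that writes JSON lines to the default stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger
```

A call such as build_logger().info("Service redeployment triggered successfully", extra={"service": "my-service"}) would then emit one JSON object per line, which CloudWatch Logs Insights can query by field.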
Monitor the following CloudWatch metrics:
- Lambda Function Metrics:
  - AWS/Lambda/Invocations: Number of function invocations
  - AWS/Lambda/Errors: Number of function errors
  - AWS/Lambda/Duration: Function execution duration
  - AWS/Lambda/Throttles: Number of throttled invocations
- EventBridge Metrics:
  - AWS/Events/MatchedEvents: Number of events matching the rule
  - AWS/Events/Invocations: Number of target invocations
  - AWS/Events/FailedInvocations: Number of failed invocations
The module implements comprehensive error handling:
- Exponential Backoff: Retries with exponential backoff and jitter
- Maximum Retries: Up to 3 retry attempts for transient errors
- Non-Retryable Errors: Immediate failure for permission and validation errors
- Service Not Found: Logged as warning, returns success (service may have been deleted)
- Cluster Not Found: Logged as error, returns failure
- Access Denied: Logged as error, returns failure
- Throttling: Automatic retry with exponential backoff
- Network Errors: Automatic retry with exponential backoff
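The retry behaviour described above (a bounded number of retries, exponential backoff with jitter, and immediate failure for non-retryable errors) can be sketched as follows. This is an illustration under assumptions, not the module's code: call_with_retries is a hypothetical helper, the base_delay and max_delay defaults are invented for the example, and the caller decides which exceptions count as transient through the is_retryable predicate.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 3,
    base_delay: float = 0.5,   # assumed value, in seconds
    max_delay: float = 8.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with full-jitter backoff."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            # Give up on non-retryable errors or once the retry budget is spent.
            if attempt >= max_retries or not is_retryable(exc):
                raise
            # The delay ceiling doubles per attempt, bounded by max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random amount between 0 and the ceiling.
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

With max_retries=3 the operation runs at most four times. Passing sleep as a parameter keeps the helper easy to test without real waiting.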
- Principle of Least Privilege: Only grants necessary ECS permissions
- Resource Scoping: Permissions apply to all resources (required for cross-service operation)
- Managed Policies: Uses AWS managed policies where appropriate
- VPC Configuration: Lambda function can be configured to run in VPC if required
- Security Groups: Standard Lambda security group rules apply
- Encryption: All logs are encrypted at rest using CloudWatch default encryption
- Lambda Function Not Triggered
  - Check EventBridge rule is enabled
  - Verify event pattern matches actual ECS events
  - Check Lambda permissions for EventBridge invocation
- Permission Denied Errors
  - Verify IAM role has required ECS permissions
  - Check if Lambda execution role is properly attached
  - Ensure ECS resources exist in the same account/region
- Service Redeployment Fails
  - Check if ECS service exists
  - Verify cluster ARN is correct
  - Ensure service is not already updating
Enable debug logging by setting log_level = "DEBUG":
module "ecs_spot_handler" {
source = "path/to/terraform-aws-ecs-fargate-spot-handler"
log_level = "DEBUG"
}
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run terraform fmt and terraform validate
- Submit a pull request
This module is licensed under the MIT License. See LICENSE for details.
Created and maintained by FivexL.
provider "aws" {
region = var.aws_region
}
# Basic usage of the ECS Fargate Spot Handler module
module "ecs_spot_handler" {
source = "../../"
# Optional: Customize function name
lambda_function_name = var.lambda_function_name
# Optional: Add resource tags
tags = var.tags
}
Name | Version |
---|---|
terraform | >= 1.0 |
aws | >= 5.0 |
Name | Version |
---|---|
aws | 6.10.0 |