AWS Lambda function to check and update person records with Wikipedia data.
This service maintains person records in DynamoDB by fetching and updating birth dates, death dates, and ages from Wikipedia/Wikidata. It runs on a nightly schedule using EventBridge.
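The age field has to stay consistent with the birth and death dates. Below is a minimal sketch of that calculation. It assumes the dates are already parsed into `datetime.date` objects, and the function name `age_in_years` is hypothetical, not taken from `src/`.

```python
from datetime import date
from typing import Optional


def age_in_years(birth: date, death: Optional[date] = None,
                 today: Optional[date] = None) -> int:
    """Return whole years lived: up to the death date if known, else up to today."""
    end = death or today or date.today()
    if end < birth:
        raise ValueError("end date precedes birth date")
    # Compare (month, day) tuples to see whether the birthday has occurred in end's year.
    had_birthday = (end.month, end.day) >= (birth.month, birth.day)
    return end.year - birth.year - (0 if had_birthday else 1)
```

For example, `age_in_years(date(1952, 3, 11), death=date(2001, 5, 11))` returns 49. The `today` parameter makes results for living people reproducible in tests.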
Project structure:
.
├── docs/
│   └── architecture.md      # Detailed architecture documentation
├── src/
│   ├── lambda_function.py   # Main Lambda handler
│   └── utils/
│       ├── dynamo.py        # DynamoDB operations
│       └── wiki.py          # Wikipedia/Wikidata operations
├── tests/                   # Unit and integration tests
├── requirements.txt         # Python dependencies
└── template.yaml            # AWS SAM template
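`wiki.py` handles the Wikipedia/Wikidata lookups. This is a sketch of one way to read birth and death dates from a Wikidata entity. It assumes the JSON shape returned by the public `wbgetentities` API and uses the real Wikidata properties P569 (date of birth) and P570 (date of death). The helper names, the User-Agent string, and the choice to accept only day-precision dates are assumptions, not the project's code.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Any, Dict, Optional

API = "https://www.wikidata.org/w/api.php"
DATE_OF_BIRTH = "P569"  # Wikidata property: date of birth
DATE_OF_DEATH = "P570"  # Wikidata property: date of death
DAY_PRECISION = 11      # Wikidata time precision code for a full calendar date


def fetch_entity(qid: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch one entity's claims via wbgetentities (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"})
    request = urllib.request.Request(
        f"{API}?{query}",
        # Wikimedia asks clients for a descriptive User-Agent; this one is a placeholder.
        headers={"User-Agent": "deadpool-status-checker-sketch/0.1 (replace-with-contact)"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["entities"][qid]


def claim_date(entity: Dict[str, Any], prop: str) -> Optional[date]:
    """Return the first usable day-precision date for prop, trying preferred-rank claims first."""
    claims = entity.get("claims", {}).get(prop, [])
    # False sorts before True, so preferred-rank claims are examined first.
    for claim in sorted(claims, key=lambda c: c.get("rank") != "preferred"):
        if claim.get("rank") == "deprecated":
            continue
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # "somevalue"/"novalue" snaks carry no date
        value = snak["datavalue"]["value"]
        if value.get("precision", 0) < DAY_PRECISION:
            continue  # year- or month-only dates cannot give an exact age
        time_str = value["time"]  # e.g. "+1952-03-11T00:00:00Z"
        if not time_str.startswith("+"):
            continue  # BCE dates are out of scope for this sketch
        year, month, day = time_str[1:].split("T", 1)[0].split("-")
        return date(int(year), int(month), int(day))
    return None
```

Usage: `entity = fetch_entity("Q42")`, then `claim_date(entity, DATE_OF_DEATH)` returns a `date` or `None` for a living person.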
Prerequisites:
- AWS CLI configured with appropriate credentials
- Python 3.9+
- AWS SAM CLI for local testing and deployment
- DynamoDB table with the required schema (see docs/architecture.md)
Setup and deployment:
- Create a Python virtual environment:
  python -m venv venv
  source venv/bin/activate  # Unix
  venv\Scripts\activate     # Windows
- Install dependencies:
  pip install -r requirements.txt
- Configure the local environment:
  cp .env.example .env  # then edit .env with your configuration
- Build the SAM application (do this before testing locally so dependencies are packaged):
  sam build
- Test the Lambda function locally:
  sam local invoke -e events/schedule.json
- Deploy to AWS:
  sam deploy --guided
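`sam local invoke -e events/schedule.json` needs a sample event on disk. The SAM CLI's `sam local generate-event` command can produce one. As an alternative, this sketch writes a payload shaped like an EventBridge "Scheduled Event". The account ID, region, and rule name are placeholders, and the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def scheduled_event(account: str = "123456789012", region: str = "us-east-1",
                    rule_name: str = "deadpool-daily-check") -> Dict[str, Any]:
    """Build a payload shaped like an EventBridge 'Scheduled Event' (placeholder IDs)."""
    return {
        "version": "0",
        "id": "00000000-0000-0000-0000-000000000000",
        "detail-type": "Scheduled Event",
        "source": "aws.events",
        "account": account,
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "region": region,
        "resources": [f"arn:aws:events:{region}:{account}:rule/{rule_name}"],
        "detail": {},  # scheduled events carry an empty detail object
    }


def write_event(path: str = "events/schedule.json") -> Path:
    """Write the sample event to the path the sam local invoke command above reads."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(scheduled_event(), indent=2) + "\n")
    return target
```

Run `write_event()` once from the project root, then invoke locally as shown in the setup steps.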
Environment variables:
- BATCH_SIZE: Number of records to process in each batch (default: 25)
- TABLE_NAME: DynamoDB table name
- LOG_LEVEL: Logging level (default: INFO)
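This sketch shows how a handler might read the three variables with their documented defaults and split work into BATCH_SIZE chunks. The `Config` class and helper names are hypothetical. The default of 25 matches the maximum number of items DynamoDB's `BatchWriteItem` accepts per request, which may be why it was chosen. That link is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Iterator, List, Mapping, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Config:
    table_name: str
    batch_size: int = 25
    log_level: str = "INFO"


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read TABLE_NAME, BATCH_SIZE and LOG_LEVEL, applying the documented defaults."""
    table_name = env.get("TABLE_NAME")
    if not table_name:
        raise RuntimeError("TABLE_NAME must be set")
    batch_size = int(env.get("BATCH_SIZE", "25"))
    if batch_size < 1:
        raise ValueError("BATCH_SIZE must be a positive integer")
    return Config(table_name, batch_size, env.get("LOG_LEVEL", "INFO").upper())


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most size items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

`logging.getLogger().setLevel(config.log_level)` accepts level names such as "INFO", so the value can be passed through directly.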
The Lambda function is scheduled using Amazon EventBridge (CloudWatch Events). The schedule configuration is defined in template.yaml:
- Runs daily at midnight UTC: cron(0 0 * * ? *)
- Automatic retries: at most 2 retry attempts on failure
- Timeout: 300 seconds (5 minutes)
To change the schedule:
- Edit the cron expression in template.yaml:
  Events:
    DailyCheck:
      Type: Schedule
      Properties:
        Schedule: cron(0 0 * * ? *)  # Modify this expression
  Common cron patterns:
  - Daily at midnight: cron(0 0 * * ? *)
  - Every 6 hours: cron(0 */6 * * ? *)
  - Every hour: cron(0 * * * ? *)
  - Every 5 minutes: cron(0/5 * * * ? *)
- Adjust the retry policy if needed. RetryPolicy belongs under the same event's Properties, next to Schedule:
  RetryPolicy:
    MaximumRetryAttempts: 2  # Modify retry attempts
- Deploy the changes:
  sam deploy
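EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), unlike five-field Unix cron, and one of the two day fields must be `?`. This is a sketch of a quick pre-deploy check. It is a hypothetical helper, not an AWS API, and it checks structure only, not value ranges.

```python
import re
from typing import List

# EventBridge cron fields, in order. Unix cron has only the first five and no '?'.
FIELDS = ("minutes", "hours", "day-of-month", "month", "day-of-week", "year")


def check_eventbridge_cron(expr: str) -> List[str]:
    """Return structural problems in an EventBridge cron() expression; [] means none found.

    Only checks the cron(...) wrapper, the field count and the '?' rule; it does
    not validate value ranges, names (JAN, MON-FRI) or special characters.
    """
    match = re.fullmatch(r"cron\((.*)\)", expr.strip())
    if match is None:
        return ["expected the form cron(<six fields>)"]
    fields = match.group(1).split()
    if len(fields) != len(FIELDS):
        return [f"expected {len(FIELDS)} fields ({' '.join(FIELDS)}), got {len(fields)}"]
    day_of_month, day_of_week = fields[2], fields[4]
    if "?" not in (day_of_month, day_of_week):
        return ["day-of-month and day-of-week cannot both be set; use '?' in one of them"]
    return []
```

All four common patterns listed above pass this check, while a Unix-style `cron(0 0 * * *)` is rejected for having five fields.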
You can also trigger the function manually through:
- the AWS Console
- the AWS CLI:
  aws lambda invoke --function-name deadpool-status-checker output.json
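The same manual trigger can be scripted with boto3's Lambda client `invoke` call. This is a minimal sketch. The helper name and the empty default payload are assumptions. The client is passed in, so you would call it as `invoke_checker(boto3.client("lambda"))`, and tests can substitute a fake.

```python
import json
from typing import Any, Dict, Optional


def invoke_checker(client: Any, function_name: str = "deadpool-status-checker",
                   payload: Optional[Dict[str, Any]] = None) -> Any:
    """Synchronously invoke the function and return its decoded JSON response."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result, like the CLI default
        Payload=json.dumps(payload or {}).encode("utf-8"),
    )
    body = response["Payload"].read()  # botocore StreamingBody
    # Lambda reports handler errors via the FunctionError field, not an HTTP error.
    if "FunctionError" in response:
        raise RuntimeError(f"{function_name} failed: {body.decode('utf-8', 'replace')}")
    return json.loads(body) if body else None
```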
The Lambda function logs to the /aws/lambda/deadpool-status-checker log group. The following CloudWatch Logs Insights queries are useful for monitoring different aspects of the service:
- Death Discoveries (New Deaths):
fields @timestamp, @message
| filter @message like "Found death date"
| parse @message "Found death date * for *" as death_date, person_name
| sort @timestamp desc
- Failed Wiki Lookups:
fields @timestamp, @message
| filter @message like "No Wiki ID available"
| parse @message "No Wiki ID available for * (WikiPage: *)" as person_name, wiki_page
| sort @timestamp desc
- Non-Death Updates (Age Changes):
fields @timestamp, @message
| filter @message like "Updated age"
| parse @message "Updated age from * to * for *" as old_age, new_age, person_name
| sort @timestamp desc
- Processing Errors:
fields @timestamp, @message
| filter level = "ERROR"
| sort @timestamp desc
- Execution Statistics:
fields @timestamp, @message
| filter @message like "Execution complete"
| parse @message "Execution complete - Duration: *s, Processed: *, Updated: *, Failed: *" as duration, processed, updated, failed
| sort @timestamp desc
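These queries only match while the handler's message text matches each `parse` pattern exactly. This sketch keeps the message templates in one place and adds a rough local stand-in for the glob `parse`, so a unit test can catch wording drift. It assumes the handler logs plain strings through Python's `logging`. The template constants, `emit`, and `parse_like_insights` are hypothetical, and the regex is only an approximation of Logs Insights behaviour.

```python
import logging
import re
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

# Message templates; each must keep the literal text the Logs Insights parse patterns expect.
DEATH_FOUND = "Found death date {death_date} for {person_name}"
NO_WIKI_ID = "No Wiki ID available for {person_name} (WikiPage: {wiki_page})"
AGE_UPDATED = "Updated age from {old_age} to {new_age} for {person_name}"
EXECUTION_DONE = ("Execution complete - Duration: {duration}s, Processed: {processed}, "
                  "Updated: {updated}, Failed: {failed}")


def emit(template: str, **fields: object) -> str:
    """Format a template, log it at INFO and return the message."""
    message = template.format(**fields)
    logger.info(message)
    return message


def parse_like_insights(pattern: str, names: List[str], message: str) -> Optional[Dict[str, str]]:
    """Rough local approximation of a glob parse: each * becomes a lazy capture group."""
    regex = "(.*?)".join(re.escape(part) for part in pattern.split("*")) + "$"
    match = re.search(regex, message)
    return dict(zip(names, match.groups())) if match else None
```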
Additional monitoring:
- Custom metrics for tracking processing
- CloudWatch Alarms configured for error rates and duration

Recent changes:
- Added logging for person updates, including death dates, age changes, and wiki lookup failures
- Implemented retry logic for API calls with exponential backoff
- Added detailed execution statistics logging
- Enhanced error handling and reporting
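The notes above mention retrying API calls with exponential backoff but give no parameters. This is a generic sketch using capped exponential backoff with "full jitter", which is one common variant and not necessarily the one the service uses. The helper name and default values (4 attempts, 0.5 s base, 8 s cap) are hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], *, attempts: int = 4, base_delay: float = 0.5,
                 max_delay: float = 8.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call, retrying failures in retry_on with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Ceiling doubles per attempt (0.5 s, 1 s, 2 s with the defaults), capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # full jitter: random delay in [0, ceiling]
    raise AssertionError("unreachable")
```

Wrap any zero-argument callable, such as a lambda around a Wikidata HTTP request or a DynamoDB call. With the defaults, the worst-case total sleep is 0.5 + 1 + 2 = 3.5 seconds, well inside the function's 300-second timeout. Narrow `retry_on` to transient errors so that bugs are not retried.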