This is a sample project that demonstrates how to update a DynamoDB table with AWS Glue. The table below has no pokemon category column, so the job needs to read the data and update each row with the correct category from the pokemon API.
number | name |
---|---|
1 | Bulbasaur |
2 | Ivysaur |
3 | Venusaur |
4 | Charmander |
5 | Charmeleon |
6 | Charizard |
Create a DynamoDB table with the following schema.
field | description |
---|---|
number | The pokemon number |
name | The pokemon name |
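One way to create and seed this table is with boto3. This is a minimal sketch under assumptions the text above does not confirm: the table name `pokemon` is hypothetical, `number` is assumed to be a numeric partition key, and on-demand billing is an arbitrary choice. Only key attributes are declared, since DynamoDB does not need non-key fields such as `name` in the table definition.

```python
def table_definition(table_name="pokemon"):
    """Build keyword arguments for DynamoDB's CreateTable call.

    Assumption: `number` is the partition key and is stored as a Number.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "number", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "number", "KeyType": "HASH"},
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def sample_items():
    """Return the six sample rows, without a category."""
    names = ["Bulbasaur", "Ivysaur", "Venusaur",
             "Charmander", "Charmeleon", "Charizard"]
    return [{"number": i, "name": name} for i, name in enumerate(names, start=1)]


def create_and_seed(table_name="pokemon"):
    """Create the table and load the sample rows (needs AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.create_table(**table_definition(table_name))
    table.wait_until_exists()
    with table.batch_writer() as batch:
        for item in sample_items():
            batch.put_item(Item=item)
```

Calling `create_and_seed()` once, with credentials and a default region configured, should leave the table holding the six rows listed earlier.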
Create an IAM role that AWS Glue can assume, using the following trust policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Attach these AWS managed policies to the role so the job can access DynamoDB and S3:
- AmazonDynamoDBFullAccess
- AmazonS3FullAccess
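The role setup can also be scripted. The sketch below assumes boto3 and a hypothetical role name, `glue-pokemon-role`. It reuses the policy document shown above and attaches the AWS managed policies `AmazonDynamoDBFullAccess` and `AmazonS3FullAccess` by ARN.

```python
import json

# Policy document that lets the Glue service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# ARNs of the AWS managed policies for DynamoDB and S3 access.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]


def create_glue_role(role_name="glue-pokemon-role"):
    """Create the role and attach both policies (needs AWS credentials).

    Assumption: `role_name` is a placeholder, not a name from this project.
    """
    import boto3  # imported lazily so the constants above work without it

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```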
Create a Glue job with the following settings.
- Job type: Spark
- Job language: Python
- Glue version: 3.0
- Number of workers: 2
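The job script itself might look roughly like the sketch below. It is not the project's actual script, and it rests on several assumptions: the "pokemon API" is taken to be the public PokeAPI, the category is taken to be the pokemon's first (slot 1) type, the table is named `pokemon`, and `number` is its partition key. Rows are read through Glue's DynamoDB connection and written back with boto3's `update_item`. Collecting rows on the driver is only reasonable for a table as small as this sample.

```python
import json
import urllib.request

# Assumption: the public PokeAPI; the text only says "the pokemon API".
POKEAPI_URL = "https://pokeapi.co/api/v2/pokemon/{number}"


def primary_type(payload):
    """Return the name of the slot-1 type from a PokeAPI pokemon payload."""
    types = sorted(payload["types"], key=lambda entry: entry["slot"])
    return types[0]["type"]["name"]


def fetch_category(number):
    """Look up one pokemon's category over HTTP."""
    request = urllib.request.Request(
        POKEAPI_URL.format(number=number),
        headers={"User-Agent": "glue-pokemon-sample"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return primary_type(json.load(response))


def update_args(number, category):
    """Build keyword arguments for Table.update_item."""
    return {
        "Key": {"number": number},
        # A placeholder name avoids any clash with DynamoDB reserved words.
        "UpdateExpression": "SET #c = :c",
        "ExpressionAttributeNames": {"#c": "category"},
        "ExpressionAttributeValues": {":c": category},
    }


def run_job(table_name="pokemon"):
    """Read the table through Glue, then write each category back."""
    # Imported lazily: awsglue and pyspark are only available inside a Glue job.
    import boto3
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={"dynamodb.input.tableName": table_name},
    )
    table = boto3.resource("dynamodb").Table(table_name)
    for row in frame.toDF().collect():
        number = int(row["number"])
        table.update_item(**update_args(number, fetch_category(number)))
```

Inside a Glue job the script would end with a call to `run_job()`. The pure helpers `primary_type` and `update_args` can be checked locally without AWS access.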
After the job runs, the DynamoDB table contains the following records.
number | name | category |
---|---|---|
1 | Bulbasaur | grass |
2 | Ivysaur | grass |
3 | Venusaur | grass |
4 | Charmander | fire |
5 | Charmeleon | fire |
6 | Charizard | fire |
- AWS Glue
- Connection types and options for ETL in AWS Glue
- PySpark Glue Tutorial
- How to export an Amazon DynamoDB table to Amazon S3 using AWS Step Functions and AWS Glue
Developed by Jean Jacques Barros