This repository describes how to design and implement a Natural Language Processing (NLP)-based service using AWS Serverless, Amazon Comprehend and the AWS Cloud Development Kit (CDK). The sample illustrates a real-time user review analysis system. All resources and configuration are provided through AWS CDK (TypeScript code).
Amazon Comprehend provides a variety of APIs for analyzing text within documents. Using them to build a review analysis system gives us easy, fast and high-accuracy AI features. In particular, when real-time analysis is required, combining AWS CDK and AWS Serverless makes this easier. AWS Serverless can be used in a wide variety of fields, from web development to data processing, and configuring and deploying these services as IaC (AWS CDK) can improve development productivity.
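As a rough illustration of the sentiment step, the sketch below wraps Comprehend's `DetectSentiment` call in a small Python helper. The helper name `analyze_review`, the default language code and the shape of the returned dict are assumptions made for this example; the Lambda functions under codes/lambda may be organized differently.

```python
from typing import Any, Dict


def analyze_review(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Return the dominant sentiment and per-class scores for one review.

    `client` is expected to be a boto3 Comprehend client, for example
    boto3.client("comprehend"). It is passed in (an assumption of this
    sketch) so the helper can be exercised with a stub.
    """
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return {
        "sentiment": response["Sentiment"],    # e.g. POSITIVE, NEGATIVE, NEUTRAL, MIXED
        "scores": response["SentimentScore"],  # confidence per sentiment class
    }
```

With valid AWS credentials, calling `analyze_review(boto3.client("comprehend"), "My kids love this toy.")` would send a single review to Comprehend.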
The following features in Amazon Comprehend were applied.
The following services in AWS Serverless were applied.
- AWS Lambda
- Amazon S3
- Amazon DynamoDB
- Amazon Kinesis
- Amazon API Gateway
- AWS Glue
- Amazon Athena
- Amazon QuickSight
The following AWS CDK-related open sources were applied.
This architecture covers the following features.
- Serverless Realtime Review API Service
- Serverless Realtime Review Sentiment Analysis
- Serverless Review Entity/Syntax Stream-Batch Analysis
- Serverless Near Realtime Data Processing & Visualization
- Serverless System Monitoring Dashboard
- config: CDK project configuration json file for each deployment stage
- infra: CDK TypeScript source code
- infra/app-main: CDK project main file
- infra/stack: CDK Stack classes
- codes/lambda: Python source code for each Lambda function
- script: utility scripts such as setup/deploy/destroy/simulation
- script/simulation: simulation test scripts
- test: test files such as CDK-nag
All the resources described above are implemented and provided through AWS CDK v2. Because this CDK project is built on top of the AWS CDK Project Template for DevOps, please refer to that repository for details.
Other entries in the "Using AWS CDK" series can be found at:
- AWS Serverless Using AWS CDK
- Amazon Cognito and API Gateway based machine to machine authorization using AWS CDK
- AWS ECS DevOps using AWS CDK
- AWS IoT Greengrass Ver2 using AWS CDK
- Amazon SageMaker Built-in Algorithms MLOps Pipeline Using AWS CDK
First of all, an AWS account and an IAM user are required. Then the following tools must be installed.
- AWS CLI: aws configure --profile [profile name]
- Node.js: node --version
- AWS CDK: cdk --version
- jq: jq --version
- curl: curl --version
- python: python3 --version
Open one of the configuration JSON files in the config directory and update Name/Stage/Account/Region/Profile in Project. Account/Region/Profile depend on your AWS account; you don't need to change Name/Stage. Additionally, update the email address in Stack/ReviewDashboard/SubscriptionEmails.
{
    "Project": {
        "Name": "ReviewService", <----- Optional: your project name, all stacks will be prefixed with [Project.Name+Project.Stage]
        "Stage": "Dev", <----- Optional: your project stage, all stacks will be prefixed with [Project.Name+Project.Stage]
        "Account": "your aws account number", <----- Essential: update according to your AWS Account
        "Region": "your aws region name", <----- Essential: update according to your target region
        "Profile": "your aws credential profile name" <----- Essential: AWS Profile, keep empty string if you use `default` profile
    },
    "Stack": {
        ...
        ...
        "ReviewDashboard": {
            "Name": "ReviewDashboardStack",
            "DashboardName": "ReviewDashboard",
            "SubscriptionEmails": ["your email address"], <----- Essential: Alarm notification Emails
            "ApiGatewayOverallCallThreshold": 100, <----- Optional: Alarm Threshold for Overall Call
            "ApiGatewayError4xxCallThreshold": 20, <----- Optional: Alarm Threshold for 4XX Error Call
            "ApiGatewayError5xxCallThreshold": 20 <----- Optional: Alarm Threshold for 5XX Error Call
        }
    }
}
In this guide, I use the config/app-config-dev.json file for convenience of explanation.
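Before deploying, it can help to sanity-check the edited configuration. The following is a minimal sketch, assuming the real file is plain JSON (without the arrow annotations shown in this guide). The helper names `load_config`, `stack_prefix` and `missing_project_keys` are hypothetical and are not part of the repository.

```python
import json
from typing import Any, Dict, List

# Keys this guide marks as "Essential" under Project.
ESSENTIAL_KEYS = ("Account", "Region", "Profile")


def load_config(path: str) -> Dict[str, Any]:
    """Read a configuration file such as config/app-config-dev.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def stack_prefix(config: Dict[str, Any]) -> str:
    """Project.Name + Project.Stage, the prefix described in the config comments."""
    project = config["Project"]
    return f"{project['Name']}{project['Stage']}"


def missing_project_keys(config: Dict[str, Any]) -> List[str]:
    """Return essential Project keys that are absent.

    Only presence is checked, because an empty Profile string is valid
    when the `default` profile is used.
    """
    project = config.get("Project", {})
    return [key for key in ESSENTIAL_KEYS if key not in project]
```

For the sample values above, `stack_prefix` yields `ReviewServiceDev`.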
Caution: This solution uses AWS services that are not covered by the free tier, so be careful about the possible costs.
sh script/setup_initials.sh config/app-config-dev.json
Execute this single script:
sh script/deploy_stacks.sh config/app-config-dev.json
or you can deploy manually like this:
export AWS_PROFILE=[your profile name]
export APP_CONFIG=config/app-config-dev.json
cdk list
cdk deploy *-ReviewBackendStack
cdk deploy *-ApiGatewayStack --outputs-file script/output/ApiGatewayStack.json
cdk deploy *-ReviewAnalysisStack --outputs-file script/output/ReviewAnalysisStack.json
cdk deploy *-ReviewDashboardStack
Caution: You must follow this order for the first deployment. After that, the stacks can be deployed independently in any order.
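The ordering constraint can also be captured in a small driver. This is a sketch only, not a replacement for script/deploy_stacks.sh: the names `DEPLOY_ORDER`, `build_deploy_commands` and `deploy_all` are made up for illustration, while the stack names, wildcard pattern and `--outputs-file` paths are copied from the manual steps above.

```python
import os
import subprocess
from typing import Dict, List, Optional

# First-deployment order from the manual steps; the value is the outputs file, if any.
DEPLOY_ORDER: Dict[str, Optional[str]] = {
    "ReviewBackendStack": None,
    "ApiGatewayStack": "script/output/ApiGatewayStack.json",
    "ReviewAnalysisStack": "script/output/ReviewAnalysisStack.json",
    "ReviewDashboardStack": None,
}


def build_deploy_commands() -> List[List[str]]:
    """Build one `cdk deploy` argument list per stack, in the required order."""
    commands = []
    for stack, outputs_file in DEPLOY_ORDER.items():
        command = ["cdk", "deploy", f"*-{stack}"]
        if outputs_file:
            command += ["--outputs-file", outputs_file]
        commands.append(command)
    return commands


def deploy_all(profile: str, app_config: str) -> None:
    """Run the commands sequentially and stop at the first failure."""
    env = dict(os.environ, AWS_PROFILE=profile, APP_CONFIG=app_config)
    for command in build_deploy_commands():
        subprocess.run(command, check=True, env=env)
```

Because the arguments are passed as a list, the `*-` wildcard reaches the CDK CLI unchanged instead of being expanded by a shell.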
This is a deployment result in CloudFormation.
Execute this single script. The create_user.sh script creates a new user and confirms it in Cognito.
sh script/simulation/create_user.sh [aws profile name] [new user id, for example user-01] [new user pw] [cognito user pool id]
where [cognito user pool id] is OutputUserPoolId in script/output/ApiGatewayStack.json.
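Values such as OutputUserPoolId can be read from the outputs file programmatically. The sketch below assumes the usual layout written by `cdk deploy --outputs-file`, a JSON object keyed by stack name whose values map output names to values. The helper name `find_output` is hypothetical.

```python
import json
from typing import Any, Dict


def find_output(outputs: Dict[str, Dict[str, Any]], key: str) -> Any:
    """Return the first output named `key` found in any stack of the outputs file."""
    for stack_outputs in outputs.values():
        if key in stack_outputs:
            return stack_outputs[key]
    raise KeyError(f"{key} not found in any stack output")


def load_outputs(path: str) -> Dict[str, Dict[str, Any]]:
    """Read a file such as script/output/ApiGatewayStack.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `find_output(load_outputs("script/output/ApiGatewayStack.json"), "OutputUserPoolId")` would return the user pool id to pass to create_user.sh.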
This is the Cognito password policy defined in api-gateway-stack.ts:
{
    requireSymbols: true,
    minLength: 8,
    requireUppercase: true,
    requireDigits: true
}
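To avoid a rejected password when running create_user.sh, a candidate can be checked locally first. This sketch covers only the four rules shown above. It approximates "symbol" with Python's `string.punctuation`, which is an assumption and not Cognito's exact definition, and the user pool may enforce additional rules not listed here.

```python
import string

MIN_LENGTH = 8  # minLength from the policy above


def satisfies_policy(password: str) -> bool:
    """Check length, uppercase, digit and symbol requirements from the policy."""
    return (
        len(password) >= MIN_LENGTH
        and any(ch.isupper() for ch in password)
        and any(ch.isdigit() for ch in password)
        # Assumption: ASCII punctuation stands in for Cognito's symbol set.
        and any(ch in string.punctuation for ch in password)
    )
```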
Execute this single script. The request_reviews.py script logs in to obtain a token and sends POST requests to the REST API using Amazon review data - Toy.
python3 script/simulation/request_reviews.py --profile [aws profile name] --url [APIGateway URL + /review] --pool [cognito user pool client id] --id [new user id] --pw [new user pw]
where
[APIGateway URL] is OutputRestApiUrl in script/output/ApiGatewayStack.json.
[cognito user pool client id] is OutputUserPoolClientId in script/output/ApiGatewayStack.json.
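The log-in-then-POST flow can be sketched as follows. This is not the code of request_reviews.py. It assumes the app client allows the `USER_PASSWORD_AUTH` flow, that the API expects the Cognito ID token in the `Authorization` header, and that the body is JSON; the function names and the payload fields in the usage note are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def login(cognito_client: Any, client_id: str, user_id: str, password: str) -> str:
    """Authenticate against Cognito and return the ID token.

    `cognito_client` is expected to be boto3.client("cognito-idp").
    """
    response = cognito_client.initiate_auth(
        ClientId=client_id,
        AuthFlow="USER_PASSWORD_AUTH",  # assumption: enabled on the app client
        AuthParameters={"USERNAME": user_id, "PASSWORD": password},
    )
    return response["AuthenticationResult"]["IdToken"]


def build_review_request(url: str, token: str, review: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authorized POST request carrying one review."""
    return urllib.request.Request(
        url,
        data=json.dumps(review).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```

A request built with, say, `build_review_request(url, token, {"review": "Great toy"})` can then be sent with `urllib.request.urlopen`.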
After a while, open the CloudWatch dashboard. The metrics should show that new data is coming in.
Our CDK ReviewAnalysisStack deploys a Workgroup and pre-defined queries in Athena, so we can easily execute those queries on demand.
Go to the Athena console, open the Query editor menu, and then Saved queries. After changing the Workgroup, execute the queries in order (3~7).
These queries will create the following tables in Athena. We will use sentiment-table/syntax-table/entities-table in QuickSight.
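The same queries could in principle be run outside the console. The sketch below submits one SQL string to a workgroup and polls until Athena reports a terminal state. It assumes the workgroup already defines a query result location, and the helper name `run_query` is hypothetical; the guide itself uses the console.

```python
import time
from typing import Any

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena: Any, sql: str, workgroup: str, poll_seconds: float = 1.0) -> str:
    """Start a query in the given workgroup and return its final state.

    `athena` is expected to be boto3.client("athena").
    """
    started = athena.start_query_execution(QueryString=sql, WorkGroup=workgroup)
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```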
Our CDK ReviewAnalysisStack deploys only the role for QuickSight, so we have to set up QuickSight's DataSource/Analysis/Dashboard manually.
Go to the QuickSight console, open the Manage QuickSight menu, and then Security & permissions. Please change QuickSight-managed role (default) to an existing role, the one that CDK created for us in ReviewAnalysisStack.
where [an existing role] is OutputQuickSightRole in script/output/ReviewAnalysisStack.json.
sh script/destroy_stacks.sh config/app-config-dev.json
Caution: You must delete the S3 and DynamoDB resources manually because of their removal policy.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.