Install dependencies
npm i
Rename .env.structure to .env
Access the Twitter Developer portal and create yourself a Bearer token to consume the Twitter API.
Paste your token into the .env file.
Paste your AWS credentials into the .env file.
Create an AWS S3 bucket with an input and an output folder in it.
Copy the name of the bucket you just created to the BUCKET_NAME field in constants.js. Do the same for the input and output folder names in INPUT_FOLDER and OUTPUT_FOLDER.
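Putting the fields mentioned in this README together, constants.js is assumed to look roughly like this (the field names come from this guide; the values below are placeholders you must replace with your own):

```javascript
// constants.js - central configuration (values are placeholders)
module.exports = {
  BUCKET_NAME: 'my-tweet-analysis-bucket', // the S3 bucket created above
  INPUT_FOLDER: 'input',                   // folder inside the bucket for raw tweets
  OUTPUT_FOLDER: 'output',                 // folder where Comprehend writes results
  CPHD_ID: '',                             // filled in later with the Comprehend job folder name
};
```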
You may skip the data collection step (step 4) if you want to. To do so, upload the contents of the temp folder OR the tweets.csv file to your input folder.
There are two ways of collecting data from the Twitter API (there are more, but I'll stick with these two for now):
With the stream option you connect to the Twitter API and listen for tweets that comply with the rules you've set up. You can read more about rule definitions here.
In the tweet-stream.js file you can change the rule to fit your needs.
const rules = [
  {
    'value': '"Snyder Cut" lang:pt -is:retweet -has:links',
    'tag': 'liga justiça snyder'
  },
];
To start your stream data gathering just:
npm run stream
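The repository's tweet-stream.js isn't shown here, but the flow it implements can be sketched roughly as follows, using the Twitter API v2 filtered-stream rules endpoint (buildRulesPayload and the fetch-based call are illustrative assumptions, not the project's actual code):

```javascript
// Illustrative sketch of registering stream rules (not the repo's actual code).
// Assumes Node 18+ (global fetch) and a Bearer token from the Twitter Dev portal.
const STREAM_URL = 'https://api.twitter.com/2/tweets/search/stream';

// The rules endpoint expects the rules wrapped in an "add" array.
function buildRulesPayload(rules) {
  return { add: rules };
}

async function setRules(rules, token) {
  const res = await fetch(`${STREAM_URL}/rules`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildRulesPayload(rules)),
  });
  return res.json();
}
```

Once the rules are registered, the app listens on the stream endpoint itself and each matching tweet arrives as a JSON line.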
Each tweet caught on the stream will be saved in the temp folder as a .csv file and will be sent to the S3 bucket previously configured.
Twitter also lets us search for tweets within a 7-day window. For this you must pass your search parameters in the body of your request.
In the tweet-search.js file you can change these parameters to fit your needs.
const params = {
  'query': '"Snyder Cut" lang:pt -is:retweet -has:links',
  'tweet.fields': 'author_id',
  'max_results': 100
}
To start your recent search data gathering just:
npm run search
It will compile all tweets from each page of the results into a .csv file (named as configured in constants.js) and then upload this file to the S3 bucket previously configured.
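The paging itself can be sketched like this, using the Twitter API v2 recent-search endpoint and its next_token cursor (withNextToken and the fetch-based loop are illustrative assumptions, not the repo's code):

```javascript
// Illustrative sketch of paging through recent-search results (Node 18+ fetch).
const SEARCH_URL = 'https://api.twitter.com/2/tweets/search/recent';

// Twitter returns meta.next_token while more pages exist; add it to the params,
// or return null when the last page has been reached.
function withNextToken(params, meta) {
  return meta && meta.next_token ? { ...params, next_token: meta.next_token } : null;
}

async function searchAll(params, token) {
  const tweets = [];
  let page = params;
  while (page) {
    const qs = new URLSearchParams(page).toString();
    const res = await fetch(`${SEARCH_URL}?${qs}`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    const body = await res.json();
    tweets.push(...(body.data || []));
    page = withNextToken(params, body.meta);
  }
  return tweets;
}
```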
Now you are ready to set up your analysis on AWS Comprehend.
- Launch it from your console (https://console.aws.amazon.com/comprehend/);
- Click on 'Analysis Job';
- Create a Job;
- Give it a name;
- Set the Analysis Type to be Sentiment;
- Set the language of the input;
- Browse on S3 to set the input file/folder and set the format accordingly;
- Set the output folder in S3;
- Give it an IAM role;
- Done!
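If you prefer to script this instead of clicking through the console, the same job can be started with the aws-sdk v2 Comprehend client. This is an assumed alternative, not part of the repo; the job name, language code, and role ARN are placeholders:

```javascript
// Illustrative sketch: starting the same sentiment job from code (aws-sdk v2).
function buildJobParams(bucket, inputFolder, outputFolder, roleArn) {
  return {
    JobName: 'tweet-sentiment',        // placeholder name
    LanguageCode: 'pt',                // language of the input tweets
    DataAccessRoleArn: roleArn,        // IAM role Comprehend assumes to read/write S3
    InputDataConfig: {
      S3Uri: `s3://${bucket}/${inputFolder}/`,
      InputFormat: 'ONE_DOC_PER_LINE', // each CSV line is one document
    },
    OutputDataConfig: { S3Uri: `s3://${bucket}/${outputFolder}/` },
  };
}

async function startJob(params) {
  const AWS = require('aws-sdk'); // loaded lazily; assumed SDK version
  return new AWS.Comprehend().startSentimentDetectionJob(params).promise();
}
```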
You should wait until the analysis is completed; you can follow its progress on the Analysis Job dashboard.
AWS Comprehend will create a folder inside your output folder. Copy that folder's name to the CPHD_ID field in constants.js.
Now you can see a brief analysis on your console:
npm run analyse
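Under the hood, a Comprehend sentiment job writes its results as JSON lines (inside an output.tar.gz), each carrying a Sentiment label. The kind of tally that npm run analyse prints can be sketched like this (tallySentiments is an illustrative helper, not the repo's code, and assumes the JSON-lines format has already been extracted):

```javascript
// Illustrative sketch: tallying Comprehend's JSON-lines sentiment output.
// Each line is assumed to look like {"Line":0,"Sentiment":"POSITIVE","SentimentScore":{...}}.
function tallySentiments(jsonLines) {
  const counts = {};
  for (const line of jsonLines.split('\n')) {
    if (!line.trim()) continue; // skip blank lines
    const { Sentiment } = JSON.parse(line);
    counts[Sentiment] = (counts[Sentiment] || 0) + 1;
  }
  return counts;
}
```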
That's it.