Skip to content

AWS-Big-Data-Projects/AWS_File_Trans_Lamda_S3_SNS

 
 

Repository files navigation

TUTORIAL ON AMAZON WEB SERVICES (AWS) S3 EVENT-TRIGGERED LAMBDA FUNCTION AND SNS NOTIFICATION

Date : 28/08/2022

 

Project Architecture

  • The Workflow of this project is shown below depicting how the following AWS solutions are used and can aid your data engineering processes. AWS provides cloud serveless compute solutions like Lambda & Glue, storage services like s3 - (Simple Storage Service) alongside an array of pay-as-you-go services which are cheap and affordable.  

Project Workflow

 

About The Project

Occassionally, as part of your ETL/ELT process as a data engineer, you would want to save raw data from apllications to your data lake in an unprocessed format (bronze form) and then begin downstream processing to either silver format or even a usable format for analytics like loading to an RDBMS (Relational Database Management System). However, on some occassions you may want the following ;

  • If the process would be recurrent, you may want to make it fully automated.
  • Trigger one event after another
  • To get notifications when a task is complete/ encounters a failure.

In the above workflow, we assume you already have a loading system in place that ingests files into your s3 bucket in json format . Our sample file is the transactions.json file. The Automation is as follows;

  • Immediately a file arrives the raw zone of your s3 bucket (data lake).
  • An s3 Put event is Triggered and subsequently makes a call to a Lambda function .
    • N/B : Lambda is a serveless compute solution that can run your workloads . It can use either a python or node.js runtime as at the time of this writing. So in this case we have a python script running on Lambda to process our json data to a more readable fromat (.csv) which can easily be loaded to a RDBMS/Data Warehouse (eg; AWS Redshift, for downstream queries by data analysts/BI analysts). or for raw access by your business teams.
  • Lastly once a transaction stream is processed and written to the processed zone of our s3 bucket, we want our s3 bucket put event to also trigger an email using SNS (Simple Notification Service) to let us know some details about the workload like runtime, and some metadata of what was put into the s3 bucket.
    • N/B : SNS helps us create Topics that users can subscribe to in order to be able to receive simple notification on events. This can be either email or SMS.
  • We also want to be able tocheck our logs using Cloudwatch.
    • N/B : Cloudwatch is a monitoring and observability service that helps us monitor applications/infrastructure on AWS. In simple terms, this service allows us to see logs for events used on AWS. Each log is stored with a corresponding timestamp for easy tracking.

Prerequisites

- An AWS account (Free)
- Basic Understanding of the above workflow and what we are tring to achieve
- Lambda code in `lambda_function.py` - from repository
- Sample file - `transactions.json` - from repository

LET'S GET STARTED

 

STEP 1 - Set Up an AWS s3 Bucket with two objects (folders) raw & processed

Go to your AWS console and search for S3 follow the steps in the image below.

  • a) Project Workflow 1

  • b) Project Workflow 2

  • c) Project Workflow 3

  • d) Cheers..., Now we are done with setting up our s3 Bucket. We'd come bact to it in STEP 5.

 

STEP 2 - Set up Lambda Function

Now let us set up our lambda function which is going to be at the heart of our data transformation. This Function would parse/deserialize the json object's into a pandas dataframe (in memory) and subsequently write it to the /processed directory of our s3 bucket. On your console, search for Lambda and follow the steps below.

  • a) Project Workflow 5

  • b) - While creating the Lambda Function, set up IAM role for the lambda function On AWS , IAM stands for Identity Access Management. and is used to define roles an dpolicies which different users/services can assume within your cloud infrastructure. It helps with manageing secure access to services . Read/write access and scope can be configured on a defined role and also be re-used across similar workflows on AWS cloud. Below we simply call it lambdas3ReadRole .   Project Workflow 6

  • c) Copy the script from the lambda_function.py file on this github repository and paste it within that of your newly created lambda function. Project Workflow 7

  • d) In the Next 5 steps, we would modify the policy for the role we created for our Lambda function to also allow Lambda access to write to s3. Note that Failure to do this will result in an 'Access Denied Error' as seen here

  • i ] Project Workflow 8

  • ii ] Project Workflow 9

  • iii ] Project Workflow 10

  • iv ] Since this policy is for Amazon s3 bucket we search for s3 Project Workflow 11

  • v ] Project Workflow 12

  • vi ] Project Workflow 13

  • vii ] Project Workflow 14

  • viii ] Lastly review the policy to be sure it is in order. Project Workflow 15

 

For our Lambda Function to run properly, we would need to import the pandas library in a layer. This is because, by default, the pandas library does not come with the python 3.8 runtime on AWS. Below is a sample of Runtime error we would encounter if we tried to run our lambda function before adding the virtual environment layer containg pandas.

  • ix] Project Workflow 16

 

While we can import this environment layer as a zipfile, To easily package this layer on AWS, we can run some commands on Cloud9 against an EC-2 Instance.

STEP 3 : Package Python virtual environment as a Layer with Cloud9 interface on an EC2 Instance.

Navigate back to your console, and search for cloud9 and follw the steps in the images below.

  • i ] Project Workflow 17

  • ii ] Project Workflow 18

  • iii ] Project Workflow 19

  • iv ] Project Workflow 20

 

  • v ] Once you are done with the tasks above , click on 'Open IDE' to open up the command session of your EC-2 instance where you can now run your commands.

  • vi ] Next, run the commands outlined in the cloud9_IDE_scripts.sh file within the github repository. Copy and run them line by line.

  • vii ] After the above execution , this layer should now be accessible to our Lambda Function.  

STEP 4 - ADD Layer to LAMBDA Function

In this step we would add the Layer created in Step 3 above to our Lambda Function. To do this, follow the images below;  

  • i ] On your Lambda Function Page, Scroll down to the Layer Section. Project Workflow 21

  • ii ] Project Workflow 22

  • iii ] Project Workflow 23

STEP 5 - Configure s3 Events for Lambda

For Lambda to be triggered when an event occurs in s3, we need to configire our s3 Events. In this case, we are interested in Put Events. Navigate back to your earlier created s3 bucket and go to the Event Notification section and follow the steps below;

  • i ] Project Workflow 24

  • ii ] Project Workflow 25

  • iii ] Project Workflow 26

  • iv ] Project Workflow 27

STEP 6 - Set Up SNS Notification

In Order for us to receive Mail alerts when our file is processed, we would create an SNS Topic to do this for us. On your Console, search for SNS and Follow the steps in the images below;

  • i ] Project Workflow 28

  • ii ] Project Workflow 29

  • iii ] Project Workflow 30

  • iv ] Immediately after the subscription above , navigate to your inbox to find a mail that looks like the one shown below. Click Confirm. Project Workflow 31

  • v ] After confirmation, you should see a page that looks like the one shown below. Project Workflow 32

  • vi ] When you go back to the SNS Console, you should now see the confirmed status on that subscription. It should look similar to the image shown below. Project Workflow 33

 

STEP 7 - Configure s3 Events for SNS

Lastly ,For Our SNS to be able to send messages whn an event occurs in the processed section of our s3 bucket, we would need to configure a new even notification on our s3 bucket similar to what we did for the /raw section for our Lambda Function.

On your Console, Navigate back to the s3 Bucket created earlier and go to the Event Notification section, then follow the steps in the images below;

  • i ] Project Workflow 34

  • ii ] Project Workflow 35

EXECUTION

With everything now set up, we can go ahead to upload a sample json file into the /raw directory of our s3 bucket. Use the file called transactions.json from the github repository. Follow the steps below to see the Magic afterwards !! ..

  1. a] Navigate to s3 Project Workflow 36

    b] Select the file and Upload. Project Workflow 37

OBSERVATION

Now that we are done with this, let's navigate to Cloudwatch to see what has occurred almost instantly.

  1. Go through the Lambda Function Project Workflow 38

  2. View Project Workflow 39

  3. View Project Workflow 40

  4. View Log breakdown Project Workflow 41

 

WE can also go to our S3 bucket to see if a processed file landed in our /processed directory.

Project Workflow 42

A sample of the File shown above is accessible here.

Oh Yes... And Lastly,, your email notification must have landed almost instantly.....

Within your inbox, you should see a mail similar to the one shown below.

Project Workflow 42

CLOSING NOTES

  • Congratulations on Successfully completing this project on AWS. Amazon Web Services like other major cloud providers, gives an array of cloud computing services which can greatly increase efficiency of engineering teams within any establishment.

  • You might have run into errors while executing this project, That is part of the learning/implementation journey. Do well to check out AWS Community pages /Faq pages for solutions to your errors. You can also contact me via Linkedin @viciwuoha.

  • Depending on your Use Case for similar operations, AWS advices that you use two different s3 buckets to avoid conflicts during transactions within one bucket. However, this use case is a sample and can cover for that.

  • In the course of developing this project, this video from Be A Better Dev on Youtube was helpful. You can check it out.

  • If you are not using this in production, Remember to delete the resources you used for this project to avoid encountering unprecedented bills. In my case, i used these services on the Free Tier provided by AWS, hence my bill was $0 as seen here.

About

AWS Data Engineering Project using Lambda, S3 and SNS

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 80.4%
  • Shell 19.6%