This solution enables ingestion of medical device data into a data lake. The sources of data can be diverse; the current solution is designed for file-based ingestion. It is recommended that you run the HIPAA QuickStart before running the scripts here.
Data encryption provides a strong layer of security to protect data that you store within AWS services. AWS services can help you achieve ubiquitous encryption for data in transit as well as data at rest.
In the Ingestion segment, this solution creates the following component:
- Staging S3 Bucket: Used to ingest the raw dataset from the source.
In the Data Processing segment, this solution creates the following components:
- SQS Queue: Holds the list of files that are pending or currently being processed.
- AWS Lambda: Processes one file at a time from the SQS queue (a sketch follows this list).
- DynamoDB: Holds the list of file names already processed; used for duplicate file detection.
- SNS: Creates a topic and subscription for error notifications.
- Glue: A sample Glue job is created based on the script you will download and save to an S3 location later.
- S3 Processed Bucket: Holds the raw files that are moved out of the Staging Bucket once processing succeeds.
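To make the processing flow concrete, here is a minimal sketch of what the Lambda handler could look like, assuming an SQS queue fed by S3 event notifications and a DynamoDB table keyed on file name. The table name and the follow-on steps are hypothetical placeholders, not values from the actual template.

```python
import json

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
# Hypothetical table name; the real one is created by the CloudFormation stack.
processed_table = dynamodb.Table("ProcessedFiles")

def handler(event, context):
    # Each SQS record wraps an S3 event notification in its body.
    for record in event["Records"]:
        body = json.loads(record["body"])
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            try:
                # Conditional write: fails if this file name was seen before,
                # which is how duplicate files are detected and skipped.
                processed_table.put_item(
                    Item={"file_name": key},
                    ConditionExpression="attribute_not_exists(file_name)",
                )
            except ClientError as err:
                if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
                    continue  # duplicate file; do not process again
                raise
            # Hypothetical next steps: start the Glue job for this file, then
            # move the raw object from the Staging Bucket to the Processed Bucket.
```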
In the Data Lake segment, this solution creates the following component:
- S3 Data Lake Bucket: Holds the content of the data lake in the partition scheme Metric/year/month/day/patient (an example path follows this list).
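For illustration, the partition scheme could place a single reading at a key like the one built below; the metric name, date values, patient id, and file name are all hypothetical.

```python
# Hypothetical example of the Metric/year/month/day/patient partition layout.
metric, year, month, day, patient = "HeartRate", "2023", "06", "15", "patient-1234"
key = f"{metric}/{year}/{month}/{day}/{patient}/reading.parquet"
print(key)  # HeartRate/2023/06/15/patient-1234/reading.parquet
```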
The AWS Lake Formation component can be created using the instructions from here.
In the Analytics segment, this solution doesn't create any components; the architecture diagram shows possible ways to consume data from the Data Lake.
This architecture uses IAM policies for service-based access and KMS keys to encrypt the S3 buckets, DynamoDB, and Parameter Store. The Parameter Store is used to save the following values (a retrieval sketch follows this list):
- Data Lake Bucket Name
- Data Lake Folder Name
- Processed Bucket Name
- Processed Bucket Folder Name
- SQS Queue Name
- Failure Notification ARN
- Athena Database to use for creating the tables
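As a minimal sketch, a processing component could look these values up with the SSM API at runtime. The parameter names below are assumptions for illustration; the real names are defined by the CloudFormation template.

```python
import boto3

ssm = boto3.client("ssm")

def get_param(name: str) -> str:
    """Fetch one configuration value from Parameter Store."""
    return ssm.get_parameter(Name=name)["Parameter"]["Value"]

# Hypothetical parameter names; check the deployed stack for the real ones.
datalake_bucket = get_param("/medical-device/datalake-bucket-name")
processed_bucket = get_param("/medical-device/processed-bucket-name")
failure_topic_arn = get_param("/medical-device/failure-notification-arn")
```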
Processing logs are recorded in CloudWatch Logs.
This script does not create the VPC, subnets, route tables, etc.
- Please ensure that you have run the HIPAA QuickStart
To get started, sign in to your AWS account and create a stack based on the criteria below.
- Clone the repository:
git clone git@github.com:aws-samples/analysis-of-medical-device-data-using-data-lake.git
cd analysis-of-medical-device-data-using-data-lake
- Upload the sample heart_rate_job.py to your S3 bucket:
aws s3 cp heart_rate_job.py s3://[YOUR-BUCKET-NAME-HERE]
- Copy the location of the job file: s3://[YOUR-BUCKET-NAME-HERE]/heart_rate_job.py
- If you want to ensure that all traffic to your AWS resources stays within the AWS network, use the script "Cloudformation_WithVPC.json". It will create VPC endpoints for SQS, S3, DynamoDB, Glue, Athena, and SSM.
- If you don't need to restrict traffic to the AWS network, you can use the script "Cloudformation_WithoutVPC.json". It will create the same resources as "Cloudformation_WithVPC.json", minus the VPC endpoints.
- Supply all the parameters as required.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License. See the LICENSE file.