New Pattern: aws-kinesisstreams-gluejob #122

Merged
merged 133 commits into from
Feb 12, 2021
Changes from 104 commits
18b45fc
initial commit for construct
knihit Nov 7, 2020
15930ce
initial commit for construct
knihit Nov 7, 2020
309cce6
adding a new construct
knihit Nov 13, 2020
e024476
adding a new construct
knihit Nov 13, 2020
73062c8
Merge remote-tracking branch 'upstream/master'
knihit Nov 19, 2020
0263463
updates to construct default glue attributes
knihit Nov 30, 2020
5f77445
updating package.json
knihit Dec 1, 2020
ec1b1a6
updating ts files
knihit Dec 2, 2020
726ea01
adding unit tests and integ tests
knihit Dec 3, 2020
37b6ffd
adding unit tests and integ tests
knihit Dec 3, 2020
e931fdd
adding unit tests and helper methods to complete the construct
knihit Dec 4, 2020
8cb3e26
adding unit tests
knihit Dec 5, 2020
da90d53
Merge remote-tracking branch 'upstream/master'
knihit Dec 5, 2020
0f4d7e2
adding unit tests
knihit Dec 5, 2020
a8f03e2
adding unit tests
knihit Dec 5, 2020
395af14
fix for linting error
knihit Dec 9, 2020
97e8797
adding glue job example
knihit Dec 10, 2020
a7e3529
update readme.md
knihit Dec 10, 2020
2e53b39
update snapshots and kms arn
knihit Dec 10, 2020
47a0332
update description in package.json
knihit Dec 10, 2020
b10c28d
integ test cases
knihit Dec 10, 2020
fc9d3c3
added simulator for kinesis stream
knihit Dec 11, 2020
50af095
added simulator for kinesis stream
knihit Dec 11, 2020
67b5ad1
integ test cases
knihit Dec 11, 2020
27c0401
updating construct for scriptlocation
knihit Dec 11, 2020
b21836c
updating python generator file
knihit Dec 11, 2020
7a5709a
updating example code
knihit Dec 12, 2020
7050d22
updating example code
knihit Dec 12, 2020
f57b68f
updating example code
knihit Dec 12, 2020
3f05829
updating example code
knihit Dec 12, 2020
47676ef
updating example code
knihit Dec 12, 2020
5dcf45e
updating example code
knihit Dec 12, 2020
94f7175
updating example code
knihit Dec 12, 2020
b1b8640
update README and Architecture
knihit Dec 12, 2020
979c477
updating README
knihit Dec 12, 2020
854b0c6
updating README
knihit Dec 12, 2020
b07d151
updating construct
knihit Dec 14, 2020
a300ebe
updating construct
knihit Dec 14, 2020
f6c464e
updating construct
knihit Dec 14, 2020
9e9d4e1
updating construct
knihit Dec 15, 2020
76d1e41
updating construct
knihit Dec 15, 2020
3793c73
updating construct
knihit Dec 15, 2020
341a6bf
updating construct
knihit Dec 15, 2020
743d971
updating construct
knihit Dec 15, 2020
bab75a4
updating construct
knihit Dec 15, 2020
4cb8edc
updating sample
knihit Dec 15, 2020
56e1cca
updating construct
knihit Dec 16, 2020
4aad13b
updating construct
knihit Dec 16, 2020
2b4bea3
updating construct
knihit Dec 16, 2020
b8915a3
updating construct
knihit Dec 16, 2020
f735cac
updating sample
knihit Dec 16, 2020
6253179
merging changes
knihit Dec 16, 2020
9322de0
updating sample
knihit Dec 16, 2020
993bde4
updating construct kinesis policy
knihit Dec 16, 2020
b664909
updating construct kinesis policy
knihit Dec 17, 2020
4730152
updating construct kinesis policy
knihit Dec 17, 2020
58773c5
updating construct kinesis policy
knihit Dec 17, 2020
ef8436e
updating construct kinesis policy
knihit Dec 17, 2020
0a02004
updating construct kinesis policy
knihit Dec 17, 2020
872650f
updating construct kinesis policy
knihit Dec 17, 2020
4bc12b6
updating construct
knihit Dec 17, 2020
80fd800
updating construct kinesis policy
knihit Dec 17, 2020
a6f4d4e
updating construct kinesis policy
knihit Dec 17, 2020
9e7b780
updating construct kinesis policy
knihit Dec 17, 2020
4df7b77
updating construct kinesis policy
knihit Dec 17, 2020
0d71ad4
updating construct policy
knihit Dec 17, 2020
58d09b0
updating readme
knihit Dec 17, 2020
5e5f142
Merge remote-tracking branch 'upstream/master'
knihit Dec 29, 2020
a8c70e4
Merge remote-tracking branch 'upstream/master'
knihit Jan 1, 2021
9a4d955
updated README
knihit Jan 5, 2021
3f09960
updated README
knihit Jan 5, 2021
536dd03
updated README
knihit Jan 5, 2021
49502ab
updated README
knihit Jan 5, 2021
32ff18a
updated README
knihit Jan 5, 2021
190e9ce
updated README
knihit Jan 5, 2021
83d5486
updated README
knihit Jan 5, 2021
8325790
updated README
knihit Jan 5, 2021
e668827
updated README
knihit Jan 5, 2021
d57f3b1
updates based on review
knihit Jan 6, 2021
7a8c0df
incoporating review comments
knihit Jan 6, 2021
8a7cec9
incoporating review comments
knihit Jan 6, 2021
22bc12d
incoporating review comments
knihit Jan 6, 2021
aa0bccd
incoporating review comments
knihit Jan 6, 2021
5a96af4
incoporating review comments
knihit Jan 6, 2021
b04003d
update to construct
knihit Jan 7, 2021
b867dd7
update to construct
knihit Jan 8, 2021
d2b92ca
update to construct
knihit Jan 8, 2021
6a7f40a
update to construct
knihit Jan 8, 2021
a067304
update to construct
knihit Jan 8, 2021
97979c5
update to construct
knihit Jan 8, 2021
34eeae8
refactoring code to match construct patterns
knihit Jan 15, 2021
438d31e
refactoring code to match construct patterns
knihit Jan 15, 2021
8e89453
refactoring code to match construct patterns
knihit Jan 15, 2021
90c5cb5
refactoring code to match construct patterns
knihit Jan 15, 2021
fc15d89
refactoring code to match construct patterns
knihit Jan 15, 2021
b8691a5
refactoring code to match construct patterns
knihit Jan 15, 2021
d1c44cd
refactoring code to match construct patterns
knihit Jan 15, 2021
7d3ffe6
fix for readme file
knihit Jan 19, 2021
62068e6
Merge remote-tracking branch 'upstream/master'
knihit Jan 19, 2021
7f3f148
eslint fixes
knihit Jan 19, 2021
fde06ae
update sample after refactoring construct
knihit Jan 19, 2021
dc83095
update sample after refactoring construct
knihit Jan 20, 2021
6b0ef07
remove _ from variable names
knihit Jan 20, 2021
7ad1ca6
updating glue version 2.0 as recommended in Glue service documentation
knihit Jan 20, 2021
14ffdb0
removed snapshot (that was not required) to fix build failure
knihit Jan 21, 2021
1d84029
update viperlight to fix build issue
knihit Jan 21, 2021
5783f5e
update viperlight to fix build issue
knihit Jan 21, 2021
f40ef76
cfn_nag fix for build issues
knihit Jan 21, 2021
e64e742
merge 1.82.0
knihit Jan 21, 2021
8100d19
updating integ snapshots
knihit Jan 21, 2021
0fc781f
updating integ snapshots
knihit Jan 21, 2021
c3ab317
Merge branch 'master' into master
knihit Jan 21, 2021
6a7032e
Merge remote-tracking branch 'upstream/master'
knihit Jan 21, 2021
afb173f
Merge branch 'master' of github.com:knihit/aws-solutions-constructs
knihit Jan 21, 2021
8ff7c8c
update header
knihit Jan 21, 2021
342f3de
update header
knihit Jan 21, 2021
7b2264b
incorporate publisher review comments
knihit Feb 3, 2021
1830784
Merge remote-tracking branch 'upstream/master'
knihit Feb 3, 2021
5a82e15
incorporate publisher review comments
knihit Feb 3, 2021
844620b
incorporate publisher review comments
knihit Feb 3, 2021
0e2c9c3
reorganization unitt tests
knihit Feb 3, 2021
7211d7d
reorganization unitt tests
knihit Feb 4, 2021
24779df
reorganization unitt tests
knihit Feb 4, 2021
3291b5e
reorganization unitt tests
knihit Feb 4, 2021
6928f6f
reorganization unitt tests
knihit Feb 4, 2021
75a869f
reorganization unitt tests
knihit Feb 4, 2021
cfd9836
incorporate review comments
knihit Feb 5, 2021
8c12c6c
fix for output s3 bucket
knihit Feb 5, 2021
285dbfa
fix for output s3 bucket
knihit Feb 5, 2021
7a8a8d1
fix for output s3 bucket
knihit Feb 5, 2021
a489345
update README with the correct class name
knihit Feb 5, 2021
fd6c92d
Merge remote-tracking branch 'upstream/master'
knihit Feb 8, 2021
649e629
updating README
knihit Feb 8, 2021
49 changes: 48 additions & 1 deletion .gitignore
@@ -21,4 +21,51 @@ source/patterns/**/tsconfig.json
deployment/dist/*
.DS_Store

*.pptx
*.pptx
Contributor
Please remove/rollback these changes, it's not needed

Member Author
removing all the package.json entries

package-lock.json
source/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-dynamodb/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-iot/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-kinesisstreams/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-sagemakerendpoint/package.json
source/patterns/@aws-solutions-constructs/aws-apigateway-sqs/package.json
source/patterns/@aws-solutions-constructs/aws-cloudfront-apigateway/package.json
source/patterns/@aws-solutions-constructs/aws-cloudfront-apigateway-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-cloudfront-s3/package.json
source/patterns/@aws-solutions-constructs/aws-cognito-apigateway-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-dynamodb-stream-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-dynamodb-stream-lambda-elasticsearch-kibana/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-kinesisfirehose-s3/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-kinesisstreams/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-sns/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-sqs/package.json
source/patterns/@aws-solutions-constructs/aws-events-rule-step-function/package.json
source/patterns/@aws-solutions-constructs/aws-iot-kinesisfirehose-s3/package.json
source/patterns/@aws-solutions-constructs/aws-iot-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-iot-lambda-dynamodb/package.json
source/patterns/@aws-solutions-constructs/aws-kinesisfirehose-s3/package.json
source/patterns/@aws-solutions-constructs/aws-kinesisfirehose-s3-and-kinesisanalytics/package.json
source/patterns/@aws-solutions-constructs/aws-kinesisstream-gluejob/package.json
source/patterns/@aws-solutions-constructs/aws-kinesisstreams-kinesisfirehose-s3/package.json
source/patterns/@aws-solutions-constructs/aws-kinesisstreams-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-dynamodb/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-elasticsearch-kibana/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-s3/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-sagemaker/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-sns/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-sqs/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-sqs-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-lambda-step-function/package.json
source/patterns/@aws-solutions-constructs/aws-s3-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-s3-step-function/package.json
source/patterns/@aws-solutions-constructs/aws-sns-lambda/package.json
source/patterns/@aws-solutions-constructs/aws-sns-sqs/package.json
source/patterns/@aws-solutions-constructs/aws-sqs-lambda/package.json
source/patterns/@aws-solutions-constructs/core/package.json
source/tools/cdk-integ-tools/package.json
source/use_cases/aws-s3-static-website/package.json
source/use_cases/aws-serverless-image-handler/package.json
source/use_cases/aws-serverless-image-handler/lib/lambda/image-handler/package.json
source/use_cases/aws-serverless-web-app/package.json
@@ -0,0 +1,5 @@
lib/*.js
test/*.js
*.d.ts
coverage
test/lambda/index.js
@@ -0,0 +1,16 @@
lib/*.js
test/*.js
!test/lambda/*
*.js.map
*.d.ts
node_modules
*.generated.ts
dist
.jsii

.LAST_BUILD
.nyc_output
coverage
.nycrc
.LAST_PACKAGE
*.snk
@@ -0,0 +1,21 @@
# Exclude typescript source and config
*.ts
tsconfig.json
coverage
.nyc_output
*.tgz
*.snk
*.tsbuildinfo

# Include javascript files and typescript declarations
!*.js
!*.d.ts

# Exclude jsii outdir
dist

# Include .jsii
!.jsii

# Include .jsii
!.jsii
@@ -0,0 +1,151 @@
# aws-kinesisstreams-gluejob module

<!--BEGIN STABILITY BANNER-->

---

![Stability: Experimental](https://img.shields.io/badge/stability-Experimental-important.svg?style=for-the-badge)

> All classes are under active development and subject to non-backward compatible changes or removal in any
> future version. These are not subject to the [Semantic Versioning](https://semver.org/) model.
> This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.

---

<!--END STABILITY BANNER-->

| **Reference Documentation**: | <span style="font-weight: normal">https://docs.aws.amazon.com/solutions/latest/constructs/</span> |
| :--------------------------- | :------------------------------------------------------------------------------------------------ |

<div style="height:8px"></div>

| **Language** | **Package** |
| :--------------------------------------------------------------------------------------------- | ------------------------------------------------------------- |
| ![Python Logo](https://docs.aws.amazon.com/cdk/api/latest/img/python32.png) Python | `aws_solutions_constructs.aws_kinesis_streams_gluejob` |
| ![Typescript Logo](https://docs.aws.amazon.com/cdk/api/latest/img/typescript32.png) Typescript | `@aws-solutions-constructs/aws-kinesisstreams-gluejob` |
| ![Java Logo](https://docs.aws.amazon.com/cdk/api/latest/img/java32.png) Java | `software.amazon.awsconstructs.services.kinesisstreamgluejob` |

This AWS Solutions Construct deploys a Kinesis Stream and configures an AWS Glue Job to perform custom ETL transformation, with the appropriate resources and properties for interaction and security. It also creates an S3 bucket where the Python script for the AWS Glue Job can be uploaded.

Here is a minimal deployable pattern definition in TypeScript:

```javascript
import { CfnTable } from '@aws-cdk/aws-glue';
import { Asset } from '@aws-cdk/aws-s3-assets';
import { KinesisStreamGlueJob } from '@aws-solutions-constructs/aws-kinesisstreams-gluejob';

// Schema of the records in the Kinesis stream
const fieldSchema: CfnTable.ColumnProperty[] = [
  {
    name: 'id',
    type: 'int',
    comment: 'Identifier for the record',
  },
  {
    name: 'name',
    type: 'string',
    comment: 'Name for the record',
  },
  {
    name: 'address',
    type: 'string',
    comment: 'Address for the record',
  },
  {
    name: 'value',
    type: 'int',
    comment: 'Value for the record',
  },
];

// Deploy the pattern; the ETL script is uploaded as an S3 asset
const _customEtlJob = new KinesisStreamGlueJob(this, 'CustomETL', {
  glueJobProps: {
    command: {
      name: 'gluestreaming',
      pythonVersion: '3',
      scriptLocation: new Asset(this, 'ScriptLocation', {
        path: `${__dirname}/../etl/transform.py`,
      }).s3ObjectUrl,
    },
  },
  fieldSchema: fieldSchema,
});
```

## Initializer

```text
new KinesisStreamGlueJob(scope: Construct, id: string, props: KinesisStreamGlueJobProps);
```

_Parameters_

- scope [`Construct`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_core.Construct.html)
- id `string`
- props [`KinesisStreamGlueJobProps`](#pattern-construct-props)

## Pattern Construct Props

| **Name** | **Type** | **Description** |
| :------------------ | :---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| kinesisStreamProps? | [`kinesis.StreamProps`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-kinesis.StreamProps.html) | Optional user-provided props to override the default props for the Kinesis stream. |
| existingStreamObj? | [`kinesis.Stream`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-kinesis.Stream.html) | Existing instance of Kinesis Stream, if this is set then kinesisStreamProps is ignored. |
| glueJobProps? | [`cfnJob.CfnJobProps`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnJobProps.html) | User provided props to override the default props for the CfnJob. |
| existingGlueJob? | [`cfnJob.CfnJob`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnJob.html) | Existing CfnJob configuration. If this is set, then glueJobProps is ignored. |
| fieldSchema? | [`CfnTable.ColumnProperty[]`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnTable.ColumnProperty.html) | User-provided schema for the records in the Kinesis Data Streams, used to create the Glue Table. Ignored if @table is provided. |
| database? | [`CfnDatabase`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnDatabase.html) | Glue Database for this construct. If not provided, the construct will create a new Glue Database. The database is where the schema for the data in Kinesis Data Streams is stored. |

If we allow the client to pass existingDatabase, we should allow them the alternative of passing database props.

Same for table (and also for bucket which we noted below).

Member Author
Ok, updating the interface.

| table? | [`CfnTable`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnTable.html) | Glue Table for this construct. If not provided, the construct will create a new Table in the database. This table should define the schema for the records in the Kinesis Data Streams. Either @table or @fieldSchema is mandatory. If @table is provided, then @fieldSchema is ignored. |
| outputDataStore? | [`SinkDataStoreProps`](#sinkdatastoreprops) | The output datastore properties specifying where the ETL output should be written. Currently, S3 is the only supported datastore type. |
Contributor
Incomplete sentence or typo at the end written

Member Author
Yes, good catch, will be removed.
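
The rule stated for `table` and `fieldSchema` in the props table (one of them is mandatory, and `table` wins when both are given) could be sketched as below. This is only an illustration under stated assumptions: `ColumnSketch`, `TableSketch`, `SchemaPropsSketch`, `resolveTable`, and the generated table name are hypothetical stand-ins, not the construct's actual types or implementation.

```typescript
// Hypothetical, simplified stand-ins for CfnTable.ColumnProperty and CfnTable.
interface ColumnSketch { name: string; type: string; comment?: string; }
interface TableSketch { tableName: string; columns: ColumnSketch[]; }

// Only the two props relevant to the rule are modeled here.
interface SchemaPropsSketch {
  table?: TableSketch;
  fieldSchema?: ColumnSketch[];
}

// Return the table that describes the stream records.
function resolveTable(props: SchemaPropsSketch): TableSketch {
  if (props.table !== undefined) {
    // An existing table takes precedence; fieldSchema is ignored.
    return props.table;
  }
  if (props.fieldSchema === undefined) {
    throw new Error('Either table or fieldSchema must be provided');
  }
  // No table was supplied, so build one from the field schema.
  // 'generated-table' is a placeholder name chosen for this sketch.
  return { tableName: 'generated-table', columns: props.fieldSchema };
}
```

Failing early when both props are missing keeps the error close to the caller instead of surfacing later as an incomplete Glue table definition.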


## SinkDataStoreProps

| **Name** | **Type** | **Description** |
| :-------------- | :-------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| s3OutputBucket? | [`Bucket`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.Bucket.html) | The output S3 location where the data should be written. The provided S3 bucket will be used to pass the output location to the ETL script as an argument to the AWS Glue job. If not provided, the construct will create one. |

name should start with existing

If we accept an existing bucket, we should also accept a BucketProps object.

Member Author
updating interface prop names

| datastoreStype | [`SinkStoreType`](#sinkstoretype) | Sink data store type |

## SinkStoreType

Enumeration of data store types that could include S3, DynamoDB, DocumentDB, RDS or Redshift. The current construct implementation supports only S3, with the potential to add other output types in the future.

| **Name** | **Type** | **Description** |
| :------- | :------- | --------------- |
| S3 | `string` | S3 storage type |
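
As a rough illustration of how the two tables above fit together, the sketch below models the enum and the props interface with plain TypeScript types. Every name ending in `Sketch`, the `resolveOutputBucket` helper, and the `createBucket` callback are assumptions made for this example; the real construct works with CDK `Bucket` objects.

```typescript
// Hypothetical mirror of SinkStoreType; S3 is the only documented member.
enum SinkStoreTypeSketch {
  S3 = 'S3',
}

// Simplified stand-in for an S3 Bucket construct.
interface BucketSketch { bucketName: string; }

// Mirrors the documented SinkDataStoreProps fields, names as in the table.
interface SinkDataStorePropsSketch {
  s3OutputBucket?: BucketSketch;
  datastoreStype: SinkStoreTypeSketch;
}

// Use the caller's bucket when given, otherwise create one.
function resolveOutputBucket(
  props: SinkDataStorePropsSketch,
  createBucket: () => BucketSketch,
): BucketSketch {
  return props.s3OutputBucket !== undefined ? props.s3OutputBucket : createBucket();
}

const provided = resolveOutputBucket(
  { datastoreStype: SinkStoreTypeSketch.S3, s3OutputBucket: { bucketName: 'my-etl-output' } },
  () => ({ bucketName: 'construct-created-bucket' }),
);
console.log(provided.bucketName);
```

Keeping the store type as an enum leaves room for the other data store types mentioned above without changing the shape of the props.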

# Default settings

The out-of-the-box implementation of the Construct, without any overrides, sets the following defaults:

### Amazon Kinesis Stream

- Configure least privilege access IAM role for Kinesis Stream
- Enable server-side encryption for Kinesis Stream using AWS Managed KMS Key
- Deploy best practices CloudWatch Alarms for the Kinesis Stream

### Glue Job

- Create a Glue Security Config that configures encryption for CloudWatch, Job Bookmarks, and S3. CloudWatch and Job Bookmarks are encrypted using the AWS Managed KMS Key created for the AWS Glue Service. The S3 bucket is configured with SSE-S3 encryption mode.
- Configure service role policies that allow AWS Glue to read from Kinesis Data Streams.
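
The encryption defaults described above might translate into a configuration object along these lines. This is a sketch, not the construct's code: the property names are modeled on the encryption settings of a Glue security configuration and should be verified against the CDK documentation, and the `EncryptionConfigSketch` type, the `buildEncryptionConfig` helper, and the example key ARN are assumptions.

```typescript
// Hypothetical plain-object shape for the security configuration's encryption settings.
interface EncryptionConfigSketch {
  cloudWatchEncryption: { cloudWatchEncryptionMode: string; kmsKeyArn: string };
  jobBookmarksEncryption: { jobBookmarksEncryptionMode: string; kmsKeyArn: string };
  s3Encryptions: { s3EncryptionMode: string }[];
}

// Build the settings: KMS for CloudWatch logs and job bookmarks, SSE-S3 for S3.
function buildEncryptionConfig(glueKmsKeyArn: string): EncryptionConfigSketch {
  return {
    cloudWatchEncryption: { cloudWatchEncryptionMode: 'SSE-KMS', kmsKeyArn: glueKmsKeyArn },
    jobBookmarksEncryption: { jobBookmarksEncryptionMode: 'CSE-KMS', kmsKeyArn: glueKmsKeyArn },
    s3Encryptions: [{ s3EncryptionMode: 'SSE-S3' }],
  };
}

// Placeholder account and region; the alias is assumed to be the AWS managed key for Glue.
const encryptionConfig = buildEncryptionConfig(
  'arn:aws:kms:us-east-1:111122223333:alias/aws/glue',
);
console.log(JSON.stringify(encryptionConfig, null, 2));
```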

### Glue Database

- A Glue database to hold the Glue Table that defines the structure and schema of the records in the Kinesis stream

### Glue Table

- A Table with storage descriptor and table input properties using the schema details provided for the records in the Kinesis Data Streams
Contributor
The description sounds too complicated and confusing, can it be simplified ?

Member Author
Ok, updating the readme


### IAM Role

- A job execution role that has privileges to read the ETL script from the S3 bucket location, read from the Kinesis Stream, and execute the Glue Job. Permission to write to a specific output location is not configured by the construct.
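
To make the read privileges above more concrete, here is a minimal sketch of the kind of IAM policy statements such a role could carry. The exact action list the construct grants is not shown in this README, so the actions below are an assumption about what "read" covers; `StatementSketch` and both helper functions are hypothetical names, and the ARNs are placeholders.

```typescript
// Hypothetical plain-object form of an IAM policy statement.
interface StatementSketch {
  Effect: 'Allow' | 'Deny';
  Action: string[];
  Resource: string[];
}

// Read-only access to a single Kinesis stream (assumed action set).
function kinesisReadStatement(streamArn: string): StatementSketch {
  return {
    Effect: 'Allow',
    Action: [
      'kinesis:DescribeStream',
      'kinesis:GetShardIterator',
      'kinesis:GetRecords',
      'kinesis:ListShards',
    ],
    Resource: [streamArn],
  };
}

// Read access to the ETL script object only, not the whole bucket.
function scriptReadStatement(bucketArn: string, scriptKey: string): StatementSketch {
  return {
    Effect: 'Allow',
    Action: ['s3:GetObject'],
    Resource: [`${bucketArn}/${scriptKey}`],
  };
}

const statements: StatementSketch[] = [
  kinesisReadStatement('arn:aws:kinesis:us-east-1:111122223333:stream/example-stream'),
  scriptReadStatement('arn:aws:s3:::example-script-bucket', 'etl/transform.py'),
];
console.log(JSON.stringify({ Version: '2012-10-17', Statement: statements }, null, 2));
```

Scoping each statement to one stream and one script object follows the least-privilege default listed for the Kinesis Stream; write access to the output location would need a separate statement.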

### Output S3 Bucket

- An S3 bucket to store the output of the ETL transformation. This bucket is passed as an argument to the created Glue job so that the ETL script can write data into it.
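
Passing the bucket to the job could look roughly like the sketch below, which builds the job's default arguments as a map of `--key` strings. The argument name `--output_path`, the `output/` prefix, and the `buildDefaultArguments` helper are assumptions for illustration; check the construct source for the name the ETL script actually receives.

```typescript
// Build the default arguments handed to the Glue job.
// '--output_path' and the 'output/' prefix are hypothetical choices for this sketch.
function buildDefaultArguments(
  outputBucketName: string,
  prefix: string = 'output/',
): Record<string, string> {
  return {
    '--output_path': `s3://${outputBucketName}/${prefix}`,
  };
}

const jobArguments = buildDefaultArguments('my-etl-output-bucket');
console.log(jobArguments['--output_path']); // prints "s3://my-etl-output-bucket/output/"
```

Inside the ETL script, the same key would then be read from the job's resolved arguments to decide where to write the transformed records.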

## Architecture

![Architecture Diagram](architecture.png)

## Reference Implementation

A sample use case which uses this pattern is available under [`use_cases/aws-custom-glue-etl`](https://github.com/awslabs/aws-solutions-constructs/tree/master/source/use_cases/aws-custom-glue-etl).

&copy; Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.