New Pattern: aws-kinesisstreams-gluejob #122
Changes from 104 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
lib/*.js | ||
test/*.js | ||
*.d.ts | ||
coverage | ||
test/lambda/index.js |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
lib/*.js | ||
test/*.js | ||
!test/lambda/* | ||
*.js.map | ||
*.d.ts | ||
node_modules | ||
*.generated.ts | ||
dist | ||
.jsii | ||
|
||
.LAST_BUILD | ||
.nyc_output | ||
coverage | ||
.nycrc | ||
.LAST_PACKAGE | ||
*.snk |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Exclude typescript source and config | ||
*.ts | ||
tsconfig.json | ||
coverage | ||
.nyc_output | ||
*.tgz | ||
*.snk | ||
*.tsbuildinfo | ||
|
||
# Include javascript files and typescript declarations | ||
!*.js | ||
!*.d.ts | ||
|
||
# Exclude jsii outdir | ||
dist | ||
|
||
# Include .jsii | ||
!.jsii | ||
|
||
# Include .jsii | ||
!.jsii |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
# aws-kinesisstreams-gluejob module | ||
|
||
<!--BEGIN STABILITY BANNER--> | ||
|
||
--- | ||
|
||
 | ||
|
||
> All classes are under active development and subject to non-backward compatible changes or removal in any | ||
> future version. These are not subject to the [Semantic Versioning](https://semver.org/) model. | ||
> This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package. | ||
|
||
--- | ||
|
||
<!--END STABILITY BANNER--> | ||
|
||
| **Reference Documentation**: | <span style="font-weight: normal">https://docs.aws.amazon.com/solutions/latest/constructs/</span> | | ||
| :--------------------------- | :------------------------------------------------------------------------------------------------ | | ||
|
||
<div style="height:8px"></div> | ||
|
||
| **Language** | **Package** | | ||
| :--------------------------------------------------------------------------------------------- | ------------------------------------------------------------- | | ||
|  Python | `aws_solutions_constructs.aws_kinesis_streams_gluejob` | | ||
|  Typescript | `@aws-solutions-constructs/aws-kinesisstreams-gluejob` | | ||
|  Java | `software.amazon.awsconstructs.services.kinesisstreamgluejob` | | ||
|
||
This AWS Solutions Construct deploys a Kinesis Stream and configures an AWS Glue Job to perform custom ETL transformation with the appropriate resources/properties for interaction and security. It also creates an S3 bucket where the Python script for the AWS Glue Job can be uploaded. | ||
|
||
Here is a minimal deployable pattern definition in TypeScript: | ||
|
||
```javascript | ||
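import { CfnTable } from '@aws-cdk/aws-glue'; | ||
import { Asset } from '@aws-cdk/aws-s3-assets'; | ||
import { KinesisStreamGlueJob } from '@aws-solutions-constructs/aws-kinesisstreams-gluejob'; | ||
|
||
// Schema of the records expected on the Kinesis stream | ||
const fieldSchema: CfnTable.ColumnProperty[] = [{ | ||
  name: 'id', | ||
  type: 'int', | ||
  comment: 'Identifier for the record', | ||
}, | ||
{ | ||
  name: 'name', | ||
  type: 'string', | ||
  comment: 'Name for the record', | ||
}, | ||
{ | ||
  name: 'address', | ||
  type: 'string', | ||
  comment: 'Address for the record', | ||
}, | ||
{ | ||
  name: 'value', | ||
  type: 'int', | ||
  comment: 'Value for the record', | ||
}, | ||
]; | ||
|
||
// Inside a Stack (or other Construct) constructor, where `this` is the scope. | ||
// The ETL script is uploaded as an asset and referenced by its S3 URL. | ||
const _customEtlJob = new KinesisStreamGlueJob(this, 'CustomETL', { | ||
  glueJobProps: { | ||
    command: { | ||
      name: 'gluestreaming', | ||
      pythonVersion: '3', | ||
      scriptLocation: new Asset(this, 'ScriptLocation', { | ||
        path: `${__dirname}/../etl/transform.py` | ||
      }).s3ObjectUrl | ||
    } | ||
  }, | ||
  fieldSchema: fieldSchema | ||
}); | ||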
|
||
``` | ||
|
||
## Initializer | ||
|
||
```text | ||
new KinesisStreamGlueJob(scope: Construct, id: string, props: KinesisStreamGlueJobProps); | ||
``` | ||
|
||
_Parameters_ | ||
|
||
- scope [`Construct`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_core.Construct.html) | ||
- id `string` | ||
- props [`KinesisStreamGlueJobProps`](#pattern-construct-props) | ||
|
||
## Pattern Construct Props | ||
|
||
| **Name** | **Type** | **Description** | | ||
| :------------------ | :---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| kinesisStreamProps? | [`kinesis.StreamProps`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-kinesis.StreamProps.html) | Optional user-provided props to override the default props for the Kinesis stream. | | ||
| existingStreamObj? | [`kinesis.Stream`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-kinesis.Stream.html) | Existing instance of Kinesis Stream, if this is set then kinesisStreamProps is ignored. | | ||
| glueJobProps? | [`cfnJob.CfnJobProps`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnJobProps.html) | User provided props to override the default props for the CfnJob. | | ||
| existingGlueJob? | [`cfnJob.CfnJob`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnJob.html) | Existing CfnJob configuration, if this is set then glueJobProps is ignored. | | ||
| fieldSchema? | [`CfnTable.ColumnProperty[]`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnTable.ColumnProperty.html) | Schema (column definitions) for the records in the Kinesis Data Stream, used to create the Glue Table when `table` is not provided. Either `table` or `fieldSchema` is mandatory. If `table` is provided, `fieldSchema` is ignored. | | ||
| database? | [`CfnDatabase`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnDatabase.html) | Glue Database for this construct. If not provided the construct will create a new Glue Database. The database is where the schema for the data in Kinesis Data Streams is stored | | ||
If we allow the client to pass existingDatabase, we should allow them the alternative of passing database props. Same for table (and also for bucket which we noted below).
Ok, updating the interface. |
||
| table? | [`CfnTable`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-glue.CfnTable.html) | Glue Table for this construct. If not provided, the construct will create a new Table in the database. This table should define the schema for the records in the Kinesis Data Stream. Either `table` or `fieldSchema` is mandatory. If `table` is provided, `fieldSchema` is ignored. | | ||
| outputDataStore? | [`SinkDataStoreProps`](#sinkdatastoreprops) | The output datastore properties where the ETL output should be. Currently the only supported datastore type is S3. written | | ||
Incomplete sentence or typo at the end
Yes, good catch, will be removed. |
||
|
||
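The `table` / `fieldSchema` rule from the props table (one of the two is mandatory, and `table` takes precedence when both are given) could be sketched roughly as follows. This is an illustration only, not the construct's actual code: `Column`, `SchemaProps` and `resolveSchema` are hypothetical names, and `Column` is a local stand-in for `CfnTable.ColumnProperty` so the snippet runs without the CDK installed.

```typescript
// Local stand-in for CfnTable.ColumnProperty (assumption, not the CDK type).
interface Column {
  name: string;
  type?: string;
  comment?: string;
}

// Hypothetical subset of the construct props that matter for schema resolution.
interface SchemaProps {
  table?: { columns: Column[] }; // stand-in for an existing CfnTable
  fieldSchema?: Column[];
}

interface ResolvedSchema {
  source: 'existing-table' | 'new-table';
  columns: Column[];
}

// Applies the documented precedence: table first, then fieldSchema, else error.
function resolveSchema(props: SchemaProps): ResolvedSchema {
  if (props.table !== undefined) {
    // fieldSchema is ignored whenever a table is supplied.
    return { source: 'existing-table', columns: props.table.columns };
  }
  if (props.fieldSchema === undefined) {
    throw new Error('Either table or fieldSchema must be provided');
  }
  return { source: 'new-table', columns: props.fieldSchema };
}
```

Failing fast when neither prop is set surfaces the misconfiguration at synthesis time rather than at deployment.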
## SinkDataStoreProps | ||
|
||
| **Name** | **Type** | **Description** | | ||
| :-------------- | :-------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| s3OutputBucket? | [`Bucket`](https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-s3.Bucket.html) | The output S3 location where the data should be written. The provided S3 bucket will be used to pass the output location to the ETL script as an argument to the AWS Glue job. If not provided, the construct will create one. | | ||
Name should start with `existing`. If we accept an existing bucket, we should also accept a BucketProps object.
Updating interface prop names. |
||
| datastoreType | [`SinkStoreType`](#sinkstoretype) | Sink data store type | | ||
|
||
## SinkStoreType | ||
|
||
Enumeration of data store types that could include S3, DynamoDB, DocumentDB, RDS or Redshift. The current construct implementation only supports S3, but other output types may be added in the future. | ||
|
||
| **Name** | **Type** | **Description** | | ||
| :------- | :------- | --------------- | | ||
| S3 | `string` | S3 storage type | | ||
|
||
## Default settings | ||
|
||
The out-of-the-box implementation of the Construct, without any overrides, sets the following defaults: | ||
|
||
### Amazon Kinesis Stream | ||
|
||
- Configure least privilege access IAM role for Kinesis Stream | ||
- Enable server-side encryption for Kinesis Stream using AWS Managed KMS Key | ||
- Deploy best practices CloudWatch Alarms for the Kinesis Stream | ||
|
||
### Glue Job | ||
|
||
- Create a Glue Security Config that configures encryption for CloudWatch, Job Bookmarks, and S3. CloudWatch and Job Bookmarks are encrypted using the AWS Managed KMS Key created for the AWS Glue Service. The S3 bucket is configured with SSE-S3 encryption mode. | ||
- Configure service role policies that allow AWS Glue to read from Kinesis Data Streams. | ||
|
||
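The encryption defaults listed for the Glue Job might translate into an encryption configuration along these lines. This is a sketch under assumptions: the interface is a local stand-in that mirrors the camel-cased shape of the CloudFormation `AWS::Glue::SecurityConfiguration` `EncryptionConfiguration` property, and `defaultEncryptionConfiguration` and `glueKmsKeyArn` are hypothetical names, not part of the construct.

```typescript
// Local stand-in for the EncryptionConfiguration shape (assumption, not the CDK type).
interface EncryptionConfiguration {
  cloudWatchEncryption: {
    cloudWatchEncryptionMode: 'DISABLED' | 'SSE-KMS';
    kmsKeyArn?: string;
  };
  jobBookmarksEncryption: {
    jobBookmarksEncryptionMode: 'DISABLED' | 'CSE-KMS';
    kmsKeyArn?: string;
  };
  s3Encryptions: Array<{
    s3EncryptionMode: 'DISABLED' | 'SSE-KMS' | 'SSE-S3';
    kmsKeyArn?: string;
  }>;
}

// glueKmsKeyArn is a hypothetical input: the ARN of the KMS key used for AWS Glue.
function defaultEncryptionConfiguration(glueKmsKeyArn: string): EncryptionConfiguration {
  return {
    // CloudWatch logs and job bookmarks are encrypted with the KMS key.
    cloudWatchEncryption: { cloudWatchEncryptionMode: 'SSE-KMS', kmsKeyArn: glueKmsKeyArn },
    jobBookmarksEncryption: { jobBookmarksEncryptionMode: 'CSE-KMS', kmsKeyArn: glueKmsKeyArn },
    // S3 output uses S3-managed keys, so no KMS key ARN is needed here.
    s3Encryptions: [{ s3EncryptionMode: 'SSE-S3' }],
  };
}
```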
### Glue Database | ||
|
||
- A Glue database to add the table required to define a Glue Table structure and schema for the Kinesis stream | ||
|
||
### Glue Table | ||
|
||
- A Table with storage descriptor and table input properties using the schema details provided for the records in the Kinesis Data Streams | ||
The description sounds too complicated and confusing, can it be simplified?
Ok, updating the readme |
||
|
||
### IAM Role | ||
|
||
- A job execution role that has privileges to read the ETL script from the S3 bucket location, read from the Kinesis Stream, and execute the Glue Job. Permission to write to a specific location is not configured by the construct. | ||
|
||
### Output S3 Bucket | ||
|
||
- An S3 bucket to store the output of the ETL transformation. This bucket is passed as an argument to the created Glue job so that the ETL script can write data into it. | ||
|
||
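Passing the output bucket to the Glue job as an argument could look something like the sketch below. Everything named here is hypothetical: the `--output_path` argument key, the `SinkDataStore` shape, and the `createBucket` callback that stands in for the construct creating a bucket when none is supplied. Only the S3 store type is modelled, matching the README.

```typescript
// Mirrors the single documented store type.
enum SinkStoreType {
  S3 = 'S3',
}

// Hypothetical, simplified view of the sink props: a store type and an optional bucket name.
interface SinkDataStore {
  storeType: SinkStoreType;
  bucketName?: string;
}

// Builds the job arguments that tell the ETL script where to write.
// createBucket is a stand-in for "the construct will create one".
function outputArguments(
  sink: SinkDataStore,
  createBucket: () => string,
): Record<string, string> {
  switch (sink.storeType) {
    case SinkStoreType.S3: {
      const bucketName = sink.bucketName ?? createBucket();
      // '--output_path' is an assumed argument name, not confirmed by the README.
      return { '--output_path': `s3://${bucketName}` };
    }
    default:
      throw new Error(`Unsupported sink data store type: ${String(sink.storeType)}`);
  }
}
```

The ETL script would then read this argument at run time to decide where to write its results.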
## Architecture | ||
|
||
 | ||
|
||
## Reference Implementation | ||
|
||
A sample use case which uses this pattern is available under [`use_cases/aws-custom-glue-etl`](https://github.com/awslabs/aws-solutions-constructs/tree/master/source/use_cases/aws-custom-glue-etl). | ||
|
||
© Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. |
Please remove/rollback these changes, it's not needed
removing all the package.json entries