-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(glue-alpha): glue L2 alpha construct #30833
base: main
Are you sure you want to change the base?
Conversation
…-glue-l2 to mjanardhan/aws-cdk-glue-l2 to resolve non-trivial merge situation
Workflow Triggers
Added Scala jobs unit tests
fixed the unit test for scala etl and flex etl
WorkerType and numberofWorkers defaults are enforced when not set
Integ tests snapshots
…s-cdk-glue-l2 into glue-alpha-refacor
## Database | ||
|
||
A `Database` is a logical grouping of `Tables` in the Glue Catalog. | ||
|
||
```ts | ||
new glue.Database(this, 'MyDatabase', { | ||
databaseName: 'my_database', | ||
description: 'my_database_description', | ||
}); | ||
``` | ||
|
||
## Table | ||
|
||
A Glue table describes a table of data in S3: its structure (column names and types), location of data (S3 objects with a common prefix in a S3 bucket), and format for the files (Json, Avro, Parquet, etc.): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come the documentation for Database
, Table
and many others are being deleted here? The code files are not being modified, is it still WIP or will they be removed or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @natalie-white-aws and the removal of these docs are unintended and will be updated to add them back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this huge effort on re-writing L2 glue. Left some feedback on it. Still working my way to finish the review.
A general feedback on the README is to add more examples with usage.
|
||
There are 3 types of jobs supported by AWS Glue: Spark ETL, Spark Streaming, and Python Shell jobs. | ||
1. **ETL Jobs** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give an example of ETL job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were iterating on the examples; the code snippets we had in the README / RFC weren't compiling properly via pack so we had bias for action to get the actual code out for review to revisit the README later. There are certainly examples in the unit tests, but yes we need to get them back in the README to be compiled and included in documentation.
/** | ||
* The IAM role assumed by Glue to run this job. | ||
* | ||
* @default - undefined | ||
*/ | ||
readonly role?: iam.IRole; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary for the import attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking if it should be optional or if it should be removed? We do need a Role since every job requires an execution role, but if it's not provided we assign an UnknownPrincipal (line 468)
* @param id The construct's id. | ||
* @param attrs Attributes for the Glue Job we want to import | ||
*/ | ||
public static fromJobAttributes(scope: constructs.Construct, id: string, attrs: JobImportAttributes): IJob { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is fromXXX
function even needed on an abstract class where users won't be able to instantiate anyway? Should we consider adding it to the concrete class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the fromXXX function is to allow a CDK app to reference an existing resource in a read-only way, which is why the function and the attributes (which are the only ones that are common across all job types) are in the fromXXX function. For instance, a developer could create a Trigger to an existing job by referencing the job's ARN, but they wouldn't be able to change the job parameters, which are different across job types.
Is there a developer use case would require separate fromXXX functions based on the parameters that make Glue Jobs different, that the abstract / common ARN / Name reference doesn't support?
} | ||
|
||
private setupSparkUI(role: iam.IRole, sparkUiProps: SparkUIProps) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.role = props.role, { | ||
assumedBy: new iam.ServicePrincipal('glue.amazonaws.com'), | ||
managedPolicies: [iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSGlueServiceRole')], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you're using a comma operator to assign the values to this.role
. Then why not make role
optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in job.ts comment.
|
||
// Enable CloudWatch metrics and continuous logging by default as a best practice | ||
const continuousLoggingArgs = this.setupContinuousLogging(this.role, props.continuousLogging); | ||
const profilingMetricsArgs = { '--enable-metrics': '' }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question as pyspark-etl
one.
This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. To keep this PR from being closed, please continue work on it. If not, it will automatically be closed in a week. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Thanks for addressing the comments. I'll give it another review this week or so. Meanwhile I want to let @TheRealAmazonKendra to also review this change as I believe she has the most context on the change. |
Issue # (if applicable)
RFC 0497-glue-l2-construct
Link: https://github.com/aws/aws-cdk-rfcs/blob/main/text/0497-glue-l2-construct.md
Reason for this change
Refactor existing Glue L2 alpha construct to adhere to CDK best practices, enforce Job types and parameters via interface constructs, and improve the developer experience pushing validations left and disallowing deprecated Glue and language versions.
Description of changes
The existing Job and Job Executable monoliths have been decomposed into Job Type and Language specific classes that implement and extend an abstract Job parent class. Developers will be able to see mandatory and optional parameters that apply just to their selected job type and language, rather than having to reference documentation and examples or find out during synth or deploy time that they've selected the wrong configuration.
BREAKING CHANGE: Developers must refactor their existing Job instantiation method calls to choose the right job type and language, and use the new constants static values to define the associated Job configuration settings. See the RFC and/or new README for examples.
Description of how you validated changes
We've added unit and integration tests for each valid job type and language combination and validated the creation of new Glue Jobs adhere to opinionated best practices such as continuous logging and runtime versions.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license