feat(glue-alpha): include extra jars parameter in pyspark jobs #33238
Conversation
Exemption Request: no changes in README or integration tests needed.
One of the authors of the new L2 here. We discussed this during the RFC and implementation phases as a potential anti-pattern. Can you share why you need extra JARs for a Python job?
Hi Natalie, we need to use the spark-xml package to read XML files in Spark v3 (as you probably know, this package will be included in Spark v4). This package must be provided via the
Thanks for the extra clarification. Let me get with the Glue service team; it sounds like this may be more of a Glue feature request than something we should work around in the L2 construct. Stay tuned.
+1 to this. Some libraries that provide additional Spark capabilities require a JAR, even if one is actually using Spark via Python (PySpark). Here's a ChatGPT-generated list of examples: https://chatgpt.com/share/67a8e12d-ccd8-800e-a641-75e58db91d7b
We had some internal discussions and (in addition to the data here) decided this is a valid use case, but we should add them to all three PySpark job types.
@gontzalm Would you be able to add this change to all 3 pyspark job types? |
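For context on the change being requested, the mapping can be sketched as follows. This is an illustrative sketch only, not the actual aws-glue-alpha implementation: the prop names here are assumptions, while `--extra-jars` and `--extra-py-files` are the job parameters Glue itself uses for comma-separated S3 paths to extra dependencies.

```typescript
// Hypothetical sketch of how an `extraJars` prop on a PySpark job
// construct could be folded into the Glue job's default arguments.
// Prop names are assumed for illustration; NOT the real CDK API.
interface PySparkJobSketchProps {
  extraJars?: string[];        // S3 URIs of JVM dependencies, e.g. spark-xml
  extraPythonFiles?: string[]; // S3 URIs of extra Python modules
}

function toDefaultArguments(props: PySparkJobSketchProps): Record<string, string> {
  const args: Record<string, string> = {};
  // Glue expects comma-separated S3 paths for these parameters.
  if (props.extraJars && props.extraJars.length > 0) {
    args["--extra-jars"] = props.extraJars.join(",");
  }
  if (props.extraPythonFiles && props.extraPythonFiles.length > 0) {
    args["--extra-py-files"] = props.extraPythonFiles.join(",");
  }
  return args;
}
```

The same mapping would apply to each of the three PySpark job types mentioned above, since they all ultimately render Glue default arguments.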
Force-pushed from 399e8df to 6e15960.
Not from the CDK team, but given the discussion I started in #33356, I would suggest also supporting
LGTM, thank you!
@humanzz I think that's a good point, but I don't think it's worth blocking this PR's merge on it. If anyone is interested in contributing
✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main   #33238   +/-   ##
=======================================
  Coverage   82.16%   82.16%
=======================================
  Files         119      119
  Lines        6857     6857
  Branches     1157     1157
=======================================
  Hits         5634     5634
  Misses       1120     1120
  Partials      103      103

Flags with carried forward coverage won't be shown.
This pull request has been removed from the queue for the following reason: the pull request can't be updated. You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it. If you want to requeue this pull request, you need to post a comment with the text:
@mergify update
❌ Mergify doesn't have permission to update. For security reasons, Mergify can't update this pull request. Try updating locally.
Pull request has been modified.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
Comments on closed issues and PRs are hard for our team to see.
Issue # (if applicable)
Closes #33225.
Reason for this change
PySpark jobs with extra JAR dependencies cannot be defined with the new L2 constructs introduced in v2.177.0.
Description of changes
Add the extraJars parameter in the PySpark job L2 constructs.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license