Exposing Scala DataFrames in PySpark #214
Adding a reference to #209; see the discussion there on exactly how to use this new PySpark feature.
Codecov Report
@@            Coverage Diff             @@
##           master     #214      +/-   ##
==========================================
- Coverage   66.76%   66.16%   -0.61%
==========================================
  Files          33       34       +1
  Lines         659      665       +6
  Branches      124      124
==========================================
  Hits          440      440
- Misses        178      184       +6
  Partials       41       41
I've played with this and it works nicely – I think it's a great start to the PySpark functionality, and will be nice to have in the main repo as experimental functionality.
I'm going to try this out this afternoon. I am going to make it part of my CSDH presentation / paper about switching from Scala to Python.
What does this Pull Request do?
As the title suggests, this PR exposes the DataFrames implemented in Scala for use in PySpark. It is a cleanup of the initial prototyping done at the Toronto Datathon in April 2018.
How should this be tested?
It isn't tested. This is an experimental feature that is completely independent of existing AUT capabilities.