Exposing Scala DataFrames in PySpark #214

Merged 3 commits on May 2, 2018

lintool (Member) commented May 1, 2018

What does this Pull Request do?

As the title suggests, this PR exposes DataFrames in Scala for use in PySpark. This is a cleanup of initial prototyping done at the Toronto Datathon in April 2018.
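
For readers unfamiliar with the pattern, here is a minimal sketch of how a Scala DataFrame can be wrapped for use in PySpark. It assumes the `DataFrameLoader` class in `io.archivesunleashed` takes a Scala `SparkContext` in its constructor and that its methods take a path and return a Scala DataFrame; the helper name `wrap_scala_dataframe` and its parameters are hypothetical, chosen only for illustration.

```python
def wrap_scala_dataframe(sc, sql_context, method_name, path):
    """Call a method on the Scala DataFrameLoader and wrap the result for PySpark.

    Hypothetical helper: the loader's constructor argument and the shape of
    its methods are assumptions, not confirmed API.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import DataFrame

    # sc._jvm is the Py4J view of the JVM; sc._jsc.sc() is the Scala SparkContext.
    # ASSUMPTION: DataFrameLoader is constructed from a Scala SparkContext.
    loader = sc._jvm.io.archivesunleashed.DataFrameLoader(sc._jsc.sc())

    # ASSUMPTION: method_name names a loader method that takes a path
    # and returns a Scala (JVM) DataFrame.
    jdf = getattr(loader, method_name)(path)

    # Wrap the JVM DataFrame so it behaves like a regular PySpark DataFrame.
    return DataFrame(jdf, sql_context)
```

In a `pyspark` shell started with the AUT jar on the classpath, this would be called with the shell's `sc` and `sqlContext` objects plus the name of whichever loader method is wanted. Passing the method name as a string keeps the sketch independent of the loader's actual method list.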

How should this be tested?

It's not. This is an experimental feature that is completely independent of existing AUT capabilities.

lintool (Member, Author) commented May 1, 2018

Adding a reference to #209; see the discussion there on exactly how to use this new PySpark feature.

codecov bot commented May 1, 2018

Codecov Report

Merging #214 into master will decrease coverage by 0.6%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #214      +/-   ##
==========================================
- Coverage   66.76%   66.16%   -0.61%     
==========================================
  Files          33       34       +1     
  Lines         659      665       +6     
  Branches      124      124              
==========================================
  Hits          440      440              
- Misses        178      184       +6     
  Partials       41       41
Impacted Files Coverage Δ
...n/scala/io/archivesunleashed/DataFrameLoader.scala 0% <0%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ef76758...b384b66.

ianmilligan1 requested a review from ruebot May 1, 2018 15:03

ianmilligan1 (Member) left a comment

I've played with this and it works nicely – I think it's a great start to the PySpark functionality, and will be nice to have in the main repo as experimental functionality.

greebie (Contributor) commented May 1, 2018

I'm going to try this out this afternoon, and I plan to make it part of my CSDH presentation/paper about switching from Scala to Python.

ruebot (Member) commented May 2, 2018

Tested with:

N.B. I had to remove the line with .enableHiveSupport() in python/pyspark/shell.py with 2.1.1 and 2.2.1 as prescribed in #209. However, this was not necessary in 2.3.0.
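
The manual edit described above could be scripted roughly as follows. This is a sketch only: the function name `drop_hive_support` is made up for illustration, and the sample text is an illustrative builder chain, not the actual contents of `python/pyspark/shell.py`.

```python
def drop_hive_support(shell_source: str) -> str:
    """Return shell_source with every line containing .enableHiveSupport() removed."""
    kept = [
        line
        for line in shell_source.splitlines(keepends=True)
        if ".enableHiveSupport()" not in line
    ]
    return "".join(kept)


# Illustrative snippet only; NOT copied from the real shell.py.
sample = (
    "spark = SparkSession.builder\\\n"
    "    .enableHiveSupport()\\\n"
    "    .getOrCreate()\n"
)

# The builder chain stays intact, minus the Hive line.
patched = drop_hive_support(sample)
print(patched)
```

To apply it to a real installation, one would read the file's text, pass it through the function, and write the result back, keeping a backup of the original first.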
