feat: Support executing polars SQL against `pandas` and `pyarrow` objects #16746
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #16746 +/- ##
==========================================
- Coverage 81.45% 81.45% -0.01%
==========================================
Files 1413 1413
Lines 186306 186343 +37
Branches 2777 2784 +7
==========================================
+ Hits 151750 151780 +30
- Misses 34036 34040 +4
- Partials 520 523 +3
☔ View full report in Codecov by Sentry.
Outstanding! Could Spark DataFrames be included too? I never remember the syntax to convert Spark to Polars (below Stack Overflow answer from Ritchie). It would make it quite easy to use Polars for any Spark DataFrame that fits into memory.
Unfortunately that conversion uses a private method (
It looks like it was just added to Spark. Would it be better if I open a separate request for this?
Collecting a PySpark DataFrame is also very dangerous: it forces all the data onto one machine. This should not be done implicitly.
Good point; a
@ritchie46 - as suggested 😉
Notably expands the scope of the Polars SQL interface.

- The `pl.sql` function can now transparently identify and register additional frame/object types when referenced from a Polars SQL query.
- `SQLContext` can also now operate on these additional types.

Compatible objects include:

- polars `DataFrame`, `LazyFrame`, and (new) `Series`.
- pandas `DataFrame` and `Series`.
- pyarrow `Table` and `RecordBatch`.

(💡 If you have more types that you think would be a good fit for this, let me know.)
Objects of these types are transparently converted/registered as LazyFrame iff their variable name appears in the SQL query (this prevents unnecessary conversions of compatible objects that aren't actually referenced).
Example
Join polars LazyFrame with a pandas DataFrame and a pyarrow Table: