Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: dask select method #667

Merged
merged 7 commits into from
Jul 29, 2024
Merged

feat: dask select method #667

merged 7 commits into from
Jul 29, 2024

Conversation

FBruzzesi
Copy link
Member

@FBruzzesi FBruzzesi commented Jul 29, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

Edit: For DaskNamespace, as of now:

  • _create_expr_from_callable: implemented
  • _create_expr_from_series, _create_compliant_series, _create_series_from_scalar raise NotImplementedError

feedbacks are welcomed!

@github-actions github-actions bot added the enhancement New feature or request label Jul 29, 2024
ListOfCompliantExpr = Union[list[PandasLikeExpr], list[ArrowExpr]]
CompliantDataFrame = Union[PandasLikeDataFrame, ArrowDataFrame]
ListOfCompliantSeries = Union[
list[PandasLikeSeries], list[ArrowSeries], list[DaskExpr]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ok? Using DaskExpr instead of Series?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes

df: DaskLazyFrame,
*exprs: IntoDaskExpr,
**named_exprs: IntoDaskExpr,
) -> list[DaskExpr]: ...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, Expr instead of Series


data = {"a": [1, 3, 2], "b": [4, 4, 6], "z": [7.0, 8, 9]}
df = nw.from_native(dd.from_pandas(pd.DataFrame(data)))
result = df.select("a", nw.col("b") + 1, (nw.col("z") * 2).alias("z*2"))
Copy link
Member Author

@FBruzzesi FBruzzesi Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few things to test to be (a bit more) confident about the implementation

@FBruzzesi FBruzzesi marked this pull request as ready for review July 29, 2024 12:06
@FBruzzesi FBruzzesi changed the title WIP feat: dask select method feat: dask select method Jul 29, 2024
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, nice one!

at this rate, we might be running the tpc-h q1 query with dask within a couple of weeks πŸ”₯

narwhals/_dask/typing.py Outdated Show resolved Hide resolved
ListOfCompliantExpr = Union[list[PandasLikeExpr], list[ArrowExpr]]
CompliantDataFrame = Union[PandasLikeDataFrame, ArrowDataFrame]
ListOfCompliantSeries = Union[
list[PandasLikeSeries], list[ArrowSeries], list[DaskExpr]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, yes

Comment on lines +39 to +42
def __narwhals_namespace__(self) -> DaskNamespace: # pragma: no cover
from narwhals._dask.namespace import DaskNamespace

return DaskNamespace(backend_version=self._backend_version)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exists for mypy only for now, so no cover. I imagine it can be useful in future anyway

)
raise NotImplementedError(msg)

def _create_expr_from_callable( # pragma: no cover
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used only in reuse_series_implementation and reuse_series_namespace_implementation for now, so we are not hitting it, and may never will. If that's the case we can also move it to NotImplementedError group

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah i'm tempted to say NotImplementedError then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make mypy to shut up, let's address this later

tests/dask_test.py Outdated Show resolved Hide resolved
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome, thanks @FBruzzesi ! exciting how quickly this is moving

couple of minor comments, but feel to merge and address separately

tests/dask_test.py Outdated Show resolved Hide resolved
)
raise NotImplementedError(msg)

def _create_expr_from_callable( # pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah i'm tempted to say NotImplementedError then

@FBruzzesi FBruzzesi merged commit 9b60ed7 into main Jul 29, 2024
23 checks passed
@FBruzzesi FBruzzesi deleted the feat/dask-select branch July 29, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants