Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support __getitem__ with slices for columns #839

Merged
merged 5 commits into from
Aug 22, 2024

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Aug 21, 2024

I kinda wish we didn't have to go overboard with __getitem__ overloads, but scikit-learn are using this, so I think we should

My general thinking on this is: make sure that Narwhals is easily usable by scikit-learn. Then, it's their decision whether or not they adopt - but we should at least be ready

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #
  • Closes #

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

@github-actions github-actions bot added enhancement New feature or request internal labels Aug 21, 2024
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a tiny comment on a typo.

I wonder if it could be worth to:

  • Add an example in the docstring of the method
  • Add a note for the behavior since slicing here is
    • If string: both start and stop are included
    • If int: typical python behavior, start included and stop excluded

narwhals/_arrow/dataframe.py Outdated Show resolved Hide resolved
Comment on lines +550 to +563
- `df['a']` extracts column `'a'` and returns a `Series`.
- `df[0:2]` extracts the first two rows and returns a `DataFrame`.
- `df[0:2, 'a']` extracts the first two rows from column `'a'` and returns
a `Series`.
- `df[0:2, 0]` extracts the first two rows from the first column and returns
a `Series`.
- `df[[0, 1], [0, 1, 2]]` extracts the first two rows and the first three columns
and returns a `DataFrame`
- `df[0: 2, ['a', 'c']]` extracts the first two rows and columns `'a'` and `'c'` and
returns a `DataFrame`
- `df[:, 0: 2]` extracts all rows from the first two columns and returns a `DataFrame`
- `df[:, 'a': 'c']` extracts all rows and all columns positioned between `'a'` and `'c'`
_inclusive_ and returns a `DataFrame`. For example, if the columns are
`'a', 'b', 'c', 'd'`, then that would extract columns `'a'`, `'b'`, and `'c'`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing ✨🚀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 thanks, maybe we should upstream this to Polars itself, there's zero docs on getitem there

Copy link
Member

@FBruzzesi FBruzzesi Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, I had never seen df[:, "a":"c"] in polars code and I was not aware that's even possible 😂

Copy link
Member

@FBruzzesi FBruzzesi Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rendering may need one less indentation level:

image

@MarcoGorelli
Copy link
Member Author

good points, thanks! I think this is easier to explain with examples

@MarcoGorelli MarcoGorelli merged commit a3e5f48 into narwhals-dev:main Aug 22, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants