-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33973: [Python][Docs] Update documentation for Parquet filter keyword #33974
Conversation
|
I don't have a problem with this change but wonder if the root cause might be a little more fundamental. I wonder if |
@westonpace Thanks for the reply!
I would agree with you, but I went for a lower API because I want to re-use the connection. This avoids another
Do you want me to create a PR to copy those docs? Because of the redirect, the arguments are also not showing up in PyCharm. Let me know and I'll create a PR. |
Yes, I think that is a good idea but I might CC @jorisvandenbossche or @amol- to weigh in on whether they know some better way to avoid the duplication or have a preference here (I normally focus on the C++ end of things.) |
@Fokko thanks for the catch! You are currently updating the docstring of ParquetDataset, but we should do the same update for read_table: arrow/python/pyarrow/parquet/core.py Line 2775 in 78a8da4
(I think it's also
Yeah, I think we should prefer some duplication if that gives better docstrings. I agree the indirection for the user right now isn't very user friendly. |
@jorisvandenbossche Updating table one is a good suggestion indeed. Updated that one as well 👍🏻
Currently, we use the
Makes a lot of sense, I think sharing would be best. Let me create a separate PR for that |
@jorisvandenbossche first a bit of cleanup in #34034 |
One more thing about the docstring: for both read_table and ParquetDataset, the docstring also injects the content of |
@jorisvandenbossche Good call. I've updated the docstring and added an example. |
@jorisvandenbossche WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Benchmark runs are scheduled for baseline = e63215c and contender = 306026d. 306026d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
['Python', 'R'] benchmarks have high level of regressions. |
…r keyword (apache#33974) ### Rationale for this change I wrote a converter from an arbitrary expression to DNF, but this was not needed after learning that it just accepts an expression now. Closes apache#33973 Authored-by: Fokko Driesprong <fokko@tabular.io> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
…r keyword (apache#33974) ### Rationale for this change I wrote a converter from an arbitrary expression to DNF, but this was not needed after learning that it just accepts an expression now. Closes apache#33973 Authored-by: Fokko Driesprong <fokko@tabular.io> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Rationale for this change
I wrote a converter from an arbitrary expression to DNF, but this was not needed after learning that it just accepts an expression now.
Closes #33973