polars vet? #7721
Comments
I'm happy to add something like this! I'd prefer if we had a non-trivial set of rules in mind (e.g., at least five or so?) before we started to add them. I want to avoid a situation in which we create a new category, add one rule, then fail to expand it to a meaningful set.
Sure, thanks! For a start, there are all the rewrites from pola-rs/polars#9968, such as:
- pl.col('a').map_elements(lambda x: np.sin(x))
+ pl.col('a').sin()
- pl.col('a').map_elements(lambda x: x+1)
+ (pl.col('a') + 1)
- pl.col('a').map_elements(lambda x: json.loads(x))
+ pl.col("a").str.json_extract()
- pl.col('a').map_elements(lambda x: dt.datetime.strptime(x, "%Y-%m-%d"))
+ pl.col('a').str.to_datetime(format='%Y-%m-%d')
- pl.col('a').map_elements(lambda x: x.upper())
+ pl.col("a").str.to_uppercase()
Within Polars, warnings are emitted for some of these by parsing the bytecode of the passed function. Since Ruff works on the AST, I'd expect it to be possible to cover a lot more from that list.
The full list of test cases is here, there's quite a few already:
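To illustrate the AST-based approach mentioned above, here is a minimal sketch using Python's standard `ast` module (not Ruff's actual Rust implementation). It flags only the `x.upper()` case from the list; the function name `find_map_elements_rewrites` and the lookup table `STR_METHOD_REWRITES` are hypothetical names made up for this example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

# Hypothetical lookup table: Python str method -> Polars expression suffix.
# Only the pair quoted in this thread is listed.
STR_METHOD_REWRITES = {"upper": "str.to_uppercase()"}


def find_map_elements_rewrites(source: str) -> list[tuple[int, str]]:
    """Return (line, suggestion) for `.map_elements(lambda x: x.<method>())`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Match <expr>.map_elements(<lambda>) with no other arguments.
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "map_elements"
            and len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Lambda)
        ):
            continue
        lam = node.args[0]
        if len(lam.args.args) != 1:
            continue
        param = lam.args.args[0].arg
        body = lam.body
        # Match a lambda body of the form <param>.<method>() without arguments.
        if (
            isinstance(body, ast.Call)
            and not body.args
            and not body.keywords
            and isinstance(body.func, ast.Attribute)
            and isinstance(body.func.value, ast.Name)
            and body.func.value.id == param
            and body.func.attr in STR_METHOD_REWRITES
        ):
            receiver = ast.unparse(node.func.value)
            suffix = STR_METHOD_REWRITES[body.func.attr]
            findings.append((node.lineno, f"{receiver}.{suffix}"))
    return findings


print(find_map_elements_rewrites("pl.col('a').map_elements(lambda x: x.upper())"))
```

The other rewrites in the list would need their own patterns (for example a `BinOp` body for `x + 1`), which is why a table-driven design seemed like a reasonable starting point for the sketch.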
Any read operation followed by a `.lazy()` call is very fishy, and that holds for all the file types we support scanning.
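As a rough sketch of how such a check could look, the snippet below uses Python's `ast` module to find a `read_*` call immediately followed by `.lazy()` and names the corresponding `scan_*` function. The `SCANNABLE` table is an assumed, possibly incomplete subset of Polars readers, and `find_read_then_lazy` is a hypothetical name; the two functions may not accept identical arguments, so this reports a suggestion and does not attempt an automatic fix.

```python
import ast

# Assumed (possibly incomplete) mapping from eager readers to lazy scanners.
SCANNABLE = {
    "read_csv": "scan_csv",
    "read_parquet": "scan_parquet",
    "read_ipc": "scan_ipc",
    "read_ndjson": "scan_ndjson",
}


def find_read_then_lazy(source: str) -> list[tuple[int, str]]:
    """Return (line, suggested function) for `<mod>.read_*(...).lazy()`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Outer call: <something>.lazy() with no arguments.
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "lazy"
            and not node.args
            and not node.keywords
        ):
            continue
        inner = node.func.value
        # Inner call: <module>.read_*(...) directly under .lazy().
        if (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Attribute)
            and inner.func.attr in SCANNABLE
        ):
            module = ast.unparse(inner.func.value)
            findings.append((node.lineno, f"{module}.{SCANNABLE[inner.func.attr]}"))
    return findings


print(find_read_then_lazy("lf = pl.read_csv('data.csv').lazy()"))
```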
One more suggestion in the 'lazy' category:
- DataFrame(...).lazy()
+ LazyFrame(...)
Maybe one for assertions (the equality statements would result in an error):
- assert s1 == s2
+ assert_series_equal(s1, s2)
- assert df1 == df2
+ assert_frame_equal(df1, df2)
- assert lf1 == lf2
+ assert_frame_equal(lf1, lf2)
- assert s1 != s2
+ assert_series_not_equal(s1, s2)
...
One for with_columns:
- df.select(pl.all(), ...)
+ df.with_columns(...)
- df.select(pl.col("*"), ...)
+ df.with_columns(...)
Keyword syntax in select:
- df.select(pl.col('a').abs().alias('abs'))
+ df.select(abs=pl.col('a').abs())
Keyword syntax in filter:
- df.filter(pl.col('a') == 'foo')
+ df.filter(a='foo')
Using positional args instead of lists where possible:
- df.sort(['a', 'b'])
+ df.sort('a', 'b')
...I'm sure I can come up with more 😄
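Of the suggestions above, the `sort` one is purely syntactic, so it lends itself to a short sketch with Python's `ast` module. The function name `find_list_sort_args` is hypothetical, and the check cannot know that the receiver is really a Polars frame, so a real rule would likely need some form of type or import tracking. The assertion rules depend even more on knowing the operand types, which a plain AST does not provide.

```python
import ast


def find_list_sort_args(source: str) -> list[tuple[int, str]]:
    """Return (line, rewritten call) for `<expr>.sort([...])` with a list literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Match <expr>.sort(<list literal>, **keywords).
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "sort"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.List)
        ):
            continue
        elts = node.args[0].elts
        # Skip empty lists and lists that contain unpacking such as [*cols].
        if not elts or any(isinstance(e, ast.Starred) for e in elts):
            continue
        parts = [ast.unparse(e) for e in elts]
        # Carry keyword arguments over unchanged.
        for kw in node.keywords:
            value = ast.unparse(kw.value)
            parts.append(f"{kw.arg}={value}" if kw.arg else f"**{value}")
        receiver = ast.unparse(node.func.value)
        findings.append((node.lineno, f"{receiver}.sort({', '.join(parts)})"))
    return findings


print(find_list_sort_args("df.sort(['a', 'b'])"))
```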
Hello,
I've noticed that ruff has a pandas-vet plugin. Would you be open to adding a Polars-vet one?
It could make suggestions that can have a real impact on performance.
I could try putting something together if you'd be open to it.