polars vet? #7721

MarcoGorelli · 2023-09-30T12:21:35Z

Hello,

I've noticed that ruff has a pandas-vet plugin. Would you be open to adding a Polars-vet one?

It could make suggestions such as

- pl.read_csv(file_name).lazy()
+ pl.scan_csv(file_name)

or

- df.select(pl.col('a').map_elements(lambda lst: ' '.join([str(x) for x in lst])))
+ df.select(pl.col('a').list.join(' '))

which can have a real impact on performance.

I could try putting something together if you'd be open to it.


charliermarsh · 2023-09-30T17:13:53Z

I'm happy to add something like this! I'd prefer if we had a non-trivial set of rules in mind (e.g., at least five or so?) before we started to add them. I want to avoid a situation in which we create a new category, add one rule, then fail to expand it to a meaningful set.

MarcoGorelli · 2023-09-30T18:08:45Z

Sure, thanks! For a start, there's all the rewrites from pola-rs/polars#9968, such as

- pl.col('a').map_elements(lambda x: np.sin(x))
+ pl.col('a').sin()

- pl.col('a').map_elements(lambda x: x+1)
+ (pl.col('a') + 1)

- pl.col('a').map_elements(lambda x: json.loads(x))
+ pl.col("a").str.json_extract()

- pl.col('a').map_elements(lambda x: dt.datetime.strptime(x, "%Y-%m-%d"))
+ pl.col('a').str.to_datetime(format='%Y-%m-%d')

- pl.col('a').map_elements(lambda x: x.upper())
+ pl.col("a").str.to_uppercase()

Within Polars, warnings are emitted for some of these by parsing the bytecode of the passed function. Since Ruff works on the AST, I'd expect it to be able to cover a lot more from that list.

The full list of test cases is here; there are quite a few already:

https://github.com/pola-rs/polars/blob/f3142ccd321873d7be5b339fd6ec5536e8c3153e/py-polars/tests/test_udfs.py#L28-L144
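As a rough illustration of the AST-based approach, here is a minimal sketch using Python's standard `ast` module (not Ruff's own Rust implementation). It only handles the `lambda x: x.upper()` style of rewrite from the list above. The function name `find_map_elements_rewrites` and the `STR_METHOD_REWRITES` table are made up for this example, and the table is deliberately tiny.

```python
import ast

# Hypothetical, deliberately small mapping from Python str methods
# to the corresponding Polars expression methods.
STR_METHOD_REWRITES = {
    "upper": "str.to_uppercase",
    "lower": "str.to_lowercase",
}


def find_map_elements_rewrites(source: str) -> list[tuple[int, str]]:
    """Return (line, suggested replacement) for `.map_elements(lambda x: x.<method>())`."""
    suggestions = []
    for node in ast.walk(ast.parse(source)):
        # Match `<expr>.map_elements(<lambda>)` with a single positional argument.
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "map_elements"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.Lambda)
        ):
            continue
        lam = node.args[0]
        params = lam.args.args
        body = lam.body
        # Match a lambda body of the form `<param>.<method>()` with no arguments.
        if (
            len(params) == 1
            and isinstance(body, ast.Call)
            and not body.args
            and not body.keywords
            and isinstance(body.func, ast.Attribute)
            and isinstance(body.func.value, ast.Name)
            and body.func.value.id == params[0].arg
            and body.func.attr in STR_METHOD_REWRITES
        ):
            receiver = ast.unparse(node.func.value)
            replacement = f"{receiver}.{STR_METHOD_REWRITES[body.func.attr]}()"
            suggestions.append((node.lineno, replacement))
    return suggestions


print(find_map_elements_rewrites("pl.col('a').map_elements(lambda x: x.upper())"))
```

Because the check sees the whole lambda as a syntax tree, extending it to the other rewrites should mostly be a matter of adding more patterns, though each one would need its own matching logic.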

ritchie46 · 2023-09-30T18:35:28Z

Any read operation followed by .lazy() is very fishy.

E.g. pl.read_parquet(..).lazy() should suggest pl.scan_parquet(..).

The same goes for all the file types we support scanning.
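A check along these lines could be sketched with Python's `ast` module as follows. This is only an illustration, not how Ruff would implement it: `find_read_then_lazy` is a hypothetical name, and `READ_TO_SCAN` is an assumed, incomplete subset of the readers that have a `scan_*` counterpart.

```python
import ast

# Assumed subset of eager readers with a lazy `scan_*` counterpart;
# a real rule would need the complete list.
READ_TO_SCAN = {
    "read_csv": "scan_csv",
    "read_parquet": "scan_parquet",
    "read_ipc": "scan_ipc",
    "read_ndjson": "scan_ndjson",
}


def find_read_then_lazy(source: str) -> list[tuple[int, str]]:
    """Return (line, message) for every `<x>.read_*(...).lazy()` call chain."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Outer call: `<something>.lazy()` with no arguments.
        if not (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "lazy"
            and not node.args
            and not node.keywords
        ):
            continue
        inner = node.func.value
        # Inner call: `<module>.read_xxx(...)` for a reader we know how to replace.
        if (
            isinstance(inner, ast.Call)
            and isinstance(inner.func, ast.Attribute)
            and inner.func.attr in READ_TO_SCAN
        ):
            scan = READ_TO_SCAN[inner.func.attr]
            message = f"use {scan} instead of {inner.func.attr}(...).lazy()"
            findings.append((node.lineno, message))
    return findings


print(find_read_then_lazy("lf = pl.read_parquet(path).lazy()"))
```

The sketch only reports the pattern. The eager and lazy readers may not accept identical keyword arguments, so an automatic fix would probably need to check the arguments before rewriting the call.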

stinodego · 2023-10-19T11:05:25Z

One more suggestion in the 'lazy' category:

- DataFrame(...).lazy()
+ LazyFrame(...)

Maybe one for assertions (the equality statements would result in an error):

- assert s1 == s2
+ assert_series_equal(s1, s2)

- assert df1 == df2
+ assert_frame_equal(df1, df2)

- assert lf1 == lf2
+ assert_frame_equal(lf1, lf2)

- assert s1 != s2
+ assert_series_not_equal(s1, s2)
...
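To show why a plain `assert s1 == s2` goes wrong, here is a small stand-in written without Polars. `FakeSeries` and `assert_fake_series_equal` are hypothetical names invented for this sketch. They only mimic the relevant behaviour as I understand it: `==` compares element-wise and returns a container, and the container refuses to be used as a single boolean.

```python
class FakeSeries:
    """Hypothetical stand-in for a Series: `==` is element-wise, truthiness is refused."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison returns another container, not a single bool.
        return FakeSeries(a == b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # `assert <expr>` calls bool() on the result, which ends up here.
        raise TypeError("the truth value of a FakeSeries is ambiguous")


def assert_fake_series_equal(left: FakeSeries, right: FakeSeries) -> None:
    """Hypothetical helper playing the role of `assert_series_equal`."""
    assert left.values == right.values, f"{left.values} != {right.values}"


s1 = FakeSeries([1, 2, 3])
s2 = FakeSeries([1, 2, 3])

try:
    assert s1 == s2  # raises TypeError instead of passing or failing cleanly
except TypeError as exc:
    print(f"plain assert failed with: {exc}")

assert_fake_series_equal(s1, s2)  # passes silently
```

A lint rule cannot know from syntax alone that `s1` is a Series, so a real check would likely need some type inference or a conservative heuristic.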

One for select/with_columns:

- df.select(pl.all(), ...)
+ df.with_columns(...)

- df.select(pl.col("*"), ...)
+ df.with_columns(...)

Keyword syntax in select/with_columns:

- df.select(pl.col('a').abs().alias('abs'))
+ df.select(abs=pl.col('a').abs())

Keyword syntax in filter:

- df.filter(pl.col('a') == 'foo')
+ df.filter(a='foo')

Using positional args instead of lists where possible:

- df.sort(['a', 'b'])
+ df.sort('a', 'b')
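For rules like this last one, an automatic fix looks feasible. Below is a minimal sketch using `ast.NodeTransformer` from the standard library. `UnpackSortList` and `unpack_sort_lists` are hypothetical names, and the match is purely syntactic: it rewrites any `.sort([...])` call, whether or not the receiver is a Polars frame.

```python
import ast


class UnpackSortList(ast.NodeTransformer):
    """Rewrite `<expr>.sort([a, b])` to `<expr>.sort(a, b)`."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        if (
            isinstance(node.func, ast.Attribute)
            and node.func.attr == "sort"
            and len(node.args) == 1
            and isinstance(node.args[0], ast.List)
            and node.args[0].elts  # leave `sort([])` alone
        ):
            # Replace the single list argument with its elements;
            # keyword arguments are left untouched.
            node.args = list(node.args[0].elts)
        return node


def unpack_sort_lists(source: str) -> str:
    """Return the source with list arguments to `.sort(...)` unpacked."""
    tree = UnpackSortList().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(unpack_sort_lists("df.sort(['a', 'b'], descending=True)"))
```

Note that `ast.unparse` drops comments and normalises formatting, so a real fixer would edit the original source text in place rather than regenerating it.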

...I'm sure I can come up with more 😄
@MarcoGorelli Is this enough input?

charliermarsh added the "plugin" label (Implementing a known but unsupported plugin) on Sep 30, 2023

MarcoGorelli mentioned this issue Nov 14, 2023

WIP Polars #8686

Closed
