rename counts to count in struct field using value_counts
#11462
Comments
I don't think this is worth the breaking change. There is something to be said for both. But naming your series plural feels very natural to me in relational data. |
Yeah, I am with you that it is hard to justify a breaking change for this small change :/ Was also thinking about instead adding an alternative |
@ritchie46 You break tons of other stuff, which is good, why not make this right? Could also add a drive-by normalize parameter |
I agree, this should be fixed in my opinion. We can't really deprecate it nicely though. Still, I'd vote for including this with the next breaking release. |
On the same topic; I'd be strongly in favour of changing |
happy to hear that this idea was accepted. =) I am curious what you think about also renaming the function name. Most function names fall into one of these categories:
Following this very rough categorization, a new name could be:
|
If we were to rename this it should be That would actually make it easier to rename the column as well (deprecate the old method with some message that the new version has a different column name). |
This one is a bit hard to rename. Coming from the outside, I would think that |
Ah yea fair point... |
I personally like @mcrumiller with regard to unique/distinct, you speak from my soul. |
For clarification, "unique" means the value only occurs once in the dataset, while "distinct" indicates how many values there are once all duplicates are removed. In the above example, there is only one value that is unique (2 is not unique, since it is repeated). Most programming languages use unique and distinct interchangeably. If I call |
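A minimal sketch of the unique/distinct distinction described in this comment, using only the Python standard library rather than any polars API. The input list is hypothetical, chosen so that 2 is repeated as in the comment.

```python
from collections import Counter

# Hypothetical data: 1 occurs once, 2 occurs twice.
values = [1, 2, 2]

occurrences = Counter(values)

# "Distinct": the values that remain once all duplicates are removed.
distinct_values = sorted(occurrences)

# "Unique" in the strict sense: values that occur exactly once.
unique_values = sorted(v for v, n in occurrences.items() if n == 1)

print(distinct_values)  # [1, 2]
print(unique_values)    # [1]
```

Under this strict reading the two results differ whenever any value repeats, which is why using the terms interchangeably can surprise readers.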
I have a PR that will address the original issue - we will merge this in the next breaking release. Please open a separate issue for renaming |
Description
Best practice (in my experience) for naming columns in tabular data like a dataframe is to use singular nouns that refer to a single value in each row.
The value_counts method violates this convention by returning a "counts" field, which suggests that it contains multiple values per row.
Recommended solution:
rename the "counts" field to "count"
Benefits:
pl.count method
Example:
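As a rough illustration of the naming convention argued for above, the following stdlib-only sketch builds one record per distinct value with a singular "count" field. The `value_counts` function here is a hypothetical stand-in written for this example, not the polars method, and the "value" field name is likewise an assumption.

```python
from collections import Counter


def value_counts(values):
    """Hypothetical stand-in: one record per distinct value, most frequent first."""
    # Each record holds exactly one count, hence the singular field name.
    return [{"value": v, "count": n} for v, n in Counter(values).most_common()]


rows = value_counts(["a", "b", "a", "c", "a"])
for row in rows:
    print(row)
```

Each row carries a single integer under "count", so the singular name describes the row's content, whereas "counts" would suggest a collection per row.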