Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse the scikit-learn convention for attributes with ending _ #1118

Open
glemaitre opened this issue Jan 15, 2025 · 5 comments · May be fixed by #1128
Open

Reuse the scikit-learn convention for attributes with ending _ #1118

glemaitre opened this issue Jan 15, 2025 · 5 comments · May be fixed by #1128
Assignees

Comments

@glemaitre
Copy link
Member

Right now, the reporter do not expose the same convention than scikit-learn:

estimator = RandomForestClassifier()
report = EstimatorReport(estimator=estimator, ...)
assert report.estimator is not estimator

To avoid side-effect, we make sure to copy/clone the part of Python object that we want to modify. In scikit-learn, fit is known to have such processing. Here, it is our class constructor. We could reuse the same convention and make sure to expose attribute ending with _ to acknowledge that it is not the attribute passed in init. Concretely, we would have:

estimator = RandomForestClassifier()
report = EstimatorReport(estimator=estimator, ...)
assert report.estimator is estimator
assert report.estimator_ is not estimator  # the estimator resulting from the internal fit

@MarieS-WiMLDS What is your feeling about this part of the API?

@sylvaincom
Copy link
Contributor

I like the _ convention of scikit-learn and believe it is somewhat known by fellow data scientists

@MarieS-WiMLDS
Copy link
Contributor

On my end I never quite understood what the convention _ covered.
In scikit-learn, why didn't you choose to have something like fitted_estimator and a constructor_estimator? (I don't really like my second proposition, but hopefully you'll get the idea)

@sylvaincom
Copy link
Contributor

https://scikit-learn.org/dev/developers/develop.html#estimated-attributes

According to scikit-learn conventions, attributes which you’d want to expose to your users as public attributes and have been estimated or learned from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a coef_ attribute after fit has been called. Similarly, attributes that you learn in the process and you’d like to store yet not expose to the user, should have a leading underscore, e.g. _intermediate_coefs. You’d need to document the first group (with a trailing underscore) as “Attributes” and no need to document the second group (with a leading underscore).
The estimated attributes are expected to be overridden when you call fit a second time.

@MarieS-WiMLDS
Copy link
Contributor

it's actually pretty obvious now that I read it.
Let's do as in scikit-learn indeed.

@glemaitre
Copy link
Member Author

Great. I'll make a PR to adapt to the convention.

@glemaitre glemaitre self-assigned this Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants