Reuse the scikit-learn convention for attributes with ending `_` #1118

glemaitre · 2025-01-15T12:37:34Z

Right now, the reporter do not expose the same convention than scikit-learn:

estimator = RandomForestClassifier()
report = EstimatorReport(estimator=estimator, ...)
assert report.estimator is not estimator

To avoid side-effect, we make sure to copy/clone the part of Python object that we want to modify. In scikit-learn, fit is known to have such processing. Here, it is our class constructor. We could reuse the same convention and make sure to expose attribute ending with _ to acknowledge that it is not the attribute passed in init. Concretely, we would have:

estimator = RandomForestClassifier()
report = EstimatorReport(estimator=estimator, ...)
assert report.estimator is estimator
assert report.estimator_ is not estimator  # the estimator resulting from the internal fit

@MarieS-WiMLDS What is your feeling about this part of the API?

The text was updated successfully, but these errors were encountered:

sylvaincom · 2025-01-15T14:52:37Z

I like the _ convention of scikit-learn and believe it is somewhat known by fellow data scientists

MarieS-WiMLDS · 2025-01-15T15:21:22Z

On my end I never quite understood what the convention _ covered.
In scikit-learn, why didn't you choose to have something like fitted_estimator and a constructor_estimator? (I don't really like my second proposition, but hopefully you'll get the idea)

sylvaincom · 2025-01-15T17:30:27Z

https://scikit-learn.org/dev/developers/develop.html#estimated-attributes

According to scikit-learn conventions, attributes which you’d want to expose to your users as public attributes and have been estimated or learned from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a coef_ attribute after fit has been called. Similarly, attributes that you learn in the process and you’d like to store yet not expose to the user, should have a leading underscore, e.g. _intermediate_coefs. You’d need to document the first group (with a trailing underscore) as “Attributes” and no need to document the second group (with a leading underscore).
The estimated attributes are expected to be overridden when you call fit a second time.

MarieS-WiMLDS · 2025-01-15T18:27:50Z

it's actually pretty obvious now that I read it.
Let's do as in scikit-learn indeed.

glemaitre · 2025-01-15T18:37:05Z

Great. I'll make a PR to adapt to the convention.

glemaitre self-assigned this Jan 15, 2025

glemaitre linked a pull request Jan 15, 2025 that will close this issue

refactor: Prepend with name of variable that we modify #1128

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reuse the scikit-learn convention for attributes with ending `_` #1118

Reuse the scikit-learn convention for attributes with ending `_` #1118

glemaitre commented Jan 15, 2025

sylvaincom commented Jan 15, 2025

MarieS-WiMLDS commented Jan 15, 2025

sylvaincom commented Jan 15, 2025

MarieS-WiMLDS commented Jan 15, 2025

glemaitre commented Jan 15, 2025

Reuse the scikit-learn convention for attributes with ending _ #1118

Reuse the scikit-learn convention for attributes with ending _ #1118

Comments

glemaitre commented Jan 15, 2025

sylvaincom commented Jan 15, 2025

MarieS-WiMLDS commented Jan 15, 2025

sylvaincom commented Jan 15, 2025

MarieS-WiMLDS commented Jan 15, 2025

glemaitre commented Jan 15, 2025

Reuse the scikit-learn convention for attributes with ending `_` #1118

Reuse the scikit-learn convention for attributes with ending `_` #1118