Closed
Dear Developers,
First of all, thank you for your work and the really interesting autosklearn
package.
In `AutoSklearnRegressor` (maybe `AutoSklearnClassifier` too), when `memory_limit` is low enough to force autosklearn to decimate the training set, a resampling strategy like GroupKFold fails because the argument `groups`, which is a vector of group indices for each example in the training set, is not decimated accordingly. In essence, the following line fails:
if np.shape(self.resampling_strategy_args['groups'])[0] != y.shape[0]:
because `y.shape[0]` refers to the decimated training set, while `np.shape(self.resampling_strategy_args['groups'])[0]` refers to the original (non-decimated) training set.
As a consequence, for large training sets this problem occurs basically always, preventing the use of group-based resampling strategies.
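
To illustrate, here is a minimal sketch of the length mismatch and of one possible remedy, written with plain NumPy rather than autosklearn internals. The helper `subsample_with_groups` and the toy data are hypothetical and only meant to show the idea: whatever row indices are used to decimate `X` and `y` would also need to be applied to `groups`. How autosklearn actually selects the rows is not reproduced here.

```python
import numpy as np


def subsample_with_groups(X, y, groups, n_keep, seed=0):
    """Hypothetical helper: decimate X, y and groups with the same row indices."""
    rng = np.random.default_rng(seed)
    # Draw n_keep distinct row indices and keep them in their original order.
    idx = np.sort(rng.choice(X.shape[0], size=n_keep, replace=False))
    return X[idx], y[idx], groups[idx]


# Toy data (assumption): 1000 examples split into 10 groups of 100.
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.arange(1000, dtype=float)
groups = np.repeat(np.arange(10), 100)

# Situation described above: y is decimated, groups is not,
# so the same kind of length check reports a mismatch.
y_decimated_only = y[:300]
mismatch = np.shape(groups)[0] != y_decimated_only.shape[0]

# Decimating all three arrays together keeps them aligned.
X_s, y_s, groups_s = subsample_with_groups(X, y, groups, n_keep=300)
aligned = np.shape(groups_s)[0] == y_s.shape[0]

print("mismatch without decimating groups:", mismatch)
print("aligned after joint decimation:", aligned)
```

With a joint decimation like this, the group vector handed to a group-based splitter has one entry per remaining example, which is what the check above expects.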