memory_limit interferes with "resampling_strategy=GroupKFold"

Dear Developers,

First of all, thank you for your work and the really interesting ```autosklearn``` package.

In ```AutoSklearnRegressor``` (and possibly ```AutoSklearnClassifier``` too), when ```memory_limit``` is low enough to force ```autosklearn``` to subsample the training set, a resampling strategy such as ```GroupKFold``` fails because the ```groups``` argument, a vector holding the group index of each example in the training set, is not subsampled accordingly. In essence, the following line fails:
https://github.com/automl/auto-sklearn/blob/275d0d6b20d16822252d8b50bf71b1c787187f09/autosklearn/evaluation/train_evaluator.py#L994
because ```y.shape[0]``` refers to the subsampled training set, while ```np.shape(self.resampling_strategy_args['groups'])[0]``` refers to the original (full) training set.

As a consequence, for large training sets this problem occurs almost always, which prevents the use of group-based resampling strategies.
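
To illustrate the mismatch, here is a minimal NumPy-only sketch. It does not call ```autosklearn```. The data is synthetic, the reduction to 400 rows is an arbitrary stand-in for whatever ```memory_limit``` triggers internally, and ```subsample_with_groups``` is a hypothetical helper (not part of the package) showing one way the same row indices could be applied to ```groups```:

```python
import numpy as np


def subsample_with_groups(X, y, groups, n_keep, seed=0):
    """Hypothetical helper: reduce X, y and groups with the SAME row indices."""
    rng = np.random.default_rng(seed)
    # Draw n_keep distinct row indices and keep them in their original order.
    keep = np.sort(rng.choice(len(y), size=n_keep, replace=False))
    return X[keep], y[keep], groups[keep]


# Synthetic data (assumption): 1000 rows, 100 groups of 10 rows each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)
groups = np.repeat(np.arange(100), 10)

# Situation described above: y has been reduced, groups has not.
y_small = y[:400]
lengths_match_before = y_small.shape[0] == np.shape(groups)[0]

# Possible fix: index groups with the same rows as X and y.
X_sub, y_sub, groups_sub = subsample_with_groups(X, y, groups, n_keep=400)
lengths_match_after = y_sub.shape[0] == np.shape(groups_sub)[0]

print(lengths_match_before, lengths_match_after)
```

With the lengths aligned again, a group-based splitter receives one group label per remaining example, which is the condition the check in the linked line compares.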

memory_limit interferes with "resampling_strategy=GroupKFold" #1137
