memory_limit interferes with "resampling_strategy=GroupKFold" #1137

@emanuele

Description

Dear Developers,

First of all, thank you for your work and the really interesting autosklearn package.

In AutoSklearnRegressor (and possibly AutoSklearnClassifier too), when memory_limit is low enough to force autosklearn to decimate the training set, a resampling strategy like GroupKFold fails because the groups argument, a vector holding the group index of each example in the training set, is not decimated accordingly. In essence, the following check trips:

if np.shape(self.resampling_strategy_args['groups'])[0] != y.shape[0]:

because y.shape[0] refers to the decimated training set, while np.shape(self.resampling_strategy_args['groups'])[0] still refers to the original (non-decimated) training set, so the two lengths differ.

As a consequence, for large training sets this problem occurs basically always, which prevents the use of group-based resampling strategies.
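
To illustrate the mismatch described above, here is a minimal sketch in plain NumPy. It does not use autosklearn's internal code: the helper subsample_with_groups and the toy arrays are hypothetical, and the sketch only assumes that decimation amounts to keeping a subset of rows. It shows the length check tripping when only y is decimated, and one possible remedy, which is to apply the same row indices to groups.

```python
import numpy as np


def subsample_with_groups(X, y, groups, n_keep, seed=0):
    """Hypothetical helper: subsample X, y and groups with the same row indices."""
    rng = np.random.default_rng(seed)
    # Draw n_keep distinct row indices and keep them in their original order.
    idx = np.sort(rng.choice(len(y), size=n_keep, replace=False))
    return X[idx], y[idx], groups[idx]


# Toy data (assumption): 100 examples in 10 groups of 10 examples each.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)
groups = np.repeat(np.arange(10), 10)

# Decimating y without touching groups reproduces the length mismatch.
y_small = y[:40]
mismatch = np.shape(groups)[0] != y_small.shape[0]

# Decimating all three arrays with the same indices keeps the lengths equal.
X_sub, y_sub, groups_sub = subsample_with_groups(X, y, groups, n_keep=40)
consistent = np.shape(groups_sub)[0] == y_sub.shape[0]
```

In this sketch mismatch is True (100 group labels against 40 targets) and consistent is True. Whether autosklearn should decimate groups internally or reject the combination with a clearer error is a design decision for the developers.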
