This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

CV creates incorrect split of user defined transforms. #409

Open
pieths opened this issue Jan 11, 2020 · 1 comment


pieths commented Jan 11, 2020

When split_start='after_transforms' is passed to CV.fit(), the user-defined transforms are not split correctly. See the graph created by the fit() call in the code below.

It seems that if a user-defined transform has presteps, the split is placed at the wrong location. This might also affect splitting the transforms at an integer index.

from nimbusml import DataSchema, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmRegressor
from nimbusml.model_selection import CV
from nimbusml.preprocessing.missing_values import Indicator, Handler

path = get_dataset("airquality").as_filepath()
schema = DataSchema.read_schema(path)
data = FileDataStream(path, schema)

pipeline_steps = [
    Indicator() << {
        'Ozone_ind': 'Ozone',
        'Solar_R_ind': 'Solar_R'},
    Handler(
        replace_with='Mean') << {
        'Solar_R': 'Solar_R',
        'Ozone': 'Ozone'},
    LightGbmRegressor(
        feature=['Ozone',
                 'Solar_R',
                 'Ozone_ind',
                 'Solar_R_ind',
                 'Temp'],
        label='Wind')]

cv_results = CV(pipeline_steps).fit(data, split_start='after_transforms')
@pieths pieths self-assigned this Jan 11, 2020

pieths commented Jan 21, 2020

Commit d5c7c82 resolves the issue with split_start='after_transforms', but it does not fix the case where the user specifies an integer index as the split_start value.

When a transform has presteps, the integer index the user specified no longer corresponds to the index of that transform in the pipeline.
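
To illustrate the index mismatch, here is a minimal sketch in plain Python. It does not use nimbusml or reflect its internals: the step names, the prestep names, and the helpers `expand_steps` and `map_split_index` are all hypothetical, and the sketch assumes that presteps are placed immediately before the transform they belong to.

```python
# Hypothetical model of a pipeline: each user-level step is a pair of
# (name, presteps), where presteps are extra nodes assumed to be placed
# immediately before the step when the pipeline is expanded.

def expand_steps(steps):
    """Flatten user-level steps into the expanded node list."""
    nodes = []
    for name, presteps in steps:
        nodes.extend(presteps)
        nodes.append(name)
    return nodes


def map_split_index(steps, user_index):
    """Translate a user-level step index into an expanded-node index.

    The result points at the first node that belongs to the step at
    user_index, so a split there keeps presteps with their own step.
    """
    if not 0 <= user_index <= len(steps):
        raise IndexError("split index out of range")
    # Every earlier step contributes its presteps plus itself.
    return sum(len(presteps) + 1 for _, presteps in steps[:user_index])


# Illustrative pipeline; which steps have presteps is an assumption.
steps = [
    ("TransformA", ["TransformA.prestep"]),
    ("TransformB", ["TransformB.prestep"]),
    ("Learner", []),
]

nodes = expand_steps(steps)
user_index = 2  # the user means "split just before Learner"

naive_node = nodes[user_index]                           # index reused as-is
mapped_node = nodes[map_split_index(steps, user_index)]  # index translated
print(naive_node, mapped_node)
```

In this model, reusing the user's index directly lands on a prestep of the second transform, while the translated index lands on the learner as intended. A fix for the integer case would presumably need a comparable translation step, though the actual change required in the library may differ.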

@pieths pieths removed their assignment Feb 18, 2020