GridSearch with pipelines of dataframes #24
Hi, thank you for filing an issue about this. That's definitely a bug. I think DataFrames have never been tested as input to grid search. I just removed the … I'll have more time to look into it tomorrow.
Pull requests are welcome.
It looks like this isn't possible with scikit-learn in Python either; see scikit-learn-contrib/sklearn-pandas#61. Some solutions are proposed in scikit-learn-contrib/sklearn-pandas#62 and scikit-learn-contrib/sklearn-pandas#64. The primary challenge is to implement …

```julia
using DataFrames: DataFrame
using ScikitLearn
using ScikitLearn.GridSearch: GridSearchCV
@sk_import ensemble: RandomForestClassifier
@sk_import preprocessing: StandardScaler

X_train = DataFrame(Any[randn(100), randn(100)], [:a, :b])
Y_train = rand(0:1, 100)

# Scale columns :a and :b; the mapper outputs a plain array for the next step
mapper = DataFrameMapper([([:a, :b], StandardScaler())])
pipe = Pipelines.Pipeline([
    ("featurize", mapper),
    ("forest", RandomForestClassifier(n_estimators=200))
])

# GridSearch over the "forest" step's n_estimators (step__param naming)
grid = Dict(:forest__n_estimators => 10:30:240)
gridsearch = GridSearchCV(pipe, grid)
fit!(gridsearch, X_train, Y_train)
println("Best hyper-parameters: $(gridsearch.best_params_)")
```
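For reference, the grid key :forest__n_estimators follows scikit-learn's step__parameter naming, so its values are routed to the "forest" step of the pipeline. The sketch below is a rough illustration using only Julia Base. expand_grid and route_param are hypothetical helpers, not ScikitLearn.jl's ParameterGrid. They only show how such a Dict expands into candidate settings and how the key splits into a step name and a parameter.

```julia
# Hypothetical helpers, NOT ScikitLearn.jl's ParameterGrid: they only illustrate
# how a grid Dict expands into candidate settings and how `step__param` keys
# are routed to a pipeline step.
function expand_grid(grid::Dict{Symbol})
    names = collect(keys(grid))
    candidates = Dict{Symbol,Any}[]
    # Cartesian product over every parameter's candidate values
    for combo in Iterators.product((grid[n] for n in names)...)
        push!(candidates, Dict{Symbol,Any}(zip(names, combo)))
    end
    return candidates
end

# Split a key such as :forest__n_estimators into (step name, step parameter)
function route_param(key::Symbol)
    step, param = split(String(key), "__"; limit=2)
    return (String(step), Symbol(param))
end

grid = Dict(:forest__n_estimators => 10:30:240)
candidates = expand_grid(grid)
println(length(candidates))                    # prints 8 (10, 40, ..., 220)
println(route_param(:forest__n_estimators))    # prints ("forest", :n_estimators)
```

With a single parameter the product is just the range itself. Adding a second key would multiply the number of candidates, and each candidate is fit once per cross-validation fold.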
Hello again Cédric,
Following your help with transformers, I am now trying to use a grid search to optimize the hyperparameters of a RandomForest.
I have a pipeline with many transformers that works well with cross-validation and actual prediction. However, I get a type error when I try to use it in GridSearchCV. It seems there is an extra argument of type ScikitLearn.Skcore.ParameterGrid in my setup:
The error I get is:
So the function is called as _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid), but it expects an array instead of a DataFrame. The DataFrame should already have been converted to an array by the DataFrameMapper.
If needed, the full code is here: https://github.com/mratsim/MachineLearning_Kaggle/blob/9c07a64a981a6512e021ae01623212a278fd05d1/Kaggle%20-%20001%20-%20Titanic%20Survivors/Kaggle-001-Julia-MagicalForest.jl#L530
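The following rough, self-contained sketch illustrates why dispatch fails here. It uses only Julia Base. ToyTable, array_only_fit, select_rows and to_matrix are hypothetical stand-ins for the DataFrame, the array-typed _fit! method, cross-validation row slicing, and the DataFrameMapper. Grid search has to split X into train/test folds before any pipeline step runs, so the DataFrame reaches GridSearchCV's own methods before the mapper can convert it. A method typed only for arrays then has no matching signature.

```julia
# Hypothetical stand-in for a DataFrame: named columns, not an AbstractArray
struct ToyTable
    cols::Dict{Symbol,Vector{Float64}}
end
nrows(t::ToyTable) = length(first(values(t.cols)))

# Array-only signature, mirroring how the failing `_fit!` method is typed
array_only_fit(X::AbstractArray, y::AbstractVector) = size(X, 1) == length(y)

# Generic row selection with one method per input type, so a CV loop can
# slice folds without first converting the table to a matrix
select_rows(X::AbstractMatrix, idx) = X[idx, :]
select_rows(t::ToyTable, idx) = ToyTable(Dict(k => v[idx] for (k, v) in t.cols))

# Mapper-like step: only here does the table become a matrix
to_matrix(t::ToyTable, order::Vector{Symbol}) = hcat((t.cols[c] for c in order)...)

X = ToyTable(Dict(:a => randn(6), :b => randn(6)))
y = rand(0:1, 6)

# Passing the table straight to an array-only method fails at dispatch time
err = try
    array_only_fit(X, y)
    nothing
catch e
    e
end
println(err isa MethodError)    # prints true

# Slicing a fold keeps the table type; conversion happens inside the "pipeline"
train = 1:4
fold = select_rows(X, train)
println(nrows(fold))            # prints 4
println(array_only_fit(to_matrix(fold, [:a, :b]), y[train]))    # prints true
```

In this sketch, the way out is to let the fold-slicing code accept table inputs as well as arrays. Each fold then stays a table until the mapper step converts it.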