GridSearch - how to access internal parameters #61
First of all, sorry for not getting back to you sooner. What you ask for is not possible right now, but it would be a very interesting addition. I will review your related PR to see whether we can introduce this implementation change without breaking existing pipelines.
While changing the whole DataFrameMapper to use FeatureUnion would allow setting the transformers' internal parameters, and has the benefit of being able to run in parallel, I wonder if we cannot achieve the same without it. This is because I feel we might be losing a great deal of control if we delegate all the work to it. Thoughts? cc @calpaterson
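The nested-parameter mechanics being discussed can be sketched with plain scikit-learn objects (the transformer names below are made up for illustration):

```python
# Sketch of scikit-learn's nested-parameter convention: a FeatureUnion exposes
# each transformer's internal parameters under <transformer_name>__<param>,
# which is what makes them settable (and grid-searchable) from the outside.
from sklearn.feature_extraction import FeatureHasher
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

union = FeatureUnion([
    ("hash", FeatureHasher(n_features=8, input_type="string")),
    ("scale", StandardScaler()),
])

# Reach into the FeatureHasher through its name in the union:
union.set_params(hash__n_features=16)
print(union.get_params()["hash__n_features"])  # 16
```

This is the behavior a FeatureUnion-backed DataFrameMapper would inherit for free.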
Hi, this is my first contribution to a public GitHub project, so I am excited :-) I find that … What do you think?
I agree with what @chanansh said. I think output feature naming will be very useful for feature selection and feature union. As far as I know, … However, feature naming may be very complex, as there are too many ways to do feature engineering. It's hard to find a general rule for feature naming.
@chanansh you're right that it is better to reuse existing, proven, and maintained code instead of replicating the same feature whenever possible. It's just that I'm afraid of breaking something that is working for current users. However, if we make a 2.0 release this should not be an issue; people can always continue using the 1.x branch if needed. Probably the output feature tracking can be implemented by subclassing FeatureUnion and storing the length of each output feature in an attribute.
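A minimal sketch of that subclassing idea, assuming a hypothetical `TrackingFeatureUnion` (the class and attribute names are invented for illustration, not part of sklearn-pandas or scikit-learn):

```python
# Hypothetical sketch: a FeatureUnion subclass that records how many output
# columns each transformer produced, so expanded outputs can be traced back
# to the transformer that generated them.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

class TrackingFeatureUnion(FeatureUnion):
    def fit_transform(self, X, y=None, **fit_params):
        Xt = super().fit_transform(X, y, **fit_params)
        # After fitting, transformer_list holds the fitted transformers;
        # re-apply each one to measure its output width (fine for a sketch,
        # wasteful for real data).
        self.output_widths_ = {
            name: trans.transform(X).shape[1]
            for name, trans in self.transformer_list
        }
        return Xt

X = np.arange(12.0).reshape(4, 3)
union = TrackingFeatureUnion([
    ("pca", PCA(n_components=2)),
    ("scale", StandardScaler()),
])
union.fit_transform(X)
print(union.output_widths_)  # {'pca': 2, 'scale': 3}
```

With the per-transformer widths recorded, output names could then be generated without changing any existing transformer code.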
Could you please review my PR and tell me what you think?
Sure, I will be happy to help doing the 2.0 version if you want to collaborate together.

From: Israel Saeta Pérez [mailto:notifications@github.com]
> @DataTerminatorX I believe the issue here is to track which input features generated which output ones, since some transformers like OneHotEncoder or LabelBinarizer can expand features. The output names can simply be <input_feature_name>_0, <input_feature_name>_1, etc.
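The naming rule described in that reply can be sketched as a tiny helper (the function name is hypothetical):

```python
# Hypothetical helper for the naming scheme above: when a transformer such as
# OneHotEncoder or LabelBinarizer expands one input column into several
# outputs, derive the output names as <input_feature_name>_<i>.
def expanded_feature_names(input_name, n_outputs):
    return [f"{input_name}_{i}" for i in range(n_outputs)]

print(expanded_feature_names("color", 3))  # ['color_0', 'color_1', 'color_2']
```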
cc @hshteingart
I feel that if we move to FeatureUnion we should actually PR scikit-learn.

> On Sep 3, 2016 9:41 AM, "Israel Saeta Pérez" notifications@github.com wrote: …
AFAIK …

Asked here: scikit-learn/scikit-learn#7334
How can I access the mapper's internal parameters, e.g. the number of bits for a FeatureHasher?
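For context, this is what such access looks like with plain scikit-learn objects via the `<step_name>__<param>` syntax; DataFrameMapper did not expose it at the time of this issue, and the step names below are made up:

```python
# Sketch: in plain scikit-learn, internal parameters such as a FeatureHasher's
# number of bits are grid-searchable through the <step_name>__<param> syntax.
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("hash", FeatureHasher(input_type="string")),
    ("clf", LogisticRegression()),
])

# The hasher's size is addressable as "hash__n_features":
grid = GridSearchCV(pipe, {"hash__n_features": [2 ** 8, 2 ** 10]}, cv=2)
print("hash__n_features" in pipe.get_params())  # True
```

The thread above is about giving DataFrameMapper this same addressability.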