PFI (Permutation Feature Importance) API needs to be simpler to use #4216
Comments
Hi. I'm just getting started with PFI in ML.NET, and I was confused by what you state in the first part of this issue. I can successfully run the PFI for Regression example given in the docs. I was also able to run the PFI test you mentioned without any failure, and stepping through it with the debugger, it appears to work just fine. In general, I think the following is right (and it is pretty much what is done in the documentation and in the test case):
I believe it's right because there's no problem accessing model.LastTransformer in that case: when 'model' is returned by the .Fit() method, it is already of the concrete transformer chain type. However, I do see that the problem you mention appears when the model is passed as an ITransformer parameter of a method (as done in one of the samples), so I understand the need for a simpler API in such cases. In short, I believe the documentation and the test you cited actually work, and I wanted to ask whether I am missing something needed to fully understand and replicate the problems you described with them. Thanks.
@antoniovs1029 - You're right, I thought I had also tested it directly in the same root method, but I just tried now and it works with no issues.
I especially agree with Cesar's point #2. Most of the PFI samples I found use only numeric feature columns, so you can map the PFI results back to the original feature columns by index. Not perfect, but still okay. However, most real-world datasets include both numerical and categorical features, and once you apply OneHotEncoding to the categorical columns the complexity increases drastically. I had to work it out by debugging through the runtime and examining the Slots, and the resulting code felt unnecessarily complex.
Hi. It works in AutoML also.
The following cast lets me access the LastTransformer; however, I cannot use it for PFI until I provide a better type for the predictor. In the debugger I can see it is of type Microsoft.ML.Data.RegressionPredictionTransformer<Microsoft.ML.IPredictorProducing>, but I am unable to cast to that because Microsoft.ML.IPredictorProducing is not visible, so it seems like we're still stuck.
//setup code similar to famschopman
var experimentResults = experiment.Execute(split.TrainSet, split.TestSet);
//this will not compile. The following compile error is produced:
The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
How do we get the bias and weights using PFI?
The easiest way to fix it would probably be to add a method that knows how to extract the appropriate transformer from a TransformerChain, aka the model.
#1 Working example with Multiclass.LightGbm and a model produced by the trainer estimator in real time
#2 Broken example with previously saved model
The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.
Hacky solution - replace the model with a predictor of a specific type
@artemiusgreat the expression var predictor = (model as IEnumerable<dynamic>).OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault(); can be simplified to var predictor = model.OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>().FirstOrDefault(); but of course the problem remains that we have to know exactly which training algorithm was used inside the trained model. I think in this case something more dynamic is needed to infer the concrete type parameters at runtime, so generics might not be the right way to go in that scenario?
PFI (Permutation Feature Importance) API needs to be simpler to use
1. First, it is awkward to need to access the LastTransformer member of the model (a chain of transformers). In addition, if you use separate methods to structure your training, evaluation and PFI calculation and pass the model around as an ITransformer (the usual way), you need to cast it back to the concrete type of the transformer chain (such as TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>), which then requires a hard reference to the type of algorithm used. This is the code to calculate the PFI metrics:
Needing to only use/provide the last transformer feels a bit convoluted...
Shouldn't the API be simpler to use here and make this transparent to the user?
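A minimal sketch of the cast described in point 1, assuming a regression pipeline whose last step is a LightGBM trainer and a model that was fit in the current process. The class PfiSketch, the helper ComputePfi and the permutation count of 3 are hypothetical and chosen only for illustration; this is not the code from the original post.

```csharp
using System.Collections.Immutable;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.LightGbm;

public static class PfiSketch
{
    // Hypothetical helper: receives the model as ITransformer, the way it is
    // usually passed between methods.
    public static ImmutableArray<RegressionMetricsStatistics> ComputePfi(
        MLContext mlContext, ITransformer model, IDataView data)
    {
        // The cast has to name the trainer's model-parameter type. This is the
        // hard reference to the algorithm that the issue describes; the cast
        // throws InvalidCastException if the chain ends in a different trainer.
        var chain =
            (TransformerChain<RegressionPredictionTransformer<LightGbmRegressionModelParameters>>)model;

        // PFI is evaluated on data that already contains the Features column,
        // so the chain is applied to the raw data first.
        IDataView transformedData = chain.Transform(data);

        // Only the last transformer (the predictor) is handed to PFI.
        return mlContext.Regression.PermutationFeatureImportance(
            chain.LastTransformer, transformedData, permutationCount: 3);
    }
}
```

Swapping the trainer means editing the cast, which is why a method that accepts a plain ITransformer would be easier to use.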
2. Second, once you get the permutation metrics (such as ImmutableArray<RegressionMetricsStatistics> permutationMetrics), you only get the values by index; you don't get the names of the input columns. It is then not straightforward to correlate the metrics with the input column names, because the same indexes have to be used across two separate arrays and, if either array has been sorted beforehand, they no longer match. You need to do something like the following, or comparable loops driven by the indexes in the permutationMetrics array:
First, obtain all the column names used in the PFI process and exclude the ones not used:
Then you need to correlate and find the column names based on the indexes in the permutationMetrics:
This should be directly provided by the API and you'd simply need to access it and show it.
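One way to avoid the index mismatch described in point 2 is to pair each name with its metric before any sorting happens. The sketch below uses plain arrays as hypothetical stand-ins: featureNames plays the role of the slot names of the Features column and rSquaredChange the role of the per-feature mean change in R-squared, both assumed to be in the order PFI returned them. The names and numbers are made up for illustration.

```csharp
using System;
using System.Linq;

public static class PfiNames
{
    public static void Main()
    {
        // Hypothetical stand-ins, index-aligned as PFI would return them.
        string[] featureNames = { "Rooms", "Age", "Distance" };
        double[] rSquaredChange = { -0.12, -0.01, -0.30 };

        // Pair name and metric first, then sort the pairs, so the two
        // values can never drift apart.
        var ranked = featureNames
            .Zip(rSquaredChange, (name, change) => new { Name = name, Change = change })
            .OrderByDescending(f => Math.Abs(f.Change));

        // Features are printed from most to least important.
        foreach (var f in ranked)
            Console.WriteLine($"{f.Name}: {f.Change:F2}");
    }
}
```

With these made-up values, Distance is listed first and Age last. An API that returned such pairs directly would remove the need for index-driven loops.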
The current code feels very convoluted...