How can we use OmniXAI for pyspark models? Anyone tried it out? #79
Comments
You can simply implement a prediction function using the trained pyspark model and pass it to the "model" parameter when initializing a "TabularExplainer"; one way to do this is sketched below.
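A minimal sketch of that pattern, assuming a fitted Spark ML PipelineModel (`spark_model`), an active SparkSession (`spark`), and a small pandas sample of the data (`sample_pdf`); these names are placeholders, not part of OmniXAI:

```python
import numpy as np
from pyspark.ml.functions import vector_to_array   # available in Spark >= 3.0
from omnixai.data.tabular import Tabular
from omnixai.explainers.tabular import TabularExplainer

# Assumed to exist: `spark` (SparkSession), `spark_model` (a fitted
# PipelineModel that does its own feature assembly) and `sample_pdf`
# (a small pandas sample of the training data).
tabular_sample = Tabular(sample_pdf)

def predict_fn(X):
    """Black-box prediction function: Tabular/pandas in, class probabilities out."""
    pdf = X.to_pd() if isinstance(X, Tabular) else X
    sdf = spark.createDataFrame(pdf)                  # pandas -> Spark DataFrame
    scored = spark_model.transform(sdf)               # run the pretrained pyspark model
    probs = scored.select(vector_to_array("probability").alias("p")).toPandas()
    return np.array(probs["p"].tolist())              # shape: (n_samples, n_classes)

explainer = TabularExplainer(
    explainers=["shap", "lime"],   # model-agnostic explainers
    mode="classification",
    data=tabular_sample,           # a small sample used only for initialization
    model=predict_fn,              # the black-box prediction function
)
```

Note that every call to `predict_fn` round-trips through Spark, so explaining many instances this way is slow; the sketch is mainly meant to show how a pyspark model can be wrapped.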
@yangwenz Do you use some parallelization technique while solving the problem? For example, if I have 1M rows, will the various workers of the cluster be used, or only the driver?
If you want to generate explanations for 1M examples, please also use pyspark to run the explainer, e.g. distribute the workload across multiple workers; one possible way to do this is sketched below.
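A hedged sketch of distributing the explanation step with `mapInPandas`, assuming a prediction function `local_predict_fn` that can be pickled and executed inside a worker process (i.e. it must not depend on the SparkSession), a large pyspark DataFrame `spark_df`, and a small pandas sample `sample_pdf`; all of these names are placeholders:

```python
import pandas as pd
from omnixai.data.tabular import Tabular
from omnixai.explainers.tabular.agnostic.shap import ShapTabular

sample_tabular = Tabular(sample_pdf)   # small background sample for the explainer

def explain_partition(batches):
    # Build one explainer per partition, then explain each batch of rows.
    explainer = ShapTabular(
        training_data=sample_tabular,
        predict_function=local_predict_fn,
        mode="classification",
    )
    for pdf in batches:
        explanations = explainer.explain(Tabular(pdf))
        # In practice you would serialize the explanations here (e.g. the
        # per-feature importance scores); a row count is used as a placeholder.
        yield pd.DataFrame({"n_explained": [len(pdf)]})

result = spark_df.mapInPandas(explain_partition, schema="n_explained long")
result.show()
```

The key constraint is that the prediction function running on the workers cannot call back into Spark, so the pyspark model itself has to be replaced by something locally executable (for example an exported copy of the model).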
I will use the pyspark model in the predict function to infer, and my tabular data will also be a pyspark dataframe.
"tabular_data" is only used for initializing the explainers, so there is no need to use the whole dataset. The lib provides a function for extracting a subset of the whole dataset: https://github.com/salesforce/OmniXAI/blob/main/omnixai/sampler/tabular.py. If your data is pyspark dataframe, you can convert a partition into a pandas dataframe that can fit the memory. |
@yangwenz Can you please explain "please also use pyspark to run the explainer, e.g. distribute workloads into multiple workers" that you mention above? Also, as you mention, I will put the data in pandas format and pass it as tabular_data. My model is a pyspark model, so won't it be unable to infer because the format of the data is pandas, not pyspark?
You can install OmniXAI as an additional package when launching a pyspark job. Then you can use it directly as I mentioned above.
@yangwenz Unable to resolve the error below; can you please help with it?
Below is the error while running the explainer code:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/omnixai/explainers/tabular/agnostic/shap.py:63, in ShapTabular.__init__(self, training_data, predict_function, mode, ignored_features, **kwargs)
File /databricks/python/lib/python3.9/site-packages/shap/explainers/_kernel.py:95, in Kernel.__init__(self, model, data, link, **kwargs)
ValueError: operands could not be broadcast together with shapes (2,) (100,)
Hi, please check your input data. It is probably not a problem coming from the lib.
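One concrete way to check the inputs is to verify that the prediction function returns class probabilities with the expected shape, since an unexpectedly shaped output can trigger broadcast errors inside KernelSHAP. A quick sanity check, reusing the hypothetical `predict_fn` and `sample_pdf` from the earlier sketches:

```python
import numpy as np
from omnixai.data.tabular import Tabular

# The prediction function should return one row of class probabilities per
# input sample, e.g. shape (5, 2) for 5 rows and a binary classifier.
probs = predict_fn(Tabular(sample_pdf.head(5)))
assert isinstance(probs, np.ndarray), type(probs)
assert probs.ndim == 2 and probs.shape[0] == 5, probs.shape
```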
I have a pyspark model, a RandomForest, which is a pretrained model, and I don't want to retrain it.