Reduce number of 3rd party packages required for a prediction-only setup #594
Comments
I couldn't find similar issues in the tracker; apologies if I just missed them. If there really aren't any, I'd be surprised: am I actually the first user to run into this? Is dockerizing / running causalml on a server a strange thing to do? Regarding PRs, I might be able to write one, but I wouldn't start unless the issue itself is green-flagged by the maintainers.
Thanks for submitting this, @a-recknagel. Addressing this will help many others who'd like to deploy the causalml models. Can you take a stab at it? A couple of things I can think of are:
Ok, that's good to know, I'd love to try. I hope to limit the changes to two areas, changing import paths and writing extras groups, but I'd consider either of these a breaking change. Not that that'll stop me, and the project is still in zero_ver so it won't matter much, but I guess I want to ask how careful I should be. Should I read up on custom importer overloads to try and keep existing import paths working, or would that be wasted effort? Also, I'll probably touch most files in the project due to moving folders. Are there any particular WIPs or branches that I should consider or wait for before starting? The merge conflicts would be spectacularly bad.
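For the question about keeping old import paths alive, here is a minimal sketch of one option. It uses a module-level `__getattr__` (PEP 562) rather than a custom importer, which is a swapped-in technique and not something the maintainers suggested. The names `legacy_pkg` and the `math.sqrt` target are stand-ins for an old and a new location; nothing here reflects causalml's actual layout.

```python
import importlib
import sys
import types
import warnings


def make_forwarding_getattr(old_module, moved):
    """Build a PEP 562 module-level __getattr__ that forwards moved names.

    `moved` maps an old attribute name to a (new_module, attribute) pair.
    """

    def __getattr__(name):
        if name in moved:
            new_module, attr = moved[name]
            warnings.warn(
                f"{old_module}.{name} moved to {new_module}.{attr}",
                DeprecationWarning,
                stacklevel=2,
            )
            # The new location is only imported when the old name is used.
            return getattr(importlib.import_module(new_module), attr)
        raise AttributeError(f"module {old_module!r} has no attribute {name!r}")

    return __getattr__


# Demo with stand-in names (assumptions, for illustration only):
# "legacy_pkg" plays the old import path, stdlib "math" plays the new one.
legacy = types.ModuleType("legacy_pkg")
legacy.__getattr__ = make_forwarding_getattr("legacy_pkg", {"sqrt": ("math", "sqrt")})
sys.modules["legacy_pkg"] = legacy
```

In a real package the function would simply be defined as `__getattr__` at the top level of the old module file, so that `from old.path import Name` keeps working and emits a `DeprecationWarning`.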
Hi @a-recknagel, in the latest v0.15.2 release we made torch optional. Can you check whether this change addresses the issue?
Will do, thanks for the update.
My use case is that I'm running a trained `causalml` model on a server. I'm done with analysis, hyperopt, visualization, ... none of that is necessary any more. So I pickled my model and moved it to a designated production environment, which I configured so that it can unpickle the model and run predictions on it.
But the way `causalml` is set up, many of those "non-core" packages that deal with training and analysis are still hard runtime dependencies, even if I were to install `causalml` with `--no-deps` (as suggested here #250 (comment), which I'd really like to avoid). Just to show an example, the model I'm using is `causalml.inference.tree.causal.causalforest.CausalRandomForestRegressor`, and in `causalml.inference.tree.__init__.py` all of the local modules are imported as well (e.g. `causalml.inference.tree.plot`), leading to a number of the 3rd party imports that I have an issue with, like `seaborn`, `matplotlib`, `pydotplus`, ...
Would it be possible to separate every dependency that isn't necessary to run predictions into extras? Or at least restructure the code so that a manual install of the actual runtime dependencies won't lead to unrelated 3rd party package imports? I realize this is a massive ask, but it's a serious problem for me that I can't solve without forking your project and running my own builds (which I'd really, really like to avoid).
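To make the second option concrete, here is a minimal sketch of deferring an optional import until the feature that needs it is called, so that importing the package for prediction never touches plotting libraries. The helper `require_optional`, the two example functions, and the extras name `causalml[plot]` are all hypothetical; this is not how causalml is currently structured.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency at call time, not at package import time.

    `extra` is a hypothetical extras group name, used only in the error hint.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Hypothetical install hint: pip install 'causalml[{extra}]'"
        ) from exc


def plot_feature_importance(importances):
    """Plotting helper: matplotlib is only imported when this is called."""
    plt = require_optional("matplotlib.pyplot", extra="plot")
    fig, ax = plt.subplots()
    ax.bar(range(len(importances)), importances)
    return fig


def predict_mean(values):
    """Stand-in for a prediction path that needs no plotting libraries."""
    return sum(values) / len(values)
```

With this pattern, an environment without matplotlib can still import the module and call `predict_mean`; only `plot_feature_importance` fails, and it fails with a message that points at the missing extra.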
Just to give an idea of why it's an issue:
base/Dockerfile
This image contains the core set of 3rd party packages necessary to predict with a `CausalRandomForestRegressor`. I didn't investigate what other models would need, but numerical computation libraries don't have a massive disk footprint anyway -- the whole image is 507MB, which is reasonable for a simple ML backend.
actual/Dockerfile
This is the whole package, and visualization libs do tend to eat up a fair share of disk space. Plus torch. The image clocks in at 6.54GB, so a difference of ~6GB which I do not need.
My CI/CD straight up refuses to run this build for me because it doesn't support artifacts of this size. I didn't even know that could happen.