-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable model selection for first stage models #808
Conversation
3a67554
to
a96601a
Compare
a96601a
to
3a024ea
Compare
Signed-off-by: AnthonyCampbell208 <78286293+AnthonyCampbell208@users.noreply.github.com> Co-authored-by: ShrutiRM97 <98553136+ShrutiRM97@users.noreply.github.com> Co-authored-by: CooperGibbs <coopergibbs@Coopers-MacBook-Pro.local>
Signed-off-by: AnthonyCampbell208 <78286293+AnthonyCampbell208@users.noreply.github.com>
…d notebook to showcase some of the functionality Signed-off-by: AnthonyCampbell208 <78286293+AnthonyCampbell208@users.noreply.github.com>
737dad1
to
356f36e
Compare
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
356f36e
to
fe1c5e1
Compare
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
27acdc0
to
6d41ada
Compare
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
fit_with_groups(self._model, self._combine(X, W, Target.shape[0]), Target, groups=groups) | ||
return self | ||
class _FirstStageWrapper: | ||
def __init__(self, model, discrete_target): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the refactoring. There is a consequence of not being able to make model-specific (Y vs T) warnings though (like the binary outcomes PR tries to do in this class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's think about the cleanest way to do that as part of your PR; one thing to keep in mind is that we also use this wrapper for some Z models, so it's not just T vs. Y, which is what we assumed when we first implemented this logic when we only had DML but not the IV classes.
|
||
|
||
class _FirstStageWrapper: | ||
def __init__(self, model, is_Y, featurizer, linear_first_stages, discrete_treatment): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is linear_first_stages used anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; linear_first_stages has always been an awkward hack and this gives us a good opportunity to get rid of it. Given
T = f(X, W) + error
Y = T <alpha phi(X)> + g(X,W) + error
if both f and g are linear, then because we multiply T by theta in the equation for Y, Y is not actually a linear function of X and W, we need to include interaction terms. So we introduced linear_first_stages because our default y and t models were linear and we wanted to make sure we could estimate the correct model if f and g were linear. Now, since we do model selection, the y and t models are not (necessarily) linear, so this makes less sense.
There's also a potentially big performance cost if users accidentally leave linear_first_stages set to True, because we can end up generating a huge number of interaction terms, and we see this happen fairly often in practice.
Additionally, linear models aren't the only time that specifying the right models for f and g as your t- and y-models wouldn't give the correct estimate - if f and g are both degree-n polynomials then you'd need to interact the degree-n interaction terms of [X;W] with the featurized X, for example, and we've never supported that; users can always supply a more complicated pipelined model that does something like that themselves if they'd like, but I'd argue it doesn't make sense to try to solve it on our end.
class ModelSelector(metaclass=abc.ABCMeta): | ||
""" | ||
This class enables a two-stage fitting process, where first a model is selected | ||
by calling `train` with `is_selecting=True`, and then the selected model is fit (presumably |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why we opt for a 'train' method over a 'fit' method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I used fit
, but this makes the error messages much easier to understand if a selector is being passed where an estimator is expected or vice-versa.
return None | ||
|
||
|
||
def _fit_with_groups(model, X, y, *, groups, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
marking for discussion
@@ -716,13 +741,19 @@ class LinearDML(StatsModelsCateEstimatorMixin, DML): | |||
|
|||
def __init__(self, *, | |||
model_y='auto', model_t='auto', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably update our docstrings to say that users can pass something like 'forest' to model_y right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should probably create a whole new documentation section on how to specify model selectors, because there are a number of possibilities now. But I was planning on deferring that until after this PR.
@@ -202,13 +204,17 @@ def predict(self, X, y, W=None): | |||
|
|||
""" | |||
model_list = [] | |||
|
|||
kwargs = filter_none_kwargs(**kwargs) | |||
model.train(True, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in some earlier conversations we were thinking about giving the users the option to do "dirty crossfitting" i.e. picking a good est from all data before cross fitting. Am I correct in my understanding that this PR just does "dirty crossfitting" by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and that's definitely something we could consider making easier for users.
It's possible, though not straightforward, to do non-dirty crossfitting now, by wrapping a CV estimator in a FixedModelSelector, which will always use the estimator as is for both selecting and fitting. However, there are some changes we could make to make this more efficient, since then the selecting step is unnecessary and so we could just skip it.
I'd propose tabling that for now and implementing that as one of several future enhancements to the model selection logic.
|
||
|
||
class _FirstStageWrapper: | ||
def __init__(self, model, is_Y, featurizer, linear_first_stages, discrete_treatment): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also do we no longer featurizer our first stage models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only ever used featurization in the first stage models to support linear_first_stages, since then we needed to interact phi(X) with [X;W].
Logically, the featurization is only about the final model, the first stage models just get the raw X and W and can fit whatever kind of model they'd like on them, including one that pipelines featurization with any other logic.
General comment not just about this PR but of process of adding new features - I feel like a demo notebook (even if brief) would help users get up to speed with new features we add. Maybe we can add demo notebooks before moving from a beta release to an official release? |
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
Signed-off-by: Keith Battocchi <kebatt@microsoft.com>
No description provided.