Enable model selection for first stage models #808
Merged
20 commits
f200512 Adding model selection functionality (AnthonyCampbell208)
30c290a Fixed fitting with groups, fixed one param grid case, other bugs (AnthonyCampbell208)
55c5858 Final commit, added encoding for categorical data (untested) and adde… (AnthonyCampbell208)
fe1c5e1 Model selection WIP (kbattocchi)
7104f00 Merge branch 'main' into kebatt/modelSelection (kbattocchi)
6d41ada Fix some model selection logic (kbattocchi)
0435b26 Remove deprecated "normalize" param (kbattocchi)
db74413 Adjust tests for lack of linear_first_stages (kbattocchi)
7e61c00 Remove vestigal functionality (kbattocchi)
fe63f23 Fix linting (kbattocchi)
6f6a514 Speed up tests by doing less model selection (kbattocchi)
2451faa Ensure use of models that can fit arrays and vectors in DMLIV tests (kbattocchi)
6d4a203 Fix tests (kbattocchi)
818ff9c Speed up tests (kbattocchi)
ba7de62 Make tests more reliable (kbattocchi)
a551c19 Try to fix tests (kbattocchi)
96cb47e Fix tests (kbattocchi)
f454f24 Fix docstrings (kbattocchi)
9b45601 Fix doctests (kbattocchi)
4618ffa Fix doctests (kbattocchi)
I think in some earlier conversations we were considering giving users the option to do "dirty crossfitting", i.e. picking a good estimator using all of the data before crossfitting. Am I correct in understanding that this PR just does "dirty crossfitting" by default?
Yes, and that's definitely something we could consider making easier for users.
It's possible, though not straightforward, to do non-dirty crossfitting now by wrapping a CV estimator in a FixedModelSelector, which always uses the estimator as-is for both selecting and fitting. However, we could make this more efficient: in that case the selecting step is unnecessary, so we could just skip it.
I'd propose tabling that for now and implementing that as one of several future enhancements to the model selection logic.
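For readers following along, the difference between the two approaches can be sketched roughly as below. This is only an illustration using plain scikit-learn and synthetic data, not the selector classes from this PR; the helper `crossfit_predictions` and the toy data are assumptions made up for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import KFold


def crossfit_predictions(make_model, X, y, n_splits=2, seed=0):
    """Hypothetical helper: out-of-fold predictions, where each fold is
    predicted by a fresh model fit only on the remaining folds."""
    preds = np.empty_like(y, dtype=float)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        model = make_model().fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    return preds


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# "Dirty" crossfitting: pick the hyperparameter once using all of the data,
# then cross-fit a model with that hyperparameter held fixed.
alpha = LassoCV(cv=3).fit(X, y).alpha_
dirty_preds = crossfit_predictions(lambda: Lasso(alpha=alpha), X, y)

# Non-dirty crossfitting: the CV estimator is used as-is, so hyperparameter
# selection is rerun inside every training fold and never sees the held-out fold.
clean_preds = crossfit_predictions(lambda: LassoCV(cv=3), X, y)
```

In the dirty variant the selection step has seen every sample, including the ones later held out, while the non-dirty variant pays for a separate selection in each fold.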