Support binary case #215
✅ Deploy Preview for silly-keller-664934 ready!
This is looking good, not approving yet since we'll want a test for the binary case too.
Looks like the failing test is
Seems like we need to either install
Looks like this was broken a few days ago: https://github.com/drivendataorg/zamba/runs/7904627719?check_suite_focus=true
Codecov Report
@@           Coverage Diff           @@
##           master    #215   +/-   ##
======================================
  Coverage    86.9%   87.0%
======================================
  Files          29      29
  Lines        1910    1918    +8
======================================
+ Hits         1661    1669    +8
  Misses        249     249
LGTM, @AllenDowney you can merge once the tests finish passing
Picking up where #200 left off...
If a user provides only two labels, we check that they are mutually exclusive and, if so, train the binary model. We log which column we're keeping.
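For illustration, the check described above could be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the helper name `pick_binary_column`, the list-of-dicts input, and the choice to keep the first of the two columns are all assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)


def pick_binary_column(rows, label_columns):
    """Return the column to keep if the two labels are mutually exclusive.

    Hypothetical helper for illustration only. `rows` is a list of dicts
    mapping each label column name to 0 or 1. Returns None when this is
    not the binary case.
    """
    if len(label_columns) != 2:
        return None  # more (or fewer) than two labels: not the binary case

    first, second = label_columns
    # Mutually exclusive here means every row has exactly one label set.
    exclusive = all(row[first] + row[second] == 1 for row in rows)
    if not exclusive:
        return None

    # Assumption: keep the first column; the second is its complement.
    logger.info("Binary case: keeping column %r, dropping %r", first, second)
    return first


rows = [
    {"blank": 1, "non_blank": 0},
    {"blank": 0, "non_blank": 1},
]
kept = pick_binary_column(rows, ["blank", "non_blank"])
```

Because the two columns are complements of each other in the mutually exclusive case, a single column carries all the label information, which is why only one is kept.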
Outstanding:
Note: I gave this a quick try on a dataset of 100 videos balanced evenly between blank and non-blank, and we do see some learning.
One downside of the binary case is that the metrics can look misleading when the classes are imbalanced. I'm not sure of the best way to warn users about that. If users provide highly imbalanced data, the model may learn to predict only the default class, but I suppose the same is true in the multilabel case.
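To make the imbalance concern concrete, here is a small sketch with made-up labels (95 negatives, 5 positives; not data from this PR). A model that always predicts the majority class scores high accuracy while never finding a positive example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually predicted."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Illustrative, fabricated-for-the-example labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority (default) class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)  # 0.95
rec = recall(y_true, y_pred)    # 0.0
```

Reporting a per-class metric such as recall alongside accuracy is one possible way to surface this to users.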