Label model symmetry breaking #1451
- Factor out two post-processing ops on mu in LabelModel.fit
- Implement heuristic symmetry breaking on mu
Codecov Report
@@            Coverage Diff             @@
##           master    #1451   +/-   ##
==========================================
+ Coverage   97.55%   97.58%   +0.03%
==========================================
  Files          55       55
  Lines        2001     2032      +31
  Branches      328      334       +6
==========================================
+ Hits         1952     1983      +31
  Misses         22       22
  Partials       27       27
Is there a test that checks whether the symmetry breaking works correctly, or is that covered implicitly by one of the other tests?
offline: added tests
I'm experiencing problems with the symmetry-breaking code. The number of possible column permutations grows combinatorially with the number of output classes (k! for k classes, e.g. 5,040 for k = 7), so the method does not really scale to problems with more than 6-7 classes. Maybe the code should stop after a few thousand permutations?
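A minimal sketch of the cap suggested above, assuming the search is a plain argmax over column permutations with some scoring criterion. The helper name `best_permutation_capped`, the `max_perms` parameter, and the `score` callback are hypothetical and not part of the LabelModel API; `score` stands in for whatever symmetry-breaking criterion is used (e.g. the number of LFs estimated to be better than random).

```python
import itertools
import math
from typing import Callable, Optional, Tuple

Perm = Tuple[int, ...]


def best_permutation_capped(
    k: int,
    score: Callable[[Perm], float],
    max_perms: int = 5000,
) -> Tuple[Perm, float]:
    """Return the best-scoring permutation of range(k) among at most max_perms candidates.

    Hypothetical helper: `score` is any symmetry-breaking criterion.
    """
    if max_perms < 1:
        raise ValueError("max_perms must be at least 1")
    best_perm: Optional[Perm] = None
    best_score = -math.inf
    # permutations() yields the identity first, so the identity is always scored.
    for perm in itertools.islice(itertools.permutations(range(k)), max_perms):
        s = score(perm)
        # Strict '>' keeps the earliest permutation on ties,
        # so the identity wins unless something beats it.
        if s > best_score:
            best_perm, best_score = perm, s
    assert best_perm is not None
    return best_perm, best_score
```

One caveat with a hard cap: `itertools.permutations` yields permutations in lexicographic order, so with k = 8 and a cap of 5,000 every candidate keeps column 0 fixed (the first 7! = 5,040 permutations all start with 0). Sampling permutations at random, or a different search strategy, may cover the space better.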
Description of proposed changes
This PR primarily implements a heuristic procedure for selecting one of several symmetric (equally optimal) solutions to the LabelModel parameter estimation procedure, which arise from orthogonal symmetries in that problem. Basically, for any mu that we learn (the estimated conditional probabilities of the LFs), column permutations of this result are often equally valid solutions. So we choose the solution in which the most LFs are estimated to be better than random, per our standard modeling assumption.

This PR also includes:
- a LabelModel.get_conditional_probs sub-function
- the post-processing ops on mu in LabelModel.fit, factored out into a sub-function (right now, this symmetry-breaking operation and clamping)

Related issue(s)
Fixes #1437 (at least to first order)
Test plan
Adds tests for (a) conditional probability table calculation and (b) symmetry breaking specifically.
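The symmetry-breaking heuristic that these tests target can be sketched roughly as follows. This is a hypothetical illustration, not the code in this PR. The following are all assumptions: the mu layout (one k x k block per LF), the "better than random" criterion (for every class y, the LF's most likely output given Y = y is y), the use of a class-balance vector to decide which permutations count as symmetries, and the names `count_better_than_random` and `break_symmetry`.

```python
import itertools
from typing import Tuple

import numpy as np


def count_better_than_random(mu: np.ndarray, k: int) -> int:
    """Count LFs whose estimated conditional-probability block looks better than random.

    Assumed (hypothetical) layout: mu has shape (m * k, k), and
    mu[i * k + a, y] approximates P(lf_i outputs class a | Y = y) over the
    k non-abstain outputs. Assumed criterion: LF i counts if, for every
    true class y, its most likely output given Y = y is y itself.
    """
    m = mu.shape[0] // k
    count = 0
    for i in range(m):
        block = mu[i * k : (i + 1) * k]  # k x k block for LF i
        if np.array_equal(block.argmax(axis=0), np.arange(k)):
            count += 1
    return count


def break_symmetry(
    mu: np.ndarray, class_balance: np.ndarray, atol: float = 1e-6
) -> Tuple[np.ndarray, Tuple[int, ...]]:
    """Pick the column permutation of mu under which the most LFs beat random."""
    k = mu.shape[1]
    best_perm = tuple(range(k))
    best_count = count_better_than_random(mu, k)
    for perm in itertools.permutations(range(k)):
        idx = list(perm)
        # Assumption: a permutation is a valid symmetry only if it
        # leaves the class prior unchanged.
        if not np.allclose(class_balance[idx], class_balance, atol=atol):
            continue
        count = count_better_than_random(mu[:, idx], k)
        # Strict '>' keeps the identity unless another permutation beats it.
        if count > best_count:
            best_perm, best_count = perm, count
    return mu[:, list(best_perm)], best_perm


# Usage: three LFs whose learned blocks have their columns swapped.
flipped = np.array([[0.2, 0.8], [0.8, 0.2]])
mu = np.vstack([flipped] * 3)
mu_fixed, perm = break_symmetry(mu, np.array([0.5, 0.5]))
print(perm)  # (1, 0)
```

Swapping the two columns makes each flipped 2 x 2 block diagonally dominant, so all three LFs become better than random and `perm` comes out as `(1, 0)`. With an unbalanced prior such as `[0.7, 0.3]`, the swap is rejected and the identity is kept. Note that the loop visits all k! permutations, which is the scaling concern raised in the review thread.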
Checklist