-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lowering] Fix MatrixBlockMatrixWriter lowering when some partitions are empty #12797
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsolicited feedback - is there no way to test this that doesn't involve an integration test and/or the need to check in so many files? I wonder if this can be captured in a scala unit test?
Great question. If I could create a table that could reproduce this bug in pure code, I would. |
shouldn't we be able to create a MatrixTable with empty partitions, write as BM, and trigger this? |
9beb84a
to
a3c10b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems unnecessarily integration-y for something that we should able to test directly. If we have the partition counts of the input to write_from_entry_expr, we should be able to just generate the MT with those counts and trigger the failure directly?
I'll try it. |
a3c10b1
to
062886d
Compare
062886d
to
cdd04ce
Compare
I was able to replicate the reported bug using a different method that doesn't require 500 files of matrix table. It's still similar, but I feel that going further is a waste of time.
@@ -1,7 +1,7 @@ | |||
import hail as hl | |||
import hail.utils as utils | |||
|
|||
from ...helpers import resource, skip_when_service_backend | |||
from ...helpers import resource, skip_when_service_backend, fails_service_backend, fails_local_backend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
…are empty Using changePartitionerNoRepartition with a partitioner with a different number of partitions cannot be correct and will result in dropped data. Just construct the new partitioner (based on row index, so it will comply with invariants) directly.
cdd04ce
to
3730068
Compare
def test_from_entry_expr_empty_parts(self): | ||
with hl.TemporaryDirectory(ensure_exists=False) as path: | ||
mt = hl.balding_nichols_model(n_populations=5, n_variants=2000, n_samples=20, n_partitions=200) | ||
mt = mt.filter_rows((mt.locus.position <= 500) | (mt.locus.position > 1500)).checkpoint(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, this is a bit of a load bearing checkpoint, the issue we've run into doesn't replicate otherwise
e5a50f5
to
3730068
Compare
Using changePartitionerNoRepartition with a partitioner with a different
number of partitions cannot be correct and will result in dropped data.
Just construct the new partitioner (based on row index, so it will
comply with invariants) directly.