[vds/combiner] Stop dropping GT in reference data during gvcf import #14560
Merged
hail-ci-robot
merged 7 commits into
hail-is:main
from
chrisvittal:vds/combiner/ref-can-be-haploid
May 31, 2024
517733b
to
070686a
Compare
Reference GT/PGT may have ploidy information, so we need to stop dropping the GT/PGT.
070686a
to
56baa59
Compare
Gotta say I don't feel very well equipped to review this. Happy to hop on a Zoom to walk through it, or have @ehigham do the review.
- pure set logic for shared_fields/ref_fields in to_dense_mt/coalesce_join
- annotate the call_field from ref_call_field rather than transmute it, since both 'ref_call_field' and 'call_field' might still be in the variant data
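To make the annotate-versus-transmute point concrete, here is a minimal pure-Python sketch. It is not the Hail implementation: plain dicts and sets stand in for entry schemas, and the field values (`GQ`, `DP`, `END`, the call tuples) are made up for illustration. It shows one reading of the commit message: if the reference call field were transmuted away, it would no longer be found among the fields shared with the variant data.

```python
# Toy model only: dicts stand in for Hail entry structs, sets for schemas.
# Field names other than the call fields are hypothetical.
ref_call_field = "LGT"
call_field = "GT"

ref_entry = {"LGT": (0,), "END": 200, "GQ": 40}  # haploid reference block
var_fields = {"LGT", "GT", "GQ", "DP"}           # variant data carries both call fields

# "annotate": add call_field, keep ref_call_field
annotated = {**ref_entry, call_field: ref_entry[ref_call_field]}

# "transmute": add call_field, drop the field it was computed from
transmuted = {k: v for k, v in annotated.items() if k != ref_call_field}

# Shared fields computed with plain set intersection
shared_after_annotate = set(annotated) & var_fields
shared_after_transmute = set(transmuted) & var_fields

print(sorted(shared_after_annotate))   # LGT is still shared
print(sorted(shared_after_transmute))  # LGT has been lost
```

With annotate, both `LGT` and `GT` remain available on the reference side, so whichever of them the variant data still has can be matched up.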
b8e2dc3
to
c623645
Compare
I _think_ this is fine, and users won't need reference PGT by default. (Famous last words, I know, but we can burn that bridge when we get to it)
5b1b4a9
to
1aaaf91
Compare
6f3689d
to
9d3debc
Compare
ehigham
approved these changes
May 31, 2024
Chris kindly walked me through the genomic details that back this PR. With that, I'll approve.
chrisvittal
added a commit
to chrisvittal/hail
that referenced
this pull request
Jul 10, 2024
…ail-is#14560) CHANGELOG: The gvcf import stage of the VDS combiner now preserves the GT of reference blocks. Some datasets have haploid calls on sex chromosomes, and the fact that the reference was haploid should be preserved.
hail-ci-robot
pushed a commit
that referenced
this pull request
Jul 10, 2024
#14560 updated `to_dense_mt` to take into account the existence of reference GT fields. However, it was untested. I take our old `test_to_dense_mt` test, add a haploid `LGT` field to the reference, and check that the haploid reference is present in the result.
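The shape of such a test can be sketched in pure Python. This is a toy single-sample model, not `hl.vds.to_dense_mt` or the actual `test_to_dense_mt`: the `densify` helper, the dict layout, and the positions are all hypothetical. Calls are tuples of allele indices, so a haploid reference call is `(0,)`.

```python
def densify(ref_blocks, variant_entries, positions):
    """Toy densify for one sample: at each position use the variant entry if
    there is one, otherwise the entry of the reference block covering it."""
    dense = {}
    for pos in positions:
        if pos in variant_entries:
            dense[pos] = variant_entries[pos]
            continue
        # First reference block whose [start, END] interval covers pos
        block = next((b for b in ref_blocks if b["start"] <= pos <= b["END"]), None)
        if block is None:
            dense[pos] = None  # no data at this position
        else:
            # Drop the block coordinates, keep the entry fields (including LGT)
            dense[pos] = {k: v for k, v in block.items() if k not in ("start", "END")}
    return dense


# Haploid reference block, as on a sex chromosome
ref_blocks = [{"start": 100, "END": 200, "LGT": (0,), "GQ": 40}]
variant_entries = {150: {"LGT": (1,), "GQ": 50}}

dense = densify(ref_blocks, variant_entries, positions=[120, 150, 300])
print(dense[120])  # filled from the reference block, LGT stays haploid
```

The check that matters is that the entry filled in from the reference keeps its one-allele call rather than coming back missing or diploid.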
chrisvittal
added a commit
that referenced
this pull request
Jul 11, 2024
This patch version implements necessary changes for working with Variant Datasets with haploid calls, and fixes one critical correctness bug. The substantial backports are:
- [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
- [query] Don't error on VCF export when haploid call is unphased (#14375)
- [compiler] apply scalafix to all scala sources (#14156)
chrisvittal
added a commit
that referenced
this pull request
Jul 15, 2024
This patch version implements necessary changes for working with Variant Datasets with haploid calls, and fixes one critical correctness bug. The substantial backports are:
- [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
- [query] Don't error on VCF export when haploid call is unphased (#14375)
- [compiler] apply scalafix to all scala sources (#14156)
- [annotationdb][datasets] regional buckets (#14286)
chrisvittal
added a commit
that referenced
this pull request
Jul 30, 2024
This patch version implements necessary changes for working with Variant Datasets with haploid calls, and fixes one critical correctness bug. The substantial backports are:
- [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
chrisvittal
added a commit
that referenced
this pull request
Jul 30, 2024
This patch version implements necessary changes for working with Variant Datasets with haploid calls, and fixes one critical correctness bug. The substantial backports are:
- [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
- [vds/combiner] Fix truncation of PL in GVCF import with haploid calls (#14577)
chrisvittal
added a commit
to chrisvittal/hail
that referenced
this pull request
Sep 18, 2024
After split_multi, LGT is dropped from the variant data of a VDS. After PR hail-is#14560, LGT is added to datasets after creation via the combiner. After hail-is#14675 the same is true for `from_merged_representation`. We should keep the GT/LGT field consistent across ref and var data. This change does so for split_multi. Resolves hail-is#14694
hail-ci-robot
pushed a commit
that referenced
this pull request
Sep 19, 2024
After split_multi, LGT is dropped from the variant data of a VDS. After PR #14560, LGT is added to datasets after creation via the combiner. After #14675 the same is true for `from_merged_representation`. We should keep the GT/LGT field consistent across ref and var data. This change does so for split_multi. Resolves #14694
CHANGELOG: The gvcf import stage of the VDS combiner now preserves the GT of reference blocks. Some datasets have haploid calls on sex chromosomes, and the fact that the reference was haploid should be preserved.
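As a rough illustration of why this matters, the pure-Python sketch below models reference blocks as plain dicts. It is not the combiner's code: `gt_ploidy`, the block layout, and the coordinates are hypothetical. It relies only on the VCF convention that alleles in a GT string are separated by `/` or `|`. Once GT is dropped, a haploid block and a diploid block become indistinguishable.

```python
def gt_ploidy(gt: str) -> int:
    """Number of alleles in a VCF GT string such as '0', '0/0', or '0|1'."""
    # '/' separates unphased alleles, '|' separates phased alleles
    return len(gt.replace("|", "/").split("/"))


# Hypothetical reference blocks: (contig, start, END, GT)
blocks = [("chrX", 100, 200, "0"), ("chr1", 100, 200, "0/0")]

# Import that drops GT: only the block coordinates survive
without_gt = [{"contig": c, "start": s, "END": e} for c, s, e, _ in blocks]

# Import that preserves GT: ploidy can still be recovered per block
with_gt = [{"contig": c, "start": s, "END": e, "GT": gt} for c, s, e, gt in blocks]
ploidies = {b["contig"]: gt_ploidy(b["GT"]) for b in with_gt}

print(ploidies)
```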