feature request: estimated phasing quality #19

Closed
jts opened this issue Jul 13, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@jts

jts commented Jul 13, 2022

Hi,

Thanks for longphase. I have been using it for the last few weeks and am impressed with its speed and usability. My main use case is generating a haplotagged .bam file for use in downstream analysis. Is it possible to add a tag to the bam record with an estimated "phasing quality" score (a phred-scaled estimate of the probability that the assigned phase is incorrect)? If not, it would be great to have simple matching statistics (number of heterozygous variants that the read covered, number consistent with h1/h2, etc.) available to the user.

Thanks for considering,
Jared

@ythuang0522
Collaborator

This is an interesting suggestion, since we know quite a few regions are challenging to phase. It would be good to provide this information. Simple statistics would be easier, as we are not sure how to estimate a Phred-scaled quality properly.

@ythuang0522 ythuang0522 added the enhancement New feature or request label Jul 14, 2022
@ythuang0522
Collaborator

Hi @jts, we have added a Phred-scaled phasing/tagging quality for each read in the bam, as requested, in release v1.3. It is the Phred scale of the fraction of inconsistent loci, i.e., -10*log_10(number of inconsistent loci / (number of consistent loci + number of inconsistent loci)). It is written to the PQ tag (e.g., PQ:i:40).
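For downstream filtering, the PQ tag can be read straight from SAM text (for example, the output of `samtools view`). The following is a minimal sketch, not part of longphase: the helper names `get_pq` and `keep_read`, the threshold of 20, and the example record are all hypothetical and chosen only for illustration.

```python
def get_pq(sam_line):
    """Return the integer PQ tag from one SAM text record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if tag == "PQ" and typ == "i":
            return int(value)
    return None


def keep_read(sam_line, min_pq=20):
    """Keep a read only if it carries a PQ tag of at least min_pq (hypothetical cutoff)."""
    pq = get_pq(sam_line)
    return pq is not None and pq >= min_pq


# Hypothetical record, made up for this example.
record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tPQ:i:40"
print(get_pq(record), keep_read(record))
```

A suitable cutoff depends on the analysis; the value above is only a placeholder.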

If there are no inconsistent loci during haplotype assignment, we directly set PQ = 40 for the read. Note that untagged reads are set to PQ = 0. Below is the distribution of PQ values for 10x HG002 (left: number of reads, right: PQ value).
[Image: distribution of PQ values for 10x HG002]
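The PQ definition in this comment can be sketched in a few lines of Python. This is an illustration of the stated formula and its two special cases, not longphase's actual code: the function name `phasing_quality`, the `tagged` parameter, and the rounding to an integer are assumptions (the thread does not say whether the value is rounded or truncated).

```python
import math


def phasing_quality(n_consistent, n_inconsistent, tagged=True, max_pq=40):
    """Phred-scaled fraction of inconsistent loci, per the formula in this thread."""
    total = n_consistent + n_inconsistent
    if not tagged or total == 0:
        return 0              # untagged reads are reported with PQ = 0
    if n_inconsistent == 0:
        return max_pq         # no inconsistent loci: PQ is set directly to 40
    pq = -10 * math.log10(n_inconsistent / total)
    return round(pq)          # assumption: rounded because the tag is an integer (PQ:i)


print(phasing_quality(9, 1))    # 1 of 10 loci inconsistent
print(phasing_quality(99, 1))   # 1 of 100 loci inconsistent
print(phasing_quality(10, 0))   # no inconsistent loci
```

Under this definition a read with one inconsistent locus out of ten gets PQ 10, one out of a hundred gets PQ 20, and a read whose loci are all inconsistent gets PQ 0.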

@ythuang0522
Collaborator

Forgot to mention that haplotag provides a --log option, which outputs a tabular file with a few per-read statistics that you might be interested in. We also added the phasing quality to this table in v1.3.
[Image: example of the --log output table]

@jts
Author

jts commented Aug 25, 2022 via email
