Conversion to BAM requires post conversion reordering and sorting #760
Comments
Did you try this with the |
The -sort_reads option in transform does not make ADAM preserve BAM ordering. The order of the BAM file that ADAM emits appears completely random: @SQ SN:Y LN:59373566 M5:Y The input file, by contrast, starts off numerically ordered. |
Sounds like |
It looks like according to the SAM spec the order of the |
As we discussed in the group call today, the reads within each output file are presumably sorted. However, when the BAMs that ADAM outputs are re-concatenated into a single file, the resulting file has an apparently random chromosome order and (probably for the same reason) the reads are unsorted overall. This outcome seems consistent with partition numbers not corresponding directly to genomic coordinate order. |
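To illustrate the suspected failure mode, here is a minimal Python sketch. It is not ADAM code: the partition contents and the helper names `concatenate` and `is_sorted` are made up for illustration. Each partition is sorted internally by (contig index, position), but concatenating them in partition-number order gives an unsorted result.

```python
# Hypothetical data: each partition holds (contig_index, position) pairs.
# Every partition is sorted internally, but the partition numbers do not
# follow genomic coordinate order.
partitions = {
    0: [(2, 100), (2, 250)],
    1: [(0, 10), (0, 75)],
    2: [(1, 5), (1, 300)],
}


def concatenate(parts):
    """Concatenate partitions in partition-number order."""
    return [read for idx in sorted(parts) for read in parts[idx]]


def is_sorted(reads):
    """Return True if reads are in non-decreasing (contig, position) order."""
    return all(a <= b for a, b in zip(reads, reads[1:]))


merged = concatenate(partitions)
each_sorted = all(is_sorted(p) for p in partitions.values())
merged_sorted = is_sorted(merged)
print(each_sorted, merged_sorted)
```

If this is what is happening, sorting within partitions is not enough: the partitions themselves would also have to be ordered by genomic range before the files are concatenated.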
Working on this. |
I have pushed a partial fix as #784 that fixes the header issues and works correctly for a single partition. I will be looking into this more tomorrow to check into the partition sort order for large files. |
OK, so now that #784 is prepped, I believe that ADAM should be good-to-go for writing confirmed sorted BAM/SAM, minus one detail in the BAM/SAM header (I will open a PR for this shortly). I confirmed this with the following bash "one liner":
This downloads a small BAM file (containing a single chromosome), converts it into a 10-partition ADAM file, sorts it and converts it to SAM, prints the position of each read, in order, to a file, re-sorts that file, and then checks whether the two files differ. If it prints 0 at the end, then all is well. |
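The comparison step can be sketched in Python as follows. This is not the original one-liner, which is not reproduced here; the function names `read_positions` and `count_out_of_order` and the sample records are hypothetical. It assumes tab-separated SAM text for a single chromosome, with POS in the fourth column as the SAM format defines.

```python
def read_positions(sam_lines):
    """Extract the POS field (column 4) from SAM alignment lines, skipping headers."""
    positions = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @HD/@SQ/... header lines
        fields = line.rstrip("\n").split("\t")
        positions.append(int(fields[3]))
    return positions


def count_out_of_order(positions):
    """Count entries that differ between the emitted order and the sorted order."""
    return sum(1 for a, b in zip(positions, sorted(positions)) if a != b)


# Hypothetical sample records on a single chromosome.
sample = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t30\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
shuffled = sample[:2] + [sample[4], sample[2], sample[3]]

print(count_out_of_order(read_positions(sample)))    # sorted input
print(count_out_of_order(read_positions(shuffled)))  # unsorted input
```

As in the description above, a result of 0 means the emitted order already matches the sorted order. For a multi-chromosome file the comparison key would also need the reference order from the header, which this sketch does not handle.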
Reference chromosome ordering and read sorting information are lost when converting an ADAM file that has undergone MD, Indel, and BQSR back to BAM.
This means that the resulting BAM must be reordered and sorted before it can be used with another tool (such as a variant caller), which seems unnecessary and wasteful.