Conversion to BAM requires post conversion reordering and sorting #760

Closed
beaunorgeot opened this issue Aug 7, 2015 · 9 comments

@beaunorgeot
Contributor

Reference chromosome ordering and read sorting information are lost when converting an ADAM file that has undergone MD, Indel, & BQSR back to BAM.

This means that the resulting BAM must be Reordered and Sorted prior to use with another tool (like a variant caller), which seems unnecessary/wasteful.

@massie
Member

massie commented Aug 7, 2015

Did you try this with the -sort_reads option in transform? That should result in ADAM emitting sorted/reordered files.

@beaunorgeot
Contributor Author

The -sort_reads option in transform does not result in ADAM preserving the BAM ordering.

The order of the bam file that is emitted from ADAM appears completely random:
samtools view -H ADAMcat.bam ##first couple of lines of output bam. These aren't ordered at all

@SQ SN:Y LN:59373566 M5:Y
@SQ SN:14 LN:107349540 M5:14
@SQ SN:9 LN:141213431 M5:9
@SQ SN:7 LN:159138663 M5:7
@SQ SN:17 LN:81195210 M5:17
@SQ SN:18 LN:78077248 M5:18
@SQ SN:8 LN:146364022 M5:8

The input file starts off numerically ordered.
samtools view -H input.bam ## first few lines. These are ordered correctly
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
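
One quick way to compare contig order between two files is to pull the SN fields out of the @SQ lines of each header and compare the resulting lists. The sketch below assumes the header text is tab-delimited, as `samtools view -H` emits it. The function names `sq_names` and `same_sq_order` and the `.hdr` file names are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the reference names (SN: field) of the @SQ header lines, in file order.
# Input: SAM header text on stdin, e.g. from `samtools view -H file.bam`.
sq_names() {
  awk -F'\t' '$1 == "@SQ" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) print substr($i, 4)
  }'
}

# Exit status 0 only if both saved headers list the same contigs in the same order.
# $1 and $2 are hypothetical paths to files holding header text.
same_sq_order() {
  [ "$(sq_names < "$1")" = "$(sq_names < "$2")" ]
}
```

Possible usage: `samtools view -H input.bam > input.hdr; samtools view -H ADAMcat.bam > output.hdr; same_sq_order input.hdr output.hdr || echo "contig order differs"`.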

@ryan-williams
Member

Sounds like -sort_reads sorts the reads but @beaunorgeot is commenting on the header lines not being sorted?

@fnothaft
Member

It looks like, according to the SAM spec, the order of the @SQ lines defines the sort order of the file. @beaunorgeot would it be correct to say that in a BAM file output from ADAM, the reads are sorted, but the read sort order disagrees with the @SQ line sort order?
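
To test that question directly, one could rank each contig by its position among the @SQ lines and check that the reads never go backwards in (rank, position) order. This is a minimal sketch, assuming tab-delimited SAM text (header plus alignments) on stdin; `sam_is_coordinate_sorted` is a hypothetical helper, not part of ADAM or samtools.

```shell
#!/usr/bin/env bash

# Exit status 0 if the alignments are in coordinate order as defined by the
# order of the @SQ header lines, 1 otherwise. Reads SAM text on stdin.
sam_is_coordinate_sorted() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n   # contig rank = header position
      next
    }
    /^@/ { next }                       # skip the remaining header lines
    $3 == "*" { unplaced = 1; next }    # reads without a contig belong at the end
    {
      # A placed read after an unplaced one, or on an undeclared contig, is out of order.
      if (unplaced || !($3 in rank)) { bad = 1; exit }
      r = rank[$3]; p = $4 + 0
      if (r < prev_r || (r == prev_r && p < prev_p)) { bad = 1; exit }
      prev_r = r; prev_p = p
    }
    END { exit bad + 0 }
  '
}
```

Possible usage: `samtools view -h ADAMcat.bam | sam_is_coordinate_sorted && echo "sorted per @SQ order"`. Reads that are sorted by position within each contig still fail this check when the contigs appear in a different order than the header declares.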

@beaunorgeot
Contributor Author

As we discussed in the group call today, the reads presumably are sorted. However, when the BAMs output by ADAM are re-concatenated into a single file, the resulting file has an apparently random chromosome order and (probably for the same reason) the reads overall are unsorted. This outcome seems consistent with partition numbers not correlating directly to genomic coordinate ordering.

@fnothaft fnothaft self-assigned this Aug 19, 2015
@fnothaft
Member

Working on this.

@fnothaft
Member

I have pushed a partial fix as #784 that fixes the header issues and works correctly for a single partition. I will be looking into this more tomorrow to check into the partition sort order for large files.

@fnothaft
Member

OK, so now that #784 is prepped, I believe that ADAM should be good-to-go for writing confirmed sorted BAM/SAM, minus one detail in the BAM/SAM header (I will open a PR for this shortly). I confirmed this with the following bash "one liner":

rm -rf mouse_chrM.* pos sPos; wget http://www.eecs.berkeley.edu/~massie/bams/mouse_chrM.bam; ./bin/adam-submit transform mouse_chrM.bam mouse_chrM.adam -repartition 10; ./bin/adam-submit transform mouse_chrM.adam mouse_chrM.sam -sort_reads; for file in mouse_chrM.sam/*part*; do grep -v '^@' "$file" | awk '{print $4}' | tee -a pos; done; sort -n pos > sPos; diff --brief pos sPos; echo $?

This downloads a small BAM file (which is a single chromosome), converts it into a 10 partition ADAM file, sorts it and converts it to SAM, then writes the position of each read, in output order, to a file, re-sorts that file, and then checks whether the two files differ. If it prints 0 at the end, then all is well.

@fnothaft
Member

fnothaft commented Oct 2, 2015

Closed by #784 and improved by 283ea9d.

@fnothaft fnothaft closed this as completed Oct 2, 2015