-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SAM/BAM sorting should follow @SQ order #823
Comments
tl;dr: possible way to actually fix this issue: add an optional Many comments flying around various PRs and things atm but this seems like the best place to centralize discussion. Here are three salient points that I see in this discussion:
However, I think there's a way we can actually fix this: add an (optional) Then, whenever we build a Curious to hear thoughts, lmk if this isn't clear. |
In full agreement with you @ryan-williams. Amusingly enough, back in the old days we did have
|
I wouldn't say "break" so much as "revert".
IMO that sounds fine. I don't think we should be shackled to SAM/BAM. In my mind, supporting export of BAM is merely good manners. I would be fine with letting go of a perfect BAM -> ADAM -> BAM roundtrip, as that's not what ADAM is really designed for. @ryan-williams If we want to ensure that |
Yeah, I'm was thinking about this more, and I'm a strong -1 on a revert of #784 as I am depending on that for a pipeline I am running with the UCSC folks.
I think that it makes sense to be able to define an expected sequence index order; I'm not against that. My problem with this PR is that reverting #784 leaves you worse off RE: exporting ADAM to SAM/BAM. It only makes the BAM -> ADAM -> BAM single shot case better.
IMO, this is a significantly worse solution than #784. |
I think we're talking past each other a little, but I'm pretty sure we can have the best of all worlds here: So you want to write a BAM from an
|
+1 @ryan-williams. Let people impose the order, and if not, make it deterministic via sort. It would be great to avoid the aggregation required to build the header if you know your data has a well-defined @fnothaft "break" and "revert" are both too strong :) We can definitely support both. |
Supporting both is my preference. |
I feel like we're all agreed then, modulo the question of which to default to in the presence of an existing non-lex sort order; as I said in my last comment, I vote that we use the existing sort-order when it is present and complete, and support a "sort" flag to proactively lex-sort in this case (should be easy given the existing Here are the things that need to happen, then:
These should all be pretty straightforward steps, and I'm up for getting started on them if this plan sounds good to people. Please holler if you'd like to do work on it instead / in parallel. One aside is that I don't foresee us wanting the header-injection approach from #824 in this world; lmk if you still feel like it's useful in a world where we embark on the above plan @laserson. |
@ryan-williams that SGTM |
I've just sent fnothaft#7 based on #815 which fixes this and #822 (I think; haven't specifically tested the latter). |
Thanks @ryan-williams ! I just merged fnothaft#7 |
Following up on discussion in #760 here:
#760 (comment)
Not clear that we cannot correctly implement the spec.
The text was updated successfully, but these errors were encountered: