-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add -merge command to Transform #734
Conversation
rebase fork
rebase adam master into adam2bam branch
change -as_regular_file to -single, add .coalesce(1)
Conflicts: adam-cli/src/main/scala/org/bdgenomics/adam/cli/Transform.scala adam-core/src/main/scala/org/bdgenomics/adam/rdd/read/AlignmentRecordRDDFunctions.scala
I'm sorry for old commits that were manually merged by @fnothaft |
Can one of the admins verify this patch? |
So, the impetus for this is your work on #689 and #733, where you have BAM/SAM file pieces laying around without headers, right? Is there any motivation to be able to do this with ADAM files as well? Do the reads in the merged files retain any provenance info about which file they came from? Do we care? I guess my concern is that this makes it seem like we support merging BAM files that are logically actually multiple files, not just partitions of one; given that this is only about the latter, it feels like we're shipping some implementation details if we don't have a story for how this works in the general case. At the very least maybe we should document the context that this is meant to be used in, and the contexts where it wouldn't make sense or would be lossy? |
@ryan-williams no no no, this is just a small tweak to load splitted bam/sam. In my case it were output files from http://ngsutils.org/modules/bamutils/split/ |
ah ok, cool. any reason to not allow it to work for e.g. ADAM files as well? |
in any case, this lgtm but i'll let @fnothaft or someone else have a look before merging |
Add to whitelist |
Jenkins, test this please. |
1 similar comment
Jenkins, test this please. |
Seems like |
@antonstamov Can you run the Thanks. |
@antonstamov I'd like to merge this PR as soon as possible. Can you run the following?
This will update this PR and fix the syntax errors. Tests are failing because of the format of the source. |
@ryan-williams, @massie fixed, sorry for long response |
@ryan-williams I think this is what you were looking for on #831, no? |
I believe this has been superseded by #831. We can reopen this later if needed. |
Add ability to use as INPUT multiple splitted BAMs or SAMs separated by the comma and merge them to one RDD.
Example of INPUT:
/path/to/1.bam,/path/to/2.bam,/path/to/3.bam,/path/to/4.bam