Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add -merge command to Transform #734

Closed
wants to merge 11 commits into from

Conversation

antonstamov
Copy link
Contributor

Add ability to use as INPUT multiple splitted BAMs or SAMs separated by the comma and merge them to one RDD.
Example of INPUT: /path/to/1.bam,/path/to/2.bam,/path/to/3.bam,/path/to/4.bam

@antonstamov
Copy link
Contributor Author

I'm sorry for old commits that were manually merged by @fnothaft

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@antonstamov antonstamov changed the title add -merge command to Transofm add -merge command to Transform Jul 21, 2015
@ryan-williams
Copy link
Member

So, the impetus for this is your work on #689 and #733, where you have BAM/SAM file pieces laying around without headers, right?

Is there any motivation to be able to do this with ADAM files as well?

Do the reads in the merged files retain any provenance info about which file they came from? Do we care?

I guess my concern is that this makes it seem like we support merging BAM files that are logically actually multiple files, not just partitions of one; given that this is only about the latter, it feels like we're shipping some implementation details if we don't have a story for how this works in the general case.

At the very least maybe we should document the context that this is meant to be used in, and the contexts where it wouldn't make sense or would be lossy?

@antonstamov
Copy link
Contributor Author

@ryan-williams no no no, this is just a small tweak to load splitted bam/sam. In my case it were output files from http://ngsutils.org/modules/bamutils/split/

@ryan-williams
Copy link
Member

ah ok, cool. any reason to not allow it to work for e.g. ADAM files as well?

@ryan-williams
Copy link
Member

in any case, this lgtm but i'll let @fnothaft or someone else have a look before merging

@massie
Copy link
Member

massie commented Aug 4, 2015

Add to whitelist

@massie
Copy link
Member

massie commented Aug 4, 2015

Jenkins, test this please.

1 similar comment
@massie
Copy link
Member

massie commented Aug 4, 2015

Jenkins, test this please.

@ryan-williams
Copy link
Member

Seems like ./scripts/format-source is failing?

@massie
Copy link
Member

massie commented Sep 9, 2015

@antonstamov Can you run the ./scripts/format-source script in the ADAM repo and then push any changes in the source? That's the cause of the test failure -- it's failing on the format of your source.

Thanks.

@massie
Copy link
Member

massie commented Sep 17, 2015

@antonstamov I'd like to merge this PR as soon as possible. Can you run the following?

$ cd <adam repo>
$ ./scripts/format-source
$ git add .
$ git commit -m "Formatting source"
$ git push --force origin adam-merge-tool:adam-merge-tool

This will update this PR and fix the syntax errors. Tests are failing because of the format of the source.

@antonstamov
Copy link
Contributor Author

@ryan-williams, @massie fixed, sorry for long response

@fnothaft
Copy link
Member

fnothaft commented Oct 2, 2015

@ryan-williams I think this is what you were looking for on #831, no?

@fnothaft fnothaft modified the milestone: 0.18.0 Oct 2, 2015
@heuermh
Copy link
Member

heuermh commented Nov 2, 2015

@fnothaft now that #831 has been merged, do you know what the status of this PR should be?

@fnothaft
Copy link
Member

fnothaft commented Feb 3, 2016

I believe this has been superseded by #831. We can reopen this later if needed.

@fnothaft fnothaft closed this Feb 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants