Refactor saving with -single flag #733

antonstamov
Contributor

Inspired by @ryan-williams's comment:
Instead of calling .coalesce(1) and renaming part-r-00000 to the directory name, this saves the head partition with the header and the tail partitions without headers (using PartitionPruningRDD) as two separate new-API Hadoop files, then merges them into filePath with FileUtil.copyMerge.
Tested on an ADAM file with 9050501 records. The saved SAM and BAM files have the same number of records.
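
For readers unfamiliar with the approach, here is a minimal sketch of the head/tail idea in plain Python. It is not the PR's Spark code: local directories stand in for the two Hadoop outputs, and `write_partitions`, `copy_merge`, and `save_as_single_file` are hypothetical names chosen for illustration. It only shows why writing the header once, in the head partition, yields a valid single file after concatenation.

```python
import os
import shutil
import tempfile


def write_partitions(out_dir, partitions, header=None):
    """Write each partition to its own part file; only the first gets the header."""
    os.makedirs(out_dir)
    for i, records in enumerate(partitions):
        with open(os.path.join(out_dir, f"part-r-{i:05d}"), "w") as f:
            if header is not None and i == 0:
                f.write(header)
            f.writelines(records)


def copy_merge(src_dirs, dst_path):
    """Concatenate every part file from src_dirs, in sorted order, into dst_path."""
    with open(dst_path, "w") as dst:
        for d in src_dirs:
            for name in sorted(os.listdir(d)):
                with open(os.path.join(d, name)) as src:
                    shutil.copyfileobj(src, dst)


def save_as_single_file(partitions, header, file_path):
    """Save the head partition with the header, the tail without, then merge."""
    tmp = tempfile.mkdtemp()
    try:
        head_dir = os.path.join(tmp, "head")
        tail_dir = os.path.join(tmp, "tail")
        # An empty input still produces a header-only file.
        write_partitions(head_dir, partitions[:1] or [[]], header=header)
        write_partitions(tail_dir, partitions[1:])
        copy_merge([head_dir, tail_dir], file_path)
    finally:
        shutil.rmtree(tmp)
```

Unlike coalescing to one partition, each partition is still written independently here; only the final concatenation is sequential.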

@AmplabJenkins

Can one of the admins verify this patch?

@fnothaft
Member

Jenkins, add to whitelist.

@massie
Member

massie commented Sep 9, 2015

Jenkins, test this please.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/907/

Build result: FAILURE

GitHub pull request #733 of commit 63372b1 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/733/merge^{commit} # timeout=10
 > git branch -a --contains 527b0e8 # timeout=10
 > git rev-parse remotes/origin/pr/733/merge^{commit} # timeout=10
Checking out Revision 527b0e8 (origin/pr/733/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 527b0e83401ecf81cbf9bf418a257bbdb9282d93
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@massie
Member

massie commented Sep 9, 2015

Please run ./scripts/format-source on this PR and push the changes.

@massie
Member

massie commented Sep 17, 2015

@antonstamov I'd like to get this merged as soon as possible. Can you please do the following?

$ cd <adam repo>
$ ./scripts/format-source
$ git add .
$ git commit -m "Formatting source"
$ git push --force origin save-as-single-file-refactor:save-as-single-file-refactor

This will reformat the source and update this PR. Thanks!

@heuermh
Member

heuermh commented Sep 17, 2015

FYI I am also interested in seeing this merged, as I would like to add the same feature to #816

@antonstamov
Contributor Author

@massie fixed

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/916/

@massie
Member

massie commented Sep 18, 2015

This looks good. Can you think of a simple way to add a test? We have a number of small SAM files checked in (see the src/resources/test directories) that you can use as input. For example, create a temp directory, transform a SAM using the -single flag and confirm the data is the same.
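
The comparison step of such a test could look roughly like the sketch below. It is written in plain Python rather than as an ADAM unit test, it assumes the `-single` output has already been written to a path by the transform step (which is not shown), and `sam_records` and `assert_same_records` are hypothetical helper names. Header lines are skipped because SAM header lines start with `@`, while alignment lines cannot.

```python
import os
import tempfile


def sam_records(path):
    """Return the alignment lines of a SAM file, skipping '@' header lines."""
    with open(path) as f:
        return [
            line.rstrip("\n")
            for line in f
            if line.strip() and not line.startswith("@")
        ]


def assert_same_records(input_sam, output_sam):
    """Fail if the two SAM files differ in record count, content, or order."""
    expected = sam_records(input_sam)
    actual = sam_records(output_sam)
    assert len(actual) == len(expected), "record count differs"
    assert actual == expected, "record content or order differs"


if __name__ == "__main__":
    # Stand-in files; a real test would use a checked-in SAM and the saved output.
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "in.sam")
    dst = os.path.join(tmp, "out.sam")
    with open(src, "w") as f:
        f.write("@HD\tVN:1.4\nread1\t0\tchr1\nread2\t16\tchr1\n")
    with open(dst, "w") as f:
        f.write("@HD\tVN:1.4\n@PG\tID:hypothetical\nread1\t0\tchr1\nread2\t16\tchr1\n")
    assert_same_records(src, dst)
```

This compares records in order. If the save path does not guarantee ordering, sorting both lists before comparing would be the looser check.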

@heuermh
Member

heuermh commented Dec 29, 2015

Thank you @antonstamov! This has been superseded by #901.

@heuermh heuermh closed this Dec 29, 2015
heuermh added a commit that referenced this pull request Dec 29, 2015
Single file save from #733, rebased
5 participants