
Add CRAM reference argument to CLI for loading alignments and fragments #2293

Closed

heuermh opened this issue Jan 21, 2021 · 4 comments · Fixed by #2294

@heuermh
Member

heuermh commented Jan 21, 2021

No description provided.

@heuermh heuermh added this to the 0.34.0 milestone Jan 21, 2021
@heuermh heuermh changed the title Add CRAM reference argument to transformAlignments Add CRAM reference argument to CLI for loading alignments and fragments Jan 21, 2021
@FriederikeHanssen

@heuermh Thank you for this fix. I am wondering how to properly set this. So far I have:

       --master local[*] \
       --conf spark.local.dir=. \
       --driver-memory ${task.memory.toGiga()}g \
       -- \
       transformAlignments \
       -mark_duplicate_reads \
       -single \
       -stringency LENIENT \
       -reference ${reference} \
       -sort_by_reference_position \
       ${input_cram} \
       $output.adam.md.cram

I keep getting this error:

  21/05/29 12:11:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on cfc010:32929 (size: 27.4 KiB, free: 4.6 GiB)
  21/05/29 12:11:47 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at ADAMContext.scala:2054
  21/05/29 12:11:48 INFO TransformAlignments: Marking duplicates
  21/05/29 12:11:48 INFO FileInputFormat: Total input files to process : 1
  21/05/29 12:11:48 INFO TransformAlignments: Sorting alignments by reference position, with references ordered by name
  21/05/29 12:11:48 INFO RDDBoundAlignmentDataset: Sorting alignments by reference position
  21/05/29 12:11:48 INFO RDDBoundAlignmentDataset: Saving data in SAM/BAM/CRAM format
  21/05/29 12:11:48 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1912.0 B, free 4.6 GiB)
  21/05/29 12:11:48 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 355.0 B, free 4.6 GiB)
  21/05/29 12:11:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on cfc010:32929 (size: 355.0 B, free: 4.6 GiB)
  21/05/29 12:11:48 INFO SparkContext: Created broadcast 1 from broadcast at AlignmentDataset.scala:735
  Command body threw exception:
  java.lang.IllegalArgumentException: requirement failed: To save as CRAM, the reference source must be set in your config as hadoopbam.cram.reference-source-path.
  Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: To save as CRAM, the reference source must be set in your config as hadoopbam.cram.reference-source-path.

I am using ADAM 0.35 with Singularity. Apologies if this is stated somewhere in the docs and I overlooked it. Thank you very much for your help :)

@heuermh
Member Author

heuermh commented May 29, 2021

Sorry, transformAlignments has a lot of command line arguments.

-reference is used to specify a genomic reference in FASTA format (plus indexes) for indel realignment.

-cram_reference is used to specify the CRAM-format reference, if either the input or the output is in CRAM format.

$ adam-submit --help
...
 -cram_reference VAL : CRAM format reference, if necessary
 -reference VAL : Path to a reference file to use for indel realignment.
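
So in your case you can drop -reference and pass -cram_reference instead. As a rough sketch (the file names below are placeholders, not taken from your setup), an invocation that reads and writes CRAM might look like:

    # sketch only; substitute your own paths
    $ adam-submit \
        --master local[*] \
        -- \
        transformAlignments \
        -mark_duplicate_reads \
        -single \
        -cram_reference reference.fasta \
        input.cram \
        output.md.cram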

Hope this helps!

@FriederikeHanssen

Hi @heuermh,

Yes, this helps a little. The reference error is gone, but now I ran into this:

  21/05/31 08:02:24 INFO RDDBoundAlignmentDataset: Sorting alignments by reference index, using SequenceDictionary{1->200000, 0
  2->200000, 1
  3->200000, 2
  8->1282, 3
  11->3696, 4
  X->200000, 5}.
  21/05/31 08:02:24 INFO RDDBoundAlignmentDataset: Saving data in SAM/BAM/CRAM format
  21/05/31 08:02:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.1 KiB, free 4.6 GiB)
  21/05/31 08:02:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1012.0 B, free 4.6 GiB)
  21/05/31 08:02:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on cfc010:32920 (size: 1012.0 B, free: 4.6 GiB)
  21/05/31 08:02:24 INFO SparkContext: Created broadcast 1 from broadcast at AlignmentDataset.scala:735
  Command body threw exception:
  java.lang.IllegalArgumentException: Missing scheme
        at java.base/java.nio.file.Path.of(Path.java:199)
        at java.base/java.nio.file.Paths.get(Paths.java:97)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:890)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:802)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.maybeSaveBam(AlignmentDataset.scala:594)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.save(AlignmentDataset.scala:637)
        at org.bdgenomics.adam.cli.TransformAlignments.run(TransformAlignments.scala:646)
        at org.bdgenomics.utils.cli.BDGSparkCommand.run(BDGCommand.scala:52)
        at org.bdgenomics.utils.cli.BDGSparkCommand.run$(BDGCommand.scala:45)
        at org.bdgenomics.adam.cli.TransformAlignments.run(TransformAlignments.scala:203)
        at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:127)
        at org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:65)
        at org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  21/05/31 08:02:24 INFO SparkContext: Invoking stop() from shutdown hook
  21/05/31 08:02:24 INFO SparkUI: Stopped Spark web UI at http://cfc010:4040
  21/05/31 08:02:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
  21/05/31 08:02:24 INFO MemoryStore: MemoryStore cleared
  21/05/31 08:02:24 INFO BlockManager: BlockManager stopped
  21/05/31 08:02:24 INFO BlockManagerMaster: BlockManagerMaster stopped
  21/05/31 08:02:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
  21/05/31 08:02:24 INFO SparkContext: Successfully stopped SparkContext
  21/05/31 08:02:24 INFO ShutdownHookManager: Shutdown hook called

For context, I am running this with Nextflow and the Singularity container. Using the same setup with BAM files works, though.
The full command I am running now:

 """
    adam-submit \
       --master local[${task.cpus}] \
       --driver-memory ${task.memory.toGiga()}g \
       -- \
       transformAlignments \
       -mark_duplicate_reads \
       -single \
       -stringency LENIENT \
       -cram_reference ${reference} \
       -sort_by_reference_position_and_index \
       ${cram} \
       ${cram.simpleName}.adam.md.cram
    """

I added -sort_by_reference_position_and_index because without it I got java.lang.IllegalArgumentException: requirement failed: To save as CRAM, input must be sorted. I did sort the reads after mapping with samtools sort, though. I tried googling the error, but so far got nowhere. Do you have any hints as to what could cause it? Thank you so much!

@heuermh
Member Author

heuermh commented Jun 2, 2021

Sorry this is turning out to be so much trouble!

This is a file system issue:

  java.lang.IllegalArgumentException: Missing scheme
        at java.base/java.nio.file.Path.of(Path.java:199)
        at java.base/java.nio.file.Paths.get(Paths.java:97)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:890)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:802)

Where on the file system is the CRAM reference stored? What does the -cram_reference ${reference} argument look like?
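
One guess (an assumption on my part, not something the stack trace confirms): the reference path may need an explicit URI scheme by the time it reaches the CRAM writer. With a placeholder path, the relevant line of your command would then read:

       # placeholder path; the file:// scheme is the part being illustrated
       -cram_reference file:///absolute/path/to/reference.fasta \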
