
Add CRAM reference argument to CLI for loading alignments and fragments #2293

Closed

heuermh opened this issue Jan 21, 2021 · 4 comments · Fixed by #2294

@heuermh
Member

heuermh commented Jan 21, 2021

No description provided.

@heuermh heuermh added this to the 0.34.0 milestone Jan 21, 2021
@heuermh heuermh changed the title Add CRAM reference argument to transformAlignments Add CRAM reference argument to CLI for loading alignments and fragments Jan 21, 2021
@FriederikeHanssen

@heuermh Thank you for this fix. I am wondering how to properly set this. So far I have:

       --master local[*] \
       --conf spark.local.dir=. \
       --driver-memory ${task.memory.toGiga()}g \
       -- \
       transformAlignments \
       -mark_duplicate_reads \
       -single \
       -stringency LENIENT \
       -reference ${reference} \
       -sort_by_reference_position \
       ${input_cram} \
       $output.adam.md.cram

I keep getting this error:

  21/05/29 12:11:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on cfc010:32929 (size: 27.4 KiB, free: 4.6 GiB)
  21/05/29 12:11:47 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at ADAMContext.scala:2054
  21/05/29 12:11:48 INFO TransformAlignments: Marking duplicates
  21/05/29 12:11:48 INFO FileInputFormat: Total input files to process : 1
  21/05/29 12:11:48 INFO TransformAlignments: Sorting alignments by reference position, with references ordered by name
  21/05/29 12:11:48 INFO RDDBoundAlignmentDataset: Sorting alignments by reference position
  21/05/29 12:11:48 INFO RDDBoundAlignmentDataset: Saving data in SAM/BAM/CRAM format
  21/05/29 12:11:48 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 1912.0 B, free 4.6 GiB)
  21/05/29 12:11:48 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 355.0 B, free 4.6 GiB)
  21/05/29 12:11:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on cfc010:32929 (size: 355.0 B, free: 4.6 GiB)
  21/05/29 12:11:48 INFO SparkContext: Created broadcast 1 from broadcast at AlignmentDataset.scala:735
  Command body threw exception:
  java.lang.IllegalArgumentException: requirement failed: To save as CRAM, the reference source must be set in your config as hadoopbam.cram.reference-source-path.
  Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: To save as CRAM, the reference source must be set in your config as hadoopbam.cram.reference-source-path.

I am using ADAM 0.35 with Singularity. Apologies if this is stated somewhere in the docs and I overlooked it. Thank you very much for your help :)

@heuermh
Member Author

heuermh commented May 29, 2021

Sorry, transformAlignments has a lot of command line arguments.

-reference is used to specify a genomic reference in FASTA format (plus indexes) for indel realignment.

-cram_reference is used to specify the CRAM-format reference, if either the input or the output is in CRAM format.

$ adam-submit --help
...
 -cram_reference VAL : CRAM format reference, if necessary
 -reference VAL : Path to a reference file to use for indel realignment.
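
So in your case you can drop -reference and pass -cram_reference instead. As a rough sketch (the file names below are placeholders, not taken from your setup), an invocation that reads and writes CRAM might look like:

    # sketch only; substitute your own paths
    $ adam-submit \
        --master local[*] \
        -- \
        transformAlignments \
        -mark_duplicate_reads \
        -single \
        -cram_reference reference.fasta \
        input.cram \
        output.md.cram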

Hope this helps!

@FriederikeHanssen

Hi @heuermh,

Yes, this helps a little. The reference error is gone, but now I ran into this:

  21/05/31 08:02:24 INFO RDDBoundAlignmentDataset: Sorting alignments by reference index, using SequenceDictionary{1->200000, 0
  2->200000, 1
  3->200000, 2
  8->1282, 3
  11->3696, 4
  X->200000, 5}.
  21/05/31 08:02:24 INFO RDDBoundAlignmentDataset: Saving data in SAM/BAM/CRAM format
  21/05/31 08:02:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.1 KiB, free 4.6 GiB)
  21/05/31 08:02:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1012.0 B, free 4.6 GiB)
  21/05/31 08:02:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on cfc010:32920 (size: 1012.0 B, free: 4.6 GiB)
  21/05/31 08:02:24 INFO SparkContext: Created broadcast 1 from broadcast at AlignmentDataset.scala:735
  Command body threw exception:
  java.lang.IllegalArgumentException: Missing scheme
        at java.base/java.nio.file.Path.of(Path.java:199)
        at java.base/java.nio.file.Paths.get(Paths.java:97)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:890)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:802)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.maybeSaveBam(AlignmentDataset.scala:594)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.save(AlignmentDataset.scala:637)
        at org.bdgenomics.adam.cli.TransformAlignments.run(TransformAlignments.scala:646)
        at org.bdgenomics.utils.cli.BDGSparkCommand.run(BDGCommand.scala:52)
        at org.bdgenomics.utils.cli.BDGSparkCommand.run$(BDGCommand.scala:45)
        at org.bdgenomics.adam.cli.TransformAlignments.run(TransformAlignments.scala:203)
        at org.bdgenomics.adam.cli.ADAMMain.apply(ADAMMain.scala:127)
        at org.bdgenomics.adam.cli.ADAMMain$.main(ADAMMain.scala:65)
        at org.bdgenomics.adam.cli.ADAMMain.main(ADAMMain.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  21/05/31 08:02:24 INFO SparkContext: Invoking stop() from shutdown hook
  21/05/31 08:02:24 INFO SparkUI: Stopped Spark web UI at http://cfc010:4040
  21/05/31 08:02:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
  21/05/31 08:02:24 INFO MemoryStore: MemoryStore cleared
  21/05/31 08:02:24 INFO BlockManager: BlockManager stopped
  21/05/31 08:02:24 INFO BlockManagerMaster: BlockManagerMaster stopped
  21/05/31 08:02:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
  21/05/31 08:02:24 INFO SparkContext: Successfully stopped SparkContext
  21/05/31 08:02:24 INFO ShutdownHookManager: Shutdown hook called

For context, I am running this with Nextflow and the Singularity container. Using the same setup with BAM files works, though.
The full command I am running now:

 """
    adam-submit \
       --master local[${task.cpus}] \
       --driver-memory ${task.memory.toGiga()}g \
       -- \
       transformAlignments \
       -mark_duplicate_reads \
       -single \
       -stringency LENIENT \
       -cram_reference ${reference} \
       -sort_by_reference_position_and_index \
       ${cram} \
       ${cram.simpleName}.adam.md.cram
    """

I added -sort_by_reference_position_and_index because without it I got java.lang.IllegalArgumentException: requirement failed: To save as CRAM, input must be sorted. I did sort the reads after mapping with samtools sort, though. I tried googling the error, but so far got nowhere. Do you have any hints as to what could cause it? Thank you so much!

@heuermh
Member Author

heuermh commented Jun 2, 2021

Sorry this is turning out to be so much trouble!

This is a file system issue:

  java.lang.IllegalArgumentException: Missing scheme
        at java.base/java.nio.file.Path.of(Path.java:199)
        at java.base/java.nio.file.Paths.get(Paths.java:97)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:890)
        at org.bdgenomics.adam.ds.read.AlignmentDataset.saveAsSam(AlignmentDataset.scala:802)

Where on the file system is the CRAM reference stored? What does the -cram_reference ${reference} argument look like?
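
One guess (an assumption on my part, not something the stack trace confirms): the reference path may need an explicit URI scheme by the time it reaches the CRAM writer. With a placeholder path, the relevant line of your command would then read:

       # placeholder path; the file:// scheme is the part being illustrated
       -cram_reference file:///absolute/path/to/reference.fasta \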
