
EXITING because of fatal ERROR: not enough memory for BAM sorting #289

Closed
serge2016 opened this issue Jul 7, 2017 · 7 comments

@serge2016

serge2016 commented Jul 7, 2017

Hello!
I am trying to reproduce the TCGA pipeline for users without access to the ICGC pipeline, on a server with 8 cores, 60 GB of RAM, and 8 GB of swap. But I am getting an error:

EXITING because of fatal ERROR: not enough memory for BAM sorting: 
SOLUTION: re-run STAR with at least --limitBAMsortRAM 78628592870
Jul 05 16:43:52 ...... FATAL ERROR, exiting 

immediately after STAR starts sorting the BAM in the 2nd-pass alignment (pipeline step 4).

I am using STAR 2.4.2a (as in TCGA).
Only VanAllen2015_pat03-tumor-rna (SRR2689711) and VanAllen2015_pat123-re-tumor-rna (SRR2779596) are affected; the other samples did not produce such errors.
I've tried both values of the --sjdbOverhang option: 75 (maxReadLength - 1) and 100 (as in TCGA).
Basically I started with:
--runThreadN 8 --genomeLoad NoSharedMemory
--runThreadN 8 --genomeLoad NoSharedMemory --limitBAMsortRAM 56998928790
and I'm getting this:

  Command: '/soft/STAR-STAR_2.4.2a/bin/Linux_x86_64/STAR --runThreadN 8 --genomeLoad NoSharedMemory --limitBAMsortRAM 56998928790 --genomeDir /outputs/output/VanAllen2015_pat03-tumor-rna-SRR2689711_STAR075_step3_newIndexDir --readFilesIn /inputs/fastq1/pat03-tumor-rna-SRR2689711_1.fastq.gz /inputs/fastq2/pat03-tumor-rna-SRR2689711_2.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /outputs/output/VanAllen2015_pat03-tumor-rna-SRR2689711_STAR075_step4/VanAllen2015_pat03-tumor-rna-SRR2689711. --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 20 --outFilterMismatchNmax 10 --alignIntronMax 500000 --alignMatesGapMax 1000000 --sjdbScore 2 --alignSJDBoverhangMin 1 --outFilterMatchNminOverLread 0.33 --outFilterScoreMinOverLread 0.33 --sjdbOverhang 75 --outSAMstrandField intronMotif --outSAMattributes NH HI NM MD AS XS --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --outSAMheaderHD @HD VN:1.4 --outSAMattrRGline ID:VanAllen2015_pat03-tumor-rna-SRR2689711 PU:m LB:lib SM:VanAllen2015_pat03-tumor-rna-SRR2689711 PL:ILLUMINA'.
  PID=9782 (last job)
Jul 05 14:55:24 ..... Started STAR run
Jul 05 14:55:24 ..... Loading genome
Jul 05 14:58:29 ..... Started mapping
Jul 05 16:43:52 ..... Started sorting BAM

EXITING because of fatal ERROR: not enough memory for BAM sorting: 
SOLUTION: re-run STAR with at least --limitBAMsortRAM 78628592870
Jul 05 16:43:52 ...... FATAL ERROR, exiting 

After that I tested version 2.5.3a and got the same error.
Identical runs always suggest the same amount of RAM; a different patient, --sjdbOverhang value, or STAR version yields a different suggestion.

I also tried the following, although it seems conceptually incorrect:
--runThreadN 8 --genomeLoad LoadAndKeep --limitBAMsortRAM 56998928790

  Command: '/soft/STAR-2.5.3a/bin/Linux_x86_64/STAR --runThreadN 8 --genomeLoad LoadAndKeep --limitBAMsortRAM 56998928790 --genomeDir /outputs/output/VanAllen2015_pat123_re-tumor-rna-SRR2779596_STAR100_step3_newIndexDir --readFilesIn /inputs/fastq1/pat123_re-tumor-rna-SRR2779596_1.fastq.gz /inputs/fastq2/pat123_re-tumor-rna-SRR2779596_2.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /outputs/output/VanAllen2015_pat123_re-tumor-rna-SRR2779596_STAR100_step4/VanAllen2015_pat123_re-tumor-rna-SRR2779596. --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 20 --outFilterMismatchNmax 10 --alignIntronMax 500000 --alignMatesGapMax 1000000 --sjdbScore 2 --alignSJDBoverhangMin 1 --outFilterMatchNminOverLread 0.33 --outFilterScoreMinOverLread 0.33 --sjdbOverhang 100 --outSAMstrandField intronMotif --outSAMattributes NH HI NM MD AS XS --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --outSAMheaderHD @HD VN:1.4 --outSAMattrRGline ID:VanAllen2015_pat123_re-tumor-rna-SRR2779596 PU:m LB:lib SM:VanAllen2015_pat123_re-tumor-rna-SRR2779596 PL:ILLUMINA'.
  PID=9783 (last job)
Jul 06 21:29:00 ..... started STAR run
Jul 06 21:29:00 ..... loading genome
Jul 06 21:32:19 ..... started mapping
Jul 06 22:10:29 ..... started sorting BAM
terminate called recursively
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/usr/local/bin/common_funcs.sh: line 25:  9787 Aborted                 (core dumped) /soft/STAR-2.5.3a/bin/Linux_x86_64/STAR --runThreadN 8 --genomeLoad LoadAndKeep --limitBAMsortRAM 56998928790 --genomeDir /outputs/output/VanAllen2015_pat123_re-tumor-rna-SRR2779596_STAR100_step3_newIndexDir --readFilesIn /inputs/fastq1/pat123_re-tumor-rna-SRR2779596_1.fastq.gz /inputs/fastq2/pat123_re-tumor-rna-SRR2779596_2.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /outputs/output/VanAllen2015_pat123_re-tumor-rna-SRR2779596_STAR100_step4/VanAllen2015_pat123_re-tumor-rna-SRR2779596. --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 20 --outFilterMismatchNmax 10 --alignIntronMax 500000 --alignMatesGapMax 1000000 --sjdbScore 2 --alignSJDBoverhangMin 1 --outFilterMatchNminOverLread 0.33 --outFilterScoreMinOverLread 0.33 --sjdbOverhang 100 --outSAMstrandField intronMotif --outSAMattributes NH HI NM MD AS XS --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --outSAMheaderHD @HD VN:1.4 --outSAMattrRGline ID:VanAllen2015_pat123_re-tumor-rna-SRR2779596 PU:m LB:lib SM:VanAllen2015_pat123_re-tumor-rna-SRR2779596 PL:ILLUMINA 

Here are my logs; the error is in the step4_* files:

step2_pat123_re-tumor-rna-SRR2779596.Log.final.out.txt
step2_pat123_re-tumor-rna-SRR2779596.Log.out.txt
step2_pat123_re-tumor-rna-SRR2779596.Log.progress.out.txt
step3_pat123_re-tumor-rna-SRR2779596.Log.out.txt
step4_pat123_re-tumor-rna-SRR2779596.Log.out.txt

@alexdobin
Owner

Hi Sergey,

my suspicion is that your FASTQ files were derived from coordinate-sorted BAM files.
This would explain why STAR needs so much RAM for sorting: it decides on the genomic bin size for sorting based on the first 100,000 reads.

If this is the case, I think you would have to sort with samtools. Another option is to shuffle the reads in the FASTQ (or in the BAM before conversion to FASTQ).
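
For example, a minimal sketch of shuffling paired-end gzipped FASTQ files (bash syntax; the file names here are placeholders, not the ones from your run):

  # group each 4-line FASTQ record into one line, pair the two mates side by side,
  # shuffle the pairs, then split the fields back into two FASTQ files
  paste <(zcat reads_1.fastq.gz | paste - - - -) <(zcat reads_2.fastq.gz | paste - - - -) \
    | shuf \
    | awk -F'\t' '{print $1"\n"$2"\n"$3"\n"$4 > "shuffled_1.fastq";
                   print $5"\n"$6"\n"$7"\n"$8 > "shuffled_2.fastq"}'

Mates stay paired because each shuffled line carries both reads of one fragment.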

Cheers
Alex

@serge2016
Author

@alexdobin, I think this should be mentioned in the manual, and in the description of the --limitBAMsortRAM option too!
How can I ask STAR (or any other tool) to quickly print the necessary amount of RAM? I mean not during the alignment, but with a completely separate command.
I want to check this before the alignment and run STAR with or without BAM sorting depending on the result.

The relevant check in the STAR source:

if (maxMem>P->limitBAMsortRAM) {

@alexdobin
Owner

Hi Sergey,

to estimate the amount of RAM needed for sorting, STAR would basically need to map all the reads.
However, the RAM overflow is predominantly caused by reads that are ordered by alignment coordinate, which happens when they were extracted from a sorted BAM file.

The way to check whether the reads are sorted is to map the first ~100,000 reads (--readMapNumber 100000) with default parameters and check which chromosomes are present in the SAM output, e.g.
$ grep -v "^@" Aligned.out.sam | cut -f3 | sort | uniq -c
If one chromosome is dominant, the reads are likely sorted and STAR sorting should not be used.
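
For example, a sketch of the full check (the genome directory and FASTQ names are placeholders; adjust to your run):

  # map only the first 100,000 reads with default parameters
  STAR --runThreadN 8 \
       --genomeDir /path/to/genomeDir \
       --readFilesIn reads_1.fastq.gz reads_2.fastq.gz \
       --readFilesCommand gunzip -c \
       --readMapNumber 100000 \
       --outFileNamePrefix sortcheck.
  # count alignments per chromosome; a single dominant chromosome suggests
  # the FASTQ came from a coordinate-sorted BAM
  grep -v "^@" sortcheck.Aligned.out.sam | cut -f3 | sort | uniq -c | sort -rn | head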

Cheers
Alex

@ayyasa

ayyasa commented Aug 3, 2018

Hi Alex, I have the same issue. If I want to increase the RAM limit, how can I do it? I read a few other forums but couldn't find an answer.

I have attached my Log.out here for your reference.
Log.out.txt

Thanks
Archana

@alexdobin
Owner

Hi Archana,

the problem in your run is not RAM-related; it's likely that your disk does not have enough space for the temporary files. Also, please remove the _STARtmp directory from the STAR run directory before re-running this job.
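
For example (the run directory path is a placeholder):

  df -h /path/to/run_dir               # check free space where STAR writes its temp files
  rm -rf /path/to/run_dir/_STARtmp     # remove the leftover temp directory before re-running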

Cheers
Alex

@oa5is

oa5is commented Apr 25, 2022

Hi Alex (@alexdobin)!
I'm new to data processing and I'm facing the same problem. Do I really need to re-run the whole 2nd-pass alignment from the beginning? The alignment took several days and got stuck at the final sorting step. If I re-run it, will it restart the entire alignment? Can I use samtools to sort the BAM files instead of re-running?

Thanks
Jing

@alexdobin
Owner

Hi Jing,

using samtools sort is a good solution, but you will need to re-run STAR without the BAM sorting option.
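
For example, a sketch (the thread count is a placeholder):

  # in the original command, replace
  #   --outSAMtype BAM SortedByCoordinate
  # with
  #   --outSAMtype BAM Unsorted
  # then sort the unsorted output (Aligned.out.bam) externally:
  samtools sort -@ 8 -o Aligned.sortedByCoord.out.bam Aligned.out.bam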
