STARsolo 2.7.3a and 2.7.4a segmentation fault (core dumped). #936

Closed
J-Burgess opened this issue Jun 9, 2020 · 5 comments
@J-Burgess

Hi Alex, I hope you are well!

I am building a Snakemake pipeline to benchmark scRNA cell type annotations, using STARsolo as the aligner for the 10x Chromium V3/V2 paired-end input FASTQ files. When aligning to the Hg19 human genome, everything worked perfectly! However, the paper I am trying to benchmark against used the Hg38 reference genome. With Hg38, one of the V3 datasets that previously completed successfully now fails with a segmentation fault (core dumped) at the "started mapping" stage.
Links to reference files:
genome: ftp://ftp.ensembl.org/pub/release-100/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
genes: ftp://ftp.ensembl.org/pub/release-100/gtf/homo_sapiens/Homo_sapiens.GRCh38.100.gtf.gz

Generating genome indices command:
"STAR --runMode genomeGenerate --runThreadN '{threads}' --sjdbGTFfile '{input.ref_gtf}' --genomeDir '{params.outdir}' "
"--genomeFastaFiles '{input.ref_genome}' --limitGenomeGenerateRAM 31000000000 --genomeSAsparseD 3 --genomeSAindexNbases 14 "
"--genomeChrBinNbits 18 --outFileNamePrefix '{params.outdir}_' --outTmpDir '{params.tmp_dir}'"
Running STARsolo command:
"STAR --genomeDir '{params.index_dir}' --sjdbGTFfile '{input.ref_gtf}' --readFilesIn '{input.cDNA_reads}' '{input.barcode_reads}' "
"--runThreadN '{threads}' --twopassMode Basic --outWigType bedGraph --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 30000000000 "
"--readFilesCommand zcat --runDirPerm '{params.run_dir_perm}' --outFileNamePrefix '{params.outdir}_' --soloType Droplet --soloCBwhitelist '{params.bc_whitelist}' "
"--soloUMIlen '{params.UMI_len}' --outTmpDir '{params.tmp_dir}' --soloFeatures Gene"

STAR_index_Hg38_Log.out.txt
SRR10587810_Log.out.txt

Please find attached the logs for 2.7.3a, as I know this version was successful for Hg19. I have tried both 2.7.3a and 2.7.4a (after regenerating the genome index), and both fail with the same error at the same point. The machine I initially ran on had 8 cores and 31 GB of RAM. I tried again on the same machine with 3 cores left free, but no luck. I have also run on a 16-core machine with 64 GB of RAM and still got the same error. I have ~90 GB of free space on the machine after genome index generation, and the dataset is 13 GB in total, so I do not think this could be caused by insufficient space to write temp files?

An example of the error output:
Jun 08 22:03:03 ..... inserting junctions into the genome indices
Jun 08 22:04:15 ..... started mapping
/bin/bash: line 1: 2419 Segmentation fault (core dumped) STAR --genomeDir '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/STAR_indices/STAR_index_Hg38' --sjdbGTFfile '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/reference_files/Hg38/Hg38_gtf.gtf' --readFilesIn '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/SRR10587810/fastq_merged_lanes/SRR10587810_R2_merged.fastq.gz' '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/SRR10587810/fastq_merged_lanes/SRR10587810_R1_merged.fastq.gz' --runThreadN '14' --twopassMode Basic --outWigType bedGraph --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --runDirPerm 'All_RWX' --outFileNamePrefix '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/SRR10587810/STAR/STARsolo_output/SRR10587810_' --soloType Droplet --soloCBwhitelist '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/reference_files/barcode_whitelists/V3_whitelist.txt' --soloUMIlen '12' --outTmpDir '/home/ubuntu/workspace/scRNA-seq-benchmarking/Snakemake/Snakemake-scRNAseq-Output/SRR10587810/STAR/STARsolo_output/tmp' --soloCellFilter None --soloFeatures Gene

Thanks in advance!
Kind regards,
James

@alexdobin alexdobin added the bug label Jun 12, 2020
@alexdobin
Owner

Hi James,

This looks like a bug. Could you run the same example without --twopassMode Basic, to see whether it is the 2nd pass that causes the problem?

Cheers
Alex
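
For anyone reproducing this from a Python-driven pipeline, the with/without two-pass comparison suggested above could be set up roughly as follows. This is only a sketch: the helper name `build_starsolo_cmd` and its parameters are hypothetical, and it uses only a subset of the STAR flags that appear in the commands earlier in this thread.

```python
def build_starsolo_cmd(index_dir, cdna_reads, barcode_reads, whitelist,
                       umi_len, out_prefix, threads=1, two_pass=True):
    """Build a STARsolo argument list (hypothetical helper).

    Only flags quoted earlier in this issue are used. Set two_pass=False
    to drop --twopassMode Basic and test whether the 2nd pass is the culprit.
    """
    cmd = [
        "STAR",
        "--genomeDir", str(index_dir),
        "--readFilesIn", str(cdna_reads), str(barcode_reads),
        "--runThreadN", str(threads),
        "--readFilesCommand", "zcat",
        "--outFileNamePrefix", str(out_prefix),
        "--soloType", "Droplet",
        "--soloCBwhitelist", str(whitelist),
        "--soloUMIlen", str(umi_len),
        "--soloFeatures", "Gene",
    ]
    if two_pass:
        # The only difference between the two runs being compared.
        cmd += ["--twopassMode", "Basic"]
    return cmd
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`, with a different `out_prefix` per run so the two sets of logs can be compared side by side.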

@J-Burgess
Author

J-Burgess commented Jun 12, 2020 via email

@alexdobin
Owner

Hi James,

I just confirmed that an empty whitelist does indeed cause a seg-fault. Hopefully, this will resolve the problem. I will patch STAR to exit with an error in this case.

Cheers
Alex

@J-Burgess
Author

Hi Alex, indeed fixing the barcodes text file solved the issue! Thanks!

Kind regards,
James
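
Since the root cause here was an empty barcode whitelist, a pipeline on a STAR version older than 2.7.5a could guard against it with a small pre-flight check before calling STAR. The following is a minimal sketch, assuming a plain-text whitelist with one barcode per line; the function names are hypothetical and not part of STAR or Snakemake.

```python
def count_whitelist_barcodes(path):
    """Return the number of non-blank lines in a plain-text whitelist file."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.strip())


def require_nonempty_whitelist(path):
    """Raise ValueError if the whitelist holds no barcodes, else return the count.

    Hypothetical guard to run before STARsolo, so that an empty file fails
    with a clear message instead of a segmentation fault during mapping.
    """
    n_barcodes = count_whitelist_barcodes(path)
    if n_barcodes == 0:
        raise ValueError(f"Barcode whitelist '{path}' contains no barcodes")
    return n_barcodes
```

In a Snakemake rule this could be called from a `run:` block, or from a small script, ahead of the alignment step; a missing file would surface as Python's usual `FileNotFoundError`.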

alexdobin added a commit that referenced this issue Jun 16, 2020
…allowing for multiple adapters (e.g. ddSeq). SJ.out.tab is sym-linked as features.tsv for Solo SJ output. Issue #882: 3rd field is now optional in Solo Gene features.tsv with --soloOutFormatFeaturesGeneField3. Issue #936: Throw an error if an empty whitelist is provided to STARsolo.
@alexdobin
Owner

Hi James,

I added a check for an empty whitelist in 2.7.5a; STAR will now throw an error in that case.

Cheers
Alex
