
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc Aborted #37

Open
AhmedArslan opened this issue Feb 26, 2019 · 11 comments


@AhmedArslan

I am trying to run demuxlet on a big VCF file (~500 GB), but I am getting the following error, which may be due to running out of memory (?). Can you please suggest how I can get rid of it?

Here is my command:
/home/demuxlet --sam /home/possorted_genome_bam.bam --vcf /home/picard_sorted.vcf --field GT --min-mac 10 --min-uniq 4 --out mut

Here is the error:
NOTICE [2019/02/25 17:12:47] - Reading 522000000 reads at Y:2865101 and skipping 292213112
NOTICE [2019/02/25 17:13:27] - WARNING: Suppressed a total of 217241 UMI warnings...
NOTICE [2019/02/25 17:13:27] - WARNING: Suppressed a total of 8688106 droplet/cell barcode warnings...
NOTICE [2019/02/25 17:13:27] - Finished reading 13 markers from the VCF file
NOTICE [2019/02/25 17:13:27] - Total number input reads : 547992973
NOTICE [2019/02/25 17:13:27] - Total number valid droplets observed : 631538
NOTICE [2019/02/25 17:13:27] - Total number valid SNPs observed : 13
NOTICE [2019/02/25 17:13:27] - Total number of read-QC-passed reads : 230131249
NOTICE [2019/02/25 17:13:27] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2019/02/25 17:13:27] - Total number of non-skipped reads with considered barcodes : 230056564
NOTICE [2019/02/25 17:13:27] - Total number of gapped/noninformative reads : 230013896
NOTICE [2019/02/25 17:13:27] - Total number of base-QC-failed reads : 706
NOTICE [2019/02/25 17:13:27] - Total number of redundant reads : 1213
NOTICE [2019/02/25 17:13:27] - Total number of pass-filtered reads : 40749
NOTICE [2019/02/25 17:13:27] - Total number of pass-filtered reads overlapping with multiple SNPs : 16738
NOTICE [2019/02/25 17:13:27] - Starting to prune out cells with too few reads...
NOTICE [2019/02/25 17:13:27] - Finishing pruning out 0 cells with too few reads...
NOTICE [2019/02/25 17:13:27] - Starting to identify best matching individual IDs
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

@hyunminkang
Contributor

hyunminkang commented Feb 28, 2019 via email

@AhmedArslan
Author

Tens of individuals? Please explain.

@ONeillMB1

I got the same error when running demuxlet with a VCF of 14 individuals (5,726,916 variants, 515 MB) and my 10X data, even when requesting up to 40 GB of RAM and analyzing only 10 barcodes. When I reduced the search space to 8 individuals (the true number of multiplexed individuals) using the --sm-list flag, the program exited normally and successfully wrote the output files. It therefore seems that the memory bottleneck has something to do with the number of potential doublets. Is this correct?
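
For reference, the number of doublet hypotheses grows quadratically with the number of candidate individuals, which would be consistent with this. A back-of-the-envelope sketch in Python (assuming demuxlet scores every unordered pair of individuals as a doublet hypothesis; I have not checked the code):

for n in (8, 14):
    pairs = n * (n - 1) // 2  # unordered pairs of candidate individuals
    print(f"{n} candidates -> {n} singlet + {pairs} doublet hypotheses")
# 8 candidates -> 28 doublet pairs; 14 -> 91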

@royoelen

I am getting the same error 'std::bad_alloc':

NOTICE [2019/07/14 01:37:54] - Finished reading 7329072 markers from the VCF file
NOTICE [2019/07/14 01:37:54] - Total number input reads : 371052235
NOTICE [2019/07/14 01:37:54] - Total number valid droplets observed : 6741
NOTICE [2019/07/14 01:37:54] - Total number valid SNPs observed : 7329072
NOTICE [2019/07/14 01:37:54] - Total number of read-QC-passed reads : 144881215
NOTICE [2019/07/14 01:37:54] - Total number of skipped reads with ignored barcodes : 27217573
NOTICE [2019/07/14 01:37:54] - Total number of non-skipped reads with considered barcodes : 110362916
NOTICE [2019/07/14 01:37:54] - Total number of gapped/noninformative reads : 82026940
NOTICE [2019/07/14 01:37:54] - Total number of base-QC-failed reads : 793521
NOTICE [2019/07/14 01:37:54] - Total number of redundant reads : 20150764
NOTICE [2019/07/14 01:37:54] - Total number of pass-filtered reads : 7391691
NOTICE [2019/07/14 01:37:54] - Total number of pass-filtered reads overlapping with multiple SNPs : 1241138
NOTICE [2019/07/14 01:37:54] - Starting to prune out cells with too few reads...
NOTICE [2019/07/14 01:37:54] - Finishing pruning out 0 cells with too few reads...
NOTICE [2019/07/14 01:37:55] - Starting to identify best matching individual IDs
NOTICE [2019/07/14 01:38:01] - Identifying best-matching individual..
NOTICE [2019/07/14 01:38:01] - Finished processing 6741 droplets total
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
/var/spool/slurmd/job5656459/slurm_script: line 22: 4431 Aborted (core dumped)

The .single file seems to be complete, as there is a value for every barcode. The .sing2 and .best files, however, are empty.

I have now encountered this in three runs with different files. I allocated 64 GB of RAM to the process; the .sam input was ~23 GB and the input .vcf file was ~2 GB in size.

demuxlanes.err.txt

@hyunminkang
Contributor

hyunminkang commented Jul 15, 2019 via email

@royoelen

> Try to limit the number of SNPs to common exonic SNPs or increase the memory. It happens because of insufficient SNPs

Do you perhaps mean 'insufficient memory' in that last sentence? I do not understand how limiting to exonic SNPs or increasing memory would lead to more SNPs.
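
(If anyone wants to try the first suggestion in the meantime: a filter along these lines should work with bcftools, where exons.bed stands in for a BED file of exonic regions you would need to supply and 0.05 is an arbitrary minor-allele-frequency cutoff:)

bcftools view -q 0.05:minor -R exons.bed -Oz -o common_exonic.vcf.gz input.vcf.gz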

@arogozhnikov

arogozhnikov commented Aug 26, 2019

I get a different error message, but with the same meaning:

libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

I am running with many samples (dozens), and that's exactly what I want to do.
It also produces the .single file but nothing else (the other output files are empty, as others mentioned), so it seems the memory issue arises during doublet posterior-probability estimation, which doesn't sound reasonable.

(For my purposes it would be sufficient to run without doublet estimation at all, so an option to switch that off would be great. I understand that's not how the software is meant to be used.)

@hyunminkang
Contributor

hyunminkang commented Aug 26, 2019 via email

@arogozhnikov

arogozhnikov commented Aug 26, 2019

~10k cells, ~50 samples (yes, a lot), and ~500k SNPs in my case; memory is ~32 GB.
(And it worked flawlessly with ~10k SNPs.)
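
If I do the arithmetic under the assumption that the doublet stage allocates one dense per-SNP-by-per-pair table of 8-byte doubles (just a guess; I have not read the code), the numbers land in a plausible range:

n_samples, n_snps = 50, 500_000
pairs = n_samples * (n_samples - 1) // 2  # 1225 unordered doublet pairs
gib = n_snps * pairs * 8 / 2**30          # hypothetical dense table of doubles
print(f"{gib:.1f} GiB")                   # ~4.6 GiB; only ~0.1 GiB for 10k SNPs

A single table of that size still fits in 32 GB, but a few copies of it (say, one per mixing fraction on a grid) would not, which would also explain why ~10k SNPs run flawlessly.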

@hyunminkang
Contributor

hyunminkang commented Aug 26, 2019 via email

@santiagorevale

Hi there,

I'm having the same error:

NOTICE [2019/10/21 16:55:00] - Processing 286000 droplets...
NOTICE [2019/10/21 16:55:00] - Processing 287000 droplets...
NOTICE [2019/10/21 16:55:00] - Finished processing 287089 droplets total
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

My data is as follows:

  • 13 GB BAM file
  • 3000 cells
  • 18 multiplexed individuals
  • genotype VCF file for 56 individuals with 1.9M SNPs (154 MB)

I tried running it using 64, 128, 384, and 1024 GB of RAM, always with the same outcome. The OS is CentOS 7; the demuxlet version is from November 2018.

I tried to get some memory profiling, so I ran it using the following command:

valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out ./test.sh; grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\(.*\)/\1/' | sort -g | tail -n 1

massif.txt

I'm attaching the output file to see if it makes sense to you.

I also tried running it with /usr/bin/time -v, and this was the output:

Command terminated by signal 6
	Command being timed: "/apps/htseq/demuxlet/bin/centos/demuxlet --sam /well/singlecell/P190381/10X-count.hg19/726070_GX06/outs/possorted_genome_bam.bam --vcf /well/singlecell/P190381/RNAseq-genotyping/variant_calling/rna-germline-variant-calling/variant_filtered.merged.sorted.vcf.gz --field GT --geno-error 0.0001 --out 726070_GX06-0.0001"
	User time (seconds): 775.59
	System time (seconds): 4.43
	Percent of CPU this job got: 97%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 13:18.85
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 700628
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1623385
	Voluntary context switches: 562
	Involuntary context switches: 1381
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
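
Interestingly, the maximum resident set size above is only ~700 MB, so the process is not gradually filling the RAM I give it; the failure looks more like a single very large allocation request at the doublet stage. One thing I have not tried yet is the --sm-list workaround mentioned above, i.e. restricting the 56 genotyped individuals to the 18 that were actually multiplexed: with 56 candidates there are 56*55/2 = 1540 doublet pairs per droplet, versus only 18*17/2 = 153.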

Any help on why this is happening and how to sort it out would be really appreciated.

Let me know if you need any additional information.

Thanks in advance.

Cheers,
Santiago
