Understanding excessive memory usage #154
Thanks @ernfrid - Sorry for the delay. In general we haven't had problems with high memory usage. The main place we've had to deal with it is certain FFPE samples going into the tumor-normal somatic analysis with chimeric read-pair rates above 10%, but even in those cases it was a question of whether the graph merge (which has always been the point of peak memory demand from in-house observations) would exceed 2 GB -- so if you're observing a problem approaching 70 GB, this is likely to be a different type of issue. A few questions which might help us make suggestions:
I would also note that the statistics files output by Manta, summarized here: https://github.com/Illumina/manta/tree/master/docs/userGuide#statistics, could be helpful. These stats are primarily focused on runtime rather than memory debugging, but if you could share these files for a problematic case we might be able to spot something helpful.
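For reference, those statistics sit under the workflow's run directory as described in the linked guide; the sketch below assumes the conventional run-directory layout and file names, which may vary slightly by version:

```bash
# Per-run statistics live under the workflow's run directory (path is a placeholder)
ls "${MANTA_RUN_DIR}/results/stats/"

# e.g. the candidate-generation summary is a TSV that can be skimmed directly
column -t "${MANTA_RUN_DIR}/results/stats/svCandidateGenerationStats.tsv" | less -S
```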
No trouble on the delay. If I limit the memory to ~12 GB and a single core, one of the subjobs running GenerateSVCandidates crashes. When running the same sample through with only a single thread and excess RAM, I saw peak usage of ~14 GB, so it doesn't seem to be a particular subjob using huge amounts of RAM by itself. I've attached the stats directory (with paths redacted) for the 8-core run that consumed 70 GB. The Manta version is v1.4.0. I haven't attempted the CRAM -> BAM conversion and run, but I will give that a try.
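For context, limiting cores and memory like this is typically done through the workflow launcher; a minimal sketch, assuming the standard runWorkflow.py options (-j = max cores, -g = max memory in GB) and a placeholder run directory:

```bash
# Constrain a local run to one core and ~12 GB of memory
"${MANTA_RUN_DIR}/runWorkflow.py" -m local -j 1 -g 12
```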
Changing the input to a BAM instead of a CRAM dropped the memory usage precipitously (to the point I almost don't believe it) to 1.1 GB.
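For anyone reproducing this comparison, a conversion along these lines might be used before configuring Manta (file names are placeholders; -T supplies the reference the CRAM was encoded against):

```bash
# Decode the CRAM to BAM and index it for Manta
samtools view -b -@ 4 -T GRCh38_full_analysis_set.fa -o sample.bam sample.cram
samtools index sample.bam
```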
Either a leak in htslib or our API usage... Possibly this one? That fix appears in htslib 1.8. I will go ahead and see if the update to 1.9 is easy, and send you a branch link to see if we can confirm the issue.
That looks promising. I'd be happy to test it if you're able to point me to a branch.
This update is motivated by a possible CRAM-specific memory issue suggested by #154
If you are set up to build Manta from source, the branch is here:
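The branch link itself isn't preserved above. For reference, a from-source build generally looks something like the sketch below; the branch name, paths, and the exact location of the configure script are assumptions, so check the installation docs for your checkout:

```bash
# Rough sketch only: clone the test branch (name is a placeholder) and build out of tree
git clone --branch <cram-memory-test-branch> https://github.com/Illumina/manta.git manta-src
mkdir manta-build && cd manta-build
../manta-src/configure --prefix="${HOME}/manta-install"   # configure location may differ per version
make -j4 install
```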
Compiled and ran this branch. The samtools in libexec does report as v1.9. Memory usage was ~72 GB as reported by LSF. Seems like this didn't change the memory usage.
Okay, perhaps we can reproduce the problem with an in-house TN sample. I would like to recreate your CRAM file conditions -- can you describe which command line / tool versions / reference are being used to create these?
Sorry about the delayed response @ctsa. This CRAM was created with a pipeline conforming to the guidelines here: https://github.com/CCDG/Pipeline-Standardization/blob/master/PipelineStandard.md See also the pre-print here: https://www.biorxiv.org/content/early/2018/04/10/269316 This is not my pipeline, but it looks like Picard MarkDuplicates (v2.6.0) is being used. From the header I see:
I also see SAMBLASTER entries
GATK PrintReads
There should also be base recalibration, but I see no
I think I may have discovered the issue here. The CRAM(s) in question appear to be missing the SO:coordinate sort-order tag in the @HD header line.
@ctsa - I'm going to resolve this since I'm fairly certain that the missing header line is the source of my issues, and since converting to BAM or adding the header line remedies the problem. I did go back and look at the logs and didn't see an indication that the missing line is triggering sorting (or some other issue). If there is a sort step in this case, as I am guessing, it would be wonderful if there were feedback to the person running Manta that sorting was occurring. My apologies if this is in there and I'm simply missing it, or if I'm wildly off track on what is going on here.
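For anyone hitting the same symptom, a sketch of the check-and-reheader fix described above; file names are placeholders, the sed syntax assumes GNU sed, and it assumes an @HD line is present but simply lacks the SO tag:

```bash
# 1) Check whether the @HD line declares a sort order
samtools view -H sample.cram | grep '^@HD'

# 2) Append SO:coordinate to the existing @HD line
#    (if there is no @HD line at all, prepend one, e.g. '@HD\tVN:1.5\tSO:coordinate')
samtools view -H sample.cram \
  | sed '/^@HD/ s/$/\tSO:coordinate/' > fixed_header.sam

# 3) Rewrite the header; -i edits in place (CRAM-only per the samtools docs),
#    so work on a copy if the original must be preserved
cp sample.cram sample.fixed.cram
samtools reheader -i fixed_header.sam sample.fixed.cram
```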
Hi Dave, sorry to lose track of this one. The effect of the ...
Perhaps @jkbonfield could put some htslib developer perspective on this? As described in the above comments, when Manta uses htslib to make a large set of indexed genome region requests all over the genome (similar, at the API level, to the calls that would be made by samtools view), the observed memory usage is normal (i.e. fairly low) when the indexed read requests are coming from a BAM file, or from a CRAM file with an SO:coordinate entry in its @HD header line. But for the CRAM file without that header line, the memory usage increases drastically. Do you have any notion of what could be triggering this?
I don't understand why a coordinate-sorted file that is missing the coordinate-sorted tag would make a difference, at least not from an htslib perspective. There is some code to check sort order in relation to the CRAM "AP_delta" field. This is set to true by the encoder when writing out sorted data (as 10, 12, 15, 16, 16, 18, ... is smaller when written as 10, 2, 3, 1, 0, 2) and to false when the data is unsorted. The decoder appears to double-check this, in so far as AP_delta==0 and no HD SO:coord will enter "unsorted" mode, meaning it caches the reference differently because it assumes we'll keep spamming through all references in a random order. (Incidentally, if writing aligned data that is not coordinate sorted, it's probably best to simply encode using referenceless mode: "samtools view -O cram,no_ref".)

I'm not familiar with manta, but is this high memory usage being seen when doing "samtools view -c -@8" to simply count the records in a CRAM file? What about when giving it many, many regions to iterate through? It's possible we have another memory leak somewhere related to sort order, so I'll have a look. You may find the heap profiling of tcmalloc a good place to start for diagnosing where most of your memory is being spent.
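Spelled out, the two quick checks suggested above might look like this (file names are placeholders; decoding the input CRAM may still need the original reference via -T or REF_PATH):

```bash
# 1) Stream/count the whole CRAM with 8 threads and watch memory
samtools view -c -@8 sample.cram

# 2) Re-encode aligned-but-unsorted data without reference-based compression
samtools view -O cram,no_ref -o sample.noref.cram sample.cram
```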
Thanks @ctsa and @jkbonfield. Unless you've independently replicated it, let me triple-check this behavior before proceeding and investigating some of James' suggestions.
Ok @ctsa and @jkbonfield. I've verified this again by reheadering to remove the SO:coordinate tag. I also tried @jkbonfield's samtools view -c suggestion ...
@jkbonfield per your question about Manta's access pattern: The problematic memory usage seems to be coming up in Manta's second 'SV generation' phase, wherein each process will be cycling through a large number of SV breakpoint locations and requesting access for a window of several hundred bases surrounding each candidate breakend. This access pattern is quite different from the small number of large regions typically requested in many other variant calling activities, and thus Manta's SV generation phase has triggered some CRAM issues before:
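A rough way to mimic this "many small windows" access pattern from the shell, assuming a BED of candidate breakend windows (regions.bed is a placeholder) and using samtools' multi-region iterator:

```bash
# -M enables the multi-region iterator over the -L BED of small windows;
# for CRAM input the reference may still be needed via -T or REF_PATH
samtools view -c -M -L regions.bed sample.cram
```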
So without the header, I see peak usage of 70.14 GiB and with the header present I see 5.47 GiB.
Do you mean that samtools view -c is also using 40+ GB of RAM? If so then it's definitely something in our code causing this high memory usage! Odd and a cause for concern. It does sound rather like a memory leak. Is this tied to using multi-threaded mode? I.e. with no threads (no -@ option) does it also gradually climb in memory?
Doh. That was poorly worded. I see only a slight difference in the rather modest memory usage of samtools view. It's about 0.8 GiB in both cases.
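A peak-RSS comparison like the ones quoted in this thread could be collected with GNU time (the standalone /usr/bin/time, not the shell builtin); file name is a placeholder:

```bash
# "Maximum resident set size" is reported in kbytes by GNU time -v
/usr/bin/time -v samtools view -c -@8 sample.cram 2>&1 | grep 'Maximum resident set size'
```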
Thanks for the clarification. What htslib calls does Manta make? Is it a pure open-and-stream-through-the-data-set workflow? Is it opening and closing many times? Opening once, but doing lots of seeks? Lots of iterator creation/destruction? It's possible we still have a memory leak within htslib, but I'd suggest starting with a memory-leak-checking tool (tcmalloc, valgrind, clang -fsanitize=address, etc.) first to see if it turns up where the source is.
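Illustrative invocations of the valgrind-based options mentioned above (file names are placeholders; a single-threaded run keeps the profiles readable):

```bash
# Heap profile with massif, then summarize it
valgrind --tool=massif samtools view -c sample.cram
ms_print massif.out.*

# Or a full leak check (much slower)
valgrind --leak-check=full samtools view -c sample.cram
```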
To double check, I just tried samtools view on a local CRAM file with and without the SO:coordinate header line ...
I'm afraid I'm not going to have time to figure out how to run Manta with valgrind to perform an analysis similar to @jkbonfield's. @ctsa - I'm going to leave this open, but I've probably taken it as far as I am able to.
Thanks for the updates and analysis @ernfrid & @jkbonfield. Unfortunately no one on our team is able to prioritize the issue either, as we don't have a CRAM production support requirement. I agree this is still an important issue we'd like to address and it should remain open. Some notes for whoever is able to pick this up next:
(Embedded code snippet references: line 77 and line 80 in commit a922b74.)
More details on LeakSanitizer options can be found here: https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
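For whoever picks this up: LeakSanitizer is configured at run time through environment variables, along the lines of the sketch below (the suppressions file path is a placeholder):

```bash
# Point LSan at a suppressions file and report which suppressions fired
export LSAN_OPTIONS="suppressions=lsan_suppressions.txt:print_suppressions=1"

# Enable leak detection in AddressSanitizer builds where it is not already the default
export ASAN_OPTIONS="detect_leaks=1"
```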
I've been running a large number of ~30x human CRAMs (aligned to GRCh38DH) through Manta for germline SV detection. I've typically been providing 16 GB of RAM and 8 cores. For the majority of samples this is sufficient, but for a small subset I need additional RAM, and I've recently encountered a sample that utilized ~70 GB of RAM to complete.
Unfortunately, I will be unable to share the CRAM in question. Is there any part of the output or log files that might shed light on the source of this excess usage? In particular, I'm curious if there is a particular region or contig that I might exclude from my analysis to reduce the usage (or use to predict excess usage in advance).
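For reference, a hypothetical configuration resembling the germline runs described above (file names are placeholders); if a problematic region or contig is identified, Manta's --callRegions option (a bgzip-compressed, tabix-indexed BED) can restrict calling to chosen regions:

```bash
configManta.py \
    --bam sample.cram \
    --referenceFasta GRCh38_full_analysis_set.fa \
    --callRegions call_regions.bed.gz \
    --runDir manta_run

# 8 cores, 16 GB, matching the resources described above
manta_run/runWorkflow.py -m local -j 8 -g 16
```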