Understanding excessive memory usage #154
Thanks @ernfrid - Sorry for the delay. In general we haven't had problems with high memory usage. The main place we've had to deal with it is certain FFPE samples going into the tumor-normal somatic analysis with chimeric read-pair rates above 10%, but even in those cases it was a question of whether the graph merge (which has always been the point of peak memory demand from in-house observations) would exceed 2 GB -- so if you're observing a problem approaching 70 GB, this is likely to be a different type of issue. A few questions which might help us make suggestions:
I would also note that the statistics files output by Manta, summarized here: https://github.com/Illumina/manta/tree/master/docs/userGuide#statistics, could be helpful. These stats are primarily focused on runtime rather than memory debugging, but if you could share these files for a problematic case we might be able to spot something helpful.
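For reference, those statistics sit under the workflow's run directory as described in the linked guide; the sketch below assumes the conventional run-directory layout and file names, which may vary slightly by version:

```bash
# Per-run statistics live under the workflow's run directory (path is a placeholder)
ls "${MANTA_RUN_DIR}/results/stats/"

# e.g. the candidate-generation summary is a TSV that can be skimmed directly
column -t "${MANTA_RUN_DIR}/results/stats/svCandidateGenerationStats.tsv" | less -S
```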
No trouble on the delay. If I limit the memory to ~12 GB and a single core, one of the subjobs running GenerateSVCandidates crashes. When running the same sample through with only a single thread and excess RAM, I saw peak usage of ~14 GB, so it doesn't seem to be a particular subjob using huge amounts of RAM by itself. I've attached the stats directory (with paths redacted) for the 8-core run that consumed 70 GB. The Manta version is v1.4.0. I haven't attempted the CRAM -> BAM conversion and run, but I will give that a try.
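For context, limiting cores and memory like this is typically done through the workflow launcher; a minimal sketch, assuming the standard runWorkflow.py options (-j = max cores, -g = max memory in GB) and a placeholder run directory:

```bash
# Constrain a local run to one core and ~12 GB of memory
"${MANTA_RUN_DIR}/runWorkflow.py" -m local -j 1 -g 12
```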
Changing the input to a BAM instead of a CRAM dropped the memory usage precipitously (to the point I almost don't believe it) to 1.1 GB.
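For anyone reproducing this comparison, a conversion along these lines might be used before configuring Manta (file names are placeholders; -T supplies the reference the CRAM was encoded against):

```bash
# Decode the CRAM to BAM and index it for Manta
samtools view -b -@ 4 -T GRCh38_full_analysis_set.fa -o sample.bam sample.cram
samtools index sample.bam
```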
Either a leak in htslib or our API usage... Possibly this one? That fix appears in htslib 1.8. I will go ahead and see if the update to 1.9 is easy, and send you a branch link to see if we can confirm the issue.
That looks promising. I'd be happy to test it if you're able to point me to a branch.
This update is motivated by a possible CRAM-specific memory issue suggested by #154
If you are set up to build Manta from source, the branch is here:
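The branch link itself isn't preserved above. For reference, a from-source build generally looks something like the sketch below; the branch name, paths, and the exact location of the configure script are assumptions, so check the installation docs for your checkout:

```bash
# Rough sketch only: clone the test branch (name is a placeholder) and build out of tree
git clone --branch <cram-memory-test-branch> https://github.com/Illumina/manta.git manta-src
mkdir manta-build && cd manta-build
../manta-src/configure --prefix="${HOME}/manta-install"   # configure location may differ per version
make -j4 install
```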
Compiled and ran this branch. The samtools in libexec does report as v1.9. Memory usage was ~72 GB as reported by LSF. Seems like this didn't change the memory usage.
Okay, perhaps we can reproduce the problem with an in-house TN sample. I would like to recreate your CRAM file conditions -- can you describe which command line / tool versions / reference are being used to create these?
Sorry about the delayed response @ctsa. This CRAM was created with a pipeline conforming to the guidelines here: https://github.com/CCDG/Pipeline-Standardization/blob/master/PipelineStandard.md See also the pre-print here: https://www.biorxiv.org/content/early/2018/04/10/269316 This is not my pipeline, but it looks like Picard MarkDuplicates (v2.6.0) is being used. From the header I see:
I also see SAMBLASTER entries
GATK PrintReads
There should also be base recalibration, but I see no
I think I may have discovered the issue here. The CRAM(s) in question appear to be missing the SO:coordinate sort-order tag in the @HD header line.
@ctsa - I'm going to resolve this since I'm fairly certain that the missing header line is the source of my issues, and since converting to BAM or adding the header line remedies the problem. I did go back and look at the logs and didn't see an indication that the missing line is triggering sorting (or some other issue). If there is a sort step in this case, as I am guessing, it would be wonderful if there were feedback to the person running Manta that sorting was occurring. My apologies if this is in there and I'm simply missing it, or if I'm wildly off track on what is going on here.
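For anyone hitting the same symptom, a sketch of the check-and-reheader fix described above; file names are placeholders, the sed syntax assumes GNU sed, and it assumes an @HD line is present but simply lacks the SO tag:

```bash
# 1) Check whether the @HD line declares a sort order
samtools view -H sample.cram | grep '^@HD'

# 2) Append SO:coordinate to the existing @HD line
#    (if there is no @HD line at all, prepend one, e.g. '@HD\tVN:1.5\tSO:coordinate')
samtools view -H sample.cram \
  | sed '/^@HD/ s/$/\tSO:coordinate/' > fixed_header.sam

# 3) Rewrite the header; -i edits in place (CRAM-only per the samtools docs),
#    so work on a copy if the original must be preserved
cp sample.cram sample.fixed.cram
samtools reheader -i fixed_header.sam sample.fixed.cram
```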
Hi Dave, sorry to lose track of this one. The effect of the ...
Perhaps @jkbonfield could put some htslib developer perspective on this? As described in the above comments, when Manta uses htslib to make a large set of indexed genome region requests all over the genome (similar, at the API level, to the calls that would be made by samtools view), the observed memory usage is normal (i.e. fairly low) when the indexed read requests are coming from a BAM file, or from a CRAM file with an SO:coordinate entry in its @HD header line. But for the CRAM file without that header line, the memory usage increases drastically. Do you have any notion of what could be triggering this?
I don't understand why a coordinate-sorted file that is missing the coordinate-sorted tag would make a difference, at least not from an htslib perspective. There is some code to check sort order in relation to the CRAM "AP_delta" field. This is set to true by the encoder when writing out sorted data (as 10, 12, 15, 16, 16, 18, ... is smaller when written as 10, 2, 3, 1, 0, 2) and to false when the data is unsorted. The decoder appears to double-check this, in so far as AP_delta==0 and no HD SO:coord will enter "unsorted" mode, meaning it caches the reference differently because it assumes we'll keep spamming through all references in a random order. (Incidentally, if writing aligned data that is not coordinate sorted, it's probably best to simply encode using referenceless mode: "samtools view -O cram,no_ref".)

I'm not familiar with manta, but is this high memory usage being seen when doing "samtools view -c -@8" to simply count the records in a CRAM file? What about when giving it many, many regions to iterate through? It's possible we have another memory leak somewhere related to sort order, so I'll have a look. You may find the heap profiling of tcmalloc a good place to start for diagnosing where most of your memory is being spent.
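Spelled out, the two quick checks suggested above might look like this (file names are placeholders; decoding the input CRAM may still need the original reference via -T or REF_PATH):

```bash
# 1) Stream/count the whole CRAM with 8 threads and watch memory
samtools view -c -@8 sample.cram

# 2) Re-encode aligned-but-unsorted data without reference-based compression
samtools view -O cram,no_ref -o sample.noref.cram sample.cram
```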
Thanks @ctsa and @jkbonfield. Unless you've independently replicated it, let me triple-check this behavior before proceeding and investigating some of James' suggestions.
Ok @ctsa and @jkbonfield. I've verified this again by reheadering to remove the SO:coordinate tag. I also tried @jkbonfield's samtools view -c suggestion ...
@jkbonfield per your question about Manta's access pattern: The problematic memory usage seems to be coming up in Manta's second 'SV generation' phase, wherein each process will be cycling through a large number of SV breakpoint locations and requesting access for a window of several hundred bases surrounding each candidate breakend. This access pattern is quite different from the small number of large regions typically requested in many other variant calling activities, and thus Manta's SV generation phase has triggered some CRAM issues before:
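A rough way to mimic this "many small windows" access pattern from the shell, assuming a BED of candidate breakend windows (regions.bed is a placeholder) and using samtools' multi-region iterator:

```bash
# -M enables the multi-region iterator over the -L BED of small windows;
# for CRAM input the reference may still be needed via -T or REF_PATH
samtools view -c -M -L regions.bed sample.cram
```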
So without the header, I see peak usage of 70.14 GiB and with the header present I see 5.47 GiB.
Do you mean that samtools view -c is also using 40+ GB of RAM? If so then it's definitely something in our code causing this high memory usage! Odd and a cause for concern. It does sound rather like a memory leak. Is this tied to using multi-threaded mode? I.e. with no threads (no -@ option) does it also gradually climb in memory?
Doh. That was poorly worded. I see only a slight difference in the rather modest memory usage of samtools view. It's about 0.8 GiB in both cases.
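A peak-RSS comparison like the ones quoted in this thread could be collected with GNU time (the standalone /usr/bin/time, not the shell builtin); file name is a placeholder:

```bash
# "Maximum resident set size" is reported in kbytes by GNU time -v
/usr/bin/time -v samtools view -c -@8 sample.cram 2>&1 | grep 'Maximum resident set size'
```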
Thanks for the clarification. What htslib calls does Manta make? Is it a pure open-and-stream-through-the-data-set workflow? Is it opening and closing many times? Opening once, but doing lots of seeks? Lots of iterator creation/destruction? It's possible we still have a memory leak within htslib, but I'd suggest starting with a memory-leak-checking tool (tcmalloc, valgrind, clang -fsanitize=address, etc.) first to see if it turns up where the source is.
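Illustrative invocations of the valgrind-based options mentioned above (file names are placeholders; a single-threaded run keeps the profiles readable):

```bash
# Heap profile with massif, then summarize it
valgrind --tool=massif samtools view -c sample.cram
ms_print massif.out.*

# Or a full leak check (much slower)
valgrind --leak-check=full samtools view -c sample.cram
```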
To double check, I just tried samtools view on a local CRAM file with and without the SO:coordinate header line ...
I'm afraid I'm not going to have time to figure out how to run Manta with valgrind to perform an analysis similar to @jkbonfield's. @ctsa - I'm going to leave this open, but I've probably taken it as far as I am able to.
Thanks for the updates and analysis @ernfrid & @jkbonfield. Unfortunately no one on our team is able to prioritize the issue either, as we don't have a CRAM production support requirement. I agree this is still an important issue we'd like to address and it should remain open. Some notes for whoever is able to pick this up next:
(Embedded code snippet references: line 77 and line 80 in commit a922b74.)
More details on LeakSanitizer options can be found here: https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer
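For whoever picks this up: LeakSanitizer is configured at run time through environment variables, along the lines of the sketch below (the suppressions file path is a placeholder):

```bash
# Point LSan at a suppressions file and report which suppressions fired
export LSAN_OPTIONS="suppressions=lsan_suppressions.txt:print_suppressions=1"

# Enable leak detection in AddressSanitizer builds where it is not already the default
export ASAN_OPTIONS="detect_leaks=1"
```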
I've been running a large number of ~30x human CRAMs (aligned to GRCh38DH) through Manta for germline SV detection. I've typically been providing 16 GB of RAM and 8 cores. For the majority of samples this is sufficient, but for a small subset I need additional RAM, and I've recently encountered a sample that utilized ~70 GB of RAM to complete.
Unfortunately, I will be unable to share the CRAM in question. Is there any part of the output or log files that might shed light on the source of this excess usage? In particular, I'm curious if there is a particular region or contig that I might exclude from my analysis to reduce the usage (or use to predict excess usage in advance).
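For reference, a hypothetical configuration resembling the germline runs described above (file names are placeholders); if a problematic region or contig is identified, Manta's --callRegions option (a bgzip-compressed, tabix-indexed BED) can restrict calling to chosen regions:

```bash
configManta.py \
    --bam sample.cram \
    --referenceFasta GRCh38_full_analysis_set.fa \
    --callRegions call_regions.bed.gz \
    --runDir manta_run

# 8 cores, 16 GB, matching the resources described above
manta_run/runWorkflow.py -m local -j 8 -g 16
```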