
Memory leak in moka 0.12 #329

Closed

Swatinem opened this issue Sep 28, 2023 · 13 comments · Fixed by #330

Labels: bug (Something isn't working)

@Swatinem
Contributor

I tried upgrading from moka 0.11 to 0.12, which led to a memory leak.

As we use moka extensively in multiple places, both sync and async variants, it's hard to pinpoint this to a very specific usage.

One indication I have, though, is that the leak only occurred in our native processing pipeline, not in the JS processing pipeline.
The native pipeline uses moka in a couple more places than the JS pipeline, both sync and async.

So far I'm a bit at a loss as to how to pinpoint the exact leak. I will try to run this through a local stress test to see if it also exhibits the leak we were observing in production.

@tatsuya6502 tatsuya6502 self-assigned this Sep 28, 2023
@tatsuya6502
Member

Thank you for reporting. I saw the pull request for symbolicator to revert the moka version to v0.11, and I started looking into this issue. I was planning to run your stress tool myself and modify it until I could reproduce the issue. I was reading the manuals for storing debug symbol files on S3 and for getting a minidump from a native app, but it could take me some time to understand them.

Anyway, to investigate the issue, we could use GlobalDebugCounters under the unstable-debug-counters feature:

moka/Cargo.toml

Lines 43 to 46 in 13bc37f

# This unstable feature adds `GlobalDebugCounters::current` function, which returns
# counters of internal object construction and destruction. It will have some
# performance impacts and is intended for debugging.
unstable-debug-counters = ["future"]

By periodically calling the GlobalDebugCounters::current function, you will get the information shown in the table in #72 (comment) (except the allocator info, which was obtained by using jemalloc). By comparing the creation count and drop count of each item, you/we will get a better idea of what is going on.

One of the issues with GlobalDebugCounters is that you cannot get counts for each individual cache instance (as the name implies), so it will be hard to identify which cache instance(s) are leaking.

I have not used GlobalDebugCounters for a while, so it can be broken. If so, please let me know.

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

I presume the memory leak occurred due to stalled cache maintenance task processing in future::Cache. But I have not figured out why it was stalled.

When it is stalled, the following will happen:

  1. No entries will be evicted from the cache.
    • The cache will go over capacity and its internal hash table will keep growing as new key-values are inserted.
  2. All insertion attempts to the cache will eventually time out.

Details of 2:

  • The key-value will be inserted into the internal hash table of the cache.
    • So other async tasks can get the value (cache hits).
  • However, insert()/entry_by_ref().or_insert_with() will block at the last step, which sends a WriteOp struct to the internal "bounded" channel (write_op_ch).
    • As a result, the async task that is inserting will eventually time out.
  • When the timeout occurs, the WriteOp will be sent to another internal "unbounded" channel (interrupted_op_ch_snd) so that the cache can retry it later (a rough sketch of this pattern follows).
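
As a sketch of the pattern described above (this is not moka's actual code; the channel names come from the bullets above, while the WriteOp shape and the timeout duration are placeholders):

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

// Placeholder for the cache's internal write operation.
#[derive(Clone, Debug)]
struct WriteOp(u64);

async fn schedule_write_op(
    write_op_ch: &mpsc::Sender<WriteOp>,                    // bounded
    interrupted_op_ch_snd: &mpsc::UnboundedSender<WriteOp>, // unbounded
    op: WriteOp,
) {
    // Try to hand the op to the maintenance task. If that task is stalled,
    // the bounded channel stays full and this send never completes.
    match timeout(Duration::from_secs(5), write_op_ch.send(op.clone())).await {
        Ok(Ok(())) => {} // delivered; the maintenance task will apply it
        _ => {
            // Timed out (or the channel was closed): park the op on the
            // unbounded channel so the cache can retry it during a later
            // maintenance run instead of losing the write.
            let _ = interrupted_op_ch_snd.send(op);
        }
    }
}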

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

Yesterday, I tried Symbolicator's stress tool, but I did not reproduce the problem.

  • Built 4 × Mach-O debug info files (from a simple Rust program).
    • Stored them in a symbol source: tried Amazon S3 and the local filesystem.
    • Did not create debug info files for system libraries, so Symbolicator keeps retrying to download them.
  • Generated 4 × Minidumps for each debug info file (16 Minidumps in total).
    • Set them in the workload config.
  • Configured Symbolicator's caches as follows (60 times shorter than the original config):
caches:
  downloaded:
    max_unused_for: 168m  # 24 x 7
    retry_misses_after: 1m
    retry_malformed_after: 24m
  derived:
    max_unused_for: 168m
    retry_misses_after: 1m
    retry_malformed_after: 24m
  • Modified the source code of the stress tool:

    • Disabled Sentry.
    • Enabled statsd metrics for Symbolicator.
    • Switched the global allocator to jemalloc.
  • Ran the workloads for 10 minutes (several times) on macOS, and got stable memory usage.

I will continue using/modifying the stress tool to see if I can get the cache maintenance task processing to stall.

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

Action Items for reproducing the problem:

Try to trigger the problem:

  • by inserting more entries than the cache capacity,
    • so that insertions and evictions keep happening.
  • by randomly dropping requests to Symbolicator?
    • But there would be only a small chance of interrupting a maintenance task,
      • because Symbolicator does not set eviction listeners on most of the caches, and even where one is set, that eviction listener does not seem slow.
    • There are not many await points in the maintenance task unless an eviction listener is set.
    • So this might not be related to the problem?

Action Items

  • Write a script to generate more debug info files and Minidumps.
  • Modify the Symbolicator config to reduce the max capacities of the in-memory caches.
    • e.g. symbolicator_service::config::InMemoryCacheConfig
  • (low priority) Modify moka to inject random delays into the maintenance tasks (a sketch follows this list),
    • so that dropping requests to Symbolicator has a higher chance of interrupting a maintenance task.
    • Implementation note: do not use async primitives for the delay, as they would add more await points, which can alter the async cancellation behavior.
      • A blocking call like std::thread::sleep would be a good candidate.
  • (low priority) Modify the stress tool to randomly drop requests to Symbolicator.
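
For the delay injection, a minimal debug-only sketch could look like the following (the helper name is hypothetical; the point is that the sleep is blocking, so it introduces no extra await points):

use rand::Rng;
use std::time::Duration;

// Hypothetical debug-only helper to sprinkle into the maintenance task.
// It deliberately uses a blocking sleep so that it adds no extra await
// points and does not change the async cancellation behavior of the caller.
fn inject_random_delay(max_millis: u64) {
    let delay = rand::thread_rng().gen_range(0..max_millis);
    std::thread::sleep(Duration::from_millis(delay));
}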

Track some metrics:

Action Items

  • Modify moka to expose the following per-cache metrics (a rough sketch of the counters follows this list):
    • The number of entries in the internal hash table.
    • Counters for the maintenance tasks:
      • Initiation count
      • Retry count
      • Completion count
    • The number of items in the interrupted ops channel.
  • Modify Symbolicator to report the above metrics via the statsd protocol.
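
As a rough sketch, the per-cache maintenance counters could be as simple as a struct of atomics (the type and method names below are hypothetical, not an existing moka API):

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical per-cache counters for the proposed metrics.
#[derive(Debug, Default)]
pub struct MaintenanceTaskCounters {
    pub initiated: AtomicU64,
    pub retried: AtomicU64,
    pub completed: AtomicU64,
}

impl MaintenanceTaskCounters {
    pub fn record_initiated(&self) {
        self.initiated.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_retried(&self) {
        self.retried.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_completed(&self) {
        self.completed.fetch_add(1, Ordering::Relaxed);
    }
}

These counters would then be reported periodically, e.g. via the statsd protocol as proposed above.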

@tatsuya6502
Member

tatsuya6502 commented Oct 1, 2023

I presume the memory leak occurred due to stalled cache maintenance task processing in future::Cache. But I have not figured out why it was stalled.

To ensure this would not happen, I spent hours running mokabench, moka's own stress test tool, during the beta period of v0.12.0-beta.X.

For example,

$ ./target/release/mokabench --num-clients 16,32 --ttl 3 --tti 1 \
    --per-key-expiration --invalidate --insert-once --entry-api \
    --eviction-listener immediate

and confirmed that v0.12.0-beta.3 with #323 never got the maintenance tasks to stall. I ran it on both Linux x86_64 and macOS arm64.

I also ran an integration test of the Materialize database with moka v0.12 for ~67 hours so far, and it never failed (Linux x86_64).

Some thoughts:

  • It will be worth rerunning these mokabench/Materialize tests to look for any sign of memory leaks.
    • Updating the metrics branch of mokabench for moka v0.12.0 will be a good start.

@tatsuya6502
Member

I updated mokabench: moka-rs/mokabench#14

I will run more tests tomorrow, but for now I have run the tests only for moka::future of v0.12.0. (I commented out the moka::sync tests in src/main.rs.)

I ran the tests with the following command to mimic the InMemoryCache in Symbolicator:

$ cargo run --release --features metrics -- --num-clients 8 --repeat 20 \
    --ttl 9 --per-key-expiration --size-aware --insert-once --entry-api

This plot shows the allocated memory from the global allocator (jemalloc) while running test with three different cache sizes.

[Plot: moka-v0 12 0-async-2023-10-01]

There were between 112 and 280 million inserts to each cache in about 6 minutes, and the memory usage was stable during the tests. (The left one was slightly increasing. I will run the same test on v0.11.3 to see if there is any difference)

| Cache | Max Capacity | Clients | Inserts | Reads | Hit Ratio (%) | Duration (secs) |
|---|---|---|---|---|---|---|
| Moka Async Cache | 3,276,800,000 | 8 | 280,850,594 | 298,321,855 | 15.857 | 325.566 |
| Moka Async Cache | 13,107,200,000 | 8 | 185,265,682 | 298,321,855 | 47.897 | 388.731 |
| Moka Async Cache | 26,214,400,000 | 8 | 112,095,172 | 298,321,855 | 72.425 | 344.849 |

@tatsuya6502
Member

(The left one was slightly increasing. I will run the same test on v0.11.3 to see if there is any difference)

v0.11.3 does not show such a trend. Perhaps this is the memory leak you saw? I will look more closely into this trend in v0.12.0.

v0.11.3 with Hit Ratio ~15%

Above: Allocated MiB
Below: Resident MiB

[Plot: moka-v0113-2023-10-02]

| Cache | Max Capacity | Clients | Inserts | Reads | Hit Ratio (%) | Duration (secs) |
|---|---|---|---|---|---|---|
| Moka Async Cache | 3,276,800,000 | 8 | 281,060,924 | 298,321,855 | 15.786 | 298.770 |

v0.12.0 with Hit Ratio ~15%

Above: Allocated MiB
Below: Resident MiB

The scales of both X and Y axes are different between v0.11.3 and v0.12.0 plots.

[Plot: moka-v0120-2023-10-01b]

  • Resident MiB of v0.12.0 was almost 3 times larger than that of v0.11.3 at the peak.
  • Allocated MiB increased by ~9% (from 84.23 MiB to 91.56 MiB) after 280 million inserts.

[Plots: moka-v0120-2023-10-01c, moka-v0120-2023-10-01d]

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

Wow, that is a very thorough analysis so far, thank you so much for putting in all that effort.

For us, the memory leak manifested like this:
[image: memory usage graph of the canary server]

The canary server had been running for a little over an hour and was steadily leaking memory, at what looks like a rate of ~240M per minute.

I have looked at all the other metrics, and the hit/miss rates of all the various caches look completely normal.
We have quite a few different caches; the ones with the highest access rates are:

  • the object_meta cache, which gets 4-5k requests per second with a hit rate of ~96%
  • the downloader query cache, which gets 1-2k requests per second with a hit rate of ~92-98%
  • all other caches get lower request numbers, ranging from the hundreds down to the tens per second

What makes me suspicious is that only our native canary was leaking memory, but not the JS one.
Thinking about the high-level differences between the two, what comes to mind is that the native processing has some nested cache accesses:

  • minidump module cache (per processing task, unlimited capacity, no TTL)
    • downloader query cache (global, limited, TTL)
    • object meta cache (global, limited, per-item TTL)
    • CFI cache (global, limited, per-item TTL)
      • object cache (global, limited, per-item TTL)

If I am tracking all of these correctly, we are nesting things up to 3 levels deep in that case.

On the other hand, I don't think the JS processing nests any of its caches.

I will try to create a more minimal test case in moka that uses some nested caches with various kinds of delays and timeouts; let's see if maybe that has something to do with this leak.

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

I believe I might be onto something with the reentrant cache theory. I modified the async_example to include some nested caches, using entry().or_insert_with() and mixing that with some randomized delays and timeouts:

use std::sync::Arc;
use std::time::Duration;

// Use the asynchronous cache.
use moka::future::Cache;
use rand::Rng;
use tokio::sync::Semaphore;

const NUM_TASKS: usize = 8;
const NUM_KEYS_PER_TASK: usize = 4;

fn value(n: usize) -> String {
    format!("value {}", n)
}

#[derive(Clone)]
struct GlobalCache(Arc<Cache<u8, Vec<u8>>>);

async fn handle_a(a: &GlobalCache, b: &GlobalCache) {
    let fut = async {
        let init = Box::pin(async { handle_b(b).await });
        let key = rand::random();
        let _entry = a.0.entry(key).or_insert_with(init).await;
    };
    let timeout = rand::thread_rng().gen_range(0..25);
    let _res = tokio::time::timeout(Duration::from_millis(timeout), fut).await;
}

async fn handle_b(b: &GlobalCache) -> Vec<u8> {
    let init = Box::pin(async {
        let delay = rand::thread_rng().gen_range(0..50);
        tokio::time::sleep(Duration::from_millis(delay)).await;
        Vec::with_capacity(1024)
    });
    let key = rand::random();
    let entry = b.0.entry(key).or_insert_with(init).await;
    entry.value().clone()
}

async fn run_once(a: GlobalCache, b: GlobalCache) {
    // Create a cache that can store up to 10,000 entries.
    let cache = Cache::new(10_000);

    // Spawn async tasks and write to and read from the cache.
    let tasks: Vec<_> = (0..NUM_TASKS)
        .map(|i| {
            // To share the same cache across the async tasks, clone it.
            // This is a cheap operation.
            let my_cache = cache.clone();
            let start = i * NUM_KEYS_PER_TASK;
            let end = (i + 1) * NUM_KEYS_PER_TASK;

            let a = a.clone();
            let b = b.clone();
            tokio::spawn(async move {
                // Insert NUM_KEYS_PER_TASK (4) entries per task.
                for key in start..end {
                    my_cache
                        .entry(key)
                        .or_insert_with(async {
                            handle_a(&a, &b).await;
                            value(key)
                        })
                        .await;
                    // get() returns Option<String>, a clone of the stored value.
                    assert_eq!(my_cache.get(&key).await, Some(value(key)));
                }

                // Invalidate every 4th element of the inserted entries.
                for key in (start..end).step_by(4) {
                    // invalidate() is an async method, so await it.
                    my_cache.invalidate(&key).await;
                }
            })
        })
        .collect();

    // Wait for all tasks to complete.
    futures_util::future::join_all(tasks).await;

    // Verify the result.
    for key in 0..(NUM_TASKS * NUM_KEYS_PER_TASK) {
        if key % 4 == 0 {
            assert_eq!(cache.get(&key).await, None);
        } else {
            assert_eq!(cache.get(&key).await, Some(value(key)));
        }
    }
    drop(cache)
}

#[tokio::main]
async fn main() {
    let a = GlobalCache(Arc::new(Cache::new(100)));
    let b = GlobalCache(Arc::new(Cache::new(100)));

    let semaphore = Arc::new(Semaphore::new(1_000));
    loop {
        let guard = semaphore.clone().acquire_owned().await.unwrap();
        let a = a.clone();
        let b = b.clone();
        tokio::spawn(async move {
            run_once(a, b).await;
            drop(guard);
        });
    }
}

Running this with macOS Instruments, it seems to leak ~1M per minute.

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

https://github.com/Swatinem/moka/blob/142783ede0c0f05f988e7a572806890604b89ee7/src/sync/value_initializer.rs#L74

As well as other allocations related to it, like the owned key:

https://github.com/Swatinem/moka/blob/0e07ebac73cb69e6446b65dc4ad5bc306591297b/src/sync/entry_selector.rs#L146

I have so far only run this for ~3 minutes, and it's quite possible that these things are freed eventually given more time, but it does not look like it at first glance.
I hope this repro helps you nail down the source of the leaks.

@Swatinem
Contributor Author

Swatinem commented Oct 2, 2023

Also, the interesting thing here is that it does not seem like the actual cache values are being leaked, just these RwLock-related allocations.
Unless of course I am reading the Instruments output wrong, which may well be the case as I don't have a lot of experience using it. 🤷🏻‍♂️

@tatsuya6502
Member

tatsuya6502 commented Oct 2, 2023

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

Thank you. It was very helpful! I found the source of the bug and fixed it in commit fef6a77ed5 on a topic branch. I am going to open a pull request for it.

This bug was introduced by this PR for v0.12.0, which made ValueInitializer::try_init_or_read use a wrong hash value for the key when inserting into, getting from, and removing from the waiter map.

When the map gets more than half full with live and deleted keys (or the number of deleted keys exceeds a threshold), it reallocates its memory region and copies the live key-values over, using the correct hash values for the keys. After this happens, try_init_or_read (which uses the wrong hash values for the keys) can no longer get/remove the live keys, causing the memory leak.
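
To make the failure mode concrete, here is a minimal, self-contained illustration (this is not moka's actual data structure; the hashers, the tiny bucket table, and the numbers are all made up for the sketch):

use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// A tiny chained hash table where the caller supplies the hash value,
// standing in for the waiter map.
struct BucketMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: PartialEq + Hash, V> BucketMap<K, V> {
    fn new(num_buckets: usize) -> Self {
        Self { buckets: (0..num_buckets).map(|_| Vec::new()).collect() }
    }

    fn insert(&mut self, hash: u64, key: K, value: V) {
        let idx = hash as usize % self.buckets.len();
        self.buckets[idx].push((key, value));
    }

    fn remove(&mut self, hash: u64, key: &K) -> Option<V> {
        let idx = hash as usize % self.buckets.len();
        let bucket = &mut self.buckets[idx];
        let pos = bucket.iter().position(|(k, _)| k == key)?;
        Some(bucket.swap_remove(pos).1)
    }

    // What a real table does when it grows or cleans up: recompute every
    // entry's bucket from its key using the table's own (correct) hasher.
    fn rehash(&mut self, state: &RandomState) {
        let old = std::mem::take(&mut self.buckets);
        self.buckets = (0..old.len()).map(|_| Vec::new()).collect();
        for (key, value) in old.into_iter().flatten() {
            let hash = hash_of(state, &key);
            self.insert(hash, key, value);
        }
    }

    fn len(&self) -> usize {
        self.buckets.iter().map(|b| b.len()).sum()
    }
}

fn hash_of<T: Hash>(state: &RandomState, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let correct = RandomState::new();
    let wrong = RandomState::new(); // a different seed => different hash values

    let mut waiters = BucketMap::new(64);

    // While insert *and* remove both use the same (wrong) hash, entries are
    // still found, so the bug stays invisible.
    waiters.insert(hash_of(&wrong, &1u64), 1u64, "waiter");
    assert!(waiters.remove(hash_of(&wrong, &1u64), &1u64).is_some());

    // But once the table rehashes using the correct hash values...
    for key in 0u64..1_000 {
        waiters.insert(hash_of(&wrong, &key), key, "waiter");
    }
    waiters.rehash(&correct);

    // ...removals that still use the wrong hash look in the wrong bucket,
    // miss the live entries, and the waiters accumulate forever.
    for key in 0u64..1_000 {
        waiters.remove(hash_of(&wrong, &key), &key);
    }
    println!("leaked waiters: {}", waiters.len()); // almost all 1,000 remain
}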

@tatsuya6502
Member

I merged the following pull requests into the main branch and published v0.12.1 to crates.io:

I believe the following was fixed by #330.

According to that, the leaks seem to be related to async RwLock Guards created in ValueInitializer::try_init_or_read.
For some reason, this waiter is being kept alive:

I have not run the reproducer in #329 (comment) with macOS Instruments yet, but I ran mokabench on future::Cache and sync::Cache on v0.12.1 and saw no sign of memory leaks. I will run the reproducer later today. (I have not used Instruments for a while and am trying to recall how to do some tricks.)

Swatinem added a commit to getsentry/symbolicator that referenced this issue Oct 4, 2023
This does a `cargo update`, and also updates the `moka` dependency again.
It was downgraded in #1310 because of a memory leak, which should now be solved.
See moka-rs/moka#329.
@Swatinem
Contributor Author

Swatinem commented Oct 4, 2023

We have updated to 0.12.1 and it has been running for over an hour without any sign of memory leaks, so I will call this fixed.

Thanks so much for the help here.
